Next-gen Odroid announcement
So I finally got round to registering for the Odroid forum and posting re. the Mlucas-for-ARMv8 SIMD-code port [url=https://forum.odroid.com/viewtopic.php?f=140&t=29930]yesterday[/url]. In my post I also loudly yearned for an Odroid update based on the newer/faster Cortex A57; today I got a "Wish granted" reply from user 'rooted' pointing to this thread, posted - not sure if coincidentally - just a few hours after I started mine:
[url]https://forum.odroid.com/viewtopic.php?t=29932[/url] |
Mooaarrr power, can't go wrong with that :smile:.
|
Where's the PS2 port? :grin:
|
That's interesting, I have some open questions:[list][*]What can we expect out of the dual-core A72 in terms of performance/power draw/efficiency?[*]Can the A72 and A53 run full bore together?[*]I assume the best setup is 2 workers, 1 per cortex.[/list]
I like that it's 12V instead of 5V - it works better to use a PSU as the power source when the board runs off the main rail. I'm augmenting an x86 system with some Pi/Pi clones which use 5V; it would be pretty cool to be able to power some 12V boards from the same molex connectors.

I tried to find some benchmarks for comparison.

Mediatek MT8176 (2.1 GHz 2-core A72, 4-core A53): [url]https://www.notebookcheck.net/Mediatek-MT8176-SoC-Benchmarks-and-Specs.187985.0.html[/url]
Geekbench 4.1/4.2 64-bit single-core score: 1541
Geekbench 4.1/4.2 64-bit multi-core score: 2489

Mediatek MT6735 (4-core 1.5 GHz A53): [url]https://www.notebookcheck.net/Mediatek-MT6735-SoC-Benchmarks-and-Specs.147799.0.html[/url]
Geekbench 4.1/4.2 64-bit single-core score: 519
Geekbench 4.1/4.2 64-bit multi-core score: 1430

I know it's very rule-of-thumb as it is, but an A72 core having triple the bench score of an A53 core may mean the board can do 2.5x the throughput of a 4-core A53 SoC. The multi-core score backs this up if the A53 cores were idle during its run - or maybe the benchmarks can't be compared in this way and this is all fluff. |
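The rule-of-thumb arithmetic above can be sketched quickly (the per-core scores are the Geekbench figures quoted in the post; treating per-core scores as linearly additive is my assumption, and memory-bound code like LL testing may scale worse):

```python
# Back-of-envelope check of the "2.5x throughput" guess, using the
# Geekbench single-core scores quoted above.
a53 = 519    # MT6735 single-core (A53 @ 1.5 GHz)
a72 = 1541   # MT8176 single-core (A72 @ 2.1 GHz)

# Per-core ratio: an A72 scores roughly triple an A53.
print(round(a72 / a53, 1))                        # ~3.0

# Hypothetical 2xA72 + 4xA53 board vs. a plain 4xA53 SoC, assuming
# per-core scores add up linearly (a big assumption):
print(round((2 * a72 + 4 * a53) / (4 * a53), 1))  # ~2.5
```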
[QUOTE=M344587487;479544]I like that it's 12v instead of 5V, it works better to use a PSU as the power source as it's the main rail. I'm augmenting an x86 system with some pi/pi clones which use 5V, it would be pretty cool to be able to power some 12V boards from the same molex connectors.
[/QUOTE] The molex connector on that board can only be used to power an external drive, not vice versa :smile: |
[QUOTE=ldesnogu;479551]The molex connector on that board can only be used to power an external drive, not vice versa :smile:[/QUOTE]
I didn't mean to use the molex connector on the board, I meant to do the 12V equivalent of this: [url]https://www.ebay.co.uk/itm/USB-To-4-Pin-IDE-Molex-Cooling-Fan-Connector-Cable-Adapter-Cord-For-Computer-PC/322188761549?epid=505932814&hash=item4b03f259cd:g:1uUAAOSw0kNXg2vy[/url] |
1 Attachment(s)
According to ARM, the A72 core should perform about 26% better than the A57 in floating point (same frequency, process and memory subsystem).
However, looking at the bottom of this: [URL]https://www.anandtech.com/show/11088/hisilicon-kirin-960-performance-and-power/2[/URL] it looks like the A57 and A72 are much closer in performance/MHz (the A72 maybe 10% better than the A57), with an A53 core at slightly less than half of an A57 core. |
[QUOTE=M344587487;479544]That's interesting, I have some open questions[list][*]what can we expect out of the dual core A72 in terms of performance/power draw/efficiency?[*]Can the A72 and A53 run full bore together?[*]I assume the best setup is 2 workers, 1 per cortex[/list][/QUOTE]
Good questions - you're probably right re. one worker on the a72 and another on the a53, but it will be interesting to see if, and to what extent, the two different CPUs can work together on tasks. The multithreading in my code breaks stuff into lots of separate work chunks which can be done independently by the respective threads in a master pool, and which only need be synchronized in the "all work chunks done, let's move on to the next phase" sense, so e.g. having the a53 cores complete their work units at 1/3rd (or whatever) the rate of the a72 cores should be no problem. I would expect the memory/cache-locality bandwidth between the 2 CPUs to be the bigger gating factor. Anyhow, the quickest way to find out is to get hold of one of the dev-boards they're gifting to a small subset of Odroiders - I asked to be included in the list of potential grantees, so we can hope. In the meantime, though, if someone has access to one of the currently-available (and pricier) big.LITTLE dual-cortex-CPUs-of-different-flavors dev-boards, they could play with this aspect. |
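The synchronization pattern described above - independent work chunks, with a barrier only at phase boundaries - can be illustrated with a toy Python sketch (this is not Mlucas's actual C implementation, and all names here are invented):

```python
# Toy illustration of the pool-of-independent-chunks pattern: workers of
# different speeds (e.g. a72 vs a53 cores) pull chunks independently, and
# the only synchronization is the implicit "all chunks done" barrier at
# the end of each phase.
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk_id):
    """Stand-in for one independent piece of per-phase work."""
    return chunk_id * chunk_id

def run_phase(pool, n_chunks):
    # pool.map returns only once every chunk has completed -- this is
    # the per-phase barrier; slow and fast workers finish at their own pace.
    return list(pool.map(process_chunk, range(n_chunks)))

with ThreadPoolExecutor(max_workers=6) as pool:  # e.g. 2 big + 4 little
    for _ in range(3):                           # three successive phases
        results = run_phase(pool, n_chunks=32)

print(len(results))  # 32
```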
Looks like the compiler flags to use with gcc for this announced product would be [c]-march=armv8-a -mtune=cortex-a72.cortex-a53[/c]
|
[QUOTE=GP2;479581]Looks like the compiler flags to use with gcc for this announced product would be [c]-march=armv8-a -mtune=cortex-a72.cortex-a53[/c][/QUOTE]
Note that in my C2 SIMD builds I found a slight negative timing impact from using the a53 arch-flags, so I eschew them. YMMV, but AFAICT the only good reason to invoke such flags is if your platform requires them, which is sometimes not easy to tell - e.g. I had one builder whose build runtime-segfaulted sans the arch-flags for his CPU; said issue was cured by invoking them on rebuild. |
Thanks to a well-placed Odroider who was one of the selected recipients of a pre-release N1 system and was kind enough to try out my code on some, [url=https://forum.odroid.com/viewtopic.php?f=140&t=29930#p217216]we have N1 timings[/url] to mull over. Couple of notes:
1. His Debian build bonked with segfaults - likely a miscompilation issue similar to the one TomW hit (but I haven't bothered to do the deeper digging needed to precisely localize the cause). But my C2 build (under the standard Ubuntu distro Hardkernel ships with that unit) worked for him. Since that same build appears to run in drop-in mode on a surprising variety of ARMv8 platforms (including the Raspberry Pi3), I've posted it to the Mlucas ftp site and added the corresponding link/verbiage to the Mlucas readme page.

2. I'm still waiting for more data re. running code on both sockets (i.e. 6 total cores/threads), but preliminarily it looks like running separate jobs on the A72 and the A53 is best, as I surmised would be the case.

A53, 4-cores - I've snipped the ROE-stats column from both mlucas.cfg files' data for the sake of readability:
[code]1024 msec/iter =  43.98  radices = 256  8 16 16  0
1152 msec/iter =  47.97  radices = 144 16 16 16  0
1280 msec/iter =  51.81  radices = 160 16 16 16  0
1408 msec/iter =  60.31  radices = 176 16 16 16  0
1536 msec/iter =  65.26  radices = 192 16 16 16  0
1664 msec/iter =  71.93  radices = 208 16 16 16  0
1792 msec/iter =  78.33  radices = 224 16 16 16  0
1920 msec/iter =  85.62  radices = 240 16 16 16  0
2048 msec/iter =  91.51  radices = 256 16 16 16  0
2304 msec/iter = 108.23  radices = 288 16 16 16  0
2560 msec/iter = 121.80  radices = 160 32 16 16  0
2816 msec/iter = 140.07  radices = 176 32 16 16  0
3072 msec/iter = 149.53  radices = 192 32 16 16  0
3328 msec/iter = 165.62  radices = 208 32 16 16  0
3584 msec/iter = 180.50  radices = 224 32 16 16  0
3840 msec/iter = 195.86  radices = 240 32 16 16  0
4096 msec/iter = 212.20  radices = 256 32 16 16  0
4608 msec/iter = 249.22  radices = 288 32 16 16  0
5120 msec/iter = 278.39  radices = 160 32 32 16  0
5632 msec/iter = 316.39  radices = 176 32 32 16  0
6144 msec/iter = 339.48  radices = 192 32 32 16  0
6656 msec/iter = 376.19  radices = 208 32 32 16  0
7168 msec/iter = 407.32  radices = 224 32 32 16  0
7680 msec/iter = 446.24  radices = 240 32 32 16  0[/code]

A72, 2-cores:
[code]                                                 Speedup vs A53x4:
1024 msec/iter =  31.18  radices =  64  8  8  8 16    1.41
1152 msec/iter =  35.09  radices = 288  8 16 16  0    1.37
1280 msec/iter =  41.47  radices = 160 16 16 16  0    1.25
1408 msec/iter =  48.19  radices = 176 16 16 16  0    1.25
1536 msec/iter =  51.76  radices =  48 32 32 16  0    1.26
1664 msec/iter =  57.13  radices = 208 16 16 16  0    1.26
1792 msec/iter =  60.23  radices = 224 16 16 16  0    1.30
1920 msec/iter =  65.86  radices = 240 16 16 16  0    1.30
2048 msec/iter =  66.59  radices = 128 16 16 32  0    1.37
2304 msec/iter =  75.49  radices = 144 16 16 32  0    1.43
2560 msec/iter =  80.48  radices = 160  8  8  8 16    1.51
2816 msec/iter =  94.42  radices = 176  8  8  8 16    1.48
3072 msec/iter = 102.73  radices = 192  8  8  8 16    1.46
3328 msec/iter = 110.71  radices = 208  8  8  8 16    1.50
3584 msec/iter = 115.94  radices = 224  8  8  8 16    1.56
3840 msec/iter = 125.06  radices = 240  8  8  8 16    1.57
4096 msec/iter = 134.47  radices = 256  8  8  8 16    1.58
4608 msec/iter = 150.90  radices = 288  8  8  8 16    1.66
5120 msec/iter = 181.31  radices = 160  8  8 16 16    1.54
5632 msec/iter = 210.01  radices = 176  8  8 16 16    1.51
6144 msec/iter = 227.63  radices = 192  8  8 16 16    1.49
6656 msec/iter = 248.11  radices = 208  8  8 16 16    1.52
7168 msec/iter = 261.58  radices = 224  8  8 16 16    1.56
7680 msec/iter = 284.01  radices = 240  8  8 16 16    1.57[/code]

Total system throughput is thus ~2.5x that of my A53x4 Odroid C2 and, as such, appreciably exceeds that of an Mlucas SSE2 build running 2-threaded on my 2GHz Core2Duo MacBook. It likely still lags Prime95 running on the latter hardware, but I expect not by all that much. |
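Where the ~2.5x figure comes from, using the 4096K rows of the two tables above (throughput is the reciprocal of the per-iteration time, and with one job per CPU cluster the two throughputs add):

```python
# Combined N1 throughput relative to a quad-A53-only system, taking the
# 4096K FFT entries as representative.
t_a53x4 = 212.20  # ms/iter, 4 threads on the quad A53
t_a72x2 = 134.47  # ms/iter, 2 threads on the dual A72

# Run a job on each cluster simultaneously; throughputs (1/t) sum:
combined_vs_a53x4 = 1 + t_a53x4 / t_a72x2
print(round(combined_vs_a53x4, 2))  # ~2.58
```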
We made the cover of [url=https://forum.odroid.com/viewtopic.php?f=149&t=30437]Odroid Magazine[/url].
|
Wow, nice! congrats!
[QUOTE=ewmayer;482325]We made the cover of [URL="https://forum.odroid.com/viewtopic.php?f=149&t=30437"]Odroid Magazine[/URL].[/QUOTE] |
[QUOTE=ewmayer;482325]We made the cover of [url=https://forum.odroid.com/viewtopic.php?f=149&t=30437]Odroid Magazine[/url].[/QUOTE]
Nice! I backed up the article in a post on LinkedIn - I hope you don't mind. If you do, I will take it down. Luigi |
Another article on the Odroid N1:
[url=https://www.cnx-software.com/2018/02/06/hardkernel-unveils-odroid-n1-board-with-rockchip-rk3399-processor-4gb-ram-dual-sata-and-more/]Hardkernel Unveils ODROID-N1 Board with Rockchip RK3399 Processor, 4GB RAM, Dual SATA, and More[/url] | Embedded Systems News

[quote]Price should be around $110, with the final price depending on the DRAM market price. [b]If the board is popular, an ODROID-N1 Lite version with 2GB RAM and no SATA port may be launched later on for $75.[/b][/quote]

Not being a PC-hardware wonk (except where the CPUs themselves are concerned) ... would someone like me, who crunches primes, care about the lack of a SATA port? If not, the modest $75 free-gear credit I got as a gratuity for my Odroid-Mag primes article would nicely cover the cost of a SATA-less N1. |
[QUOTE=ewmayer;482950]Another article on the Odroid N1:
[url=https://www.cnx-software.com/2018/02/06/hardkernel-unveils-odroid-n1-board-with-rockchip-rk3399-processor-4gb-ram-dual-sata-and-more/]Hardkernel Unveils ODROID-N1 Board with Rockchip RK3399 Processor, 4GB RAM, Dual SATA, and More[/url] | Embedded Systems News Not a PC-hardware (except for the CPUs themselves) wonk ... would someone like me, who crunches primes, care about the lack of a SATA port? If not, the modest $75 free-gear credit I got as a gratuity for my Odroid-Mag primes article would nicely cover the cost of a SATA-less N1.[/QUOTE] Who in the world would ever need a SATA port on a number crunching Odroid? :-) |
Just had a gander at the Odroid forums for the first time in months ... [url=https://forum.odroid.com/viewtopic.php?f=149&t=31277]this June thread[/url] by "odroid" (a.k.a. Hardkernel engineering officialdom) indicates that due to a supplier phasing out their 1GB memory chips, they have had to do a major redesign, as a result of which the next-gen Odroid will be called N2, not N1. On the good-news front, that likely means more memory, and they are also doing the redesign around a new SoC, which hopefully will mean better performance and/or lower power draw. They say they are targeting a September rollout. No word on how long after that a less-expensive SATA-port-less iteration of the kind they mentioned for the N1 might roll out.
|
The Odroid N2, the successor to the earlier-planned but since-abandoned N1, [url=https://forum.odroid.com/viewtopic.php?f=176&t=33781]is finally available[/url]. The N1 was a big.LITTLE design with an a72x2.a53x4 CPU, whereas the N2 is a73x4.a53x2, which is faster since it puts 4 cores in the BIG a73 part of the die.
Going to the hardkernel.com site and clicking on [url=https://www.hardkernel.com/shop/odroid-n2-with-4gbyte-ram/]the N2 product link[/url] brings up only the 4GB model - I have a $75 credit from HK as thanks for the Odroid-magazine article I wrote a year ago, and am hoping I can get an N2 plus all the needed add-ons for no more than that. 2GB is plenty for LL testing, so I'm looking at [url=https://www.hardkernel.com/shop/odroid-n2-with-2gbyte-ram/]the 2GB model[/url], which one can access by simply fiddling 4 to 2 in the foregoing URL. Looks like I'll also need to buy a 12V/2A DC power plug and a cheapie $4 plastic case for easier handling and to keep the dust out. Looks like there is also an optional WiFi module, which might be a nice-to-have add-on.

It's not clear to me whether one needs a Linux-preflashed eMMC card to boot - I bought an 8GB such for my C2, but for the N2, the above links state: [quote][b]SPI Flash memory boot[/b] ODROID-N2 can boot from on-board SPI memory instead of uSD memory or eMMC cards. The on-board SPI memory is 8MB in size and can include the bootstrap binaries, U-boot, a bare-minimum Linux kernel, and a ramdisk that includes “Petitboot”. The “Petitboot” software provides a user-friendly interface and allows users to select a boot media. Unfortunately, since the SPI bus on the S922X shares the hardware interface with eMMC, the SPI flash memory on ODROID-N2 is only accessible at boot until the eMMC hardware block is activated. So you have to remove the eMMC module and boot from an SD card to update the firmware in the SPI flash easily.[/quote] If I can avoid shelling out extra for an eMMC (the smallest available of which is 16GB @ $14.90), my total including shipping will be just a smidge over my target of $75. |
[QUOTE=ewmayer;513998]It's not clear to me whether one needs a Linux-preflashed eMMC card to boot - I bought an 8GB such for my C2, but for the N2, the above links state:
If I can avoid shelling out extra for an eMMC (the smallest available of which is 16GB @$14.90) my total including shipping will be just a smidge over my target of $75.[/QUOTE] Kinda weird. You own this kit, but you're not sure about how much it is going to cost you to use. Am I understanding you correctly? |
[QUOTE=chalsall;514001]Am I understanding you correctly?[/QUOTE]
No - I am planning to put my $75 honorarium toward one of these, and am trying to figure out precisely which 'add-ons' are needed. For the C2 I needed a case/charger/eMMC; for the N2 one again needs the first 2, but the description doesn't make it completely clear whether one needs to shell out for an extra eMMC or not. |
[QUOTE=ewmayer;514013]No - I am planning to put my $75 honorarium toward one of these, trying to figuring out precisely which 'add-ons' are needed. For C2 I needed case/charger/eMMC, for N2 one again needs the first 2 but the description doesn't make it completely clear whether one needs to shell out for an extra eMMC or not.[/QUOTE]
At only 8MB for the SPI memory, you're going to want a micro SD card or an eMMC module. It sounds like it can boot from either, but it can also boot from SPI for some very niche applications. An SD card would be cheaper. |
[QUOTE=M344587487;514045]At only 8MB for the SPI memory you're going to want a micro SD card or emmc. Sounds like it can boot from either but it can also boot from SPI for some very niche applications. SD card would be cheaper.[/QUOTE]
Thanks - so basically the same deal as for my C2. Just placed order for 2GB N2, 12V/2A PS, 8GB Linux-preflashed eMMC and clear case. Total $85, and turns out my honorarium was only $65 rather than $75, but OTOH they said shipping is free for any stuff I order with that. So a $20 cash outlay, not bad for something I reckon will provide 2.5-3x the Mlucas throughput of my C2. (FYI, each of my Galaxy S7 phones is about equal to the abortive Odroid N1 at ~2.2x the C2). |
Odroid N2 arrived last week via express mail (they paid for shipping, as part of the 'thanks' for the Odroid-magazine piece) from Hardkernel ... finally got round to setting it up over the weekend. First impressions: the board is ~50% larger in terms of area than the C2, and the whole thing is much heavier, with the entire bottom of the board - the CPU side - covered by a massive aluminum heat sink. No fan on this model, but I let it sit on my desk and run full-blast overnight; the heat sink gets warm, but there are no signs of throttling in the resulting timings. Setup was mostly the same as for the C2, except this time with an eMMC preflashed with the OS image in place of the similar-sized microSD card for the C2.
/proc/cpuinfo shows the 2 'little' a53 cores labeled cpu0-1 and the 4 'big' a73 cores labeled cpu2-5. Mlucas self-tests give these timings per the resulting mlucas.cfg files - I've edited the data columns to remove the roundoff-error data and trailing 0-radices in favor of an FFT-performance column, which in each table uses the 2048K timing as a baseline and, in the 'perf = ' column, lists the timing divided by the ideal n*log(n) scaling for the larger FFT lengths:

[0] Running 2-threaded on the a53 cores (I killed the self-test after it finished 4096K, since I will only do DCs on the a53):
[code]2048 msec/iter = 135.17  radices = 256 16 16 16  perf = 1.000
2304 msec/iter = 159.05  radices = 288 16 16 16  perf = 1.036
2560 msec/iter = 180.14  radices = 160 32 16 16  perf = 1.030
2816 msec/iter = 209.14  radices = 176 32 16 16  perf = 1.080
3072 msec/iter = 219.58  radices = 192 32 16 16  perf = 1.028
3328 msec/iter = 240.32  radices = 208 32 16 16  perf = 1.029
3584 msec/iter = 258.15  radices = 224 32 16 16  perf = 1.017
3840 msec/iter = 282.10  radices = 240 32 16 16  perf = 1.028
4096 msec/iter = 303.91  radices = 256 32 16 16  perf = 1.030[/code]

[1] Running 4-threaded on the a73 cores:
[code]2048 msec/iter =  37.86  radices = 256 16 16 16  perf = 1.000
2304 msec/iter =  43.92  radices = 288 16 16 16  perf = 1.015
2560 msec/iter =  48.30  radices = 160 16 16 32  perf = 0.992
2816 msec/iter =  54.99  radices = 176 16 16 32  perf = 1.014
3072 msec/iter =  60.80  radices = 192 16 16 32  perf = 1.017
3328 msec/iter =  66.33  radices = 208 16 16 32  perf = 1.014
3584 msec/iter =  71.01  radices = 224 16 16 32  perf = 0.998
3840 msec/iter =  76.88  radices = 240 16 16 32  perf = 1.001
4096 msec/iter =  82.08  radices = 256 16 16 32  perf = 0.994
4608 msec/iter =  94.85  radices = 288 16 16 32  perf = 1.006
5120 msec/iter = 107.73  radices = 320 16 16 32  perf = 1.016
5632 msec/iter = 130.55  radices = 176 32 32 16  perf = 1.107
6144 msec/iter = 135.81  radices = 768 16 16 16  perf = 1.045
6656 msec/iter = 153.60  radices = 208 32 32 16  perf = 1.081
7168 msec/iter = 164.06  radices = 224 32 32 16  perf = 1.063
7680 msec/iter = 179.47  radices = 240 32 32 16  perf = 1.077[/code]

Notice how the quad a73 *really* likes the larger leading FFT radices (call them r0), especially above 4M FFT. For example, notice the timing jump from 5120K, where r0 = 320 is supported in v18, to 5632K, where the current largest r0 = 176. I already knew about this ARM-core trend from my tests on the older 4xa53 C2 and on the quad ARM-based CPUs of my cellphone cluster, so I already implemented support for r0 = 352 in my v19 dev-branch code a couple of months back, shortly after releasing v18. I may not in fact need r0 = 384 for 6144K FFT since I already have r0 = 768 there, but I'll probably implement it anyway in a future version (likely v20) because it can't hurt, and might help on a decent subset of CPUs.

Overall on the N2, the 'big' quad-core gets ~4x the throughput of the 'little' dual-core. (Recall that the experimental-only N1 had a lower-throughput 2xa72|4xa53 combo.) Once I'd done the above self-tests I fired up a DC@2816K on the 2xa53 and a first-time test@5120K on the 4xa73. With both jobs running, the DC gets 246 ms/iter and the first-time test gets 118 ms/iter, thus ~10% lower throughput than for the individual big|little timings above. The resulting total throughput is ~1.3x that of one of my Galaxy S7 compute-cluster broke-o-phones, so if one only cares about FLOPS/$ (as opposed to using the N2 with all its handy connector ports for more general purposes), the old-phone route is a clear win.
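For reference, the 'perf' column can be reproduced from the raw timings like this (a quick sketch; the baseline values are the 2048K row of the 4-threaded a73 table):

```python
# 'perf' = per-iteration time divided by the ideal n*log(n) work scaling,
# normalized so the 2048K baseline reads 1.000. Values > 1 mean the FFT
# length runs slower than ideal scaling would predict.
from math import log

def perf(n_k, msec, base_k=2048, base_msec=37.86):
    work = lambda n: n * log(n)   # ideal FFT operation-count scaling
    return (msec / base_msec) / (work(n_k) / work(base_k))

# Check against two rows of the 4-threaded a73 table:
print(round(perf(2304, 43.92), 3))  # 1.015
print(round(perf(2560, 48.30), 3))  # 0.992
```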
Now, my desk is already cluttered enough, so having done the basic timing tests, I would happily resell my almost-brand-new N2 for slightly under cost to an interested party - here is the inventory and the Hardkernel prices:

[url=https://www.hardkernel.com/shop/odroid-n2-with-2gbyte-ram/]ODROID-N2 with 2GByte RAM[/url] - $63
[url=https://www.hardkernel.com/shop/12v-2a-power-supply-us-plug/]12V/2A power supply US plug[/url] - $5.50
[url=https://www.hardkernel.com/shop/odroid-n2-case-clear/]ODROID-N2 Case Clear[/url] - $4
[url=https://www.hardkernel.com/shop/8gb-emmc-module-n2-linux/]8GB eMMC Module N2 Linux[/url] - $12.90

Total = $85.40, plus whatever shipping cost Hardkernel charges. How about $75 including free shipping to the US? |
[QUOTE=ewmayer;515969]Total = $85.40, plus whatever shipping cost Hardkernel charges. How about $75 including free shipping to the US?[/QUOTE]What's the lowest cost of shipping to La Palma? (I'm in no hurry.) Do you take PayPal or would I have to put a USD 100 beer voucher in the post to you?
I already have a Raspberry Pi-2 lined up for the role, but it would appear that your kit would be more suited as a telescope controller, as it has more and faster USB, more and faster CPUs, ditto memory, and 1G ethernet. |
[QUOTE=xilman;516021]What's the lowest cost of shipping to La Palma? (I'm in no hurry.) Do you take PayPal or would I have to put a USD 100 beer voucher in the post to you?
I already have a Raspberry Pi-2 lined up for the role but it would appear that your kit would be more suited as a telescope controller as it has more and faster USB more and faster cpus, ditto memory and 1G ethernet.[/QUOTE] I am interested in getting one of these, maybe using it as a desktop replacement, especially as I can run Debian Stretch. I was looking here [url]https://www.liymo.com/index.php?route=product/product&path=94_239&product_id=849[/url] |
How is it doing in relation to memory bandwidth? I assume it should be ok as it is only the speed of 1 core of a Q6600.
|
[QUOTE=xilman;516021]What's the lowest cost of shipping to La Palma? (I'm in no hurry.) Do you take PayPal or would I have to put a USD 100 beer voucher in the post to you?
I already have a Raspberry Pi-2 lined up for the role but it would appear that your kit would be more suited as a telescope controller as it has more and faster USB, more and faster CPUs, ditto memory, and 1G ethernet.[/QUOTE]

Paul, the easiest way to cost/ship this to you is via a "private eBay sale". Here is the idea: I post a simple "N2 for sale" item with a ridiculous asking price (to keep anyone else from bidding on it, or to make me a lot of $ should someone do so despite the price), but allowing would-be buyers to make offers, and I set the min-offer threshold very low. eBay has a global shipping program which saves me all the customs-forms hassle - I just pack the item and ship to a US hub, and they take care of the rest. Viewing the item from your locale should give you a shipping cost based on said locale. If it's not egregious - and it shouldn't be for a roughly 8oz small package like this - you offer me some insanely low price for the item, which minimizes the 10% eBay fee gouge on the sale. Any remaining difference between the $75 I cited and the min-offer price we officially agreed on, you can PayPal me via the "send money to friends & family" option, sans fee. But that requires you to have an eBay account [eBay UK should be fine] - do you? If so, e-mail me; it should take no more than a few minutes to get the item listing done.

[QUOTE=henryzz;516030]How is it doing in relation to memory bandwidth? I assume it should be ok as it is only the speed of 1 core of a Q6600.[/QUOTE]

The flatness of the n*log(n)-normalized timings I posted indicates that memory bandwidth is well matched to the CPU. |
[QUOTE=paulunderwood;516026]I am interested in getting one of these, maybe using it as a desktop replacement, especially as I can run Debian Stretch. I was looking here [url]https://www.liymo.com/index.php?route=product/product&path=94_239&product_id=849[/url][/QUOTE]
107 Euros - does that include shipping and tax(es)? And how does that price out for you compared to going directly to the hardkernel.com site? |
[QUOTE=ewmayer;516082]107 Euros - does that include shipping and tax(es)? And how does that price out for you compared to going directly to the hardkernel.com site?[/QUOTE]
It is £94.90 with a case and transformer. Although I have a spare SD card, I would probably want a big eMMC card -- read 128GB -- running a faster-than-SD swap file, if I am to use it as a desktop. |
Just for grins, I [url=https://www.ebay.com/itm/Odroid-N2-hex-core-Micro-PC-with-2GB-RAM/223508043623]created an eBay listing[/url] with an ask of $195 and a min-offer setting of $1. I used a Pi3 as my item template (Odroids are scarce on eBay), so ignore all the Pi3-boilerplate in the listing.
I'd be interested in what shipping charges our 2 Pauls see to their respective locales. |
[QUOTE=ewmayer;516088]Just for grins, I [url=https://www.ebay.com/itm/Odroid-N2-hex-core-Micro-PC-with-2GB-RAM/223508043623]created an eBay listing[/url] with an ask of $195 and a min-offer setting of $1. I used a Pi3 as my item template (Odroids are scarce on eBay), so ignore all the Pi3-boilerplate in the listing.
I'd be interested in what shipping charges our 2 Pauls see to their respective locales.[/QUOTE] That is a 2 gigabyte RAM version of the board? I want 4GB. "This item will ship to United Kingdom, but the seller has not specified shipping options. Contact the seller - opens in a new window or tab and request a shipping method to your location." |
[QUOTE=paulunderwood;516089]That is a 2 gigabyte RAM version of the board? I want 4GB.[/quote]
I know - I was hoping to get shipping-cost data to the UK and La Palma from you and the other Paul, no need to commit to buy the item involved (I hope). [quote]"This item will ship to United Kingdom, but the seller has not specified shipping options. Contact the seller - opens in a new window or tab and request a shipping method to your location."[/QUOTE] What do you see when you open said new window/tab? In creating the listing I was only allowed to enable/disable the Global Shipping option, so it seems the detailed shipping method for same is specified by the buyer, via the menu item you cite. |
[QUOTE=ewmayer;516091]I know - I was hoping to get shipping-cost data to UK and Grand Canaria from you and the other Paul, no need to commit to buy the item involved (I hope).
What do you see when you open said new window/tab? In creating the listing I was only allowed to enable/disable the Global Shipping option, so it seems the detailed shipping method for same is specified by the buyer, via the menu item you cite.[/QUOTE]

Login screen. I don't have an eBay account, but I know someone who does. I want a UK-pin plug, 4GB RAM, a case and a 128GB eMMC. Here that is £166. The US-UK exchange rate puts it at nearly $200, but then there are import duties for stuff from the USA to the UK, and even handling fees. So it looks like a local buy is not too bad for me, and no hassle for you. |
BTW, should Paul L. opt to pass (not sure how much hassle a small eMMC and US plug would cause for him), any of our US-side readers is welcome to make me a $1 bid, with the remaining $74 to be paypalled separately. As long as the little guy finds a loving home! :)
[One might ask "but as there are no customs forms involved in US shipping, why involve eBay at all?" Turns out it's actually easier to create a shipping label via eBay sale than via the USPS site, and eBay gives me a better shipping discount for such online label creation, to boot. Silly, but true.] |
Is this worth considering?
[url]https://www.pine64.org/rockpro64/[/url] |
[QUOTE=Xyzzy;516130]Is this worth considering?
[url]https://www.pine64.org/rockpro64/[/url][/QUOTE] Yes, the RockPro64 was mentioned on the Odroid forums as a serious competitor in the space. It looks similarly priced to the N2; however, the RockPro64's hex-core SoC is more or less the same as that of the abandoned Odroid N1: "dual ARM Cortex A72 and quad ARM Cortex A53". Based on my timings on both the N1 and N2, each a72/a73 core is ~equal to 2 a53 cores, so the N1 and RockPro64 represent ~8x a53 worth of crunching power, whereas the N2 represents ~10x. So you need to net out your total system cost including power supply, case (if you want one), shipping and any taxes. Might be useful to get Mlucas timings on the RockPro64 just to see how the overall system throughput (not just the theoretical figure based solely on the CPU) compares. |
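The "a53-equivalents" bookkeeping above, as a trivial sketch (the 1 big ≈ 2 little weighting is the rule of thumb from the post, not a measured constant):

```python
# Count each big (a72/a73) core as worth ~2 little a53 cores, per the
# N1/N2 Mlucas timings quoted earlier in the thread.
def a53_equivalents(big_cores, little_cores, big_weight=2.0):
    return big_weight * big_cores + little_cores

print(a53_equivalents(2, 4))  # N1 / RockPro64 (2xA72 + 4xA53): 8.0
print(a53_equivalents(4, 2))  # N2 (4xA73 + 2xA53): 10.0
```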
Paul has claimed the N2 for his Canary Islands [strike]superhero lair[/strike] mountaintop observatory and promised to name the next supernova he discovers in my honor, or something. As he won't be there for another month, I get that much more crunching in - that should get me ~25% of the way through a first-time LL test on the big quad-core, as well as a DC on the little dual-core.
Oh, my fiendishly clever scheme to get eBay to do all the annoying USPS customs paperwork proved a bust - eBay's global shipping program only goes to a modest number of 'popular' high-volume shipping destinations. That includes the UK, but Paul said if I shipped it there he would face a 20% VAT usury fee versus the much lower 6% one in Las Canarias. |
[QUOTE=ewmayer;516080]Ebay has a global shipping program[/QUOTE]
I loathe that program. It's the biggest reason why I stopped buying from US sellers on eBay. The brokerage fees, taxation of items that are tax exempt, and hefty shipping fees (on top of whatever the seller is paying), often double the price of items under $50. |
As a follow up, did you guys install Debian on your N2s?
|
Anyway, I have just ordered one (4GB) and will be using a 64GB micro SD -- there is a link on Odroid's forum for a Debian image. I will be trying to run it as my main desktop, with the proviso that I can get Skype working on it.
|
[QUOTE=paulunderwood;518530]As a follow up, did you guys install Debian on your N2s?[/QUOTE]Not yet. Ernst will (I hope) mail it to me in a week or so.
|
[QUOTE=ewmayer;516500]Paul has claimed the N2 for his Canary Islands [strike]superhero lair[/strike] mountaintop observatory and promised to name the next supernova he discovers in my honor, or something. As he won't be there for another month I get that much more crunching in, that should get me ~25% of the way through a first-time LL test on the big quadcore as well as a DC on the little dualcore.
Oh, my fiendishly clever scheme to get eBay to do all the annoying USPS customs paperwork proved a bust - eBay's global shipping program only goes to a modest number of 'popular' high-volume shipping destinations. That includes the UK, but Paul said if I shipped it there he would face a 20% VAT usury fee versus the much lower 6% one in Las Canarias.[/QUOTE]Plateau-top, actually. I'm almost 2km below the mountain top. If I discover a SN, extremely unlikely because the big boys have that sewn up, you will be mentioned in dispatches. |
[QUOTE=xilman;518549]Not yet. Ernst will (I hope) mail it to me in a week or so.[/QUOTE]
Based on the dates you sent me I am planning to ship it on Monday the 10th ... no idea what transit time to expect, but it sounded like your main concern was that it not arrive earlier than yourself. Until then, it's continuing to crunch a first-time LL test on the 4xa73 core and a DC on the 2xa53 ... there have been no signs of throttling, the big passive heatsink appears to suffice for all reasonable ambient temperatures. |
[QUOTE=ewmayer;518565]Based on the dates you sent me I am planning to ship it on Monday the 10th ... no idea what transit time to expect, but it sounded like your main concern was that it not arrive earlier than yourself. Until then, it's continuing to crunch a first-time LL test on the 4xa73 core and a DC on the 2xa53 ... there have been no signs of throttling, the big passive heatsink appears to suffice for all reasonable ambient temperatures.[/QUOTE]
Excellent. It's intended for use as a telescope controller and so is likely to be used mostly at night, where the ambient temperature is almost always between 8C and 18C. |
[QUOTE=ewmayer;518565]Based on the dates you sent me I am planning to ship it on Monday the 10th ... no idea what transit time to expect, but it sounded like your main concern was that it not arrive earlier than yourself. Until then, it's continuing to crunch a first-time LL test on the 4xa73 core and a DC on the 2xa53 ... there have been no signs of throttling, the big passive heatsink appears to suffice for all reasonable ambient temperatures.[/QUOTE]
My N2 is here, but I am awaiting an SD card for it. Ernst, how long does it take to do a wavefront LL test on it? I will be looking for some number crunching for it. Is it quite easy to download your code and get it running on an N2? Instructions please! |
1 Attachment(s)
[QUOTE=paulunderwood;518790]My N2 is here, but I am awaiting a SDcard for it. Ernst, how long does it take to do a wavefront LL test on it? I will be looking for some number crunching for it. Is it quite easy to download your code and get it running on a N2? Instructions please![/QUOTE]
[[b]Edit:[/b] I just realized that the cfg-file data I copied below are from my "advance peek" v19 binary, so you might as well just use that one from the get-go - cf. the attachment at bottom.] Yes, it's very easy to get up and running - download the ARMv8 binary linked at the [url=http://www.mersenneforum.org/mayer/README.html]readme page[/url] and set up a pair of rundirs, one each for the jobs which will run on the big and little CPUs. On the N2, the two a53 cores are numbered 0-1 in /proc/cpuinfo and the four a73 cores are 2-5 - I suggest you double-check that in your own copy of said file, because it's crucial to getting the most out of your Mlucas runs. So say you call the a73-rundir 'run0' and the a53 one 'run1'. To create the optimal-FFT-config file in each:

In run0: [path to exec] -s m -iters 100 -cpu 2:5
In run1: [path to exec] -s m -iters 100 -cpu 0:1

I suggest doing these sequentially, to avoid any timing weirdness from the short timing subtests done by each self-test throwing each other off. 
By way of reference, here is the 4a73 mlucas.cfg from my N2 self-tests: [code]18.0
2048 msec/iter = 37.48 ROE[avg,max] = [0.325223214, 0.375000000] radices = 256 16 16 16 0 0 0 0 0 0
2304 msec/iter = 43.51 ROE[avg,max] = [0.287946429, 0.343750000] radices = 288 16 16 16 0 0 0 0 0 0
2560 msec/iter = 48.17 ROE[avg,max] = [0.275669643, 0.312500000] radices = 160 16 16 32 0 0 0 0 0 0
2816 msec/iter = 54.51 ROE[avg,max] = [0.259933036, 0.312500000] radices = 352 16 16 16 0 0 0 0 0 0
3072 msec/iter = 60.39 ROE[avg,max] = [0.316294643, 0.400000000] radices = 192 16 16 32 0 0 0 0 0 0
3328 msec/iter = 65.47 ROE[avg,max] = [0.280580357, 0.375000000] radices = 208 16 16 32 0 0 0 0 0 0
3584 msec/iter = 69.10 ROE[avg,max] = [0.325000000, 0.375000000] radices = 224 16 16 32 0 0 0 0 0 0
3840 msec/iter = 75.63 ROE[avg,max] = [0.275892857, 0.312500000] radices = 240 16 16 32 0 0 0 0 0 0
4096 msec/iter = 79.60 ROE[avg,max] = [0.267633929, 0.343750000] radices = 256 16 16 32 0 0 0 0 0 0
4608 msec/iter = 91.84 ROE[avg,max] = [0.284375000, 0.375000000] radices = 288 16 16 32 0 0 0 0 0 0
5120 msec/iter = 104.83 ROE[avg,max] = [0.323437500, 0.406250000] radices = 320 16 16 32 0 0 0 0 0 0
5632 msec/iter = 114.77 ROE[avg,max] = [0.228450230, 0.250000000] radices = 352 16 16 32 0 0 0 0 0 0
6144 msec/iter = 134.42 ROE[avg,max] = [0.240848214, 0.281250000] radices = 768 16 16 16 0 0 0 0 0 0
6656 msec/iter = 149.31 ROE[avg,max] = [0.266964286, 0.343750000] radices = 208 32 32 16 0 0 0 0 0 0
7168 msec/iter = 159.99 ROE[avg,max] = [0.228906250, 0.281250000] radices = 224 32 32 16 0 0 0 0 0 0
7680 msec/iter = 174.89 ROE[avg,max] = [0.252455357, 0.312500000] radices = 240 32 32 16 0 0 0 0 0 0[/code] Those timings reflect no crunching going on on the a53 CPU - with jobs running on both you can expect a 5-10% timing hit. 
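For anyone scripting around these self-tests: the mlucas.cfg lines above have a regular shape, so they are easy to parse programmatically. A minimal sketch (not part of Mlucas; the function name is my own invention):

```python
import re

# Each mlucas.cfg entry has the form:
#   <fftlen> msec/iter = <t> ROE[avg,max] = [<avg>, <max>] radices = ...
CFG_LINE = re.compile(
    r"^\s*(\d+)\s+msec/iter\s*=\s*([\d.]+)\s+"
    r"ROE\[avg,max\]\s*=\s*\[([\d.]+),\s*([\d.]+)\]"
)

def parse_cfg_line(line):
    """Return (fft_kdoubles, msec_per_iter, avg_roe, max_roe), or None
    for non-entry lines such as the leading version number."""
    m = CFG_LINE.match(line)
    if not m:
        return None
    fft, msec, avg, mx = m.groups()
    return int(fft), float(msec), float(avg), float(mx)
```

This could, for example, feed a small script that picks the fastest FFT length whose max ROE stays below some comfort threshold.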
On my N2 a first-time LL-test with p ~ 89M [5120K FFT] is getting ~110ms/iter 4-threaded on the a73, thus ~4 months per test; a DC with p ~50M [2816K FFT] is getting ~220ms/iter 2-threaded on the a53. The latter core is only half as strong as the quad-a53 on my Odroid C2, so even DCs run painfully slowly on it. The quad Snapdragon CPU in each of my dozen Galaxy S7 broke-o-phones is about the same speed as the N2's quad-a73 core, by way of comparison. Once you're up and running and have a few checkpoints under your belt, I'll post an "advance peek" v19 binary - same build I recently switched all my ARMs to - which still lacks the PRP support which will go into the final v19 release, but has some speedups related to relaxing the floating-point accuracy requirements for exponents not close to the p_max for each FFT length. I'm getting 2-8% speedup (depending on FFT length, exponent and random run-to-run timing variations) from using the new code, for the ~90% of exponents at each FFT length which are eligible for the accuracy-for-speed tradeoff. From the user perspective, it's simply a drop-in binary replacement, though. |
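The back-of-envelope arithmetic behind the "~4 months per test" figure: an LL test of M(p) needs p - 2 squaring iterations, so wall-clock time is just the exponent times the per-iteration time. A quick sanity check (my own arithmetic, using the numbers quoted above):

```python
def ll_test_days(p, msec_per_iter):
    # An LL test of M(p) requires p - 2 squaring iterations;
    # convert ms/iter into days of wall-clock time.
    return (p - 2) * msec_per_iter / 1000.0 / 86400.0

# p ~ 89M at ~110 ms/iter on the N2's 4xa73:
days = ll_test_days(89_000_000, 110.0)  # ~113 days, i.e. just under 4 months
```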
[QUOTE=ewmayer;518816][[b]Edit:[/b] I just realized that the cfg-file data I copied below are from my "advance peek" v19 binary, so you might as well just use that one from the get-go - cf. the attachment at bottom.]
Yes, it's very easy to get up and running - download the ARMv8 binary linked at the [url=http://www.mersenneforum.org/mayer/README.html]readme page[/url], set up a pair of rundirs, one for the job which will run on the big and little CPUs, respectively. On the N2, the two a53 cores are numbered 0-1 in /proc/cpuinfo and the four a73 cores are 2-5 - I suggest you double-check that in your own copy of said file, because it's crucial to getting the most out of your Mlucas runs. So say you call the a73-rundir 'run0' and the a53 one 'run1'. To create the optimal-FFT-config files in each: In run0: [path to exec] -s m -iters 100 -cpu 2:5 In run1: [path to exec] -s m -iters 100 -cpu 0:1 I suggest doing these sequentially, to avoid any timing weirdness from the short timing subtests done by each self-test somehow throwing each other off. By way of reference, here is the 4a73 mlucas.cfg from my N2 self-tests: [code]18.0 2048 msec/iter = 37.48 ROE[avg,max] = [0.325223214, 0.375000000] radices = 256 16 16 16 0 0 0 0 0 0 2304 msec/iter = 43.51 ROE[avg,max] = [0.287946429, 0.343750000] radices = 288 16 16 16 0 0 0 0 0 0 2560 msec/iter = 48.17 ROE[avg,max] = [0.275669643, 0.312500000] radices = 160 16 16 32 0 0 0 0 0 0 2816 msec/iter = 54.51 ROE[avg,max] = [0.259933036, 0.312500000] radices = 352 16 16 16 0 0 0 0 0 0 3072 msec/iter = 60.39 ROE[avg,max] = [0.316294643, 0.400000000] radices = 192 16 16 32 0 0 0 0 0 0 3328 msec/iter = 65.47 ROE[avg,max] = [0.280580357, 0.375000000] radices = 208 16 16 32 0 0 0 0 0 0 3584 msec/iter = 69.10 ROE[avg,max] = [0.325000000, 0.375000000] radices = 224 16 16 32 0 0 0 0 0 0 3840 msec/iter = 75.63 ROE[avg,max] = [0.275892857, 0.312500000] radices = 240 16 16 32 0 0 0 0 0 0 4096 msec/iter = 79.60 ROE[avg,max] = [0.267633929, 0.343750000] radices = 256 16 16 32 0 0 0 0 0 0 4608 msec/iter = 91.84 ROE[avg,max] = [0.284375000, 0.375000000] radices = 288 16 16 32 0 0 0 0 0 0 5120 msec/iter = 
104.83 ROE[avg,max] = [0.323437500, 0.406250000] radices = 320 16 16 32 0 0 0 0 0 0 5632 msec/iter = 114.77 ROE[avg,max] = [0.228450230, 0.250000000] radices = 352 16 16 32 0 0 0 0 0 0 6144 msec/iter = 134.42 ROE[avg,max] = [0.240848214, 0.281250000] radices = 768 16 16 16 0 0 0 0 0 0 6656 msec/iter = 149.31 ROE[avg,max] = [0.266964286, 0.343750000] radices = 208 32 32 16 0 0 0 0 0 0 7168 msec/iter = 159.99 ROE[avg,max] = [0.228906250, 0.281250000] radices = 224 32 32 16 0 0 0 0 0 0 7680 msec/iter = 174.89 ROE[avg,max] = [0.252455357, 0.312500000] radices = 240 32 32 16 0 0 0 0 0 0[/code] Those timings reflect no crunching going on on the a53 CPU - with jobs running on both you can expect a 5-10% timing hit. On my N2 a first-time LL-test with p ~ 89M [5120K FFT] is getting ~110ms/iter 4-threaded on the a73, thus ~4 months per test; a DC with p ~50M [2816K FFT] is getting ~220ms/iter 2-threaded on the a53. The latter core is only half as strong as the quad-a53 on my Odroid C2, so even DCs run painfully slowly on it. The quad Snapdragon CPU in each of my dozen Galaxy S7 broke-o-phones is about the same speed as the N2's quad-a73 core, by way of comparison. Once you're up and running and have a few checkpoints under your belt, I'll post an "advance peek" v19 binary - same build I recently switched all my ARMs to - which still lacks the PRP support which will go into the final v19 release, but has some speedups related to relaxing the floating-point accuracy requirements for exponents not close to the p_max for each FFT length. I'm getting 2-8% speedup (depending on FFT length, exponent and random run-to-run timing variations) from using the new code, for the ~90% of exponents at each FFT length which are eligible for the accuracy-for-speed tradeoff. From the user perspective, it's a simply drop-in binary replacement, though.[/QUOTE] Thanks Ernst. I hope to get it up and running soon. I can wait for 4 months for the PRP version :wink: Nice website btw. :tu: |
[QUOTE=paulunderwood;518824]Thanks Ernst. I hope to get it up and running soon. I can wait for 4 months for the PRP version :wink: Nice website btw. :tu:[/QUOTE]
But there's really no reason to wait for the PRP version - all my various ARM-based crunching devices, including the ones you'd think might be exceedingly unreliable under 24/7 load, the broke-o-phones, have proven to be superbly reliable. The one tweak I made to the v18 release based on the multiple phone DCs was to add handling for a particular data corruption error those appear more prone to than PC-style platforms, but even the DCs interrupted by said error (which proceeded to the next worktodo.ini file entry before I added the error-handling logic) later completed with first-test-matching results. PRP+Gerbicz is expected to reduce the rate of bad results, but said rate is very low to begin with. BTW, for my phones, I am requiring each one to produce 2 matching DC results prior to letting it start first-time-LL-test work. I do that via the primenet.py script - the first time I run it I use

[i]./*py -d -t 0 -T DoubleCheck -u [uid] -p [pwd][/i]

which creates a 2-entry worktodo.ini file. Then on subsequent invocations (whenever the device in question completes an LL job of either kind) I use

[i]./*py -d -t 0 -T SmallestAvail -u [uid] -p [pwd][/i]

("-d" enables debug, causing the script to provide some basic informational printing of work-submit and assignment-fetch; "-t 0" means run in single-shot once-only mode, as opposed to the automated every-6-hours mode which is the default.) Thanks for the thumbs-up on the Readme page - it's a continuing struggle to strike a balance between providing enough info and not overwhelming the new user; I rely on user feedback to help me maintain said balance. |
[QUOTE=ewmayer;518828]But there's really no reason to wait for the PRP version - all my various ARM-based crunching devices, including the ones you'd think might be exceedingly unreliable under 24/7 load, the broke-o-phones, have proven to be superbly reliable. The one tweak I made to the v18 release based on the multiple phone DCs was to add handling for a particular data corruption error those appear more prone to than PC-style platforms, but even the DCs interrupted by said error (which proceeded to the next worktodo.in file entry before I added the error-handling logic) later completed with first-test-matching results. PRP+Gerbicz is expected to reduce the rate of bad results, but said rate is very low to begin with.
BTW, for my phones, I am requiring each one to produce 2 matching DC results prior to letting it start first-time-LL-test work. I do that via the priment.py script - first time I run it I use [i] ./*py -d -t 0 -T DoubleCheck -u [uid] -p [pwd] [/i] which creates a 2-entry worktodo.ini file. Then on subsequent invocations (whenever the device in question completes an LL-job of either kind) I use [i] ./*py -d -t 0 -T SmallestAvail -u [uid] -p [pwd] [/i] ("-d" enables debug, causing the script to provide some basic informational printing of work-submit and assignment-fetch. "-t 0" means run in single-shot once-only mode, as opposed to the automated every-6-hours mode which is the default.) Thanks for the thumbs-up on the Readme page - it's a continuing struggle to srike a balance between providing enough info but not overwhelming the new user, I rely on user feedback to help me maintain said balance.[/QUOTE] To remove any ambiguity in my terse remarks; I plan to run a first time LL test straight off the bat on the a73 (maybe leaving the a53 free for day-to-day desktop use and running some of my own code). By the time that test has finished, your Gerbicz code should be ready, hopefully. |
[QUOTE=paulunderwood;518829]To remove any ambiguity in my terse remarks; I plan to run a first time LL test straight off the bat on the a73 (maybe leaving the a53 free for day-to-day desktop use and running some of my own code). By the time that test has finished, your Gerbicz code should be ready, hopefully.[/QUOTE]
Ah, gotcha - looking forward to seeing your 4xa73 timings and error levels. |
[QUOTE=ewmayer;518831]Ah, gotcha - looking forward to seeing your 4xa73 timings and error levels.[/QUOTE]
[CODE]18.0
2048 msec/iter = 39.39 ROE[avg,max] = [0.003125000, 0.375000000] radices = 128 16 16 32 0 0 0 0 0 0
2304 msec/iter = 44.54 ROE[avg,max] = [0.002785714, 0.375000000] radices = 144 16 16 32 0 0 0 0 0 0
2560 msec/iter = 48.91 ROE[avg,max] = [0.002387312, 0.281250000] radices = 160 16 16 32 0 0 0 0 0 0
2816 msec/iter = 55.53 ROE[avg,max] = [0.002627232, 0.312500000] radices = 176 16 16 32 0 0 0 0 0 0
3072 msec/iter = 61.21 ROE[avg,max] = [0.002651786, 0.375000000] radices = 192 16 16 32 0 0 0 0 0 0
3328 msec/iter = 65.64 ROE[avg,max] = [0.002812500, 0.312500000] radices = 208 16 16 32 0 0 0 0 0 0
3584 msec/iter = 70.97 ROE[avg,max] = [0.002535714, 0.281250000] radices = 224 16 16 32 0 0 0 0 0 0
3840 msec/iter = 76.91 ROE[avg,max] = [0.002471819, 0.281250000] radices = 240 16 16 32 0 0 0 0 0 0
4096 msec/iter = 81.47 ROE[avg,max] = [0.002280134, 0.281250000] radices = 256 16 16 32 0 0 0 0 0 0
4608 msec/iter = 94.10 ROE[avg,max] = [0.002476144, 0.281250000] radices = 288 16 16 32 0 0 0 0 0 0
5120 msec/iter = 107.07 ROE[avg,max] = [0.003209821, 0.375000000] radices = 320 16 16 32 0 0 0 0 0 0
5632 msec/iter = 129.73 ROE[avg,max] = [0.002598214, 0.312500000] radices = 176 32 32 16 0 0 0 0 0 0
6144 msec/iter = 141.46 ROE[avg,max] = [0.002475446, 0.281250000] radices = 192 32 32 16 0 0 0 0 0 0
6656 msec/iter = 152.57 ROE[avg,max] = [0.002642857, 0.312500000] radices = 208 32 32 16 0 0 0 0 0 0
7168 msec/iter = 164.32 ROE[avg,max] = [0.002260045, 0.250000000] radices = 224 32 32 16 0 0 0 0 0 0
7680 msec/iter = 178.51 ROE[avg,max] = [0.002350551, 0.281250000] radices = 240 32 32 16 0 0 0 0 0 0[/CODE] Having installed Debian+Mate and now running a browser and ssh sessions (in particular to Skype on Intel box), top shows 400-401% usage. :smile: |
[QUOTE=paulunderwood;518888][CODE]18.0
2048 msec/iter = 39.39 ROE[avg,max] = [0.003125000, 0.375000000] radices = 128 16 16 32 0 0 0 0 0 0 2304 msec/iter = 44.54 ROE[avg,max] = [0.002785714, 0.375000000] radices = 144 16 16 32 0 0 0 0 0 0 2560 msec/iter = 48.91 ROE[avg,max] = [0.002387312, 0.281250000] radices = 160 16 16 32 0 0 0 0 0 0 2816 msec/iter = 55.53 ROE[avg,max] = [0.002627232, 0.312500000] radices = 176 16 16 32 0 0 0 0 0 0 3072 msec/iter = 61.21 ROE[avg,max] = [0.002651786, 0.375000000] radices = 192 16 16 32 0 0 0 0 0 0 3328 msec/iter = 65.64 ROE[avg,max] = [0.002812500, 0.312500000] radices = 208 16 16 32 0 0 0 0 0 0 3584 msec/iter = 70.97 ROE[avg,max] = [0.002535714, 0.281250000] radices = 224 16 16 32 0 0 0 0 0 0 3840 msec/iter = 76.91 ROE[avg,max] = [0.002471819, 0.281250000] radices = 240 16 16 32 0 0 0 0 0 0 4096 msec/iter = 81.47 ROE[avg,max] = [0.002280134, 0.281250000] radices = 256 16 16 32 0 0 0 0 0 0 4608 msec/iter = 94.10 ROE[avg,max] = [0.002476144, 0.281250000] radices = 288 16 16 32 0 0 0 0 0 0 5120 msec/iter = 107.07 ROE[avg,max] = [0.003209821, 0.375000000] radices = 320 16 16 32 0 0 0 0 0 0 5632 msec/iter = 129.73 ROE[avg,max] = [0.002598214, 0.312500000] radices = 176 32 32 16 0 0 0 0 0 0 6144 msec/iter = 141.46 ROE[avg,max] = [0.002475446, 0.281250000] radices = 192 32 32 16 0 0 0 0 0 0 6656 msec/iter = 152.57 ROE[avg,max] = [0.002642857, 0.312500000] radices = 208 32 32 16 0 0 0 0 0 0 7168 msec/iter = 164.32 ROE[avg,max] = [0.002260045, 0.250000000] radices = 224 32 32 16 0 0 0 0 0 0 7680 msec/iter = 178.51 ROE[avg,max] = [0.002350551, 0.281250000] radices = 240 32 32 16 0 0 0 0 0 0[/CODE] Having installed Debian+Mate and now running a browser and ssh sessions (in particular to Skype on Intel box), top shows 400-401% usage. :smile:[/QUOTE] Thanks - are those from the official v18 release binary or the advance-peek v19 one I attached above? In particular the 5632K timing points to the former - it's why I added leading radix 352 = 11*32 to v19. 
Are you running a full-blown GIMPS assignment now? I'd be interested in seeing a sample of the typical checkpoint timing line from the p*.stat file. (And if you started said run using v18, what effect ctrl-c and restart using the v19 binary has - you could just use the above cfg-file for that, unless you are testing @2816K or 5632K, in which case radix-352 will likely help, timing-wise). Will you be using the N2 for development work of your own? |
[QUOTE=ewmayer;518897]Thanks - are those from the official v18 release binary or the advance-peek v19 one I attached above? In particular the 5632K timing points to the former - it's why I added leading radix 352 = 11*32 to v19.
Are you running a full-blown GIMPS assignment now? I'd be interested in seeing a sample of the typical checkpoint timing line from the p*.stat file. (And if you started said run using v18, what effect ctrl-c and restart using the v19 binary has - you could just use the above cfg-file for that, unless you are testing @2816K or 5632K, in which case radix-352 will likely help, timing-wise). Will you be using the N2 for development work of your own?[/QUOTE] I am running v18 with work fetched from PrimeNet -- first-time LL. [CODE]INFO: no restart file found...starting run from scratch.
M9141xxxxx: using FFT length 5120K = 5242880 8-byte floats, initial residue shift count = 6324947
this gives an average 17.435125541687011 bits per digit
Using complex FFT radices 320 16 16 32
[Jun 08 19:23:42] M914xxxxx Iter# = 10000 [ 0.01% complete] clocks = 00:18:13.727 [109.3727 msec/iter] Res64: 771472D5BD75657A. AvgMaxErr = 0.062755114. MaxErr = 0.085937500. Residue shift count = 24468007.
[Jun 08 19:41:24] M914xxxxx Iter# = 20000 [ 0.02% complete] clocks = 00:17:40.616 [106.0617 msec/iter] Res64: 6A3AB4D6D38D864F. AvgMaxErr = 0.062874775. MaxErr = 0.093750000. Residue shift count = 41145087.
[Jun 08 19:59:16] M914xxxxx Iter# = 30000 [ 0.03% complete] clocks = 00:17:50.354 [107.0355 msec/iter] Res64: 6A42564A06E2381C. AvgMaxErr = 0.062869088. MaxErr = 0.085937500. Residue shift count = 28935869.
[Jun 08 20:17:42] M914xxxxx Iter# = 40000 [ 0.04% complete] clocks = 00:18:21.247 [110.1247 msec/iter] Res64: 4F6CF208BAE55456. AvgMaxErr = 0.062931570. MaxErr = 0.085937500. Residue shift count = 80180192.
[Jun 08 20:35:48] M914xxxxx Iter# = 50000 [ 0.05% complete] clocks = 00:18:03.309 [108.3310 msec/iter] Res64: C9F7EABB3783A435. AvgMaxErr = 0.062861539. MaxErr = 0.085937500. Residue shift count = 84044778.
[Jun 08 20:55:01] M914xxxxx Iter# = 60000 [ 0.07% complete] clocks = 00:19:10.389 [115.0389 msec/iter] Res64: DB534D6782A1A68E. AvgMaxErr = 0.062953338. MaxErr = 0.093750000. Residue shift count = 15509133.
[Jun 08 21:12:44] M914xxxxx Iter# = 70000 [ 0.08% complete] clocks = 00:17:38.155 [105.8155 msec/iter] Res64: 82734EA25CAAE188. AvgMaxErr = 0.062877951. MaxErr = 0.085937500. Residue shift count = 59420150.
[Jun 08 21:30:15] M914xxxxx Iter# = 80000 [ 0.09% complete] clocks = 00:17:26.342 [104.6343 msec/iter] Res64: B3DDB30E8EA490B8. AvgMaxErr = 0.062913278. MaxErr = 0.093750000. Residue shift count = 36766103.
[Jun 08 21:47:43] M914xxxxx Iter# = 90000 [ 0.10% complete] clocks = 00:17:25.125 [104.5126 msec/iter] Res64: 9F0AAFBA656E82BC. AvgMaxErr = 0.062897896. MaxErr = 0.093750000. Residue shift count = 87580274.
[/CODE] I will be running my own code from time to time -- it is early days. I will install Pari/GP, GMP etc. Why do you ask? I will aim not to interfere with Mlucas on the a73. |
[QUOTE=paulunderwood;518900]I will be running my own code from time to time -- it is early days. I will install Pari/GP, GMP etc. Why do you ask? I will aim not to interfere with Mlucas on the a73.[/QUOTE]
Mainly just interested to hear from other folks who may be doing ARM-oriented code development. Thanks for the data! |
[QUOTE=ewmayer;518901]Mainly just interested to hear from other folks who may be doing ARM-oriented code development. Thanks for the data![/QUOTE]
Well, I am running some pure C code on an R-pi 3B+ (under 64-bit Gentoo). All I had to do was recompile what I had developed on x86_64, and it runs perfectly. The pi is throttling at 80C. It runs and runs. But I have yet to delve into vector operations. Nor have I looked at ARM assembly. My efforts with Intel YASM assembly were no better than C for timings, but I did realize that swapping around the Jacobi Symbol and Fermat PRP (based on the Euler Phi function) tests greatly improved throughput by an amazing 300%. I think this was partially because the Jacobi Symbol test uses the % operator whereas my Fermat PRP test does not. p.s. How often should the client report into PrimeNet? |
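To illustrate the cheap-test-first ordering described above (this is my own sketch, not Paul's code): run the modular-powering Fermat PRP check first, and compute the Jacobi symbol, whose standard binary algorithm is reduction-heavy, only for numbers that survive it.

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0, via the standard binary algorithm."""
    assert n > 0 and n % 2 == 1
    a %= n
    result = 1
    while a:
        while a % 2 == 0:          # pull out factors of 2
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                # quadratic reciprocity flip
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def looks_prime(n):
    """Cheap base-2 Fermat PRP check first, Jacobi-based Euler
    criterion check only for survivors. A sketch, not a proof of
    primality: Euler pseudoprimes to base 2 would still slip through."""
    if n < 5 or n % 2 == 0:
        return n in (2, 3)
    if pow(2, n - 1, n) != 1:      # Fermat PRP test, no Jacobi needed
        return False
    # Euler criterion: 2^((n-1)/2) == (2/n) mod n for prime n
    return pow(2, (n - 1) // 2, n) == jacobi(2, n) % n
```

Note that 341 = 11*31 passes the base-2 Fermat step but is caught by the Euler-criterion step, so ordering the tests changes only the cost per candidate, not the verdict.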
[QUOTE=ewmayer;518816][[b]Edit:[/b] I just realized that the cfg-file data I copied below are from my "advance peek" v19 binary, so you might as well just use that one from the get-go - cf. the attachment at bottom.]
[/QUOTE] The attachment seems to be a source code package though, not a precompiled binary... |
1 Attachment(s)
[QUOTE=nomead;518914]The attachment seems to be a source code package though, not a precompiled binary...[/QUOTE]
My mistake - the previous attachment is indeed the source tarball from which I built the advance-peek v19 binary I mentioned - here is the latter (SIMD binary only), together with the resulting cfg-file from running 4-threaded on the a73 core of my N2 (-cpu 2:5), and a copy of the primenet.py script, i.e. all the files needed for someone to get up and running on a fresh device with a similar CPU. md5sum = 7b5850114211d68234c391ff1a3d62eb: |
[QUOTE=ewmayer;518963]My mistake - the previous attachment is indeed the source tarball from which I built the advanced-look v19 binary I mentioned - here is the latter (SIMD binary only) together with the resulting cfg-file running 4-threaded on the a73 core of my N2 (-cpu 2:5), and a copy the primenet.py script, i.e. all the files needed for someone to get up and running on a fresh device with similar CPU. md5sum = 7b5850114211d68234c391ff1a3d62eb:[/QUOTE]
Would I only need to drop the above 3 files into the directory currently running v18 to get v19 running? |
[QUOTE=paulunderwood;518965]Would I only need to drop the above 3 files into the directory currently running v18 to get v19 running?[/QUOTE]
Yep! And if you 'fg' your current v18 job and ctrl-c it, it should write a checkpoint file for the current iteration, i.e. you won't lose any work due to a partially-completed checkpoint interval. That signal-handling code is still not fully reliable across all Linux platforms, but I haven't encountered any issues with it on my various ARM devices, including the N2. |
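The pattern behind that ctrl-c behaviour (write a checkpoint for the current iteration, then exit cleanly) is the classic set-a-flag-in-the-signal-handler approach. A toy Python illustration; Mlucas itself is C, and none of these names are its actual ones:

```python
import signal

stop_requested = False

def request_stop(signum, frame):
    # Do almost nothing inside the handler: just raise a flag. The
    # compute loop finishes its current iteration, writes a checkpoint,
    # and exits at a well-defined point.
    global stop_requested
    stop_requested = True

signal.signal(signal.SIGINT, request_stop)   # install ctrl-c handler

def run(total_iters, checkpoint_every=10000):
    state, last_saved = 3, 0
    for i in range(1, total_iters + 1):
        state = state * state % (2**31 - 1)  # stand-in for one squaring step
        if i % checkpoint_every == 0 or stop_requested:
            last_saved = i                   # stand-in for writing the savefile
        if stop_requested:
            break
    return last_saved
```

The key design point is that the handler itself never touches the savefile; only the main loop does, so a checkpoint can never be torn by an interrupt arriving mid-write.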
[QUOTE=ewmayer;518966]Yep! And if you 'fg' your current v18 job and ctrl-c it, it should write a checkpoint file for the current iteration, i.e. you won't lose any work due to a partially-completed checkpoint interval. That signal-handling code is still not fully reliable across all Linux platforms, but I haven't encountered any issues with it on my various ARM devices, including the N2.[/QUOTE]
v18 was running in the foreground in a terminal. ^C did not kill it. I killed the process from top. Thanks! Oops. It was running in the background and I forgot to do "fg" :redface: |
Paul, did you notice any change in timing/ROE-levels after switching your current LL test to the v19 build?
|
[QUOTE=ewmayer;519130]Paul, did you notice any change in timing/ROE-levels after switching your current LL test to the v19 build?[/QUOTE]
The Avg/MaxROE seem about the same -- but it is difficult to say by inspection. The timings have improved from about 106.7ms to 103.5ms per iteration -- minimum values. The N2 run is at 2.67% -- that is after 3 days. I have a patient nature! I have no number crunching on the a53 -- I just use it for desktop work. I am running Skype from an Intel box with [c]ssh -X[/c] over 100Mb/s, soon to be upgraded to 1000Mb/s. |
Cool - 3% speedup translates to 3 days sooner. Though note this is within the level of one-run-to-the-next timing variability, which in my experience can be as much as 5%. But 3% is roughly what I saw on average when I switched all my Galaxy S7s to v19 a few weeks ago.
Yah, this kind of ARM micro-PC hardware is not for the results-greedy, in my case it helps to have a whole mess of such devices (Odroid C2 and N2, 12 cellphones, plus my Intel 2-core Broadwell NUC running Mlucas avx2 build) patiently and quietly working away. |
[QUOTE=ewmayer;519142]Cool - 3% speedup translates to 3 days sooner. Though note this is within the level of one-run-to-the-next timing variability, which in my experience can be as much as 5%. But 3% is roughly what I saw on average when I switched all my Galaxy S7s to v19 a few weeks ago.
Yah, this kind of ARM micro-PC hardware is not for the results-greedy, in my case it helps to have a whole mess of such devices (Odroid C2 and N2, 12 cellphones, plus my Intel 2-core Broadwell NUC running Mlucas avx2 build) patiently and quietly working away.[/QUOTE] Plus I have none of this on my Arm machines: [code]bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds [/code] :grin: |
The good news is that when the N2 is idle the iteration time is ~102.5 ms.
A little disconcerting is that I occasionally get a crash of the [i]open[/i] tab in FireFox-esr and when this happens the Max ROE jumps to 0.125. Is this a software phenomenon or hardware related? |
[QUOTE=paulunderwood;519534]The good news is that when the N2 is idle the iteration time is ~102.5 ms per iteration.
A little disconcerting is that I occasionally get a crash of the [i]open[/i] tab in FireFox-esr and when this happens the Max ROE jumps to 0.125. Is this a software phenomenon or hardware related?[/QUOTE] Would you be so kind as to post your p*.stat file and list at least one approximate iteration interval where the phenomenon you describe occurred? After successfully completing a pair of DCs, the first-set-up of my S7 compute-o-phones has been crunching away on an exponent ~87M for several weeks. As I noted previously the quad-core Snapdragon CPU in the S7 is roughly equal to the 4xa73 portion of the N2. This exponent is sufficiently close to the upper limit for 4608K FFT that on a half-dozen occasions it's hit ROE = 0.4375, causing it to restart from the most-recent savefile and resume @5120K, with a resulting 10-15% performance hit, from 95-100ms/iter @4608K to 112-117ms/iter @5120K. Whenever I see such a jump has occurred, I kill the run and force resumption @4608K via "nohup nice ./Mlucas -cpu 0:3 -fftlen 4608 &", but this points out another desirable feature-add for the next release ... each run gathers running statistics about such FFT-length-increasing ROEs, and if their frequency is sufficiently low, the program should simply re-do the iteration interval of the ROE >= 0.4375 occurrence (assuming same is repeatable-on-retry-at-same-length, which check is already done first) at the next-larger FFT length and then drop back down to the original default length. |
[code]
[Jun 18 15:01:01] M914***** Iter# = 7950000 [ 8.70% complete] clocks = 00:17:53.663 [107.3663 msec/iter] Res64: 4D01D71A8D0A3597. AvgMaxErr = 0.079252374. MaxErr = 0.109375000. Residue shift count = 26123447.
[Jun 18 15:18:54] M914***** Iter# = 7960000 [ 8.71% complete] clocks = 00:17:51.167 [107.1167 msec/iter] Res64: 1C64720358E8B548. AvgMaxErr = 0.079291315. MaxErr = 0.117187500. Residue shift count = 27945342.
[Jun 18 15:37:26] M914***** Iter# = 7970000 [ 8.72% complete] clocks = 00:18:29.727 [110.9727 msec/iter] Res64: 4E201EDBE693F80B. AvgMaxErr = 0.079358624. MaxErr = 0.109375000. Residue shift count = 11087326.
[Jun 18 15:55:43] M914***** Iter# = 7980000 [ 8.73% complete] clocks = 00:18:11.301 [109.1301 msec/iter] Res64: 46A3EF697E0B7999. AvgMaxErr = 0.079285226. MaxErr = 0.109375000. Residue shift count = 35877845.
[Jun 18 16:13:34] M914***** Iter# = 7990000 [ 8.74% complete] clocks = 00:17:42.310 [106.2310 msec/iter] Res64: CA50AFF28616551E. AvgMaxErr = 0.079323541. MaxErr = 0.109375000. Residue shift count = 54051585.
[Jun 18 16:31:38] M914***** Iter# = 8000000 [ 8.75% complete] clocks = 00:17:58.692 [107.8693 msec/iter] Res64: E86895B693971CA9. AvgMaxErr = 0.079318311. MaxErr = 0.109375000. Residue shift count = 39944003.
[Jun 18 16:49:33] M914***** Iter# = 8010000 [ 8.76% complete] clocks = 00:17:50.335 [107.0336 msec/iter] Res64: 638A7E1A7236BE15. AvgMaxErr = 0.079304779. MaxErr = 0.109375000. Residue shift count = 61849059.
[Jun 18 17:07:03] M914***** Iter# = 8020000 [ 8.77% complete] clocks = 00:17:26.925 [104.6925 msec/iter] Res64: 35938AA44D32EB85. AvgMaxErr = 0.079322753. MaxErr = 0.109375000. Residue shift count = 4298838.
[Jun 18 17:24:56] M914***** Iter# = 8030000 [ 8.78% complete] clocks = 00:17:48.605 [106.8605 msec/iter] Res64: EC952DCA2CBF36C9. AvgMaxErr = 0.079391460. [b]MaxErr = 0.125000000.[/b] Residue shift count = 13515903.
[Jun 18 17:42:58] M914***** Iter# = 8040000 [ 8.80% complete] clocks = 00:17:59.617 [107.9617 msec/iter] Res64: 841E0FE38CC87672. AvgMaxErr = 0.079387793. MaxErr = 0.109375000. Residue shift count = 8677722.
[Jun 18 18:01:06] M914***** Iter# = 8050000 [ 8.81% complete] clocks = 00:18:05.056 [108.5057 msec/iter] Res64: DEC2F07E46B82700. AvgMaxErr = 0.079358276. MaxErr = 0.109375000. Residue shift count = 13727072.
[Jun 18 18:18:55] M914***** Iter# = 8060000 [ 8.82% complete] clocks = 00:17:46.838 [106.6838 msec/iter] Res64: 7F1DD640CD6E9735. AvgMaxErr = 0.079310718. MaxErr = 0.109375000. Residue shift count = 29121784.
[Jun 18 18:36:39] M914***** Iter# = 8070000 [ 8.83% complete] clocks = 00:17:41.997 [106.1997 msec/iter] Res64: 3E50C0CB0075B71D. AvgMaxErr = 0.079339769. MaxErr = 0.109375000. Residue shift count = 86075204.
[Jun 18 18:54:28] M914***** Iter# = 8080000 [ 8.84% complete] clocks = 00:17:46.468 [106.6469 msec/iter] Res64: 6CD28733082AA7C5. AvgMaxErr = 0.079340430. MaxErr = 0.109375000. Residue shift count = 43325490.
[Jun 18 19:15:57] M914***** Iter# = 8090000 [ 8.85% complete] clocks = 00:21:26.050 [128.6051 msec/iter] Res64: 0AC48D068E32D8F1. AvgMaxErr = 0.079287323. [b]MaxErr = 0.125000000[/b]. Residue shift count = 32575426.
[Jun 18 19:42:27] M914***** Iter# = 8100000 [ 8.86% complete] clocks = 00:26:28.569 [158.8570 msec/iter] Res64: 71329024D00CDCCC. AvgMaxErr = 0.079361060. MaxErr = 0.109375000. Residue shift count = 67915085.
[Jun 18 20:09:00] M914***** Iter# = 8110000 [ 8.87% complete] clocks = 00:26:30.682 [159.0682 msec/iter] Res64: D74F2FF344960F67. AvgMaxErr = 0.079284901. MaxErr = 0.109375000. Residue shift count = 61052515.
[Jun 18 20:35:27] M914***** Iter# = 8120000 [ 8.88% complete] clocks = 00:26:24.374 [158.4374 msec/iter] Res64: C11BB80026F99829. AvgMaxErr = 0.079352954. MaxErr = 0.109375000. Residue shift count = 30064815.
[/code] I guess the a73 was being used a little whilst I was just askyping. |
[QUOTE=ewmayer;519550]Thanks - those strike me as unremarkable ... the maxROEs are generally quite granular and have fractions of form p/q, with q a small power of 2, as attractors - in your case the dominant maxROE value is 0.109375 = 7/64, with occasional excursions up to 0.125 = 1/8. The variable-length DWT-weights-chain code in v19 which gave you the ~3% speedup also appears more prone to these kinds of blips.
I suggest when next you happen to have a decently long interval without Firefox open-tab-crashage, check the overlapping stat-file entries and see if any 0.125 maxROEs occurred. If there really were a data corruption issue at work here, one would expect it to lead to random anomalous ROEs, not ones which are precisely in a consistent range.[/QUOTE] Okay. I checked the logs in /var/log and there was nothing reported that indicated a hardware issue. (I too have pressed "edit" rather than "quote" as a mod :wink:) |
Bit of a hit-edit-instead-of-reply glitch on my part there ... I accidentally overwrote Paul's ROE-snippet post with my reply. He restored the former, but what you see in his followup is his reply to my now-gone reply-in-place-of-his-original-post. Clear as mud!
|
I had another browser tab crash with a change in Max ROE. I upgraded the system which upgraded dbus amongst other things -- maybe it will be more stable now.
|
If the browser crashes really are impacting the Mlucas run in the manner you describe, that is bizarre - might be a good idea to get someone to do an early DC of that exponent, or use Prime95/mprime on some x86 hardware of your own to do the secondary run yourself, with interimResidues (or whatever that Prime95 option is called) enabled.
This sort of thing is why I started all my cellphone nodes on DCs before switching them to first-time LL. |
[QUOTE=ewmayer;519556]If the browser crashes really are impacting the Mlucas run in the manner you describe, that is bizarre - might be a good idea to get someone to do an early DC of that exponent, or use Prime95/mprime on some x86 hardware of your own to do the secondary run yourself, with interimResidues (or whatever that Prime95 option is called) enabled.
This sort of thing is why I started all my cellphone nodes on DCs before switching them to first-time LL.[/QUOTE] I downloaded, built and configured Mlucas_v18. It will take less than a day and a half to see whether I have all matching RES64. [code] Mlucas 18.0 http://www.mersenneforum.org/mayer/README.html INFO: testing qfloat routines... CPU Family = x86_64, OS = Linux, 64-bit Version, compiled with Gnu C [or other compatible], Version 9.1.0. INFO: Build uses AVX2 instruction set. INFO: Using inline-macro form of MUL_LOHI64. INFO: MLUCAS_PATH is set to "" INFO: using 64-bit-significand form of floating-double rounding constant for scalar-mode DNINT emulation. Setting DAT_BITS = 10, PAD_BITS = 2 INFO: testing IMUL routines... INFO: System has 4 available processor cores. INFO: testing FFT radix tables... Set affinity for the following 4 cores: 0.1.2.3. NTHREADS = 4 looking for worktodo.ini file... worktodo.ini file found...checking next exponent in range... INFO: primary restart file p914***** not found...looking for secondary... INFO: no restart file found...starting run from scratch. mers_mod_square: Init threadpool of 4 threads radix16_dif_dit_pass pfetch_dist = 4096 radix16_wrapper_square: pfetch_dist = 4096 Using 4 threads in carry step [/code] [code] M9141*****: using FFT length 5120K = 5242880 8-byte floats, initial residue shift count = 6324947 this gives an average 17.435125541687011 bits per digit Using complex FFT radices 160 16 32 32 [Jun 19 01:18:48] M914***** Iter# = 10000 [ 0.01% complete] clocks = 00:02:25.632 [ 14.5632 msec/iter] Res64: 771472D5BD75657A. AvgMaxErr = 0.061904874. MaxErr = 0.078125000. Residue shift count = 24468007. [/code] |
Paul, is that avx2 system hyperthreaded? If so, you may get a modest boost from using -cpu 0:7.
|
[QUOTE=ewmayer;519579]Paul, is that avx2 system hyperthreaded? If so, you may get a modest boost from using -cpu 0:7.[/QUOTE]
It is, but I have it switched off in the BIOS -- I can't be bothered with the monitor changes, reboots, etc. I have matching RES64 at 1.66%, with the N2 at 10%; when the AVX2 machine catches up with the N2 and everything still seems okay, I will kill v18 and let the box get on with its other task: LLR. Incidentally, no more browser tab crashes have occurred -- which might be due to the OS upgrade. |
I have matching RES64 at 10%. So I will let the N2 carry on with completing the LL test -- I have stopped the AVX2 run. (Again: no browser tab crashes after the OS upgrade.) :smile:
|
That's good news - I'll be interested to hear if you get any more ROE = 0.125 occurrences with the browser-crash issue fixed.
|
[QUOTE=ewmayer;519668]That's good news - I'll be interested to hear if you get any more ROE = 0.125 occurrences with the browser-crash issue fixed.[/QUOTE]
Yes, it is giving some of those ROEs. A new problem: the iteration times have jumped to 109.3 ms when otherwise idle, up from 102.5 ms. The only difference is that I am now running from an @reboot command in crontab. I will renice it to 0 and see if that makes an improvement. I just touched the heatsink -- it is hot. I will have to think of a way to cool it. Maybe there is a USB fan I can put under it. |
I've seen timing variations of that order of magnitude from one run invocation to the next - seems to be partly a function of whatever memory mapping one gets at run start. Also seen poor timings one day, then same run cranking along 4-5% faster the next day, without any change in user-perspective system load or ambient temps. (I only use my odroid for Mlucas runs, plus occasional builds and print jobs - it's the only one of my devices which supports my bought-this-year cheapie HP printer, as I have no Windows devices and my Mac's OS is older than said printer requires.)
Re. heat, I had my N2 (before I shipped it off in Paul L's direction) sitting on my desk, no fan air, just relying on convective air wafting around the heat sink, which did get quite warm to the touch. The heat sink is large enough that any small amount of moving air should suffice, whether secondary air from a nearby device's exhaust fan or an open window. |
[QUOTE=ewmayer;519678]I've seen timing variations of that order of magnitude from one run invocation to the next - seems to be partly a function of whatever memory mapping one gets at run start. Also seen poor timings one day, then same run cranking along 4-5% faster the next day, without any change in user-perspective system load or ambient temps. (I only use my odroid for Mlucas runs, plus occasional builds and print jobs - it's the only one of my devices which supports my bought-this-year cheapie HP printer, as I have no Windows devices and my Mac's OS is older than said printer requires.)
Re. heat, I had my N2 (before I shipped it off in Paul L's direction) sitting on my desk, no fan air, just relying on convective air wafting around the heat sink, which did get quite warm to the touch. The heat sink is large enough that any small amount of moving air should suffice, whether secondary air from a nearby device's exhaust fan or an open window.[/QUOTE] I've ordered a to-arrive-tomorrow USB fan to put underneath it. My integer-only C program heats my R-pi 3B+ up to 80C and it throttles, so your floating-point code is going to heat up the N2! Although the heat sink is huge, it is quite hot to the touch if I leave my finger on it for 3 or more seconds -- it must have taken several days for the heat to build up in it. When the fan is installed I will try to ascertain cause and effect. |
[QUOTE=paulunderwood;519682]... it is quite hot to touch if I leave my finger on it for 3 or more seconds ...[/QUOTE]To me that would appear to be ~55C. Generally I find that 60C equates to about 1 second for most people before the pain dictates some quick action. And 50C equates to no desperate need to remove the fingers.
But the heat-sink temperature isn't really too important. It is the junction temps that need monitoring. If those are less than 90C then everything should be good without the need for throttling. If there is some setting in the OS it might be worth adjusting it to get better throttling behaviour. |
[QUOTE=retina;519686]To me that would appear to be ~55C. Generally I find that 60C equates to about 1 second for most people before the pain dictates some quick action. And 50C equates to no desperate need to remove the fingers.
But the heat-sink temperature isn't really too important. It is the junction temps that need monitoring. If those are less than 90C then everything should be good without the need for throttling. If there is some setting in the OS it might be worth adjusting it to get better throttling behaviour.[/QUOTE] I installed "cpulimit" and limited the mlucas_v19 process to 300% -- it was 400%. The temperature of the heat sink should start to drop. "lm-sensors" does not work on the N2, but I will do the finger test in a few hours to see if it has cooled significantly. I am hoping the new fan cures the overheating problem. I have seen videos where the first thing done with a new N2 is to replace the thermal paste, but I don't plan to do that. |
[QUOTE=paulunderwood;519682]My C integer only program heats my R-pi 3B+ up to 80C and it throttles. So, your floating point is going to heat up the N2! [/QUOTE]
The Raspberry Pi (3B+ and 3A+) will actually start throttling the clock at 60C, unless set otherwise (temp_soft_limit in config.txt). Very light throttling in the beginning, 1.4 GHz to 1.2 GHz and dropping the core voltage a bit at the same time. Then there's a hard limit at 85C. What I've found is that without a heat sink, there is no hope at all to run at full load without throttling. And with Mlucas using the NEON ASIMD instructions, there needs to be some airflow over that heatsink. Not much though. I've stacked five RPI 3A+ with 14x14x14 mm heatsinks and have two of these stacks side by side. Without a fan, they will go up to about 75C. With a single undervolted 120mm fan (12V fan running at 5V) cooling the whole 2x5 stack, all of them stay comfortable at around 45-50C. But the N2 is very much a different beast. The Raspberry Pi processor is made on a 40 nm process which is not that great anymore for anything that needs to do things instead of mostly sitting idle. Smaller process nodes may have bigger leakage currents i.e. idle consumption, but the operating power consumption is still getting smaller from node to node. And the N2 processor, Amlogic S922X, is made in 12 nm, so even with the bigger A73 cores, the stock heatsink should be enough for most normal loads. Of course, again the vector instructions generate more heat than normal float. For temperature monitoring take a peek in [C]/sys/class/thermal[/C] . This varies a bit from device to device, but there should at least be one directory called [C]thermal_zone0[/C] below that. Maybe more, [C]thermal_zone1[/C] etc. depending on the chip. Anyway, in each of those directories, there is a file called [C]temp[/C] that tells the temperature (probably scaled by x1000) and another file called [C]type[/C] that tells what's being measured. |
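Following nomead's description, a quick way to dump every zone at once is a small loop like this (a sketch; the zone names, the number of zones, and the x1000 scaling all vary by device):

```shell
# List every thermal zone's sensor name and temperature in degrees C.
# The temp files usually hold millidegrees, hence the /1000 split.
for z in /sys/class/thermal/thermal_zone*; do
  [ -d "$z" ] || continue   # no thermal zones exposed on this system
  printf '%s: %s = %d.%03d C\n' "$z" "$(cat "$z/type")" \
      "$(( $(cat "$z/temp") / 1000 ))" "$(( $(cat "$z/temp") % 1000 ))"
done
```

On a system with no zones the glob simply fails to match and the loop prints nothing.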
Thanks to nomead for the pointer to /sys/class/thermal/thermal_zone0/temp. With cpulimit set to 300% the temperature was 49.5C; without it (i.e. 400%) it was 61.1C. On fitting the USB fan the temperature dropped rapidly to 44.1C. However, it seems there is no impact on iteration times, and Ernst's observation that different runs have different timings seems to be true. To that end I will keep restarting mlucas_v19 until I get it back to 102.5ms instead of 109.5ms.
|
[QUOTE=paulunderwood;519712]Thanks to nomead for the pointer to /sys/class/thermal/thermal_zone0/temp. With cpulimit set to 300% the temperature was 49.5C; without it (i.e. 400%) it was 61.1C. On fitting the USB fan the temperature dropped rapidly to 44.1C. However, it seems there is no impact on iteration times, and Ernst's observation that different runs have different timings seems to be true. To that end I will keep restarting mlucas_v19 until I get it back to 102.5ms instead of 109.5ms.[/QUOTE]
Yeah, crank it up - no guts, no glory! Like I said, my low-tech way to detect throttling is simply watching the timings, but of course that is reliable within the context of a single run, i.e. in the absence of run-to-run timing variability, which I've found to be much larger on my ARM devices than my x86 ones. The sys/class/thermal tip is definitely useful in terms of more-precise tracking of thermals, what do the various entries mean? On my Odroid C2, sys/class/thermal has links to cooling_device0-3 and thermal_zone0-1 ... don't know what the former are about; the latter 2 have 'temp' files with entries 68000 (which presumably means 68C) and 2000. Paul, does your N2 have separate thermal_zone dirs for the a53 and a73 portions of the chip? |
[QUOTE=ewmayer;519735]Yeah, crank it up - no guts, no glory! Like I said, my low-tech way to detect throttling is simply watching the timings, but of course that is reliable within the context of a single run, i.e. in the absence of run-to-run timing variability, which I've found to be much larger on my ARM devices than my x86 ones. The sys/class/thermal tip is definitely useful in terms of more-precise tracking of thermals, what do the various entries mean? On my Odroid C2, sys/class/thermal has links to cooling_device0-3 and thermal_zone0-1 ... don't know what the former are about; the latter 2 have 'temp' files with entries 68000 (which presumably means 68C) and 2000. Paul, does your N2 have separate thermal_zone dirs for the a53 and a73 portions of the chip?[/QUOTE]
It has two zones -- one for each CPU cluster, I guess: [code] cat /sys/class/thermal/thermal_zone0/temp 45100 [/code] [code] cat /sys/class/thermal/thermal_zone1/temp 42900 [/code] On this run I am getting a minimum of 107.7 ms/iteration. I guess there is no point in using all six cores. Will the a53 slow down the a73? |
[QUOTE=paulunderwood;519736]It has two zones -- which I guess is for each chip:
[code] cat /sys/class/thermal/thermal_zone0/temp 45100 [/code] [code] cat /sys/class/thermal/thermal_zone1/temp 42900 [/code] On this run I am getting a minimum of 107.7ms/iteration. I guess there will be no use using all six cores. Will the a53 slow down the a73?[/QUOTE] OK, so it seems the thermal_zone1 on my C2 is just a placeholder, since there is no second CPU on the die. Re. running on both - yes, that will slow down your a73 run, but the total throughput will still be more than a73-only, albeit only modestly. On my N2, here were the approximate numbers from my runs: o Dual-core a53 is ~1/4 the FLOPS of the 4-core a73, with respect to each running in standalone mode at max throughput (2-threaded Mlucas on a53, 4-threaded on a73); o Running on both a53 and a73 slows each down by ~10% versus that-CPU-only running, thus total throughput is equivalent to roughly 0.9*(4+1) = 4.5 a73 CPUs. |
[QUOTE=ewmayer;519741]OK, so it seems the thermal_zone1 on my C2 is just a placeholder, since there is no second CPU on the die. Re. running on both - yes, that will slow down your a73 run, but the total throughput will still be more than a73-only, albeit only modestly. On my N2, here were the approximate numbers from my runs:
o Dual-core a53 is ~1/4 the FLOPS of the 4-core a73, with respect to each running in standalone mode at max throughput (2-threaded Mlucas on a53, 4-threaded on a73); o Running on both a53 and a73 slows each down by ~10% versus that-CPU-only running, thus total throughput is equivalent to roughly 0.9*(4+1) = 4.5 a73 CPUs.[/QUOTE] I'd like to try all six. Do I need to rerun the self test/configuration? |
[QUOTE=paulunderwood;519742]I'd like to try all six. Do I need to rerun the self test/configuration?[/QUOTE]
You should only have to re-run self-tests for the a53 CPU ... in my experience using both CPUs increases the absolute timings but does not appreciably affect the optimal-FFT-parameters for each CPU, so you can run the a53 self-test without pausing your a73 job. Just make sure to run the a53 timings in a separate dir, so as to create a separate mlucas.cfg file for that CPU, and use -s m -cpu 0:1, obviously. Even using just the default 100 iterations per timing sample that self-test will take a while due to the puniness of the a53 CPU, probably a couple of hours. If I still had my N2 I'd shoot you the a53-specific mlucas.cfg file, but I didn't save copies of those config files before shipping the unit off to Paul L., I only copied the .stat and savefiles for the 2 jobs I was running on it - the a73 LL-test is now queued up on my Intel NUC, and the a53 DC on my Odroid C2. |
[QUOTE=ewmayer;519746]You should only have to re-run self-tests for the a53 CPU ... in my experience using both CPUs increases the absolute timings but does not appreciably affect the optimal-FFT-parameters for each CPU, so you can run the a53 self-test without pausing your a73 job. Just make sure to run the a53 timings in a separate dir, so as to create a separate mlucas.cfg file for that CPU, and use -s m -cpu 0:1, obviously. Even using just the default 100 iterations per timing sample that self-test will take a while due to the puniness of the a53 CPU, probably a couple of hours.
If I still had my N2 I'd shoot you the a53-specific mlucas.cfg file, but I didn't save copies of those config files before shipping the unit off to Paul L., I only copied the .stat and savefiles for the 2 jobs I was running on it - the a73 LL-test is now queued up on my Intel NUC, and the a53 DC on my Odroid C2.[/QUOTE] Thanks, but I have just now configured it for one LL on all six cores. [c]top[/c] is showing ~470%. Of course this will be more when I shut down the browser for the night! |
[QUOTE=paulunderwood;519750]Thanks but I have just now configured it for one LL on all six cores. [c]top[/c] is showing ~470%. Of course this will be more when I shutdown the browser for the night![/QUOTE]
Hmm ... 470% sounds great, but what do the actual run timings show? In my experience, running one job across both CPUs is worse than just using the a73 CPU. |
[QUOTE=ewmayer;519752]Hmm ... 470% sounds great, but what do the actual run timings show? In my experience, running one job across both CPUs is worse than just using the a73 CPU.[/QUOTE]
136.4894 ms/iter, but the machine was not idle at the time -- I have reverted to -cpu 2:5 (with the correct configuration file). The CPUs each dropped 3C during the six-core run. I am hoping for a good run-to-run timing. |
I don't know where to post so here we go :smile:
I ran mlucas_v19 (posted on this thread IIRC) on a board with a Qualcomm SD845 (Cortex-A75) [URL]https://www.96boards.org/product/rb3-platform/[/URL] There's no heatsink and no fan so obviously the CPU throttled during the run. setaffinity was failing; it works for 0:3 but these are the little CPU; 4:7 fails for some reason. I'll take a look at that. [code]./mlucas -s m -iters 100 -cpu 4:7 18.0 2048 msec/iter = 44.03 ROE[avg,max] = [0.255691964, 0.312500000] radices = 128 16 16 32 0 0 0 0 0 0 2304 msec/iter = 54.09 ROE[avg,max] = [0.247767857, 0.312500000] radices = 288 16 16 16 0 0 0 0 0 0 2560 msec/iter = 55.49 ROE[avg,max] = [0.236635045, 0.281250000] radices = 160 16 16 32 0 0 0 0 0 0 2816 msec/iter = 65.30 ROE[avg,max] = [0.223967634, 0.250000000] radices = 44 32 32 32 0 0 0 0 0 0 3072 msec/iter = 67.83 ROE[avg,max] = [0.270591518, 0.312500000] radices = 192 16 16 32 0 0 0 0 0 0 3328 msec/iter = 74.63 ROE[avg,max] = [0.224553571, 0.281250000] radices = 208 8 8 8 16 0 0 0 0 0 3584 msec/iter = 80.38 ROE[avg,max] = [0.273772321, 0.312500000] radices = 224 16 16 32 0 0 0 0 0 0 3840 msec/iter = 83.62 ROE[avg,max] = [0.249135045, 0.312500000] radices = 240 16 16 32 0 0 0 0 0 0 4096 msec/iter = 91.08 ROE[avg,max] = [0.252901786, 0.281250000] radices = 128 16 32 32 0 0 0 0 0 0 4608 msec/iter = 101.29 ROE[avg,max] = [0.248046875, 0.312500000] radices = 288 32 16 16 0 0 0 0 0 0 5120 msec/iter = 106.82 ROE[avg,max] = [0.235030692, 0.281250000] radices = 160 8 8 16 16 0 0 0 0 0 5632 msec/iter = 129.85 ROE[avg,max] = [0.223102679, 0.250000000] radices = 352 16 16 32 0 0 0 0 0 0 6144 msec/iter = 138.96 ROE[avg,max] = [0.222753906, 0.281250000] radices = 768 16 16 16 0 0 0 0 0 0 6656 msec/iter = 149.21 ROE[avg,max] = [0.271651786, 0.312500000] radices = 208 8 8 16 16 0 0 0 0 0 7168 msec/iter = 146.54 ROE[avg,max] = [0.242801339, 0.312500000] radices = 224 16 32 32 0 0 0 0 0 0 7680 msec/iter = 150.23 ROE[avg,max] = [0.243743025, 0.312500000] radices = 240 16 32 32 0 0 0 0 0 0 
[/code]I cross-compiled it myself. |
[QUOTE=ldesnogu;520680]I don't know where to post so here we go :smile:
I ran mlucas_v19 (posted on this thread IIRC) on a board with a Qualcomm SD845 (Cortex-A75) [URL]https://www.96boards.org/product/rb3-platform/[/URL] There's no heatsink and no fan so obviously the CPU throttled during the run. setaffinity was failing; it works for 0:3 but these are the little CPU; 4:7 fails for some reason. I'll take a look at that. [code]./mlucas -s m -iters 100 -cpu 4:7 [snip][/code]I cross-compiled it myself.[/QUOTE] So what about -cpu 4:7 failed for you? The data you posted look OK, only weirdness I see is that 5632K seems slower than expected and the timings from 6144-7680 are all in a narrow range, possibly related to throttling. If you have any way to get some decent airflow over the CPU during your tests, even if it's not a practical one for long-term running, that might be useful, as well as monitoring the temperature data in /sys/class/thermal/thermal_zone*/temp during the run, if those files exist on your system. Oh, what does the cfg-file for -cpu 0:3 look like? |
[QUOTE=ewmayer;520690]So what about -cpu 4:7 failed for you?[/quote]
I get messages like this during the run: [code]sched_setaffinity: Invalid argument[/code] I will see if I can fix this issue. [quote]The data you posted look OK, only weirdness I see is that 5632K seems slower than expected and the timings from 6144-7680 are all in a narrow range, possibly related to throttling. If you have any way to get some decent airflow over the CPU during your tests, even if it's not a practical one for long-term running, that might be useful, as well as monitoring the temperature data in /sys/class/thermal/thermal_zone*/temp during the run, if those files exist on your system.[/quote] Yeah, I will definitely have to fix that throttling issue, as I intend to use the board for benchmarking. [quote]Oh, what does the cfg-file for -cpu 0:3 look like?[/QUOTE] Here it is: [code] 2048 msec/iter = 71.01 ROE[avg,max] = [0.238755580, 0.312500000] radices = 256 16 16 16 0 0 0 0 0 0 2304 msec/iter = 79.91 ROE[avg,max] = [0.247767857, 0.312500000] radices = 288 16 16 16 0 0 0 0 0 0 2560 msec/iter = 87.87 ROE[avg,max] = [0.236635045, 0.281250000] radices = 160 16 16 32 0 0 0 0 0 0 2816 msec/iter = 98.75 ROE[avg,max] = [0.270312500, 0.375000000] radices = 176 16 16 32 0 0 0 0 0 0 3072 msec/iter = 106.91 ROE[avg,max] = [0.270591518, 0.312500000] radices = 192 16 16 32 0 0 0 0 0 0 3328 msec/iter = 116.82 ROE[avg,max] = [0.252232143, 0.312500000] radices = 208 16 16 32 0 0 0 0 0 0 3584 msec/iter = 123.03 ROE[avg,max] = [0.273772321, 0.312500000] radices = 224 16 16 32 0 0 0 0 0 0 3840 msec/iter = 133.95 ROE[avg,max] = [0.249135045, 0.312500000] radices = 240 16 16 32 0 0 0 0 0 0 4096 msec/iter = 139.86 ROE[avg,max] = [0.227650670, 0.250000000] radices = 256 16 16 32 0 0 0 0 0 0 4608 msec/iter = 160.17 ROE[avg,max] = [0.250837054, 0.343750000] radices = 288 16 16 32 0 0 0 0 0 0 5120 msec/iter = 180.82 ROE[avg,max] = [0.296875000, 0.343750000] radices = 320 16 16 32 0 0 0 0 0 0 5632 msec/iter = 196.83 ROE[avg,max] = [0.223102679, 0.250000000] radices = 352
16 16 32 0 0 0 0 0 0 6144 msec/iter = 223.74 ROE[avg,max] = [0.253571429, 0.281250000] radices = 192 16 32 32 0 0 0 0 0 0 6656 msec/iter = 243.88 ROE[avg,max] = [0.232924107, 0.250000000] radices = 208 16 32 32 0 0 0 0 0 0 7168 msec/iter = 259.09 ROE[avg,max] = [0.242801339, 0.312500000] radices = 224 16 32 32 0 0 0 0 0 0 7680 msec/iter = 280.20 ROE[avg,max] = [0.243743025, 0.312500000] radices = 240 16 32 32 0 0 0 0 0 0[/code] By the way results.txt contains these 3 lines: [code]FATAL: iter = 14; nonzero exit carry in radix384_ditN_cy_dif1 - input wordsize may be too small. FATAL: iter = 14; nonzero exit carry in radix384_ditN_cy_dif1 - input wordsize may be too small. FATAL: iter = 12; nonzero exit carry in radix384_ditN_cy_dif1 - input wordsize may be too small.[/code] |
[QUOTE=ldesnogu;520680]...
setaffinity was failing; it works for 0:3 but these are the little CPU; 4:7 fails for some reason. I'll take a look at that. ...[/QUOTE] I've encountered much weirdness when it comes to big.LITTLE CPUs in phones, maybe your problem is related. There are different ways the cores can be configured and presented to the user. Try "cat /proc/cpuinfo" under no load, under load, and many times in quick succession (to see how the SoC reacts to going from no load to some load). All the issues boiled down to the number of cores presented to the user being dynamic and seemingly done differently by every manufacturer and gen to gen. If a core is not present in cpuinfo when mlucas tries to do something with that specific core (like set affinity) it usually fails. It looks like your chip is a Snapdragon 845 with DynamIQ, the successor to big.LITTLE ( [URL]https://en.wikipedia.org/wiki/DynamIQ#Scheduling[/URL] ). Never tested one so it'll be interesting to see what quirks it has. If it does need some caressing hopefully it's as simple as running two jobs on 0:3 and letting the SoC do the load balancing for you. |
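One way to narrow down which cores actually accept affinity requests is to probe each one with taskset (a sketch; taskset ships with util-linux, and this only exercises the kernel-level sched_setaffinity call, not Mlucas itself):

```shell
# Ask taskset to pin a trivial command to each core in turn and
# report which cores the kernel lets us bind to.
n=$(getconf _NPROCESSORS_CONF)   # cores known to the kernel
i=0
while [ "$i" -lt "$n" ]; do
  if taskset -c "$i" true 2>/dev/null; then
    echo "cpu $i: OK"
  else
    echo "cpu $i: affinity request failed"
  fi
  i=$((i + 1))
done
```

On a DynamIQ chip that hides or hot-unplugs cores, the failing cores should show up immediately in the output.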
[QUOTE=M344587487;520718]I've encountered much weirdness when it comes to big.LITTLE CPUs in phones, maybe your problem is related. There are different ways the cores can be configured and presented to the user. Try "cat /proc/cpuinfo" under no load, under load, and many times in quick succession (to see how the SoC reacts to going from no load to some load).[/QUOTE]
I tried that but /proc/cpuinfo always displays the same result. [quote]All the issues boiled down to the number of cores presented to the user being dynamic and seemingly done differently by every manufacturer and gen to gen. If a core is not present in cpuinfo when mlucas tries to do something with that specific core (like set affinity) it usually fails. It looks like your chip is a Snapdragon 845 with DynamIQ, the successor to big.LITTLE ( [URL]https://en.wikipedia.org/wiki/DynamIQ#Scheduling[/URL] ). Never tested one so it'll be interesting to see what quirks it has. If it does need some caressing hopefully it's as simple as running two jobs on 0:3 and letting the SoC do the load balancing for you.[/quote]Time permitting I will investigate some more. Thanks! I set the governor to performance and got this: [code] 2048 msec/iter = 39.31 ROE[avg,max] = [0.255691964, 0.312500000] radices = 128 16 16 32 0 0 0 0 0 0 2304 msec/iter = 47.31 ROE[avg,max] = [0.228906250, 0.265625000] radices = 36 32 32 32 0 0 0 0 0 0 2560 msec/iter = 48.45 ROE[avg,max] = [0.236635045, 0.281250000] radices = 160 16 16 32 0 0 0 0 0 0 2816 msec/iter = 62.14 ROE[avg,max] = [0.243805804, 0.312500000] radices = 352 16 16 16 0 0 0 0 0 0 3072 msec/iter = 63.29 ROE[avg,max] = [0.217623465, 0.250000000] radices = 48 32 32 32 0 0 0 0 0 0 3328 msec/iter = 70.39 ROE[avg,max] = [0.219866071, 0.250000000] radices = 52 32 32 32 0 0 0 0 0 0 3584 msec/iter = 73.67 ROE[avg,max] = [0.213588170, 0.265625000] radices = 56 32 32 32 0 0 0 0 0 0 3840 msec/iter = 78.74 ROE[avg,max] = [0.249135045, 0.312500000] radices = 240 16 16 32 0 0 0 0 0 0 4096 msec/iter = 81.66 ROE[avg,max] = [0.252901786, 0.281250000] radices = 128 16 32 32 0 0 0 0 0 0 4608 msec/iter = 92.61 ROE[avg,max] = [0.299107143, 0.375000000] radices = 144 16 32 32 0 0 0 0 0 0 5120 msec/iter = 100.73 ROE[avg,max] = [0.234685407, 0.281250000] radices = 160 16 32 32 0 0 0 0 0 0 5632 msec/iter = 118.50 ROE[avg,max] = [0.246205357, 0.312500000] radices = 
176 8 8 16 16 0 0 0 0 0 6144 msec/iter = 127.28 ROE[avg,max] = [0.253571429, 0.281250000] radices = 192 16 32 32 0 0 0 0 0 0 6656 msec/iter = 139.66 ROE[avg,max] = [0.271651786, 0.312500000] radices = 208 8 8 16 16 0 0 0 0 0 7168 msec/iter = 144.24 ROE[avg,max] = [0.242801339, 0.312500000] radices = 224 16 32 32 0 0 0 0 0 0 7680 msec/iter = 154.26 ROE[avg,max] = [0.243743025, 0.312500000] radices = 240 16 32 32 0 0 0 0 0 0[/code]It's faster than the run above except for 7680: [code] perfor ratio 2048 39.31 44.03 1.12 2304 47.31 54.09 1.14 2560 48.45 55.49 1.15 2816 62.14 65.3 1.05 3072 63.29 67.83 1.07 3328 70.39 74.63 1.06 3584 73.67 80.38 1.09 3840 78.74 83.62 1.06 4096 81.66 91.08 1.12 4608 92.61 101.29 1.09 5120 100.73 106.82 1.06 5632 118.5 129.85 1.10 6144 127.28 138.96 1.09 6656 139.66 149.21 1.07 7168 144.24 146.54 1.02 7680 154.26 150.23 0.97 [/code]I saw the temperature going above 95 degrees in some of the thermal zones (the kernel exposes more than 70 thermal zones, hard to know what is what). With nothing running max temp is 75 degrees. I checked frequency a few times, and it always was 2.8 GHz on the fast chips and 1.8 on the slowest. Given the ratio above it's possible that part of the last two sizes were on the slower CPU. |
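For reference, the governor switch ldesnogu mentions is normally done through cpufreq's sysfs files, roughly like this (a sketch; it needs root, and the paths may be absent on some kernels or in containers):

```shell
# Select the 'performance' governor on every CPU so cpufreq holds the
# cores at maximum frequency (undo with 'ondemand' or 'schedutil',
# whichever the kernel offers).
for g in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
  [ -w "$g" ] || { echo "skipping $g (missing or not writable)"; continue; }
  echo performance > "$g"
done
```

With the governor pinned, any remaining slowdown is down to thermal throttling rather than frequency scaling policy.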
Another run:
[code] 2048 msec/iter = 35.88 ROE[avg,max] = [0.250446429, 0.281250000] radices = 1024 32 32 0 0 0 0 0 0 0 2304 msec/iter = 42.26 ROE[avg,max] = [0.228906250, 0.265625000] radices = 36 32 32 32 0 0 0 0 0 0 2560 msec/iter = 45.14 ROE[avg,max] = [0.241992188, 0.281250000] radices = 40 32 32 32 0 0 0 0 0 0 2816 msec/iter = 54.73 ROE[avg,max] = [0.223967634, 0.250000000] radices = 44 32 32 32 0 0 0 0 0 0 3072 msec/iter = 55.49 ROE[avg,max] = [0.270591518, 0.312500000] radices = 192 16 16 32 0 0 0 0 0 0 3328 msec/iter = 66.12 ROE[avg,max] = [0.252232143, 0.312500000] radices = 208 16 16 32 0 0 0 0 0 0 3584 msec/iter = 70.29 ROE[avg,max] = [0.273772321, 0.312500000] radices = 224 16 16 32 0 0 0 0 0 0 3840 msec/iter = 74.62 ROE[avg,max] = [0.249135045, 0.312500000] radices = 240 16 16 32 0 0 0 0 0 0 4096 msec/iter = 80.18 ROE[avg,max] = [0.252901786, 0.281250000] radices = 128 16 32 32 0 0 0 0 0 0 4608 msec/iter = 92.21 ROE[avg,max] = [0.299107143, 0.375000000] radices = 144 16 32 32 0 0 0 0 0 0 5120 msec/iter = 100.66 ROE[avg,max] = [0.234685407, 0.281250000] radices = 160 16 32 32 0 0 0 0 0 0 5632 msec/iter = 115.37 ROE[avg,max] = [0.223102679, 0.250000000] radices = 352 16 16 32 0 0 0 0 0 0 6144 msec/iter = 127.71 ROE[avg,max] = [0.253571429, 0.281250000] radices = 192 16 32 32 0 0 0 0 0 0 6656 msec/iter = 140.97 ROE[avg,max] = [0.232924107, 0.250000000] radices = 208 16 32 32 0 0 0 0 0 0 7168 msec/iter = 146.96 ROE[avg,max] = [0.242801339, 0.312500000] radices = 224 16 32 32 0 0 0 0 0 0 7680 msec/iter = 156.33 ROE[avg,max] = [0.243743025, 0.312500000] radices = 240 16 32 32 0 0 0 0 0 0[/code]Crazy variability against previous run. I monitored frequency and temps every second and frequency didn't change. 
[code]while true; do sleep 1; cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq | tr '\012' ' '; cat /sys/class/thermal/thermal_zone*/temp 2> /dev/null | sort -n | uniq | tail -1 | tr '\012' ' '; date "+%H:%M:%S"; done [/code] After some research, it's not obvious that I will be able to properly cool the beast, since it uses PoP RAM (the RAM chip is stacked on the SoC). Add to that, the Linux on it is useless (cross-compilation is required), and I'm starting to wonder whether this board is completely useless for my needs :sad: |