#177
"Composite as Heck"
Oct 2017
2·5²·19 Posts
Paper specs:
Quote:
Under load core frequencies:
Code:
pi@NanoPi-NEO4:/sys/devices/system/cpu$ sudo cat /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_cur_freq
1416000
1416000
1416000
1416000
1800000
1800000
1024K:
Code:
pi@NanoPi-NEO4:~/mlucas_v17.1$ cat big/p20000047.stat
INFO: no restart file found...starting run from scratch.
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 16 8 16 16 16
[Jan 14 15:07:24] M20000047 Iter# = 10000 [ 0.05% complete] clocks = 00:09:50.612 [ 0.0591 sec/iter] Res64: 9A2AF744DE060296. AvgMaxErr = 0.220881351. MaxErr = 0.312500000.
[Jan 14 15:17:14] M20000047 Iter# = 20000 [ 0.10% complete] clocks = 00:09:49.950 [ 0.0590 sec/iter] Res64: D99B4D255F5C0C74. AvgMaxErr = 0.221486410. MaxErr = 0.312500000.
pi@NanoPi-NEO4:~/mlucas_v17.1$ cat little/p20000047.stat
INFO: no restart file found...starting run from scratch.
M20000047: using FFT length 1024K = 1048576 8-byte floats.
this gives an average 19.073531150817871 bits per digit
Using complex FFT radices 64 32 16 16
[Jan 14 15:09:47] M20000047 Iter# = 10000 [ 0.05% complete] clocks = 00:12:14.866 [ 0.0735 sec/iter] Res64: 9A2AF744DE060296. AvgMaxErr = 0.230923241. MaxErr = 0.312500000.
[Jan 14 15:22:02] M20000047 Iter# = 20000 [ 0.10% complete] clocks = 00:12:14.665 [ 0.0735 sec/iter] Res64: D99B4D255F5C0C74. AvgMaxErr = 0.231118560. MaxErr = 0.343750000.
Code:
pi@NanoPi-NEO4:~/mlucas_v17.1$ cat big/p49005071.stat
INFO: no restart file found...starting run from scratch.
M49005071: using FFT length 2560K = 2621440 8-byte floats.
this gives an average 18.693951034545897 bits per digit
Using complex FFT radices 40 32 32 32
[Jan 14 15:52:39] M49005071 Iter# = 10000 [ 0.02% complete] clocks = 00:23:46.485 [ 0.1426 sec/iter] Res64: 8E7E56F23C735CF2. AvgMaxErr = 0.245389789. MaxErr = 0.375000000.
pi@NanoPi-NEO4:~/mlucas_v17.1$ cat little/p49005071.stat
INFO: no restart file found...starting run from scratch.
M49005071: using FFT length 2560K = 2621440 8-byte floats.
this gives an average 18.693951034545897 bits per digit
Using complex FFT radices 160 8 8 8 16
M49005071 Roundoff warning on iteration 181, maxerr = 0.437500000000
M49005071 Roundoff warning on iteration 4240, maxerr = 0.437500000000
M49005071 Roundoff warning on iteration 8110, maxerr = 0.437500000000
[Jan 14 15:58:12] M49005071 Iter# = 10000 [ 0.02% complete] clocks = 00:29:18.200 [ 0.1758 sec/iter] Res64: 8E7E56F23C735CF2. AvgMaxErr = 0.282188452. MaxErr = 0.437500000.
Code:
pi@NanoPi-NEO4:~/mlucas_v17.1$ cat big/p87068977.stat
INFO: no restart file found...starting run from scratch.
M87068977: using FFT length 4608K = 4718592 8-byte floats.
this gives an average 18.452321582370335 bits per digit
Using complex FFT radices 144 32 32 16
[Jan 14 16:48:14] M87068977 Iter# = 10000 [ 0.01% complete] clocks = 00:36:37.950 [ 0.2198 sec/iter] Res64: 13BB5C9DDF0CD3D6. AvgMaxErr = 0.256021777. MaxErr = 0.375000000.
[Jan 14 17:24:52] M87068977 Iter# = 20000 [ 0.02% complete] clocks = 00:36:37.133 [ 0.2197 sec/iter] Res64: C43069A17478EF46. AvgMaxErr = 0.256765161. MaxErr = 0.343750000.
pi@NanoPi-NEO4:~/mlucas_v17.1$ cat little/p87068977.stat
INFO: no restart file found...starting run from scratch.
M87068977: using FFT length 4608K = 4718592 8-byte floats.
this gives an average 18.452321582370335 bits per digit
Using complex FFT radices 288 32 16 16
[Jan 14 17:04:38] M87068977 Iter# = 10000 [ 0.01% complete] clocks = 00:53:00.131 [ 0.3180 sec/iter] Res64: 13BB5C9DDF0CD3D6. AvgMaxErr = 0.249227525. MaxErr = 0.375000000.
[Jan 14 17:57:39] M87068977 Iter# = 20000 [ 0.02% complete] clocks = 00:52:59.939 [ 0.3180 sec/iter] Res64: C43069A17478EF46. AvgMaxErr = 0.249523122. MaxErr = 0.343750000.
Code:
pi@NanoPi-NEO4:~/mlucas_v17.1$ cat big/p143472073.stat
INFO: no restart file found...starting run from scratch.
M143472073: using FFT length 7680K = 7864320 8-byte floats.
this gives an average 18.243417485555014 bits per digit
Using complex FFT radices 240 16 32 32
[Jan 14 19:06:56] M143472073 Iter# = 10000 [ 0.01% complete] clocks = 01:02:23.065 [ 0.3743 sec/iter] Res64: C7B182C990710B46. AvgMaxErr = 0.241344566. MaxErr = 0.343750000.
[Jan 14 20:09:15] M143472073 Iter# = 20000 [ 0.01% complete] clocks = 01:02:18.024 [ 0.3738 sec/iter] Res64: 181335759D5BB711. AvgMaxErr = 0.241826627. MaxErr = 0.375000000.
[Jan 14 21:11:38] M143472073 Iter# = 30000 [ 0.02% complete] clocks = 01:02:21.381 [ 0.3741 sec/iter] Res64: 126EDB1E9B6580C4. AvgMaxErr = 0.241919198. MaxErr = 0.343750000.
pi@NanoPi-NEO4:~/mlucas_v17.1$ cat little/p143472073.stat
INFO: no restart file found...starting run from scratch.
M143472073: using FFT length 7680K = 7864320 8-byte floats.
this gives an average 18.243417485555014 bits per digit
Using complex FFT radices 240 32 32 16
[Jan 14 19:44:16] M143472073 Iter# = 10000 [ 0.01% complete] clocks = 01:39:44.262 [ 0.5984 sec/iter] Res64: C7B182C990710B46. AvgMaxErr = 0.235340244. MaxErr = 0.343750000.
[Jan 14 21:23:58] M143472073 Iter# = 20000 [ 0.01% complete] clocks = 01:39:39.004 [ 0.5979 sec/iter] Res64: 181335759D5BB711. AvgMaxErr = 0.236132050. MaxErr = 0.375000000.
Code:
pi@NanoPi-NEO4:~/mlucas_v17.1$ cat big/p332220523.stat
INFO: no restart file found...starting run from scratch.
M332220523: using FFT length 18432K = 18874368 8-byte floats.
this gives an average 17.601676676008438 bits per digit
Using complex FFT radices 288 32 32 32
[Jan 15 00:29:12] M332220523 Iter# = 10000 [ 0.00% complete] clocks = 02:57:36.013 [ 1.0656 sec/iter] Res64: 1A313D709BFA6663. AvgMaxErr = 0.186972266. MaxErr = 0.250000000.
M332220523 Roundoff warning on iteration 11467, maxerr = 0.500000000000
Retrying iteration interval to see if roundoff error is reproducible.
Restarting M332220523 at iteration = 10000. Res64: 1A313D709BFA6663
M332220523: using FFT length 18432K = 18874368 8-byte floats.
this gives an average 17.601676676008438 bits per digit
Retry of iteration interval with fatal roundoff error was successful.
[Jan 15 03:52:50] M332220523 Iter# = 20000 [ 0.01% complete] clocks = 02:57:28.763 [ 1.0649 sec/iter] Res64: 73DC7A5C8B839081. AvgMaxErr = 0.187356934. MaxErr = 0.250000000.
[Jan 15 06:50:22] M332220523 Iter# = 30000 [ 0.01% complete] clocks = 02:57:28.523 [ 1.0649 sec/iter] Res64: B928CD22434EEC7C. AvgMaxErr = 0.187289062. MaxErr = 0.281250000.
[Jan 15 09:47:49] M332220523 Iter# = 40000 [ 0.01% complete] clocks = 02:57:24.003 [ 1.0644 sec/iter] Res64: 307ECB47139AEB31. AvgMaxErr = 0.187450000. MaxErr = 0.250000000.
pi@NanoPi-NEO4:~/mlucas_v17.1$ cat little/p332220523.stat
INFO: no restart file found...starting run from scratch.
M332220523: using FFT length 18432K = 18874368 8-byte floats.
this gives an average 17.601676676008438 bits per digit
Using complex FFT radices 288 32 32 32
[Jan 15 03:04:59] M332220523 Iter# = 10000 [ 0.00% complete] clocks = 05:33:22.437 [ 2.0002 sec/iter] Res64: 1A313D709BFA6663. AvgMaxErr = 0.186969141. MaxErr = 0.250000000.
[Jan 15 08:38:04] M332220523 Iter# = 20000 [ 0.01% complete] clocks = 05:32:58.179 [ 1.9978 sec/iter] Res64: 73DC7A5C8B839081. AvgMaxErr = 0.187339746. MaxErr = 0.250000000.
Code:
1024K 32.73 ms/it
2560K 78.73 ms/it
4608K 129.93 ms/it
7680K 230.12 ms/it
18432K 694.90 ms/it
Code:
pi@NanoPi-NEO4:~/mlucas_v17.1$ cat big/p87068977.stat
INFO: no restart file found...starting run from scratch.
M87068977: using FFT length 4608K = 4718592 8-byte floats.
this gives an average 18.452321582370335 bits per digit
Using complex FFT radices 144 32 32 16
[Jan 15 11:18:17] M87068977 Iter# = 10000 [ 0.01% complete] clocks = 00:37:42.276 [ 0.2262 sec/iter] Res64: 13BB5C9DDF0CD3D6. AvgMaxErr = 0.256021777. MaxErr = 0.375000000.
[Jan 15 11:55:59] M87068977 Iter# = 20000 [ 0.02% complete] clocks = 00:37:41.853 [ 0.2262 sec/iter] Res64: C43069A17478EF46. AvgMaxErr = 0.256765161. MaxErr = 0.343750000.
pi@NanoPi-NEO4:~/mlucas_v17.1$ cat little/p49005071.stat
INFO: no restart file found...starting run from scratch.
M49005071: using FFT length 2560K = 2621440 8-byte floats.
this gives an average 18.693951034545897 bits per digit
Using complex FFT radices 160 8 8 8 16
M49005071 Roundoff warning on iteration 181, maxerr = 0.437500000000
M49005071 Roundoff warning on iteration 4240, maxerr = 0.437500000000
M49005071 Roundoff warning on iteration 8110, maxerr = 0.437500000000
[Jan 15 11:07:39] M49005071 Iter# = 10000 [ 0.02% complete] clocks = 00:27:03.713 [ 0.1624 sec/iter] Res64: 8E7E56F23C735CF2. AvgMaxErr = 0.282188452. MaxErr = 0.437500000.
[Jan 15 11:34:45] M49005071 Iter# = 20000 [ 0.04% complete] clocks = 00:27:05.668 [ 0.1626 sec/iter] Res64: 6CD0428337CA1430. AvgMaxErr = 0.282933594. MaxErr = 0.406250000.
M49005071 Roundoff warning on iteration 20522, maxerr = 0.437500000000
M49005071 Roundoff warning on iteration 24876, maxerr = 0.437500000000
M49005071 Roundoff warning on iteration 25658, maxerr = 0.437500000000
[Jan 15 12:01:52] M49005071 Iter# = 30000 [ 0.06% complete] clocks = 00:27:05.741 [ 0.1626 sec/iter] Res64: 106C93EFA0800D81. AvgMaxErr = 0.282969043. MaxErr = 0.437500000.
Code:
pi@NanoPi-NEO4:~/mlucas_v17.1$ cat big/p49005071.stat
INFO: no restart file found...starting run from scratch.
M49005071: using FFT length 2560K = 2621440 8-byte floats.
this gives an average 18.693951034545897 bits per digit
Using complex FFT radices 40 32 32 32
[Jan 15 12:54:52] M49005071 Iter# = 10000 [ 0.02% complete] clocks = 00:23:16.719 [ 0.1397 sec/iter] Res64: 8E7E56F23C735CF2. AvgMaxErr = 0.245389789. MaxErr = 0.375000000.
[Jan 15 13:18:09] M49005071 Iter# = 20000 [ 0.04% complete] clocks = 00:23:16.597 [ 0.1397 sec/iter] Res64: 6CD0428337CA1430. AvgMaxErr = 0.246327235. MaxErr = 0.375000000.
[Jan 15 13:41:25] M49005071 Iter# = 30000 [ 0.06% complete] clocks = 00:23:16.016 [ 0.1396 sec/iter] Res64: 106C93EFA0800D81. AvgMaxErr = 0.246085634. MaxErr = 0.375000000.
pi@NanoPi-NEO4:~/mlucas_v17.1$ cat little/p87068977.stat
INFO: no restart file found...starting run from scratch.
M87068977: using FFT length 4608K = 4718592 8-byte floats.
this gives an average 18.452321582370335 bits per digit
Using complex FFT radices 288 32 16 16
[Jan 15 13:28:13] M87068977 Iter# = 10000 [ 0.01% complete] clocks = 00:56:36.492 [ 0.3396 sec/iter] Res64: 13BB5C9DDF0CD3D6. AvgMaxErr = 0.249227525. MaxErr = 0.375000000.
This board is tiny (60x45mm); the SoC is on the underside and the heatsink covers the entire underside. It might not make sense from a power or hardware cost perspective to use these boards for GIMPS, but creating a DIY radiator to heat your house with a cluster of these is tempting. Using the 2200G and 8100 numbers from above, we need ~15.5 NEO4s to match a 2200G and ~21.5 to match an 8100. I don't have a wattmeter handy, but online benchmarks indicate power usage is probably around 11W per NEO4, which I think is a win for x86 by some margin.
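The combined big+little ms/it table in this post looks like the two per-cluster timings combined as rates: the iterations per second of the two simultaneous runs are added, then the sum is inverted. That is an inference from the numbers, not something the post states. A minimal Python sketch, where `timings` and `combined_ms_per_iter` are illustrative names and one representative sec/iter value per log has been picked by hand:

```python
# Per-cluster timings in sec/iter, copied from the .stat excerpts in this post.
# Where a log shows several checkpoints, one representative value is used
# (a hand-picked assumption, not the poster's stated choice).
timings = {
    "1024K": (0.0590, 0.0735),
    "2560K": (0.1426, 0.1758),
    "4608K": (0.2197, 0.3180),
    "7680K": (0.3741, 0.5979),
    "18432K": (1.0649, 1.9978),
}


def combined_ms_per_iter(big_s, little_s):
    """Effective ms/iter when both clusters run independent tests at once."""
    rate = 1.0 / big_s + 1.0 / little_s  # iterations per second, summed
    return 1000.0 / rate


for fft, (big_s, little_s) in timings.items():
    print(f"{fft:>7} {combined_ms_per_iter(big_s, little_s):7.2f} ms/it")
```

With these inputs the first four rows agree with the table to within rounding; the 18432K row moves by a few tenths of a ms depending on which checkpoints are used.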
#178
∂²ω=0
Sep 2002
República de California
2²·2,939 Posts
Thanks for the data, M344587487! That is indeed a dramatic falling-off of the A53 throughput once you get above 4M FFT length - on my A53-quad-based C2 I see falloff from the strictly-arithmetic-opcount-based O(n log n) scaling, but nowhere near what you see in your big+little combined tests.
How much did the full kit cost you? And do you have anything in mind to reduce per-node cost for the possible homebuilt cluster you describe?
#179
"/X\(‘-‘)/X\"
Jan 2013
https://pedan.tech/
2⁴·199 Posts
I'd be interested in building a small 4 or 7 node cluster of devices like this, if it's more cost effective for DC than running mprime on x86-64.
#180
"Composite as Heck"
Oct 2017
2·5²·19 Posts
Quote:
$45 for the board, $6 for the heatsink, $5 postage, £3 for a USB-C cable, £3 for an SD card; I already had a USB-C power source. You can probably get a 10-port USB power hub on eBay for £10-£20, but they likely have terrible efficiency. I'm not certain, but I have a feeling that the best solution for efficiency would be to mod an ATX PSU or maybe a laptop transformer; massive headache though.
You absolutely need the board, heatsink and USB cable per node. You might be able to DIY a shared heatsink on the cheap if you're building a wall of these; you'd just need some thermal pads on the SoCs, and the backside of the heatsink can be flat. No need for a switch as there's wifi. I don't think you can network boot these, but even if you could, you'd need a switch and would gain nothing (except the reliability of not using SD cards). It may be possible to eliminate the SD card if the OS can be made small enough to run fully in RAM, but it's a bit of an admin nightmare on initial boot and if there's a crash or, god forbid, a power cut.
Quote:
I didn't buy a NEO4 heatsink and instead used one salvaged from an old computer. It's pretty ridiculous as it dwarfs the board in all three dimensions.
#181
∂²ω=0
Sep 2002
República de California
2²·2,939 Posts
Hmm, so let's think along extreme cost-cutting lines, how low can we go?
o Was hoping that (like Odroid) one might be able to get a bulk discount (say 10%) on these - might be worth e-mailing the mfr to ask. (I just did so.)
o Heat sink: One should be able to get a properly-sized set of these in either a cheap bulk-pack (I see a 10-pack of 25x25x5mm ones on ebay for ~$10) or in the form of some cut-to-desired-length extruded Alu. finned stock.
A total per-unit cost < $50 is getting pretty close to "worth a try as a feasibility study" range. I'll be very interested in seeing accurate TDP numbers for these. When one factors in the wattage of the entire package (CPU + rest-of-mobo + PSU + SSD + case fans), what is the typical TDP at the wall outlet for a typical high-bang-per-watt Intel or AMD multicore system?
Last fiddled with by ewmayer on 2019-01-15 at 21:43
#182
"/X\(‘-‘)/X\"
Jan 2013
https://pedan.tech/
3184₁₀ Posts
With a single Gold-rated power supply feeding four 4-core boards, underclocked and undervolted to match memory, I see about 67.5 watts at the wall each, giving 186 iter/sec for a 4k FFT (143 iter/sec for 5k, 295 iter/sec for 2.5k).
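One way to put these figures next to the NEO4 numbers from earlier in the thread is energy per iteration, i.e. watts divided by iterations per second. A rough Python sketch follows; `joules_per_iter` is an illustrative name, the 67.5 W and 186 iter/sec are read here as per-board figures (an assumption), the ~11 W NEO4 draw is the unmeasured estimate from post #177, and the two FFT lengths are not the same, so this is only an order-of-magnitude comparison:

```python
def joules_per_iter(watts, iters_per_sec):
    """Energy spent per iteration: power divided by iteration rate."""
    return watts / iters_per_sec


# x86 board: 67.5 W at the wall and 186 iter/sec for the "4k" FFT, as quoted.
x86 = joules_per_iter(67.5, 186.0)

# NEO4: ~11 W is an estimate (no wattmeter was used), and 129.93 ms/it is the
# combined big+little figure at 4608K, a somewhat longer FFT than the x86 case.
neo4 = joules_per_iter(11.0, 1000.0 / 129.93)

print(f"x86  : {x86:.3f} J/iter")
print(f"NEO4 : {neo4:.3f} J/iter")
print(f"ratio: {neo4 / x86:.1f}x")
```

Under those assumptions the x86 setup comes out several times more efficient per iteration, which lines up with the "win for x86" guess in post #177.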
#183
∂²ω=0
Sep 2002
República de California
2DEC₁₆ Posts
I started a thread about the RockPi4 on the Odroid forums - local expert tkaiser has some useful insights there re. suitable OS images.
#184
"Sam Laur"
Dec 2018
Turku, Finland
317 Posts
Just a quick note as a reference.
I've seen many tables and measurements for Raspberry Pi power consumption on the interwebs, but they are somewhat misleading because apparently their "full load" isn't anywhere near what Mlucas and ASIMD is capable of achieving while running.
So, Mlucas (still v17.1, sorry) - on Raspberry Pi 3A+ at stock 1.4 GHz. 64-bit Gentoo "sakaki" build, fresher image from December 2018 so that it can run on the 3A as well. There is some slight difference in the firmware, and the old image from June 2018 wouldn't start on 3A+. X disabled, of course. On 1 GB it makes only a small difference in the running time, but 512MB seems to be too small for a graphical environment, even doing nothing.
Idle 220 mA (from 5V)
Proper full load 840 mA with 880mA spikes, perhaps more, my current meter isn't that fast.
What I've seen on the net has generally been in the 400-500 mA range "full load" so take those figures with a grain of salt...
As a side note, the 3A+ "should" be as fast as the 3B+, but for whatever reason, is actually a few percent slower. Maybe the smaller memory chip makes that difference? The Elpida chip is marked -1D-F on the end of the device code which means 533 MHz, and the default speed should be 500 MHz, so no difference there. (The memory on my 3B+ cards is -8D-F, by the way, which is 400 MHz so the default setting is already overclocking it a bit!)
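For comparison with the wattage figures elsewhere in the thread, the current readings convert to power as P = V × I. A small Python sketch, assuming the supply holds its nominal 5 V under load (in practice it may sag a little); `watts` and `readings_ma` are illustrative names:

```python
SUPPLY_VOLTS = 5.0  # nominal supply voltage; assumed constant under load


def watts(milliamps, volts=SUPPLY_VOLTS):
    """Convert a current reading in mA to power in watts at the given voltage."""
    return volts * milliamps / 1000.0


# Current readings quoted in the post, in mA.
readings_ma = {"idle": 220, "full load": 840, "spikes": 880}

for label, ma in readings_ma.items():
    print(f"{label:>9}: {watts(ma):.1f} W")
```

That puts the 3A+ at roughly 1.1 W idle and a little over 4 W under Mlucas, against about 2 to 2.5 W for the 400-500 mA "full load" figures seen elsewhere.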