mersenneforum.org: Mlucas v19.1 available (https://www.mersenneforum.org/showthread.php?t=26483)

ewmayer 2021-02-17 23:02

@Lorenzo:

Thanks for all the timings, that is very useful. I will add a note recommending '-cpu 0:7' for M1 users to the README. Might you have a wall-plug wattmeter you can use to compare (under-load - idle) wattages for those 2 systems, for whatever FFT lengths they are using to run their current GIMPS assignments? I'd be curious to get some idea regarding relative performance-per-watt.

In any event, happy crunching!

Lorenzo 2021-02-18 09:36

[QUOTE=ewmayer;571851]@Lorenzo:

Thanks for all the timings, that is very useful. I will add a note recommending '-cpu 0:7' for M1 users to the README. Might you have a wall-plug wattmeter you can use to compare (under-load - idle) wattages for those 2 systems, for whatever FFT lengths they are using to run their current GIMPS assignments? I'd be curious to get some idea regarding relative performance-per-watt.

In any event, happy crunching![/QUOTE]


Sorry, but I don't have a wall-plug wattmeter.

ldesnogu 2021-02-18 14:32

[QUOTE=Lorenzo;571838]I just want to share my experience with Apple M1 CPU.[/QUOTE]
Thanks for the results!


What machine is that? Mini, MBA or MBP? I'd expect the MBA to throttle, given the noise my MBP makes when running 4 threads :smile:

Lorenzo 2021-02-18 14:46

[QUOTE=ldesnogu;571898]Thanks for the results!


What machine is that? Mini, MBA or MBP? I'd expect the MBA to throttle, given the noise my MBP makes when running 4 threads :smile:[/QUOTE]
Hi. This is an Apple Mac mini M1.

ewmayer 2021-02-18 21:08

[QUOTE=Lorenzo;571901]Hi. This is an Apple Mac mini M1.[/QUOTE]

Some pics via Amazon.com [url=https://www.amazon.com/Apple-Mini-Chip-256GB-Storage/dp/B08N5PHB83]here[/url]. The pic of the rear side shows an exhaust vent similar to those on my Intel NUCs - Lorenzo, are there intake vents on the bottom?

Had a closer look at some of your -cpu 0:7 timings ... the only obvious anomaly is at the very end, 26624K FFT, the timing for that is anomalously large. This is mainly for down-the-road as this FFT is way beyond the GIMPS wavefront, but looking at the pattern of best-timing FFT radices for the rows above it, this machine seems to really like larger leading FFT radices (call the leftmost radix r0) and combos of the form r0,16,32,32 and r0,32,32,32. At this 26M FFT length, there is no such available combo because I did not (yet) implement a radix-416 FFT-pass routine, thus instead of 416,32,32,32 the best we can do is 208,16,16,16,16, which means an extra pass through the data each iteration.
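The "extra pass" point can be checked against the radix sets posted in this thread. The sketch below is only an illustration: it assumes that the number of nonzero radices equals the number of passes through the data, and that the radix product equals half the real FFT length (FFT length in K times 1024, divided by 2). Both assumptions are inferred from the timing rows in this thread, not taken from the Mlucas source, and the function name describe is made up for the example.

```python
from math import prod

def describe(fft_len_k, radices):
    """Return (number of passes, radix product matches the FFT length).

    Assumption inferred from the rows in this thread: the product of the
    nonzero radices equals (fft_len_k * 1024) / 2, and each nonzero radix
    corresponds to one pass through the data.
    """
    nonzero = [r for r in radices if r != 0]
    consistent = prod(nonzero) * 2 == fft_len_k * 1024
    return len(nonzero), consistent

# Best radix set reported at 24576K (4 passes).
print(describe(24576, [768, 16, 32, 32, 0, 0, 0, 0, 0, 0]))
# Fallback at 26624K without a radix-416 routine (5 passes).
print(describe(26624, [208, 16, 16, 16, 16, 0, 0, 0, 0, 0]))
# The combo that a radix-416 routine would allow (4 passes).
print(describe(26624, [416, 32, 32, 32]))
```

Under those assumptions, both 26624K combos cover the same FFT length, and the only structural difference is the fifth radix in the fallback.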

If you would be so kind, could you pause any running jobs (I believe 'kill -STOP [pid]' works on macOS the same as on Linux, then 'kill -CONT [pid]' to resume, and either 'pidof' or 'top' will give you the process ID), and re-run just the 26M-FFT timing? Here is how:
[CODE]./Mlucas -iters 1000 -cpu 0:7 -fftlen 26624 >& test.log[/CODE]
After that completes, paste the new last line that was appended to mlucas.cfg as a result, and please attach test.log. Thanks.
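For anyone who prefers to script the pause/resume step, here is a minimal Python sketch of the same idea as 'kill -STOP' and 'kill -CONT'. The helper name paused and its kill parameter are hypothetical, introduced only for this example; the process ID has to be looked up separately, and the benchmark command in the comment is the one given above.

```python
import os
import signal
from contextlib import contextmanager

@contextmanager
def paused(pid, kill=os.kill):
    """Suspend process `pid` for the duration of the with-block, then resume it.

    `kill` defaults to os.kill; it is a parameter only so the helper can be
    exercised without touching a real process.
    """
    kill(pid, signal.SIGSTOP)      # same effect as: kill -STOP pid
    try:
        yield
    finally:
        kill(pid, signal.SIGCONT)  # same effect as: kill -CONT pid

# Usage sketch (pid_of_running_job is a placeholder you must fill in):
#
#   import subprocess
#   with paused(pid_of_running_job):
#       with open("test.log", "w") as log:
#           subprocess.run(
#               ["./Mlucas", "-iters", "1000", "-cpu", "0:7", "-fftlen", "26624"],
#               stdout=log, stderr=subprocess.STDOUT)
```

The try/finally makes sure the paused job is resumed even if the benchmark run fails.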

Lorenzo 2021-02-18 22:12

1 Attachment(s)
[QUOTE=ewmayer;571948]... the only obvious anomaly is at the very end, 26624K FFT, the timing for that is anomalously large.[/QUOTE]
Actually, when I posted the results for the i3-8100 I cut the line with the timings for 26624. I had assumed the same thing: that it was an anomaly caused by heavy load from some background application while the benchmark was running.
So I just reran it on the i3-8100 (Oracle Linux 7) and I see the same: a big jump from ~69 to ~141 msec/iter exactly at 26624.
So I think it's not a platform-specific issue.
[CODE] 18432 msec/iter = 53.16 ROE[avg,max] = [0.236424995, 0.281250000] radices = 288 32 32 32 0 0 0 0 0 0
20480 msec/iter = 62.92 ROE[avg,max] = [0.237479031, 0.312500000] radices = 320 32 32 32 0 0 0 0 0 0
22528 msec/iter = 66.03 ROE[avg,max] = [0.228240432, 0.312500000] radices = 352 32 32 32 0 0 0 0 0 0
24576 msec/iter = 69.49 ROE[avg,max] = [0.261424145, 0.343750000] radices = 768 16 32 32 0 0 0 0 0 0
26624 msec/iter = 144.86 ROE[avg,max] = [0.272725339, 0.343750000] radices = 52 16 16 32 32 0 0 0 0 0
26624 msec/iter = 141.33 ROE[avg,max] = [0.272368315, 0.375000000] radices = 52 16 16 32 32 0 0 0 0 0
24576 msec/iter = 68.38 ROE[avg,max] = [0.261777142, 0.359375000] radices = 768 16 32 32 0 0 0 0 0 0
26624 msec/iter = 141.06 ROE[avg,max] = [0.272368315, 0.375000000] radices = 52 16 16 32 32 0 0 0 0 0[/CODE]

Lorenzo 2021-02-19 07:45

Full test for large fft on i3-8100:
[CODE] 8192 msec/iter = 19.61 ROE[avg,max] = [0.272732764, 0.375000000] radices = 256 32 32 16 0 0 0 0 0 0
9216 msec/iter = 23.07 ROE[avg,max] = [0.239072536, 0.312500000] radices = 288 16 32 32 0 0 0 0 0 0
10240 msec/iter = 27.33 ROE[avg,max] = [0.271287049, 0.375000000] radices = 320 32 32 16 0 0 0 0 0 0
11264 msec/iter = 28.73 ROE[avg,max] = [0.271818621, 0.375000000] radices = 352 32 32 16 0 0 0 0 0 0
12288 msec/iter = 32.18 ROE[avg,max] = [0.259570478, 0.312500000] radices = 768 16 16 32 0 0 0 0 0 0
13312 msec/iter = 36.87 ROE[avg,max] = [0.254703482, 0.312500000] radices = 208 32 32 32 0 0 0 0 0 0
14336 msec/iter = 39.92 ROE[avg,max] = [0.234003331, 0.296875000] radices = 224 32 32 32 0 0 0 0 0 0
15360 msec/iter = 42.65 ROE[avg,max] = [0.245504855, 0.312500000] radices = 960 16 16 32 0 0 0 0 0 0
16384 msec/iter = 44.85 ROE[avg,max] = [0.272600878, 0.375000000] radices = 256 32 32 32 0 0 0 0 0 0
18432 msec/iter = 52.67 ROE[avg,max] = [0.236424995, 0.281250000] radices = 288 32 32 32 0 0 0 0 0 0
20480 msec/iter = 61.48 ROE[avg,max] = [0.237479031, 0.312500000] radices = 320 32 32 32 0 0 0 0 0 0
22528 msec/iter = 65.70 ROE[avg,max] = [0.228240432, 0.312500000] radices = 352 32 32 32 0 0 0 0 0 0
24576 msec/iter = 68.40 ROE[avg,max] = [0.261424145, 0.343750000] radices = 768 16 32 32 0 0 0 0 0 0
26624 msec/iter = 141.14 ROE[avg,max] = [0.272725339, 0.343750000] radices = 52 16 16 32 32 0 0 0 0 0
28672 msec/iter = 106.92 ROE[avg,max] = [0.252042892, 0.312500000] radices = 224 16 16 16 16 0 0 0 0 0
30720 msec/iter = 114.56 ROE[avg,max] = [0.288327813, 0.375000000] radices = 240 16 16 16 16 0 0 0 0 0
32768 msec/iter = 101.20 ROE[avg,max] = [0.238132941, 0.312500000] radices = 1024 16 32 32 0 0 0 0 0 0
36864 msec/iter = 137.73 ROE[avg,max] = [0.265349020, 0.312500000] radices = 288 16 16 16 16 0 0 0 0 0
40960 msec/iter = 161.66 ROE[avg,max] = [0.251543120, 0.312500000] radices = 320 16 16 16 16 0 0 0 0 0
45056 msec/iter = 170.85 ROE[avg,max] = [0.244248223, 0.312500000] radices = 352 16 16 16 16 0 0 0 0 0
49152 msec/iter = 153.04 ROE[avg,max] = [0.255821747, 0.343750000] radices = 768 32 32 32 0 0 0 0 0 0
53248 msec/iter = 293.04 ROE[avg,max] = [0.262757669, 0.312500000] radices = 52 16 32 32 32 0 0 0 0 0
57344 msec/iter = 270.81 ROE[avg,max] = [0.265370288, 0.375000000] radices = 224 16 16 16 32 0 0 0 0 0
61440 msec/iter = 204.05 ROE[avg,max] = [0.246525841, 0.343750000] radices = 960 32 32 32 0 0 0 0 0 0[/CODE]
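One way to spot jumps like the one at 26624K in a table of this shape is to normalize each timing by the FFT length. The sketch below is an illustration only: it assumes bare rows in the format shown above (without the [CODE] tag in front), and the names parse_row and flag_jumps as well as the threshold of 1.5 are arbitrary choices for the example, not anything Mlucas provides.

```python
import re

# Matches rows such as:
#   26624 msec/iter = 141.14 ROE[avg,max] = [...] radices = 52 16 16 32 32 0 0 0 0 0
ROW = re.compile(r"^\s*(\d+)\s+msec/iter\s*=\s*([\d.]+).*?radices\s*=\s*([\d\s]+)$")

def parse_row(line):
    """Return (fft_len_k, msec_per_iter, nonzero_radices), or None if the line does not match."""
    m = ROW.match(line)
    if m is None:
        return None
    radices = [int(r) for r in m.group(3).split() if int(r) != 0]
    return int(m.group(1)), float(m.group(2)), radices

def flag_jumps(lines, factor=1.5):
    """Return FFT lengths whose time per K exceeds `factor` times that of the previous row."""
    flagged = []
    prev_cost = None
    for line in lines:
        row = parse_row(line)
        if row is None:
            continue
        fft_k, msec, _ = row
        cost = msec / fft_k  # msec per iteration per K of FFT length
        if prev_cost is not None and cost > factor * prev_cost:
            flagged.append(fft_k)
        prev_cost = cost
    return flagged

rows = [
    "22528 msec/iter = 65.70 ROE[avg,max] = [0.228240432, 0.312500000] radices = 352 32 32 32 0 0 0 0 0 0",
    "24576 msec/iter = 68.40 ROE[avg,max] = [0.261424145, 0.343750000] radices = 768 16 32 32 0 0 0 0 0 0",
    "26624 msec/iter = 141.14 ROE[avg,max] = [0.272725339, 0.343750000] radices = 52 16 16 32 32 0 0 0 0 0",
]
print(flag_jumps(rows))
```

Comparing each row only with the row before it is a deliberately crude choice; it flags the first slow length in a run, not every slow one.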

LaurV 2021-02-19 09:20

1 Attachment(s)
[QUOTE=Lorenzo;571883]Sorry, but I don't have a wall-plug wattmeter.[/QUOTE]
Whaaaaattttt? :shock:
You must buy one, try Aliexpress, [URL="https://www.aliexpress.com/item/1005001462804114.html"]here[/URL], you can even smart-measure some parameters of your "wife" with it! (whatever that means :razz:)
[ATTACH]24350[/ATTACH]
(photo for posterity, in case they change it; to be clear, this is a joke, I do not promote nor endorse that product, but "one click operation for wife" I would buy any time!).

ewmayer 2021-02-19 19:53

[QUOTE=Lorenzo;571993]Full test for large fft on i3-8100:
[snip][/QUOTE]

Thanks - that is very helpful as far as future roadmapping goes. So, for selected FFT lengths:
o 26M: r0 = 208 needs to be made more accurate (rejected in your test.log due to excess ROE), also need r0 = 416;
o 28,30M: Need r0 = 448,480;
o 36,40,44,52,56M: Need r0 = 576,640,704,832,896.
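The leading radices in the list above can be reproduced with a little arithmetic. The sketch below assumes the remaining radices are 32,32,32 (one of the two favored combo shapes mentioned earlier in the thread) and that the radix product must equal half the real FFT length; the function name leading_radix is hypothetical.

```python
def leading_radix(fft_len_k, trailing=(32, 32, 32)):
    """Return r0 with r0 * prod(trailing) == fft_len_k * 1024 / 2, or None if no integer r0 exists.

    Assumption: radix product == half the real FFT length, as inferred from
    the timing rows posted in this thread.
    """
    complex_len = fft_len_k * 1024 // 2
    tail = 1
    for r in trailing:
        tail *= r
    r0, rem = divmod(complex_len, tail)
    return r0 if rem == 0 else None

# FFT lengths (in K) for the nominal 26, 28, 30, 36, 40, 44, 52 and 56M entries.
for fft_k in (26624, 28672, 30720, 36864, 40960, 45056, 53248, 57344):
    print(fft_k, leading_radix(fft_k))
```

With three trailing radix-32 passes this reduces to r0 = (FFT length in K) / 64, which matches the r0 values listed in the post.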

bayanne 2021-05-10 13:12

Apple M1 on iMac
 
My new iMac will be arriving shortly, and I hope to compile and run Mlucas v19.1 on it.

However, where do I find simple instructions for compiling on an ARM CPU?

Uncwilly 2021-05-10 14:03

[QUOTE=LaurV;571998]Whaaaaattttt? :shock:
You must buy one, try Aliexpress,[/QUOTE]
Aliexpress wants an image of the front of my government ID. That is a hard nope. If it was located in my country that would be a nope. For a company behind the great firewall it is NOPE[SIZE="4"][SUP]∞[/SUP][/SIZE]


All times are UTC.
