mersenneforum.org  

Old 2021-02-17, 23:02   #12
ewmayer
2ω=0
 
 
Sep 2002
República de California

3·53·31 Posts

@Lorenzo:

Thanks for all the timings, that is very useful. I will add a note recommending '-cpu 0:7' for M1 users to the README. Might you have a wall-plug wattmeter you can use to compare (under-load - idle) wattages for those 2 systems, for whatever FFT lengths they are using to run their current GIMPS assignments? I'd be curious to get some idea regarding relative performance-per-watt.

In any event, happy crunching!
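
The performance-per-watt comparison asked for above can be sketched as follows. This is only an illustration, assuming "performance" is taken as iterations per second at one fixed FFT length and "watts" as the under-load reading minus the idle reading; the function name and the numbers in the call are placeholders, not measurements from either system.

```python
def iters_per_joule(msec_per_iter, load_watts, idle_watts):
    """Iterations completed per joule of net (under-load minus idle) energy."""
    net_watts = load_watts - idle_watts
    if net_watts <= 0:
        raise ValueError("under-load wattage must exceed idle wattage")
    iters_per_sec = 1000.0 / msec_per_iter
    return iters_per_sec / net_watts

# Placeholder inputs only (hypothetical, not measured on any machine):
print(iters_per_joule(msec_per_iter=10.0, load_watts=30.0, idle_watts=10.0))
```

The figure is only comparable between two machines when both timings were taken at the same FFT length.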
Old 2021-02-18, 09:36   #13
Lorenzo
 
 
Aug 2010
Republic of Belarus

2·89 Posts

Quote:
Originally Posted by ewmayer View Post
@Lorenzo:

Thanks for all the timings, that is very useful. I will add a note recommending '-cpu 0:7' for M1 users to the README. Might you have a wall-plug wattmeter you can use to compare (under-load - idle) wattages for those 2 systems, for whatever FFT lengths they are using to run their current GIMPS assignments? I'd be curious to get some idea regarding relative performance-per-watt.

In any event, happy crunching!

Sorry, but I don't have the wall-plug wattmeter.
Old 2021-02-18, 14:32   #14
ldesnogu
 
 
Jan 2008
France

3×181 Posts

Quote:
Originally Posted by Lorenzo View Post
I just want to share my experience with Apple M1 CPU.
Thanks for the results!


What machine is that? Mini, MBA or MBP? I'd expect an MBA to throttle, given the noise my MBP makes when running 4 threads.
Old 2021-02-18, 14:46   #15
Lorenzo
 
 
Aug 2010
Republic of Belarus

178₁₀ Posts

Quote:
Originally Posted by ldesnogu View Post
Thanks for the results!


What machine is that? Mini, MBA or MBP? I'd expect an MBA to throttle, given the noise my MBP makes when running 4 threads.
Hi. This is an Apple Mac mini M1.
Old 2021-02-18, 21:08   #16
ewmayer
2ω=0
 
 
Sep 2002
República de California

3·53·31 Posts

Quote:
Originally Posted by Lorenzo View Post
Hi. This is an Apple Mac mini M1.
Some pics via Amazon.com here. The pic of the rear side shows an exhaust vent similar to those on my Intel NUCs - Lorenzo, are there intake vents on the bottom?

Had a closer look at some of your -cpu 0:7 timings ... the only obvious anomaly is at the very end, 26624K FFT, the timing for that is anomalously large. This is mainly for down-the-road as this FFT is way beyond the GIMPS wavefront, but looking at the pattern of best-timing FFT radices for the rows above it, this machine seems to really like larger leading FFT radices (call the leftmost radix r0) and combos of the form r0,16,32,32 and r0,32,32,32. At this 26M FFT length, there is no such available combo because I did not (yet) implement a radix-416 FFT-pass routine, thus instead of 416,32,32,32 the best we can do is 208,16,16,16,16, which means an extra pass through the data each iteration.

If you would be so kind, could you pause any running jobs (I believe 'kill -STOP [pid]' works on MacOS the same as on Linux, then 'kill -CONT [pid]' to resume, and either 'pidof' or 'top' will give you the process ID) and re-run just the 26M-FFT timing? Here is how:

./Mlucas -iters 1000 -cpu 0:7 -fftlen 26624 >& test.log

After that completes, paste the new last line that got appended to mlucas.cfg as a result, and please attach the test.log. Thanks.
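
The radix arithmetic behind the 26M case can be checked with a short sketch. It assumes that the radices of a combo multiply to half the FFT length, where the FFT length is given in units of 1024 doubles; that relation is inferred from the timing lines posted in this thread rather than from the Mlucas source, and the helper name is hypothetical.

```python
from math import prod

def radix_product_ok(fft_k, radices):
    """True if the radices multiply to half the FFT length (fft_k in units of 1024 doubles)."""
    return prod(radices) == fft_k * 1024 // 2

FFT_K = 26624
for combo in ([416, 32, 32, 32], [208, 16, 16, 16, 16]):
    # len(combo) is the number of passes through the data per iteration
    print(combo, radix_product_ok(FFT_K, combo), "passes:", len(combo))
```

Both combos satisfy the product relation; the second one simply needs five passes instead of four.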

Last fiddled with by ewmayer on 2021-02-18 at 21:09
Old 2021-02-18, 22:12   #17
Lorenzo
 
 
Aug 2010
Republic of Belarus

2×89 Posts

Quote:
Originally Posted by ewmayer View Post
... the only obvious anomaly is at the very end, 26624K FFT, the timing for that is anomalously large.
Actually, when I posted the results for the i3-8100 I cut the line with the timing for 26624. I thought the same: that it was caused by heavy load from some background application while I ran the benchmark.
So I just reran it on the i3-8100 (OS: Oracle Linux 7) and I see the same thing: a big jump from ~69 to ~141 msec exactly at 26624.
So I think it is not a platform-specific issue.
Code:
     18432  msec/iter =   53.16  ROE[avg,max] = [0.236424995, 0.281250000]  radices = 288 32 32 32  0  0  0  0  0  0
     20480  msec/iter =   62.92  ROE[avg,max] = [0.237479031, 0.312500000]  radices = 320 32 32 32  0  0  0  0  0  0
     22528  msec/iter =   66.03  ROE[avg,max] = [0.228240432, 0.312500000]  radices = 352 32 32 32  0  0  0  0  0  0
     24576  msec/iter =   69.49  ROE[avg,max] = [0.261424145, 0.343750000]  radices = 768 16 32 32  0  0  0  0  0  0
     26624  msec/iter =  144.86  ROE[avg,max] = [0.272725339, 0.343750000]  radices =  52 16 16 32 32  0  0  0  0  0
     26624  msec/iter =  141.33  ROE[avg,max] = [0.272368315, 0.375000000]  radices =  52 16 16 32 32  0  0  0  0  0
     24576  msec/iter =   68.38  ROE[avg,max] = [0.261777142, 0.359375000]  radices = 768 16 32 32  0  0  0  0  0  0
     26624  msec/iter =  141.06  ROE[avg,max] = [0.272368315, 0.375000000]  radices =  52 16 16 32 32  0  0  0  0  0
Attached Files
File Type: log test.log (3.1 KB, 20 views)
Old 2021-02-19, 07:45   #18
Lorenzo
 
 
Aug 2010
Republic of Belarus

10110010₂ Posts

Full test for large fft on i3-8100:
Code:
      8192  msec/iter =   19.61  ROE[avg,max] = [0.272732764, 0.375000000]  radices = 256 32 32 16  0  0  0  0  0  0
      9216  msec/iter =   23.07  ROE[avg,max] = [0.239072536, 0.312500000]  radices = 288 16 32 32  0  0  0  0  0  0
     10240  msec/iter =   27.33  ROE[avg,max] = [0.271287049, 0.375000000]  radices = 320 32 32 16  0  0  0  0  0  0
     11264  msec/iter =   28.73  ROE[avg,max] = [0.271818621, 0.375000000]  radices = 352 32 32 16  0  0  0  0  0  0
     12288  msec/iter =   32.18  ROE[avg,max] = [0.259570478, 0.312500000]  radices = 768 16 16 32  0  0  0  0  0  0
     13312  msec/iter =   36.87  ROE[avg,max] = [0.254703482, 0.312500000]  radices = 208 32 32 32  0  0  0  0  0  0
     14336  msec/iter =   39.92  ROE[avg,max] = [0.234003331, 0.296875000]  radices = 224 32 32 32  0  0  0  0  0  0
     15360  msec/iter =   42.65  ROE[avg,max] = [0.245504855, 0.312500000]  radices = 960 16 16 32  0  0  0  0  0  0
     16384  msec/iter =   44.85  ROE[avg,max] = [0.272600878, 0.375000000]  radices = 256 32 32 32  0  0  0  0  0  0
     18432  msec/iter =   52.67  ROE[avg,max] = [0.236424995, 0.281250000]  radices = 288 32 32 32  0  0  0  0  0  0
     20480  msec/iter =   61.48  ROE[avg,max] = [0.237479031, 0.312500000]  radices = 320 32 32 32  0  0  0  0  0  0
     22528  msec/iter =   65.70  ROE[avg,max] = [0.228240432, 0.312500000]  radices = 352 32 32 32  0  0  0  0  0  0
     24576  msec/iter =   68.40  ROE[avg,max] = [0.261424145, 0.343750000]  radices = 768 16 32 32  0  0  0  0  0  0
     26624  msec/iter =  141.14  ROE[avg,max] = [0.272725339, 0.343750000]  radices =  52 16 16 32 32  0  0  0  0  0
     28672  msec/iter =  106.92  ROE[avg,max] = [0.252042892, 0.312500000]  radices = 224 16 16 16 16  0  0  0  0  0
     30720  msec/iter =  114.56  ROE[avg,max] = [0.288327813, 0.375000000]  radices = 240 16 16 16 16  0  0  0  0  0
     32768  msec/iter =  101.20  ROE[avg,max] = [0.238132941, 0.312500000]  radices = 1024 16 32 32  0  0  0  0  0  0
     36864  msec/iter =  137.73  ROE[avg,max] = [0.265349020, 0.312500000]  radices = 288 16 16 16 16  0  0  0  0  0
     40960  msec/iter =  161.66  ROE[avg,max] = [0.251543120, 0.312500000]  radices = 320 16 16 16 16  0  0  0  0  0
     45056  msec/iter =  170.85  ROE[avg,max] = [0.244248223, 0.312500000]  radices = 352 16 16 16 16  0  0  0  0  0
     49152  msec/iter =  153.04  ROE[avg,max] = [0.255821747, 0.343750000]  radices = 768 32 32 32  0  0  0  0  0  0
     53248  msec/iter =  293.04  ROE[avg,max] = [0.262757669, 0.312500000]  radices =  52 16 32 32 32  0  0  0  0  0
     57344  msec/iter =  270.81  ROE[avg,max] = [0.265370288, 0.375000000]  radices = 224 16 16 16 32  0  0  0  0  0
     61440  msec/iter =  204.05  ROE[avg,max] = [0.246525841, 0.343750000]  radices = 960 32 32 32  0  0  0  0  0  0
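
To spot rows like the 26624K one without reading the table by eye, the timing lines can be parsed and normalized by FFT length. The sketch below assumes the line layout shown in the listings in this thread; the regular expression and function name are illustrative and are not part of Mlucas.

```python
import re

# Matches lines of the form shown above: FFT length (in K), msec/iter, ROE pair, radices.
LINE_RE = re.compile(
    r"^\s*(\d+)\s+msec/iter\s*=\s*([\d.]+)\s+ROE.*radices\s*=\s*([\d\s]+)$"
)

def parse_cfg_line(line):
    """Return (fft_k, msec_per_iter, nonzero_radices), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    radices = [int(r) for r in m.group(3).split() if int(r) != 0]
    return int(m.group(1)), float(m.group(2)), radices

sample = """\
     24576  msec/iter =   68.40  ROE[avg,max] = [0.261424145, 0.343750000]  radices = 768 16 32 32  0  0  0  0  0  0
     26624  msec/iter =  141.14  ROE[avg,max] = [0.272725339, 0.343750000]  radices =  52 16 16 32 32  0  0  0  0  0
"""
for line in sample.splitlines():
    parsed = parse_cfg_line(line)
    if parsed:
        fft_k, msec, radices = parsed
        # microseconds per K of FFT length, plus the number of passes through the data
        print(fft_k, "passes:", len(radices), "usec/K:", round(1000 * msec / fft_k, 2))
```

In the full table above, the rows that stand out on this per-K measure are the ones that use five radices, which fits the extra-pass explanation given earlier in the thread.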
Old 2021-02-19, 09:20   #19
LaurV
Romulan Interpreter
 
 
Jun 2011
Thailand

10010010011001₂ Posts

Quote:
Originally Posted by Lorenzo View Post
Sorry, but I don't have the wall-plug wattmeter.
Whaaaaattttt?
You must buy one, try Aliexpress, here, you can even smart-measure some parameters of your "wife" with it! (whatever that means)
[Attached image: wife  smart meter.JPG]
(photo for posterity, in case they change it; to be clear, this is a joke, I do not promote nor endorse that product, but "one click operation for wife" I would buy any time!).

Last fiddled with by LaurV on 2021-02-19 at 09:22
Old 2021-02-19, 19:53   #20
ewmayer
2ω=0
 
 
Sep 2002
República de California

3×53×31 Posts

Quote:
Originally Posted by Lorenzo View Post
Full test for large fft on i3-8100:
[snip]
Thanks - that is very helpful as far as future roadmapping goes - so for a selection of the FFT lengths:
o 26M: r0 = 208 needs to be made more accurate (rejected in your test.log due to excess ROE), also need r0 = 416;
o 28,30M: Need r0 = 448,480;
o 36,40,44,52,56M: Need r0 = 576,640,704,832,896.
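
The leading radices listed above follow from simple arithmetic, sketched here under two assumptions: the radices multiply to half the FFT length (in units of 1024 doubles), as in the timing lines posted in this thread, and the target combo has the four-pass form r0,32,32,32. The function name is hypothetical.

```python
def leading_radix_needed(fft_k, tail=(32, 32, 32)):
    """Leading radix r0 such that r0 times the product of tail equals half the FFT length."""
    half_len = fft_k * 1024 // 2
    tail_prod = 1
    for r in tail:
        tail_prod *= r
    if half_len % tail_prod:
        raise ValueError("no integer leading radix for this FFT length and tail")
    return half_len // tail_prod

for fft_k in (26624, 28672, 30720, 36864, 40960, 45056, 53248, 57344):
    print(fft_k, leading_radix_needed(fft_k))
```

With the default tail this reduces to r0 = fft_k / 64, which reproduces the values 416, 448, 480, 576, 640, 704, 832 and 896 named in the list.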

All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.