2022-06-29, 16:35   #15
timbit
 

OK, I managed to watch the autobench run last night (1 worker, 4 cores). The fastest FFT implementation was not selected.


I then decided to go back to basics: 1 worker, 1 core, that's it. I deleted the existing gwnum.txt and results.bench.txt.


I ran ./mprime -m and chose item 17 (benchmark): 48K FFT, 1 worker, 1 core.


Attached are the resulting results.bench.txt and gwnum.txt. When I then started ECM on a 999xxx exponent with B1=1000000, I could see that a non-optimal FFT was chosen.


How can I get mprime to choose the optimal FFT? Is there a setting in prime.txt or local.txt that manually selects an FFT implementation? I'm now running 1 worker on 1 core, so no excuses. Nothing else is running on the machine either (Ubuntu 22.04 x64, Intel Xeon E5-2680 v4).


From results.bench.txt (the fastest is the last entry: Pass1=768, Pass2=64, clm=1):


Prime95 64-bit version 30.7, RdtscTiming=1
FFTlen=48K, Type=3, Arch=4, Pass1=256, Pass2=192, clm=4 (1 core, 1 worker): 0.47 ms. Throughput: 2138.80 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=256, Pass2=192, clm=2 (1 core, 1 worker): 0.45 ms. Throughput: 2238.99 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=256, Pass2=192, clm=1 (1 core, 1 worker): 0.26 ms. Throughput: 3908.64 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=768, Pass2=64, clm=4 (1 core, 1 worker): 0.21 ms. Throughput: 4709.43 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=768, Pass2=64, clm=2 (1 core, 1 worker): 0.26 ms. Throughput: 3907.09 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=768, Pass2=64, clm=1 (1 core, 1 worker): 0.21 ms. Throughput: 4874.29 iter/sec.
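To make it easier to spot the fastest entry without eyeballing the numbers, here is a small Python sketch that parses the lines above and picks the one with the highest throughput. The regular expression is only inferred from the six lines shown in this post, not from any documented file format, and the function names (parse_bench, fastest) are made up for illustration.

```python
import re

# Benchmark lines copied verbatim from the results.bench.txt excerpt in this post.
BENCH = """\
FFTlen=48K, Type=3, Arch=4, Pass1=256, Pass2=192, clm=4 (1 core, 1 worker): 0.47 ms. Throughput: 2138.80 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=256, Pass2=192, clm=2 (1 core, 1 worker): 0.45 ms. Throughput: 2238.99 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=256, Pass2=192, clm=1 (1 core, 1 worker): 0.26 ms. Throughput: 3908.64 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=768, Pass2=64, clm=4 (1 core, 1 worker): 0.21 ms. Throughput: 4709.43 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=768, Pass2=64, clm=2 (1 core, 1 worker): 0.26 ms. Throughput: 3907.09 iter/sec.
FFTlen=48K, Type=3, Arch=4, Pass1=768, Pass2=64, clm=1 (1 core, 1 worker): 0.21 ms. Throughput: 4874.29 iter/sec.
"""

# Assumption: every benchmark line follows the layout seen above.
LINE_RE = re.compile(
    r"FFTlen=(?P<fftlen>\w+), Type=(?P<type>\d+), Arch=(?P<arch>\d+), "
    r"Pass1=(?P<pass1>\d+), Pass2=(?P<pass2>\d+), clm=(?P<clm>\d+) "
    r"\(.*?\): (?P<ms>[\d.]+) ms\. Throughput: (?P<ips>[\d.]+) iter/sec\."
)


def parse_bench(text):
    """Return one dict per benchmark line that matches LINE_RE."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            rows.append({
                "fftlen": m["fftlen"],
                "pass1": int(m["pass1"]),
                "pass2": int(m["pass2"]),
                "clm": int(m["clm"]),
                "ms": float(m["ms"]),
                "ips": float(m["ips"]),
            })
    return rows


def fastest(rows):
    """Pick the row with the highest throughput (iterations per second)."""
    return max(rows, key=lambda r: r["ips"])


rows = parse_bench(BENCH)
best = fastest(rows)
print(f"fastest: Pass1={best['pass1']}, Pass2={best['pass2']}, "
      f"clm={best['clm']} at {best['ips']:.2f} iter/sec")
```

Throughput is used for the ranking rather than the ms column because the timings are rounded to two decimals, so the clm=4 and clm=1 entries for Pass1=768 both show 0.21 ms.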


When I start ./mprime -d, I see:


[Main thread Jun 29 09:26] Mersenne number primality test program version 30.7
[Main thread Jun 29 09:26] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 14x256 KB, L3 cache size: 35 MB
[Main thread Jun 29 09:26] Starting worker.
[Work thread Jun 29 09:26] Worker starting
[Work thread Jun 29 09:26] Setting affinity to run worker on CPU core #2
[Work thread Jun 29 09:26]
[Work thread Jun 29 09:26] Using FMA3 FFT length 48K, Pass1=768, Pass2=64, clm=2
[Work thread Jun 29 09:26] 0.052 bits-per-word below FFT limit (more than 0.509 allows extra optimizations)
[Work thread Jun 29 09:26] ECM on M999217: curve #1 with s=7014263894342847, B1=1000000, B2=TBD



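So mprime picked Pass1=768, Pass2=64, clm=2 (3907.09 iter/sec in the benchmark) instead of clm=1 (4874.29 iter/sec). A non-optimal FFT was chosen. It's truly bizarre.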
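To put a number on the gap, here is a rough Python sketch that pulls the selected Pass1/Pass2/clm out of the "Using FMA3 FFT length" log line and compares it with the benchmark figures quoted earlier in this post. It assumes the single benchmark run is representative and that the benchmarked implementations are the same ones the ECM run can choose from; neither is confirmed here. The names (THROUGHPUT, chosen_config) are made up for illustration.

```python
import re

# Throughput in iter/sec keyed by (Pass1, Pass2, clm),
# copied from the results.bench.txt lines in this post.
THROUGHPUT = {
    (256, 192, 4): 2138.80,
    (256, 192, 2): 2238.99,
    (256, 192, 1): 3908.64,
    (768, 64, 4): 4709.43,
    (768, 64, 2): 3907.09,
    (768, 64, 1): 4874.29,
}

# Log line copied from the ./mprime -d output in this post.
LOG_LINE = ("[Work thread Jun 29 09:26] Using FMA3 FFT length 48K, "
            "Pass1=768, Pass2=64, clm=2")


def chosen_config(log_line):
    """Extract (Pass1, Pass2, clm) from an FFT selection log line."""
    m = re.search(r"Pass1=(\d+), Pass2=(\d+), clm=(\d+)", log_line)
    if m is None:
        raise ValueError("no FFT selection found in log line")
    return tuple(int(g) for g in m.groups())


chosen = chosen_config(LOG_LINE)
best = max(THROUGHPUT, key=THROUGHPUT.get)

# Relative throughput loss of the chosen implementation versus the fastest one.
loss_pct = 100.0 * (1.0 - THROUGHPUT[chosen] / THROUGHPUT[best])

print(f"chosen {chosen}: {THROUGHPUT[chosen]:.2f} iter/sec")
print(f"best   {best}: {THROUGHPUT[best]:.2f} iter/sec")
print(f"throughput loss: {loss_pct:.1f}%")
```

On these figures the selected implementation comes out roughly a fifth slower than the fastest benchmarked one, which is well outside what rounding in the ms column could explain.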


Any thoughts on what the root cause might be?
Attached Files
File Type: txt results.bench.txt (5.1 KB, 34 views)
File Type: txt gwnum.txt (375 Bytes, 32 views)