#12 |
|
May 2011
Orange Park, FL
3×5×59 Posts |
This is an excellent improvement. I am running a PRP first test with an AVX-512 FFT length 4608K and the speed is about 20% faster.
The test was only about 3% complete, so I just shut down Prime95, copied in the new executable (Windows) and restarted. It resumed from the savefile with no difficulty. |
|
|
|
|
|
#13 |
|
Feb 2016
UK
3×5×29 Posts |
Doing testing on 7800X now... getting some scary temperatures if HT used, >100C on a delidded CPU, with TIM (can't use liquid metal as testing extreme cooling other times), and watercooling loop. Temps more sane if HT disabled. CPU is at stock settings apart from cache at 3000 (2000 stock, didn't touch voltage).
|
|
|
|
|
|
#14 |
|
Sep 2003
5×11×47 Posts |
The benchmarks I did (before they crashed) indicated that on Skylake AVX-512, hyperthreading gives lower throughput with v 29.5, as opposed to v 29.4 where the opposite was true.
|
|
|
|
|
|
#15 |
|
Sep 2003
101000011001₂ Posts |
Since it's a new release, has any further thought been given to enabling an option to output 2048-bit residues?
|
|
|
|
|
|
#16 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
1110101101111₂ Posts |
|
|
|
|
|
|
#17 |
|
Feb 2016
UK
3×5×29 Posts |
On the 7800X, running 1 core per worker, I saw a ballpark 70% increase in throughput for small FFTs up to around 256K; by around 512K FFT it drops into the RAM-limited zone. I can't be 100% sure I didn't change my RAM since the older test I'm comparing against, but there's still a boost of roughly 10% there.
It's harder to say what's happening with 1 worker on 6 cores, but there seems to be a clear gain between 1024K and 2560K FFT, and it's less clear either side of that. I did a spot check of power usage at 64K FFT with 6 threads in the in-place stress test. Estimated CPU power usage went up 25%, or system power up 14%; for a 60%+ throughput increase that sounds like a good deal to me, assuming this still applies to other relatively small FFT sizes and I can keep temperatures in check. I look forward to this getting rolled into LLR. |
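To put the power spot check above into numbers, here is a minimal sketch of the throughput-per-watt arithmetic. It assumes the 60% throughput gain and the 25% (CPU) and 14% (system) power increases from the post are all measured against the same baseline; the function name is made up for illustration.

```python
def efficiency_gain(throughput_increase, power_increase):
    """Relative change in throughput per watt.

    Both arguments are fractional increases over the same baseline,
    e.g. 0.60 for +60% throughput and 0.25 for +25% power.
    """
    return (1.0 + throughput_increase) / (1.0 + power_increase) - 1.0


# Figures quoted in the post: +60% throughput, +25% CPU power, +14% system power.
cpu_gain = efficiency_gain(0.60, 0.25)
system_gain = efficiency_gain(0.60, 0.14)

print(f"CPU-only efficiency gain:   {cpu_gain:.1%}")
print(f"Whole-system efficiency gain: {system_gain:.1%}")
```

Under those assumptions the work done per watt improves by roughly 28% at the CPU and roughly 40% at the wall, which is why the trade looks favourable.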
|
|
|
|
|
#18 |
|
Nov 2012
23 Posts |
I ran the benchmark with all-complex FFTs on Windows 10 on a 7980XE at stock settings.
It shows a 50%+ improvement on multithreaded work (i.e. within L3 cache). However, it hung during the "10 cores hyperthreaded, 4 workers" test, with CPU usage staying at 0% in Task Manager.

[Main thread Oct 22 09:50] Mersenne number primality test program version 29.5
[Main thread Oct 22 09:50] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 256 KB, L3 cache size: 25344 KB
...
[Worker #1 Oct 22 10:16] Timing 2048K all-complex FFT, 10 cores, 7 workers. Average times: 11.57, 11.75, 11.74, 11.51, 5.81, 4.57, 4.84 ms. Total throughput: 941.13 iter/sec.
[Worker #1 Oct 22 10:16] Timing 2048K all-complex FFT, 10 cores, 8 workers. Average times: 11.72, 10.76, 11.86, 11.78, 11.78, 11.88, 5.29, 4.44 ms. Total throughput: 930.87 iter/sec.
[Worker #1 Oct 22 10:16] Timing 2048K all-complex FFT, 10 cores, 9 workers. Average times: 11.99, 10.89, 11.80, 12.06, 11.99, 12.02, 11.81, 10.91, 4.55 ms. Total throughput: 905.85 iter/sec.
[Worker #1 Oct 22 10:17] Timing 2048K all-complex FFT, 10 cores, 10 workers. Average times: 12.50, 12.32, 11.13, 12.25, 12.27, 12.23, 12.51, 12.22, 12.23, 11.15 ms. Total throughput: 829.16 iter/se
[Worker #1 Oct 22 10:17] Timing 2048K all-complex FFT, 10 cores hyperthreaded, 1 worker. Average times: 0.78 ms. Total throughput: 1289.81 iter/sec.
[Worker #1 Oct 22 10:17] Timing 2048K all-complex FFT, 10 cores hyperthreaded, 2 workers. Average times: 1.43, 1.43 ms. Total throughput: 1401.32 iter/sec.
[Worker #1 Oct 22 10:17] Timing 2048K all-complex FFT, 10 cores hyperthreaded, 3 workers. Average times: 3.47, 3.25, 2.10 ms. Total throughput: 1071.34 iter/sec.
[Worker #1 Oct 22 10:18] Timing 2048K all-complex FFT, 10 cores hyperthreaded, 4 workers.
[Main thread Oct 22 10:29] Stopping all worker threads.

I stopped it manually, and because I could not exit the program, I killed prime95 using Task Manager. The hang is reproducible, but it occurs at different points. |
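For anyone comparing benchmark runs like the one above, here is a small sketch that pulls the per-worker average times out of a "Timing ..." line and recomputes the total. It assumes the reported throughput is simply the sum of 1000 / (average ms) over the workers; that is an inference from the numbers in the log, not a documented Prime95 formula, and the regular expression and function name are illustrative only.

```python
import re

# Matches the "Average times: ... ms. Total throughput: N" part of a benchmark line.
TIMING_RE = re.compile(
    r"Average times:\s*([\d.,\s]+?)\s*ms\.\s*Total throughput:\s*([\d.]+)"
)


def recompute_throughput(log_line):
    """Return (reported, recomputed) iterations/sec, or None if the line has no timings."""
    match = TIMING_RE.search(log_line)
    if match is None:
        return None
    times_ms = [float(t) for t in match.group(1).split(",")]
    reported = float(match.group(2))
    # Assumption: each worker contributes 1000 / avg_ms iterations per second.
    recomputed = sum(1000.0 / t for t in times_ms)
    return reported, recomputed


line = (
    "[Worker #1 Oct 22 10:17] Timing 2048K all-complex FFT, 10 cores, 10 workers. "
    "Average times: 12.50, 12.32, 11.13, 12.25, 12.27, 12.23, 12.51, 12.22, 12.23, 11.15 ms. "
    "Total throughput: 829.16 iter/se"
)
reported, recomputed = recompute_throughput(line)
print(f"reported={reported:.2f} recomputed={recomputed:.2f}")
```

The recomputed value lands within rounding of the reported one for the 10-worker line. Lines with very short times (such as the 0.78 ms single-worker case) will differ more because the averages are printed to only two decimals.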
|
|
|
|
|
#19 |
|
Einyen
Dec 2003
Denmark
6126₈ Posts |
I resumed an ongoing PRP DC assignment on EC2 with 29.5b2 and later b3. There might be an issue with FFT selection.
I did not check at first whether worktodo.txt had FFT2=4M in the line, but I doubt it, since 29.5b2 started out at a 4200K AVX512 FFT. It quickly switched to a 4M FFT, which seems to be too small because it got a lot of ROUNDOFF > 0.4 errors. I then noticed that the worktodo line had FFT2=4M in it and removed it, but it still chose the 4M FFT when I restarted. I stopped again and added FFT2=4200K to the line instead, so it will finish the exponent at 4200K. I have SoftCrossoverAdjust=-0.004 in prime.txt, but it never tested the FFT size, probably because it was already halfway through. 295b3.txt |
|
|
|
|
|
#20 |
|
Einyen
Dec 2003
Denmark
2×1,579 Posts |
In the throughput benchmark, when you answer No to "Limit FFT sizes (mimic older benchmarking code) (N):", it still skips the 4096K (4M) and 8192K (8M) FFTs, even though the program clearly uses a 4M AVX512 FFT (see previous post).
If you answer Yes to "Limit FFT sizes (mimic older benchmarking code) (N):", it tests only 1 FFT size and then stops, even though a large range was chosen. |
|
|
|
|
|
#21 |
|
Sep 2003
101000011001₂ Posts |
I am attaching a worktodo.txt file suitable for PRP cofactor testing of the known fully-factored Mersenne exponents.
With 29.5 there are problems with very small exponents. For the tiniest exponents the program halts with this kind of error message:
Code:
[Tue Oct 23 17:16:24 2018] PRP cannot initialize FFT code for M11, errcode=1002
Number sent to gwsetup is too large for the FFTs to handle.
[Tue Oct 23 17:17:03 2018] PRP cannot initialize FFT code for M101, errcode=1002
Number sent to gwsetup is too large for the FFTs to handle.
[Tue Oct 23 17:17:25 2018] PRP cannot initialize FFT code for M503, errcode=1002
Number sent to gwsetup is too large for the FFTs to handle.
[Tue Oct 23 17:18:11 2018] PRP cannot initialize FFT code for M1009, errcode=1002
Number sent to gwsetup is too large for the FFTs to handle.

Code:
[Tue Oct 23 17:18:45 2018] ERROR: Comparing Gerbicz checksum values failed. Rolling back to iteration 0.
Continuing from last save file.
ERROR: Comparing Gerbicz checksum values failed. Rolling back to iteration 0.
Continuing from last save file.
ERROR: Comparing Gerbicz checksum values failed. Rolling back to iteration 0.
Continuing from last save file.
...

For exponent 3041 it worked, but it had "finite loop" issues at the start:
Code:
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3041/24329/5565031 using AVX-512 FFT length 1K
[Work thread Oct 23 17:20] ERROR: Comparing Gerbicz checksum values failed. Rolling back to iteration 0.
[Work thread Oct 23 17:20] Continuing from last save file.
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3041/24329/5565031 using AVX-512 FFT length 1K
[Work thread Oct 23 17:20] ERROR: Comparing Gerbicz checksum values failed. Rolling back to iteration 0.
[Work thread Oct 23 17:20] Continuing from last save file.
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3041/24329/5565031 using AVX-512 FFT length 1K
[Work thread Oct 23 17:20] ERROR: Comparing Gerbicz checksum values failed. Rolling back to iteration 0.
[Work thread Oct 23 17:20] Continuing from last save file.
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3041/24329/5565031 using AVX-512 FFT length 1K
[Work thread Oct 23 17:20] ERROR: Comparing Gerbicz checksum values failed. Rolling back to iteration 0.
[Work thread Oct 23 17:20] Continuing from last save file.
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3041/24329/5565031 using AVX-512 FFT length 1K
[Work thread Oct 23 17:20] Gerbicz error check passed at iteration 3025.
[Work thread Oct 23 17:20] M3041/24329/5565031 is a probable prime! Wh8: 17C217C2,2135,00400000
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3079/25324846649810648887383180721 using AVX-512 FFT length 1K

Similar behavior is seen for Wagstaff PRP testing. We do occasionally see new factors for exponents this small; for instance, in 2017 and 2018 there were new factors for 1471, 1489, 1549, 2789, 2819, 2861, 2957. |
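For readers unfamiliar with the "Comparing Gerbicz checksum values" messages above, here is a minimal sketch of the general idea behind a Gerbicz-style check on a chain of modular squarings. It is a textbook-style illustration under simplifying assumptions, not Prime95's implementation: the function name, the `block` parameter, and the `corrupt_at` hook (which simulates a hardware error) are all hypothetical, and this version verifies every block with a plain `pow` call rather than amortising the check the way a production program would.

```python
def gerbicz_checked_squarings(x0, n, iterations, block, corrupt_at=None):
    """Square x0 `iterations` times mod n, checking each completed block.

    A running checksum d holds the product of the residues at every block
    boundary. After each new block, d_next = d * x must equal
    x0 * d**(2**block) mod n; a mismatch means some squaring went wrong.
    """
    x = x0 % n
    d = x  # checksum starts as the residue at iteration 0
    for i in range(1, iterations + 1):
        x = x * x % n
        if i == corrupt_at:
            x ^= 1  # hypothetical fault injection: flip the lowest bit
        if i % block == 0:
            d_next = d * x % n
            expected = x0 * pow(d, 1 << block, n) % n
            if d_next != expected:
                raise ValueError(f"checksum mismatch at iteration {i}")
            d = d_next
    return x


# Clean run: matches a direct computation of 3^(2^11) mod 89.
print(gerbicz_checked_squarings(3, 89, 11, 4) == pow(3, 1 << 11, 89))

# Faulty run: the injected bit flip is caught at the next block boundary.
try:
    gerbicz_checked_squarings(3, 89, 11, 4, corrupt_at=4)
except ValueError as err:
    print("detected:", err)
```

In this sketch a failed comparison raises an exception; a real program would instead roll back to the last verified save point and retry, which is what the "Rolling back to iteration 0" lines in the logs describe.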
|
|
|
|
|
#22 |
|
Sep 2003
5·11·47 Posts |
Actually, taking a closer look, the exponent 3547 also had "finite loop" issues, just like 3041. However the intermediate exponents 3079, 3259, 3359 did not.
Code:
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3547/148823192092809407/1948447035193 using AVX-512 FFT length 1K
[Work thread Oct 23 17:20] ERROR: Comparing Gerbicz checksum values failed. Rolling back to iteration 0.
[Work thread Oct 23 17:20] Continuing from last save file.
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3547/148823192092809407/1948447035193 using AVX-512 FFT length 1K
[Work thread Oct 23 17:20] ERROR: Comparing Gerbicz checksum values failed. Rolling back to iteration 0.
[Work thread Oct 23 17:20] Continuing from last save file.
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3547/148823192092809407/1948447035193 using AVX-512 FFT length 1K
[Work thread Oct 23 17:20] Gerbicz error check passed at iteration 3481.
[Work thread Oct 23 17:20] Gerbicz error check passed at iteration 3545.
[Work thread Oct 23 17:20] M3547/148823192092809407/1948447035193 is a probable prime! Wh8: 1BB61BB6,1795,00200000 |
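The labels in these logs, such as M3547/148823192092809407/1948447035193, appear to denote the Mersenne number 2^p - 1 divided by each listed known factor, leaving the cofactor that is being PRP tested. A minimal sketch of that reading follows; the function name is hypothetical, and it uses small, easily checked examples rather than the exponents from the logs.

```python
def cofactor_from_label(label):
    """Parse a label like 'M29/233/1103' and return the remaining cofactor.

    Assumes the label means (2**p - 1) divided by every listed factor.
    Raises ValueError if the label is malformed or a factor does not divide.
    """
    head, *factor_strs = label.split("/")
    if not head.startswith("M"):
        raise ValueError(f"expected a label starting with 'M': {label!r}")
    p = int(head[1:])
    n = (1 << p) - 1  # the Mersenne number 2^p - 1
    for f in map(int, factor_strs):
        if n % f != 0:
            raise ValueError(f"{f} does not divide the remaining cofactor of M{p}")
        n //= f
    return n


# 2^11 - 1 = 2047 = 23 * 89, and 2^29 - 1 = 233 * 1103 * 2089.
print(cofactor_from_label("M11/23"))        # 89
print(cofactor_from_label("M29/233/1103"))  # 2089
```

Checking divisibility up front is cheap and catches a mistyped factor before any time is spent on a test of the wrong number.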
|
|
|