mersenneforum.org  

2018-10-21, 05:59   #12
Chuck
May 2011
Orange Park, FL
3×5×59 Posts

This is an excellent improvement. I am running a PRP first test with an AVX-512 FFT length 4608K and the speed is about 20% faster.

The test was only about 3% complete, so I just shut down Prime95, copied in the new executable (Windows) and restarted. It resumed from the savefile with no difficulty.
2018-10-21, 09:29   #13
mackerel
Feb 2016
UK
3×5×29 Posts

Doing testing on 7800X now... getting some scary temperatures if HT used, >100C on a delidded CPU, with TIM (can't use liquid metal as testing extreme cooling other times), and watercooling loop. Temps more sane if HT disabled. CPU is at stock settings apart from cache at 3000 (2000 stock, didn't touch voltage).
2018-10-21, 12:32   #14
GP2
Sep 2003
5×11×47 Posts

Quote:
Originally Posted by mackerel
Doing testing on 7800X now... getting some scary temperatures if HT used, >100C on a delidded CPU, with TIM (can't use liquid metal as testing extreme cooling other times), and watercooling loop. Temps more sane if HT disabled.

The benchmarks I did (before they crashed) indicated that on Skylake AVX-512, hyperthreading gives lower throughput with v29.5, whereas with v29.4 the opposite was true.
2018-10-21, 12:39   #15
GP2
Sep 2003
101000011001₂ Posts

Since it's a new release, has any further thought been given to enabling an option to output 2048-bit residues?
2018-10-21, 15:16   #16
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
1110101101111₂ Posts

Quote:
Originally Posted by GP2
Since it's a new release, has any further thought been given to enabling an option to output 2048-bit residues?

I believe that feature is on by default for PRP tests.
2018-10-21, 17:00   #17
mackerel
Feb 2016
UK
3×5×29 Posts

On the 7800X, running 1 core per worker, I saw a ballpark 70% increase in throughput for small FFTs up to around 256K; the gain drops off once it reaches the RAM-limited zone around 512K FFT. I can't be 100% sure I didn't change my RAM since the older test I'm comparing against, but there's still a ballpark 10% boost there.

It's harder to say what's happening with 1 worker on 6 cores, but there seems to be a clear gain between 1024K and 2560K FFT, and a less clear one either side of that.

I did a spot check of power usage at 64K FFT with the 6-thread in-place stress test. Estimated CPU power usage went up 25%, or system power 14%. For a 60%+ throughput increase, that sounds like a good deal to me, assuming it still applies to other relatively small FFT sizes and I can keep temperatures in check. I look forward to this getting rolled into LLR.
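
As a rough worked example of that trade-off (a sketch only: the 60%, 25% and 14% figures are the spot-check numbers above, and `gain_per_watt` is a hypothetical helper name, not anything from Prime95), the change in throughput per watt can be estimated like this:

```python
def gain_per_watt(throughput_gain, power_gain):
    """Relative change in throughput per watt.

    Both arguments are fractional increases, e.g. 0.60 for +60%.
    """
    return (1 + throughput_gain) / (1 + power_gain) - 1


# Spot-check figures from the post: +60% throughput,
# +25% estimated CPU power, +14% system power.
cpu = gain_per_watt(0.60, 0.25)
system = gain_per_watt(0.60, 0.14)
print(f"per CPU watt: {cpu:.0%}, per system watt: {system:.0%}")
```

With those inputs it works out to roughly 28% more throughput per CPU watt and roughly 40% more per system watt, so the extra power does look well spent if the numbers hold at other FFT sizes.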
2018-10-22, 04:23   #18
tshinozk
Nov 2012
23 Posts

I ran the benchmark with all-complex FFTs on Windows 10 on a 7980XE at stock settings.
It shows a 50%+ improvement on multithreaded work (i.e. within the L3 cache).

However, it hung during the 10-core hyperthreaded 4-worker test, with CPU usage remaining at 0% in Task Manager.

[Main thread Oct 22 09:50] Mersenne number primality test program version 29.5
[Main thread Oct 22 09:50] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 256 KB, L3 cache size: 25344 KB
...
[Worker #1 Oct 22 10:16] Timing 2048K all-complex FFT, 10 cores, 7 workers. Average times: 11.57, 11.75, 11.74, 11.51, 5.81, 4.57, 4.84 ms. Total throughput: 941.13 iter/sec.
[Worker #1 Oct 22 10:16] Timing 2048K all-complex FFT, 10 cores, 8 workers. Average times: 11.72, 10.76, 11.86, 11.78, 11.78, 11.88, 5.29, 4.44 ms. Total throughput: 930.87 iter/sec.
[Worker #1 Oct 22 10:16] Timing 2048K all-complex FFT, 10 cores, 9 workers. Average times: 11.99, 10.89, 11.80, 12.06, 11.99, 12.02, 11.81, 10.91, 4.55 ms. Total throughput: 905.85 iter/sec.
[Worker #1 Oct 22 10:17] Timing 2048K all-complex FFT, 10 cores, 10 workers. Average times: 12.50, 12.32, 11.13, 12.25, 12.27, 12.23, 12.51, 12.22, 12.23, 11.15 ms. Total throughput: 829.16 iter/sec.
[Worker #1 Oct 22 10:17] Timing 2048K all-complex FFT, 10 cores hyperthreaded, 1 worker. Average times: 0.78 ms. Total throughput: 1289.81 iter/sec.
[Worker #1 Oct 22 10:17] Timing 2048K all-complex FFT, 10 cores hyperthreaded, 2 workers. Average times: 1.43, 1.43 ms. Total throughput: 1401.32 iter/sec.
[Worker #1 Oct 22 10:17] Timing 2048K all-complex FFT, 10 cores hyperthreaded, 3 workers. Average times: 3.47, 3.25, 2.10 ms. Total throughput: 1071.34 iter/sec.
[Worker #1 Oct 22 10:18] Timing 2048K all-complex FFT, 10 cores hyperthreaded, 4 workers. [Main thread Oct 22 10:29] Stopping all worker threads.

I stopped it manually and, because I could not exit it, killed Prime95 using Task Manager.
The hang is reproducible, but it occurs at different points in the benchmark.
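
For what it's worth, the "Total throughput" figure in these lines looks like the sum of each worker's iteration rate (1000 / average ms). That is an inference from the numbers in the log, not something taken from the Prime95 source. A small sketch to recompute it from a pasted line, where `LINE_RE` and `recompute_throughput` are hypothetical names:

```python
import re

# Matches the "Average times: ... ms. Total throughput: N" part of a
# benchmark line as it appears in the log above.
LINE_RE = re.compile(
    r"Average times: (?P<times>\d+(?:\.\d+)?(?:, \d+(?:\.\d+)?)*) ms\. "
    r"Total throughput: (?P<total>\d+(?:\.\d+)?)"
)


def recompute_throughput(line):
    """Return (recomputed, reported) iter/sec, or None if the line has no timings."""
    m = LINE_RE.search(line)
    if m is None:
        return None  # e.g. the line where the benchmark stalled
    times_ms = [float(t) for t in m.group("times").split(",")]
    # Assumption: each worker contributes 1000 / (its average ms) iter/sec.
    recomputed = sum(1000.0 / t for t in times_ms)
    return recomputed, float(m.group("total"))


line = (
    "[Worker #1 Oct 22 10:16] Timing 2048K all-complex FFT, 10 cores, 7 workers. "
    "Average times: 11.57, 11.75, 11.74, 11.51, 5.81, 4.57, 4.84 ms. "
    "Total throughput: 941.13 iter/sec."
)
print(recompute_throughput(line))
```

For the 7-worker line the recomputed value agrees with the reported 941.13 iter/sec to within the rounding of the displayed times. The stalled 4-worker line has no timings at all, so the helper returns None for it.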
2018-10-23, 15:42   #19
ATH
Einyen
Dec 2003
Denmark
6126₈ Posts

I resumed an ongoing PRP DC assignment on EC2 with 29.5b2 and later b3. There might be an issue with FFT selection.

At first I did not check whether the worktodo.txt line had FFT2=4M in it, but I doubted it, since 29.5b2 started out with a 4200K AVX-512 FFT.

It quickly switched to a 4M FFT, which seems to be too small, because it got a lot of ROUNDOFF > 0.4 errors.
I have now noticed that the worktodo line did have FFT2=4M in it, and I removed it, but it still chose the 4M FFT when I restarted.
I stopped again and added FFT2=4200K to the line instead, so it will finish the exponent at 4200K.

I have SoftCrossoverAdjust=-0.004 in prime.txt, but it never tested the FFT size, probably because the test was already halfway through.

295b3.txt
2018-10-23, 16:51   #20
ATH
Einyen
Dec 2003
Denmark
2×1,579 Posts

In the throughput benchmark, when you set "Limit FFT sizes (mimic older benchmarking code) (N):" to No, it still skips the 4096K (4M) and 8192K (8M) FFTs, even though the program clearly uses a 4M AVX-512 FFT (see previous post).

If you set "Limit FFT sizes (mimic older benchmarking code) (N):" to Yes, it tests only 1 FFT size and then stops, even though a large range was chosen.
2018-10-23, 17:45   #21
GP2
Sep 2003
101000011001₂ Posts
PRP fails with very small exponents

I am attaching a worktodo.txt file suitable for PRP cofactor testing of the known fully-factored Mersenne exponents.

With 29.5 there are problems with very small exponents.

For the tiniest exponents the program halts with this kind of error message:

Code:
[Tue Oct 23 17:16:24 2018]
PRP cannot initialize FFT code for M11, errcode=1002
Number sent to gwsetup is too large for the FFTs to handle.
[Tue Oct 23 17:17:03 2018]
PRP cannot initialize FFT code for M101, errcode=1002
Number sent to gwsetup is too large for the FFTs to handle.
[Tue Oct 23 17:17:25 2018]
PRP cannot initialize FFT code for M503, errcode=1002
Number sent to gwsetup is too large for the FFTs to handle.
[Tue Oct 23 17:18:11 2018]
PRP cannot initialize FFT code for M1009, errcode=1002
Number sent to gwsetup is too large for the FFTs to handle.

For exponents like 1531 and 2069, you get an infinite loop:

Code:
[Tue Oct 23 17:18:45 2018]
ERROR: Comparing Gerbicz checksum values failed.  Rolling back to iteration 0.
Continuing from last save file.
ERROR: Comparing Gerbicz checksum values failed.  Rolling back to iteration 0.
Continuing from last save file.
ERROR: Comparing Gerbicz checksum values failed.  Rolling back to iteration 0.
Continuing from last save file.
...

The save file gets created during the run itself; it shows one iteration completed.

For exponent 3041 it worked, but it had "finite loop" issues at the start:

Code:
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3041/24329/5565031 using AVX-512 FFT length 1K
[Work thread Oct 23 17:20] ERROR: Comparing Gerbicz checksum values failed.  Rolling back to iteration 0.
[Work thread Oct 23 17:20] Continuing from last save file.
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3041/24329/5565031 using AVX-512 FFT length 1K
[Work thread Oct 23 17:20] ERROR: Comparing Gerbicz checksum values failed.  Rolling back to iteration 0.
[Work thread Oct 23 17:20] Continuing from last save file.
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3041/24329/5565031 using AVX-512 FFT length 1K
[Work thread Oct 23 17:20] ERROR: Comparing Gerbicz checksum values failed.  Rolling back to iteration 0.
[Work thread Oct 23 17:20] Continuing from last save file.
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3041/24329/5565031 using AVX-512 FFT length 1K
[Work thread Oct 23 17:20] ERROR: Comparing Gerbicz checksum values failed.  Rolling back to iteration 0.
[Work thread Oct 23 17:20] Continuing from last save file.
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3041/24329/5565031 using AVX-512 FFT length 1K
[Work thread Oct 23 17:20] Gerbicz error check passed at iteration 3025.
[Work thread Oct 23 17:20] M3041/24329/5565031 is a probable prime! Wh8: 17C217C2,2135,00400000
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3079/25324846649810648887383180721 using AVX-512 FFT length 1K

And for larger exponents all is well.
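
For anyone not familiar with what that check is doing, here is a minimal sketch of the general Gerbicz idea in Python. It is not Prime95's code: the block size, the injected-error hook, and the function name are all made up for illustration, and it says nothing about why the check misfires on tiny exponents in 29.5.

```python
def prp_with_gerbicz_check(n, iterations, block=8, base=3, inject_error_at=None):
    """Square `base` `iterations` times mod n, verifying every `block` squarings.

    Hypothetical illustration only. With x_i = base^(2^i) mod n and
    d = x_0 * x_L * x_2L * ... (L = block), raising d to the power 2^L
    shifts every factor forward by L squarings, so the next product
    must equal base * d^(2^L) mod n.
    """
    x = base % n   # x_0
    d = x          # running product of x_0, x_L, x_2L, ...
    for i in range(1, iterations + 1):
        x = x * x % n
        if i == inject_error_at:
            x = (x + 1) % n  # simulate a hardware error in this squaring
        if i % block == 0:
            d_next = d * x % n
            expected = base * pow(d, 1 << block, n) % n
            if d_next != expected:
                raise ArithmeticError(f"Gerbicz check failed at iteration {i}")
            d = d_next
    return x


n = 2**127 - 1  # M127, a known Mersenne prime
print(prp_with_gerbicz_check(n, 127))  # 9, because 3^(n+1) = 9 (mod n) for prime n
```

Calling it with `inject_error_at=8` raises at the first check. A real implementation would roll back to the last verified state at that point, which is the "Rolling back to iteration 0" step in the logs above.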

Similar behavior is seen for Wagstaff PRP testing.

We do occasionally see new factors for exponents this small, for instance in 2017 and 2018 there were new factors for 1471, 1489, 1549, 2789, 2819, 2861, 2957.
Attached Files
File Type: txt worktodo.txt (24.8 KB, 91 views)
2018-10-23, 21:06   #22
GP2
Sep 2003
5·11·47 Posts

Actually, taking a closer look, the exponent 3547 also had "finite loop" issues, just like 3041. However the intermediate exponents 3079, 3259, 3359 did not.

Code:
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3547/148823192092809407/1948447035193 using AVX-512 FFT length 1K
[Work thread Oct 23 17:20] ERROR: Comparing Gerbicz checksum values failed.  Rolling back to iteration 0.
[Work thread Oct 23 17:20] Continuing from last save file.
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3547/148823192092809407/1948447035193 using AVX-512 FFT length 1K
[Work thread Oct 23 17:20] ERROR: Comparing Gerbicz checksum values failed.  Rolling back to iteration 0.
[Work thread Oct 23 17:20] Continuing from last save file.
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3547/148823192092809407/1948447035193 using AVX-512 FFT length 1K
[Work thread Oct 23 17:20] Gerbicz error check passed at iteration 3481.
[Work thread Oct 23 17:20] Gerbicz error check passed at iteration 3545.
[Work thread Oct 23 17:20] M3547/148823192092809407/1948447035193 is a probable prime! Wh8: 1BB61BB6,1795,00200000
All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.