mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   SkylakeX teasers (aka prime95 29.5) (https://www.mersenneforum.org/showthread.php?t=23723)

Chuck 2018-10-21 05:59

This is an excellent improvement. I am running a PRP first test with an AVX-512 FFT length 4608K and the speed is about 20% faster.

The test was only about 3% complete, so I just shut down Prime95, copied in the new executable (Windows) and restarted. It resumed from the savefile with no difficulty.

mackerel 2018-10-21 09:29

I'm testing on a 7800X now and getting some scary temperatures with HT enabled: >100C on a delidded CPU with TIM (I can't use liquid metal because I test extreme cooling at other times) and a watercooling loop. Temps are more sane with HT disabled. The CPU is at stock settings apart from the cache at 3000 (2000 stock; I didn't touch the voltage).

GP2 2018-10-21 12:32

[QUOTE=mackerel;498421]Doing testing on 7800X now... getting some scary temperatures if HT used, >100C on a delidded CPU, with TIM (can't use liquid metal as testing extreme cooling other times), and watercooling loop. Temps more sane if HT disabled.[/QUOTE]

The benchmarks I did (before they crashed) indicated that on Skylake AVX-512, hyperthreading gives lower throughput with v 29.5, as opposed to v 29.4 where the opposite was true.

GP2 2018-10-21 12:39

Since it's a new release, has any further thought been given to enabling an option to output 2048-bit residues?

Prime95 2018-10-21 15:16

[QUOTE=GP2;498424]Since it's a new release, has any further thought been given to enabling an option to output 2048-bit residues?[/QUOTE]

I believe that feature is on by default for PRP tests.

mackerel 2018-10-21 17:00

On the 7800X, running 1 core per worker, I saw roughly a 70% increase in throughput for small FFTs up to around 256K. Around 512K FFT it drops into the RAM-limited zone. I can't be 100% sure I didn't change my RAM settings since the older test, but there's still a boost of roughly 10% there.

It's harder to say what's happening with 1 worker on 6 cores, but there seems to be a clear gain between 1024K and 2560K FFT, and a less clear one on either side of that.

I did a spot check of power usage at 64K FFT with 6 threads in the in-place stress test. Estimated CPU power usage went up 25%, or system power up 14%. For a 60%+ throughput increase, that sounds like a good deal to me, assuming this still applies to other relatively small FFT sizes and I can keep temperatures in check. I look forward to this getting rolled into LLR.
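As a rough illustration of the trade-off described above, the change in throughput per watt can be worked out from the quoted percentages. This is only a back-of-the-envelope sketch: the function name is made up for the example, and the 60% figure is taken as the lower bound of the "60%+" quoted in the post.

```python
def relative_efficiency(throughput_gain, power_gain):
    """Return the fractional change in throughput per watt.

    Both arguments are fractional gains, e.g. 0.25 for +25%.
    """
    return (1 + throughput_gain) / (1 + power_gain) - 1


# Figures from the post: +60% throughput, +25% CPU power, +14% system power.
cpu = relative_efficiency(0.60, 0.25)
system = relative_efficiency(0.60, 0.14)
print(f"CPU-only: {cpu:+.0%}, whole system: {system:+.0%}")
```

Under those assumptions the gain in iterations per watt is about 28% counting CPU power alone and about 40% measured at the wall.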

tshinozk 2018-10-22 04:23

I ran the benchmark with all-complex FFTs on Windows 10 on a 7980XE at stock settings.
It shows a 50%+ improvement on multithreaded work (i.e. within the L3 cache).

However, it hung during the 10 cores hyperthreaded, 4 workers test, with CPU usage remaining at 0% in Task Manager.

[Main thread Oct 22 09:50] Mersenne number primality test program version 29.5
[Main thread Oct 22 09:50] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 256 KB, L3 cache size: 25344 KB
...
[Worker #1 Oct 22 10:16] Timing 2048K all-complex FFT, 10 cores, 7 workers. Average times: 11.57, 11.75, 11.74, 11.51, 5.81, 4.57, 4.84 ms. Total throughput: 941.13 iter/sec.
[Worker #1 Oct 22 10:16] Timing 2048K all-complex FFT, 10 cores, 8 workers. Average times: 11.72, 10.76, 11.86, 11.78, 11.78, 11.88, 5.29, 4.44 ms. Total throughput: 930.87 iter/sec.
[Worker #1 Oct 22 10:16] Timing 2048K all-complex FFT, 10 cores, 9 workers. Average times: 11.99, 10.89, 11.80, 12.06, 11.99, 12.02, 11.81, 10.91, 4.55 ms. Total throughput: 905.85 iter/sec.
[Worker #1 Oct 22 10:17] Timing 2048K all-complex FFT, 10 cores, 10 workers. Average times: 12.50, 12.32, 11.13, 12.25, 12.27, 12.23, 12.51, 12.22, 12.23, 11.15 ms. Total throughput: 829.16 iter/sec.
[Worker #1 Oct 22 10:17] Timing 2048K all-complex FFT, 10 cores hyperthreaded, 1 worker. Average times: 0.78 ms. Total throughput: 1289.81 iter/sec.
[Worker #1 Oct 22 10:17] Timing 2048K all-complex FFT, 10 cores hyperthreaded, 2 workers. Average times: 1.43, 1.43 ms. Total throughput: 1401.32 iter/sec.
[Worker #1 Oct 22 10:17] Timing 2048K all-complex FFT, 10 cores hyperthreaded, 3 workers. Average times: 3.47, 3.25, 2.10 ms. Total throughput: 1071.34 iter/sec.
[Worker #1 Oct 22 10:18] Timing 2048K all-complex FFT, 10 cores hyperthreaded, 4 workers. [Main thread Oct 22 10:29] Stopping all worker threads.

I stopped it manually, and because I could not exit it, I killed prime95 using Task Manager.
It is reproducible, but the hang occurs at other positions.
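For anyone comparing these benchmark lines, the "Total throughput" figure appears to be the sum over workers of 1000 divided by each worker's average time in milliseconds. That relation is inferred from the numbers in the log above, not taken from prime95 documentation, and the regular expression below is a hypothetical helper written for this log format only.

```python
import re

# One result line copied from the log above.
LINE = (
    "[Worker #1 Oct 22 10:16] Timing 2048K all-complex FFT, 10 cores, 7 workers. "
    "Average times: 11.57, 11.75, 11.74, 11.51, 5.81, 4.57, 4.84 ms. "
    "Total throughput: 941.13 iter/sec."
)

# Captures the comma-separated times and the reported throughput.
PATTERN = re.compile(
    r"Average times:\s*([\d.,\s]+?)\s*ms\.\s*Total throughput:\s*([\d.]+)"
)


def check_throughput(line):
    """Return (times_ms, reported, recomputed) for one benchmark result line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a benchmark result line")
    times_ms = [float(t) for t in m.group(1).split(",")]
    reported = float(m.group(2))
    # Assumed relation: each worker contributes 1000 / avg_ms iterations per second.
    recomputed = sum(1000.0 / t for t in times_ms)
    return times_ms, reported, recomputed


times_ms, reported, recomputed = check_throughput(LINE)
print(f"{len(times_ms)} workers, reported {reported:.2f}, recomputed {recomputed:.2f} iter/sec")
```

The recomputed value differs slightly from the reported one because the log prints the average times rounded to two decimals.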

ATH 2018-10-23 15:42

I resumed an ongoing PRP DC assignment on EC2 with 29.5b2 and later b3. There might be an issue with FFT selection.

I did not check if worktodo.txt had FFT2=4M in the line but I doubt it since 29.5b2 started out at 4200K AVX512 FFT.

It quickly switched to a 4M FFT, which seems to be too small because it got a lot of ROUNDOFF > 0.4 errors.
I noticed now that it had FFT2=4M in the worktodo line and I removed it, but it still chose 4M FFT when I restarted.
I stopped again and added FFT2=4200K in the line instead so it will finish the exponent at 4200K.

I have SoftCrossoverAdjust=-0.004 in prime.txt, but it never tested the FFT size, probably because the test was already halfway through.

[URL="http://hoegge.dk/mersenne/295b3.txt"]295b3.txt[/URL]

ATH 2018-10-23 16:51

In the throughput benchmark, when you answer No to "Limit FFT sizes (mimic older benchmarking code) (N):", it still skips the 4096K (4M) and 8192K (8M) FFTs, even though it clearly uses a 4M AVX512 FFT (see previous post).

If you answer Yes to "Limit FFT sizes (mimic older benchmarking code) (N):", it tests only 1 FFT size and then stops, even though a large range was chosen.

GP2 2018-10-23 17:45

PRP fails with very small exponents
 
I am attaching a worktodo.txt file suitable for PRP cofactor testing of the known fully-factored Mersenne exponents.

With 29.5 there are problems with very small exponents.

For the tiniest exponents the program halts with this kind of error message:

[CODE]
[Tue Oct 23 17:16:24 2018]
PRP cannot initialize FFT code for M11, errcode=1002
Number sent to gwsetup is too large for the FFTs to handle.
[Tue Oct 23 17:17:03 2018]
PRP cannot initialize FFT code for M101, errcode=1002
Number sent to gwsetup is too large for the FFTs to handle.
[Tue Oct 23 17:17:25 2018]
PRP cannot initialize FFT code for M503, errcode=1002
Number sent to gwsetup is too large for the FFTs to handle.
[Tue Oct 23 17:18:11 2018]
PRP cannot initialize FFT code for M1009, errcode=1002
Number sent to gwsetup is too large for the FFTs to handle.
[/CODE]

For exponents like 1531 and 2069, you get an infinite loop:

[CODE]
[Tue Oct 23 17:18:45 2018]
ERROR: Comparing Gerbicz checksum values failed. Rolling back to iteration 0.
Continuing from last save file.
ERROR: Comparing Gerbicz checksum values failed. Rolling back to iteration 0.
Continuing from last save file.
ERROR: Comparing Gerbicz checksum values failed. Rolling back to iteration 0.
Continuing from last save file.
...
[/CODE]

The save file gets created during the run itself; it shows one iteration completed.

For exponent 3041 it worked, but it had "finite loop" issues at the start:

[CODE]
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3041/24329/5565031 using AVX-512 FFT length 1K
[Work thread Oct 23 17:20] ERROR: Comparing Gerbicz checksum values failed. Rolling back to iteration 0.
[Work thread Oct 23 17:20] Continuing from last save file.
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3041/24329/5565031 using AVX-512 FFT length 1K
[Work thread Oct 23 17:20] ERROR: Comparing Gerbicz checksum values failed. Rolling back to iteration 0.
[Work thread Oct 23 17:20] Continuing from last save file.
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3041/24329/5565031 using AVX-512 FFT length 1K
[Work thread Oct 23 17:20] ERROR: Comparing Gerbicz checksum values failed. Rolling back to iteration 0.
[Work thread Oct 23 17:20] Continuing from last save file.
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3041/24329/5565031 using AVX-512 FFT length 1K
[Work thread Oct 23 17:20] ERROR: Comparing Gerbicz checksum values failed. Rolling back to iteration 0.
[Work thread Oct 23 17:20] Continuing from last save file.
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3041/24329/5565031 using AVX-512 FFT length 1K
[Work thread Oct 23 17:20] Gerbicz error check passed at iteration 3025.
[Work thread Oct 23 17:20] M3041/24329/5565031 is a probable prime! Wh8: 17C217C2,2135,00400000
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3079/25324846649810648887383180721 using AVX-512 FFT length 1K
[/CODE]

And for larger exponents all is well.

Similar behavior is seen for Wagstaff PRP testing.

We do occasionally see new factors for exponents this small; for instance, in 2017 and 2018 there were new factors for [M]1471[/M], [M]1489[/M], [M]1549[/M], [M]2789[/M], [M]2819[/M], [M]2861[/M], [M]2957[/M].
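For exponents this small, a cofactor result can be cross-checked independently with ordinary big-integer arithmetic. The sketch below is not what prime95 does internally (its residue types and Gerbicz checking are more involved); it is a plain base-3 Fermat test, and the function name is made up for this example. It divides out the known factors of 2^p - 1 and tests whether the remaining cofactor is a probable prime.

```python
def cofactor_is_prp(p, known_factors, base=3):
    """Fermat-test the cofactor of M_p = 2**p - 1 after removing known factors.

    Raises ValueError if a listed factor does not divide M_p, or if nothing
    is left to test once all factors are removed.
    """
    cofactor = (1 << p) - 1
    for f in known_factors:
        if cofactor % f != 0:
            raise ValueError(f"{f} does not divide the remaining cofactor of M{p}")
        cofactor //= f
    if cofactor < 2:
        raise ValueError(f"M{p} has no cofactor left to test")
    # Fermat test: base**(N-1) == 1 (mod N) holds for every prime N not dividing base.
    return pow(base, cofactor - 1, cofactor) == 1


# M3041/24329/5565031, the case from the log above.
print(cofactor_is_prp(3041, [24329, 5565031]))
# M11 = 2047 = 23 * 89: the cofactor 89 passes, the unfactored number does not.
print(cofactor_is_prp(11, [23]), cofactor_is_prp(11, []))
```

A check like this takes well under a second at these sizes, so it is a convenient way to confirm results for the tiny exponents that 29.5 currently refuses or loops on.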

GP2 2018-10-23 21:06

Actually, taking a closer look, the exponent 3547 also had "finite loop" issues, just like 3041. However the intermediate exponents 3079, 3259, 3359 did not.

[CODE]
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3547/148823192092809407/1948447035193 using AVX-512 FFT length 1K
[Work thread Oct 23 17:20] ERROR: Comparing Gerbicz checksum values failed. Rolling back to iteration 0.
[Work thread Oct 23 17:20] Continuing from last save file.
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3547/148823192092809407/1948447035193 using AVX-512 FFT length 1K
[Work thread Oct 23 17:20] ERROR: Comparing Gerbicz checksum values failed. Rolling back to iteration 0.
[Work thread Oct 23 17:20] Continuing from last save file.
[Work thread Oct 23 17:20] Starting Gerbicz error-checking PRP test of M3547/148823192092809407/1948447035193 using AVX-512 FFT length 1K
[Work thread Oct 23 17:20] Gerbicz error check passed at iteration 3481.
[Work thread Oct 23 17:20] Gerbicz error check passed at iteration 3545.
[Work thread Oct 23 17:20] M3547/148823192092809407/1948447035193 is a probable prime! Wh8: 1BB61BB6,1795,00200000
[/CODE]
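For readers unfamiliar with the check that is failing in these logs, here is a minimal sketch of the general idea behind Gerbicz error checking for a chain of modular squarings. It is not prime95's code: the function name, the `corrupt_at` fault-injection parameter, and the choice to verify after every block are assumptions made for brevity. Real implementations verify much less often and roll back to the last verified save point, which is the "Rolling back to iteration 0" seen above.

```python
def gerbicz_checked_squarings(n, base, blocks, block_len, corrupt_at=None):
    """Square `base` blocks*block_len times mod n, verifying a running checksum.

    The checksum d is the product of the values at block boundaries. After
    each block it must satisfy d_new == base * d_old**(2**block_len) (mod n).
    `corrupt_at` (hypothetical, demo only) injects an error at that iteration.
    """
    x0 = base % n
    x = x0
    d = x0
    done = 0
    for _ in range(blocks):
        d_prev = d
        for _ in range(block_len):
            x = x * x % n
            done += 1
            if done == corrupt_at:
                x = (x + 1) % n  # simulated hardware error
        d = d_prev * x % n
        # Raising d_prev to 2**block_len advances every term by one block.
        if d != x0 * pow(d_prev, 1 << block_len, n) % n:
            raise ArithmeticError(f"Gerbicz check failed by iteration {done}")
    return x


n = 2**89 - 1  # a Mersenne prime, used here only as a convenient modulus
print(gerbicz_checked_squarings(n, 3, blocks=4, block_len=8) == pow(3, 2**32, n))
```

Passing `corrupt_at=16` to the same call makes the check raise `ArithmeticError`, which is the analogue of the checksum mismatch reported in the logs.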


All times are UTC.