mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   SkylakeX teasers (aka prime95 29.5) (https://www.mersenneforum.org/showthread.php?t=23723)

Prime95 2018-10-23 23:26

[QUOTE=GP2;498588]With 29.5 there are problems with very small exponents..[/QUOTE]

Yes, I should have mentioned this. The smallest AVX-512 FFT is 1K. There may be issues with propagating carries when there are only 3 or 4 bits per FFT word.

I intend to revert to AVX FFTs for small exponents, but I have not yet investigated where the crossover needs to be.
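As a rough illustration of the "3 or 4 bits per FFT word" figure, here is a minimal sketch that divides the exponent by the FFT length. The helper name `bits_per_word` is made up for this example, and the two exponents are the ones GP2 mentions later in this thread; the calculation is just the average, not how prime95 actually distributes bits across words.

```python
def bits_per_word(exponent: int, fft_length: int) -> float:
    """Average number of bits of 2^exponent - 1 carried by each FFT word."""
    return exponent / fft_length


FFT_1K = 1024  # smallest AVX-512 FFT length, per the post above

for p in (3041, 3547):
    # Both exponents land at roughly 3 to 3.5 bits per word in a 1K FFT.
    print(p, round(bits_per_word(p, FFT_1K), 2))
```

For comparison, an exponent near 78M in a 4200K FFT averages roughly 18 bits per word, so the small-exponent case is far outside the usual operating range.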

Prime95 2018-10-23 23:36

[QUOTE=ATH;498581]Throughput benchmark when you choose "Limit FFT sizes (mimic older benchmarking code) (N):" to No, it still skips 4096K (4M) and 8192K (8M) FFTs, even though it clearly uses 4M AVX512 FFT, see previous post.

If you choose "Limit FFT sizes (mimic older benchmarking code) (N):" to Yes it only tests 1 FFT size and then stops even though a large range was chosen.[/QUOTE]

Believe it or not, this is expected. The 4M and 8M FFT sizes exist and you can force prime95 to use them with the FFT2= worktodo trick. However, the next larger FFT size gets more throughput (for me at least). Thus, the default setup for prime95 is not to use the 4M and 8M FFT sizes. You can benchmark these sizes by selecting the bench-all-implementations option.

If 4M and 8M have better throughput than the next larger FFT size for you, that would be interesting.

Note that anomalies such as slower-than-expected 4M and 8M timings need to be looked into by me. They could indicate some macros that need more optimization, a memory layout problem, or a prefetching bug.

GP2 2018-10-24 01:02

[QUOTE=Prime95;498622]Yes, I should have mentioned this. The smallest AVX-512 FFT is 1K. There may be issues with propagating carries when there are only 3 or 4 bits per FFT word.[/QUOTE]

I wonder why exponents 3041 and 3547 do succeed eventually after failing the first few times. You'd think it would be an infinite loop. Is some parameter randomly changed each time the program tries to recover from a Gerbicz checksum error?

ATH 2018-10-24 05:20

[QUOTE=Prime95;498624]If 4M and 8M have better throughput than the next larger FFT size for you, that would be interesting.[/QUOTE]

It seems to be better at 4M if you look at the log I linked:
[URL="http://hoegge.dk/mersenne/295b3.txt"]295b3.txt[/URL]

The best benchmarks:
[Main thread Oct 21 05:20:46] Timing 4096K FFT, 1 core hyperthreaded, 1 worker. Average times: 13.02 ms. Total throughput: 76.78 iter/sec.
[Main thread Oct 21 05:21:24] Timing 4116K FFT, 1 core, 1 worker. Average times: 13.52 ms. Total throughput: 73.95 iter/sec.
[Main thread Oct 21 05:22:52] Timing 4200K FFT, 1 core hyperthreaded, 1 worker. Average times: 13.80 ms. Total throughput: 72.49 iter/sec.

(4116K FFT is not part of throughput benchmark either)

The average iteration time during the 4M part: ms/iter: 13.041
after the switch to 4200K: ms/iter: 16.731
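A small sketch of how these benchmark lines could be compared programmatically. The regular expression and the `parse_bench` helper are assumptions written for this example, matched only against the three lines quoted above; they are not part of prime95.

```python
import re

# Pattern fitted to the benchmark lines quoted above (an assumption, not a spec).
BENCH_RE = re.compile(
    r"Timing (\d+)K FFT, .*?Average times:\s*([\d.]+) ms\."
    r"\s*Total throughput:\s*([\d.]+) iter/sec"
)


def parse_bench(line):
    """Return (fft_size_in_K, avg_ms, iters_per_sec), or None if no match."""
    m = BENCH_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), float(m.group(2)), float(m.group(3))


lines = [
    "[Main thread Oct 21 05:20:46] Timing 4096K FFT, 1 core hyperthreaded, 1 worker. Average times: 13.02 ms. Total throughput: 76.78 iter/sec.",
    "[Main thread Oct 21 05:21:24] Timing 4116K FFT, 1 core, 1 worker. Average times: 13.52 ms. Total throughput: 73.95 iter/sec.",
    "[Main thread Oct 21 05:22:52] Timing 4200K FFT, 1 core hyperthreaded, 1 worker. Average times: 13.80 ms. Total throughput: 72.49 iter/sec.",
]

results = [parse_bench(line) for line in lines]
best = max(results, key=lambda r: r[2])  # highest throughput wins
print("best FFT size:", best[0], "K")

# Slowdown observed in the real run after the switch from 4M to 4200K.
slowdown = 16.731 / 13.041
print("time per iteration grew by a factor of", round(slowdown, 3))
```

On these numbers the 4096K FFT comes out ahead in the benchmark, and the real run took about 28% longer per iteration after moving to 4200K.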

Prime95 2018-10-24 05:26

[QUOTE=ATH;498641]
The best benchmarks:
[Main thread Oct 21 05:20:46] Timing 4096K FFT, 1 core hyperthreaded, 1 worker. Average times: 13.02 ms. Total throughput: 76.78 iter/sec.
[Main thread Oct 21 05:21:24] Timing 4116K FFT, 1 core, 1 worker. Average times: 13.52 ms. Total throughput: 73.95 iter/sec.[/QUOTE]

I based my default FFT selection on the throughput benchmark using 8 cores (of my 8 core SkylakeX).

Are you benching an Amazon EC2 instance?

ATH 2018-10-24 05:35

Yes, this exponent is running on a c5d.large, which has 2 vCPUs, so it is just 1 core hyperthreaded.

HyperthreadLL=1 is a tiny bit faster (this is from another instance):

[CODE]With Hyperthreading:
[Work thread Oct 21 10:25:06] Iteration: 43100000 / 77310307 [55.749358%], roundoff: 0.362, ms/iter: 13.917, ETA: 5d 12:15
[Work thread Oct 21 10:48:19] Iteration: 43200000 / 77310307 [55.878707%], roundoff: 0.362, ms/iter: 13.906, ETA: 5d 11:45
[Work thread Oct 21 11:11:30] Iteration: 43300000 / 77310307 [56.008055%], roundoff: 0.362, ms/iter: 13.890, ETA: 5d 11:13
[Work thread Oct 21 11:34:42] Iteration: 43400000 / 77310307 [56.137404%], roundoff: 0.362, ms/iter: 13.897, ETA: 5d 10:53
[Work thread Oct 21 11:57:53] Iteration: 43500000 / 77310307 [56.266753%], roundoff: 0.362, ms/iter: 13.896, ETA: 5d 10:30
[Work thread Oct 21 12:21:04] Iteration: 43600000 / 77310307 [56.396102%], roundoff: 0.362, ms/iter: 13.892, ETA: 5d 10:05
[Work thread Oct 21 12:44:16] Iteration: 43700000 / 77310307 [56.525451%], roundoff: 0.362, ms/iter: 13.893, ETA: 5d 09:42
[Work thread Oct 21 13:07:26] Iteration: 43800000 / 77310307 [56.654800%], roundoff: 0.362, ms/iter: 13.891, ETA: 5d 09:18
[Work thread Oct 21 13:30:38] Iteration: 43900000 / 77310307 [56.784149%], roundoff: 0.362, ms/iter: 13.894, ETA: 5d 08:56
[Work thread Oct 21 13:53:50] Iteration: 44000000 / 77310307 [56.913497%], roundoff: 0.362, ms/iter: 13.901, ETA: 5d 08:37

Without hyperthreading:
[Work thread Oct 22 06:08:52] Iteration: 48100000 / 77310307 [62.216801%], roundoff: 0.321, ms/iter: 14.151, ETA: 4d 18:49
[Work thread Oct 22 06:32:30] Iteration: 48200000 / 77310307 [62.346150%], roundoff: 0.321, ms/iter: 14.153, ETA: 4d 18:26
[Work thread Oct 22 06:56:06] Iteration: 48300000 / 77310307 [62.475498%], roundoff: 0.336, ms/iter: 14.150, ETA: 4d 18:01
[Work thread Oct 22 07:19:49] Iteration: 48400000 / 77310307 [62.604847%], roundoff: 0.336, ms/iter: 14.199, ETA: 4d 18:01
[Work thread Oct 22 07:43:34] Iteration: 48500000 / 77310307 [62.734196%], roundoff: 0.336, ms/iter: 14.240, ETA: 4d 17:57
[Work thread Oct 22 08:07:20] Iteration: 48600000 / 77310307 [62.863545%], roundoff: 0.336, ms/iter: 14.239, ETA: 4d 17:33
[Work thread Oct 22 08:31:07] Iteration: 48700000 / 77310307 [62.992894%], roundoff: 0.336, ms/iter: 14.244, ETA: 4d 17:12
[Work thread Oct 22 08:54:53] Iteration: 48800000 / 77310307 [63.122243%], roundoff: 0.336, ms/iter: 14.245, ETA: 4d 16:48
[Work thread Oct 22 09:18:39] Iteration: 48900000 / 77310307 [63.251592%], roundoff: 0.336, ms/iter: 14.246, ETA: 4d 16:25
[Work thread Oct 22 09:42:26] Iteration: 49000000 / 77310307 [63.380940%], roundoff: 0.336, ms/iter: 14.247, ETA: 4d 16:02

Hyperthreading back on:
[Work thread Oct 22 11:18:27] Iteration: 49400000 / 77310307 [63.898336%], roundoff: 0.318, ms/iter: 13.959, ETA: 4d 12:13
[Work thread Oct 22 11:41:46] Iteration: 49500000 / 77310307 [64.027685%], roundoff: 0.357, ms/iter: 13.972, ETA: 4d 11:56
[Work thread Oct 22 12:05:05] Iteration: 49600000 / 77310307 [64.157034%], roundoff: 0.357, ms/iter: 13.971, ETA: 4d 11:32
[Work thread Oct 22 12:28:25] Iteration: 49700000 / 77310307 [64.286382%], roundoff: 0.357, ms/iter: 13.975, ETA: 4d 11:10
[Work thread Oct 22 12:51:44] Iteration: 49800000 / 77310307 [64.415731%], roundoff: 0.357, ms/iter: 13.975, ETA: 4d 10:47
[Work thread Oct 22 13:15:03] Iteration: 49900000 / 77310307 [64.545080%], roundoff: 0.357, ms/iter: 13.980, ETA: 4d 10:26
[Work thread Oct 22 13:38:24] Iteration: 50000000 / 77310307 [64.674429%], roundoff: 0.357, ms/iter: 13.979, ETA: 4d 10:02
[/CODE]
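To put a number on "a tiny bit faster", here is a minimal sketch that averages the ms/iter field from a sample of the lines above (the first three of the hyperthreaded block and the first three of the non-hyperthreaded block). The helper `mean_ms_per_iter` and the regular expression are assumptions made for this example.

```python
import re
from statistics import mean

MS_RE = re.compile(r"ms/iter:\s*([\d.]+)")


def mean_ms_per_iter(log_text: str) -> float:
    """Average every ms/iter value found in a chunk of log text."""
    return mean(float(v) for v in MS_RE.findall(log_text))


with_ht = """
Iteration: 43100000 / 77310307 [55.749358%], roundoff: 0.362, ms/iter: 13.917, ETA: 5d 12:15
Iteration: 43200000 / 77310307 [55.878707%], roundoff: 0.362, ms/iter: 13.906, ETA: 5d 11:45
Iteration: 43300000 / 77310307 [56.008055%], roundoff: 0.362, ms/iter: 13.890, ETA: 5d 11:13
"""

without_ht = """
Iteration: 48100000 / 77310307 [62.216801%], roundoff: 0.321, ms/iter: 14.151, ETA: 4d 18:49
Iteration: 48200000 / 77310307 [62.346150%], roundoff: 0.321, ms/iter: 14.153, ETA: 4d 18:26
Iteration: 48300000 / 77310307 [62.475498%], roundoff: 0.336, ms/iter: 14.150, ETA: 4d 18:01
"""

ratio = mean_ms_per_iter(without_ht) / mean_ms_per_iter(with_ht)
print("non-HT time / HT time:", round(ratio, 4))
```

On this sample the difference works out to a little under 2% in favour of hyperthreading.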

Chuck 2018-10-24 12:42

Program error messages this morning
 
I was nearing the end of the PRP first test of 87255083 when the program began outputting error messages (I obscured the AID).

It then started outputting a continuous stream of messages for the next queued test, a PRP double-check of 77979067.

[CODE][Wed Oct 24 07:43:41 2018]
Iteration: 87255060/87255083, Possible error: round off (0.2268580493) > -42387
Iteration: 87255060/87255083, Possible error: round off (0.1564778853) > -42387
Iteration: 87255062/87255083, Possible error: round off (0.2268580493) > -35310
Iteration: 87255065/87255083, Possible error: round off (0.2268580493) > -1.0911e+005
Iteration: 87255067/87255083, Possible error: round off (0.2268580493) > -87582
Iteration: 87255067/87255083, Possible error: round off (0.153231718) > -87582
Iteration: 87255068/87255083, Possible error: round off (0.2268580493) > -41809
Iteration: 87255069/87255083, Possible error: round off (0.2268580493) > -1.1253e+005
Iteration: 87255070/87255083, Possible error: round off (0.2268580493) > -14919
Iteration: 87255073/87255083, Possible error: round off (0.2268580493) > -64820
Iteration: 87255068/87255083, Possible error: round off (0.2149417585) > -1.2398e+005
Iteration: 87255069/87255083, Possible error: round off (0.2268580493) > -14680
Iteration: 87255073/87255083, Possible error: round off (0.2268580493) > -1.0284e+005
Iteration: 87255074/87255083, Possible error: round off (0.2268580493) > -35973
Iteration: 87255074/87255083, Possible error: round off (0.1575920773) > -35973
Iteration: 87255076/87255083, Possible error: round off (0.2268580493) > -1.1785e+005
Iteration: 87255080/87255083, Possible error: round off (0.2268580493) > -86994
Iteration: 87255081/87255083, Possible error: round off (0.2268580493) > -45048
Iteration: 87255083/87255083, Possible error: round off (0.2268580493) > -69856
Iteration: 87255076/87255083, Possible error: round off (0.2268580493) > -1.1785e+005
Iteration: 87255080/87255083, Possible error: round off (0.2268580493) > -86994
Iteration: 87255081/87255083, Possible error: round off (0.2268580493) > -45048
Iteration: 87255083/87255083, Possible error: round off (0.2268580493) > -69856
{"status":"C", "k":1, "b":2, "n":87255083, "c":-1, "worktype":"PRP-3", "res64":"B753D12F3E0435D3", "residue-type":1, "res2048":"F572805416335AB7767ED1208B6CA1873B96438C66EBD147B7DBC451144A4535265274A678657B36FDE4E19B4A6B9DC9697B68C7D5BE60F94063A9A09F5AEFF0980F97F832D641D64097C2CDA17225BE491E781AE684A5BD62BC3692670B3C22FED772058D7F8D3995A67DDC4D2F19F023DDF8A28A4B72D3CA70B9A4B7DF674F56B59DD4ACFD293F5E67CCD71D38D3CD57FC1AA3B45FF8A5E98A4D601708540D1EA06ADFB10D7B0589CEA026ED794B178904B5CC0B46F4B8B59131244D6952FED053C789CB41DB748DA1F676CAB6DAC26FA8FF41C895CBCCF4CB88FE6192F50290EBC1FF863B14FB9B75EAAB3E8A63D02CC9415078EC7070B753D12F3E0435D3", "fft-length":4718592, "shift-count":557107, "error-code":"00001700", "security-code":"C5B35D13", "program":{"name":"Prime95", "version":"29.5", "build":3, "port":4}, "timestamp":"2018-10-24 11:43:51", "errors":{"gerbicz":0}, "user":"jaxbuilder", "computer":"Maingear_i7-7800", "aid":"#################"}
Iteration: 1/77979067, Possible error: round off (0.1412162686) > 0
Iteration: 2/77979067, Possible error: round off (0.1367013401) > 0
Iteration: 3/77979067, Possible error: round off (0.1334150282) > 0
Iteration: 4/77979067, Possible error: round off (0.136214313) > 0
Iteration: 5/77979067, Possible error: round off (0.1355326745) > 0
Iteration: 6/77979067, Possible error: round off (0.1466572128) > 0
Iteration: 7/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 8/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 9/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 10/77979067, Possible error: round off (0.1399357129) > 0
Iteration: 11/77979067, Possible error: round off (0.135522705) > 0
Iteration: 12/77979067, Possible error: round off (0.1533570224) > 0
Iteration: 13/77979067, Possible error: round off (0.1372912802) > 0
Iteration: 14/77979067, Possible error: round off (0.1359849303) > 0
Iteration: 15/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 16/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 17/77979067, Possible error: round off (0.1423625708) > 0
Iteration: 18/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 19/77979067, Possible error: round off (0.1359548851) > 0
Iteration: 20/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 21/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 22/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 24/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 25/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 26/77979067, Possible error: round off (0.1351397599) > -29938
Iteration: 27/77979067, Possible error: round off (0.1351397599) > -1.2351e+005
Iteration: 28/77979067, Possible error: round off (0.1351397599) > -22360
Iteration: 30/77979067, Possible error: round off (0.1351397599) > -88632
Iteration: 31/77979067, Possible error: round off (0.1484038053) > -88632
Iteration: 32/77979067, Possible error: round off (0.1475221147) > -88632
Iteration: 33/77979067, Possible error: round off (0.1330651214) > -88632
Iteration: 34/77979067, Possible error: round off (0.1481876628) > -88632
Iteration: 35/77979067, Possible error: round off (0.137989409) > -88632
Iteration: 36/77979067, Possible error: round off (0.1441307379) > -88632
Iteration: 37/77979067, Possible error: round off (0.1288571716) > -88632
Iteration: 38/77979067, Possible error: round off (0.1533668243) > -88632
Iteration: 39/77979067, Possible error: round off (0.1419894314) > -88632
Iteration: 40/77979067, Possible error: round off (0.1306012338) > -88632
Iteration: 41/77979067, Possible error: round off (0.1453630195) > -88632
Iteration: 42/77979067, Possible error: round off (0.1452567279) > -88632
Iteration: 43/77979067, Possible error: round off (0.1455717164) > -88632
Iteration: 44/77979067, Possible error: round off (0.150328931) > -88632[/CODE]

Chuck 2018-10-24 12:50

Error on PRP double check
 
I deleted the backup files and restarted the PRP double check of 77979067. I immediately got a stream of errors.

Starting Gerbicz error-checking PRP test of M77979067 using AVX-512 FFT length 4200K, Pass1=1920, Pass2=2240, clm=1, 5 threads

[CODE][Wed Oct 24 08:44:28 2018]
Iteration: 1/77979067, Possible error: round off (0.1569584618) > 0
Iteration: 2/77979067, Possible error: round off (0.14001889) > 0
Iteration: 3/77979067, Possible error: round off (0.1397896085) > 0
Iteration: 4/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 5/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 6/77979067, Possible error: round off (0.1334150282) > 0
Iteration: 7/77979067, Possible error: round off (0.1334150282) > 0
Iteration: 8/77979067, Possible error: round off (0.1688922839) > 0
Iteration: 9/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 10/77979067, Possible error: round off (0.1401436378) > 0
Iteration: 11/77979067, Possible error: round off (0.1356045784) > 0
Iteration: 12/77979067, Possible error: round off (0.1440369386) > 0
Iteration: 13/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 14/77979067, Possible error: round off (0.1471612118) > 0
Iteration: 15/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 16/77979067, Possible error: round off (0.1373752097) > 0
Iteration: 17/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 18/77979067, Possible error: round off (0.1383202986) > 0
Iteration: 19/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 20/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 21/77979067, Possible error: round off (0.1356798274) > 0
Iteration: 22/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 23/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 24/77979067, Possible error: round off (0.1334150282) > 0
Iteration: 26/77979067, Possible error: round off (0.1351397599) > -23943
Iteration: 27/77979067, Possible error: round off (0.1351397599) > -96928
Iteration: 30/77979067, Possible error: round off (0.1351397599) > -13599
Iteration: 31/77979067, Possible error: round off (0.1360927037) > -13599
Iteration: 32/77979067, Possible error: round off (0.1462716058) > -13599
Iteration: 33/77979067, Possible error: round off (0.1318965346) > -13599
Iteration: 34/77979067, Possible error: round off (0.1508120858) > -13599
Iteration: 35/77979067, Possible error: round off (0.1354844715) > -13599
Iteration: 36/77979067, Possible error: round off (0.1301119176) > -13599
Iteration: 37/77979067, Possible error: round off (0.141492689) > -13599
Iteration: 38/77979067, Possible error: round off (0.1330696416) > -13599
Iteration: 39/77979067, Possible error: round off (0.13587089) > -13599
Iteration: 40/77979067, Possible error: round off (0.1363885126) > -13599
Iteration: 41/77979067, Possible error: round off (0.1480976757) > -13599
Iteration: 42/77979067, Possible error: round off (0.1353371833) > -13599
Iteration: 43/77979067, Possible error: round off (0.1297843534) > -13599
Iteration: 44/77979067, Possible error: round off (0.1496594111) > -13599
Iteration: 45/77979067, Possible error: round off (0.1391846252) > -13599
Iteration: 46/77979067, Possible error: round off (0.1279003836) > -13599
Iteration: 47/77979067, Possible error: round off (0.142696835) > -13599
Iteration: 48/77979067, Possible error: round off (0.1402622526) > -13599
Iteration: 49/77979067, Possible error: round off (0.1429398311) > -13599
Iteration: 50/77979067, Possible error: round off (0.1426179457) > -13599
Iteration: 129/77979067, Possible error: round off (0.1429085169) > -13599
Iteration: 257/77979067, Possible error: round off (0.1594783021) > -13599[/CODE]

I restarted with version 29.4 and the program used a larger FFT and is running OK.

[Wed Oct 24 08:51:59 2018]
Trying 1000 iterations for exponent 77979067 using 4096K FFT.
If average roundoff error is above 0.143, then a larger FFT will be used.
Final average roundoff error is 0.2176, using 4480K FFT for exponent 77979067.
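The rule stated in that 29.4 output can be sketched as follows. This only restates the quoted log message; the function name `pick_fft` and its parameters are hypothetical and are not taken from the prime95 source.

```python
def pick_fft(avg_roundoff: float, limit: float, smaller_fft: str, larger_fft: str) -> str:
    """Keep the smaller FFT unless the trial's average roundoff exceeds the limit."""
    return larger_fft if avg_roundoff > limit else smaller_fft


# Values taken from the log lines above.
print(pick_fft(0.2176, 0.143, "4096K", "4480K"))  # prints 4480K
```

With an average of 0.2176 against a limit of 0.143, the trial rejects 4096K, which matches the 4480K choice reported in the log.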

ATH 2018-10-24 14:31

You could try adding FFT2=4480K to the worktodo.txt line for the 29.5 version, like this:

PRP=<assignmentkey>,FFT2=4480K,1,2,77979067,-1

Chuck 2018-10-24 16:24

I made the post to point out that something is wrong with the new version.

Prime95 2018-10-24 18:37

[QUOTE=Chuck;498668]I made the post to point out that something is wrong with the new version.[/QUOTE]

I will investigate. It looks like the FFT was running fine -- the roundoff errors are reasonable. The stack variable containing the value to compare against was roached (overwritten with garbage).

My gut reaction (I could well be wrong) is that there is a memory corruption problem when running multithreaded FFTs. You were running 5 threads per worker. tshinozk had a problem with a 10-core, 4- or 5-worker benchmark. I, on the other hand, have been running single-threaded PRP tests for the last few months without an issue.
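The observation that the roundoff values look sane while the comparison value does not can be checked mechanically against the posted logs. The sketch below is an assumption-laden illustration: the regular expression is fitted to the messages quoted in this thread, and it treats a threshold as plausible only if it lies in (0, 0.5], on the grounds that a roundoff error is a distance to the nearest integer and so cannot exceed 0.5. The real threshold prime95 uses is not stated here.

```python
import re

# Pattern fitted to the "Possible error" lines quoted in this thread.
LINE_RE = re.compile(
    r"Iteration: (\d+)/(\d+), Possible error: round off \(([\d.]+)\) > (\S+)"
)


def check_line(line):
    """Return (roundoff_plausible, threshold_plausible), or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    roundoff = float(m.group(3))
    threshold = float(m.group(4))  # float() accepts forms like -1.2351e+005
    roundoff_plausible = 0.0 <= roundoff <= 0.5
    threshold_plausible = 0.0 < threshold <= 0.5  # assumed sane range
    return roundoff_plausible, threshold_plausible


samples = [
    "Iteration: 87255060/87255083, Possible error: round off (0.2268580493) > -42387",
    "Iteration: 1/77979067, Possible error: round off (0.1412162686) > 0",
    "Iteration: 27/77979067, Possible error: round off (0.1351397599) > -1.2351e+005",
]

for s in samples:
    print(check_line(s))  # each prints (True, False)
```

Every sampled line has an in-range roundoff and an out-of-range threshold, which is consistent with a clobbered comparison value rather than a failing FFT.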


All times are UTC.