mersenneforum.org  

Old 2018-10-23, 23:26   #23
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

5·11·137 Posts

Quote:
Originally Posted by GP2
With 29.5 there are problems with very small exponents..
Yes, I should have mentioned this. The smallest AVX-512 FFT is 1K. There may be issues with propagating carries when there are only 3 or 4 bits per FFT word.

I intend to revert to AVX FFTs for small exponents, but I have not yet investigated where the crossover needs to be.
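
As a rough illustration of the "3 or 4 bits per FFT word" remark, here is a minimal sketch that divides the exponent by the FFT length. It assumes that "1K" means 1024 words and that words are sized with the usual variable-base (Crandall-Fagin style) layout; the function names are hypothetical and this is not Prime95's actual code.

```python
def bits_per_word(exponent: int, fft_length: int) -> float:
    """Average number of bits of 2**exponent - 1 held by each FFT word."""
    return exponent / fft_length


def word_sizes(exponent: int, fft_length: int) -> list[int]:
    """Per-word bit counts for a variable-base layout (assumed, not from the source).

    Word j holds ceil(p*(j+1)/N) - ceil(p*j/N) bits, so every word gets
    either floor(p/N) or ceil(p/N) bits and the sizes sum to p.
    """
    def ceil_div(a: int, b: int) -> int:
        return -(-a // b)

    p, n = exponent, fft_length
    return [ceil_div(p * (j + 1), n) - ceil_div(p * j, n) for j in range(n)]


FFT_1K = 1024  # assumption: a "1K" FFT has 1024 words

for p in (3041, 3547):  # small exponents discussed in this thread
    sizes = word_sizes(p, FFT_1K)
    print(p, round(bits_per_word(p, FFT_1K), 2), sorted(set(sizes)))
```

With so few bits in each word, a carry out of one word is large relative to the capacity of the next, which is consistent with the carry-propagation concern above.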
Old 2018-10-23, 23:36   #24
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

5·11·137 Posts

Quote:
Originally Posted by ATH
In the throughput benchmark, when you set "Limit FFT sizes (mimic older benchmarking code) (N):" to No, it still skips the 4096K (4M) and 8192K (8M) FFTs, even though it clearly uses a 4M AVX-512 FFT (see previous post).

If you set "Limit FFT sizes (mimic older benchmarking code) (N):" to Yes, it tests only one FFT size and then stops, even though a large range was chosen.
Believe it or not, this is expected. The 4M and 8M FFT sizes exist, and you can force prime95 to use them with the FFT2= worktodo trick. However, the next larger FFT size gets more throughput (for me at least), so by default prime95 does not use the 4M and 8M FFT sizes. You can benchmark these sizes by selecting the bench-all-implementations option.

If 4M and 8M have better throughput than the next larger FFT size for you, that would be interesting.

Note that anomalies such as slower-than-expected 4M and 8M timings need to be looked into by me. They could indicate some macros that need more optimization, a memory layout problem, or a prefetching bug.
Old 2018-10-24, 01:02   #25
GP2
 
Sep 2003

5×11×47 Posts

Quote:
Originally Posted by Prime95
Yes, I should have mentioned this. The smallest AVX-512 FFT is 1K. There may be issues with propagating carries when there are only 3 or 4 bits per FFT word.
I wonder why exponents 3041 and 3547 do succeed eventually after failing the first few times. You'd think it would be an infinite loop. Is some parameter randomly changed each time the program tries to recover from a Gerbicz checksum error?
Old 2018-10-24, 05:20   #26
ATH
Einyen
 
Dec 2003
Denmark

2·1,579 Posts

Quote:
Originally Posted by Prime95
If 4M and 8M have better throughput than the next larger FFT size for you, that would be interesting.
It seems to be better at 4M if you look at the log I linked:
295b3.txt

The best benchmarks:
[Main thread Oct 21 05:20:46] Timing 4096K FFT, 1 core hyperthreaded, 1 worker. Average times: 13.02 ms. Total throughput: 76.78 iter/sec.
[Main thread Oct 21 05:21:24] Timing 4116K FFT, 1 core, 1 worker. Average times: 13.52 ms. Total throughput: 73.95 iter/sec.
[Main thread Oct 21 05:22:52] Timing 4200K FFT, 1 core hyperthreaded, 1 worker. Average times: 13.80 ms. Total throughput: 72.49 iter/sec.

(The 4116K FFT is not part of the throughput benchmark either.)

Average iteration time during the 4M part: 13.041 ms/iter
After the switch to 4200K: 16.731 ms/iter
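
For anyone comparing these figures, throughput is roughly the reciprocal of the average iteration time. The sketch below (hypothetical helper names, and assuming throughput is simply workers × 1000 / ms, which may differ slightly from how prime95 rounds its own numbers) reproduces the benchmark lines to within rounding and quantifies the slowdown after the FFT switch.

```python
def throughput_iter_per_sec(avg_ms: float, workers: int = 1) -> float:
    """Approximate throughput implied by an average iteration time in ms."""
    return workers * 1000.0 / avg_ms


def slowdown_percent(before_ms: float, after_ms: float) -> float:
    """How much slower (in percent) each iteration became."""
    return (after_ms / before_ms - 1.0) * 100.0


# Benchmark averages quoted above, by FFT size.
for label, ms in (("4096K", 13.02), ("4116K", 13.52), ("4200K", 13.80)):
    print(label, round(throughput_iter_per_sec(ms), 2), "iter/sec")

# Observed ms/iter during the actual test: 4M part vs. after the switch to 4200K.
print(round(slowdown_percent(13.041, 16.731), 1), "% slower")
```

On these numbers the switch from 4M to 4200K costs roughly 28% per iteration in the real run, far more than the benchmark difference alone would suggest.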
Old 2018-10-24, 05:26   #27
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

1D6F₁₆ Posts

Quote:
Originally Posted by ATH
The best benchmarks:
[Main thread Oct 21 05:20:46] Timing 4096K FFT, 1 core hyperthreaded, 1 worker. Average times: 13.02 ms. Total throughput: 76.78 iter/sec.
[Main thread Oct 21 05:21:24] Timing 4116K FFT, 1 core, 1 worker. Average times: 13.52 ms. Total throughput: 73.95 iter/sec.
I based my default FFT selection on the throughput benchmark using 8 cores (of my 8-core SkylakeX).

Are you benching an Amazon EC2 instance?
Old 2018-10-24, 05:35   #28
ATH
Einyen
 
Dec 2003
Denmark

110001010110₂ Posts

Yes, this exponent is running on a c5d.large, which has 2 vCPUs, so it is just 1 core hyperthreaded.

HyperthreadLL=1 is a tiny bit faster (this is from another instance):

Code:
With Hyperthreading:
[Work thread Oct 21 10:25:06] Iteration: 43100000 / 77310307 [55.749358%], roundoff: 0.362, ms/iter: 13.917, ETA: 5d 12:15
[Work thread Oct 21 10:48:19] Iteration: 43200000 / 77310307 [55.878707%], roundoff: 0.362, ms/iter: 13.906, ETA: 5d 11:45
[Work thread Oct 21 11:11:30] Iteration: 43300000 / 77310307 [56.008055%], roundoff: 0.362, ms/iter: 13.890, ETA: 5d 11:13
[Work thread Oct 21 11:34:42] Iteration: 43400000 / 77310307 [56.137404%], roundoff: 0.362, ms/iter: 13.897, ETA: 5d 10:53
[Work thread Oct 21 11:57:53] Iteration: 43500000 / 77310307 [56.266753%], roundoff: 0.362, ms/iter: 13.896, ETA: 5d 10:30
[Work thread Oct 21 12:21:04] Iteration: 43600000 / 77310307 [56.396102%], roundoff: 0.362, ms/iter: 13.892, ETA: 5d 10:05
[Work thread Oct 21 12:44:16] Iteration: 43700000 / 77310307 [56.525451%], roundoff: 0.362, ms/iter: 13.893, ETA: 5d 09:42
[Work thread Oct 21 13:07:26] Iteration: 43800000 / 77310307 [56.654800%], roundoff: 0.362, ms/iter: 13.891, ETA: 5d 09:18
[Work thread Oct 21 13:30:38] Iteration: 43900000 / 77310307 [56.784149%], roundoff: 0.362, ms/iter: 13.894, ETA: 5d 08:56
[Work thread Oct 21 13:53:50] Iteration: 44000000 / 77310307 [56.913497%], roundoff: 0.362, ms/iter: 13.901, ETA: 5d 08:37

Without hyperthreading:
[Work thread Oct 22 06:08:52] Iteration: 48100000 / 77310307 [62.216801%], roundoff: 0.321, ms/iter: 14.151, ETA: 4d 18:49
[Work thread Oct 22 06:32:30] Iteration: 48200000 / 77310307 [62.346150%], roundoff: 0.321, ms/iter: 14.153, ETA: 4d 18:26
[Work thread Oct 22 06:56:06] Iteration: 48300000 / 77310307 [62.475498%], roundoff: 0.336, ms/iter: 14.150, ETA: 4d 18:01
[Work thread Oct 22 07:19:49] Iteration: 48400000 / 77310307 [62.604847%], roundoff: 0.336, ms/iter: 14.199, ETA: 4d 18:01
[Work thread Oct 22 07:43:34] Iteration: 48500000 / 77310307 [62.734196%], roundoff: 0.336, ms/iter: 14.240, ETA: 4d 17:57
[Work thread Oct 22 08:07:20] Iteration: 48600000 / 77310307 [62.863545%], roundoff: 0.336, ms/iter: 14.239, ETA: 4d 17:33
[Work thread Oct 22 08:31:07] Iteration: 48700000 / 77310307 [62.992894%], roundoff: 0.336, ms/iter: 14.244, ETA: 4d 17:12
[Work thread Oct 22 08:54:53] Iteration: 48800000 / 77310307 [63.122243%], roundoff: 0.336, ms/iter: 14.245, ETA: 4d 16:48
[Work thread Oct 22 09:18:39] Iteration: 48900000 / 77310307 [63.251592%], roundoff: 0.336, ms/iter: 14.246, ETA: 4d 16:25
[Work thread Oct 22 09:42:26] Iteration: 49000000 / 77310307 [63.380940%], roundoff: 0.336, ms/iter: 14.247, ETA: 4d 16:02

Hyperthreading back on:
[Work thread Oct 22 11:18:27] Iteration: 49400000 / 77310307 [63.898336%], roundoff: 0.318, ms/iter: 13.959, ETA: 4d 12:13
[Work thread Oct 22 11:41:46] Iteration: 49500000 / 77310307 [64.027685%], roundoff: 0.357, ms/iter: 13.972, ETA: 4d 11:56
[Work thread Oct 22 12:05:05] Iteration: 49600000 / 77310307 [64.157034%], roundoff: 0.357, ms/iter: 13.971, ETA: 4d 11:32
[Work thread Oct 22 12:28:25] Iteration: 49700000 / 77310307 [64.286382%], roundoff: 0.357, ms/iter: 13.975, ETA: 4d 11:10
[Work thread Oct 22 12:51:44] Iteration: 49800000 / 77310307 [64.415731%], roundoff: 0.357, ms/iter: 13.975, ETA: 4d 10:47
[Work thread Oct 22 13:15:03] Iteration: 49900000 / 77310307 [64.545080%], roundoff: 0.357, ms/iter: 13.980, ETA: 4d 10:26
[Work thread Oct 22 13:38:24] Iteration: 50000000 / 77310307 [64.674429%], roundoff: 0.357, ms/iter: 13.979, ETA: 4d 10:02
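
To put a number on "a tiny bit faster", one could average the ms/iter field from each group of log lines. This is a minimal sketch using two lines from each group above; the regular expression and helper name are my own assumptions about the log layout, not part of prime95.

```python
import re
from statistics import mean

# Matches the "ms/iter: 13.917" field in a progress line.
MS_PER_ITER = re.compile(r"ms/iter:\s*([0-9.]+)")


def ms_per_iter(lines):
    """Extract the ms/iter value from every line that has one."""
    values = []
    for line in lines:
        match = MS_PER_ITER.search(line)
        if match:
            values.append(float(match.group(1)))
    return values


with_ht = [
    "[Work thread Oct 21 10:25:06] Iteration: 43100000 / 77310307 [55.749358%], roundoff: 0.362, ms/iter: 13.917, ETA: 5d 12:15",
    "[Work thread Oct 21 10:48:19] Iteration: 43200000 / 77310307 [55.878707%], roundoff: 0.362, ms/iter: 13.906, ETA: 5d 11:45",
]
without_ht = [
    "[Work thread Oct 22 06:08:52] Iteration: 48100000 / 77310307 [62.216801%], roundoff: 0.321, ms/iter: 14.151, ETA: 4d 18:49",
    "[Work thread Oct 22 06:32:30] Iteration: 48200000 / 77310307 [62.346150%], roundoff: 0.321, ms/iter: 14.153, ETA: 4d 18:26",
]

ht_avg = mean(ms_per_iter(with_ht))
no_ht_avg = mean(ms_per_iter(without_ht))
gain = no_ht_avg / ht_avg - 1.0  # fraction by which hyperthreading is faster
print(f"HT: {ht_avg:.3f} ms, no HT: {no_ht_avg:.3f} ms, gain: {gain:.1%}")
```

On this small sample the gain from hyperthreading is on the order of 2%.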

Last fiddled with by ATH on 2018-10-24 at 05:37
Old 2018-10-24, 12:42   #29
Chuck
 
May 2011
Orange Park, FL

3·5·59 Posts
Program error messages this morning

I was nearing the end of the PRP first test of 87255083 when the program began outputting error messages (I obscured the AID).

It then started outputting a continuous stream of messages for the next queued test, a PRP double-check of 77979067.

Code:
[Wed Oct 24 07:43:41 2018]
Iteration: 87255060/87255083, Possible error: round off (0.2268580493) > -42387
Iteration: 87255060/87255083, Possible error: round off (0.1564778853) > -42387
Iteration: 87255062/87255083, Possible error: round off (0.2268580493) > -35310
Iteration: 87255065/87255083, Possible error: round off (0.2268580493) > -1.0911e+005
Iteration: 87255067/87255083, Possible error: round off (0.2268580493) > -87582
Iteration: 87255067/87255083, Possible error: round off (0.153231718) > -87582
Iteration: 87255068/87255083, Possible error: round off (0.2268580493) > -41809
Iteration: 87255069/87255083, Possible error: round off (0.2268580493) > -1.1253e+005
Iteration: 87255070/87255083, Possible error: round off (0.2268580493) > -14919
Iteration: 87255073/87255083, Possible error: round off (0.2268580493) > -64820
Iteration: 87255068/87255083, Possible error: round off (0.2149417585) > -1.2398e+005
Iteration: 87255069/87255083, Possible error: round off (0.2268580493) > -14680
Iteration: 87255073/87255083, Possible error: round off (0.2268580493) > -1.0284e+005
Iteration: 87255074/87255083, Possible error: round off (0.2268580493) > -35973
Iteration: 87255074/87255083, Possible error: round off (0.1575920773) > -35973
Iteration: 87255076/87255083, Possible error: round off (0.2268580493) > -1.1785e+005
Iteration: 87255080/87255083, Possible error: round off (0.2268580493) > -86994
Iteration: 87255081/87255083, Possible error: round off (0.2268580493) > -45048
Iteration: 87255083/87255083, Possible error: round off (0.2268580493) > -69856
Iteration: 87255076/87255083, Possible error: round off (0.2268580493) > -1.1785e+005
Iteration: 87255080/87255083, Possible error: round off (0.2268580493) > -86994
Iteration: 87255081/87255083, Possible error: round off (0.2268580493) > -45048
Iteration: 87255083/87255083, Possible error: round off (0.2268580493) > -69856
{"status":"C", "k":1, "b":2, "n":87255083, "c":-1, "worktype":"PRP-3", "res64":"B753D12F3E0435D3", "residue-type":1, "res2048":"F572805416335AB7767ED1208B6CA1873B96438C66EBD147B7DBC451144A4535265274A678657B36FDE4E19B4A6B9DC9697B68C7D5BE60F94063A9A09F5AEFF0980F97F832D641D64097C2CDA17225BE491E781AE684A5BD62BC3692670B3C22FED772058D7F8D3995A67DDC4D2F19F023DDF8A28A4B72D3CA70B9A4B7DF674F56B59DD4ACFD293F5E67CCD71D38D3CD57FC1AA3B45FF8A5E98A4D601708540D1EA06ADFB10D7B0589CEA026ED794B178904B5CC0B46F4B8B59131244D6952FED053C789CB41DB748DA1F676CAB6DAC26FA8FF41C895CBCCF4CB88FE6192F50290EBC1FF863B14FB9B75EAAB3E8A63D02CC9415078EC7070B753D12F3E0435D3", "fft-length":4718592, "shift-count":557107, "error-code":"00001700", "security-code":"C5B35D13", "program":{"name":"Prime95", "version":"29.5", "build":3, "port":4}, "timestamp":"2018-10-24 11:43:51", "errors":{"gerbicz":0}, "user":"jaxbuilder", "computer":"Maingear_i7-7800", "aid":"#################"}
Iteration: 1/77979067, Possible error: round off (0.1412162686) > 0
Iteration: 2/77979067, Possible error: round off (0.1367013401) > 0
Iteration: 3/77979067, Possible error: round off (0.1334150282) > 0
Iteration: 4/77979067, Possible error: round off (0.136214313) > 0
Iteration: 5/77979067, Possible error: round off (0.1355326745) > 0
Iteration: 6/77979067, Possible error: round off (0.1466572128) > 0
Iteration: 7/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 8/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 9/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 10/77979067, Possible error: round off (0.1399357129) > 0
Iteration: 11/77979067, Possible error: round off (0.135522705) > 0
Iteration: 12/77979067, Possible error: round off (0.1533570224) > 0
Iteration: 13/77979067, Possible error: round off (0.1372912802) > 0
Iteration: 14/77979067, Possible error: round off (0.1359849303) > 0
Iteration: 15/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 16/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 17/77979067, Possible error: round off (0.1423625708) > 0
Iteration: 18/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 19/77979067, Possible error: round off (0.1359548851) > 0
Iteration: 20/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 21/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 22/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 24/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 25/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 26/77979067, Possible error: round off (0.1351397599) > -29938
Iteration: 27/77979067, Possible error: round off (0.1351397599) > -1.2351e+005
Iteration: 28/77979067, Possible error: round off (0.1351397599) > -22360
Iteration: 30/77979067, Possible error: round off (0.1351397599) > -88632
Iteration: 31/77979067, Possible error: round off (0.1484038053) > -88632
Iteration: 32/77979067, Possible error: round off (0.1475221147) > -88632
Iteration: 33/77979067, Possible error: round off (0.1330651214) > -88632
Iteration: 34/77979067, Possible error: round off (0.1481876628) > -88632
Iteration: 35/77979067, Possible error: round off (0.137989409) > -88632
Iteration: 36/77979067, Possible error: round off (0.1441307379) > -88632
Iteration: 37/77979067, Possible error: round off (0.1288571716) > -88632
Iteration: 38/77979067, Possible error: round off (0.1533668243) > -88632
Iteration: 39/77979067, Possible error: round off (0.1419894314) > -88632
Iteration: 40/77979067, Possible error: round off (0.1306012338) > -88632
Iteration: 41/77979067, Possible error: round off (0.1453630195) > -88632
Iteration: 42/77979067, Possible error: round off (0.1452567279) > -88632
Iteration: 43/77979067, Possible error: round off (0.1455717164) > -88632
Iteration: 44/77979067, Possible error: round off (0.150328931) > -88632
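
What stands out in this output is the value on the right of the ">": a roundoff error is a distance from the nearest integer, so it can never exceed 0.5, and a sensible limit should be a small positive number. The sketch below parses lines of this form and flags limits that cannot be right. The regular expression, function names, and the (0, 0.5] plausibility range are my own assumptions for illustration, not prime95 internals.

```python
import re

# Layout assumed from the messages above:
# "Iteration: i/n, Possible error: round off (x) > limit"
LINE = re.compile(
    r"Iteration: (\d+)/(\d+), Possible error: round off \(([0-9.]+)\) > (\S+)"
)


def parse_roundoff_line(line):
    """Return (iteration, exponent, roundoff, limit) or None if no match."""
    match = LINE.search(line)
    if match is None:
        return None
    return (
        int(match.group(1)),
        int(match.group(2)),
        float(match.group(3)),
        float(match.group(4)),
    )


def limit_is_plausible(limit):
    """A roundoff limit should be positive and at most 0.5 (assumed range)."""
    return 0.0 < limit <= 0.5


samples = [
    "Iteration: 87255065/87255083, Possible error: round off (0.2268580493) > -1.0911e+005",
    "Iteration: 1/77979067, Possible error: round off (0.1412162686) > 0",
]
for sample in samples:
    iteration, exponent, roundoff, limit = parse_roundoff_line(sample)
    print(iteration, roundoff, limit, limit_is_plausible(limit))
```

Both sample lines fail the check: the reported roundoff values themselves are in a normal range, but they are being compared against 0 or a large negative number, so every iteration trips the warning.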
Old 2018-10-24, 12:50   #30
Chuck
 
May 2011
Orange Park, FL

3·5·59 Posts
Error on PRP double check

I deleted the backup files and restarted the PRP double check of 77979067. I immediately got a stream of errors.

Starting Gerbicz error-checking PRP test of M77979067 using AVX-512 FFT length 4200K, Pass1=1920, Pass2=2240, clm=1, 5 threads

Code:
[Wed Oct 24 08:44:28 2018]
Iteration: 1/77979067, Possible error: round off (0.1569584618) > 0
Iteration: 2/77979067, Possible error: round off (0.14001889) > 0
Iteration: 3/77979067, Possible error: round off (0.1397896085) > 0
Iteration: 4/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 5/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 6/77979067, Possible error: round off (0.1334150282) > 0
Iteration: 7/77979067, Possible error: round off (0.1334150282) > 0
Iteration: 8/77979067, Possible error: round off (0.1688922839) > 0
Iteration: 9/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 10/77979067, Possible error: round off (0.1401436378) > 0
Iteration: 11/77979067, Possible error: round off (0.1356045784) > 0
Iteration: 12/77979067, Possible error: round off (0.1440369386) > 0
Iteration: 13/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 14/77979067, Possible error: round off (0.1471612118) > 0
Iteration: 15/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 16/77979067, Possible error: round off (0.1373752097) > 0
Iteration: 17/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 18/77979067, Possible error: round off (0.1383202986) > 0
Iteration: 19/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 20/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 21/77979067, Possible error: round off (0.1356798274) > 0
Iteration: 22/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 23/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 24/77979067, Possible error: round off (0.1334150282) > 0
Iteration: 26/77979067, Possible error: round off (0.1351397599) > -23943
Iteration: 27/77979067, Possible error: round off (0.1351397599) > -96928
Iteration: 30/77979067, Possible error: round off (0.1351397599) > -13599
Iteration: 31/77979067, Possible error: round off (0.1360927037) > -13599
Iteration: 32/77979067, Possible error: round off (0.1462716058) > -13599
Iteration: 33/77979067, Possible error: round off (0.1318965346) > -13599
Iteration: 34/77979067, Possible error: round off (0.1508120858) > -13599
Iteration: 35/77979067, Possible error: round off (0.1354844715) > -13599
Iteration: 36/77979067, Possible error: round off (0.1301119176) > -13599
Iteration: 37/77979067, Possible error: round off (0.141492689) > -13599
Iteration: 38/77979067, Possible error: round off (0.1330696416) > -13599
Iteration: 39/77979067, Possible error: round off (0.13587089) > -13599
Iteration: 40/77979067, Possible error: round off (0.1363885126) > -13599
Iteration: 41/77979067, Possible error: round off (0.1480976757) > -13599
Iteration: 42/77979067, Possible error: round off (0.1353371833) > -13599
Iteration: 43/77979067, Possible error: round off (0.1297843534) > -13599
Iteration: 44/77979067, Possible error: round off (0.1496594111) > -13599
Iteration: 45/77979067, Possible error: round off (0.1391846252) > -13599
Iteration: 46/77979067, Possible error: round off (0.1279003836) > -13599
Iteration: 47/77979067, Possible error: round off (0.142696835) > -13599
Iteration: 48/77979067, Possible error: round off (0.1402622526) > -13599
Iteration: 49/77979067, Possible error: round off (0.1429398311) > -13599
Iteration: 50/77979067, Possible error: round off (0.1426179457) > -13599
Iteration: 129/77979067, Possible error: round off (0.1429085169) > -13599
Iteration: 257/77979067, Possible error: round off (0.1594783021) > -13599
I restarted with version 29.4; the program chose a larger FFT and is running OK.

[Wed Oct 24 08:51:59 2018]
Trying 1000 iterations for exponent 77979067 using 4096K FFT.
If average roundoff error is above 0.143, then a larger FFT will be used.
Final average roundoff error is 0.2176, using 4480K FFT for exponent 77979067.
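
The rule that the 29.4 messages describe can be written down in a couple of lines. This is only a sketch of the decision as stated in the log (average roundoff over the trial iterations compared against 0.143); how prime95 actually measures the roundoff is not shown here, and the function name is hypothetical.

```python
def choose_fft(avg_roundoff: float, smaller: str, larger: str,
               limit: float = 0.143) -> str:
    """Pick the larger FFT when the trial's average roundoff exceeds the limit."""
    return larger if avg_roundoff > limit else smaller


# Values taken from the 29.4 messages above.
print(choose_fft(0.2176, "4096K", "4480K"))
```

With the reported average of 0.2176, the rule selects the 4480K FFT, matching the final log line.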

Last fiddled with by Chuck on 2018-10-24 at 12:54 Reason: Restarted with version 29.4
Old 2018-10-24, 14:31   #31
ATH
Einyen
 
Dec 2003
Denmark

3158₁₀ Posts

You could try adding FFT2=4480K to the worktodo.txt line for the 29.5 version, like this:

PRP=<assignmentkey>,FFT2=4480K,1,2,77979067,-1
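
If you have several entries to edit, a small helper can build the line. This sketch only mirrors the field order of the example above (assignment key, FFT2 setting, then k, b, n, c for k·b^n+c); the function name is hypothetical and no other worktodo.txt options are covered.

```python
def prp_line_with_fft(assignment_key: str, fft: str,
                      k: int, b: int, n: int, c: int) -> str:
    """Format a PRP worktodo.txt entry that forces a specific FFT length."""
    return f"PRP={assignment_key},FFT2={fft},{k},{b},{n},{c}"


# Placeholder key, exactly as in the example above; substitute your own.
print(prp_line_with_fft("<assignmentkey>", "4480K", 1, 2, 77979067, -1))
```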
Old 2018-10-24, 16:24   #32
Chuck
 
May 2011
Orange Park, FL

3·5·59 Posts

I made the post to point out that something is wrong with the new version.
Old 2018-10-24, 18:37   #33
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

5·11·137 Posts

Quote:
Originally Posted by Chuck
I made the post to point out that something is wrong with the new version.
I will investigate. It looks like the FFT was running fine -- the roundoff errors are reasonable. The stack variable containing the value to compare against was roached.

My gut reaction (I could well be wrong) is that there is a memory corruption problem when running multithreaded FFTs. You were running 5 threads per worker, and tshinozk had a problem with a 10-core, 4- or 5-worker benchmark. By contrast, I've been running single-threaded PRP tests for the last few months without an issue.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.