|
|
#23 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
5·11·137 Posts |
Yes, I should have mentioned this. The smallest AVX-512 FFT is 1K. There may be issues with propagating carries when there are only 3 or 4 bits per FFT word.
I intend to revert to AVX FFTs for small exponents, but I have not yet investigated where the crossover needs to be. |
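To make the "bits per FFT word" concern concrete, here is a rough sketch. It assumes the average is simply the exponent divided by the number of FFT words (1K = 1024 words), which glosses over how the program actually distributes bits across words. The function name is illustrative only and is not part of Prime95.

```python
def bits_per_word(exponent: int, fft_len_k: int) -> float:
    """Average bits per FFT word (assumption: exponent / number of words)."""
    return exponent / (fft_len_k * 1024)


# Two small exponents discussed in this thread, at the smallest AVX-512 FFT (1K).
for p in (3041, 3547):
    print(p, round(bits_per_word(p, 1), 2))
```

Under that assumption both exponents land at roughly 3 to 3.5 bits per word, which is the range where carry propagation is suspected to be a problem.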
|
|
|
|
|
#24 | |
|
P90 years forever!
Aug 2002
Yeehaw, FL
5·11·137 Posts |
Quote:
If 4M and 8M have better throughput than the next larger FFT size for you, that would be interesting. Note that anomalies such as slower-than-expected 4M and 8M timings need to be looked into by me. They could indicate macros that need more optimization, a memory layout problem, or a prefetching bug. |
|
|
|
|
|
|
#25 |
|
Sep 2003
5×11×47 Posts |
I wonder why exponents 3041 and 3547 do succeed eventually after failing the first few times. You'd think it would be an infinite loop. Is some parameter randomly changed each time the program tries to recover from a Gerbicz checksum error?
|
|
|
|
|
|
#26 | |
|
Einyen
Dec 2003
Denmark
2·1,579 Posts |
Quote:
295b3.txt
The best benchmarks:
[Main thread Oct 21 05:20:46] Timing 4096K FFT, 1 core hyperthreaded, 1 worker. Average times: 13.02 ms. Total throughput: 76.78 iter/sec.
[Main thread Oct 21 05:21:24] Timing 4116K FFT, 1 core, 1 worker. Average times: 13.52 ms. Total throughput: 73.95 iter/sec.
[Main thread Oct 21 05:22:52] Timing 4200K FFT, 1 core hyperthreaded, 1 worker. Average times: 13.80 ms. Total throughput: 72.49 iter/sec.
(4116K FFT is not part of throughput benchmark either)
The average iteration time during the 4M part: ms/iter: 13.041
after the switch to 4200K: ms/iter: 16.731 |
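For a sense of scale, the figures in this post can be compared directly. This is only arithmetic on the numbers quoted above; the variable names are made up for the illustration.

```python
# Observed ms/iter during the actual run (from the post).
ms_4096k = 13.041
ms_4200k = 16.731

# Benchmark averages for the same two FFT sizes (from the post).
bench_4096k = 13.02
bench_4200k = 13.80

slowdown = ms_4200k / ms_4096k
bench_gap = bench_4200k / bench_4096k

print(f"observed: 4200K is {100 * (slowdown - 1):.1f}% slower per iteration")
print(f"benchmark predicted: {100 * (bench_gap - 1):.1f}% slower")
```

The observed gap is several times larger than the one the benchmark predicted, which is what makes the 4200K timing look anomalous.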
|
|
|
|
|
|
#27 | |
|
P90 years forever!
Aug 2002
Yeehaw, FL
1D6F₁₆ Posts |
Quote:
Are you benching an Amazon ECX instance? |
|
|
|
|
|
|
#28 |
|
Einyen
Dec 2003
Denmark
110001010110₂ Posts |
Yes, this exponent is running on a c5d.large, which has 2 vCPUs, so it is just 1 core hyperthreaded.
HyperthreadLL=1 is a tiny bit faster (this is from another instance): Code:
With Hyperthreading:
[Work thread Oct 21 10:25:06] Iteration: 43100000 / 77310307 [55.749358%], roundoff: 0.362, ms/iter: 13.917, ETA: 5d 12:15
[Work thread Oct 21 10:48:19] Iteration: 43200000 / 77310307 [55.878707%], roundoff: 0.362, ms/iter: 13.906, ETA: 5d 11:45
[Work thread Oct 21 11:11:30] Iteration: 43300000 / 77310307 [56.008055%], roundoff: 0.362, ms/iter: 13.890, ETA: 5d 11:13
[Work thread Oct 21 11:34:42] Iteration: 43400000 / 77310307 [56.137404%], roundoff: 0.362, ms/iter: 13.897, ETA: 5d 10:53
[Work thread Oct 21 11:57:53] Iteration: 43500000 / 77310307 [56.266753%], roundoff: 0.362, ms/iter: 13.896, ETA: 5d 10:30
[Work thread Oct 21 12:21:04] Iteration: 43600000 / 77310307 [56.396102%], roundoff: 0.362, ms/iter: 13.892, ETA: 5d 10:05
[Work thread Oct 21 12:44:16] Iteration: 43700000 / 77310307 [56.525451%], roundoff: 0.362, ms/iter: 13.893, ETA: 5d 09:42
[Work thread Oct 21 13:07:26] Iteration: 43800000 / 77310307 [56.654800%], roundoff: 0.362, ms/iter: 13.891, ETA: 5d 09:18
[Work thread Oct 21 13:30:38] Iteration: 43900000 / 77310307 [56.784149%], roundoff: 0.362, ms/iter: 13.894, ETA: 5d 08:56
[Work thread Oct 21 13:53:50] Iteration: 44000000 / 77310307 [56.913497%], roundoff: 0.362, ms/iter: 13.901, ETA: 5d 08:37
Without hyperthreading:
[Work thread Oct 22 06:08:52] Iteration: 48100000 / 77310307 [62.216801%], roundoff: 0.321, ms/iter: 14.151, ETA: 4d 18:49
[Work thread Oct 22 06:32:30] Iteration: 48200000 / 77310307 [62.346150%], roundoff: 0.321, ms/iter: 14.153, ETA: 4d 18:26
[Work thread Oct 22 06:56:06] Iteration: 48300000 / 77310307 [62.475498%], roundoff: 0.336, ms/iter: 14.150, ETA: 4d 18:01
[Work thread Oct 22 07:19:49] Iteration: 48400000 / 77310307 [62.604847%], roundoff: 0.336, ms/iter: 14.199, ETA: 4d 18:01
[Work thread Oct 22 07:43:34] Iteration: 48500000 / 77310307 [62.734196%], roundoff: 0.336, ms/iter: 14.240, ETA: 4d 17:57
[Work thread Oct 22 08:07:20] Iteration: 48600000 / 77310307 [62.863545%], roundoff: 0.336, ms/iter: 14.239, ETA: 4d 17:33
[Work thread Oct 22 08:31:07] Iteration: 48700000 / 77310307 [62.992894%], roundoff: 0.336, ms/iter: 14.244, ETA: 4d 17:12
[Work thread Oct 22 08:54:53] Iteration: 48800000 / 77310307 [63.122243%], roundoff: 0.336, ms/iter: 14.245, ETA: 4d 16:48
[Work thread Oct 22 09:18:39] Iteration: 48900000 / 77310307 [63.251592%], roundoff: 0.336, ms/iter: 14.246, ETA: 4d 16:25
[Work thread Oct 22 09:42:26] Iteration: 49000000 / 77310307 [63.380940%], roundoff: 0.336, ms/iter: 14.247, ETA: 4d 16:02
Hyperthreading back on:
[Work thread Oct 22 11:18:27] Iteration: 49400000 / 77310307 [63.898336%], roundoff: 0.318, ms/iter: 13.959, ETA: 4d 12:13
[Work thread Oct 22 11:41:46] Iteration: 49500000 / 77310307 [64.027685%], roundoff: 0.357, ms/iter: 13.972, ETA: 4d 11:56
[Work thread Oct 22 12:05:05] Iteration: 49600000 / 77310307 [64.157034%], roundoff: 0.357, ms/iter: 13.971, ETA: 4d 11:32
[Work thread Oct 22 12:28:25] Iteration: 49700000 / 77310307 [64.286382%], roundoff: 0.357, ms/iter: 13.975, ETA: 4d 11:10
[Work thread Oct 22 12:51:44] Iteration: 49800000 / 77310307 [64.415731%], roundoff: 0.357, ms/iter: 13.975, ETA: 4d 10:47
[Work thread Oct 22 13:15:03] Iteration: 49900000 / 77310307 [64.545080%], roundoff: 0.357, ms/iter: 13.980, ETA: 4d 10:26
[Work thread Oct 22 13:38:24] Iteration: 50000000 / 77310307 [64.674429%], roundoff: 0.357, ms/iter: 13.979, ETA: 4d 10:02
Last fiddled with by ATH on 2018-10-24 at 05:37 |
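If you want to compare the two configurations numerically rather than by eye, a small parser over the log lines is enough. The sketch below is not part of Prime95; it assumes the `ms/iter: <number>` field format seen in the log above, and it uses only two lines per configuration to stay short.

```python
import re
from statistics import mean

# Lines copied from the log in this post (two per configuration, for brevity).
LOG = {
    "with HT": [
        "[Work thread Oct 21 10:25:06] Iteration: 43100000 / 77310307 [55.749358%], roundoff: 0.362, ms/iter: 13.917, ETA: 5d 12:15",
        "[Work thread Oct 21 10:48:19] Iteration: 43200000 / 77310307 [55.878707%], roundoff: 0.362, ms/iter: 13.906, ETA: 5d 11:45",
    ],
    "without HT": [
        "[Work thread Oct 22 06:08:52] Iteration: 48100000 / 77310307 [62.216801%], roundoff: 0.321, ms/iter: 14.151, ETA: 4d 18:49",
        "[Work thread Oct 22 06:32:30] Iteration: 48200000 / 77310307 [62.346150%], roundoff: 0.321, ms/iter: 14.153, ETA: 4d 18:26",
    ],
}

MS_ITER = re.compile(r"ms/iter:\s*([0-9]+\.[0-9]+)")


def average_ms(lines):
    """Mean of the ms/iter field over all lines that contain one."""
    values = [float(m.group(1)) for line in lines if (m := MS_ITER.search(line))]
    return mean(values)


for label, lines in LOG.items():
    print(f"{label}: {average_ms(lines):.3f} ms/iter")
```

Feeding in the full log instead of the two-line samples would give a steadier average, but the direction of the difference is already visible here.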
|
|
|
|
|
#29 |
|
May 2011
Orange Park, FL
3·5·59 Posts |
I was nearing the end of the PRP first test of 87255083 when the program began outputting error messages (I obscured the AID).
It then started outputting a continuous stream of messages for the next queued test, a PRP double-check of 77979067. Code:
[Wed Oct 24 07:43:41 2018]
Iteration: 87255060/87255083, Possible error: round off (0.2268580493) > -42387
Iteration: 87255060/87255083, Possible error: round off (0.1564778853) > -42387
Iteration: 87255062/87255083, Possible error: round off (0.2268580493) > -35310
Iteration: 87255065/87255083, Possible error: round off (0.2268580493) > -1.0911e+005
Iteration: 87255067/87255083, Possible error: round off (0.2268580493) > -87582
Iteration: 87255067/87255083, Possible error: round off (0.153231718) > -87582
Iteration: 87255068/87255083, Possible error: round off (0.2268580493) > -41809
Iteration: 87255069/87255083, Possible error: round off (0.2268580493) > -1.1253e+005
Iteration: 87255070/87255083, Possible error: round off (0.2268580493) > -14919
Iteration: 87255073/87255083, Possible error: round off (0.2268580493) > -64820
Iteration: 87255068/87255083, Possible error: round off (0.2149417585) > -1.2398e+005
Iteration: 87255069/87255083, Possible error: round off (0.2268580493) > -14680
Iteration: 87255073/87255083, Possible error: round off (0.2268580493) > -1.0284e+005
Iteration: 87255074/87255083, Possible error: round off (0.2268580493) > -35973
Iteration: 87255074/87255083, Possible error: round off (0.1575920773) > -35973
Iteration: 87255076/87255083, Possible error: round off (0.2268580493) > -1.1785e+005
Iteration: 87255080/87255083, Possible error: round off (0.2268580493) > -86994
Iteration: 87255081/87255083, Possible error: round off (0.2268580493) > -45048
Iteration: 87255083/87255083, Possible error: round off (0.2268580493) > -69856
Iteration: 87255076/87255083, Possible error: round off (0.2268580493) > -1.1785e+005
Iteration: 87255080/87255083, Possible error: round off (0.2268580493) > -86994
Iteration: 87255081/87255083, Possible error: round off (0.2268580493) > -45048
Iteration: 87255083/87255083, Possible error: round off (0.2268580493) > -69856
{"status":"C", "k":1, "b":2, "n":87255083, "c":-1, "worktype":"PRP-3", "res64":"B753D12F3E0435D3", "residue-type":1, "res2048":"F572805416335AB7767ED1208B6CA1873B96438C66EBD147B7DBC451144A4535265274A678657B36FDE4E19B4A6B9DC9697B68C7D5BE60F94063A9A09F5AEFF0980F97F832D641D64097C2CDA17225BE491E781AE684A5BD62BC3692670B3C22FED772058D7F8D3995A67DDC4D2F19F023DDF8A28A4B72D3CA70B9A4B7DF674F56B59DD4ACFD293F5E67CCD71D38D3CD57FC1AA3B45FF8A5E98A4D601708540D1EA06ADFB10D7B0589CEA026ED794B178904B5CC0B46F4B8B59131244D6952FED053C789CB41DB748DA1F676CAB6DAC26FA8FF41C895CBCCF4CB88FE6192F50290EBC1FF863B14FB9B75EAAB3E8A63D02CC9415078EC7070B753D12F3E0435D3", "fft-length":4718592, "shift-count":557107, "error-code":"00001700", "security-code":"C5B35D13", "program":{"name":"Prime95", "version":"29.5", "build":3, "port":4}, "timestamp":"2018-10-24 11:43:51", "errors":{"gerbicz":0}, "user":"jaxbuilder", "computer":"Maingear_i7-7800", "aid":"#################"}
Iteration: 1/77979067, Possible error: round off (0.1412162686) > 0
Iteration: 2/77979067, Possible error: round off (0.1367013401) > 0
Iteration: 3/77979067, Possible error: round off (0.1334150282) > 0
Iteration: 4/77979067, Possible error: round off (0.136214313) > 0
Iteration: 5/77979067, Possible error: round off (0.1355326745) > 0
Iteration: 6/77979067, Possible error: round off (0.1466572128) > 0
Iteration: 7/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 8/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 9/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 10/77979067, Possible error: round off (0.1399357129) > 0
Iteration: 11/77979067, Possible error: round off (0.135522705) > 0
Iteration: 12/77979067, Possible error: round off (0.1533570224) > 0
Iteration: 13/77979067, Possible error: round off (0.1372912802) > 0
Iteration: 14/77979067, Possible error: round off (0.1359849303) > 0
Iteration: 15/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 16/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 17/77979067, Possible error: round off (0.1423625708) > 0
Iteration: 18/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 19/77979067, Possible error: round off (0.1359548851) > 0
Iteration: 20/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 21/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 22/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 24/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 25/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 26/77979067, Possible error: round off (0.1351397599) > -29938
Iteration: 27/77979067, Possible error: round off (0.1351397599) > -1.2351e+005
Iteration: 28/77979067, Possible error: round off (0.1351397599) > -22360
Iteration: 30/77979067, Possible error: round off (0.1351397599) > -88632
Iteration: 31/77979067, Possible error: round off (0.1484038053) > -88632
Iteration: 32/77979067, Possible error: round off (0.1475221147) > -88632
Iteration: 33/77979067, Possible error: round off (0.1330651214) > -88632
Iteration: 34/77979067, Possible error: round off (0.1481876628) > -88632
Iteration: 35/77979067, Possible error: round off (0.137989409) > -88632
Iteration: 36/77979067, Possible error: round off (0.1441307379) > -88632
Iteration: 37/77979067, Possible error: round off (0.1288571716) > -88632
Iteration: 38/77979067, Possible error: round off (0.1533668243) > -88632
Iteration: 39/77979067, Possible error: round off (0.1419894314) > -88632
Iteration: 40/77979067, Possible error: round off (0.1306012338) > -88632
Iteration: 41/77979067, Possible error: round off (0.1453630195) > -88632
Iteration: 42/77979067, Possible error: round off (0.1452567279) > -88632
Iteration: 43/77979067, Possible error: round off (0.1455717164) > -88632
Iteration: 44/77979067, Possible error: round off (0.150328931) > -88632
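One thing that stands out in this output is the value printed after the `>`: it is zero or a large negative number. A quick way to spot such lines in a results file is sketched below. The line format is taken from the output above; the sanity rule is an assumption on my part (a round-off error is a distance to the nearest integer, so a meaningful limit should lie in (0, 0.5]), not a statement about how Prime95 sets its limit.

```python
import re

# Format assumed from the messages quoted in this post.
LINE = re.compile(
    r"Iteration: (\d+)/(\d+), Possible error: round off \(([0-9.]+)\) > ([-+0-9.eE]+)"
)


def parse(line):
    """Return (iteration, total, roundoff, limit), or None if the line does not match."""
    m = LINE.search(line)
    if m is None:
        return None
    iteration, total, roundoff, limit = m.groups()
    return int(iteration), int(total), float(roundoff), float(limit)


def limit_is_suspicious(limit):
    """Assumption: a sane round-off limit lies in (0, 0.5]."""
    return not (0.0 < limit <= 0.5)


samples = [
    "Iteration: 87255065/87255083, Possible error: round off (0.2268580493) > -1.0911e+005",
    "Iteration: 1/77979067, Possible error: round off (0.1412162686) > 0",
]
for s in samples:
    parsed = parse(s)
    print(parsed, limit_is_suspicious(parsed[3]))
```

Every line in the output above would be flagged by this rule, which fits the idea that the limit itself is being corrupted rather than the round-off values being unusually high.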
|
|
|
|
|
|
#30 |
|
May 2011
Orange Park, FL
3·5·59 Posts |
I deleted the backup files and restarted the PRP double check of 77979067. I immediately got a stream of errors.
Starting Gerbicz error-checking PRP test of M77979067 using AVX-512 FFT length 4200K, Pass1=1920, Pass2=2240, clm=1, 5 threads Code:
[Wed Oct 24 08:44:28 2018]
Iteration: 1/77979067, Possible error: round off (0.1569584618) > 0
Iteration: 2/77979067, Possible error: round off (0.14001889) > 0
Iteration: 3/77979067, Possible error: round off (0.1397896085) > 0
Iteration: 4/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 5/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 6/77979067, Possible error: round off (0.1334150282) > 0
Iteration: 7/77979067, Possible error: round off (0.1334150282) > 0
Iteration: 8/77979067, Possible error: round off (0.1688922839) > 0
Iteration: 9/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 10/77979067, Possible error: round off (0.1401436378) > 0
Iteration: 11/77979067, Possible error: round off (0.1356045784) > 0
Iteration: 12/77979067, Possible error: round off (0.1440369386) > 0
Iteration: 13/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 14/77979067, Possible error: round off (0.1471612118) > 0
Iteration: 15/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 16/77979067, Possible error: round off (0.1373752097) > 0
Iteration: 17/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 18/77979067, Possible error: round off (0.1383202986) > 0
Iteration: 19/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 20/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 21/77979067, Possible error: round off (0.1356798274) > 0
Iteration: 22/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 23/77979067, Possible error: round off (0.1351397599) > 0
Iteration: 24/77979067, Possible error: round off (0.1334150282) > 0
Iteration: 26/77979067, Possible error: round off (0.1351397599) > -23943
Iteration: 27/77979067, Possible error: round off (0.1351397599) > -96928
Iteration: 30/77979067, Possible error: round off (0.1351397599) > -13599
Iteration: 31/77979067, Possible error: round off (0.1360927037) > -13599
Iteration: 32/77979067, Possible error: round off (0.1462716058) > -13599
Iteration: 33/77979067, Possible error: round off (0.1318965346) > -13599
Iteration: 34/77979067, Possible error: round off (0.1508120858) > -13599
Iteration: 35/77979067, Possible error: round off (0.1354844715) > -13599
Iteration: 36/77979067, Possible error: round off (0.1301119176) > -13599
Iteration: 37/77979067, Possible error: round off (0.141492689) > -13599
Iteration: 38/77979067, Possible error: round off (0.1330696416) > -13599
Iteration: 39/77979067, Possible error: round off (0.13587089) > -13599
Iteration: 40/77979067, Possible error: round off (0.1363885126) > -13599
Iteration: 41/77979067, Possible error: round off (0.1480976757) > -13599
Iteration: 42/77979067, Possible error: round off (0.1353371833) > -13599
Iteration: 43/77979067, Possible error: round off (0.1297843534) > -13599
Iteration: 44/77979067, Possible error: round off (0.1496594111) > -13599
Iteration: 45/77979067, Possible error: round off (0.1391846252) > -13599
Iteration: 46/77979067, Possible error: round off (0.1279003836) > -13599
Iteration: 47/77979067, Possible error: round off (0.142696835) > -13599
Iteration: 48/77979067, Possible error: round off (0.1402622526) > -13599
Iteration: 49/77979067, Possible error: round off (0.1429398311) > -13599
Iteration: 50/77979067, Possible error: round off (0.1426179457) > -13599
Iteration: 129/77979067, Possible error: round off (0.1429085169) > -13599
Iteration: 257/77979067, Possible error: round off (0.1594783021) > -13599
[Wed Oct 24 08:51:59 2018]
Trying 1000 iterations for exponent 77979067 using 4096K FFT.
If average roundoff error is above 0.143, then a larger FFT will be used.
Final average roundoff error is 0.2176, using 4480K FFT for exponent 77979067.
Last fiddled with by Chuck on 2018-10-24 at 12:54 Reason: Restarted with version 29.4 |
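The last three log lines describe a simple decision: run a trial at the smaller FFT, then move up if the average round-off exceeds a limit. A minimal sketch of that rule as the message states it follows. The function name and signature are hypothetical, and the real program's selection logic is not shown in this thread.

```python
def choose_fft_k(avg_roundoff: float, limit: float, smaller_k: int, larger_k: int) -> int:
    """Pick the larger FFT when the trial's average round-off exceeds the limit.

    Simplified reading of the log message; not Prime95's actual code.
    """
    return larger_k if avg_roundoff > limit else smaller_k


# Values from the log: trial at 4096K, limit 0.143, measured average 0.2176.
print(choose_fft_k(0.2176, 0.143, 4096, 4480))
```

With the logged values the rule picks 4480K, matching the final log line.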
|
|
|
|
|
#31 |
|
Einyen
Dec 2003
Denmark
3158₁₀ Posts |
You could try adding FFT2=4480K to the worktodo.txt line for the 29.5 version, like this:
PRP=<assignmentkey>,FFT2=4480K,1,2,77979067,-1 |
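If you need to do this for several exponents, the line can be generated. The helper below is a hypothetical convenience, not a Prime95 tool; the field order is copied from the single example line above, and the names `k`, `b`, `c` are assumed to correspond to k·b^n+c as in the JSON result earlier in the thread.

```python
def prp_worktodo_line(assignment_key: str, exponent: int, fft_k: int,
                      k: int = 1, b: int = 2, c: int = -1) -> str:
    """Build a PRP worktodo.txt line with a forced FFT size.

    Field order is taken from the example in this post (an assumption
    beyond that one example).
    """
    return f"PRP={assignment_key},FFT2={fft_k}K,{k},{b},{exponent},{c}"


print(prp_worktodo_line("<assignmentkey>", 77979067, 4480))
```

Replace `<assignmentkey>` with the real assignment ID from your existing worktodo.txt entry.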
|
|
|
|
|
#32 |
|
May 2011
Orange Park, FL
3·5·59 Posts |
I made the post to point out that something is wrong with the new version.
|
|
|
|
|
|
#33 | |
|
P90 years forever!
Aug 2002
Yeehaw, FL
5·11·137 Posts |
Quote:
My gut reaction (I could well be wrong) is that there is a memory corruption problem when running multithreaded FFTs. You were running 5 threads per worker. tshinozk had a problem with a 10-core, 4- or 5-worker benchmark. By contrast, I've been running single-threaded PRP tests for the last few months without an issue. |
|
|
|
|