2019-02-18, 20:50   #5
GP2

I am doing PRP tests of Wagstaff exponents.

At first it looked as though several dozen machines had resumed their 29.5 savefiles without a problem and only one machine was affected. Edit: in fact, all of the exponents on all of the machines seem to have a problem similar to the one reported by Simon.

I tried moving the 29.5 savefiles aside and starting the same exponent from scratch with 29.6, but the same problem occurred.

The Wagstaff exponent in question is 9081307 (edit: it's not just this exponent). I am testing another exponent now to see if the problem is the exponent or the machine.


The exponent passes the Gerbicz error check at iteration p−1 (9081306), but then fails somehow in the final processing. The program then goes into an infinite error loop and never attempts to resume from earlier savefiles. Only when I interrupt it with SIGINT does it start to process earlier savefiles, and by then it is too late, because the SIGINT makes it terminate right after.

When I manually delete the more recent savefiles and try to resume from older savefiles (at iteration 9 million and at 8 million), the same problem happens: success until iteration p−1 and then the same infinite error loop.


Code:
PRP=1,2,9081307,1,"3"
Code:
WorkerThreads=1
CoresPerTest=2
HyperthreadLL=1
Code:
PRPBase=3
PRPResidueType=5
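
For readers unfamiliar with the worktodo entry above: as far as I understand the format, PRP=1,2,9081307,1,"3" describes k·b^n+c = 1·2^9081307+1 with the known factor 3 divided out, which matches the (2^9081307+1)/3 that mprime prints in its log. The following is only a minimal Python sketch of a plain base-3 Fermat PRP test on small Wagstaff numbers. The function names are made up for illustration, and it does not reproduce mprime's FFT arithmetic, its Gerbicz error checking, or whatever PRPResidueType=5 computes internally.

```python
def wagstaff_number(p: int) -> int:
    """Return (2^p + 1) / 3, which is an integer for every odd p."""
    if p % 2 == 0:
        raise ValueError("p must be odd")
    return (2**p + 1) // 3


def is_fermat_prp(n: int, base: int = 3) -> bool:
    """Plain Fermat test: n is a base-`base` probable prime if base^(n-1) == 1 (mod n).

    Numbers not larger than the base are rejected, so the tiny case
    p = 3 (where the Wagstaff number is 3 itself) reports False here.
    """
    return n > base and pow(base, n - 1, n) == 1


def is_wagstaff_prp(p: int, base: int = 3) -> bool:
    """Hypothetical helper: Fermat PRP test of the Wagstaff number for exponent p."""
    return is_fermat_prp(wagstaff_number(p), base)


if __name__ == "__main__":
    # Small exponents only; 9081307 would be far too slow this way.
    for p in (5, 7, 11, 13, 17, 19, 23, 29, 31, 43):
        print(p, is_wagstaff_prp(p))
```

This is only meant to show what is being tested. A real run at p = 9081307 needs the FFT-based multiplication that mprime uses.
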
results.txt:

Code:
[Mon Feb 18 05:44:04 2019]
ERROR: Comparing PRP double-check values failed.  Rolling back to iteration 9081306.
Continuing from last save file.
ERROR: Comparing PRP double-check values failed.  Rolling back to iteration 9081306.
Continuing from last save file.
ERROR: Comparing PRP double-check values failed.  Rolling back to iteration 9081306.
Continuing from last save file.
ERROR: Comparing PRP double-check values failed.  Rolling back to iteration 9081306.
Continuing from last save file.
...
(about 80,000 lines like this, and growing rapidly)
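
With tens of thousands of identical lines, it is easier to summarize results.txt than to read it. Here is a rough Python sketch that counts the rollback errors per iteration. The regular expression is written against the exact ERROR line shown above, and the function name is my own, so adjust both if your file looks different.

```python
import re
from collections import Counter

# Matches the rollback line exactly as it appears in results.txt above.
ROLLBACK_RE = re.compile(
    r"ERROR: Comparing PRP double-check values failed\.\s+"
    r"Rolling back to iteration (\d+)\."
)


def count_rollbacks(lines):
    """Return a Counter mapping rollback iteration -> number of occurrences."""
    counts = Counter()
    for line in lines:
        match = ROLLBACK_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Assumes results.txt is in the current working directory.
    with open("results.txt", encoding="utf-8", errors="replace") as f:
        for iteration, n in sorted(count_rollbacks(f).items()):
            print(f"iteration {iteration}: {n} rollbacks")
```

If every rollback points at the same iteration (9081306 here), that suggests the failure is in the final step rather than a random hardware error somewhere in the run.
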
I sent a SIGINT to stop the program. Then the following lines appeared at the bottom of the results.txt file, after all the tens of thousands of ERROR lines:

Code:
Error reading intermediate file: p9081307
Renaming p9081307 to p9081307.bad1
Trying backup intermediate file: p9081307.bu
The rename succeeded, but the program then terminated right away because of the SIGINT.


I tried restarting from each savefile (at iterations 8 million, 9 million, and higher), and all of them eventually gave the same error.

For instance, resuming from iteration 9 million (the .bu3 file):

Code:
$ ./mprime -d
[Main thread Feb 18 18:23] Mersenne number primality test program version 29.6
[Main thread Feb 18 18:23] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 2x1 MB, L3 cache size: 25344 KB
[Main thread Feb 18 18:23] Starting worker.
[Work thread Feb 18 18:23] Worker starting
[Work thread Feb 18 18:23] Setting affinity to run worker on CPU core #1
[Work thread Feb 18 18:23] Setting affinity to run helper thread 1 on CPU core #1
[Work thread Feb 18 18:23] Setting affinity to run helper thread 3 on CPU core #2
[Work thread Feb 18 18:23] Setting affinity to run helper thread 2 on CPU core #2
[Work thread Feb 18 18:23] Trying backup intermediate file: p9081307.bu3
[Work thread Feb 18 18:23] Resuming Gerbicz error-checking PRP test of (2^9081307+1)/3 using all-complex AVX-512 FFT length 480K, Pass1=128, Pass2=3840, clm=2, 4 threads
[Work thread Feb 18 18:23] Iteration: 9000001 / 9081307 [99.10%].
[Work thread Feb 18 18:23] Iteration: 9010000 / 9081307 [99.21%], ms/iter:  0.712, ETA: 00:00:50
[Work thread Feb 18 18:23] Iteration: 9020000 / 9081307 [99.32%], ms/iter:  0.710, ETA: 00:00:43
[Work thread Feb 18 18:23] Iteration: 9030000 / 9081307 [99.43%], ms/iter:  0.710, ETA: 00:00:36
[Work thread Feb 18 18:23] Iteration: 9040000 / 9081307 [99.54%], ms/iter:  0.712, ETA: 00:00:29
[Work thread Feb 18 18:23] Iteration: 9050000 / 9081307 [99.65%], ms/iter:  0.710, ETA: 00:00:22
[Work thread Feb 18 18:23] Iteration: 9060000 / 9081307 [99.76%], ms/iter:  0.710, ETA: 00:00:15
[Work thread Feb 18 18:23] Iteration: 9070000 / 9081307 [99.87%], ms/iter:  0.711, ETA: 00:00:08
[Work thread Feb 18 18:23] Iteration: 9080000 / 9081307 [99.98%], ms/iter:  0.711, ETA: 00:00:00
[Work thread Feb 18 18:24] Gerbicz error check passed at iteration 9081225.
[Work thread Feb 18 18:24] Gerbicz error check passed at iteration 9081306.
[Work thread Feb 18 18:24] ERROR: Comparing PRP double-check values failed.  Rolling back to iteration 9081306.
[Work thread Feb 18 18:24] Continuing from last save file.
[Work thread Feb 18 18:24] Setting affinity to run helper thread 1 on CPU core #1
[Work thread Feb 18 18:24] Setting affinity to run helper thread 3 on CPU core #2
[Work thread Feb 18 18:24] Setting affinity to run helper thread 2 on CPU core #2
[Work thread Feb 18 18:24] Trying backup intermediate file: p9081307.bu3
[Work thread Feb 18 18:24] Resuming Gerbicz error-checking PRP test of (2^9081307+1)/3 using all-complex AVX-512 FFT length 480K, Pass1=128, Pass2=3840, clm=2, 4 threads
[Work thread Feb 18 18:24] Iteration: 9000001 / 9081307 [99.10%].
[Work thread Feb 18 18:24] Hardware errors have occurred during the test!
[Work thread Feb 18 18:24] 1 Gerbicz/double-check error.
[Work thread Feb 18 18:24] Confidence in final result is excellent.
[Work thread Feb 18 18:24] Iteration: 9010000 / 9081307 [99.21%], ms/iter:  0.714, ETA: 00:00:50
[Work thread Feb 18 18:24] Hardware errors have occurred during the test!
[Work thread Feb 18 18:24] 1 Gerbicz/double-check error.
[Work thread Feb 18 18:24] Confidence in final result is excellent.
[Work thread Feb 18 18:24] Iteration: 9020000 / 9081307 [99.32%], ms/iter:  0.713, ETA: 00:00:43
[Work thread Feb 18 18:24] Hardware errors have occurred during the test!
[Work thread Feb 18 18:24] 1 Gerbicz/double-check error.
[Work thread Feb 18 18:24] Confidence in final result is excellent.
[Work thread Feb 18 18:24] Iteration: 9030000 / 9081307 [99.43%], ms/iter:  0.712, ETA: 00:00:36
[Work thread Feb 18 18:24] Hardware errors have occurred during the test!
[Work thread Feb 18 18:24] 1 Gerbicz/double-check error.
[Work thread Feb 18 18:24] Confidence in final result is excellent.
[Work thread Feb 18 18:24] Iteration: 9040000 / 9081307 [99.54%], ms/iter:  0.713, ETA: 00:00:29
[Work thread Feb 18 18:24] Hardware errors have occurred during the test!
[Work thread Feb 18 18:24] 1 Gerbicz/double-check error.
[Work thread Feb 18 18:24] Confidence in final result is excellent.
[Work thread Feb 18 18:24] Iteration: 9050000 / 9081307 [99.65%], ms/iter:  0.713, ETA: 00:00:22
[Work thread Feb 18 18:24] Hardware errors have occurred during the test!
[Work thread Feb 18 18:24] 1 Gerbicz/double-check error.
[Work thread Feb 18 18:24] Confidence in final result is excellent.
[Work thread Feb 18 18:24] Iteration: 9060000 / 9081307 [99.76%], ms/iter:  0.714, ETA: 00:00:15
[Work thread Feb 18 18:24] Hardware errors have occurred during the test!
[Work thread Feb 18 18:24] 1 Gerbicz/double-check error.
[Work thread Feb 18 18:24] Confidence in final result is excellent.
[Work thread Feb 18 18:24] Iteration: 9070000 / 9081307 [99.87%], ms/iter:  0.715, ETA: 00:00:08
[Work thread Feb 18 18:24] Hardware errors have occurred during the test!
[Work thread Feb 18 18:24] 1 Gerbicz/double-check error.
[Work thread Feb 18 18:24] Confidence in final result is excellent.
[Work thread Feb 18 18:24] Iteration: 9080000 / 9081307 [99.98%], ms/iter:  0.713, ETA: 00:00:00
[Work thread Feb 18 18:24] Hardware errors have occurred during the test!
[Work thread Feb 18 18:24] 1 Gerbicz/double-check error.
[Work thread Feb 18 18:24] Confidence in final result is excellent.
[Work thread Feb 18 18:24] Gerbicz error check passed at iteration 9081225.
[Work thread Feb 18 18:24] Gerbicz error check passed at iteration 9081306.
[Work thread Feb 18 18:24] ERROR: Comparing PRP double-check values failed.  Rolling back to iteration 9081306.
[Work thread Feb 18 18:24] Continuing from last save file.
[Work thread Feb 18 18:24] Setting affinity to run helper thread 1 on CPU core #1
[Work thread Feb 18 18:24] Setting affinity to run helper thread 3 on CPU core #2
[Work thread Feb 18 18:24] Setting affinity to run helper thread 2 on CPU core #2
[Work thread Feb 18 18:24] Trying backup intermediate file: p9081307.bu3
[Work thread Feb 18 18:24] Resuming Gerbicz error-checking PRP test of (2^9081307+1)/3 using all-complex AVX-512 FFT length 480K, Pass1=128, Pass2=3840, clm=2, 4 threads
[Work thread Feb 18 18:24] Iteration: 9000001 / 9081307 [99.10%].
[Work thread Feb 18 18:24] Hardware errors have occurred during the test!
[Work thread Feb 18 18:24] 2 Gerbicz/double-check errors.
[Work thread Feb 18 18:24] Confidence in final result is excellent.
[Work thread Feb 18 18:25] Iteration: 9010000 / 9081307 [99.21%], ms/iter:  0.711, ETA: 00:00:50
[Work thread Feb 18 18:25] Hardware errors have occurred during the test!
[Work thread Feb 18 18:25] 2 Gerbicz/double-check errors.
[Work thread Feb 18 18:25] Confidence in final result is excellent.
[Work thread Feb 18 18:25] Iteration: 9020000 / 9081307 [99.32%], ms/iter:  0.711, ETA: 00:00:43
[Work thread Feb 18 18:25] Hardware errors have occurred during the test!
[Work thread Feb 18 18:25] 2 Gerbicz/double-check errors.
[Work thread Feb 18 18:25] Confidence in final result is excellent.
[Work thread Feb 18 18:25] Iteration: 9030000 / 9081307 [99.43%], ms/iter:  0.711, ETA: 00:00:36
[Work thread Feb 18 18:25] Hardware errors have occurred during the test!
[Work thread Feb 18 18:25] 2 Gerbicz/double-check errors.
[Work thread Feb 18 18:25] Confidence in final result is excellent.
[Main thread Feb 18 18:25] Stopping all worker threads.
[Work thread Feb 18 18:25] Stopping PRP test of (2^9081307+1)/3 at iteration 9036608 [99.50%]
[Work thread Feb 18 18:25] Worker stopped.
[Main thread Feb 18 18:25] Execution halted.
If you let it continue, eventually it reaches "15 or more Gerbicz/double-check errors."

Last fiddled with by GP2 on 2019-02-18 at 21:07