mersenneforum.org

Xyzzy 2012-08-04 00:46

AVX error?
 
Twice in the last week we have had Mprime bail on a P-1 job. We have never had this happen before. Unfortunately we did not copy the screen so the error is hard to remember. We think it said something about AVX. There is no additional data in any of the log files except what is posted below.

If Mprime experiences an error like this, we would prefer that it move on to the next job rather than retry the same one over and over. Maybe move on if it errors out 100 times?
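A minimal sketch of the retry cap suggested above. This is not Mprime's code: `JobError`, `run_queue`, and `run_job` are hypothetical names used only for illustration, and the pause between retries is left out.

```python
from typing import Callable, Iterable, List, Tuple


class JobError(Exception):
    """Hypothetical stand-in for a recoverable error such as SUMOUT."""


def run_queue(
    jobs: Iterable[str],
    run_job: Callable[[str], None],
    max_errors: int = 100,
) -> Tuple[List[str], List[str]]:
    """Run each job, retrying on JobError, and skip a job after max_errors failures."""
    done: List[str] = []
    skipped: List[str] = []
    for job in jobs:
        errors = 0
        while True:
            try:
                run_job(job)
                done.append(job)
                break
            except JobError:
                errors += 1
                if errors >= max_errors:
                    # Give up on this job and go on to the next one.
                    skipped.append(job)
                    break
                # A real client would pause here before retrying.
    return done, skipped
```

With a cap like this, one job that keeps failing costs a bounded number of retries instead of blocking the rest of the queue.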

FWIW, the box is Mprime stable (24 hours) and not overclocked. Maybe it is one of those boundary FFT errors?

When we restarted Mprime last time it finished the job with no issues. We do not remember where in the job it restarted. This time we noticed that it restarted in the middle of stage 2.

Here is the environment and job:
[CODE][Main thread Aug 3 19:35] Mersenne number primality test program version 27.6
[Main thread Aug 3 19:35] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 256 KB, L3 cache size: 8 MB
[Main thread Aug 3 19:35] Logical CPUs 1,5 form one physical CPU.
[Main thread Aug 3 19:35] Logical CPUs 2,6 form one physical CPU.
[Main thread Aug 3 19:35] Logical CPUs 3,7 form one physical CPU.
[Main thread Aug 3 19:35] Logical CPUs 4,8 form one physical CPU.
[Main thread Aug 3 19:35] Starting worker.
[Work thread Aug 3 19:35] Worker starting
[Work thread Aug 3 19:35] Setting affinity to run worker on any logical CPU.
[Work thread Aug 3 19:35] Optimal P-1 factoring of M57049273 using up to 8192MB of memory.
[Work thread Aug 3 19:35] Assuming no factors below 2^73 and 2 primality tests saved if a factor is found.
[Work thread Aug 3 19:35] Optimal bounds are B1=535000, B2=10967500
[Work thread Aug 3 19:35] Chance of finding a factor is an estimated 3.69%
[Work thread Aug 3 19:35] Using AVX FFT length 3M, Pass1=512, Pass2=6K, 4 threads
[Work thread Aug 3 19:35] Setting affinity to run helper thread 1 on any logical CPU.
[Work thread Aug 3 19:35] Setting affinity to run helper thread 2 on any logical CPU.
[Work thread Aug 3 19:35] Setting affinity to run helper thread 3 on any logical CPU.
[Work thread Aug 3 19:35] Using 7392MB of memory. Processing 288 relative primes (0 of 288 already processed).[/CODE]

Here is what is in results.txt:
[CODE][Fri Aug 3 16:37:21 2012]
SUMOUT error occurred.
[Fri Aug 3 16:42:30 2012]
SUMOUT error occurred.
[Fri Aug 3 16:47:39 2012]
SUMOUT error occurred.
[Fri Aug 3 16:52:47 2012]
SUMOUT error occurred.
[Fri Aug 3 16:57:56 2012]
SUMOUT error occurred.
[Fri Aug 3 17:03:05 2012]
SUMOUT error occurred.
[Fri Aug 3 17:08:14 2012]
SUMOUT error occurred.
[Fri Aug 3 17:13:23 2012]
SUMOUT error occurred.
[Fri Aug 3 17:18:32 2012]
SUMOUT error occurred.
[Fri Aug 3 17:23:41 2012]
SUMOUT error occurred.
[Fri Aug 3 17:28:50 2012]
SUMOUT error occurred.
[Fri Aug 3 17:33:59 2012]
SUMOUT error occurred.
[Fri Aug 3 17:39:08 2012]
SUMOUT error occurred.
[Fri Aug 3 17:44:17 2012]
SUMOUT error occurred.
[Fri Aug 3 17:49:26 2012]
SUMOUT error occurred.
[Fri Aug 3 17:54:35 2012]
SUMOUT error occurred.
[Fri Aug 3 17:59:44 2012]
SUMOUT error occurred.
[Fri Aug 3 18:04:53 2012]
SUMOUT error occurred.
[Fri Aug 3 18:10:02 2012]
SUMOUT error occurred.
[Fri Aug 3 18:15:11 2012]
SUMOUT error occurred.
[Fri Aug 3 18:20:20 2012]
SUMOUT error occurred.
[Fri Aug 3 18:25:30 2012]
SUMOUT error occurred.
[Fri Aug 3 18:30:39 2012]
SUMOUT error occurred.
[Fri Aug 3 18:35:48 2012]
SUMOUT error occurred.
[Fri Aug 3 18:40:57 2012]
SUMOUT error occurred.
[Fri Aug 3 18:46:06 2012]
SUMOUT error occurred.
[Fri Aug 3 18:51:15 2012]
SUMOUT error occurred.
[Fri Aug 3 18:56:25 2012]
SUMOUT error occurred.
[Fri Aug 3 19:01:34 2012]
SUMOUT error occurred.
[Fri Aug 3 19:06:43 2012]
SUMOUT error occurred.
[Fri Aug 3 19:11:52 2012]
SUMOUT error occurred.
[Fri Aug 3 19:17:02 2012]
SUMOUT error occurred.
[Fri Aug 3 19:22:10 2012]
SUMOUT error occurred.
[Fri Aug 3 19:27:19 2012]
SUMOUT error occurred.[/CODE]

:confused:
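For anyone who wants to summarize a results.txt like the one above, here is a rough sketch that counts the SUMOUT entries and measures the gap between them. It assumes the two-line layout shown in the post (a bracketed timestamp line followed by the error line) and English day and month names; the function names are made up for this example.

```python
import re
from datetime import datetime
from typing import List

# Matches a timestamp line such as "[Fri Aug 3 16:37:21 2012]".
STAMP = re.compile(r"^\[(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\]$")


def sumout_times(text: str) -> List[datetime]:
    """Return the timestamp of every SUMOUT entry found in the text."""
    times: List[datetime] = []
    stamp = None
    for raw in text.splitlines():
        line = raw.strip()
        match = STAMP.match(line)
        if match:
            # Collapse repeated spaces so single-digit days parse cleanly.
            cleaned = " ".join(match.group(1).split())
            stamp = datetime.strptime(cleaned, "%a %b %d %H:%M:%S %Y")
        elif line == "SUMOUT error occurred." and stamp is not None:
            times.append(stamp)
            stamp = None
    return times


def gaps_in_seconds(times: List[datetime]) -> List[int]:
    """Seconds elapsed between consecutive entries."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


sample = """[Fri Aug 3 16:37:21 2012]
SUMOUT error occurred.
[Fri Aug 3 16:42:30 2012]
SUMOUT error occurred.
[Fri Aug 3 16:47:39 2012]
SUMOUT error occurred.
"""

times = sumout_times(sample)
print(len(times), gaps_in_seconds(times))
```

On the first three entries from the log, the gaps come out to 309 seconds each, which is consistent with a five-minute wait followed by a few seconds of work before the error recurs.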

Dubslow 2012-08-04 01:20

This may or may not be completely useless, but you're using 27.6, not 27.7.
[QUOTE=Prime95;297258]1. A slower trial factoring algorithm is chosen sometimes. Fixed in version 27.6 build 2.
2. The round off error can be incorrectly calculated. Fixed in 27.6 build 3.
3. Mprime -v does not print out the build number. Fixed in version 27.6 build 4.
4. Torture test on small FFTs gets round off errors or crashes after a while. Partially fixed in version 27.6 build 4.
[B]5. Torture test on small FFTs gets round off errors or crashes after a while. In very rare cases, regular tests could crash or raise a round off error. Fixed in version 27.7.
6. Multithreaded FFTs could deadlock. Fixed in version 27.7.[/B][/QUOTE]
My emphasis.

Xyzzy 2012-08-04 02:07

Aha! The name format for 27.7 is different. We didn't even notice it.

p95v277.linux64.tar.gz

Plus, who reads the documentation anyways?

:cmd:

Xyzzy 2012-08-08 18:08

A different box, running 27.7, bailed on us today. The previous box is an i7 and this one is an i5.

:help:

[CODE][Work thread Aug 8 12:16] Waiting five minutes before restarting.
[Work thread Aug 8 12:21] Using AVX FFT length 3M, Pass1=512, Pass2=6K, 4 threads
[Work thread Aug 8 12:21] Setting affinity to run helper thread 1 on any logical CPU.
[Work thread Aug 8 12:21] Setting affinity to run helper thread 3 on any logical CPU.
[Work thread Aug 8 12:21] Setting affinity to run helper thread 2 on any logical CPU.
[Work thread Aug 8 12:21] SUMOUT error occurred.
[Work thread Aug 8 12:21] Waiting five minutes before restarting.
[Work thread Aug 8 12:26] Using AVX FFT length 3M, Pass1=512, Pass2=6K, 4 threads
[Work thread Aug 8 12:26] Setting affinity to run helper thread 1 on any logical CPU.
[Work thread Aug 8 12:26] Setting affinity to run helper thread 2 on any logical CPU.
[Work thread Aug 8 12:26] Setting affinity to run helper thread 3 on any logical CPU.
[Work thread Aug 8 12:26] SUMOUT error occurred.
[Work thread Aug 8 12:26] Waiting five minutes before restarting.
[Work thread Aug 8 12:31] Using AVX FFT length 3M, Pass1=512, Pass2=6K, 4 threads
[Work thread Aug 8 12:31] Setting affinity to run helper thread 2 on any logical CPU.
[Work thread Aug 8 12:31] Setting affinity to run helper thread 1 on any logical CPU.
[Work thread Aug 8 12:31] Setting affinity to run helper thread 3 on any logical CPU.
[Work thread Aug 8 12:31] SUMOUT error occurred.
[Work thread Aug 8 12:31] Waiting five minutes before restarting.
[Work thread Aug 8 12:36] Using AVX FFT length 3M, Pass1=512, Pass2=6K, 4 threads
[Work thread Aug 8 12:36] Setting affinity to run helper thread 1 on any logical CPU.
[Work thread Aug 8 12:36] Setting affinity to run helper thread 2 on any logical CPU.
[Work thread Aug 8 12:36] Setting affinity to run helper thread 3 on any logical CPU.
[Work thread Aug 8 12:36] SUMOUT error occurred.
[Work thread Aug 8 12:36] Waiting five minutes before restarting.
[Work thread Aug 8 12:41] Setting affinity to run helper thread 3 on any logical CPU.
[Work thread Aug 8 12:41] Using AVX FFT length 3M, Pass1=512, Pass2=6K, 4 threads
[Work thread Aug 8 12:41] Setting affinity to run helper thread 1 on any logical CPU.
[Work thread Aug 8 12:41] Setting affinity to run helper thread 2 on any logical CPU.
[Work thread Aug 8 12:41] SUMOUT error occurred.
[Work thread Aug 8 12:41] Waiting five minutes before restarting.
[Work thread Aug 8 12:46] Using AVX FFT length 3M, Pass1=512, Pass2=6K, 4 threads
[Work thread Aug 8 12:46] Setting affinity to run helper thread 1 on any logical CPU.
[Work thread Aug 8 12:46] Setting affinity to run helper thread 2 on any logical CPU.
[Work thread Aug 8 12:46] Setting affinity to run helper thread 3 on any logical CPU.
[Work thread Aug 8 12:46] SUMOUT error occurred.
[Work thread Aug 8 12:46] Waiting five minutes before restarting.
[Work thread Aug 8 12:51] Using AVX FFT length 3M, Pass1=512, Pass2=6K, 4 threads
[Work thread Aug 8 12:51] Setting affinity to run helper thread 2 on any logical CPU.
[Work thread Aug 8 12:51] Setting affinity to run helper thread 3 on any logical CPU.
[Work thread Aug 8 12:51] Setting affinity to run helper thread 1 on any logical CPU.
[Work thread Aug 8 12:51] SUMOUT error occurred.
[Work thread Aug 8 12:51] Waiting five minutes before restarting.
[Work thread Aug 8 12:56] Using AVX FFT length 3M, Pass1=512, Pass2=6K, 4 threads
[Work thread Aug 8 12:56] Setting affinity to run helper thread 1 on any logical CPU.
[Work thread Aug 8 12:56] Setting affinity to run helper thread 2 on any logical CPU.
[Work thread Aug 8 12:56] Setting affinity to run helper thread 3 on any logical CPU.
[Work thread Aug 8 12:56] SUMOUT error occurred.
[Work thread Aug 8 12:56] Waiting five minutes before restarting.
[Work thread Aug 8 13:01] Using AVX FFT length 3M, Pass1=512, Pass2=6K, 4 threads
[Work thread Aug 8 13:01] Setting affinity to run helper thread 1 on any logical CPU.
[Work thread Aug 8 13:01] Setting affinity to run helper thread 2 on any logical CPU.
[Work thread Aug 8 13:01] Setting affinity to run helper thread 3 on any logical CPU.
[Work thread Aug 8 13:01] SUMOUT error occurred.
[Work thread Aug 8 13:01] Waiting five minutes before restarting.[/CODE]

[CODE][Main thread Aug 8 13:04] Mersenne number primality test program version 27.7
[Main thread Aug 8 13:04] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 256 KB, L3 cache size: 6 MB
[Main thread Aug 8 13:04] Starting worker.
[Work thread Aug 8 13:04] Worker starting
[Work thread Aug 8 13:04] Setting affinity to run worker on any logical CPU.
[Work thread Aug 8 13:04] Optimal P-1 factoring of M56332841 using up to 10240MB of memory.
[Work thread Aug 8 13:04] Assuming no factors below 2^72 and 2 primality tests saved if a factor is found.
[Work thread Aug 8 13:04] Optimal bounds are B1=565000, B2=12147500
[Work thread Aug 8 13:04] Chance of finding a factor is an estimated 4.2%
[Work thread Aug 8 13:04] Using AVX FFT length 3M, Pass1=512, Pass2=6K, 4 threads
[Work thread Aug 8 13:04] Setting affinity to run helper thread 1 on any logical CPU.
[Work thread Aug 8 13:04] Setting affinity to run helper thread 2 on any logical CPU.
[Work thread Aug 8 13:04] Setting affinity to run helper thread 3 on any logical CPU.[/CODE]


All times are UTC.