Temporarily abandoning work unit
I'm running mprime on a Ryzen 5 3600X 6-core, and it got a first test assignment expected to take a few weeks. Earlier today I noticed it go down to one thread, which normally means it's communicating with the server. I checked the log and this was the case. Then it stopped running completely. I checked the directory and tailed the log. They look like this:
phma@puma:~/mersenne$ vdir -rt
total 170276
-rw-r--r-- 1 phma phma 55153 Feb 9 2018 whatsnew.txt
-rw-r--r-- 1 phma phma 36601 Feb 9 2018 undoc.txt
-rw-r--r-- 1 phma phma 20019 Feb 9 2018 readme.txt
-rw-r--r-- 1 phma phma 2110 Feb 9 2018 license.txt
-rw-r--r-- 1 phma phma 6860 Feb 9 2018 stress.txt
-rwxrwxr-x 1 phma phma 34259741 Feb 9 2018 mprime
-rwxr-xr-x 1 phma phma 545695 Feb 9 2018 libgmp.so.10.3.2
lrwxrwxrwx 1 phma phma 16 Feb 9 2018 libgmp.so.10 -> libgmp.so.10.3.2
lrwxrwxrwx 1 phma phma 16 Feb 9 2018 libgmp.so -> libgmp.so.10.3.2
-rw-rw-r-- 1 phma phma 177 Sep 8 20:01 prime.txt
-rw-rw-r-- 1 phma phma 65 Sep 14 16:27 worktodo.txt
-rw-rw-r-- 1 phma phma 13930528 Sep 15 18:39 p111443669.bad1
-rw-rw-r-- 1 phma phma 13930528 Sep 16 15:39 p111443669.bad4
-rw-rw-r-- 1 phma phma 13930528 Sep 16 16:09 p111443669.bad3
-rw-rw-r-- 1 phma phma 13930528 Sep 16 16:39 p111443669.bad2
-rw-rw-r-- 1 phma phma 13930528 Sep 17 01:09 p111443669.bad7
-rw-rw-r-- 1 phma phma 13930528 Sep 17 01:39 p111443669.bad6
-rw-rw-r-- 1 phma phma 13930528 Sep 17 02:01 p111443669.bad5
-rw-rw-r-- 1 phma phma 11307 Sep 17 02:05 gwnum.txt
-rw-rw-r-- 1 phma phma 13930528 Sep 17 12:39 p111443669.bad10
-rw-rw-r-- 1 phma phma 13930528 Sep 17 13:09 p111443669.bad9
-rw-rw-r-- 1 phma phma 13930528 Sep 17 13:39 p111443669.bad8
-rw-rw-r-- 1 phma phma 454 Sep 17 14:01 local.txt
-rw-rw-r-- 1 phma phma 7324 Sep 17 14:09 prime.log
-rw-rw-r-- 1 phma phma 33503 Sep 17 14:20 results.txt
phma@puma:~/mersenne$ tail results.txt
Trying backup intermediate file: p111443669.bad4
Error reading intermediate file: p111443669.bad4
Trying backup intermediate file: p111443669.bad3
Error reading intermediate file: p111443669.bad3
Trying backup intermediate file: p111443669.bad2
Error reading intermediate file: p111443669.bad2
Trying backup intermediate file: p111443669.bad1
[Thu Sep 17 14:20:29 2020]
Error reading intermediate file: p111443669.bad1
All intermediate files bad. Temporarily abandoning work unit.
What should I do about this? I've seen bad files before, but that time it had another assignment to move on to. Now it has only one:
[Worker #1]
Test=<snip>,111443669,76,1 |
Which version of mprime are you using? Your system seems to be very unstable, possibly because of RAM, CPU, temperatures, etc.
|
29.4 build 8.
I found this line in the log before these bad files appeared: Iteration: 17454649/111443669, ERROR: Jacobi error check failed! How do I find out what is the problem? The temperature is okay according to the manufacturer. |
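One way to see how often such errors have occurred is to count the relevant messages in the log. The following is only a minimal sketch: the search strings are copied from the messages quoted in this thread, other mprime versions may word them differently, and the name `count_errors` is made up for illustration.

```python
from collections import Counter

# Substrings copied from the log lines quoted in this thread (assumption:
# your mprime version uses the same wording).
PATTERNS = {
    "jacobi": "Jacobi error check failed",
    "bad_savefile": "Error reading intermediate file",
    "abandoned": "Temporarily abandoning work unit",
}

def count_errors(log_text):
    """Count how many log lines contain each known error message."""
    counts = Counter()
    for line in log_text.splitlines():
        for name, needle in PATTERNS.items():
            if needle in line:
                counts[name] += 1
    return counts
```

It could be used as `count_errors(open("results.txt", errors="replace").read())`. A count that keeps growing between runs would point to a hardware problem rather than a one-off glitch.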
[QUOTE=phma;557259]How do I find out what is the problem?[/QUOTE]
This error message is often discussed here in the forum, e.g. [URL="https://mersenneforum.org/showthread.php?p=522052&highlight=Jacobi+error+check+failed%21#post522052"]https://mersenneforum.org/showthread.php?p=522052&highlight=Jacobi+error+check+failed%21#post522052[/URL] You could run a torture test with prime95/mprime or perform a memory test with the memtest tool. The best thing to do is to switch to PRP which corrects most errors. The latest version is 30.3 build 6. Check whether the X(AMP) profile for your specific RAM modules is activated in the BIOS. |
I'm now running 30.3 build 6, how do I switch to PRP?
|
replace in worktodo.txt
Test=XXXXXXXXXXXXXXXXXXXXXXXXXX,111443669,76,1 with PRP=XXXXXXXXXXXXXXXXXXXXXXXXXX,111443669,-1,76,0,3,0 restart mprime and do no manual communication with the server till the exponent is done, otherwise you may lose the assignment. |
That got me this message as soon as I restarted it:
PRP test of 111443669*18446744073709551615^76+0 aborted -- number is divisible by 3
And I got a different assignment:
Test=<snip>,111992581,76,0 |
[QUOTE=moebius;557261]The best thing to do is to switch to PRP which corrects most errors.[/QUOTE]Yes, PRP tests are good because of the more robust error checking. But let's be clear here, it can't correct errors, it only detects errors (with high probability) and redoes computations from a previous save file. If the save file(s) is(are) corrupted then you might end up with the same problem of multiple bad save files with nothing good to recover from.
|
[QUOTE=moebius;557268]replace in worktodo.txt
Test=XXXXXXXXXXXXXXXXXXXXXXXXXX,111443669,76,1 with PRP=XXXXXXXXXXXXXXXXXXXXXXXXXX,111443669,-1,76,0,3,0 restart mprime and do no manual communication with the server till the exponent is done, otherwise you may lose the assignment.[/QUOTE]
Your suggestion seems incorrect, as illustrated by the result the OP posted:
[QUOTE=phma;557273]That got me this message as soon as I restarted it: PRP test of 111443669*18446744073709551615^76+0 aborted -- number is divisible by 3 And I got a different assignment: Test=<snip>,111992581,76,0[/QUOTE]
You forgot the ",1,2"; it should have been:
PRP=XXXXXXXXXXXXXXXXXXXXXXXXXX,1,2,111443669,-1,76,0,3,0
or simpler:
PRP=XXXXXXXXXXXXXXXXXXXXXXXXXX,1,2,111443669,-1
Jacob |
How do I tell it to get PRP assignments instead of Test assignments?
|
[QUOTE=phma;557277]How do I tell it to get PRP assignments instead of Test assignments?[/QUOTE]
I am not currently using mprime, but in Prime95 on Windows it is the pulldown menu [B]Test > Worker Windows[/B], type of work to get: [B]First time prime test.[/B] If you have an account at mersenne.org you can change your Work Preference for every single CPU to PRP-PRP first test. [B]Account/Team Info > My Account > CPUs[/B] Or just get the smallest available first-time PRP test manually on the mersenne.org website: [URL="https://www.mersenne.org/manual_assignment/"]https://www.mersenne.org/manual_assignment/[/URL] And yes, retina is right, but the probability that your result with PRP is correct in the end is much higher than with the Lucas-Lehmer test. |