#1
Sep 2020
2²·3 Posts
I'm running mprime on a Ryzen 5 3600X 6-core, and it got a first-time test assignment expected to take a few weeks. Earlier today I noticed it drop to one thread, which normally means it's communicating with the server. I checked the log, and that was the case. Then it stopped running completely. I listed the directory and tailed the results file. They look like this:
phma@puma:~/mersenne$ vdir -rt
total 170276
-rw-r--r-- 1 phma phma 55153 Feb 9 2018 whatsnew.txt
-rw-r--r-- 1 phma phma 36601 Feb 9 2018 undoc.txt
-rw-r--r-- 1 phma phma 20019 Feb 9 2018 readme.txt
-rw-r--r-- 1 phma phma 2110 Feb 9 2018 license.txt
-rw-r--r-- 1 phma phma 6860 Feb 9 2018 stress.txt
-rwxrwxr-x 1 phma phma 34259741 Feb 9 2018 mprime
-rwxr-xr-x 1 phma phma 545695 Feb 9 2018 libgmp.so.10.3.2
lrwxrwxrwx 1 phma phma 16 Feb 9 2018 libgmp.so.10 -> libgmp.so.10.3.2
lrwxrwxrwx 1 phma phma 16 Feb 9 2018 libgmp.so -> libgmp.so.10.3.2
-rw-rw-r-- 1 phma phma 177 Sep 8 20:01 prime.txt
-rw-rw-r-- 1 phma phma 65 Sep 14 16:27 worktodo.txt
-rw-rw-r-- 1 phma phma 13930528 Sep 15 18:39 p111443669.bad1
-rw-rw-r-- 1 phma phma 13930528 Sep 16 15:39 p111443669.bad4
-rw-rw-r-- 1 phma phma 13930528 Sep 16 16:09 p111443669.bad3
-rw-rw-r-- 1 phma phma 13930528 Sep 16 16:39 p111443669.bad2
-rw-rw-r-- 1 phma phma 13930528 Sep 17 01:09 p111443669.bad7
-rw-rw-r-- 1 phma phma 13930528 Sep 17 01:39 p111443669.bad6
-rw-rw-r-- 1 phma phma 13930528 Sep 17 02:01 p111443669.bad5
-rw-rw-r-- 1 phma phma 11307 Sep 17 02:05 gwnum.txt
-rw-rw-r-- 1 phma phma 13930528 Sep 17 12:39 p111443669.bad10
-rw-rw-r-- 1 phma phma 13930528 Sep 17 13:09 p111443669.bad9
-rw-rw-r-- 1 phma phma 13930528 Sep 17 13:39 p111443669.bad8
-rw-rw-r-- 1 phma phma 454 Sep 17 14:01 local.txt
-rw-rw-r-- 1 phma phma 7324 Sep 17 14:09 prime.log
-rw-rw-r-- 1 phma phma 33503 Sep 17 14:20 results.txt
phma@puma:~/mersenne$ tail results.txt
Trying backup intermediate file: p111443669.bad4
Error reading intermediate file: p111443669.bad4
Trying backup intermediate file: p111443669.bad3
Error reading intermediate file: p111443669.bad3
Trying backup intermediate file: p111443669.bad2
Error reading intermediate file: p111443669.bad2
Trying backup intermediate file: p111443669.bad1
[Thu Sep 17 14:20:29 2020]
Error reading intermediate file: p111443669.bad1
All intermediate files bad. Temporarily abandoning work unit.
What should I do about this? I've seen bad files before, but back then it had another assignment to move on to. Now it has only one:
[Worker #1]
Test=<snip>,111443669,76,1

#2
Jul 2009
Germany
2·353 Posts
Which version of mprime are you using? It looks like your system is very unstable; the cause could be RAM, CPU, temperatures, etc.
Last fiddled with by moebius on 2020-09-17 at 20:09 |

#3
Sep 2020
14₈ Posts
29.4 build 8.
I found this line in the log from before these bad files appeared:
Iteration: 17454649/111443669, ERROR: Jacobi error check failed!
How do I find out what the problem is? According to the manufacturer's specifications, the temperature is okay.
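For context, here is a minimal sketch of what a Jacobi check does, assuming the textbook Lucas-Lehmer recurrence (s_0 = 4, s_(i+1) = s_i^2 - 2 mod M_p). For i >= 1 the Jacobi symbol (s_i - 2 | M_p) should always be -1, so any other value means some iteration was miscomputed. This is a toy in pure Python on tiny exponents, not mprime's implementation; the function names and the `corrupt_at` error-injection parameter are hypothetical.

```python
from typing import Optional


def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a | n) for odd n > 0, computed via quadratic reciprocity."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n
    result = 1
    while a != 0:
        # Remove factors of 2: (2 | n) is -1 exactly when n is 3 or 5 mod 8.
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        # Reciprocity: swap a and n, flipping the sign if both are 3 mod 4.
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0  # 0 means gcd(a, n) > 1


def lucas_lehmer_with_jacobi(p: int, corrupt_at: Optional[int] = None):
    """Toy Lucas-Lehmer test of 2^p - 1 (p an odd prime) with a Jacobi check.

    corrupt_at is a hypothetical hook that adds 1 to the residue at that
    iteration, standing in for a hardware error.
    """
    m = (1 << p) - 1
    s = 4                               # s_0
    for i in range(1, p - 1):           # computes s_1 .. s_(p-2)
        s = (s * s - 2) % m
        if i == corrupt_at:
            s = (s + 1) % m
        # For a correctly computed s_i (i >= 1), (s_i - 2 | m) should be -1.
        if jacobi(s - 2, m) != -1:
            return ("jacobi_error", i)
    return ("prime" if s == 0 else "composite", p - 2)


print(lucas_lehmer_with_jacobi(127))   # ('prime', 125)
print(lucas_lehmer_with_jacobi(11))    # ('composite', 9)
```

Calling it with, say, `corrupt_at=10` may or may not trip the check, because a wrong residue can still give -1 by chance. So a failed Jacobi check points at a real miscalculation (usually unreliable hardware), while a passing one proves nothing on its own.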

#4
Jul 2009
Germany
2×353 Posts
This error message is discussed often here on the forum, e.g.
https://mersenneforum.org/showthread...%21#post522052
You could run a torture test with prime95/mprime or a memory test with the memtest tool. The best thing to do is to switch to PRP, which corrects most errors. The latest version is 30.3 build 6. Also check whether the XMP/AMP profile for your specific RAM modules is activated in the BIOS.
Last fiddled with by moebius on 2020-09-18 at 01:00

#5
Sep 2020
2²×3 Posts
I'm now running 30.3 build 6. How do I switch to PRP?

#6
Jul 2009
Germany
1302₈ Posts
Replace
Test=XXXXXXXXXXXXXXXXXXXXXXXXXX,111443669,76,1
in worktodo.txt with
PRP=XXXXXXXXXXXXXXXXXXXXXXXXXX,111443669,-1,76,0,3,0
then restart mprime and don't communicate manually with the server until the exponent is done; otherwise you may lose the assignment.
Last fiddled with by moebius on 2020-09-18 at 04:48

#7
Sep 2020
2²×3 Posts
That got me this message as soon as I restarted it:
PRP test of 111443669*18446744073709551615^76+0 aborted -- number is divisible by 3
And I got a different assignment:
Test=<snip>,111992581,76,0
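The strange number in that message can be reproduced from the worktodo line. The message itself shows the fields after the assignment ID being read as k, b, n, c of k*b^n+c: the exponent landed in the k slot, 76 became n, and the -1 in the b slot shows up as 18446744073709551615, which is 2^64 - 1, i.e. -1 apparently held as an unsigned 64-bit value. Since 2^64 - 1 is divisible by 3, so is the whole number. A minimal sketch under those assumptions; `describe_prp_line` is a hypothetical helper, not mprime's parser.

```python
def describe_prp_line(line: str) -> str:
    """Show the k*b^n+c number a PRP= line seems to request (hypothetical helper).

    Assumes an assignment ID followed by k, b, n, c, and mimics b being
    held as an unsigned 64-bit integer, so that -1 wraps to 2**64 - 1.
    """
    fields = line.split("=", 1)[1].split(",")
    k, b, n, c = (int(f) for f in fields[1:5])   # fields[0] is the ID
    b_unsigned = b % (1 << 64)
    return f"{k}*{b_unsigned}^{n}{c:+d}"


bad = "PRP=XXXXXXXXXXXXXXXXXXXXXXXXXX,111443669,-1,76,0,3,0"
print(describe_prp_line(bad))  # 111443669*18446744073709551615^76+0

# 2^64 = 4^32 = 1 (mod 3), so 2^64 - 1 and every power of it are divisible by 3.
assert (2**64 - 1) % 3 == 0
assert pow(2**64 - 1, 76, 3) == 0
```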

#8
Undefined
"The unspeakable one"
Jun 2006
My evil lair
6,793 Posts
Yes, PRP tests are good because of their more robust error checking. But let's be clear here: it can't correct errors. It only detects errors (with high probability) and redoes the computation from a previous save file. If the save files are corrupted, you can end up with the same problem of multiple bad save files and nothing good to recover from.
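The detect-and-redo behaviour described above can be illustrated with a toy. This is a minimal sketch, assuming a base-3 Fermat PRP test of M_p = 2^p - 1 (square 3 p times; if M_p is prime the result is 9, since 3^(M_p+1) = 9 * 3^(M_p-1)) guarded by a simplified Gerbicz-style product check after every block of squarings. It is not mprime's code: the names, the block size, and the `inject_error_at` hook are hypothetical, and practical versions check far less often. The in-memory `saved` state plays the role of the save file; if that state were itself bad, there would be nothing good to roll back to.

```python
from typing import Optional


def prp_with_gerbicz(p: int, block: int = 16, inject_error_at: Optional[int] = None):
    """Toy base-3 Fermat PRP test of 2^p - 1 with a Gerbicz-style check.

    Returns (is_probable_prime, rollbacks). inject_error_at is a hypothetical
    hook that corrupts the residue once, at that (0-based) squaring; it only
    fires inside checked blocks, not in the short unchecked tail.
    """
    n = (1 << p) - 1
    x = 3            # x = 3^(2^i) after i squarings
    d = 3            # product of x at every verified block boundary
    i = 0
    rollbacks = 0
    error_pending = inject_error_at is not None
    while i + block <= p:
        saved = (x, d)                       # last verified state
        for j in range(i, i + block):
            x = x * x % n
            if error_pending and j == inject_error_at:
                x = (x + 1) % n              # simulated hardware error
                error_pending = False
        new_d = d * x % n
        # Invariant: the next product must equal 3 * d^(2^block) (mod n).
        if new_d != 3 * pow(d, 1 << block, n) % n:
            x, d = saved                     # error detected: redo the block
            rollbacks += 1
            continue
        d = new_d
        i += block
    for _ in range(i, p):                    # leftover squarings, unchecked here
        x = x * x % n
    return x == 9 % n, rollbacks


print(prp_with_gerbicz(127))                      # (True, 0)
print(prp_with_gerbicz(127, inject_error_at=20))  # (True, 1)
```

The check detects the injected error and the block is recomputed from the last verified state; nothing is "corrected" in place.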

#9
"Jacob"
Sep 2006
Brussels, Belgium
2·977 Posts
Quote:
Quote:
PRP=XXXXXXXXXXXXXXXXXXXXXXXXXX,1,2,111443669,-1,76,0,3,0
or, simpler,
PRP=XXXXXXXXXXXXXXXXXXXXXXXXXX,1,2,111443669,-1
Jacob

#10
Sep 2020
1100₂ Posts
How do I tell it to get PRP assignments instead of Test assignments?

#11
Jul 2009
Germany
1302₈ Posts
Quote:
If you have an account at mersenne.org, you can change the work preference for each CPU to PRP first-time tests under Account/TeamInfo > MyAccount > CPUs. Or just get the smallest available first-time PRP test manually on the mersenne.org website:
https://www.mersenne.org/manual_assignment/
And yes, retina is right, but the probability that your result is correct in the end is much higher with PRP than with the Lucas-Lehmer test.
Last fiddled with by moebius on 2020-09-18 at 15:32