mersenneforum.org

Temporarily abandoning work unit (https://www.mersenneforum.org/showthread.php?t=25976)

phma 2020-09-17 19:11

Temporarily abandoning work unit
 
I'm running mprime on a Ryzen 5 3600X 6-core, and it got a first test assignment expected to take a few weeks. Earlier today I noticed it go down to one thread, which normally means it's communicating with the server. I checked the log and this was the case. Then it stopped running completely. I checked the directory and tailed the log. They look like this:

phma@puma:~/mersenne$ vdir -rt
total 170276
-rw-r--r-- 1 phma phma 55153 Feb 9 2018 whatsnew.txt
-rw-r--r-- 1 phma phma 36601 Feb 9 2018 undoc.txt
-rw-r--r-- 1 phma phma 20019 Feb 9 2018 readme.txt
-rw-r--r-- 1 phma phma 2110 Feb 9 2018 license.txt
-rw-r--r-- 1 phma phma 6860 Feb 9 2018 stress.txt
-rwxrwxr-x 1 phma phma 34259741 Feb 9 2018 mprime
-rwxr-xr-x 1 phma phma 545695 Feb 9 2018 libgmp.so.10.3.2
lrwxrwxrwx 1 phma phma 16 Feb 9 2018 libgmp.so.10 -> libgmp.so.10.3.2
lrwxrwxrwx 1 phma phma 16 Feb 9 2018 libgmp.so -> libgmp.so.10.3.2
-rw-rw-r-- 1 phma phma 177 Sep 8 20:01 prime.txt
-rw-rw-r-- 1 phma phma 65 Sep 14 16:27 worktodo.txt
-rw-rw-r-- 1 phma phma 13930528 Sep 15 18:39 p111443669.bad1
-rw-rw-r-- 1 phma phma 13930528 Sep 16 15:39 p111443669.bad4
-rw-rw-r-- 1 phma phma 13930528 Sep 16 16:09 p111443669.bad3
-rw-rw-r-- 1 phma phma 13930528 Sep 16 16:39 p111443669.bad2
-rw-rw-r-- 1 phma phma 13930528 Sep 17 01:09 p111443669.bad7
-rw-rw-r-- 1 phma phma 13930528 Sep 17 01:39 p111443669.bad6
-rw-rw-r-- 1 phma phma 13930528 Sep 17 02:01 p111443669.bad5
-rw-rw-r-- 1 phma phma 11307 Sep 17 02:05 gwnum.txt
-rw-rw-r-- 1 phma phma 13930528 Sep 17 12:39 p111443669.bad10
-rw-rw-r-- 1 phma phma 13930528 Sep 17 13:09 p111443669.bad9
-rw-rw-r-- 1 phma phma 13930528 Sep 17 13:39 p111443669.bad8
-rw-rw-r-- 1 phma phma 454 Sep 17 14:01 local.txt
-rw-rw-r-- 1 phma phma 7324 Sep 17 14:09 prime.log
-rw-rw-r-- 1 phma phma 33503 Sep 17 14:20 results.txt
phma@puma:~/mersenne$ tail results.txt
Trying backup intermediate file: p111443669.bad4
Error reading intermediate file: p111443669.bad4
Trying backup intermediate file: p111443669.bad3
Error reading intermediate file: p111443669.bad3
Trying backup intermediate file: p111443669.bad2
Error reading intermediate file: p111443669.bad2
Trying backup intermediate file: p111443669.bad1
[Thu Sep 17 14:20:29 2020]
Error reading intermediate file: p111443669.bad1
All intermediate files bad. Temporarily abandoning work unit.

What should I do about this? I've seen bad files before, but it had another assignment. Now it has only one:
[Worker #1]
Test=<snip>,111443669,76,1

moebius 2020-09-17 20:09

Which version of mprime are you using? Your system seems very unstable; the cause could be RAM, CPU, temperatures, etc.

phma 2020-09-17 23:42

29.4 build 8.

I found this line in the log before these bad files appeared:
Iteration: 17454649/111443669, ERROR: Jacobi error check failed!

How do I find out what is the problem? The temperature is okay according to the manufacturer.
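
One way to see where in the run the error occurred is to pull the numbers out of that log line. The following is a minimal sketch, assuming error lines follow the same shape as the one quoted above; the helper names parse_error_line and scan_log are hypothetical, and other mprime messages may be formatted differently.

```python
import re

# Pattern modelled on the single log line quoted above (an assumption);
# other mprime messages may be formatted differently.
ERROR_LINE = re.compile(r"Iteration: (\d+)/(\d+), ERROR: (.+)")

def parse_error_line(line):
    """Return (iteration, total, message), or None if the line does not match."""
    m = ERROR_LINE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), m.group(3).strip()

def scan_log(path):
    """Return every matching error entry found in a log file (hypothetical helper)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return [hit for hit in map(parse_error_line, fh) if hit is not None]

sample = "Iteration: 17454649/111443669, ERROR: Jacobi error check failed!"
iteration, total, message = parse_error_line(sample)
print(f"{message} at {100 * iteration / total:.1f}% of the test")
```

Running scan_log on the log file that contains the line would list every such error, which makes it easier to see whether failures cluster at particular times.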

moebius 2020-09-18 00:17

[QUOTE=phma;557259]How do I find out what is the problem?[/QUOTE]

This error message is often discussed here on the forum, e.g.:
[URL="https://mersenneforum.org/showthread.php?p=522052&highlight=Jacobi+error+check+failed%21#post522052"]https://mersenneforum.org/showthread.php?p=522052&highlight=Jacobi+error+check+failed%21#post522052[/URL]
You could run a torture test with prime95/mprime or perform a memory test with the memtest tool. The best thing to do is to switch to PRP which corrects most errors. The latest version is 30.3 build 6. Check whether the XMP (or AMP) profile for your specific RAM modules is activated in the BIOS.

phma 2020-09-18 04:22

I'm now running 30.3 build 6, how do I switch to PRP?

moebius 2020-09-18 04:40

replace in worktodo.txt
Test=XXXXXXXXXXXXXXXXXXXXXXXXXX,111443669,76,1
with
PRP=XXXXXXXXXXXXXXXXXXXXXXXXXX,111443669,-1,76,0,3,0
restart mprime and do no manual communication with the server till the exponent is done, otherwise you may lose the assignment.

phma 2020-09-18 07:22

That got me this message as soon as I restarted it:
PRP test of 111443669*18446744073709551615^76+0 aborted -- number is divisible by 3
And I got a different assignment:
Test=<snip>,111992581,76,0
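
The abort message is consistent with the four numbers after the assignment ID being read as k, b, n, c of a number of the form k*b^n+c. The sketch below reproduces the printed expression under that reading. It is a reconstruction from the message only, and the unsigned 64-bit wrap-around of -1 is an assumption, not something taken from the mprime source.

```python
# Hypothetical reconstruction: read the first four numbers after the
# assignment ID as k, b, n, c of a number of the form k*b^n+c.
fields = "111443669,-1,76,0,3,0".split(",")
k, b, n, c = (int(x) for x in fields[:4])

# Assumption: b is held as an unsigned 64-bit integer, so -1 wraps to 2^64 - 1.
b_unsigned = b % 2**64

candidate = k * b_unsigned**n + c
print(f"{k}*{b_unsigned}^{n}+{c}")
print("divisible by 3:", candidate % 3 == 0)
```

Since 2^64 - 1 is itself a multiple of 3, any number of this shape with c = 0 is divisible by 3, which matches the reason given for the abort.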

retina 2020-09-18 07:29

[QUOTE=moebius;557261]The best thing to do is to switch to PRP which corrects most errors.[/QUOTE]Yes, PRP tests are good because of the more robust error checking. But let's be clear here, it can't correct errors, it only detects errors (with high probability) and redoes computations from a previous save file. If the save file(s) is(are) corrupted then you might end up with the same problem of multiple bad save files with nothing good to recover from.

S485122 2020-09-18 07:58

[QUOTE=moebius;557268]replace in worktodo.txt
Test=XXXXXXXXXXXXXXXXXXXXXXXXXX,111443669,76,1
with
PRP=XXXXXXXXXXXXXXXXXXXXXXXXXX,111443669,-1,76,0,3,0
restart mprime and do no manual communication with the server till the exponent is done, otherwise you may lose the assignment.[/QUOTE]Your suggestion seems incorrect as illustrated by the result the OP posted :[QUOTE=phma;557273]That got me this message as soon as I restarted it:
PRP test of 111443669*18446744073709551615^76+0 aborted -- number is divisible by 3
And I got a different assignment:
Test=<snip>,111992581,76,0[/QUOTE]You forgot the ",1,2", it should have been :
PRP=XXXXXXXXXXXXXXXXXXXXXXXXXX,1,2,111443669,-1,76,0,3,0
or simpler
PRP=XXXXXXXXXXXXXXXXXXXXXXXXXX,1,2,111443669,-1

Jacob
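
For anyone who wants to script the change, here is a minimal sketch that rewrites a Test line into the simpler PRP form given above. It assumes the line starts with an assignment ID followed by the exponent, as in the lines quoted in this thread; convert_test_line is a hypothetical helper, not part of mprime.

```python
def convert_test_line(line):
    """Rewrite 'Test=AID,exponent,...' as 'PRP=AID,1,2,exponent,-1'.

    Assumes the first field is the assignment ID and the second is the
    exponent, as in the worktodo.txt lines quoted in this thread.
    """
    keyword, _, rest = line.strip().partition("=")
    if keyword != "Test":
        return line.strip()  # leave section headers and other work types alone
    fields = rest.split(",")
    if len(fields) < 2:
        raise ValueError(f"unexpected Test line: {line!r}")
    aid, exponent = fields[0], fields[1]
    # k=1, b=2, n=exponent, c=-1 describes the Mersenne number 2^exponent - 1
    return f"PRP={aid},1,2,{exponent},-1"

old = "Test=XXXXXXXXXXXXXXXXXXXXXXXXXX,111443669,76,1"
print(convert_test_line(old))
```

Stop mprime before editing worktodo.txt and keep a copy of the original file, so the change can be undone if the new line is rejected.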

phma 2020-09-18 08:17

How do I tell it to get PRP assignments instead of Test assignments?

moebius 2020-09-18 14:56

[QUOTE=phma;557277]How do I tell it to get PRP assignments instead of Test assignments?[/QUOTE]

I am not currently using mprime. In the Windows version, use the pull-down menu [B]Test > Worker Windows[/B] and set "Type of work to get" to [B]First time prime test.[/B]

If you have an account at mersenne.org, you can change the Work Preference for each individual CPU to PRP first test.
[B]Account/Team Info > My Account > CPUs[/B]
Or just get the smallest available first-time PRP test manually on the mersenne.org website.
[URL="https://www.mersenne.org/manual_assignment/"]https://www.mersenne.org/manual_assignment/[/URL]

And yes, retina is right, but the probability that your result is correct in the end is much higher with PRP than with the Lucas-Lehmer test.


All times are UTC.