mersenneforum.org  

Old 2020-09-17, 19:11   #1
phma
 
Sep 2020

101₂ Posts
Temporarily abandoning work unit

I'm running mprime on a Ryzen 5 3600X 6-core, and it got a first test assignment expected to take a few weeks. Earlier today I noticed it go down to one thread, which normally means it's communicating with the server. I checked the log and this was the case. Then it stopped running completely. I checked the directory and tailed the log. They look like this:

phma@puma:~/mersenne$ vdir -rt
total 170276
-rw-r--r-- 1 phma phma 55153 Feb 9 2018 whatsnew.txt
-rw-r--r-- 1 phma phma 36601 Feb 9 2018 undoc.txt
-rw-r--r-- 1 phma phma 20019 Feb 9 2018 readme.txt
-rw-r--r-- 1 phma phma 2110 Feb 9 2018 license.txt
-rw-r--r-- 1 phma phma 6860 Feb 9 2018 stress.txt
-rwxrwxr-x 1 phma phma 34259741 Feb 9 2018 mprime
-rwxr-xr-x 1 phma phma 545695 Feb 9 2018 libgmp.so.10.3.2
lrwxrwxrwx 1 phma phma 16 Feb 9 2018 libgmp.so.10 -> libgmp.so.10.3.2
lrwxrwxrwx 1 phma phma 16 Feb 9 2018 libgmp.so -> libgmp.so.10.3.2
-rw-rw-r-- 1 phma phma 177 Sep 8 20:01 prime.txt
-rw-rw-r-- 1 phma phma 65 Sep 14 16:27 worktodo.txt
-rw-rw-r-- 1 phma phma 13930528 Sep 15 18:39 p111443669.bad1
-rw-rw-r-- 1 phma phma 13930528 Sep 16 15:39 p111443669.bad4
-rw-rw-r-- 1 phma phma 13930528 Sep 16 16:09 p111443669.bad3
-rw-rw-r-- 1 phma phma 13930528 Sep 16 16:39 p111443669.bad2
-rw-rw-r-- 1 phma phma 13930528 Sep 17 01:09 p111443669.bad7
-rw-rw-r-- 1 phma phma 13930528 Sep 17 01:39 p111443669.bad6
-rw-rw-r-- 1 phma phma 13930528 Sep 17 02:01 p111443669.bad5
-rw-rw-r-- 1 phma phma 11307 Sep 17 02:05 gwnum.txt
-rw-rw-r-- 1 phma phma 13930528 Sep 17 12:39 p111443669.bad10
-rw-rw-r-- 1 phma phma 13930528 Sep 17 13:09 p111443669.bad9
-rw-rw-r-- 1 phma phma 13930528 Sep 17 13:39 p111443669.bad8
-rw-rw-r-- 1 phma phma 454 Sep 17 14:01 local.txt
-rw-rw-r-- 1 phma phma 7324 Sep 17 14:09 prime.log
-rw-rw-r-- 1 phma phma 33503 Sep 17 14:20 results.txt
phma@puma:~/mersenne$ tail results.txt
Trying backup intermediate file: p111443669.bad4
Error reading intermediate file: p111443669.bad4
Trying backup intermediate file: p111443669.bad3
Error reading intermediate file: p111443669.bad3
Trying backup intermediate file: p111443669.bad2
Error reading intermediate file: p111443669.bad2
Trying backup intermediate file: p111443669.bad1
[Thu Sep 17 14:20:29 2020]
Error reading intermediate file: p111443669.bad1
All intermediate files bad. Temporarily abandoning work unit.

What should I do about this? I've seen bad files before, but back then it had another assignment to work on. Now it has only one:
[Worker #1]
Test=<snip>,111443669,76,1
Old 2020-09-17, 20:09   #2
moebius

Jul 2009
Germany

5·67 Posts

Which version of mprime are you using? Your system looks very unstable; the usual suspects are RAM, CPU, temperatures, etc.

Last fiddled with by moebius on 2020-09-17 at 20:09
Old 2020-09-17, 23:42   #3
phma

Sep 2020

5 Posts

29.4 build 8.

I found this line in the log before these bad files appeared:
Iteration: 17454649/111443669, ERROR: Jacobi error check failed!

How do I find out what the problem is? The temperature is okay according to the manufacturer.
Old 2020-09-18, 00:17   #4
moebius

Jul 2009
Germany

5×67 Posts

Quote:
Originally Posted by phma
How do I find out what the problem is?
This error message is often discussed here on the forum, e.g.:
https://mersenneforum.org/showthread...%21#post522052
You could run a torture test with prime95/mprime, or test your memory with a tool such as memtest. The best thing to do is to switch to PRP which corrects most errors. The latest version is 30.3 build 6. Also check whether the XMP (or AMP) profile for your specific RAM modules is activated in the BIOS.
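
For context, here is a minimal sketch of the kind of consistency check behind the "Jacobi error check failed" message. It assumes the check relies on the identity that every Lucas-Lehmer residue s_i with i >= 1 satisfies Jacobi(s_i - 2, 2^p - 1) = -1. All function names are illustrative; this is not mprime's actual code.

```python
def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a/n) for odd positive n."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be odd and positive")
    a %= n
    result = 1
    while a != 0:
        # Pull out factors of two, using the rule for (2/n).
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        # Quadratic reciprocity: swap, flipping the sign if both are 3 mod 4.
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def ll_residues(p: int):
    """Yield (i, s_i) for the Lucas-Lehmer sequence mod 2**p - 1, i = 1..p-2."""
    m = (1 << p) - 1
    s = 4
    for i in range(1, p - 1):
        s = (s * s - 2) % m
        yield i, s


def jacobi_check(p: int, s: int) -> bool:
    """True if the residue s passes the (assumed) Jacobi consistency check."""
    m = (1 << p) - 1
    return jacobi((s - 2) % m, m) == -1


def count_detected_bit_flips(p: int, at_iteration: int) -> int:
    """Flip each of the p bits of s_i in turn; count corruptions the check flags."""
    s = dict(ll_residues(p))[at_iteration]
    return sum(1 for bit in range(p) if not jacobi_check(p, s ^ (1 << bit)))


if __name__ == "__main__":
    # Every uncorrupted residue should pass.
    print(all(jacobi_check(13, s) for _, s in ll_residues(13)))
    # Only some corrupted residues are caught.
    print(count_detected_bit_flips(13, 5), "of 13 single-bit flips detected")
```

A check of this kind can only say that a residue is wrong, not which iteration went wrong, and a corrupted residue is expected to slip through roughly half of the time. That is consistent with the log above: once the check fails, the program has to fall back to an earlier save file.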

Last fiddled with by moebius on 2020-09-18 at 01:00
Old 2020-09-18, 04:22   #5
phma

Sep 2020

5 Posts

I'm now running 30.3 build 6. How do I switch to PRP?
Old 2020-09-18, 04:40   #6
moebius

Jul 2009
Germany

5·67 Posts

In worktodo.txt, replace
Test=XXXXXXXXXXXXXXXXXXXXXXXXXX,111443669,76,1
with
PRP=XXXXXXXXXXXXXXXXXXXXXXXXXX,111443669,-1,76,0,3,0
Then restart mprime, and do not communicate manually with the server until the exponent is done; otherwise you may lose the assignment.

Last fiddled with by moebius on 2020-09-18 at 04:48
Old 2020-09-18, 07:22   #7
phma

Sep 2020

5₁₆ Posts

That got me this message as soon as I restarted it:
PRP test of 111443669*18446744073709551615^76+0 aborted -- number is divisible by 3
And I got a different assignment:
Test=<snip>,111992581,76,0
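
The abort message can be reproduced by hand. The sketch below assumes that the fields after the assignment ID in a PRP= line are read as k, b, n, c (for k*b^n+c), which is what the message suggests, and that b = -1 ended up as an unsigned 64-bit value. The function name is illustrative.

```python
def parse_prp_fields(line: str):
    """Split 'PRP=<id>,k,b,n,c,...' into (k, b, n, c); the field order is assumed."""
    _, _, rest = line.partition("=")
    fields = rest.split(",")
    k, b, n, c = (int(x) for x in fields[1:5])
    return k, b, n, c


line = "PRP=XXXXXXXXXXXXXXXXXXXXXXXXXX,111443669,-1,76,0,3,0"
k, b, n, c = parse_prp_fields(line)

# Assumption: -1 is stored in an unsigned 64-bit field, so it wraps to 2**64 - 1.
b_unsigned = b % 2**64
value = k * b_unsigned**n + c

print(b_unsigned)   # 18446744073709551615
print(value % 3)    # 0, because 2**64 - 1 is a multiple of 3
```

Under that reading, the exponent 111443669 landed in the k position, so the program was asked to test a completely different (and trivially composite) number.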
Old 2020-09-18, 07:29   #8
retina
Undefined

"The unspeakable one"
Jun 2006
My evil lair

2⁵×179 Posts

Quote:
Originally Posted by moebius
The best thing to do is to switch to PRP which corrects most errors.
Yes, PRP tests are good because of their more robust error checking. But let's be clear here: PRP can't correct errors; it only detects them (with high probability) and redoes the computation from a previous save file. If the save files are themselves corrupted, you might end up with the same problem: multiple bad save files and nothing good to recover from.
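
To illustrate "detect, then redo from a saved state", here is a minimal sketch of one well-known scheme of this kind, the Gerbicz check, applied to repeated squaring modulo 2^p - 1. The thread does not name the method, so treating it as what mprime does is an assumption; the function name, the block size, and the simulated error are all hypothetical. For simplicity the check runs after every block (real implementations would check far less often), and the final partial block is left unchecked.

```python
def checked_prp(p: int, block: int = 4, corrupt_at=None):
    """Compute 3**(2**p) mod (2**p - 1) with a Gerbicz-style check per block.

    Returns (residue, rollbacks). A residue of 9 means 2**p - 1 is a
    base-3 probable prime.
    """
    n = (1 << p) - 1
    x0 = 3
    x, d, i = x0, x0, 0        # x = 3**(2**i); d = product of block checkpoints
    saved = (x, d, i)          # last verified state, standing in for a save file
    rollbacks = 0
    while i < p:
        x = x * x % n
        i += 1
        if i == corrupt_at:
            x ^= 1             # simulate a one-off hardware error
            corrupt_at = None
        if i % block == 0:
            d_new = d * x % n
            # Identity: d_new must equal x0 * d**(2**block) mod n.
            expected = x0 * pow(d, 1 << block, n) % n
            if d_new == expected:
                d = d_new
                saved = (x, d, i)
            else:
                x, d, i = saved    # error detected: redo from the last good state
                rollbacks += 1
    return x, rollbacks


print(checked_prp(13))                 # clean run
print(checked_prp(13, corrupt_at=5))   # one injected error, one rollback
```

The recovery path depends entirely on `saved` being intact, which is the point made above: the check flags a bad block, but a corrupted save leaves nothing good to restart from.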
Old 2020-09-18, 07:58   #9
S485122

Sep 2006
Brussels, Belgium

2×5×157 Posts

Quote:
Originally Posted by moebius
In worktodo.txt, replace
Test=XXXXXXXXXXXXXXXXXXXXXXXXXX,111443669,76,1
with
PRP=XXXXXXXXXXXXXXXXXXXXXXXXXX,111443669,-1,76,0,3,0
Then restart mprime, and do not communicate manually with the server until the exponent is done; otherwise you may lose the assignment.
Your suggestion seems incorrect, as illustrated by the result the OP posted:
Quote:
Originally Posted by phma
That got me this message as soon as I restarted it:
PRP test of 111443669*18446744073709551615^76+0 aborted -- number is divisible by 3
And I got a different assignment:
Test=<snip>,111992581,76,0
You forgot the ",1,2". It should have been:
PRP=XXXXXXXXXXXXXXXXXXXXXXXXXX,1,2,111443669,-1,76,0,3,0
or simpler
PRP=XXXXXXXXXXXXXXXXXXXXXXXXXX,1,2,111443669,-1

Jacob
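
The simpler form above can be generated mechanically. This is a minimal sketch that assumes a Test= line has the layout shown in this thread (assignment ID, exponent, then further fields that are dropped); the function name is illustrative, and the output should be checked by eye before it goes into worktodo.txt.

```python
def convert_test_line(line: str) -> str:
    """Rewrite 'Test=<id>,<exponent>,...' as 'PRP=<id>,1,2,<exponent>,-1'.

    Lines that do not start with 'Test=' are returned unchanged.
    """
    stripped = line.strip()
    key, sep, rest = stripped.partition("=")
    if key != "Test" or not sep:
        return stripped
    fields = rest.split(",")
    assignment_id = fields[0]
    exponent = int(fields[1])      # raises ValueError if the exponent is malformed
    # k=1, b=2, n=exponent, c=-1 describes the number 1*2^exponent-1.
    return f"PRP={assignment_id},1,2,{exponent},-1"


print(convert_test_line("Test=XXXXXXXXXXXXXXXXXXXXXXXXXX,111443669,76,1"))
# PRP=XXXXXXXXXXXXXXXXXXXXXXXXXX,1,2,111443669,-1
```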
Old 2020-09-18, 08:17   #10
phma

Sep 2020

5 Posts

How do I tell it to get PRP assignments instead of Test assignments?
Old 2020-09-18, 14:56   #11
moebius

Jul 2009
Germany

5×67 Posts

Quote:
Originally Posted by phma
How do I tell it to get PRP assignments instead of Test assignments?
I am not currently using mprime. In the Windows version, use the pull-down menu Test > Worker Windows > Type of work to get: First time prime test.

If you have an account at mersenne.org, you can change the Work Preference for each CPU to PRP-PRP first test:
Account/Team Info > My Account > CPUs

Or just get the smallest available first-time PRP test manually on the mersenne.org website:
https://www.mersenne.org/manual_assignment/

And yes, retina is right, but the probability that your final result is correct is much higher with PRP than with the Lucas-Lehmer test.

Last fiddled with by moebius on 2020-09-18 at 15:32

All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.