#1
Dec 2003
23 Posts
Awright -- a while back I had this strange occurrence 38% of the way through a primality test:
Code:
[18:14] Sven [suttang:prime/gimps] > mprime -d
Mersenne number primality test program version 23.9
Resuming primality test of M34469543 at iteration 13314917 [38.62%]
Iteration: 13314924/34469543, ERROR: ROUND OFF (0.4375) > 0.40
Continuing from last save file.
Resuming primality test of M34469543 at iteration 13314917 [38.62%]
Disregard last error. Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Resuming primality test of M34469543 at iteration 13314917 [38.62%]
Iteration: 13314924/34469543, ERROR: SUM(INPUTS) != SUM(OUTPUTS), -1.039789371852531e+16 != -1.039789371852522e+16
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
Waiting five minutes before restarting.
Segmentation fault

http://episteme.arstechnica.com/eve/...001974631/p/27 And commented as follows:

So first it tells me it is hardware, then it tells me to disregard that. Then it tells me again that it is hardware. Then it sits for five minutes, then it segfaults. This can be repeated indefinitely - I tried it a couple times, all the same result.

HOWEVER: that same machine shows no signs of problems on a 60-hour (Friday evening to Monday morning) torture test.

I'd hate to lose the current crunch, but is it possible that the current save-file is borked? Should I throw it out and re-start? Should I give up GIMPS? Should I join a monastery? Any advice?

At the time I decided to move the save-files out of the way and restart the client and see what happens. As it turns out the box picked up nicely and started crunching just fine. Until two days ago, when the end of my log-file shows this:
Code:
[May 31 01:42] Iteration: 17100000 / 34564069 [49.47%]. Per iteration time: 0.069 sec.
[May 31 03:36] Iteration: 17200000 / 34564069 [49.76%]. Per iteration time: 0.069 sec.
[May 31 05:30] Iteration: 17300000 / 34564069 [50.05%]. Per iteration time: 0.069 sec.
Iteration: 17317988/34564069, ERROR: ROUND OFF (0.40625) > 0.40
Continuing from last save file.
Resuming primality test of M34564069 at iteration 17308161 [50.07%]
Disregard last error. Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Resuming primality test of M34564069 at iteration 17317981 [50.10%]
Iteration: 17317988/34564069, ERROR: SUM(INPUTS) != SUM(OUTPUTS), 1.203546292570829e+16 != 1.20354629257089e+16
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
Waiting five minutes before restarting.

So what is going on here? Why would my box run fine for two weeks and then suddenly have a hardware failure? Or maybe NOT have one. But not be restartable. Except that it can be restarted just fine if I discard the current crunch.

I'm baffled. Anybody got an idea? (FWIW, this is a P4/2.8, running RH10)
#2
P90 years forever!
Aug 2002
Yeehaw, FL
2²×5×397 Posts
Your hardware is fine. Looks like a bug in the "redoing with slower, more reliable method" code.
Try this until I can investigate further:
1) Copy the save file in case you need to email it to me for debugging.
2) Add "NearFFTLimitPct=0.0" to prime.ini
3) Restart mprime.
Let us know if that helps.
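For anyone following along, the three steps above could be scripted roughly as below. This is only a sketch: `backup_and_set` is a made-up helper name, the `.bak` suffix is an arbitrary choice, and the save-file name is left as a placeholder because the thread never states it.

```shell
#!/bin/sh
# Hypothetical helper (not part of mprime): back up a save file, then add a
# setting to an ini file.
# Usage: backup_and_set SAVE_FILE 'Key=Value' [INI_FILE]
backup_and_set() {
    save_file=$1
    setting=$2
    ini=${3:-prime.ini}

    # Step 1: keep an untouched copy of the save file for later debugging.
    cp -p -- "$save_file" "$save_file.bak" || return 1

    # Step 2: append the setting unless that exact line is already present.
    if ! grep -qxF -- "$setting" "$ini" 2>/dev/null; then
        printf '%s\n' "$setting" >> "$ini"
    fi
}

# Step 3 (restart) is left to the caller. SAVE_FILE is a placeholder for
# whatever save file sits in your mprime directory:
# backup_and_set "$SAVE_FILE" 'NearFFTLimitPct=0.0' && ./mprime -d
```

Checking for the line before appending means the helper can be run more than once without leaving duplicate entries in prime.ini.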
#3
Dec 2003
23 Posts
Well it was a thought, I suppose:
Code:
[9:43] Sven [suttang:prime/gimps] > echo "NearFFTLimitPct=0.0" >> prime.ini
[9:43] Sven [suttang:prime/gimps] > tail -2 prime.ini
SilentVictory=0
NearFFTLimitPct=0.0
[9:43] Sven [suttang:prime/gimps] > ./mprime -d
Mersenne number primality test program version 23.9
Resuming primality test of M34564069 at iteration 17317981 [50.10%]
Iteration: 17317988/34564069, ERROR: ROUND OFF (0.40625) > 0.40
Continuing from last save file.
Resuming primality test of M34564069 at iteration 17317981 [50.10%]
Disregard last error. Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Resuming primality test of M34564069 at iteration 17317981 [50.10%]
Iteration: 17317988/34564069, ERROR: SUM(INPUTS) != SUM(OUTPUTS), 1.203546292570829e+16 != 1.20354629257089e+16
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
Waiting five minutes before restarting.
Segmentation fault

That alone didn't do it.

But if the variable NearFFTLimitPct is what I think it is (i.e. what its name suggests), wouldn't it have to be set to something greater than zero to make a difference?
#4
P90 years forever!
Aug 2002
Yeehaw, FL
1F04₁₆ Posts
Try "NearFFTLimit=-2".
This is just a hack to force mprime to not run error-checking every iteration. It will still run error-checking every 128 iterations and thus you may still run into the problem. I'm trying to debug it now.
#5
Dec 2003
27₈ Posts
Code:
[17:58] Sven [suttang:prime/gimps] > tail -3 prime.ini
TwoBackupFiles=1
NearFFTLimit=-2
SilentVictory=0
[17:58] Sven [suttang:prime/gimps] > ./mprime -d
Mersenne number primality test program version 23.9
Contacting PrimeNet Server.
Updating computer information on the server
Sending expected completion date for M34564069: Jun 19 2005
Done communicating with server.
Resuming primality test of M34564069 at iteration 17317981 [50.10%]
Iteration: 17317988/34564069, ERROR: ROUND OFF (0.40625) > 0.40
Continuing from last save file.
Resuming primality test of M34564069 at iteration 17317981 [50.10%]
Disregard last error. Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Resuming primality test of M34564069 at iteration 17317981 [50.10%]
Iteration: 17317988/34564069, ERROR: SUM(INPUTS) != SUM(OUTPUTS), 1.203546292570829e+16 != 1.20354629257089e+16
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
Waiting five minutes before restarting.
Segmentation fault

Of course if "every 128 iterations" includes "the zeroth iteration" and it is the first iteration after the current checkpoint, then this was bound to fail...

Well, let me know if there's anything else I should try -- I'd be happy to try every switch you might have in the software...

(I'm slightly baffled here -- am I the only person who uses 23.9 under Linux on a P4? Or what exactly is the determining parameter here?)
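A quick arithmetic check on the iteration numbers from the log above may help with the "every 128 iterations" question. It is only a sketch, and it assumes the periodic check is keyed to the absolute iteration counter, which nothing in this thread confirms.

```shell
#!/bin/sh
# Values taken from the log output above.
resume=17317981   # iteration the save file resumes at
failed=17317988   # iteration that reports the error

# How many iterations after resuming does the failure occur?
offset=$((failed - resume))

# Position of each iteration relative to a 128-iteration boundary,
# ASSUMING the check is tied to the absolute iteration number.
resume_mod=$((resume % 128))
failed_mod=$((failed % 128))

echo "failure occurs $offset iterations after resuming"
echo "resume iteration mod 128 = $resume_mod"
echo "failed iteration mod 128 = $failed_mod"
```

This reports an offset of 7 and remainders of 93 and 100, so under that assumption neither iteration sits on a multiple of 128. The failure always lands 7 iterations after the restart, which would fit error-checking that is active for the first few iterations after a resume, but that is a guess.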
#6
P90 years forever!
Aug 2002
Yeehaw, FL
1F04₁₆ Posts
Quote:
I'll keep you posted. You could edit local.ini and change 1835008 to 2097152. Run a few iterations and then change it back.
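The edit-and-revert could be done along these lines. This is a sketch with assumptions: `swap_value` and `bits_per_word` are made-up helper names, the name of the local.ini key holding the number is not given in the thread (so the substitution matches on the value alone), and reading the two numbers as FFT lengths (1792K and 2048K) is an inference from their sizes.

```shell
#!/bin/sh
# Hypothetical helper: replace one numeric value with another in an ini file,
# keeping a .bak copy of the file as it was before the edit.
# Usage: swap_value OLD NEW [INI_FILE]
swap_value() {
    old=$1
    new=$2
    ini=${3:-local.ini}
    # Match the value only at the end of a Key=Value line so that longer
    # numbers containing the same digits are left alone.
    sed -i.bak "s/=${old}\$/=${new}/" "$ini"
}

# Exponent bits carried per element, ASSUMING the number is an FFT length.
# Usage: bits_per_word EXPONENT LENGTH
bits_per_word() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.2f\n", p / n }'
}

# Example session (run in the mprime directory):
# swap_value 1835008 2097152   # switch to the larger value
# ./mprime -d                  # run a few iterations, then stop it
# swap_value 2097152 1835008   # change it back
```

If the FFT-length reading is right, `bits_per_word 34564069 1835008` gives 18.84 and `bits_per_word 34564069 2097152` gives 16.48, so the larger length leaves more headroom before round-off error approaches the 0.40 limit.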
#7
P90 years forever!
Aug 2002
Yeehaw, FL
2²×5×397 Posts
Ugh. I've been debugging 24.12 and the bug I thought I found did not exist in version 23.9. At least a bug was squashed. Back to the drawing board. I wish I had a Linux SSE2 machine here to debug on.
Have you tried version 24.11? If not, give it a whirl. I don't feel comfortable putting up a fixed version 24.12 just yet. You can get 24.11 from ftp://mersenne.org/gimps/mpr2411.tgz
#8
Dec 2003
23 Posts
Quote:
Talk about something I'd never have dreamed of trying...
Quote:
For now, thanks for the hint -- changing the SoftCrossover was apparently all that was needed...