mersenneforum.org Strange error message

2005-06-04, 03:11   #1
FeLiNe

Dec 2003

23 Posts

Strange error message

Awright -- a while back I had this strange occurrence after 38% of a prime number:

Code:
[18:14] Sven [suttang:prime/gimps] > mprime -d
Mersenne number primality test program version 23.9
Resuming primality test of M34469543 at iteration 13314917 [38.62%]
Iteration: 13314924/34469543, ERROR: ROUND OFF (0.4375) > 0.40
Continuing from last save file.
Resuming primality test of M34469543 at iteration 13314917 [38.62%]
Disregard last error. Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Resuming primality test of M34469543 at iteration 13314917 [38.62%]
Iteration: 13314924/34469543, ERROR: SUM(INPUTS) != SUM(OUTPUTS), -1.039789371852531e+16 != -1.039789371852522e+16
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
Waiting five minutes before restarting.
Segmentation fault

I posted about this on the Ars board http://episteme.arstechnica.com/eve/...001974631/p/27 and commented as follows:

So first it tells me it is hardware, then it tells me to disregard that. Then it tells me again that it is hardware. Then it sits for five minutes, then it segfaults. This can be repeated indefinitely - I tried it a couple times, all the same result. HOWEVER: that same machine shows no signs of problems on a 60-hour (Friday evening to Monday morning) torture test. I'd hate to lose the current crunch, but is it possible that the current save-file is borked? Should I throw it out and re-start? Should I give up GIMPS? Should I join a monastery? Any advice?

At the time I decided to move the save-files out of the way and restart the client and see what happens. As it turns out the box picked up nicely and started crunching just fine. Until two days ago, when the end of my log-file shows this:

Code:
[May 31 01:42] Iteration: 17100000 / 34564069 [49.47%]. Per iteration time: 0.069 sec.
[May 31 03:36] Iteration: 17200000 / 34564069 [49.76%]. Per iteration time: 0.069 sec.
[May 31 05:30] Iteration: 17300000 / 34564069 [50.05%]. Per iteration time: 0.069 sec.
Iteration: 17317988/34564069, ERROR: ROUND OFF (0.40625) > 0.40
Continuing from last save file.
Resuming primality test of M34564069 at iteration 17308161 [50.07%]
Disregard last error. Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Resuming primality test of M34564069 at iteration 17317981 [50.10%]
Iteration: 17317988/34564069, ERROR: SUM(INPUTS) != SUM(OUTPUTS), 1.203546292570829e+16 != 1.20354629257089e+16
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
Waiting five minutes before restarting.

And that's where it ended. I can restart it by hand, but then I get the same thing again as before: "your hardware is junk", "no, wait, disregard that, it isn't", "no, wait, it is", "I'll restart in 5 minutes", "segfault".

So what is going on here? Why would my box run fine for two weeks and then suddenly have a hardware failure? Or maybe NOT have one. But not be restartable. Except that it can be restarted just fine if I discard the current crunch.

I'm baffled. Anybody got an idea?

(FWIW, this is a P4/2.8, running RH10)

2005-06-04, 03:28   #2
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2²×5×397 Posts

Your hardware is fine. Looks like a bug in the "redoing with slower, more reliable method" code. Try this until I can investigate further:

1) Copy the save file in case you need to email it to me for debugging.
2) Add "NearFFTLimitPct=0.0" to prime.ini
3) Restart mprime.

Let us know if that helps.

2005-06-04, 16:51   #3
FeLiNe

Dec 2003

23 Posts

Well it was a thought, I suppose:

Code:
[9:43] Sven [suttang:prime/gimps] > echo "NearFFTLimitPct=0.0" >> prime.ini
[9:43] Sven [suttang:prime/gimps] > tail -2 prime.ini
SilentVictory=0
NearFFTLimitPct=0.0
[9:43] Sven [suttang:prime/gimps] > ./mprime -d
Mersenne number primality test program version 23.9
Resuming primality test of M34564069 at iteration 17317981 [50.10%]
Iteration: 17317988/34564069, ERROR: ROUND OFF (0.40625) > 0.40
Continuing from last save file.
Resuming primality test of M34564069 at iteration 17317981 [50.10%]
Disregard last error. Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Resuming primality test of M34564069 at iteration 17317981 [50.10%]
Iteration: 17317988/34564069, ERROR: SUM(INPUTS) != SUM(OUTPUTS), 1.203546292570829e+16 != 1.20354629257089e+16
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
Waiting five minutes before restarting.
Segmentation fault

That alone didn't do it. But if the variable NearFFTLimitPct is what I think it is (i.e. what its name suggests) wouldn't it have to be set to something greater than zero to make a difference?

2005-06-04, 19:28   #4
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

1F04₁₆ Posts

Try "NearFFTLimit=-2". This is just a hack to force mprime to not run error-checking every iteration. It will still run error-checking every 128 iterations and thus you may still run into the problem.

I'm trying to debug it now.

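As background on the ROUND OFF message: FFT-based multiplication is done in floating point, so every output value should land very close to an integer, and the figure in the log is the largest distance from the nearest integer. The sketch below illustrates that idea together with the "every iteration" versus "every 128 iterations" schedule described in this post. It is only a rough illustration: the function names and sample values are hypothetical, the 0.40 threshold is copied from the log message, and none of this is mprime's actual code.

```python
THRESHOLD = 0.40  # taken from the "> 0.40" in the log message


def max_roundoff(values):
    """Largest distance between any value and its nearest integer (0.0 to 0.5)."""
    return max(abs(v - round(v)) for v in values)


def should_check(iteration, every_iteration, interval=128):
    """Check on every iteration when asked to, otherwise every `interval` iterations."""
    return every_iteration or iteration % interval == 0


# Hypothetical FFT outputs; 3.5625 is 0.4375 away from the nearest integer (4).
outputs = [12.0, 7.03125, 3.5625, 9.96875]
err = max_roundoff(outputs)
if err > THRESHOLD:
    print(f"ERROR: ROUND OFF ({err}) > {THRESHOLD}")
```

An error close to 0.5 means the program can no longer tell which integer the value was supposed to be, which is why the limit sits a little below that.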
2005-06-06, 01:16   #5
FeLiNe

Dec 2003

27₈ Posts

Code:
[17:58] Sven [suttang:prime/gimps] > tail -3 prime.ini
TwoBackupFiles=1
NearFFTLimit=-2
SilentVictory=0
[17:58] Sven [suttang:prime/gimps] > ./mprime -d
Mersenne number primality test program version 23.9
Contacting PrimeNet Server.
Updating computer information on the server
Sending expected completion date for M34564069: Jun 19 2005
Done communicating with server.
Resuming primality test of M34564069 at iteration 17317981 [50.10%]
Iteration: 17317988/34564069, ERROR: ROUND OFF (0.40625) > 0.40
Continuing from last save file.
Resuming primality test of M34564069 at iteration 17317981 [50.10%]
Disregard last error. Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Resuming primality test of M34564069 at iteration 17317981 [50.10%]
Iteration: 17317988/34564069, ERROR: SUM(INPUTS) != SUM(OUTPUTS), 1.203546292570829e+16 != 1.20354629257089e+16
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
Waiting five minutes before restarting.
Segmentation fault

Nope, doesn't work either. Of course if "every 128 iterations" includes "the zeroth iteration" and it is the first iteration after the current checkpoint, then this was bound to fail...

Well, let me know if there's anything else I should try -- I'd be happy to try every switch you might have in the software...

(I'm slightly baffled here -- am I the only person who uses 23.9 under Linux on a P4? Or what exactly is the determining parameter here?)

2005-06-06, 01:38   #6
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

1F04₁₆ Posts

Quote:
 Originally Posted by FeLiNe (I'm slightly baffled here -- am I the only person who uses 23.9 under Linux on a P4? Or what exactly is the determining parameter here?)
I have one other user with the same problem. You have to be testing an exponent near the limit of what the FFT can handle. I'm still not sure if the SUMINP != SUMOUT is due to an unlucky bit pattern or uninitialized variable or some other bug.

I'll keep you posted.

You could edit local.ini and change 1835008 to 2097152. Run a few iterations and then change it back.
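As background on the SUM(INPUTS) != SUM(OUTPUTS) message: for a plain cyclic convolution, the sum of the outputs equals the product of the sums of the inputs, so a squaring can be sanity-checked by comparing (sum of inputs)² against the sum of the outputs. The sketch below demonstrates that general identity with a naive convolution on made-up integers. Treating mprime's check as being of this kind is an assumption; the weighting and floating-point tolerance it actually uses are not shown in this thread.

```python
def cyclic_square(a):
    """Naive cyclic convolution of a sequence with itself (O(n^2), for illustration)."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            out[(i + j) % n] += a[i] * a[j]
    return out


a = [3, -1, 4, 1, -5, 9, 2, -6]   # hypothetical input words
out = cyclic_square(a)

sum_inputs = sum(a) ** 2          # what the outputs should add up to
sum_outputs = sum(out)            # what they actually add up to

if sum_inputs != sum_outputs:
    print(f"ERROR: SUM(INPUTS) != SUM(OUTPUTS), {sum_inputs} != {sum_outputs}")
else:
    print("sums agree:", sum_inputs)
```

With exact integers the two sums always match. In floating point they only agree up to rounding, so a real check has to allow a small tolerance, and a mismatch can point to either corrupted data or a software bug.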

2005-06-06, 02:27   #7
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2²×5×397 Posts

Ugh. I've been debugging 24.12 and the bug I thought I found did not exist in version 23.9. At least a bug was squashed. Back to the drawing board. I wish I had a Linux SSE2 machine here to debug on.

Have you tried version 24.11? If not, give it a whirl. I don't feel comfortable putting up a fixed version 24.12 just yet. You can get 24.11 from ftp://mersenne.org/gimps/mpr2411.tgz

2005-06-06, 05:18   #8
FeLiNe

Dec 2003

23 Posts

Quote:
 Originally Posted by Prime95 You could edit local.ini and change 1835008 to 2097152. Run a few iterations and then change it back.
Wow -- this seems to have done the trick.

Talk about something I'd never have dreamed of trying...

Quote:
 Originally Posted by Prime95 Have you tried version 24.11? If not, give it a whirl.
Huh? No, I haven't tried that. I tend to run whatever is named on the official download page at http://www.mersenne.org/freesoft.htm as I figure that's what's stable and "recommended" right now. Maybe I'll have a look at the latest version if/when I have a little more time...

For now, thanks for the hint -- changing the SoftCrossover was apparently all that was needed...
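A rough way to see why the larger value helps, assuming the two numbers from post #6 are FFT lengths in words (1835008 = 7×2^18 and 2097152 = 2^21): the exponent's bits are spread across the FFT words, so a longer FFT packs fewer bits into each word and leaves more headroom for round-off. The helper name below is hypothetical and the calculation is only back-of-the-envelope arithmetic, not how mprime picks its FFT size.

```python
EXPONENT = 34564069                 # M34564069, from the log in this thread
FFT_LENGTHS = (1835008, 2097152)    # the two values quoted in post #6


def bits_per_word(exponent, fft_length):
    """Average number of bits of the number stored in each FFT word."""
    return exponent / fft_length


for n in FFT_LENGTHS:
    print(f"FFT length {n}: {bits_per_word(EXPONENT, n):.3f} bits per word")
```

The smaller length works out to nearly 19 bits per word against about 16.5 for the larger one, which is consistent with the remark that the exponent was near the limit of what the FFT can handle.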
