mersenneforum.org  

Old 2005-06-04, 03:11   #1
FeLiNe
Dec 2003

23 Posts
Question Strange error message

Awright -- a while back I had this strange occurrence 38% of the way through a primality test:

Code:
[18:14] Sven [suttang:prime/gimps] > mprime -d
Mersenne number primality test program version 23.9
Resuming primality test of M34469543 at iteration 13314917 [38.62%]
Iteration: 13314924/34469543, ERROR: ROUND OFF (0.4375) > 0.40
Continuing from last save file.
Resuming primality test of M34469543 at iteration 13314917 [38.62%]
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Resuming primality test of M34469543 at iteration 13314917 [38.62%]
Iteration: 13314924/34469543, ERROR: SUM(INPUTS) != SUM(OUTPUTS), -1.039789371852531e+16 != -1.039789371852522e+16
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
Waiting five minutes before restarting.
Segmentation fault

I posted about this on the Ars board

http://episteme.arstechnica.com/eve/...001974631/p/27

And commented as follows:


So first it tells me it is hardware, then it tells me to disregard that. Then it tells me again that it is hardware. Then it sits for five minutes, then it segfaults.

This can be repeated indefinitely - I tried it a couple of times, all with the same result. HOWEVER: that same machine shows no signs of problems on a 60-hour (Friday evening to Monday morning) torture test.

I'd hate to lose the current crunch, but is it possible that the current save-file is borked? Should I throw it out and re-start? Should I give up GIMPS? Should I join a monastery?

Any advice?



At the time I decided to move the save-files out of the way and restart the client and see what happens.

As it turns out the box picked up nicely and started crunching just fine. Until two days ago, when the end of my log-file shows this:

Code:
[May 31 01:42] Iteration: 17100000 / 34564069 [49.47%].  Per iteration time: 0.069 sec.
[May 31 03:36] Iteration: 17200000 / 34564069 [49.76%].  Per iteration time: 0.069 sec.
[May 31 05:30] Iteration: 17300000 / 34564069 [50.05%].  Per iteration time: 0.069 sec.
Iteration: 17317988/34564069, ERROR: ROUND OFF (0.40625) > 0.40
Continuing from last save file.
Resuming primality test of M34564069 at iteration 17308161 [50.07%]
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Resuming primality test of M34564069 at iteration 17317981 [50.10%]
Iteration: 17317988/34564069, ERROR: SUM(INPUTS) != SUM(OUTPUTS), 1.203546292570829e+16 != 1.20354629257089e+16
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
Waiting five minutes before restarting.

And that's where it ended. I can restart it by hand, but then I get the same thing again as before: "your hardware is junk", "no, wait, disregard that, it isn't", "no, wait, it is", "I'll restart in 5 minutes", "segfault".

So what is going on here? Why would my box run fine for two weeks and then suddenly have a hardware failure? Or maybe NOT have one. But not be restartable. Except that it can be restarted just fine if I discard the current crunch.

I'm baffled.

Anybody got an idea?

(FWIW, this is a P4/2.8, running RH10)
Old 2005-06-04, 03:28   #2
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

2²×5×397 Posts
Default

Your hardware is fine. Looks like a bug in the "redoing with slower, more reliable method" code.

Try this until I can investigate further:

1) Copy the save file in case you need to email it to me for debugging.
2) Add "NearFFTLimitPct=0.0" to prime.ini
3) Restart mprime.

Let us know if that helps.
Old 2005-06-04, 16:51   #3
FeLiNe
Dec 2003

23 Posts
Default

Well it was a thought, I suppose:

Code:
[9:43] Sven [suttang:prime/gimps] > echo "NearFFTLimitPct=0.0" >> prime.ini
[9:43] Sven [suttang:prime/gimps] > tail -2 prime.ini
SilentVictory=0
NearFFTLimitPct=0.0
[9:43] Sven [suttang:prime/gimps] > ./mprime -d
Mersenne number primality test program version 23.9
Resuming primality test of M34564069 at iteration 17317981 [50.10%]
Iteration: 17317988/34564069, ERROR: ROUND OFF (0.40625) > 0.40
Continuing from last save file.
Resuming primality test of M34564069 at iteration 17317981 [50.10%]
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Resuming primality test of M34564069 at iteration 17317981 [50.10%]
Iteration: 17317988/34564069, ERROR: SUM(INPUTS) != SUM(OUTPUTS), 1.203546292570829e+16 != 1.20354629257089e+16
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
Waiting five minutes before restarting.
Segmentation fault

That alone didn't do it.

But if the variable NearFFTLimitPct is what I think it is (i.e. what its name suggests), wouldn't it have to be set to something greater than zero to make a difference?
Old 2005-06-04, 19:28   #4
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

1F04₁₆ Posts
Default

Try "NearFFTLimit=-2".

This is just a hack to force mprime to not run error-checking every iteration. It will still run error-checking every 128 iterations and thus you may still run into the problem.

I'm trying to debug it now.
Old 2005-06-06, 01:16   #5
FeLiNe
Dec 2003

27₈ Posts
Default

Code:
[17:58] Sven [suttang:prime/gimps] > tail -3 prime.ini
TwoBackupFiles=1
NearFFTLimit=-2
SilentVictory=0
[17:58] Sven [suttang:prime/gimps] > ./mprime -d
Mersenne number primality test program version 23.9
Contacting PrimeNet Server.
Updating computer information on the server
Sending expected completion date for M34564069: Jun 19 2005
Done communicating with server.
Resuming primality test of M34564069 at iteration 17317981 [50.10%]
Iteration: 17317988/34564069, ERROR: ROUND OFF (0.40625) > 0.40
Continuing from last save file.
Resuming primality test of M34564069 at iteration 17317981 [50.10%]
Disregard last error.  Result is reproducible and thus not a hardware problem.
For added safety, redoing iteration using a slower, more reliable method.
Continuing from last save file.
Resuming primality test of M34564069 at iteration 17317981 [50.10%]
Iteration: 17317988/34564069, ERROR: SUM(INPUTS) != SUM(OUTPUTS), 1.203546292570829e+16 != 1.20354629257089e+16
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
Waiting five minutes before restarting.
Segmentation fault

Nope, doesn't work either.

Of course if "every 128 iterations" includes "the zeroth iteration" and it is the first iteration after the current checkpoint, then this was bound to fail...
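
For what it's worth, the iteration numbers in the two logs can be checked against that guess. The sketch below uses only figures copied from the logs in this thread; how mprime actually schedules its periodic check is not stated here, so the modulo-128 reading is an assumption.

```python
CHECK_INTERVAL = 128  # "every 128 iterations", as quoted in this thread

# (resume iteration, failing iteration), copied from the two logs above
runs = {
    "M34469543": (13314917, 13314924),
    "M34564069": (17317981, 17317988),
}

def offsets(resume, failed, interval=CHECK_INTERVAL):
    """Return (iterations since resume, failing iteration modulo interval)."""
    return failed - resume, failed % interval

for name, (resume, failed) in runs.items():
    since_resume, remainder = offsets(resume, failed)
    print(f"{name}: fails {since_resume} iterations after resume, "
          f"iteration mod {CHECK_INTERVAL} = {remainder}")
```

In both runs the error shows up 7 iterations after the resume point, and neither failing iteration number is a multiple of 128.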

Well, let me know if there's anything else I should try -- I'd be happy to try every switch you might have in the software...

(I'm slightly baffled here -- am I the only person who uses 23.9 under Linux on a P4? Or what exactly is the determining parameter here?)
Old 2005-06-06, 01:38   #6
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

1F04₁₆ Posts
Default

Quote:
Originally Posted by FeLiNe
(I'm slightly baffled here -- am I the only person who uses 23.9 under Linux on a P4? Or what exactly is the determining parameter here?)

I have one other user with the same problem. You have to be testing an exponent near the limit of what the FFT can handle. I'm still not sure if the SUMINP != SUMOUT is due to an unlucky bit pattern, an uninitialized variable, or some other bug.
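
To put a number on how small the SUMINP != SUMOUT mismatch is, here is a minimal Python sketch (Python 3.9+ for `math.ulp`) that compares the two pairs printed in the logs above. The values are the 16-digit figures from the log, so the results are approximate, and nothing here says how mprime itself decides the sums differ.

```python
import math

# Sum pairs copied from the two error lines in this thread
pairs = {
    "M34469543": (-1.039789371852531e+16, -1.039789371852522e+16),
    "M34564069": (1.203546292570829e+16, 1.20354629257089e+16),
}

def mismatch(a, b):
    """Return (absolute difference, relative difference, difference in ulps of a)."""
    diff = abs(a - b)
    return diff, diff / abs(a), diff / math.ulp(a)

for name, (inputs, outputs) in pairs.items():
    diff, rel, ulps = mismatch(inputs, outputs)
    print(f"{name}: |diff| = {diff:.0f}, relative = {rel:.2e}, about {ulps:.0f} ulps")
```

At this magnitude adjacent doubles are 2 apart, so the printed sums disagree by roughly 45 and 305 units in the last place respectively.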

I'll keep you posted.


You could edit local.ini and change 1835008 to 2097152. Run a few iterations and then change it back.
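
The arithmetic behind those two numbers can be sketched in Python. Reading them as FFT lengths is an assumption (the thread does not label them), and dividing the exponent evenly over the FFT words is a simplification used only to show why the smaller length leaves less headroom.

```python
# Exponent under test, taken from the log earlier in this thread
EXPONENT = 34564069

def describe_fft(length, exponent=EXPONENT):
    """Return (length in units of 1024, average bits of the exponent per word)."""
    return length // 1024, exponent / length

for length in (1835008, 2097152):
    size_k, bits = describe_fft(length)
    print(f"{length} = {size_k}K, about {bits:.2f} bits per word")
```

So 1835008 is 1792K and 2097152 is 2048K; under that reading the larger length carries about 16.5 bits per word instead of about 18.8.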
Old 2005-06-06, 02:27   #7
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

2²×5×397 Posts
Default

Ugh. I've been debugging 24.12 and the bug I thought I found did not exist in version 23.9. At least a bug was squashed. Back to the drawing board. I wish I had a Linux SSE2 machine here to debug on.

Have you tried version 24.11? If not, give it a whirl. I don't feel comfortable putting up a fixed version 24.12 just yet.

You can get 24.11 from ftp://mersenne.org/gimps/mpr2411.tgz
Old 2005-06-06, 05:18   #8
FeLiNe
Dec 2003

23 Posts
Default

Quote:
Originally Posted by Prime95
You could edit local.ini and change 1835008 to 2097152. Run a few iterations and then change it back.

Wow -- this seems to have done the trick.

Talk about something I'd never have dreamed of trying...

Quote:
Originally Posted by Prime95
Have you tried version 24.11? If not, give it a whirl.

Huh? No, I haven't tried that. I tend to run whatever is named on the official download page at http://www.mersenne.org/freesoft.htm as I figure that's what's stable and "recommended" right now. Maybe I'll have a look at the latest version if/when I have a little more time...

For now, thanks for the hint -- changing the SoftCrossover was apparently all that was needed...
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
