mersenneforum.org  

2003-01-12, 23:56   #1
Joe O
Aug 2002

1000001101₂ Posts
0K FFT? How can that be?

George, why did it try 1000 iterations of 0K FFT?
Quote:
[Sun Jan 12 06:58:15 2003]
Iteration 14200000 / 15263207
[Sun Jan 12 15:19:52 2003]
Iteration: 14277765/15263207, ERROR: ILLEGAL SUMOUT
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
[Sun Jan 12 15:25:13 2003]
Iteration: 14277172/15263207, ERROR: ILLEGAL SUMOUT
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
Trying 1000 iterations for exponent 15263207 using 0K FFT.
If average roundoff error is above 0.2345, then a larger FFT will be used.
[Sun Jan 12 15:30:14 2003]
Final average roundoff error is 0, using 0K FFT for exponent 15263207.
Iteration: 14277155/15263207, ERROR: ROUND OFF (0.490234375) > 0.40
Continuing from last save file.
Error reading intermediate file: pF263207
Renaming intermediate file qF263207 to pF263207.
Iteration: 14276865/15263207, ERROR: ROUND OFF (0.4204217202) > 0.40
Continuing from last save file.
Error reading intermediate file: pF263207
Iteration: 3/15263207, ERROR: ILLEGAL SUMOUT
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
[Sun Jan 12 15:35:52 2003]
Iteration: 3/15263207, ERROR: ILLEGAL SUMOUT
Possible hardware failure, consult the readme.txt file.

The first two illegal sumouts were caused by some sound being played at a web site! This is the second time I have seen sound cause an illegal sumout on this machine. The first time was a greeting card; this time it was just background sound at a web site. Can the client be made more robust?

The result was a loss of 2 months' work!
2003-01-13, 01:43   #2
outlnder
Aug 2002

2·3·53 Posts

There must be a bigger problem than just sound playing.

My daily-use machine handles sound, games, email, word processing, browsing, and anything else a computer can do, and I never get a sumout error.

May I suggest that you run the torture test for 24 hours and see if you get any errors.
2003-01-13, 13:53   #3
S3SJK
Dec 2002

1111₂ Posts

On an old Celeron I had, GIMPS would sumout whenever I used the modem. All you need to do is stop processing whenever a sumout is likely!

I never had any other problems with my modem, but then again my computer was a cheap one made by an obscure company!
S3SJK is offline   Reply With Quote
Old 2003-01-13, 14:22   #4
Joe O
 
Joe O's Avatar
 
Aug 2002

10000011012 Posts
Default

Outlnder, I agree. It is not just any "sound" that causes the problem. I have found two distinct, repeatable instances of "sound" that do. This machine ran nonstop from Sep 4th to Dec 12th without an error of any kind. Then came the first greeting card, another greeting card on Dec 24th, and then the web site on Jan 8th. It has now been running the torture test for 15 hours without a glitch.

Possible suggestions for client changes:
1) allow for more than 2 save files
2) suspend writing save files when there is an error until the "save period" expires without an error
3) change the recovery procedure so that it does not delete the save files when it uses them, or at least not the oldest one. Perhaps there should be a save file outside the p/q sequence that is left alone for manual intervention. At the very least, copy the q file to the p file instead of renaming it. I know that this will take longer, but this is error recovery, not normal operation
4) allow for comments in the worktodo.ini file. This would allow the client to comment out the "problem" exponent and go on to the next one.
5) Don't rely on other programs to "properly use the floating point registers"; reinitialize them as necessary
2003-01-13, 19:31   #5
garo
Aug 2002
Termonfeckin, IE

2⁴×173 Posts

You could make your setup a bit more robust by letting Prime95 save intermediate files every 1000000 iterations or whatever. Look in undoc.txt to see how, or perhaps it's in the readme. I think it is impossible for Prime95 to check on the fly whether the FPU registers are properly initialized. In a reasonably busy system there are context switches taking place all the time, and if Prime95 were to check the state of the CPU every time it was switched back into context, the throughput would go kaput.

As a matter of principle Prime95 should not try to fix a problem that was created by other programs. Let us remember that modem drivers, sound card drivers, etc. are SUPPOSED to put the FPU back in the correct state.

Finally, try and hunt around for newer drivers. That solves the problem in several cases.
2003-01-13, 19:57   #6
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

1111011010001₂ Posts

Joe, several points:

1) ILLEGAL SUMOUT errors are often caused by bad sound card drivers. Look for a newer one. The good news is these errors are usually quite benign - just a small loss of cpu time.

2) Prime95 cannot detect these bad drivers. The driver can interrupt prime95 after ANY instruction.

3) Your disaster is likely unrelated to the two illegal sumout errors or to a bad device driver. My guess is that prime95's address space was severely corrupted, cause unknown (prime95 bug, hardware glitch, OS bug, driver bug, etc.). The in-memory copy of the FFT limits must have been bad for prime95 to choose the wrong FFT size. Then reading the intermediate files failed, again for reasons unknown.

I feel your pain, but I don't know how to improve prime95 for this case. The code that deletes bad save files has worked quite well until today. I don't know how prime95 can distinguish "I'm in a really bad corrupt state and should exit" from "The save file is corrupt and I should delete it and try the other one".
2003-01-13, 20:04   #7
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

7³×23 Posts

Quote:
Originally Posted by Joe O
Possible suggestions for client changes:
1) allow for more than 2 save files
2) suspend writing save files when there is an error until the "save period" expires without an error
3) change the recovery procedure so that it does not delete the save files when it uses them, or at least not the oldest one. Perhaps there should be a save file outside the p/q sequence that is left alone for manual intervention. At the very least, copy the q file to the p file instead of renaming it. I know that this will take longer, but this is error recovery, not normal operation
4) allow for comments in the worktodo.ini file. This would allow the client to comment out the "problem" exponent and go on to the next one.
5) Don't rely on other programs to "properly use the floating point registers"; reinitialize them as necessary

1) More than 2 save files would not have helped you. In prime95's strange state it looks like it would have plowed through and deleted them all.

2) Errors during a save file write operation do not delete the existing save files. Prime95 retries every 10 minutes until successful. You did not have any errors during a save file write.

3) Perhaps prime95 should rename bad save files instead of deleting them. The downside is it might fill up the disk in some other pathological case. However, this deserves further thought. Some compromise might be workable.

4) Yes, that would be nice.

5) I can't reinitialize after every interrupt return - it could happen literally anywhere!
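
One possible shape for the compromise mentioned in point 3, as a minimal sketch and not anything taken from prime95's source: rename an unreadable save file to a quarantine name, and cap how many quarantined copies are kept so that a pathological loop cannot fill the disk. The function name `quarantine_save_file`, the `.badN` suffix, and the cap of 3 are all assumptions made for illustration.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional

MAX_QUARANTINED = 3  # hypothetical cap so a runaway loop cannot fill the disk


def quarantine_save_file(path: Path, max_kept: int = MAX_QUARANTINED) -> Optional[Path]:
    """Rename an unreadable save file to NAME.badN instead of deleting it.

    Returns the new path, or None if the cap was already reached and the
    file was deleted instead (this keeps disk usage bounded).
    """
    for n in range(max_kept):
        candidate = path.with_name(f"{path.name}.bad{n}")
        if not candidate.exists():
            path.rename(candidate)
            return candidate
    path.unlink()  # cap reached: fall back to plain deletion
    return None


def demo():
    """Quarantine five 'corrupt' files and report what is left on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        save = Path(tmp) / "pF263207"
        for _ in range(5):
            save.write_bytes(b"corrupt")
            quarantine_save_file(save)
        return sorted(os.listdir(tmp))


if __name__ == "__main__":
    print(demo())
```

With these assumed settings, the first three bad files survive for manual inspection and any further ones are deleted as before.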
2003-01-13, 22:22   #8
Joe O
Aug 2002

3×5²×7 Posts

George, thank you for your replies.
I must not have written my 2) clearly, or I do not understand your reply. I read your reply as addressing the situation where the error occurs while writing the file. What I meant was not to write a new save file when an error has been detected in the process during the save interval. This would preserve the last "good" save file(s) until another good one can be written.

Thanks again!
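
To make the clarified suggestion 2 concrete, here is a minimal sketch of that policy: a save is skipped whenever an error was detected since the previous save point, and saving resumes only after a full save period passes without errors. `SavePolicy`, `report_error`, and `should_save` are hypothetical names for illustration and are not part of prime95.

```python
class SavePolicy:
    """Decide at each save point whether writing a save file is safe.

    Hypothetical helper: a save is allowed only if no error was
    reported since the previous save point.
    """

    def __init__(self) -> None:
        self.error_since_last_check = False

    def report_error(self) -> None:
        # Called whenever the computation detects an error
        # (for example an illegal sumout or an excessive roundoff).
        self.error_since_last_check = True

    def should_save(self) -> bool:
        # Called once per save period. The flag is cleared here so that
        # the next period starts clean.
        ok = not self.error_since_last_check
        self.error_since_last_check = False
        return ok


def simulate(periods):
    """periods: list of booleans, True if an error occurred in that period.

    Returns, for each period, whether a save file would be written.
    """
    policy = SavePolicy()
    decisions = []
    for had_error in periods:
        if had_error:
            policy.report_error()
        decisions.append(policy.should_save())
    return decisions


if __name__ == "__main__":
    print(simulate([False, True, True, False, False]))
```

In this run the two periods containing errors write nothing, so the save file from the first period stays on disk until the fourth period completes cleanly.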
2003-01-13, 22:40   #9
Joe O
Aug 2002

3×5²×7 Posts

Garo,
Do you mean the following from undoc.txt?
Quote:
You can have the program generate save files every n iterations. The files will have a .XXX extension where XXX equals the current iteration divided by n. In prime.ini enter:
InterimFiles=n

You can have the program output residues every n iterations. The default
value is the InterimFiles value. In prime.ini enter:
InterimResidues=n

I tried the InterimFiles once without the InterimResidues and it produced way too much output. I'll give it a try with:
[code]
InterimFiles=100000
InterimResidues=1000000
[/code]
and see what happens. Thanks.
2003-01-14, 00:38   #10
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

7³×23 Posts

Set InterimFiles to 500000 or 1 million to avoid a plethora of files.

You do not need InterimResidues.
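
A quick back-of-the-envelope check of why the larger setting helps. It assumes, per the undoc.txt text quoted earlier in the thread, one interim file every n iterations, and that a test of exponent p runs for roughly p iterations. The helper name `interim_file_count` is made up for this illustration.

```python
def interim_file_count(exponent: int, interim_files: int) -> int:
    """Number of interim save files written over a whole test, assuming
    one file every `interim_files` iterations and about `exponent`
    iterations in total."""
    return exponent // interim_files


if __name__ == "__main__":
    p = 15263207  # exponent from the first post
    for n in (100_000, 500_000, 1_000_000):
        print(n, interim_file_count(p, n))
```

For this exponent the three settings give 152, 30, and 15 files respectively.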
2003-01-14, 00:46   #11
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

7³·23 Posts

Quote:
Originally Posted by Joe O
I must not have written my 2) clearly, or I do not understand your reply. I read your reply as addressing the situation where the error occurs while writing the file. What I meant was not to write a new save file when an error has been detected in the process during the save interval. This would preserve the last "good" save file(s) until another good one can be written.

I read the original output as:

1) You got an illegal sumout at iteration 14277765.
2) Prime95 read the first save file made at iteration 14277155. So far so good.
3) Prime95 became very confused and didn't know what FFT size to use.
4) Prime95 got an error using this funny FFT size.
5) Prime95 tried to read the first save file again - and this time it couldn't!!
6) Prime95 deleted the save file since it could not read it. Then it read the second save file at iteration 14276865.
7) This worked, raised an error, and then rereading the save file failed again!! It was deleted and you were back at ground zero :(

There was no writing of save files during this time. What is truly weird is that it read the save file once, but then failed reading it 2 seconds later.
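
The sequence above can be mimicked with a toy model. This is only a sketch of the behaviour as described in this thread (try a save file, delete it if it cannot be read, fall back to the next one), not prime95's actual code; `recover` and the `can_read` callback are hypothetical. It illustrates why a process whose own state is corrupt, so that every read fails, ends up deleting every save file no matter how many there are.

```python
def recover(save_files, can_read):
    """Toy model of 'delete unreadable save files and try the next one'.

    save_files: list of file names, in the order they are tried.
    can_read:   callable(name) -> bool, stands in for the real read.
    Returns (file resumed from or None, files remaining on disk).
    """
    remaining = list(save_files)
    while remaining:
        name = remaining[0]
        if can_read(name):
            return name, remaining
        remaining.pop(0)  # unreadable: the file is deleted
    return None, remaining  # nothing left: restart from iteration 0


if __name__ == "__main__":
    # Healthy process, first file genuinely corrupt: falls back to q.
    print(recover(["p", "q"], lambda name: name != "p"))
    # Corrupt process state: every read fails, so all files are lost.
    print(recover(["p", "q"], lambda name: False))
    print(recover(["p", "q", "r", "s"], lambda name: False))
```

In the healthy case only the bad file is lost. In the corrupt-state case the model ends with no files at all, whether it started with two or four.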
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
