#1 | |
Aug 2002
1000001101₂ Posts |
George, Why did it try 1000 iterations of 0K FFT?
Quote:
The result was a loss of 2 months work! |
#2 |
Aug 2002
2·3·53 Posts |
There must be a bigger problem than just sound playing.
My daily-use machine handles sound, games, email, word processing, browsing and anything else a computer can do, and I never get a sumout error. May I suggest that you run the torture test for 24 hours and see if you get any errors? |
#3 |
Dec 2002
1111₂ Posts |
On an old Celeron I had, the GIMPS client would give a sumout error whenever I used the modem. All you need to do is stop processing whenever a sumout is likely!
I never had any other problems with my modem, but then again my computer was a cheap one made by an obscure company! |
#4 |
Aug 2002
1000001101₂ Posts |
Outlnder, I agree. It is not just any "sound" that causes a problem. I have found two distinct, repeatable instances of "sound" that trigger it. This machine ran nonstop from Sep 4th to Dec 12th without an error of any kind. Then came the 1st greeting card, another greeting card on Dec 24th, and then the web site on Jan 8th. It has now been running the torture test for 15 hours without a glitch.
Possible suggestions for client changes:
1) Allow for more than 2 save files.
2) Suspend writing save files when there is an error, until the "save period" expires without an error.
3) Change the recovery procedure so that it does not delete the save files when it uses them, or at least not the oldest one. Perhaps there should be a save file outside the p/q sequence that is left alone for manual intervention. At the very least, copy the q file to the p file instead of renaming it. I know that this will take longer, but this is error recovery, not normal operation.
4) Allow for comments in the worktodo.ini file. This would allow the client to comment out the "problem" exponent and go on to the next one.
5) Don't rely on other programs to "properly use the floating point registers"; reinitialize them as necessary. |
#5 |
Aug 2002
Termonfeckin, IE
2⁴×173 Posts |
You could make your setup a bit more robust by letting Prime95 save intermediate files every 1000000 iterations or whatever. Look in undoc.txt to see how - or perhaps it's in the readme. I think it is impossible for Prime95 to check on the fly whether the FPU registers are properly initialized. On a reasonably busy system there are context switches taking place all the time, and if Prime95 were to check the state of the CPU every time it was switched back into context, the throughput would go kaput.
As a matter of principle, Prime95 should not try to fix a problem that was created by other programs. Let us remember that modem drivers, sound card drivers, etc. are SUPPOSED to put the FPU back in the correct state. Finally, try to hunt around for newer drivers. That solves the problem in several cases. |
#6 |
P90 years forever!
Aug 2002
Yeehaw, FL
1111011010001₂ Posts |
Joe, several points:
1) ILLEGAL SUMOUT errors are often caused by bad sound card drivers. Look for a newer one. The good news is these errors are usually quite benign - just a small loss of CPU time.
2) Prime95 cannot detect these bad drivers. The driver can interrupt prime95 after ANY instruction.
3) Your disaster is likely unrelated to the two illegal sumout errors or a bad device driver. My guess is prime95's address space was severely corrupted - by a cause unknown, of course (prime95 bug, hardware glitch, OS bug, driver bug, etc). The in-memory copy of the FFT limits must have been bad for prime95 to choose the wrong FFT size. Then reading the intermediate files failed, again for reasons unknown.
I feel your pain, but I don't know how to improve prime95 for this case. The code that deletes bad save files has worked quite well until today. I don't know how prime95 can detect "I'm in a really bad corrupt state and should exit" rather than "The save file is corrupt and I should delete it and try the other one". |
#7 | |
P90 years forever!
Aug 2002
Yeehaw, FL
7³×23 Posts |
Quote:
2) Errors during a save file write operation do not delete the existing save files. Prime95 retries every 10 minutes until successful. You did not have any errors during a save file write.
3) Perhaps prime95 should rename bad save files instead of deleting them. The downside is that it might fill up the disk in some other pathological case. However, this deserves further thought. Some compromise might be workable.
4) Yes, that would be nice.
5) I can't reinitialize after every interrupt return - it could happen literally anywhere! |
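One possible shape for the compromise mentioned in point 3, as a rough sketch only: rename a suspect save file rather than deleting it, but cap how many renamed copies are kept so the disk cannot fill up. This is not how prime95 works; the function name `quarantine_save_file`, the `.badN` suffix and the `max_kept` parameter are all made up for illustration.

```python
import os
from pathlib import Path


def _bad_index(path: Path) -> int:
    """Return N from a hypothetical '<name>.badN' file name."""
    return int(path.name.rsplit(".bad", 1)[1])


def quarantine_save_file(path: Path, max_kept: int = 3) -> Path:
    """Rename a suspect save file to '<name>.badN' instead of deleting it.

    At most max_kept quarantined copies are kept per save file; the
    oldest (lowest N) are removed first, which bounds disk usage.
    """
    existing = sorted(path.parent.glob(path.name + ".bad*"), key=_bad_index)
    next_index = _bad_index(existing[-1]) + 1 if existing else 1
    target = path.with_name(f"{path.name}.bad{next_index}")
    os.replace(path, target)  # rename; the file contents are left untouched
    existing.append(target)

    # Drop the oldest copies beyond the cap.
    excess = max(0, len(existing) - max_kept)
    for old in existing[:excess]:
        old.unlink()
    return target
```

With a cap like this the worst case costs `max_kept` extra save files per exponent, and the most recent suspect files are still around for manual inspection.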
#8 |
Aug 2002
3×5²×7 Posts |
George, thank you for your replies.
I must not have written my 2) clearly, or I do not understand your reply. I read your reply as covering the situation where the error occurs while writing the file. What I meant is: do not write a new save file when an error has been detected in the computation during the save interval. This would preserve the last "good" save file(s) until another good one can be written. Thanks again! |
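To make the suggestion concrete, here is a minimal sketch of the policy described above: skip the periodic save whenever an error was seen since the last save period ended, so the last good save files are never overwritten by a possibly bad state. The class and method names are invented for illustration and are not taken from the client.

```python
from dataclasses import dataclass


@dataclass
class CheckpointPolicy:
    """Decides whether a periodic save may replace the last good save file."""

    error_in_interval: bool = False

    def record_error(self) -> None:
        """Call whenever an error (e.g. an illegal sumout) is detected."""
        self.error_in_interval = True

    def should_write(self) -> bool:
        """Call once each time the save period expires.

        Returns False if any error was recorded during the interval that
        just ended, then starts a fresh, error-free interval.
        """
        ok = not self.error_in_interval
        self.error_in_interval = False
        return ok
```

The cost is at most one extra save period of lost work after an error, in exchange for never writing a save file from a run that has just misbehaved.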
#9 | |
Aug 2002
3×5²×7 Posts |
Garo,
Do you mean the following from undoc.txt? Quote:
[code:1]
InterimFiles=100000
InterimResidues=1000000
[/code:1]
and see what happens. Thanks. |
#10 |
P90 years forever!
Aug 2002
Yeehaw, FL
7³×23 Posts |
Set InterimFiles to 500000 or 1 million to avoid a plethora of files.
You do not need InterimResidues. |
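For a feel of what "a plethora of files" means, a quick back-of-the-envelope sketch. It assumes one interim file is written every `InterimFiles` iterations; the total of 15,000,000 iterations is a made-up round figure, not the exponent from this thread.

```python
def interim_file_count(total_iterations: int, interval: int) -> int:
    """Number of interim files written if one is saved every `interval` iterations."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return total_iterations // interval


# Hypothetical test length, chosen only to make the arithmetic easy to follow.
TOTAL_ITERATIONS = 15_000_000

for interval in (100_000, 500_000, 1_000_000):
    count = interim_file_count(TOTAL_ITERATIONS, interval)
    print(f"InterimFiles={interval}: about {count} files")
```

Under these assumptions the 100000 setting leaves around 150 files behind, while 500000 or 1 million leaves 30 or 15.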
#11 | |
P90 years forever!
Aug 2002
Yeehaw, FL
7³·23 Posts |
Quote:
1) You got an illegal sumout at iteration 14277765.
2) Prime95 read the first save file made at iteration 14277155. So far so good.
3) Prime95 became very confused and didn't know what FFT size to use.
4) Prime95 got an error using this funny FFT size.
5) Prime95 tried to read the first save file again - and this time it couldn't!!
6) Prime95 deleted the save file since it could not read it. Then it read the second save file at iteration 14276865.
7) This worked, raised an error, and then rereading the save file failed again!! It was deleted and you were back at ground zero :(
There was no writing of save files during this time. What is truly weird is that it read the save file once, but then failed reading it 2 seconds later. |