mersenneforum.org


esqrkim 2010-06-11 13:08

Sudden sumout error
 
I had successfully completed 6 LL tests on a quad-core processor when the computer suddenly crashed with a sumout error about 75% of the way through an LL test on one of the cores. For several months I had been running LL tests 24/7 on all four cores without a single problem, until last night. Now the computer has trouble loading Windows. What could be the problem?

Thanks for your help.

Rhyled 2010-06-11 14:15

Temperatures?
 
From what I've seen, SUMOUT errors usually point to a hardware failure. Since your system has been stable for months, I'd look for something that has changed, perhaps dust buildup in the heatsink or a fan failure.

If you don't have another option, and Windows can still run, try RealTemp to check your CPU temperatures and make sure they're not skyrocketing. [URL]http://www.techpowerup.com/realtemp/[/URL]

sdbardwick 2010-06-11 20:12

IME, the most common causes of sudden Prime95 failure after an extended period of faultless operation are:
0. Unstable overclock of CPU or RAM.
1. Heat due to clogged/stopped CPU HSF.
2. Memory failure. Use [URL="http://www.memtest.org/"]Memtest86+[/URL] to verify.
3. Video driver bugs (primarily NVIDIA); this has become much less likely than in years past, and is less likely than the first three.
4. Mainboard failure.
5. CPU failure.

esqrkim 2010-06-12 22:46

Well, I turned on the computer this morning and ran the torture test for about 8 hours without overclocking. There was no problem.

The core that gave the sumout error had completed about 75% of the iterations for M50315939. The strange thing is that it lost all previous results for M50315939 and has started from zero. Is this normal when a sumout error occurs?

I am thinking that there might have been some software conflict. I allowed the system to install some Windows updates the night before. I don't think it ever prompted me for a restart. Has anyone experienced a sumout error that was related to software issues?

cheesehead 2010-06-12 23:00

[quote=esqrkim;218370]The core that gave the sumout error had completed about 75% of the iterations for M50315939. The strange thing is that it lost all previous results for M50315939 and has started from zero. Is this normal when a sumout error occurs?[/quote]
No, that's unusual. There should have been a save (checkpoint) file, from which it could have continued.

[quote]I am thinking that there might have been some software conflict. I allowed the system to install some Windows updates the night before. I don't think it ever prompted me for a restart. Has anyone experienced a sumout error that was related to software issues?[/quote]
Yes, I have.

Rhyled 2010-06-13 01:29

Curiouser and curiouser...
 
When a SUMOUT error occurs, Prime95 is supposed to return to the previous checkpoint file. That should only cost you 30 minutes or so of lost effort.

Here's a bizarre thought. Check your anti-virus log and see if it quarantined your checkpoint file. There should be 3 versions of your backup files (mine are listed as p3J15893, p3J15893.bu and p3J15893.bu2) in your Prime95 folder. I guess it's theoretically possible that the semi-random contents of those backup files matched one of your virus signatures and got yanked.
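For anyone who would rather script that check than eyeball the folder, here is a minimal Python sketch. It assumes the save files follow the naming pattern in the example above (a base name plus `.bu` and `.bu2` copies). The function name is made up for illustration, and none of this is part of Prime95 itself.

```python
from pathlib import Path


def missing_save_files(folder, base_name):
    """Return the expected save-file names that are absent from `folder`.

    Assumption: the three copies are named base_name, base_name.bu and
    base_name.bu2, as in the example file names quoted in this thread.
    """
    expected = [base_name, base_name + ".bu", base_name + ".bu2"]
    return [name for name in expected if not (Path(folder) / name).is_file()]
```

Calling `missing_save_files(r"C:\Prime95", "p3J15893")` (both arguments are placeholders) returns an empty list when all three copies are present. Any names it does return are worth looking for in the anti-virus quarantine.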

I really need to get my avatar uploaded.

cheesehead 2010-06-13 03:44

[quote=esqrkim;218370]The strange thing is that it lost all previous results for M50315939 and has started from zero. Is this normal when a sumout error occurs?[/quote]
Other things to check: How often is your save file written (default = 30 minutes)? Now that you've started again, do save files exist for the current run? Is the folder to which the save files are written write-protected, so that the program lacks permission to write there?
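Those checks can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: the helper names are hypothetical, the save files are assumed to share a common base name (for example `p3J15893` with `.bu` and `.bu2` variants, as mentioned earlier in the thread), and the write test simply tries to create a scratch file in the folder.

```python
import tempfile
import time
from pathlib import Path


def folder_is_writable(folder):
    """Return True if a scratch file can be created (and removed) in `folder`."""
    try:
        with tempfile.NamedTemporaryFile(dir=folder):
            pass  # the file is deleted automatically when closed
        return True
    except OSError:
        # Covers a missing folder, a read-only folder, or denied permission.
        return False


def minutes_since_last_save(folder, base_name):
    """Return the age in minutes of the newest file starting with `base_name`.

    Returns None when no such file exists. Assumption: every save file for
    the current run begins with the same base name.
    """
    mtimes = [p.stat().st_mtime
              for p in Path(folder).glob(base_name + "*")
              if p.is_file()]
    if not mtimes:
        return None
    return (time.time() - max(mtimes)) / 60.0
```

If `folder_is_writable` returns False, or `minutes_since_last_save` stays at None or grows well past the configured save interval while a test is running, the save files are probably not being written.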

esqrkim 2010-06-13 17:18

Well, Windows prompted me to update my computer again. I performed the update and restarted the computer. To make a long story short, Prime95 crashed again, and the computer automatically rebooted. I ran the torture test again to see what would happen, since it had run for many hours before. This time all four cores had roundoff errors. I tried to restore the computer to a point prior to the Windows update. The restore operation was successful, but apparently some files got messed up. My next step is to reformat the HD, reinstall WinXP, and see if all the hardware is functioning correctly. Then I'll run the torture test again.

Thanks for all your input.


All times are UTC.