Safe computer disposal: Mariana's Trench?
In Nov. 2006, I installed Prime95 on an old desktop (with an AMD Sempron 3000+).
It passed the 896K self-test that day and the 1024K self-test on Jan. 9, 2007. The first two LL tests, M17798563 and M18685553, finished without a problem. On April 10, it got a round-off error during the test of M18974113, which finished on April 13. It then tested M19278971 without a problem. Next, it went through 8 round-off errors before finishing M19458823 and then 2 round-off errors before finishing M18729281.

On Sept. 17, while testing M17151733, a new error appeared: "SUM(INPUTS) != SUM(OUTPUTS), 7315681248151225 != 7316015028466798 Possible hardware failure, consult the readme.txt file." This was followed by a round-off error on Sept. 20 and another SUM(INPUTS) error on Sept. 23. The LL test finished on Sept. 28. The next test was for M19914847, which featured 3 round-off errors. Then came M19747033, which had 7 round-off errors.

The computer then did 1000 iterations for M20426981 using the 1024K FFT and got an average round-off error of 0.22651. Since this was less than 0.242, it went ahead with the 1024K FFT. Dozens of round-off errors and three months later, the test finished.

On April 9, the following message appeared: "FATAL ERROR: Rounding was 0.5, expected less than 0.4 Hardware failure detected, consult stress.txt file." Since then, this message has been interspersed with "Final result was ********, expected: ********. Hardware failure detected, consult stress.txt file.", with at least one of the two appearing almost every day until today.

As far as Primenet is concerned... [CODE]20621189 D 67 208.3 27.9 60.9 26-Sep-08 23:49 02-Mar-08 20:11 C473F3A73 1990 v19/v20 [/CODE] Any advice? Thanks |
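For readers unfamiliar with the two numeric limits mentioned in the post above, here is a minimal Python sketch of the comparisons involved. The values 0.242 and 0.4 are taken from the post itself; the function names are hypothetical, and how Prime95 actually applies these limits internally is an assumption, not something the thread confirms.

```python
# Limits quoted in the post; the surrounding logic is an illustrative assumption,
# not Prime95's actual implementation.
FFT_SELECT_LIMIT = 0.242  # average round-off below this: stay on the smaller FFT
FATAL_LIMIT = 0.4         # a round-off at or above this is reported as a failure


def keep_smaller_fft(avg_roundoff: float, limit: float = FFT_SELECT_LIMIT) -> bool:
    """Return True if the average round-off error is below the selection limit."""
    return avg_roundoff < limit


def classify_roundoff(value: float, limit: float = FATAL_LIMIT) -> str:
    """Label one round-off value as 'ok' or 'fatal' against the quoted limit."""
    return "ok" if value < limit else "fatal"


if __name__ == "__main__":
    print(keep_smaller_fft(0.22651))  # average reported for M20426981 at 1024K
    print(classify_roundoff(0.5))     # value from the FATAL ERROR message
```

Passing the average-error check does not say anything about the hardware: as the rest of the post shows, the same run still produced dozens of round-off errors afterwards.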
That's a really clear-cut case of an unreliable system. Ten double-check results & *none* verified - two bad & most of the rest have suspect error codes. (According to the [URL="http://v5www.mersenne.org/report_exponent"]v5 server[/URL].) I'd fix whatever ails it first, before letting it do any more GIMPS work.
Common causes are bad ram (run memtest) or overheating (check your temperatures). It might be worth making sure you have the latest drivers for your system board & video, too. For some actual detail & other possibilities, try some of the links from the [URL="http://www.mersenneforum.org/showthread.php?t=5741"]"Please read before posting"[/URL] sticky in Information & Answers. Good luck! |
Have you run a stress test on it beside whatever short self-test Prime95 runs? Might want to do that...
|
[QUOTE=Mini-Geek;143923]Have you run a stress test on it beside whatever short self-test Prime95 runs? Might want to do that...[/QUOTE]
:rtfm: (RTF(start posting) smiley missing here) He *has* run a stress test in the beginning, and doing several double-checks that gave several errors is enough stress-testing to show that this machine is not reliable. @jinydu: Is the machine overclocked? If so, reduce the overclock by a notch or two. What is the CPU temperature (idle / max. load)? As markr said, at least one of the RAM modules might be faulty: try removing all but one module and then re-run a stress test (you may want to run the stress test for 24 or even 48 hours). Swap RAM modules until you have found the faulty one. (If your machine gives errors with all of your RAM modules, the fault might also be in the CPU or the motherboard.) |
[quote=Andi47;143951]:rtfm: (RTF(start posting) smiley missing here)
He *has* run a stress test in the beginning, and doing several double-checks that gave several errors is enough stress-testing to show that this machine is not reliable.[/quote] I didn't see any mention of a stress test besides the self-test (which I specifically mentioned), although on closer reading, I think the FATAL ERROR ... would only appear during a stress test (not sure). I know that doing several double-checks with errors makes it pretty obvious the machine has problems. I was just wondering if he had ever run an actual, from-the-menu, manual stress test. |
No, I've never done an actual stress test. But I can try that next time I get access to the machine.
Thanks |
[QUOTE=jinydu;143972]No, I've never done an actual stress test. But I can try that next time I get access to the machine.[/QUOTE]Just running prime95's stress test will only tell you the machine is unreliable - you already know that. It's time to try to isolate where the problem is. To see if it's memory-related, run memtest or prime95's stress test, but with Andi47's strategy to check one piece of ram at a time. Check the temperatures to see if it's heat-related. Good luck!
An afterthought: you mentioned it's an old machine. If the dust hasn't been cleaned out of its innards, including the heatsinks, in the last few months, then I'd do that too, regardless of what else you do. Mark |
[quote=markr;143975]Just running prime95's stress test will only tell you the machine is unreliable - you already know that. It's time to try to isolate where the problem is. To see if it's memory-related, run memtest or prime95's stress test, but with Andi47's strategy to check one piece of ram at a time. Check the temperatures to see if it's heat-related. Good luck!
An afterthought: you mentioned it's an old machine. If the dust hasn't been cleaned out of its innards, including the heatsinks, in the last few months, then I'd do that too, regardless of what else you do. Mark[/quote] It might also be useful to run the individual portions of the Prime95 test routine: try the small-FFT test first, then the large-FFT test, and lastly the blend test. If it fails the small-FFT test but not the large-FFT test, then there's a significant chance that the problem isn't with your RAM but with your CPU or a closely related component. If it fails the large-FFT test but not the small-FFT test, then you probably have a RAM-related problem.

If it turns out to be a suspected RAM-related problem, then, as Andi47 and markr suggested, try swapping the RAM sticks in and out to troubleshoot that; you should also run memtest86+, which may give some helpful information.

If it's a suspected CPU-related problem, first try giving the system a good cleaning-out. Use a vacuum (a canister vacuum with no special nozzles attached to the hose seems to work best) to clean out the CPU fan and heatsink; use a combination of the vacuum and compressed air on the rest of the stuff in the computer case. Once it's as clean as you're going to get it, try the small-FFT stress test again; if it remains stable over the course of a day or two, then it's probably safe to say your system is stable again. If not...then you either have a CPU that's just plain gone bad, or you have some kind of cooling problem and possibly need to upgrade the CPU fan/heatsink. Hope this helps! :smile: Max (temporarily known as A Sunny Moo, pending receipt of a note I sent to Xyzzy a day or two ago) :smile: |
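The small-FFT / large-FFT reasoning in the post above can be written down as a small decision table. This is only a sketch of that rule of thumb in Python: the function name and return strings are hypothetical, and the case where both tests fail is not covered by the post, so it is labeled as undetermined here.

```python
def diagnose(small_fft_ok: bool, large_fft_ok: bool) -> str:
    """Map stress-test outcomes to the likely culprit, per the rule of thumb above."""
    if not small_fft_ok and large_fft_ok:
        # Small FFTs fail but large ones pass: RAM is unlikely to be the cause.
        return "suspect CPU or cooling"
    if small_fft_ok and not large_fft_ok:
        # Large FFTs exercise memory much more heavily.
        return "suspect RAM"
    if small_fft_ok and large_fft_ok:
        # Neither targeted test failed: move on to the last step in the post.
        return "run blend test"
    # Both failed: the post gives no rule for this case (assumption).
    return "undetermined"


if __name__ == "__main__":
    for small in (True, False):
        for large in (True, False):
            print(small, large, "->", diagnose(small, large))
```

A "suspect RAM" result leads to the one-stick-at-a-time swapping and memtest86+ run described above; a "suspect CPU or cooling" result leads to the cleaning and re-test steps.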
[quote=A Sunny Moo;144007]Max (temporarily known as A Sunny Moo, pending receipt of a note I sent to Xyzzy a day or two ago) :smile:[/quote]
The ghosts of the gerbils usually mess up such name change requests. Maybe it's hard to type with little gerbil paws. |
A new coat of thermal grease is always a good thing to do, and make sure the heatsink is on the right way, and the fan is turning freely.
Check and monitor your CPU temps until you are satisfied and can eliminate this from the list of things to check. :wink: |
I think that a known-unreliable already-old low-end machine would best be served by the following four-step procedure:
* remove any identifying marks from the computer
* find an area of open water far from regulatory authorities
* hurl the computer into it from some height
* acquire another computer |