mersenneforum.org  

Old 2008-01-17, 20:39   #12
garo
 
Aug 2002
Termonfeckin, IE

2²·691 Posts

Usual crap. Eliminate the variables one at a time:

1. Memory
2. Heat
3. Overclock, if any, stressing either of the above or the FSB.
4. Motherboard
5. CPU

Unfortunately, there is no short cut.
Old 2008-01-23, 12:08   #13
fivemack
(loop (#_fork))
 
Feb 2006
Cambridge, England

1100100010110₂ Posts

Quote:
Originally Posted by garo
Usual crap. Eliminate the variables one at a time:

1. Memory
2. Heat
3. Overclock, if any, stressing either of the above or the FSB.
4. Motherboard
5. CPU

Unfortunately, there is no short cut.

But how do I go about eliminating those variables? I've left memtest running overnight - no issue appeared. The machine isn't overclocked. I've written my own memory tester and run it on all four cores (admittedly unthreaded) - no issue appeared.

I'm not sure what else I can do without spending money; there's a slight temptation to buy a chunkier cooler for the CPU, but the temperatures I'm measuring on the CPU on the unreliable machine are no different from the ones I measure on a similar system (same CPU, different motherboard, less memory) that works.
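
The home-made memory tester mentioned above is not shown in the thread. As a rough illustration only, a pattern-based user-space check might look like the sketch below; the function name `pattern_test`, the buffer size, and the choice of patterns are all assumptions, not fivemack's code.

```python
def pattern_test(size_bytes, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Fill a buffer with each byte pattern and count bytes that read back wrong.

    Hypothetical sketch: a user-space test only touches the memory the OS
    hands to this process, so a clean result does not prove the RAM is good.
    """
    errors = 0
    for pattern in patterns:
        # Write: allocate a buffer filled entirely with the pattern byte.
        buf = bytearray([pattern]) * size_bytes
        # Read back: every byte that is not the pattern counts as an error.
        errors += size_bytes - buf.count(pattern)
    return errors


if __name__ == "__main__":
    # 64 MiB per pass is an arbitrary illustrative size.
    print("mismatched bytes:", pattern_test(64 * 1024 * 1024))
```

Running one copy per core, as described in the post, would mean launching four such processes at once. Even then, a test like this exercises memory far less heavily than a long linear algebra run, which may be why it found nothing.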
Old 2008-01-23, 13:50   #14
Andi47
 
Oct 2004
Austria

9B2₁₆ Posts

As far as I have heard, memtest does not find every memory issue.

Does a Prime95 torture test fail on this computer?

1.) Try, for example, running the computer with only one of the memory sticks installed and run the torture test for 24 or 48 hours (or run an msieve postprocessing job that fits into that memory). Then swap in the next memory stick and run the torture test (or threaded msieve linear algebra) again, and so on, until you have found the faulty stick. (If it fails with (nearly) all memory sticks, it is most probably not a memory issue.) You say that threaded linear algebra fails after 24-48 hours, so running a memory test (memtest, the Prime95 torture test, or something else) only overnight is probably not sufficient to detect a memory issue.

You can also put the memory sticks into a known reliable computer and test whether that machine now fails.

2.) What are your CPU and motherboard temperatures (idle and stressed)?

3.) This should be no issue, as you say that the box is not OC'ed.

4.) I don't know how to test this other than by eliminating variables 1, 2, 3 and 5.

5.) You can, for example, put the CPU onto another motherboard and test whether the torture test or threaded msieve fails.
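
The reasoning in step 1 can be written down as a small decision rule. This is only a sketch: the function `diagnose`, the stick labels, and the reading of "nearly all" as "more than half" are assumptions made for illustration.

```python
from typing import Dict


def diagnose(stick_failed: Dict[str, bool]) -> str:
    """Interpret one-stick-at-a-time torture-test results.

    stick_failed maps a stick label to True if the long run with only
    that stick installed failed.
    """
    failed = sorted(name for name, bad in stick_failed.items() if bad)
    if not failed:
        return "no stick failed alone: test longer or look elsewhere"
    # Assumption: "nearly all" is read here as "more than half".
    if len(failed) * 2 > len(stick_failed):
        return "most sticks failed: probably not a memory issue"
    return "suspect sticks: " + ", ".join(failed)


print(diagnose({"A": False, "B": True, "C": False, "D": False}))
```

Each entry stands for a full 24 to 48 hour run, so filling in the table for four sticks takes several days.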

Last fiddled with by Andi47 on 2008-01-23 at 13:54
Old 2008-01-24, 13:11   #15
garo
 
Aug 2002
Termonfeckin, IE

ACC₁₆ Posts

Like Andi47 said above...

There is no OC so that is ruled out. Do tell us what the temps on both machines are. Also, try and swap CPUs between the reliable and unreliable machine so that you can tell if the CPU itself is at fault.
Old 2008-01-24, 14:00   #16
paulunderwood
 
 
Sep 2002
Database er0rr

5×751 Posts

Quote:
Originally Posted by garo
Usual crap. Eliminate the variables one at a time:

1. Memory
2. Heat
3. Overclock, if any, stressing either of the above or the FSB.
4. Motherboard
5. CPU

Unfortunately, there is no short cut.

Additionally, the PSU (power supply unit) ought to be checked out. With all that memory, plus devices, the PSU might just not be up to it. I suggest you swap PSUs and re-check.

Last fiddled with by paulunderwood on 2008-01-24 at 14:13
Old 2008-01-24, 17:09   #17
Andi47
 
 
Oct 2004
Austria

2×17×73 Posts

Quote:
Originally Posted by paulunderwood
Additionally, the PSU (power supply unit) ought to be checked out. With all that memory, plus devices, the PSU might just not be up to it. I suggest you swap PSUs and re-check.

@fivemack: What are the voltages (VCore, etc.) of your PC?

Last fiddled with by Andi47 on 2008-01-24 at 17:09
Old 2008-01-27, 10:52   #18
fivemack
(loop (#_fork))
 
Feb 2006
Cambridge, England

2·13²·19 Posts

I've looked at all the BIOS options, and found something about memory timing which was set to 'Turbo'; I reset it to a more conservative level, and have managed to run one thread of msieve for 96 hours without hitting issues.

That seems to make sense to explain the symptoms - it is likely that 'Turbo' is an option which doesn't take account of the rather large memory loading in an 8GB system.
Old 2008-01-27, 12:33   #19
Andi47
 
 
Oct 2004
Austria

2×17×73 Posts

Quote:
Originally Posted by fivemack
I've looked at all the BIOS options, and found something about memory timing which was set to 'Turbo'; I reset it to a more conservative level, and have managed to run one thread of msieve for 96 hours without hitting issues.

That seems to make sense to explain the symptoms - it is likely that 'Turbo' is an option which doesn't take account of the rather large memory loading in an 8GB system.

Did you also try multithreaded msieve?

Did single-threaded msieve also fail with the Turbo option on?
Old 2008-01-28, 16:31   #20
garo
 
Aug 2002
Termonfeckin, IE

2²×691 Posts

While Turbo may be part of the problem, I would advise you to dig deeper and figure out which component the Turbo setting was stressing. That component is probably vulnerable (unless Turbo was really screwing some settings) and more likely to fail soon.
Old 2008-01-28, 23:38   #21
fivemack
(loop (#_fork))
 
Feb 2006
Cambridge, England

1100100010110₂ Posts

Yes, single-threaded msieve had failed in the past with Turbo on.

On the other hand, a four-threaded msieve with Turbo off has just failed.

The particularly tiresome bit in all this is that the fast machine is the NFS server to a small farm of headless machines, and so they can't get anything done while I'm fiddling with the fast machine, and having acquired the farm I feel peculiarly distressed when it's not working efficiently.
Old 2008-02-04, 10:00   #22
fivemack
(loop (#_fork))
 
Feb 2006
Cambridge, England

1916₁₆ Posts

Output from 'sensors' on the unreliable machine:

Core temperatures +72C, +69C, +70C, +71C
Voltages 1.18V, 1.74V, 3.38V, 3.02V, 1.38V, 0V, 0.08V, 3.07V, 3.10V
Fan 1991 RPM
Case temperatures +42C, +57C, -2C (I presume this last one is an error)

Output from 'sensors' on the reliable machine running four threads of linear algebra:

Core temperatures +59C, +52C, +58C, +52C

Maybe I should go and buy some thermal grease and a 500W PSU; it's quite possible that the CPU fan isn't well-installed on the unreliable machine.
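
To put a number on the difference between the two sets of core readings, here is a short worked comparison. The variable names are made up for illustration, and the post does not say what load the unreliable machine was under when 'sensors' was run, so the comparison assumes the two readings are roughly like for like.

```python
# Core temperatures in degrees C, copied from the 'sensors' readings above.
unreliable_cores = [72, 69, 70, 71]
reliable_cores = [59, 52, 58, 52]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Difference between the average core temperatures of the two machines.
mean_gap = mean(unreliable_cores) - mean(reliable_cores)
# Difference between the hottest core on each machine.
hottest_gap = max(unreliable_cores) - max(reliable_cores)

print(f"mean: {mean(unreliable_cores)} vs {mean(reliable_cores)} (gap {mean_gap})")
print(f"hottest core gap: {hottest_gap}")
```

On these figures the unreliable machine averages 70.5 C against 55.25 C, about 15 C hotter, which fits the suspicion that the CPU cooler is not seated well.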
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.