Wanted: known-good motherboard (https://www.mersenneforum.org/showthread.php?t=9833)

garo 2008-01-17 20:39

Usual crap. Eliminate the variables one at a time:

1. Memory
2. Heat
3. Overclock, if any, stressing either of the above or the FSB.
4. Motherboard
5. CPU

Unfortunately, there is no shortcut.

fivemack 2008-01-23 12:08

[QUOTE=garo;123064]Usual crap. Eliminate the variables one at a time:

1. Memory
2. Heat
3. Overclock, if any, stressing either of the above or the FSB.
4. Motherboard
5. CPU

Unfortunately, there is no shortcut.[/QUOTE]

But how do I go about eliminating those variables? I've left memtest running overnight - no issue appeared. The machine isn't overclocked. I've written my own memory tester and run it on all four cores (admittedly unthreaded) - no issue appeared.

I'm not sure what else I can do without spending money; there's a slight temptation to buy a chunkier cooler for the CPU, but the temperatures I'm measuring on the CPU on the unreliable machine are no different from the ones I measure on a similar system (same CPU, different motherboard, less memory) that works.
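
For anyone wanting to try something similar, here is a minimal sketch of a pattern-based memory check. It is not the tester described above (the thread does not show that code); the names `fill`, `find_mismatches` and `pattern_test`, and the choice of patterns, are assumptions made for illustration. A user-space test like this only touches whatever pages the OS hands out, so a clean pass proves little.

```python
def fill(buf, pattern):
    """Overwrite every byte of buf with the given byte pattern."""
    buf[:] = bytes([pattern]) * len(buf)


def find_mismatches(buf, pattern):
    """Return the offsets in buf whose value differs from pattern."""
    if buf == bytes([pattern]) * len(buf):
        return []  # fast path: whole buffer read back correctly
    return [i for i, b in enumerate(buf) if b != pattern]


def pattern_test(size_bytes, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Write each pattern to a buffer, read it back, and collect
    (pattern, offset) pairs for every byte that came back wrong."""
    buf = bytearray(size_bytes)
    errors = []
    for p in patterns:
        fill(buf, p)
        errors.extend((p, off) for off in find_mismatches(buf, p))
    return errors


if __name__ == "__main__":
    # 64 MiB is an arbitrary example size, not a recommendation.
    bad = pattern_test(64 * 1024 * 1024)
    print("mismatches:", len(bad))
```

To cover more memory, one process per core can be started with a larger buffer each, which matches the "all four cores, unthreaded" approach described above.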

Andi47 2008-01-23 13:50

As far as I have heard, memtest does not find every memory issue.

Does a Prime95 torture test fail on this computer?

1.) Try, for example, running the computer with only one of the memory sticks installed, and run the torture test for 24 or 48 hours (or run an msieve postprocessing job which fits into the memory). Then swap in the next memory stick and run the torture test (or threaded msieve linear algebra) again, and so on, until you have found the faulty stick. (If it fails with (nearly) all memory sticks, it is most probably not a memory issue.) You say that threaded linear algebra fails after 24-48 hours, so running a memory test (memtest, the Prime95 torture test, or something else) only overnight is probably not sufficient to detect a memory issue.

You can also put the memory sticks into a known reliable computer and test whether that computer now fails.

2.) What are your CPU and motherboard temperatures (idle and stressed)?

3.) This should be no issue, as you say that the box is not OC'ed.

4.) I don't know how to test this other than by eliminating variables 1, 2, 3 and 5.

5.) You can, for example, put the CPU onto another motherboard and test whether the torture test or threaded msieve fails.
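
The one-stick-at-a-time procedure in step 1 can be sketched as a small decision helper. This is only an illustration of the reasoning: the function name `diagnose_sticks`, the stick labels and the pass/fail results in the example are hypothetical, and the "every stick failed" rule is a simplification of the "(nearly) all" wording above.

```python
def diagnose_sticks(results):
    """Interpret single-stick torture-test results.

    results maps a stick label to True if the long torture test passed
    with only that stick installed, and False if it failed.
    """
    failed = sorted(stick for stick, ok in results.items() if not ok)
    if not failed:
        # No stick fails on its own: look at heat, PSU, board, CPU,
        # or at problems that only show up with all sticks installed.
        return "no single stick failed; look beyond individual sticks"
    if len(results) > 1 and len(failed) == len(results):
        # Failing with every stick points away from the memory itself.
        return "every stick failed; probably not a memory issue"
    return "suspect stick(s): " + ", ".join(failed)


if __name__ == "__main__":
    # Hypothetical results for four sticks labelled A to D.
    print(diagnose_sticks({"A": True, "B": False, "C": True, "D": True}))
```

Each entry in `results` stands for a 24 to 48 hour run, so filling in the table for four sticks takes the better part of a week.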

garo 2008-01-24 13:11

Like Andi47 said above...

There is no OC so that is ruled out. Do tell us what the temps on both machines are. Also, try and swap CPUs between the reliable and unreliable machine so that you can tell if the CPU itself is at fault.

paulunderwood 2008-01-24 14:00

[QUOTE=garo;123064]Usual crap. Eliminate the variables one at a time:

1. Memory
2. Heat
3. Overclock, if any, stressing either of the above or the FSB.
4. Motherboard
5. CPU

Unfortunately, there is no shortcut.[/QUOTE]

Additionally, the PSU (power supply unit) ought to be checked out. With all that memory, plus devices, the PSU might just not be up to it. I suggest you swap PSUs and re-check.

Andi47 2008-01-24 17:09

[QUOTE=paulunderwood;123790]Additionally, the PSU (power supply unit) ought to be checked out. With all that memory, plus devices, the PSU might just not be up to it. I suggest you swap PSUs and re-check.[/QUOTE]

@Fivemack: What are the voltages (VCore, etc.) of your PC?

fivemack 2008-01-27 10:52

I've looked at all the BIOS options, and found something about memory timing which was set to 'Turbo'; I reset it to a more conservative level, and have managed to run one thread of msieve for 96 hours without hitting issues.

That seems to make sense to explain the symptoms - it is likely that 'Turbo' is an option which doesn't take account of the rather large memory loading in an 8GB system.

Andi47 2008-01-27 12:33

[QUOTE=fivemack;124046]I've looked at all the BIOS options, and found something about memory timing which was set to 'Turbo'; I reset it to a more conservative level, and have managed to run one thread of msieve for 96 hours without hitting issues.

That seems to make sense to explain the symptoms - it is likely that 'Turbo' is an option which doesn't take account of the rather large memory loading in an 8GB system.[/QUOTE]

Did you also try multithreaded msieve?

Did single-threaded msieve also fail with the Turbo option on?

garo 2008-01-28 16:31

While Turbo may be part of the problem, I would advise you to dig deeper and figure out which component the Turbo setting was stressing. That component is probably vulnerable (unless Turbo was really screwing up some settings) and more likely to fail soon.

fivemack 2008-01-28 23:38

Yes, single-threaded msieve had failed in the past with Turbo on.

On the other hand, a four-threaded msieve with Turbo off has just failed.

The particularly tiresome bit in all this is that the fast machine is the NFS server to a small farm of headless machines, and so they can't get anything done while I'm fiddling with the fast machine, and having acquired the farm I feel peculiarly distressed when it's not working efficiently.

fivemack 2008-02-04 10:00

Output from 'sensors' on the unreliable machine:

Core temperatures +72C, +69C, +70C, +71C
Voltages 1.18V, 1.74V, 3.38V, 3.02V, 1.38V, 0V, 0.08V, 3.07V, 3.10V
Fan 1991 RPM
Case temperatures +42C, +57C, -2C (I presume this last one is an error)

Output from 'sensors' on the reliable machine running four threads of linear algebra:

Core temperatures +59C, +52C, +58C, +52C

Maybe I should go and buy some thermal grease and a 500W PSU; it's quite possible that the CPU fan isn't well-installed on the unreliable machine.
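
To put a number on the gap between the two sets of core readings above, here is a small sketch that uses only the temperatures quoted in this post. The variable names are made up for illustration. The load on the unreliable machine at the time of its reading is not stated, so the comparison is only indicative.

```python
from statistics import mean

# Core temperatures in degrees C, copied from the 'sensors' output above.
unreliable = [72, 69, 70, 71]
reliable = [59, 52, 58, 52]   # measured under four threads of linear algebra

# Difference core by core, then between the two averages.
per_core_gap = [u - r for u, r in zip(unreliable, reliable)]
core_gap = mean(unreliable) - mean(reliable)

if __name__ == "__main__":
    print("per-core gap (C):", per_core_gap)
    print("mean unreliable (C):", mean(unreliable))
    print("mean reliable (C):", mean(reliable))
    print("mean gap (C):", core_gap)
```

On these figures the unreliable machine runs about 15 C hotter on average, which fits the suspicion about the cooler installation.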


All times are UTC.