mersenneforum.org


BigRed 2002-11-24 03:41

Real hardware problem?
 
I got a new 2.4GHz P4 system earlier this month. It has the 533MHz FSB, an appropriate motherboard, and 512MB of PC2100 RAM.
It ran double-checks for 2 weeks with no errors reported. I ran the benchmarks and then the torture test for a while.
SelfTest448Passed=1
SelfTest896Passed=1
SelfTest1024Passed=1
SelfTest8Passed=1

I switched to a 1st-time test and errors are now being reported:
Iteration: 9922560/17177189, ERROR: ROUND OFF (0.5) > 0.40
Iteration: 9973993/17177189, ERROR: SUM(INPUTS) != SUM(OUTPUTS), 4128410768192874 != 4128410755610102
Iteration: 10000902/17177189, ERROR: SUM(INPUTS) != SUM(OUTPUTS), 1.727849801559226e+16 != 1.727849800300918e+16
Iteration: 10035893/17177189, ERROR: SUM(INPUTS) != SUM(OUTPUTS), 6.778154586969904e+16 != 6.778154585921175e+16
[skip some more errors]
Iteration: 12870392/17177189, ERROR: SUM(INPUTS) != SUM(OUTPUTS), 4.055797833010234e+16 != 4.055797832276154e+16
Iteration: 12877696/17177189, ERROR: ROUND OFF (0.5) > 0.40
Iteration: 13033472/17177189, ERROR: ROUND OFF (0.4921875) > 0.40

Do I have a real problem? How did it manage 2 weeks of error-free double-checks?
Could it be some overheating problem that just started?
I'm running Linux kernel 2.4.18
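
For anyone wanting to tally errors like the ones quoted above, here is a minimal Python sketch. It assumes the log lines look exactly like the ones in this post; the names `parse_errors`, `LINE_RE`, `SUM_RE` and `SAMPLE` are made up for illustration and are not part of mprime.

```python
import re
from collections import Counter

# Assumed line format, taken from the errors quoted in the post:
#   Iteration: N/TOTAL, ERROR: <message>
LINE_RE = re.compile(r"Iteration: (\d+)/(\d+), ERROR: (.*)")
SUM_RE = re.compile(r"SUM\(INPUTS\) != SUM\(OUTPUTS\), (\S+) != (\S+)")


def parse_errors(log_text):
    """Return a list of (iteration, kind, detail) tuples."""
    events = []
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue  # not an error line
        iteration, message = int(m.group(1)), m.group(3)
        s = SUM_RE.match(message)
        if s:
            a, b = float(s.group(1)), float(s.group(2))
            # detail = relative size of the mismatch between the two sums
            events.append((iteration, "sum_mismatch", abs(a - b) / abs(a)))
        elif message.startswith("ROUND OFF"):
            # detail = the reported round-off value inside the parentheses
            value = float(message[message.index("(") + 1:message.index(")")])
            events.append((iteration, "round_off", value))
        else:
            events.append((iteration, "other", message))
    return events


SAMPLE = """\
Iteration: 9922560/17177189, ERROR: ROUND OFF (0.5) > 0.40
Iteration: 9973993/17177189, ERROR: SUM(INPUTS) != SUM(OUTPUTS), 4128410768192874 != 4128410755610102
Iteration: 13033472/17177189, ERROR: ROUND OFF (0.4921875) > 0.40
"""

events = parse_errors(SAMPLE)
counts = Counter(kind for _, kind, _ in events)
print(counts)
```

Seeing both kinds of error in the tally, spread over many iterations, is the sort of pattern that points at hardware rather than at one unlucky exponent.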

outlnder 2002-11-24 08:01

I'm not sure, but your exponent may be in the range that P4s have problems with. I'm hoping someone can confirm this.

If so, unreserve the exponent and start one a lot bigger or smaller.

PageFault 2002-11-24 18:46

The nearest FFT crossover is M17660000 and this is far enough to be free of that problem. There are multiple error types as well, which indicates a problem.

There are some issues with bad drivers (Sound Blaster, IIRC) which can cause floating-point screwups.

Can you try another exponent?

Since it is a new box, it's best to sort things out while it is still possible to take it back. Bad RAM will do what you describe, so run memtest. Can you survive SuperPI?
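
Some background on the ROUND OFF message, as I understand it: the test does its big multiplications with floating-point FFTs, so every result should land very close to a whole number. The reported figure is the largest distance from the nearest integer, and 0.5 is the worst it can possibly be. A rough Python sketch of that check follows; the sample values are invented for illustration and `max_roundoff` is a hypothetical helper, not the program's actual code.

```python
def max_roundoff(values):
    """Largest distance between any value and its nearest integer."""
    return max(abs(v - round(v)) for v in values)


THRESHOLD = 0.40  # limit quoted in the error messages earlier in the thread

# Hypothetical outputs of one squaring step, made up for illustration.
healthy = [12.0000003, -7.9999998, 3.0000001]
suspect = [12.0000003, -7.4921875, 3.0000001]

for name, values in (("healthy", healthy), ("suspect", suspect)):
    err = max_roundoff(values)
    status = "ERROR" if err > THRESHOLD else "ok"
    print(f"{name}: max round-off {err:.7f} -> {status}")
```

A value that close to 0.5 means the program can no longer tell which integer was intended, so the iteration cannot be trusted.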

xtreme2k 2002-11-25 00:07

From what I gather, I recommend running the Windows version of memtest86. It is even more sensitive than Prime95 at detecting errors. Run it to around 10000% coverage on ~60% of your RAM and see whether it at least passes.

BigRed 2002-11-25 21:36

It was overheating
 
There's no system fan, and it started overheating when I moved the box to a new location. I pulled the side off and there have been no errors in the past 3 1/2 hours.
I'll look for a memtest program I can use with Linux, since there's no Windows or DOS anywhere on this box. I'll also get a system fan added.

BigRed 2002-11-27 22:44

M33394931 starting slow?
 
This P4 finished M17177189 with no further problems. It started M33394931 17 hours ago. In prime.ini I've got:
OutputIterations=20000
ResultsFileIterations=99999999
DiskWriteTime=30
NetworkRetryTime=2
NetworkRetryTime2=60
TwoBackupFiles=1

There's still only 1 save file being written, pX394931, which gets updated every 30 minutes. Is it doing factoring on this new exponent before the regular 1st-time test happens? I thought factoring liked RAM. I've got 512MB and told mprime it could use 384. But the process is only using 1.3MB.
Is everything fine?
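
A small Python sketch for reading those prime.ini settings back, in case it helps to double-check them. It assumes the file is plain `key=value` lines with no section headers, as in the excerpt above; `parse_ini` is a hypothetical helper. That DiskWriteTime is in minutes is inferred only from the save file being updated every 30 minutes.

```python
def parse_ini(text):
    """Parse plain key=value lines into a dict of strings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blanks and anything that is not key=value
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# The settings quoted in the post above.
PRIME_INI = """\
OutputIterations=20000
ResultsFileIterations=99999999
DiskWriteTime=30
NetworkRetryTime=2
NetworkRetryTime2=60
TwoBackupFiles=1
"""

settings = parse_ini(PRIME_INI)
print("Save interval:", int(settings["DiskWriteTime"]))
print("Two backup files requested:", settings["TwoBackupFiles"] == "1")
```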

Kevin 2002-11-28 00:14

The second part of factoring likes RAM. You're doing the first part of factoring, which uses very little memory.

garo 2002-11-28 17:50

On a 33M exponent you are probably doing some trial factoring first, which takes up very little RAM. Next you will do P-1 factoring, which takes a lot of RAM (P-1 factoring itself has two phases, the second of which takes up the huge chunk of RAM). And then finally you get to do the LL test!


All times are UTC.