Isolated the problem, now what?
Here's what I've done so far:
1. Ran Memtest twice, going through all 9 steps. It took more than an hour, with no problems (although to be more conclusive, it should have run much longer).
2. Ran Prime95 Torture Test (Blend): less than 1 minute before the hardware error message.
3. Ran Prime95 Torture Test (Small FFTs): less than 13 minutes before the hardware error message.

Thinking that one of my Samsung PC2700 512MB RAM sticks was damaged, I decided to test them separately.

1. DIMM 1: ran Blend for 12 minutes without any problems (beat the 1 minute record from having both DIMMs in).
2. DIMM 2: ran Blend for 8 minutes without any problems (beat the 1 minute record from having both DIMMs in).
3. Put both DIMMs in slots 2 and 3 (they were in slots 1 and 3 before the individual tests).
4. Ran Prime95 Blend and got a hardware error in less than 2 minutes.

My motherboard automatically runs my two DIMMs in dual channel mode, whereas an individual DIMM runs in single channel mode. I'm thinking that it's the dual channel that is making Prime95 display the hardware error message.

My system specs:
Athlon XP 3000+ (stock speeds)
Asus A7N8X Deluxe BIOS 1007
2x512 MB PC2700 Samsung RAM
Geforce 3 Ti 200
Soundblaster Live Value
NEC 3250A Dual Layer Burner
2x120GB 8MB 7200RPM SATA Maxtor HD
1x60GB 2MB 7200RPM Quantum LM HD

BIOS: Optimal settings (vs. aggressive)
- FSB 167MHz
- 2.5-3-3-7 CAS timings

[URL=http://www.dstudios.ca/temp/mm5.gif]Motherboard Monitor 5 screen capture[/URL]

Would you think that it's the dual channel that is creating the hardware error? |
Try setting the memory to single channel mode, and also relax the memory timings, then try again. Also try one stick, then the other.
|
As I said, I did try individual sticks, and each one worked. It was only when I put both of them in, in dual channel, that it didn't work. I apologize for not being clearer.
I'll have to check what is needed to run them in single channel. A BIOS setting, I figure. Could you suggest better timings, to give me a hint of what to change (I'm a newbie at interpreting CAS timings)? Thanks! |
Memtest86's speed report should indicate whether the memory is in a single or dual channel config by what the throughput speed is. From what you posted, I don't think you have yet tried BOTH memory modules in the slots that still give single channel (as opposed to the particular physical slots you must pair up for dual channel). Does THAT work, i.e. both modules in single channel?
Possibly make sure ALL your RAM sockets are clean of dust etc. My suggestion would be to slightly increase your RAM voltage if your mobo BIOS settings support doing this. The reason is that a given supply voltage may be OK with one module, but with two installed the load is greater, which may cause a very slight voltage drop. This could be enough to change operation from stable to borderline. |
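For reference when reading the throughput figure, here is a rough sketch of the theoretical peak numbers for PC2700. It assumes the rated DDR333 clock of 166.67 MHz and a 64-bit bus per channel; the function name is made up for illustration. Measured Memtest86 figures will be lower than these peaks and depend on the rest of the platform, so treat this only as a yardstick for comparing single against dual channel.

```python
# Theoretical peak DDR bandwidth. Real measured throughput will be lower.

def peak_bandwidth_mb_s(clock_mhz: float, channels: int = 1, bus_bytes: int = 8) -> float:
    """Peak transfer rate in MB/s for DDR memory.

    DDR transfers data twice per clock cycle over a 64-bit (8-byte) bus
    per channel. The function name is hypothetical, not from any tool.
    """
    return clock_mhz * 2 * bus_bytes * channels


PC2700_CLOCK_MHZ = 166.67  # assumption: rated DDR333 clock for PC2700

single = peak_bandwidth_mb_s(PC2700_CLOCK_MHZ, channels=1)
dual = peak_bandwidth_mb_s(PC2700_CLOCK_MHZ, channels=2)

print(f"single channel peak: {single:.0f} MB/s")
print(f"dual channel peak:   {dual:.0f} MB/s")
```

The single channel peak works out to roughly 2.7 GB/s, which is where the "PC2700" name comes from.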
Well, having 2 DIMMs in slots 1 and 3 or 2 and 3 runs the memory in dual channel, which is what Memtest ran under without a hitch. I never tried individual DIMMs in Memtest just because both, in dual channel, worked.
Now I have them in slots 1 and 2, which runs them in single channel (the BIOS clearly indicates this, as does the manual).

Other info on my system:
[URL=http://www.dstudios.ca/temp/cpuz_cpu.gif]CPUZ's CPU Info[/URL]
[URL=http://www.dstudios.ca/temp/cpuz_cpu.gif]CPUZ's Memory Info[/URL]

I'll check for dust in the case and will clean the slots with compressed air. Will check the RAM voltages too. Thanks for the tip! So, the new CPU would suck more energy than the old one, thus robbing the memory of some power?

I tried Prime95 with both DIMMs in single channel mode (slots 1 and 2) and it still gave me the error message. So, having both DIMMs in, in either dual or single channel, seems to cause instability in my system...

Note: 3 times this morning, the case temps have gone from 26C to 106C? MM5 freaks out and stuff... |
Hmm, try turning off MBM5 for a bit and run Prime95. I wonder if it's causing the errors.
|
Thanks for the hint! I just tried it without MM5 running:
FATAL ERROR: Resulting sum was -3.914402764883792e+028, expected: 9.6197848261e+016
Hardware failure detected, consult stress.txt file.
Torture Test ran 0 minutes - 1 error, 0 warnings.
Execution halted.

This is the error message that I get across the board in all my tests... just with different numbers maybe. |
I increased the "DDR Reference Voltage" from 2.6V to 2.7V. I then booted and ran Prime95. As soon as I got into the app, it rebooted my whole system. I guess that wasn't it?
|
Do you have an adequate power supply? I think you need at least a 300 W unit.
Is your system stable when not running GIMPS? What kind of heat-sink do you use? What's your CPU temperature under full load? The 52C appearing in your MBM screen capture sounds too low for a full-load temp... |
Try underclocking your CPU to half speed and running the tests.
This will reduce heat and power usage by the CPU. |
Try using slower memory timings. Perhaps 3-4-4-8 and see if it stabilises.
|
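To see what "slower timings" mean in absolute terms, here is a small sketch that converts timing values from clock cycles to nanoseconds. It assumes the numbers in a timing string are plain cycle counts and that the nominal 166/133 MHz clocks are really 166.67/133.33 MHz; the helper names are invented for illustration.

```python
# Convert memory timings from clock cycles to nanoseconds.

def cycles_to_ns(cycles: float, clock_mhz: float) -> float:
    """One cycle lasts 1000 / clock_mhz nanoseconds."""
    return cycles * 1000.0 / clock_mhz


def timings_ns(timings, clock_mhz: float):
    """Return each timing value in nanoseconds, rounded to 0.1 ns."""
    return [round(cycles_to_ns(c, clock_mhz), 1) for c in timings]


current = (2.5, 3, 3, 7)  # timings from the original post
relaxed = (3, 4, 4, 8)    # timings suggested in this reply

for label, t in (("2.5-3-3-7", current), ("3-4-4-8", relaxed)):
    print(f"{label} @ 166.67 MHz: {timings_ns(t, 166.67)} ns")

# The same cycle counts at a lower clock also give the chips more time.
print(f"2.5-3-3-7 @ 133.33 MHz: {timings_ns(current, 133.33)} ns")
```

Either raising the cycle counts or lowering the memory clock lengthens each delay, which is why both are worth trying on marginal RAM.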
doesn't worry about his Pentium 4; it's only 1.8 GHz and stays in the range of 99 to 109 degrees
|
moo
I hope those temps are in degrees Fahrenheit. :grin: |
Oh, they are ;)
|
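For comparison with the Celsius readings elsewhere in this thread, a quick conversion sketch (the function name is just for illustration):

```python
def fahrenheit_to_celsius(deg_f: float) -> float:
    """Standard conversion: subtract 32, then scale by 5/9."""
    return (deg_f - 32.0) * 5.0 / 9.0


for deg_f in (99, 109):
    print(f"{deg_f} F = {fahrenheit_to_celsius(deg_f):.1f} C")
```

So 99 to 109 degrees Fahrenheit is roughly 37 to 43 degrees Celsius.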
My HSF is a Silent Boost from Thermaltake.
My PSU is a 400W from Antec. The temperatures mentioned are at idle. Using Prime95, I've seen 62C on MM5... My system was stable with my 2100+ (and waaayyy cooler) but seems rather finicky with my new 3000+ (Premiere crashed, Firefox crashed (seems ok with version 1.0.1 had 1.0.2))... |
Can you try with the case panels off?
Should lower the temp significantly. It is how I run my PC, front and both side panels removed. Makes a big difference in the summer (stable or not), and much cooler in the winter. That combined with underclocking CPU/memory will let you find stable operating parameters. |
Your PSU is more than adequate :cool:
But I'm not sure about the heat-sink. [URL=http://www.frostytech.com/articleview.cfm?articleid=1551&page=5]FrostyTech says[/URL] it's good, but as you can see from the list there are better ones. If you want to run GIMPS 24/7 I'd suggest you forget about the silence and go for sheer performance. As they say in the [URL=http://www.frostytech.com/articleview.cfm?articleID=1274]review[/URL] of the Thermalright AX7: [QUOTE]noise levels are enough to wake the dead, but for those of us who really the performance it is the only way to go. [/QUOTE]
I see you are not overclocking, but maybe the air-flow in your case is poor. Also check the contact between the CPU and the heat-sink.

To be certain about the memory, run memtest86+ the first night, Windows Memory Diagnostics the second night, and if possible, the GIMPS memory test the third. |
Here's a pic of my case layout. I can't see how I could make this better with everything that's in there...
[URL=http://www.dstudios.ca/temp/case.jpg]Case layout picture[/URL]
I'm thinking of possibly going with a better heatsink fan to replace the one on my Silent Boost. The heatsink is copper, so maybe the fan would give me better results? Copper is copper, no?

I could start the tests tomorrow morning; I have relatives in town and they sleep in my computer room. Will report back ASAP... |
It's so neat inside your case :geek:
Also try what [B]dsouza[/B] said: run with the case panel facing the CPU removed. Then point an external (room) fan at the CPU/mobo and try to run GIMPS. If there are no errors, then your problem is CPU overheating. Otherwise it's something else. But you have to fix the problem; it's not good to run without the side panel all the time (mobo/chipset temps go up, not to mention the trouble if you have children).

Try also the following:
(1) If you have used the heatsink for a while, check for dust on the heatsink fins. If there is some, remove the fan and vacuum the dust. I have a [URL=http://www.frostytech.com/articleview.cfm?articleid=1128&page=1]Coolermaster HHC-001[/URL] and that's what I do once every 3-4 months. The CPU temp goes down 2-3C after cleaning.
(2) Try to move the HDD from behind the front case fan (2nd from the bottom) to the slot right under the top HDD to improve the air flow.

And good luck! |
Instead of removing the fan to clean it, buy a can of compressed air, hold a vacuum in the case, shoot the air in short blasts, and vacuum up the dust before it settles ;)
|
Kosmaj and Dsouza,
I tried Memtest for 6 hours, no problem.
I tried MS Memory Diagnostics for 6 hours, no problem.
I removed the front panel of my case, loaded Prime95 and ran it. I still got the same "Hardware Error" as usual, and in the same time too (less than 2 minutes). My temps were 5C lower though.

This computer was thoroughly cleaned before doing any of the testing in this thread.

I'm running out of :censored: patience and tests! LOL! Thanks for all the help! |
Have you underclocked the processor yet? Also, what OS?
|
This is Windows XP SP2, with all the patches...
Underclock the CPU? The goal would be to bring it back to its stock speed, right? |
Yes, that is the goal.
Right now it could be:
- RAM (bad, running too fast (maybe it isn't up to its own spec), or insufficient voltage)
- CPU (overheating (heat not removed, poor thermal conduction), or insufficient voltage)
- MB/chipset (overheating)
- OS/apps (interaction with other software: a driver or another program not restoring registers/FPU status)

Settings: CPU underclocked to 1/2 speed, memory set to conservative settings, side panel(s) removed to reduce heat, and running in Safe Mode (different generic drivers are used, and only minimal services are run).

Most likely, in order: marginal RAM at the given spec, MB/chipset overheating, CPU, OS/apps.

If it runs stable with the above settings, increase the CPU speed until it becomes unstable. If RAM is the issue, even the minimal settings won't be stable. |
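The "start minimal, then step up until unstable" procedure above can be sketched as a simple loop. This is only an illustration: `is_stable` is a hypothetical stand-in for "set this value in the BIOS, run the Prime95 torture test for a while, and see whether it errors", which has to be done by hand, and the demo predicate at the bottom is made up.

```python
from typing import Callable, Optional, Sequence


def highest_stable(settings: Sequence[float],
                   is_stable: Callable[[float], bool]) -> Optional[float]:
    """Walk the settings from lowest to highest and stop at the first failure.

    Returns the last setting that passed, or None if even the lowest one
    failed (which would point at a hardware fault rather than a marginal part).
    """
    last_good = None
    for setting in sorted(settings):
        if not is_stable(setting):
            break
        last_good = setting
    return last_good


# Made-up demo: pretend the machine passes up to a 9.5x multiplier.
multipliers = [6, 8, 9.5, 11, 13]
print(highest_stable(multipliers, lambda m: m <= 9.5))

# Made-up demo: nothing passes, so no stable setting exists.
print(highest_stable(multipliers, lambda m: False))
```

Changing one variable at a time (CPU speed first, memory settings held conservative) is what makes the result attributable to a single component.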
Doesn't sound like a CPU overheating problem, because with the side panel off it should have run at least a little longer without errors. Which brings us back to the beginning... with the order of probable failures as listed by [B]dsouza[/B] above.
Can you try the memory tests again, one stick at a time: first Prime95 Blend, then on failure several passes of memtest86+ followed by several passes of Win Memory Diagnostics (be sure to select the extended Memory Map 820(3) in advanced options). Also, can you tell us which components in your system are new? You said you had a 2100+ before and got a new CPU. Did you use the same DIMMs before, or did you get new ones? |
[QUOTE=Kosmaj]Can you try again memory tests, one stick at the time, first Prime95 blend, then on failure several passes of memtest86+ followed by several passes of Win Memory Diagnostics (be sure to select the extended Memory Map 820(3) in advanced options).[/QUOTE]
I'll try Win Memory Diag again with the option that you have mentioned. Is there a need to use memtest86 again if I didn't get any errors with both DIMMs running in dual channel? [QUOTE=Kosmaj]Also, can you tell us what components in your system are new. You said you had 2100+ before and got a new CPU. Have you used the same DIMM's before or you got new ones?[/QUOTE] All of the components remained the same; I just swapped the CPU (2100+) for a new one (3000+). The OS also remained the same. I just applied the latest MS patches. |
Okay, ran MS Memory Diagnostics using the 820(3) option. It ran overnight without any problems detected.
I tried Prime95 in safe mode and it too, gave me the hardware error within 2 minutes. I'm going to try to underclock the CPU and see if that solves anything. Actually, how does one underclock a CPU? By simply selecting a smaller multiplier number? |
A lower multiplier is the easiest way.
Some motherboards also support a lower FSB, but that is more for small adjustments than for the major changes in speed that the multiplier gives. |
How small should I go? I am at 13x right now. I saw 6x for instance.
|
6x is a good place to start.
Prime95 is the more discerning test: it tests the components more thoroughly and it has known correct results to compare against. You want to find out whether your PC can be stable at any speed. If it can't run stable at the most minimal settings, that points to a hardware problem. If it can run at a minimal level, then it could be that parts were labeled higher than they really are, or that cooling is inadequate. For example, the thermal compound might not be thin enough or might not completely cover the surfaces in contact, or the fan isn't removing enough heat. The memory could be labeled for a higher speed than it can really work at correctly. Running at minimal settings with the case panels off will lessen the heat load. |
So far I have done the following:
FSB: 133MHz
Dual Channel
Multiplier: 9.5x
CPU = 1733MHz (underclocked from 2.17GHz)

I ran the torture test (Blend) for 16 minutes, which is 15 minutes more than at stock speed. However, my Barton seems to have a locked multiplier, which stops me from trying various settings: either I have full speed or the box doesn't POST.

Somebody said something about voltages from my PSU, particularly that the +12V was too low (11.67). Could this be the cause of some of my problems? |
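As a sanity check on the numbers above, the core clock is simply FSB times multiplier. The sketch below assumes the nominal 133/166 MHz FSB values are really 133.33/166.67 MHz and uses the 13x stock multiplier mentioned earlier in the thread; the function name is invented for illustration.

```python
def cpu_clock_mhz(fsb_mhz: float, multiplier: float) -> float:
    """Core clock is the front-side bus clock times the CPU multiplier."""
    return fsb_mhz * multiplier


FSB_166 = 166.67  # assumption: exact value behind the nominal 166/167 MHz
FSB_133 = 133.33  # assumption: exact value behind the nominal 133 MHz

for fsb, mult in [(FSB_166, 13), (FSB_133, 13), (FSB_133, 9.5)]:
    print(f"{fsb:.2f} MHz x {mult} = {cpu_clock_mhz(fsb, mult):.0f} MHz")
```

The reported 1733 MHz matches 133.33 x 13 rather than 133.33 x 9.5 (about 1267 MHz), which would be consistent with the multiplier having stayed at 13x, as the locked-multiplier remark suggests. Either way, the lower FSB also slows the memory, so this run does not separate a CPU problem from a RAM problem.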
I went and checked your MBM 5 output. It seems your fans are spinning very fast. I would also point out that the voltages seem to be low on some rails.
|
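To put the 11.67V reading in context, here is a small sketch that checks a measured rail against the +/-5% tolerance that the ATX specification allows on the positive rails. The helper name and the second sample reading are made up for illustration, and software sensor readings such as MBM's can be inaccurate, so a multimeter is the better source for the measured value.

```python
def within_tolerance(nominal_v: float, measured_v: float,
                     tolerance: float = 0.05) -> bool:
    """True if the measured voltage is within +/- tolerance of nominal."""
    return abs(measured_v - nominal_v) <= nominal_v * tolerance


def deviation_percent(nominal_v: float, measured_v: float) -> float:
    """Signed deviation from nominal, in percent."""
    return (measured_v - nominal_v) / nominal_v * 100.0


reading = 11.67  # +12V value reported in the thread
print(f"+12V at {reading}V: {deviation_percent(12.0, reading):+.2f}% "
      f"-> in spec: {within_tolerance(12.0, reading)}")

# Hypothetical reading for contrast: below the 11.40V lower limit.
print(f"+12V at 11.30V -> in spec: {within_tolerance(12.0, 11.30)}")
```

By this check 11.67V is about 2.75% low and still inside the allowed 11.40V to 12.60V window, so on its own it does not look like the cause.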
[QUOTE=moo]i went and checked ur mbm 5 output it seams ur fans are spinning very fast also i would point that voltages semm to be low in some ranges.[/QUOTE]
How would I fix this? I did get a good Antec 400W PSU. I have 2 fans, 3 HD and 1 DVD Burner. I also have a SB Live Value and Geforce 3 Ti 200. |
I have tried relaxing the timings for my RAM (2.5T-3-3-3-11 and 3T-3-3-3-11), to no avail. The only difference: with the first timings, I got a reboot. With the second timings, I got 100 warning messages.
Maybe I'm getting closer? (ie. error vs warning?) |
Have you borrowed someone else's RAM? Just wondering.
|
Unfortunately, I don't know anyone that has 2 x 512MB of PC2700 that I can borrow...
|
Did you already try out using only a single RAM module?
|
Yes I did. Running Prime95 with 1 dimm (tried with both) worked.
|
Then I guess lowering the CPU clock won't improve that much. I'd suspect (s)lower memory frequency / timings to be more effective...
|
3T-3-3-3-11 gets me warning messages rather than hardware errors. Maybe I'm getting closer?
|
All afternoon, running an FSB of 133MHz resulted in a stable system. As soon as I put it on 166MHz, I start having random reboots. I think it's time for slightly more expensive memory :D
|
So far, the system is stable at 166MHz. Seems all I needed was a CMOS wipe (suggested by Asus tech). So far so good. However, Prime95 still gives me the hardware error...
|
[QUOTE=thermalMan]So far, system is stable at 166Mhz. Seems all I needed was a CMOS wipe (Suggested by Asus tech). So far so good. However, Prime95 still gives me the hardware error...[/QUOTE]
How about your other BIOS settings? Is there any turbo mode, V-Link or other speed acceleration setting that you might have turned on? One of the machines I'm running is a P4 2.6C on an Asrock motherboard. It was unstable, even at stock speeds, until I turned off V-Link in the BIOS (it was a VIA chipset). Not sure if your A7N has something similar. |
I actually reset the CMOS. So I'm running with basically the default settings. The only thing I did was disable floppy, parallel and serial ports. And I changed the boot order.
So far, it's now stable. It seems Motherboard Monitor was causing some of my stability issues. However, I still can't run Prime95 on a 166MHz FSB. |
I swapped the motherboard and CPU and it *seems* to have gotten rid of the problem... games are now buggy though! LOL!
|
My bad, the test ran for 3 hours then gave a fatal rounding error. :surrender
|
Resolved!
I bought a new dual channel kit from Corsair (2x512MB PC3200) and I no longer have any problems. I guess when Memtest doesn't show an error, but you're still having problems, it can still be the memory. Prime95 runs like a dream now as well. Games are stable. With all the RMAs, it will have taken a few months!
THANK YOU! Everybody. :bow: |
[QUOTE=thermalMan]I guess when Memtest doesn't show an error, but you're still having problems, it can still be the memory.[/QUOTE]
You're absolutely right. That is the main lesson learned from this episode. There have been several examples of this situation in the past. |
To anyone having similar problems, for God's sake, listen to these folks! LOL!
|