mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Prime Sierpinski Project (https://www.mersenneforum.org/forumdisplay.php?f=48)
-   -   LLR 3.7.1c error (https://www.mersenneforum.org/showthread.php?t=10479)

Nekto 2008-07-13 19:52

LLR 3.7.1c error
 
If I get errors like "Bit: 1427333/5049039, ERROR: SUM(INPUTS) != SUM(OUTPUTS), 0 != 3.999908273430412e+092", does that mean a bad overclock? Earlier, LLRnet ran fine at a higher overclock for a long time without any errors.

Joe O 2008-07-13 20:59

[QUOTE=Nekto;137753]If I get some errors like "Bit: 1427333/5049039, ERROR: SUM(INPUTS) != SUM(OUTPUTS), 0 != 3.999908273430412e+092", it means bad overclock? But earlier LLRnet worked fine with better overclock for a long time without any errors.[/QUOTE]

1) The new program is ~10% faster, so it stresses the system ~10% more.
2) The old program may not have been checking the GWNUM error messages.
3) It's warmer now
4) Your machine is older
5) There may be more dust on the heatsink or in the case vents
6) You might have new drivers installed. Either for new hardware or new versions of drivers. I have seen this message with a badly written sound card driver that did not properly save and restore the FP state.
7) All of the above!

Nekto 2008-07-13 21:15

It's not because of temperature... 70 degrees max on a Q6600 (9x332)... Earlier it ran stable at 9x375 for several weeks... If I get such errors, can the result be wrong, and should I stop helping with PRP-ing?

Joe O 2008-07-13 22:00

[QUOTE=Nekto;137759]it's not cause of temperature... 70 degrees max Q6600 (9x332)... Earlier it was working 9x375 stable during several weeks... If i have such errors, result can be wrong and I shouldn't help in PRP-ing?[/QUOTE]

Define stable.
According to one Xtremely good overclocking site, there is boot stable, windows stable, prime stable, etc. You get the point.
As to whether your results are usable, you could finish a test and then send me the k n values so that I can run it and see if my result matches yours. Essentially an early double check. Do not post the 64 bit residual here. That would allow anyone to send in a double check result that would match even though they had not run the test. LTD will let us know if our residuals match.
My experience with this particular error is mixed. I had quite a few results that were correct in spite of this error. The software restarted from the previous checkpoint and recalculated correctly. I also had one result that was bad, so YMMV.

Nekto 2008-07-14 19:24

I get this error when I launch any game with LLR running in the background.

VJS 2008-07-14 23:01

Welcome to the q6600 club.

I went through this with Lars quite a bit with my Q6600; it's part of the reason why we have progressed as far as we have in the double-check portion.

My advice, for what it's worth.

Also, you stated 70C but not what temperature you're running at. I found that by the mid 60s I was producing errors. You have to keep temperatures below 64C on all four cores, not just the motherboard utility temperature.

First, download coretemp.exe; it actually reports the temperature of each core.

For a heatsink, I found the best one to be a Thermalright Ultra-120 Extreme with two 120mm fans, one push and one pull. I grabbed some cheap ones from Fry's; combined, they were quiet and performed better than any single name-brand fan out there.

For settings try the following.

[B]Multiplier at 8x
fsb at 400mhz
memory multiplier at 2x.[/B]

This will give you a 3.2GHz clock, even multipliers all around, and 800 MHz memory. For me that was stable, rock-solid stable (less than a 1% error rate in about 500 tests); temperatures were in the high 50s, 54-58C, with variance between the cores. Voltage was around 1.38V, memory at 2.2 or 2.1V, whatever Patriot suggested.

I'd suggest 7x400 first, which will yield 2.8GHz; if you're "stable", then try 3.2GHz.
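The clock figures above can be checked with a minimal Python sketch. It assumes the usual relationships (core clock = CPU multiplier x FSB, memory clock = FSB x memory multiplier); the function names are made up for illustration.

```python
def core_clock_mhz(multiplier: float, fsb_mhz: float) -> float:
    """Core clock in MHz, assuming core = CPU multiplier x FSB."""
    return multiplier * fsb_mhz


def memory_clock_mhz(fsb_mhz: float, memory_multiplier: float) -> float:
    """Effective memory clock in MHz, assuming memory = FSB x memory multiplier."""
    return fsb_mhz * memory_multiplier


# The two settings suggested in this post: 7x400 first, then 8x400.
for multiplier in (7, 8):
    core = core_clock_mhz(multiplier, 400)
    memory = memory_clock_mhz(400, 2)
    print(f"{multiplier}x400: core {core:.0f} MHz, memory {memory:.0f} MHz")
```

Both settings keep the memory at 800 MHz; only the core clock changes, which is why stepping the multiplier is a convenient way to test the CPU in isolation.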

BTW, I thought I was "stable" at 3.8GHz: it crashed Windows maybe once a week and never crashed in games. It would overheat and crash with prime95, and I had to move things back to 3.4GHz before prp stopped showing errors, although they still existed (according to double checks).

It's also important to note that llr seems less likely to produce errors than prp... but that's just a hunch, maybe the new version is different...

When I was comparing the two, temps were also 2C lower with llr vs prp.

em99010pepe 2008-07-19 10:15

Which motherboard are you using?
What's your FSB speed? Memory timings?

Nekto 2008-07-19 14:49

[quote=em99010pepe;137999]Which motherboard are you using?
What's your FSB speed? Memory timings?[/quote]
[url]http://valid.x86-secret.com/show_oc.php?id=391232[/url]

VJS 2008-07-19 18:55

DUDE!!!

You are way way over on RAM (memory) ... I'm sure any overclocking buddy that likes high frequency might be suggesting what you have but in all seriousness try the following...

multiplier [SIZE="7"]8[/SIZE]

fsb [SIZE="7"]422[/SIZE]

memory multiplier [B][SIZE="7"]2x or (1:1)[/SIZE][/B]

That would give you the exact same total MHz, with 844 MHz memory and a 422 MHz FSB.

Those settings will be much much much better and you will probably be able to run better timings 6-6-6-15 <--- (suxs!!!)


Also, you didn't comment on what coretemp.exe is saying. Really, I'm trying to help you make a stable, fast machine; let's give it a try. Also run the DC server; Lars will let you know if you're stable.

Nekto 2008-07-20 06:46

[quote=VJS;138026]Also you didn't comment on what coretemp.exe is saying[/quote]
It's been hot the last few days :) so during the day I lower the MHz and voltage :) trying to keep the temperature under 71 (70-71 on 2 cores and 68-69 on the other two). Did as you said: [url]http://valid.x86-secret.com/show_oc.php?id=391516[/url]

Nekto 2008-07-20 10:02

[url]http://img525.imageshack.us/my.php?image=cachememxi5.jpg[/url]
Everest test

VJS 2008-07-20 15:26

much better :smile:

Did you see any improvement in the memory bandwidth?

I also wonder which memory you have and are you applying enough voltage?

With most of the memory out there the bios will run it at either 1.8 or 2.0V when it is rated to run at 2.1 or 2.2 Volts.

Also 71C is way too hot! You will get errors.

Consider the following.

1. I would start by making sure you're running the memory at the rated voltage, probably 2.1V.

2. What's your case temperature? If it's more than 5C above room temperature, stop! Run to the store and buy more fans, LOL. The idea here is to reduce your case temps, which will lower both memory temperatures and your CPU temperature. Also, if you're running something like an 8800GT, that card (single-slot cooling) is probably dumping a lot of heat into the system. Try removing the slot cover just below your card and making some sort of homemade ducting that pipes heat out the back of the case.

Anyway, do anything and everything you can to reduce your case temps.

The above two are pretty simple.

Now consider the following.

A. Benchmark Benchmark Benchmark.

B. Now reduce your fsb a few mhz at a time until your memory timing improves to 4-4-4-#

This will probably happen around 410 MHz. It will also reduce the heat load on your CPU and allow you to run a lower CPU voltage. Keep reducing your voltage by 0.01V until the system becomes unstable (which it will at some voltage); once this voltage is found, add 0.02V back.

Example: stable at 1.36V, unstable at 1.35V. Run the system at 1.37V.
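The step-down procedure can be sketched in Python as follows. This is only an illustration: `is_stable` is a hypothetical callback standing in for whatever stress test you actually run at each voltage, and the function name and `floor_v` safety limit are assumptions, not part of any real tool.

```python
def pick_cpu_voltage(start_v, is_stable, step=0.01, margin=0.02, floor_v=1.0):
    """Lower the voltage step by step until is_stable() fails, then
    return the first unstable voltage plus the safety margin."""
    # Work in integer millivolts so repeated subtraction does not drift.
    v_mv = round(start_v * 1000)
    step_mv = round(step * 1000)
    margin_mv = round(margin * 1000)
    floor_mv = round(floor_v * 1000)
    while v_mv >= floor_mv:
        if not is_stable(v_mv / 1000):
            return (v_mv + margin_mv) / 1000
        v_mv -= step_mv
    raise ValueError("system never became unstable above the floor voltage")


# Hypothetical machine that is stable at 1.36V and above.
chosen = pick_cpu_voltage(1.40, lambda volts: volts >= 1.36)
print(f"run at {chosen:.2f}V")
```

With the example from the post (stable at 1.36V, unstable at 1.35V) the sketch lands on 1.37V, that is, 0.02V above the first unstable setting.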

So yes, you're running at a lower total clock, but your memory is more effective. Benchmark again and see how much you have actually lost in performance; you might be amazed at how little.

Say, for example, it doesn't happen until 408 FSB, which lets your memory run at 4-4-4-16-1T, and you're able to drop your CPU voltage by 0.075V.

So if you do the following what would you get?

Clock speed would only be reduced to 3264 MHz, or 96.7% of your unstable previous settings. Memory bandwidth much improved, lower heat dissipation which will prolong processor life, and a stable machine which will crunch llr if you so desire.

-------------------------------

From a PSP perspective, even if you had to reduce down to 400 MHz FSB, this would be 94.7% of your previous clock, but you're 99.999% stable with good residuals.

With the previous setting you needed to throw out 10% of your tests due to errors and restarts.

So running a slower stable clock actually gives 94.7% versus 90% (100% minus 10% errors), which is 4.7% MORE TESTS!!!
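A minimal Python sketch of that comparison, under two simplifying assumptions that are not from the post itself: test throughput scales linearly with clock speed, and every failed test is pure wasted time. The function name is illustrative.

```python
def useful_throughput(clock_fraction: float, error_rate: float) -> float:
    """Fraction of good results relative to the full-speed, error-free ideal."""
    return clock_fraction * (1.0 - error_rate)


overclocked = useful_throughput(1.0, 0.10)   # full clock, 10% of tests discarded
stable = useful_throughput(0.947, 0.0)       # 94.7% clock, no discarded tests

gain_points = (stable - overclocked) * 100          # percentage points
relative_gain = (stable / overclocked - 1.0) * 100  # relative to the overclocked rate

print(f"overclocked: {overclocked:.3f}, stable: {stable:.3f}")
print(f"gain: {gain_points:.1f} points ({relative_gain:.1f}% more good tests)")
```

The 4.7 figure is a difference in percentage points; measured against the overclocked machine's own output, the gain is slightly larger, about 5.2%.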

BTW, it took me several weeks to figure this out with my Q6600, LOL.

Nekto 2008-07-20 19:39

[quote=VJS;138064]
B. Now reduce your fsb a few mhz at a time until your memory timing improves to 4-4-4-#[/quote]
I've got bad memory, so that won't happen :)

ltd 2008-07-20 20:54

[QUOTE=VJS;138064]
BTW, it took me several weeks to figure this out with my Q6600, LOL.[/QUOTE]

And it took some effort to clean up the residues you left damaged. :razz:

Just kidding. I have to thank you for bringing the DC effort up to speed.
:beer:

VJS 2008-07-21 01:58

Yeah... a 2T might be pushing it but trying to get into a 4-4-4-16-2T would be possible


@LTD yup, you're right, that was a lot of manual work checking who was the culprit. But better in second pass than first pass.

In the end, with your help, we made mine pretty darn stable. :showoff:

:smile: BTW, waiting for those Nehalems to come out :smile: then you will really have your work cut out, LOL.

Nekto 2008-07-21 17:48

So I looked at benchmarks with different timings and FSB:DRAM ratios... Decided to stop at: 8x395 (1.55 V in BIOS), RAM 493.7 MHz 6-6-5-14

VJS 2008-07-21 18:42

How is your ram at 493Mhz???

You should have your RAM setting at 2x or 2:1; for a 395 MHz FSB your RAM should be running at 790 MHz.

Nekto 2008-07-21 19:24

[quote=VJS;138118]How is your ram at 493Mhz???

You should have your ram setting at 2x or 2:1 for a 395Mhz your ram should be running at 790 Mhz.[/quote]
493x2 :)

Nekto 2008-07-21 19:27

so... when i leave computer, nothing bad happens... But when I launch 3d-application, error immediately occurs. Why? :*(

Joe O 2008-07-21 19:40

[QUOTE=Nekto;138122]so... when i leave computer, nothing bad happens... But when I launch 3d-application, error immediately occurs. Why? :*([/QUOTE]

As I posted before, probably a bad driver that does not correctly save and restore the Floating Point state.
What graphics card do you have? Do you have the most current drivers for it? What sound card are you using, or do you only use the sound on the mainboard? Again, do you have the most current drivers?

Nekto 2008-07-21 20:33

[quote=Joe O;138123]As I posted before, probably a bad driver that does not correctly save and restore the Floating Point state.
What Graphics card do you have? Do you have the most current drivers for it?What Sound card are you using, or do you only use the sound on the mainboard? Again, do you have the most current drivers?[/quote]
sapphire radeon x1950xt 256 mb, I have latest drivers for x64 windows with CCC... on-board sound (Realtek 5.10.0.5391)

VJS 2008-07-22 01:15

Because your memory is way too high, it's really that simple.

Set your memory so that it runs at 2x your FSB; it's really that simple.

Most of your problems will go away at that point.

Nekto 2008-07-22 08:14

installed new drivers 8.7, set memory to 791 MHz (5-5-5-5), launched 4 llrnet, launched game, played 10 minutes... still have errors
ERROR: ROUND OFF (0.5) > 0.40

Joe O 2008-07-22 13:31

[QUOTE=Nekto;138146]installed new drivers 8.7, set memory to 791 MHz (5-5-5-5), launched 4 llrnet, launched game, played 10 minutes... still have errors
ERROR: ROUND OFF (0.5) > 0.40[/QUOTE]
Now this error is much more likely to be a hardware error than a software error. I'll let VJS take the lead on this, but have you tried a large fan pointed at your machine (or pointed away from it) to increase the air flow? Too many people put their machines in those "cute" little computer desks to get them out of the way.

Nekto 2008-07-22 14:33

[quote=Joe O;138159]Now this error is much more likely to be a hardware error than a software error. I'll let VJS take the lead on this, but have you tried to large fan pointed at your machine (or pointed away from it) to increase the air flow. Too many people put their machines in those "cute" little computer desks to get them out of the way.[/quote]
open case ;) temperature around 65 (coretemp)
[url]http://img152.imageshack.us/my.php?image=pictureghdggt7.jpg[/url]

ltd 2008-07-22 16:29

So far you have returned 16 results on the DC server, with two results not matching the first pass. But this does not mean that your results are wrong. A third test will be needed to close the case. There is a very good chance that your results are correct, as we are working on a range which is known to have bad first-pass results.
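The "third test closes the case" idea can be sketched in Python. This is only an illustration of the matching logic under the assumption that a candidate counts as verified once any two independent runs agree; it is not the DC server's actual code, and the residue strings below are made-up placeholders.

```python
from collections import Counter


def agreed_residue(residues):
    """Return the residue reported by at least two runs, or None if
    no two runs agree yet and another test is needed."""
    if not residues:
        return None
    residue, count = Counter(residues).most_common(1)[0]
    return residue if count >= 2 else None


# Placeholder residues, not real results.
first_pass, second_pass, third_pass = "AAAA", "BBBB", "BBBB"

print(agreed_residue([first_pass, second_pass]))              # None: mismatch, retest
print(agreed_residue([first_pass, second_pass, third_pass]))  # BBBB: first pass was bad
```

A mismatch alone says nothing about which run was wrong; only the tie-breaking run does.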

If somebody has the time (my machines are doing other third-pass tests at the moment to investigate other possible error patterns), here are the pairs that need to be retested:

258317 2587559
79817 2588031

Cheers,

Lars

VJS 2008-07-22 22:02

Well, you're on the right track.

A couple of things... First, coretemp.exe will not tell you 65C for a Q6600; it will give you 4 different temperatures, one for each core. What are those temperatures?

Also, 65C is still too hot, but you're close.


Joe is right: that error is certainly hardware related and not software.

I'm trying to be helpful, but I can't continue, and certainly can't help, if you won't answer my questions.

Please answer the following...

[B]are you still running a 9X or are you running 8X?[/B]

You should be running 8X at this point.

Looking at the picture, I have an oh no... What power supply is that? How many watts?

Also, you still have not told me what voltage you're running at or which memory you are using.

Looking at the pic, I would also suggest a larger fan in the back: replace the blue one with a 120mm or the biggest you can fit. I purchased about 4 of them at Fry's for $4.95 each, so they are cheap.

I would also put the side back on after you put a larger fan in back. Open cases really don't work well, you need to get the cold air coming in the front and bottom hot out the back and top. Generally I'm able to get case temps around 26C without work or problems.

Nekto 2008-07-23 03:14

Memory Samsung PC2-6400 (5-5-5-5) 1.8V 790 (or 791) MHz
Power supply suxx :) Hopefully 400W
Temperature right now 66 on 2 cores and 63-64 on other two.
8x395 1.55V in BIOS. It goes to 1.36-1.368 during work (cpu-z)

Nekto 2008-07-23 09:06

I've sent 12 more results... So the problem could be the power supply? When the video card switches to 3D mode?

Sloth 2008-07-23 11:51

[QUOTE=Nekto;138212]I've sent 12 more results... So the problem can be in power supply? When videocard switches to 3d-mode?[/QUOTE]

They take more power when you kick into the 3d stuff. Most current cards go to a slower speed for 2d and full speed for the 3d. Think of it as speed step for the gpu. Add in the fan rpm bump from the extra heat (minor but it is more power).

S.

Nekto 2008-07-23 12:04

So if I buy, for example, a Sirtec 500W, would the problem go away?

VJS 2008-07-23 14:06

Ahh now we know what the problem is...

Memory is not great, but you're doing the best you can with what you have.

Temperature right now 66 on 2 cores and 63-64 on other two.

O.K., you're about 3 degrees too hot on the first two cores. Simple fix... it can probably be done with cooling, but your real issue is the power supply, as is evident from your next line...

8x395 1.55V in BIOS. It goes to 1.36-1.368 during work (cpu-z)

So you're looking at about a 0.2V drop under load, and when that card kicks in, drawing a lot of power, your CPU voltage drops even further...

Increasing the voltage further is not going to help; your power supply is just not up to the task of running at that MHz.

I know it doesn't really make sense, but a large voltage drop generates a lot of heat. Remember I^2 R and V^2 / R: you still need to deliver the power at the CPU, and if you can't give it the voltage, it takes more current, which causes excessive heat.

BTW 1.55V is probably going to kill that chip at that heat level... at least with time.

My suggestion: get the largest single-rail power supply you can, at least 700 watts, with 8-pin power.

Until then you should probably run at 7x400 and drop that CPU voltage down to about 1.4V. That might stabilize things a little. If you get a decent power supply and some case fans, it sounds like you may even be stable at 3.6GHz. But you would need to get that temperature down to the high 50s, maybe 60C on the hottest core.

VJS 2008-07-23 14:12

BTW 7x400 is actually 2800Mhz.

Look at the errors Lars talked about: 2 errors in 16. Suppose those two errors are actually your errors.

Then,

Out of 16 you had 14 good tests.

8x395 is 3160 MHz; multiply by 14/16 and you get 2765 MHz...


If you ran that CPU at 2765 MHz and produced no errors, you would have been at least as successful.

In fact, if you ran 100% stable at 2800 MHz you would be more successful...

In any case, I'm sure we can get it stable at more than 2.8GHz. And stability at 3.2GHz is very easy with the right parts.
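The same "effective MHz" comparison as a small Python sketch, using the numbers from this post (8x395 with 14 good results out of 16, versus an error-free 7x400). It assumes, as the post does, that a bad result is simply lost work; the function name is illustrative.

```python
def effective_mhz(multiplier, fsb_mhz, good_tests, total_tests):
    """Clock speed discounted by the share of tests that had to be thrown away."""
    return multiplier * fsb_mhz * good_tests / total_tests


overclocked = effective_mhz(8, 395, 14, 16)   # 3160 MHz with 2 bad results in 16
stable = effective_mhz(7, 400, 16, 16)        # 2800 MHz with no bad results

print(f"8x395 with errors: {overclocked:.0f} effective MHz")
print(f"7x400 error-free:  {stable:.0f} effective MHz")
```

This ignores the extra double-check work a bad residue causes for everyone else, so the real cost of the unstable setting is, if anything, higher.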

Nekto 2008-07-23 15:40

I'll make a test... underclock :)

VJS 2008-07-23 16:03

Actually you might try the following...

Up your memory voltage by 0.1V to 1.9V total, since this is probably dropping under load as well...

You're basically short on power; you need another power supply to get more out of that CPU and video card.

Also try dropping that CPU voltage until it bluescreens, then add 0.1V and see what you come up with.

You might be able to stabilize it a little, but I think you're really going to need a power supply.

Anything 700W is better than what you have. I was having stability issues with my PS, which I think was a piece of Shat 600 or 650. Now I have, I think, an 800W with a single rail; voltage drop is something like 0.05V between load and no load.

My board also had 24-pin power, a 4-pin molex, and an 8-pin.

A friend also bought the identical board; his power supply was a 20-pin and he was only running the 4-pin. He could not get any stability above 3.2GHz, never mind running DC at the same time. It was good enough for games, but that was about it.

But hey 3.2Ghz quad core is very fast.

Nekto 2008-07-23 16:43

7x410, after 10 minutes OCCT 61-62 on 2 cores and 58-59 on other 2

VJS 2008-07-23 18:39

Those are acceptable temperatures :-)

Question: what is your voltage currently, and are you still able to keep the 5-5-5-? timings?

Might want to leave it there for a while and check stability on the DC server.

Nekto 2008-07-23 18:46

1.4 or 1.4125 in BIOS (don't remember). Timings 5-5-5-5 (820 MHz). Seems the problem is gone: played while crunching, no errors. Thx for your help.

Nekto 2008-07-23 19:20

One more question :) If I want to stop receiving new tasks, should I change this in llr-clientconfig.txt?
once = 1
WUCacheSize = 0

Sloth 2008-07-23 20:40

[QUOTE=Nekto;138243]One more question :) If I want to stop receiving new tasks, I should change in llr-clientconfig.txt
once = 1
WUCacheSize = 0[/QUOTE]

Yes. Although if you have llrnet running as a service, it will restart on reboot and grab another WU to crunch. So if you do not want to crunch any more, set once=1 and wait for it to finish. Once it does, stop the service and change it off of automatic.


S.

ltd 2008-07-23 22:00

@Nekto:

Your last 13 DC results were all without errors.:tu:

VJS 2008-07-24 13:02

Very very cool Nekto :smile:

I'd suggest that you seriously consider a better power supply.

Then get a few more fans to move that air in and out of the case, putting the sides back on.

Once you do those two things you should be able to get that 400 MHz back out of the processor.

Also, memory is so cheap: for 40 dollars after mail-in rebate you can get a killer set of memory now.

Personally I like OCZ; I saw a set that would run 4-4-4-(16?) at 1000 MHz on Newegg. You really only need 2G if you're running a 32-bit OS.

Nekto 2008-07-24 13:20

[quote=VJS;138286]You really only need 2G if your running a 32-bit OS.[/quote]
I run 64-bit :)

Nekto 2008-08-26 19:36

I've got new Chieftec CFT-1200G-DF (1200W) :) I'll start testing tomorrow

VJS 2008-08-27 19:37

thought you disappeared...

I'd run the DC server until Lars says otherwise...

Could use some help there anyways, Joe is a little lonely.

Nekto 2008-08-28 11:59

[quote=VJS;140101]thought you dissappeared...

I'd run the dc server until lars says otherwise...

Could use some help there anyways, Joe is a little lonely.[/quote]
ok :) finishing last tasks in primegrid 321 and starting DC

VJS 2008-08-29 13:08

Wow, kicking some serious butt in DC...

Lars will be away for a few weeks, so I'd keep it there for now. I'm sure when he gets back he will have a look at how many errors you have produced, if any. That should give a good idea of stability.

Nekto 2008-09-02 19:54

I'll be back after OGR-25 finishes...


All times are UTC.