2012-03-28, 23:01   #1
PageFault

Aug 2002
Dawn of the Dead

5·47 Posts

Error Prone Machines

Some time ago, GP2, a member of this board, made some interesting observations on the nature of LL testing errors. More specifically, GP2 found that the great majority of machines produced few or no bad tests, but that some machines produced bad results in the majority of the tests they completed.

At the time, I was becoming serious about the GIMPS project. I had done about two calendar years of work, on various machines in various locales. Overclocking seemed to be the thing to do, so at home I squeezed every penny's worth out of them boxen. It was shown that I had an error prone machine ... This was followed up in detail - I may link or repost what I wrote nine years earlier (for another day).

Given the passage of time, many suspect exponents have cleared and suspect boxen have been validated. What remains is the 33M work - it may take several more years before doublechecking begins to clear this range. Still, almost everything I had done (sub 20M) has cleared, enough to draw conclusions. One box was definitely bad, having ruined most of the work it did. Before continuing, the gory details:

Code:
Grand Total                           Count   Percent
  Bad                                    52     6.48%
  Good                                  594    73.97%
  Unverified                            150    18.68%
  Triplecheck                             7     0.87%
  All Tests                             803   100.00%

Confirmed Tests
  Bad                                    52     8.05%
  Good                                  594    91.95%
  All Confirmed                         646   100.00%

Excluding Error Prone Machines
  Bad                                     9     1.25%
  Good                                  558    77.39%
  Unverified                            148    20.53%
  Triplecheck                             6     0.83%
  All Tests                             721   100.00%

Confirmed, Excluding Error Prone
  Bad                                     9     1.59%
  Good                                  558    98.41%
  All Confirmed                         567   100.00%

All machines except the error prone one consistently produced good results. The error prone box was a Northwood B, 1.6 GHz ... remember how easily those overclocked to 2.133 GHz by means of the FSB change ...

The eternal question - is overclocking worth the risk?
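For anyone who wants to re-derive the percentages in the table above, here is a minimal Python sketch. Only the counts are taken from the post; the function names and the subtraction step are illustrative assumptions, and the derived figures for the excluded machine(s) are not stated anywhere in the thread.

```python
# Counts copied from the table in the post; everything else is derived.
all_machines = {"Bad": 52, "Good": 594, "Unverified": 150, "Triplecheck": 7}
excluding_error_prone = {"Bad": 9, "Good": 558, "Unverified": 148, "Triplecheck": 6}


def percentages(counts):
    """Share of each category among all tests, in percent (2 decimals)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


def confirmed_bad_rate(counts):
    """Bad tests as a percentage of confirmed (Bad + Good) tests only."""
    confirmed = counts["Bad"] + counts["Good"]
    return round(100 * counts["Bad"] / confirmed, 2)


# Assumption: the gap between the two tallies is entirely the excluded
# error prone machine(s). The post does not list these figures itself.
error_prone = {
    name: all_machines[name] - excluding_error_prone[name]
    for name in all_machines
}

print(percentages(all_machines))
print(confirmed_bad_rate(all_machines))
print(confirmed_bad_rate(excluding_error_prone))
print(error_prone, confirmed_bad_rate(error_prone))
```

Under that assumption, the subtraction leaves 43 bad and 36 good confirmed tests for the excluded hardware, which is consistent with the remark that one box ruined most of the work it did.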
2012-03-28, 23:35   #2
Dubslow

"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·29·83 Posts


 Originally Posted by PageFault The eternal question - is overclocking worth the risk?

I think mild overclocks, especially those which don't require voltage changes, are safe. I'm running a 2600K, stock 3.4GHz, at around 3.9-4.0GHz, no suspect LLs yet, and many many good DCs done. If I were to go above 4.2-4.3, then I'd get worried.

2012-03-29, 00:41   #3

More haste, less speed.
2012-03-29, 03:20   #4
LaurV
Romulan Interpreter

"name field"
Jun 2011
Thailand

2⁴·613 Posts


 Originally Posted by PageFault The eternal question - is overclocking worth the risk?
I heavily overclock my CPUs, but have good cooling for them. The highest I went was 4.68 GHz with an i7 2600K on a Maximus IV Extreme-Z mobo, but at that speed P95 repeatedly crashes. I went down to 4.4 GHz and stayed there. P95 is very stable, and has yet to produce a DC mismatch (4 DC outputs every 3 days, under 10 ms per iteration, 4 cores running 4 workers, each worker on its own core, NO helper threads, i.e. NO hyperthreading is used). Temperatures are around 65C (air cooled, Cooler Master V6 series; I mentioned it some time ago and even posted some photos) when the room is at 30C (hot Thai summer, April is the hottest month of the year). So whether overclocking is worth it or not depends on how good and stable your hardware is. Here I overclock by over 30%, my output is up by over 30%, and there is no need for TCs. Of course, the electricity bill increases by more than 30%, maybe about 50% (in fact my March bill is more than double February's, but I don't know how to account for the air conditioners, which we use more and more day by day starting from February, and which we will use less and less from May-June until next year). Anyhow, electricity is cheap in Thailand, so only the output counts, for me.

On the other hand, I tried to OC my GPUs too, as I explained in the CudaLucas thread, and then CudaLucas started to produce mismatches. For every 3-5 DC tests done, I had to TC at least one of them. This is the same productivity or even lower, at a much higher electricity cost, so it is not worth it. Why? Because in this case, the hardware is "bad". It could be "top line" on paper (I am talking about GTX 580 and Teslas!) but maybe that is why it is "top line": because it is already "the best it can do"... I have no idea, but trying to overclock it was definitely a bad choice.



2012-03-29, 17:06   #5

Yep, agree with LaurV. I tested my OCs extensively, first with torture testing and then by running tens of DCs through each before trusting them. This has shown, in my experience anyway, that a truly stable overclock is much lower than the point where the machine merely seems generally stable, but once you get to the point where P95 is stable you're fine. Not to mention there's a break-even point in the speed vs reliability curve. If you get 50% better performance with a 10% error rate (made up numbers), you're still producing more good work than the non-overclocked system. Not that I'd be happy with that result since there's probably a ~33% OC with 0% extra error rate, but it's not totally black and white.
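The break-even argument in this post can be put into numbers with a rough Python sketch. It uses the post's own made-up figures and a deliberately simple model that is an assumption, not anything from the thread: a bad result counts as a total loss, the stock machine is treated as error-free, and the cost to the project of the extra triple checks is ignored. The function names are hypothetical.

```python
def good_work_rate(speedup, error_rate):
    """Correct results per unit time, relative to a stock machine at 1.0.

    speedup    -- throughput relative to stock (1.5 means 50% faster)
    error_rate -- fraction of completed tests that turn out bad
    """
    return speedup * (1.0 - error_rate)


def break_even_error_rate(speedup):
    """Error rate at which an overclock only matches an error-free stock machine."""
    return 1.0 - 1.0 / speedup


stock = good_work_rate(1.0, 0.0)         # baseline
aggressive = good_work_rate(1.5, 0.10)   # 50% faster, 10% bad (made up numbers)
mild = good_work_rate(1.33, 0.0)         # ~33% faster, no extra errors

print(f"stock={stock:.2f} aggressive={aggressive:.2f} mild={mild:.2f}")
print(f"break-even error rate at 1.5x: {break_even_error_rate(1.5):.1%}")
```

In this simple model the aggressive overclock comes out only marginally ahead of the mild one, and a 1.5x overclock would have to spoil about a third of its tests before it fell behind stock. Counting the wasted doublecheck and triplecheck effort would move the break-even point lower.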
2012-03-29, 23:28   #6

If one were to accept a higher than historical error rate ... fine, if one runs only doublechecks, then one doesn't disrupt the flow of the project, and one is aware, if one wants to be, of all the details concerning good / bad. On first time tests ... I have had several overclocked boxes which consistently produced good results, but at a higher than historical error rate. My overall error rate, for all results cleared so far and excluding an error prone machine, is 1.59% - the historical average for the project, as discussed about ten years ago. On the other hand, for boxen that I am certain were running at stock speed, the error rate is 0.3%. Many first time tests in the range of M30 - M40, on both stock and overclocked machines, will have to wait years before I will know everything. Time will tell ... I waited nine years to post this ...
2012-03-31, 13:55   #7
garo

Aug 2002
Termonfeckin, IE

2²×691 Posts


 Originally Posted by PageFault Time will tell ... I waited nine years to post this ...
Gah! We've been doing this for way too long.

2012-03-31, 23:41   #8

Have we ... see you in 9 for the nextest, latest and greatest installment ... and hopefully no error prone boxen between now and then ...

2012-04-01, 15:51   #9

I just got my first LL error in 22 months. It got pretty warm here the last two weeks. I think I need to turn one of my machines off.

2012-04-02, 01:01   #10

Going through the results, I had two flagged errors on a stock build. Those are 33M tests and are awaiting doublechecking. The machine has 94 confirmed tests and 0 bad. This happened during July - Sept, when ambient temps were at the seasonal max. Following that, nada. I think you should leave the boxen be and fire up an aircon instead ... that way we can both post back here in another 9 years or so ...

2012-04-03, 21:02   #11

Ha! My prayers were answered and the temps have dropped 10C. An open skylight is sufficient now.

