mersenneforum.org  

2012-03-28, 23:01   #1
PageFault

Aug 2002
Dawn of the Dead

5·47 Posts
Error Prone Machines

Some time ago, GP2, a member of this board, made some interesting observations on the nature of LL testing errors. More specifically, GP2 found that the great majority of machines produced few or no bad tests, but that some machines produced bad results in the majority of the tests they completed.

At the time, I was becoming serious about the GIMPS project, having done about two calendar years of work on various machines in various locales. Overclocking seemed to be the thing to do back then, so at home I squeezed every penny's worth out of them boxen.

It was shown that I had an error prone machine ...

This was followed up in detail - I may link or repost what I wrote nine years earlier (for another day). Given the passage of time, many suspect exponents have cleared and suspect boxen have been validated. What remains is the 33M work - it may take several more years before doublechecking begins to clear this range. Still, almost everything I had done (sub 20M) has cleared, enough to draw conclusions.

One box was definitely bad, having ruined most of the work done. Before continuing, the gory details:

Code:
Grand Total                         Count    Percent

Bad                                    52      6.48%
Good                                  594     73.97%
Unverified                            150     18.68%
Triplecheck                             7      0.87%

All Tests                             803    100.00%

Confirmed Tests                     Count    Percent

Bad                                    52      8.05%
Good                                  594     91.95%

All Confirmed                         646    100.00%

Excluding Error Prone Machines      Count    Percent

Bad                                     9      1.25%
Good                                  558     77.39%
Unverified                            148     20.53%
Triplecheck                             6      0.83%

All Tests                             721    100.00%

Confirmed, Excluding Error Prone    Count    Percent

Bad                                     9      1.59%
Good                                  558     98.41%

All Confirmed                         567    100.00%

All machines except the error prone one consistently produced good results. The error prone box was a Northwood B, 1.6 GHz ... remember how easily those overclocked to 2.133 GHz by means of the FSB change ...
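
For anyone who wants the error prone box on its own: a minimal Python sketch, assuming its counts are simply the difference between the "all tests" table and the "excluding error prone" table. The subtraction is not a figure reported in the post, and the variable names are made up for illustration.

```python
# Counts copied from the tables above: every machine, and every machine
# except the error prone one. The subtraction below is an editorial
# sketch, not a number reported in the original post.
all_tests = {"bad": 52, "good": 594, "unverified": 150, "triplecheck": 7}
excluding = {"bad": 9, "good": 558, "unverified": 148, "triplecheck": 6}

# Whatever is left after the subtraction belongs to the error prone box.
error_prone = {k: all_tests[k] - excluding[k] for k in all_tests}

# Only results that have been confirmed either way count toward the rate.
confirmed = error_prone["bad"] + error_prone["good"]
bad_rate = error_prone["bad"] / confirmed

print(error_prone)
print(f"confirmed: {confirmed}, bad: {bad_rate:.1%}")
```

Under that assumption the box accounts for 82 tests, of which 79 are confirmed and 43 are bad, i.e. a bit over half of its confirmed work.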

The eternal question - is overclocking worth the risk?

2012-03-28, 23:35   #2
Dubslow
Basketry That Evening!

"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·29·83 Posts

Quote:
Originally Posted by PageFault View Post
The eternal question - is overclocking worth the risk?

I think mild overclocks, especially those which don't require voltage changes, are safe. I'm running a 2600K, stock 3.4GHz, at around 3.9-4.0GHz, no suspect LLs yet, and many many good DCs done. If I were to go above 4.2-4.3, then I'd get worried.

2012-03-29, 00:41   #3
retina
Undefined

"The unspeakable one"
Jun 2006
My evil lair

2×23×137 Posts

More haste, less speed.

2012-03-29, 03:20   #4
LaurV
Romulan Interpreter

"name field"
Jun 2011
Thailand

2^4·613 Posts

Quote:
Originally Posted by PageFault View Post
The eternal question - is overclocking worth the risk?

I heavily overclock my CPUs, but have good cooling for them. The highest I went was 4.68 GHz with an i7 2600K on a Maximus 4 Extreme Z mobo, but at that speed P95 repeatedly crashes. I went down to 4.4 GHz and stay there. P95 is very stable, and has yet to produce a DC mismatch (4 DC outputs every 3 days, under 10 ms per iteration, 4 cores running 4 workers, each worker on its own core, NO helper threads, i.e. NO hyperthreading is used). Temperatures are around 65C (air cooled, Cooler Master V6 series; I mentioned it some time ago and even put up some photos) when the room is at 30C (hot Thai summer, April is the hottest month of the year).

So whether overclocking is worth it or not depends on how good and stable your hardware is. Here I go over 30%, my output is over 30% higher, and there is no need for TCs. Of course, the electricity bill increases by more than 30%, maybe about 50% (in fact my March bill is more than double my February one, but I don't know how to account for the aircons, which we use more and more day by day starting from February, and will use less and less from May-June till next year). Anyhow, electricity is cheap in Thailand, so only the output counts, for me.

On the other hand, I tried to OC my GPUs too, as I explained in the CudaLucas thread, and then CudaLucas started to produce mismatches. For every 3-5 DC tests done, I had to TC at least one of them. That is the same productivity or even lower, at a much higher electricity cost, so it is not worth it. Why? Because in this case, the hardware is "bad". It could be "top line" on paper (I am talking about GTX 580s and Teslas!) but maybe that is why it is "top line": it is already doing "the best it can do"... I have no idea, but trying to overclock it was definitely a bad choice.

Last fiddled with by LaurV on 2012-03-29 at 03:22

2012-03-29, 17:06   #5
kjaget

Jun 2005

3×43 Posts

Yep, agree with LaurV. I tested my OCs extensively using both torture testing and then by running tens of DCs through each before trusting them. This has shown, in my experience anyway, that a truly stable overclock is much lower than it appears when you just check whether the machine is generally stable, but once you get to the point where P95 is stable you're fine.

Not to mention there's a break-even point in the speed vs reliability curve. If you get 50% better performance with a 10% error rate (made up numbers), you're still producing more good work than the non-overclocked system. Not that I'd be happy with that result since there's probably a ~33% OC with 0% extra error rate, but it's not totally black and white.
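
To put numbers on the break-even idea above, here is a minimal Python sketch. It assumes (a simplification for illustration, not something stated in the post) that the stock machine has a 0% error rate and that a bad result only wastes the time spent on that one test; it ignores the cost to the project of the extra triple check. The function names are hypothetical.

```python
def good_results_rate(speedup: float, error_rate: float) -> float:
    """Good results per unit time, relative to an error-free stock machine.

    speedup:    fractional gain from the overclock (0.5 means 50% faster)
    error_rate: fraction of completed tests that turn out bad
    """
    return (1.0 + speedup) * (1.0 - error_rate)


def break_even_error_rate(speedup: float) -> float:
    """Error rate at which the overclocked box only matches stock output.

    Solves (1 + speedup) * (1 - e) = 1 for e.
    """
    return speedup / (1.0 + speedup)


# The made-up numbers from the post: 50% faster with 10% bad results,
# compared against a ~33% overclock with no extra errors.
print(f"{good_results_rate(0.50, 0.10):.2f}")
print(f"{good_results_rate(0.33, 0.00):.2f}")
print(f"{break_even_error_rate(0.50):.1%}")
```

With those made-up numbers, the 50% overclock at a 10% error rate yields 1.35x the good output of stock, slightly ahead of the clean 33% overclock at 1.33x, and a 50% overclock would only drop to stock output at an error rate of one third.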

2012-03-29, 23:28   #6
PageFault

Aug 2002
Dawn of the Dead

353_8 Posts

If one were to accept a higher than historical error rate ... fine, if one runs only doublechecks, then one doesn't disrupt the flow of the project, and one is aware, if one wants to be, of all the details concerning good / bad. On first time tests ...

I have had several overclocked boxes which consistently produced good results, but at a higher than historical error rate. My overall error rate, for all results cleared so far and excluding an error prone machine, is 1.59% - the historical average for the project, as discussed about ten years ago.

On the other hand, for boxen that I am certain were running at stock speed, the error rate is 0.3%.

Many first time tests in the range of M30 - M40, on both stock and overclocked machines, will have to wait years before I will know everything. Time will tell ... I waited nine years to post this ...

2012-03-31, 13:55   #7
garo

Aug 2002
Termonfeckin, IE

2^2×691 Posts

Quote:
Originally Posted by PageFault View Post
Time will tell ... I waited nine years to post this ...
Gah! We've been doing this for way too long.

2012-03-31, 23:41   #8
PageFault

Aug 2002
Dawn of the Dead

235_10 Posts

Have we ... see you in 9 for the nextest, latest and greatest installment ... and hopefully no error prone boxen between now and then ...

2012-04-01, 15:51   #9
garo

Aug 2002
Termonfeckin, IE

2^2×691 Posts

I just got my first LL error in 22 months. It got pretty warm here the last two weeks. I think I need to turn one of my machines off.

2012-04-02, 01:01   #10
PageFault

Aug 2002
Dawn of the Dead

5×47 Posts

Going through the results, I had two flagged errors on a stock build. Those are 33M tests and are awaiting doublechecking. The machine has 94 confirmed tests and 0 bad. This happened during July - Sept, when ambient temps were at the seasonal max. Following that, nada.
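
As a rough sanity check on what "94 confirmed tests and 0 bad" can and cannot show, here is a small Python sketch of the largest per-test error rate still consistent with such a record. It assumes every test fails independently with the same probability, which is an assumption of this sketch only (seasonal temperature swings like the ones described here would break it), and the function name is hypothetical.

```python
def zero_failure_upper_bound(n_tests: int, confidence: float = 0.95) -> float:
    """Upper bound on the per-test error rate after n_tests with 0 bad.

    Returns the error rate p at which a clean run of n_tests independent
    tests would happen with probability exactly 1 - confidence, i.e. it
    solves (1 - p) ** n_tests = 1 - confidence for p.
    """
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_tests)


upper = zero_failure_upper_bound(94)
print(f"0 bad out of 94: error rate could still be as high as {upper:.1%}")
```

Under those assumptions a clean 94-test record rules out an error rate much above 3% at 95% confidence, but it is too small a sample to tell a 0.3% box from a 1.59% one.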

I think you should leave the boxen be and fire up an aircon instead ... that way we can both post back here in another 9 years or so ...

2012-04-03, 21:02   #11
garo

Aug 2002
Termonfeckin, IE

2^2×691 Posts

Ha! My prayers were answered and the temps have dropped 10C. An open skylight is sufficient now.

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.