mersenneforum.org  

2003-10-04, 02:55   #1
GP2
 
Sep 2003

2585₁₀ Posts
Team_Prime_Rib error-prone machines

I don't want to single out Team_Prime_Rib, but they already keep track of their own bad results on their Triple Checks Required page and Incorrect Team Results page.

So it would be a useful comparison to see which machines are identified as error-prone by my proposed criteria, and compare it with what they know about their own machines. Comments from Team_Prime_Rib would be welcome.

Maybe we can see where to set the appropriate threshold percentage for error-prone machines (maybe 50% is too high).

Once again, the proposed standard is:

bad / (bad + good) >= X % and bad >= 2
or
uv3_plus / (uv3_plus + uv2) >= X % and uv3_plus >= 2

where
X % = 50%
uv2 = unverified exponents needing a 2nd check.
uv3_plus = unverified exponents needing a 3rd or higher check.
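
As a rough illustration, the standard above could be sketched in Python as follows. The function name `is_error_prone` and its parameter names are made up for this example; only the two ratios and the "at least 2" minimums come from the proposal itself.

```python
def is_error_prone(bad, good, uv2, uv3_plus, threshold=0.50):
    """Sketch of the proposed standard (names are illustrative only).

    bad, good  -- counts of verified bad / verified good results
    uv2        -- unverified exponents needing a 2nd check
    uv3_plus   -- unverified exponents needing a 3rd or higher check
    threshold  -- X % expressed as a fraction (0.50 means 50%)
    """
    # bad / (bad + good) >= X % and bad >= 2
    # (the bad >= 2 test runs first, so the division never sees a zero total)
    by_verified = bad >= 2 and bad / (bad + good) >= threshold

    # uv3_plus / (uv3_plus + uv2) >= X % and uv3_plus >= 2
    by_unverified = uv3_plus >= 2 and uv3_plus / (uv3_plus + uv2) >= threshold

    return by_verified or by_unverified
```

With these assumptions, a machine with 2 bad and 8 good results is not flagged at 50% but is flagged once the threshold is lowered to 20%, which is the kind of shift the different threshold lists reflect.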


If we set X % = 50%, there are 28 error-prone machines in TPR.
If we lower the threshold to 33% there are 5 more machines.
If we lower the threshold to 20% there are another 6 machines.

Total distinct machines evaluated for TPR = 491.


50%

16
20
24
29
40
DSheets_09
Hades_au_P4e
KL_Dancer2
KL_KenOffice
KL_Looi
KL_Orphanage
KL_Zedd
SC_reaver02
Tasuke5
Tasuke9
bayanne_MRoe
bayanne_clv2
garo4
glenon1
glenon7
greensinozw0
hum7
outlnder01
outlnder02
outlnder4
outlnderprim
p1000
shlide

33%

DSheets_62
Odessit5
SC_derek2
SC_reaver01
robcreid_01

20%

boxen_05
forge2
outlnder06
outlnder07
outlnder1
outlnder5

Last fiddled with by GP2 on 2003-10-04 at 02:56
2003-10-04, 03:20   #2
GP2
 
Sep 2003

5·11·47 Posts
Re: Team_Prime_Rib error-prone machines

Quote:
Originally posted by GP2
Total distinct machines evaluated for TPR = 491.

I should probably give a few more details about this.

Some TPR machines were not even considered, even though they might have a high error rate, because there are no exponents associated with them to release for early double-checking.

An example:

The machine outlndr52 has 6 bad, 4 good, 0 unverified needing a 2nd check, and 11 unverified needing a 3rd check or higher.

So it is error-prone.

But there are 0 exponents needing a 2nd check. There are 11 exponents needing a 3rd check or higher, but all of these already have one presumed-good result (from other, non-error-prone machines).

The basic philosophy is: any exponent that does not have one presumed-good result should be scheduled for early re-testing, while any exponent that does will be re-tested in due course (sometimes years later).
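
A minimal sketch of that philosophy, assuming (hypothetically) that each exponent comes with the list of machines that returned an unverified result for it, and that a result is "presumed good" when its machine is not on the error-prone list. The function name and data layout are invented for illustration.

```python
def needs_early_retest(result_machines, error_prone_machines):
    """Decide whether one exponent should be released for early re-testing.

    result_machines      -- machines that returned an unverified result
                            for this exponent (assumed to be non-empty)
    error_prone_machines -- set of machine names flagged as error-prone
    """
    # A result is presumed good if its machine is not flagged as error-prone.
    has_presumed_good = any(
        machine not in error_prone_machines for machine in result_machines
    )
    # No presumed-good result at all: schedule the exponent for early re-testing.
    return not has_presumed_good
```

Under this sketch, an exponent whose only result came from a flagged machine is released early, while one that also has a result from an unflagged machine is left to be re-tested in due course.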


So in other words, the list in the previous message (and the 491 total TPR machines considered) was not a complete list of TPR machines, but only the ones that hold interest for purposes of releasing exponents for early double-checking.

Last fiddled with by GP2 on 2003-10-04 at 03:22
2003-10-04, 04:04   #3
outlnder
 
Aug 2002

2×3×53 Posts

Fortunately for Team Prime Rib and GIMPS in general, all listed outlnder machines no longer crunch for TPR.

Just to clarify some of the questions about my machines: none were overclocked, most used asynchronous FSB settings, and most were on cheap (ECS) motherboards.

And just for the hell of it, most had 0 Prime95 errors.
2003-10-04, 04:24   #4
GP2
 
Sep 2003

5×11×47 Posts

If we include all TPR machines (1365 in all), rather than just the 491 considered in the previous message, we get:

50%

16
20
22
24
28
29
33
35
40
DSheets_09
DSheets_16
DSheets_20
DSheets_22
DSheets_33
DSheets_35
Hades_au_P4e
KL_Dancer2
KL_Gar
KL_KenOffice
KL_Looi
KL_Orphanage
KL_Zedd
Odessit
PJG-G4-800
PM_node_2
SC_12_derek2
SC_reaver02
SlashDude10
SlashDude_WT
TGC_03
Tasuke
Tasuke5
Tasuke9
Tasuke_26
adoptfactor
alvin
bayanne_MRoe
bayanne_clv2
garo4
garo_jul
glenon1
glenon7
greensinozw0
hum7
kvizbar_srv
outlnder01
outlnder02
outlnder04
outlnder2
outlnder21
outlnder4
outlnder52
outlnder55
outlnderprim
p1000
riskin01
shlide

33%

4
DSheets_29
DSheets_62
Odessit5
SC_derek2
SC_reaver01
Tasuke10
Tasuke3
Tasuke4
dizzytcis
glenon_700hm
robcreid_01

20%

Tasuke_27
boxen_05
forge2
outlnder06
outlnder07
outlnder1
outlnder12
outlnder5
pvillecat6
2003-10-04, 04:38   #5
GP2
 
Sep 2003

5×11×47 Posts

Quote:
Originally posted by outlnder
Fortunately for Team Prime Rib and GIMPS in general, all listed outlnder machines no longer crunch for TPR.

Sorry about your farm, outlnder. You did have some good machines (outlnder10, 11, and many others), and even the error-prone ones returned a lot of good exponents in total.


Quote:
And just for the hell of it, most had 0 Prime95 errors.

I think we have to conclude that although a nonzero error code is a good predictor of a possible bad result, a zero error code is not a good predictor of a good result.

The same was true for Team_Italia/Paperino in the M77909869 thread.
2003-10-04, 06:54   #6
robreid
 
Aug 2002
New Zealand

31 Posts

Quote:
Originally posted by GP2
If we include all TPR machines (1365 in all), rather than just the 491 considered in the previous message, we get:
33%
robcreid_01

This computer is definitely confirmed bad; it has since been dismantled and redeployed. Parts of it are now in the boxen that tracks our incorrect stats. I've already come to terms with the fact that the outstanding LL tests will all probably fail doublecheck.
2003-10-04, 07:11   #7
outlnder
 
Aug 2002

318₁₀ Posts

PageFault has deduced that asynchronous memory/FSB settings are responsible for some or all of the errors in machines that show "0" errors.

Maybe this is something we could get feedback on?

We could ask participants who have bad results whether their machines are set to asynchronous memory/FSB settings.
2003-10-04, 17:41   #8
PageFault
 
Aug 2002
Dawn of the Dead

5·47 Posts

Asynchronous ram / fsb was suspect on boxen_05 as this machine produced AM radio interference when at 133 fsb / 166 ram. Forcing 1:1 ratio got rid of the interference but the machine is still failing doublechecks.

This machine's problems were first found in a batch of doublechecks run last fall, about 50% of which needed a triplecheck or were confirmed bad. The machine had been on 33M most of this year until June, when I started a batch of dc's - all were bad. It was down for a few months; recently I revived it to try to solve the problem - the current batch of dc's all require triplechecks. I will try once again at full stock (1.6A @ 2133) after an in-progress (and certainly corrupt) 33M test completes.

This box is probably scrap; I'll get around to fixing or replacing it soon.
2003-10-05, 00:01   #9
garo
 
Aug 2002
Termonfeckin, IE

2²×691 Posts

GP2,
I am surprised that you have garo4 and garo_jul in that list. Can you tell me which criterion was used to select these machines?
My guess is that you used the outstanding triplecheck ratio criterion. In that case, I would like to suggest a revision. What you do not take into account is the fact that not a single exponent from either of these machines has been confirmed bad, and there are literally dozens of confirmed good results from both machines.
So the high triplecheck ratio is in fact a coincidence, i.e. the other machine is more likely to be at fault.

So I would like to suggest that the unverified criterion be used only for machines that do not have enough verified results. I am posting this observation in the other thread as well.
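
One way this suggested revision might look, sketched in Python. The function name and the `min_verified` cutoff are assumptions made for illustration; the post does not say how many verified results count as "enough", so the default of 10 is only a placeholder.

```python
def is_error_prone_revised(bad, good, uv2, uv3_plus,
                           threshold=0.50, min_verified=10):
    """Sketch of the suggested revision (names and cutoff are hypothetical)."""
    verified = bad + good

    # Verified criterion, unchanged: bad / (bad + good) >= X % and bad >= 2
    if bad >= 2 and bad / verified >= threshold:
        return True

    # Enough verified history and it looks clean: ignore the unverified ratio.
    if verified >= min_verified:
        return False

    # Too few verified results: fall back to the unverified criterion.
    return uv3_plus >= 2 and uv3_plus / (uv3_plus + uv2) >= threshold
```

With this placeholder cutoff, a machine with 30 confirmed good results and none bad is not flagged even if most of its unverified exponents need a triplecheck, while a machine with only 3 verified results and the same unverified counts still is.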

Also, I know from my observations that all the KL machines you listed are bad, as are the SC, PM_node and DSheets machines you listed.
2003-10-05, 01:27   #10
GP2
 
Sep 2003

A19₁₆ Posts

Quote:
Originally posted by garo
GP2,
I am surprised that you have garo4 and garo_jul in that list. Can you tell me which criterion was used to select these machines?

OK, since you posted in the other thread as well, I'll answer it there.

One of the reasons I posted the list of TPR machines above was precisely to get this type of feedback.
2003-10-05, 18:34   #11
PageFault
 
Aug 2002
Dawn of the Dead

11101011₂ Posts

More on TPR machines:

Unlike most of the history of crunching Prime95, a competitive team is going to have many enthusiast machines and overclocked machines. For some, commitment is casual and they may walk away leaving a pile of bad results.

Those dedicated to being #1 are going to be more involved in what is happening. One aspect of winning is not losing credit due to bad tests. There is a trend in the making at TPR that a new machine must pass a set of doublechecks before going on to first-time or 33M tests. A machine should periodically do a set of doublechecks to verify integrity. Errors need to be isolated and corrected, and this process is going to produce many bad results.

TPR machines are more prone than average to run doublechecks. Many are borged work machines and the reduced P-1 time and memory mandate this. This practice has revealed problems, i.e., dsheets has a group of identical machines which all fail due to some chipset issue (or other).

Many prefer the fast credit of doublechecks. Many have a spectrum of hardware at home. Old PIIIs and tbirds tend to get put on doublechecks. Indeed, the default work type for a PIII is now a doublecheck.

All these factors are going to show in TPR's error incidence. It is going to be higher than that of other groups. This may even push up the project's overall error rate.

The good thing about this is that it promotes self awareness. TPR is but one of a group of teams, a group that has upper ranking in most of the other DC projects. Knowing that a machine is capable of good results can only benefit the other projects that get crunched.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.