mersenneforum.org > Data Error rate plot

2012-05-14, 02:37   #34
Dubslow

"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·29·83 Posts

Quote:
 Originally Posted by PageFault More likely is that the original and/or the doublechecks were done on two error-prone machines. Such machines corrupt results even when there are no error codes. Some years back garo, GP2 and I did a lot of work concerning error-prone machines. We observed several instances of tests needing a quadruple check or higher.
The slightly worrying thing is that I'm pretty darned sure I'm error free; I have yet to produce a known bad result. OTOH, "linded" is a large producer, and I thought he was solid. (I suppose that being a large producer makes it more likely that a bad machine is harder to detect, but still.) Could be a cosmic-ray type thing, but that's highly unlikely. I guess we'll see.

Code:
Exponent	WorkType  Stage, %  agedays daystogo    Estimated Completion	Next Update	Updated	Assigned	Userid
25646837	D	LL, 20.50%	5	11	2012-05-25	2012-05-15	2012-05-14	2012-05-09	spradlin

Last fiddled with by Dubslow on 2012-05-14 at 02:41

2012-05-14, 05:19   #35
PageFault

Aug 2002
Dawn of the Dead

5·47 Posts

If your machines are your own, you can easily verify them (and I probably don't have to tell you how). When we did the error analysis, our team was vast and many observations were possible. Most important was that, at the slightest hint of trouble, a run of doublechecks produced instant and infallible proof of integrity (or lack of it).

We had several groupings:

(1.) Home horsepower enthusiast. These guys bought quality parts, looked after their farms and for the most part were conservative: modest or no overclock.
(2.) Penny-pinching stats ho: budget crap, massive overclocks, eventual progression of many machines into the error-prone category.
(3.) Top-ranked IT professional having rights within a serious corporation, running the client on quality stock hardware at stock settings.
(4.) IT neophyte, in an upstart business, on bargain-basement hardware. In two cases this developed into masses of error-prone machines.

I'm not going to mention names here, but anyone caring to peek into the stats will see that categories 2 and 4 showed some epic fails ... Most important, don't cut corners, and don't overclock ... look at the relation between the two ...

 2012-05-14, 06:37 #36 Dubslow Basketry That Evening!     "Bunslow the Bold" Jun 2011 40
2012-05-14, 06:48   #37
LaurV
Romulan Interpreter

Jun 2011
Thailand

5×23×73 Posts

There are (were) guys who changed from one group to another, or even walked through all 4. I am one of those guys who went through ALL the groups, at one moment or another.

 2012-06-04, 20:26 #38 Dubslow Basketry That Evening!     "Bunslow the Bold" Jun 2011 40
2012-06-04, 23:46   #39
Dubslow

"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·29·83 Posts

Quote:
 Originally Posted by Dubslow In this case, the third and fourth tests matched. The first test by curtisc was bad. http://mersenne.org/report_exponent/?exp_lo=25866773
I actually just checked, the person whose test I verified here ^ is the same person who was assigned my quadruple check up there.

Last fiddled with by Dubslow on 2012-06-04 at 23:47
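
The triple- and quadruple-check cases discussed in this thread come down to comparing the residues reported for one exponent until two of them agree. Here is a minimal sketch of that bookkeeping, assuming an exponent counts as verified once any two tests report the same residue. The function name, the test labels and the residue strings are all made up for illustration; this is not PrimeNet's actual code or data.

```python
from collections import Counter

def classify_results(results):
    """results: list of (label, residue) pairs for a single exponent.

    Returns (verified_residue, suspect_labels). verified_residue is None
    until at least two tests agree (assumed rule, see lead-in).
    """
    if not results:
        return None, []
    counts = Counter(residue for _, residue in results)
    residue, matches = counts.most_common(1)[0]
    if matches < 2:
        # No two tests agree yet, so another check is needed.
        return None, []
    # Every test that disagrees with the matched residue is presumed bad.
    suspects = [label for label, r in results if r != residue]
    return residue, suspects

# Hypothetical quadruple-check history: two mismatches, then two matches.
results = [
    ("test1", "AAAA"),
    ("test2", "BBBB"),
    ("test3", "CCCC"),
    ("test4", "CCCC"),
]
verified, suspects = classify_results(results)
print(verified, suspects)
```

With only the first three entries the function returns `None`, which is the situation where a fourth test has to be assigned.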

2012-06-10, 04:30   #40
Dubslow

"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·29·83 Posts

Quote:
 Originally Posted by Dubslow Not exactly new-thread worthy, but here's an expo that's into quadruple-check territory with no reported error code yet. My laptop's not yet turned in a bad DC, but then again linded is pretty reliable as well. Cosmic ray type error? I hope spradlin finishes the test quickly.
Huh. It seems my DC was bad. That's strange, because that's the only bad DC my CPU's ever turned in, and that includes at least two more good DCs since that one. I guess it really was a cosmic ray or some other unnoticed memory error.

2012-06-10, 05:26   #41
retina
Undefined

"The unspeakable one"
Jun 2006
My evil lair

5,309 Posts

Quote:
 Originally Posted by Dubslow Huh. It seems my DC was bad. That's strange, because that's the only bad DC my CPU's ever turned in, and that includes at least two more good DCs since that one. I guess it really was a cosmic ray or some other unnoticed memory error.
Did you overclock?

2012-06-10, 18:00   #42
Dubslow

"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3×29×83 Posts

Quote:
 Originally Posted by retina Did you overclock?
Slightly, but my temps are below 65C, I can pass 24 hrs of torture testing, and no errors were reported, and like I said this is the only bad test out of 25-50 good DCs.

Edit: The overclock has been there for almost all the DCs.

Last fiddled with by Dubslow on 2012-06-10 at 18:00

2012-06-10, 23:36   #43
retina
Undefined

"The unspeakable one"
Jun 2006
My evil lair

5,309 Posts

Quote:
 Originally Posted by Dubslow Slightly, but my temps are below 65C, I can pass 24 hrs of torture testing, and no errors were reported, and like I said this is the only bad test out of 25-50 good DCs. Edit: The overclock has been there for almost all the DCs.
Thanks for the honest answer. But actually my Q was rhetorical since you had already stated above that you do overclock. My point was that you were blaming cosmic rays and other things when really the explanation is right there under your control.

2012-06-11, 04:25   #44
Dubslow

"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3×29×83 Posts

Quote:
 Originally Posted by retina Thanks for the honest answer. But actually my Q was rhetorical since you had already stated above that you do overclock. My point was that you were blaming cosmic rays and other things when really the explanation is right there under your control.

I just checked PrimeNet, and this computer has turned in at least 38 Double Checks, of which at most 1 was bad. No errors were reported, and I can pass 24 hrs of Prime95's stress test. Additionally, this isn't even proper overclocking; this stupid Intel motherboard will not let me adjust the base (non-Turbo) multiplier above the stock 34; I've set the Turbo to 38/39/40/41, which means that my computer spends 90% of its time at 3.8 GHz. Additionally, for more than half of those DCs, the computer spent most of its time at 39*103 MHz (about 4.0 GHz) and produced no bad results during that time. Are you saying that reducing the OC caused errors?
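
The clock figures in this post are multiplier times base clock. A quick sketch of that arithmetic follows, assuming a stock base clock of 100 MHz (an assumption, chosen because it makes multiplier 38 come out at the 3.8 GHz quoted above) and reading the "103" as a raised base clock in MHz. The function name is made up for illustration.

```python
def core_clock_ghz(multiplier, bclk_mhz=100.0):
    """Core clock in GHz from multiplier and base clock in MHz.

    The 100 MHz default is an assumed stock base clock.
    """
    return multiplier * bclk_mhz / 1000.0

# Stock multiplier and the Turbo multipliers mentioned in the post.
for mult in (34, 38, 39, 40, 41):
    print(f"x{mult} at 100 MHz: {core_clock_ghz(mult):.2f} GHz")

# The earlier, more aggressive setting: multiplier 39 on a 103 MHz base clock.
earlier = core_clock_ghz(39, 103.0)
print(f"x39 at 103 MHz: {earlier:.3f} GHz")
```

Under these assumptions the earlier setting is roughly 0.2 GHz faster than the 3.8 GHz the machine now spends most of its time at.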

These data lead me to believe that a cosmic ray accidentally screwing up a bit of my memory is more likely than hardware errors. (And I've turned in at least four good DCs since then, again, no errors.) I'm very meticulous about OCing, especially since my motherboard won't let me be more aggressive. (Hint: Don't buy motherboards from the same company that made the processor. (I know, I know, "Duh", but I got it almost half off retail.))
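
For a sense of how much 1 bad result in 38 double checks can actually say, here is a rough sketch that puts an approximate 95% Wilson score interval around the observed rate. The counts are the ones quoted in this post; the choice of a Wilson interval and the function name are illustrative assumptions, and the interval says nothing about whether the cause was the overclock, a memory glitch or something else.

```python
import math

def wilson_interval(bad, total, z=1.96):
    """Approximate Wilson score interval for a binomial proportion.

    z=1.96 corresponds to roughly 95% coverage.
    """
    p = bad / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half

bad, total = 1, 38  # counts quoted in the post above
low, high = wilson_interval(bad, total)
print(f"observed bad rate: {bad / total:.1%}")
print(f"approx. 95% interval: {low:.1%} to {high:.1%}")
```

With a sample this small the interval is wide, so a single bad result fits both a machine that is essentially sound and one with a noticeably elevated error rate.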


All times are UTC.