mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Data

2012-06-11, 08:35   #45
retina
Undefined
"The unspeakable one"
Jun 2006
My evil lair
12275₈ Posts

Quote:
Originally Posted by Dubslow
These data lead me to believe that a cosmic ray accidentally screwing up a bit of my memory is more likely than hardware errors.
I think it is the opposite. A cosmic ray is extremely unlikely to make it all the way to your mobo. Sources of error for memory contents are more likely to be: Vdd upsets (fluctuating power supply to the chips), package impurities (uranium, thorium, etc. in the plastic package) and trying to read the data too quickly (overclocking). Also note that overclocking implies a higher voltage and raises the likelihood of a PSU upset. The only one of these not under your control is package impurities.

My guess is that it is indeed overclocking that is causing (or at least amplifying) your problems. You have pushed things to the edge of the cliff, and sometimes, when conditions are just right, something falls off.
2012-06-11, 08:57   #46
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3×29×83 Posts

What problems? I've gotten no errors reported, just one bad DC out of 38 (with even high overclocks for many). Any overclocking is done by Intel's Turbo Boost. Like I said, 90% of the time it doesn't even exceed stock Turbo settings, which are 35/36/37/38. I agree that a literal cosmic ray is unlikely, but we are being bombarded by them all the time. It could be a neutrino strike (even less likely) or even one of the many thousands (millions?) of human-generated sub-luminal frequencies of electromagnetic radiation, or one of the many things you mentioned.

When Intel sells the processor it's rated to hit at least 3.8 GHz (though admittedly only on one core) and I don't exceed that. I still find it much more likely that some uncontrollable memory corruption occurred rather than an overclocking error.
2012-06-11, 09:32   #47
retina
Undefined
"The unspeakable one"
Jun 2006
My evil lair
1010010111101₂ Posts

Quote:
Originally Posted by Dubslow
What problems? I've gotten no errors reported, just one bad DC out of 38 ...
Isn't that enough? It has probably already cut your throughput to about what you would get without OC, perhaps less.
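A rough way to check this trade-off, as a sketch under simplifying assumptions (every bad result is pure waste, the stock machine is treated as error-free, and the 1-in-38 rate from a small sample is taken at face value; the function names are hypothetical): an overclock only comes out ahead if its speedup outweighs the fraction of results it throws away.

```python
def good_results_rate(speed: float, bad_fraction: float) -> float:
    """Good results per unit time, assuming every bad result is pure waste.

    Simplification: a bad DC also costs someone else an extra check,
    which makes overclocking look worse than this model does.
    """
    return speed * (1.0 - bad_fraction)


def break_even_speedup(bad_oc: float, bad_stock: float = 0.0) -> float:
    """Smallest speedup over stock at which the overclocked machine turns in
    good results at least as fast as it would at stock clocks."""
    return (1.0 - bad_stock) / (1.0 - bad_oc)


if __name__ == "__main__":
    # One bad DC out of 38, compared against a (hypothetical) error-free stock run.
    needed = break_even_speedup(1 / 38)
    print(f"break-even speedup: {needed:.3f}x")
```

Under those assumptions the overclock has to be 38/37, roughly 2.7%, faster than stock just to break even. One bad result in 38 is a very small sample, so the true threshold could be noticeably higher or lower.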
Quote:
Originally Posted by Dubslow
... (with even high overclocks for many). Any overclocking is done by Intel's Turbo Boost. Like I said, 90% of the time it doesn't even exceed stock Turbo settings, which are 35/36/37/38. I agree that a literal cosmic ray is unlikely, but we are being bombarded by them all the time. It could be a neutrino strike (even less likely) or even one of the many thousands (millions?) of human-generated sub-luminal frequencies of electromagnetic radiation, or one of the many things you mentioned.
Ermm, "sub-luminal frequencies of electromagnetic radiation"? Is that a real thing? Anyhow, you would need a high-energy event to flip a memory bit.
Quote:
Originally Posted by Dubslow
When Intel sells the processor it's rated to hit at least 3.8 GHz (though admittedly only on one core) and I don't exceed that. I still find it much more likely that some uncontrollable memory corruption occurred rather than an overclocking error.
Remember that overclocking stresses the whole system, not just the part whose clock you raise. The common factor is often the PSU or the memory interface.

But don't feel bad about it. We all get bad results from time to time. I am just trying to point you in the right direction to look for the problem. It does not help to point the finger at the wrong thing and then ignore it on the assumption that it can't be fixed. If it doesn't bother you, that is fine; you can ignore it, be content to turn in the occasional bad result, and keep going. But since you posted about it here, I assumed you were concerned, so I am trying to help you identify the problem.

Last fiddled with by retina on 2012-06-11 at 09:33
2012-06-11, 14:40   #48
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3·29·83 Posts

Quote:
Originally Posted by retina
Ermm, "sub-luminal frequencies of electromagnetic radiation"? Is that a real thing?
That was fancy for "radio waves" of all sorts, including 3G/4G networks, old cellphone networks, etc...

How does upping the CPU multiplier stress the memory? I've got a 600W PSU for a 95W TDP CPU and a graphics card which says "Maximum Graphics Card Power (W) 160W". Now obviously the CPU is probably drawing more than 100W, but I still can't imagine my PSU is under too much stress.

It does worry me, but I still can't believe that it was a hardware error other than an uncontrollable memory corruption.
2012-06-11, 14:53   #49
retina
Undefined
"The unspeakable one"
Jun 2006
My evil lair
5,309 Posts

Quote:
Originally Posted by Dubslow
That was fancy for "radio waves" of all sorts, including 3G/4G networks, old cellphone networks, etc...
Those are too low in power to cause a problem like this.
Quote:
Originally Posted by Dubslow
How does upping the CPU multiplier stress the memory? I've got a 600W PSU for a 95W TDP CPU and a graphics card which says "Maximum Graphics Card Power (W) 160W". Now obviously the CPU is probably drawing more than 100W, but I still can't imagine my PSU is under too much stress.
Remember the on-board power supplies (VRMs) that do DC-DC conversion from the main PSU to the ICs. If your mobo is not-the-best-quality(tm), the power delivery rails can be problematic and cause problems in almost any other part of the system. Anyhow, the short of it is that a seemingly simple change to the amount of work one part of the system is doing affects lots of other things.
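To see why the motherboard's regulators can be stressed even when the main PSU has plenty of headroom, here is a back-of-the-envelope sketch. It uses the textbook approximation that dynamic CMOS power scales with f·V², and I = P/V for the current the on-board DC-DC converter must supply. The function names, the core voltage, and the scaling ratios are hypothetical illustration values, not measurements from this thread, and TDP is a thermal rating rather than actual draw.

```python
def scaled_dynamic_power(p_stock_w: float, f_ratio: float, v_ratio: float) -> float:
    """Approximate power after a clock/voltage change, assuming dynamic power
    ~ C * V**2 * f dominates (leakage, which also rises with V, is ignored)."""
    return p_stock_w * f_ratio * v_ratio ** 2


def regulator_current_a(power_w: float, vcore_v: float) -> float:
    """Current the on-board DC-DC converter must deliver: I = P / V."""
    return power_w / vcore_v


if __name__ == "__main__":
    stock_w = 95.0          # the CPU's TDP from this thread, used as a stand-in for draw
    f_ratio = 3.8 / 3.5     # hypothetical: all cores at 3.8 GHz vs. 3.5 GHz all-core Turbo
    v_ratio = 1.10          # hypothetical: 10% more core voltage
    vcore = 1.30            # hypothetical loaded core voltage, in volts

    p = scaled_dynamic_power(stock_w, f_ratio, v_ratio)
    print(f"estimated CPU power: {p:.0f} W")
    print(f"regulator current:   {regulator_current_a(p, vcore):.0f} A")
```

The exact figures matter less than the shape: once voltage goes up, power (and with it regulator current and heat) climbs faster than the clock does, and the high-current end of that load is carried by the motherboard's converters, which the 600W rating of the main PSU says nothing about.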
Quote:
Originally Posted by Dubslow
... I still can't believe that it was a hardware error other than an uncontrollable memory corruption.
If we just stick our heads in the sand and choose not to believe then we learn nothing. But for me, ignorance is not bliss.

My suggestion is this: Don't overclock for any long run computation. It just isn't worth it in the long term. Why push your system to the edge of stability and increase the chance of lost work? In the long term you gain nothing once all the lost work is taken into consideration.

Last fiddled with by retina on 2012-06-11 at 14:53
2012-06-11, 15:12   #50
kladner
"Kieren, ktony"
Jul 2011
3×5²×127 Posts

Quote:
Originally Posted by retina
My suggestion is this: Don't overclock for any long run computation. It just isn't worth it in the long term. Why push your system to the edge of stability and increase the chance of lost work? In the long term you gain nothing once all the lost work is taken into consideration.
I am taking this to heart. I'm back at stock for CPU/RAM. Another reason it's worthwhile is the current temps in this area. Throttling back knocked off a couple of degrees C.

EDIT: PrimeNet has yet to notice the change, though.

Last fiddled with by kladner on 2012-06-11 at 15:13
2012-07-02, 19:14   #51
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
1C35₁₆ Posts
Another quadruple check

http://mersenne.org/report_exponent/?exp_lo=26141851
2012-07-10, 10:25   #52
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
16065₈ Posts
Another quadruple check

http://mersenne.org/report_exponent/?exp_lo=26379497
2012-07-10, 14:13   #53
bcp19
Oct 2011
7×97 Posts

Quote:
Originally Posted by Dubslow
Quote:
Originally Posted by Dubslow
When you run across things like this, I often wonder how long the DC took on the 'bad' machine. On one of these, the completion dates were 6-11-11 and 5-01-12, so potentially an 11-month run to a bad result (assuming it was not taken and then returned at any point). These older, slower machines that take a year or more to complete a DC pose a serious question: how often does an exponent that has been slowly worked on for more than a year come back as bad?
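One way to frame that question, as a sketch: assume (hypothetically) that error-causing events strike a given machine at a constant average rate per hour of running, independent of how fast it computes. Then the chance that a run finishes with at least one error follows the Poisson formula 1 − e^(−λT), so a run stretched over 11 months is exposed far longer than the same test done in a couple of weeks. If errors instead scale with the amount of arithmetic done, runtime would not matter, so the model is only as good as that assumption. The function name and the rate below are illustrative, not measured.

```python
import math


def p_bad_result(hours: float, events_per_hour: float) -> float:
    """Chance of at least one error during a run of `hours`, assuming errors
    arrive as a Poisson process with a constant rate (a modelling assumption)."""
    return 1.0 - math.exp(-events_per_hour * hours)


if __name__ == "__main__":
    rate = 1e-5  # hypothetical error rate per hour, not a measured figure
    for label, hours in [("2 weeks", 14 * 24), ("11 months", 11 * 30 * 24)]:
        print(f"{label:>9}: {p_bad_result(hours, rate):.2%} chance of a bad result")
```

For small λT the probability is close to λT itself, so under this model doubling the wall-clock time of a run roughly doubles its chance of coming back bad.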
2012-09-27, 07:40   #54
DannyAbel
Sep 2012
1 Posts

The red curve shows there are some problems with this.
2012-10-28, 21:14   #55
frmky
Jul 2003
So Cal
2020₁₀ Posts

Here's a quintuple check.
http://www.mersenne.org/report_expon...xp_lo=27634309

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.