mersenneforum.org  

Old 2007-04-10, 22:28   #1
Unregistered
 

5^2×199 Posts
Default Errors due to cosmic rays

How many of GIMPS's errors do people think were due to cosmic rays (just looking for estimates)? 2 primality tests? 5? Maybe more?

I just want to see what everyone else thinks.
Old 2007-04-11, 04:36   #2
E_tron
 
Sep 2002
Austin, TX

3×11×17 Posts
Default

In the past, I blamed cosmic rays for stability issues, but I now think they were not the reason for my old computer's issues.

The probability of a cosmic ray disintegrating in a computer's CPU die is infinitesimal. Even if it happened, it would have little effect on a transistor that is constantly verified by its millions of neighbors. Of course, it only takes one bad iteration to yield incorrect results, so maybe 1 bad LL test is due to a cosmic ray.
Old 2007-04-11, 04:57   #3
cheesehead
 
"Richard B. Woods"
Aug 2002
Wisconsin USA

2^2·3·641 Posts
Default

I think we need some data on which to base such estimates. Googling on the phrase

"cosmic rays" CPU errors

finds about 111,000 hits, such as:
Quote:
Originally Posted by http://www.edn.com/article/CA454636.html
"Soft errors' impact on system reliability"

...

During the SER (soft-error-rate) session of the 2003 IRPS (International Reliability Physics Symposium), Texas Instruments reliability scientist Robert Baumann stated that, "Soft errors induced the highest failure rate of all other reliability mechanisms combined." As device technology scales, the number of stages in the processor pipeline increases, the area efficiency of memory devices decreases, and a device's natural resistances against SEUs (single-event upsets) decreases.

...

The rate at which SEUs occur is given as SER, and you measure it in FITs (failures in time), which is the number of failures in 1 billion device-operation hours. A measurement of 1000 FITs corresponds to a MTTF (mean time to failure) of approximately 114 years.

The potential impact on typical memory applications illustrates the importance of considering soft errors. A cell phone with one 4-Mbit, low-power memory with an SER of 1000 FITs per megabit will likely have a soft error every 28 years. A high-end router with 10 Gbits of SRAM and an SER of 600 FITs per megabit can experience an error every 170 hours. For a router farm that uses 100 Gbits of memory, a potential networking error interrupting its proper operation could occur every 17 hours. Finally, consider a person on an airplane over the Atlantic at 35,000 ft working on a laptop with 256 Mbytes (2 Gbits) of memory. At this altitude, the SER of 600 FITs per megabit becomes 100,000 FITs per megabit, resulting in a potential error every five hours. The FIT rate of soft errors is more than 10 times the typical FIT rate for a hard reliability failure. Soft errors are not the same concern for cell phones as they can be for systems using a large amount of memory.

...

The four common sources of SEUs are low-energy alpha particles, high-energy cosmic particles, thermal neutrons, and poor system design (Table 1). Low-energy alpha particles are generated by the radioactive decay of trace uranium-238 and thorium-232 in quartz filler used in mold compounds, or from polonium-210 in lead bumps that flip-chips use. These impurities release alpha particles with energy of 2 to 9 MeV (million electron volts). The energy required to form an electron-hole pair in silicon is 3.6 eV. Therefore, alpha particles can generate approximately 10^6 electron-hole pairs.

The electric field in the depletion region directly generates electron-hole pairs in its wake, causing the charges to drift so that the transistor sees a current disturbance (Figure 2). The depletion regions under the effect of the electric field collect free electrons. A fraction of the excess charge drifts to device nodes and, if it exceeds a certain critical charge, Qcrit, may flip the state of the memory cell. A lower Qcrit results in a higher SER. Alpha particles normally cause SBUs because they have lower energies, but they can cause MBUs in devices with low supply voltages.

High-energy cosmic particles react with the upper atmosphere of the Earth, and their collisions, modulated by solar flares and intergalactic cosmic rays, generate high-energy protons and neutrons. High-energy neutrons have energies of 10 to 800 MeV; in contrast, protons have energies greater than 30 MeV. High-energy neutrons have no charge; therefore, they do not coulombically interact with the semiconductor material, so their interaction with silicon differs from that of an alpha particle. For a high-energy neutron to cause a soft error, it must produce ionized particles by colliding with the silicon nucleus and undergoing impact ionization with the silicon nuclei. This collision can generate alpha particles and other heavier ions, thus producing electron-hole pairs but with higher energies than a typical alpha particle from mold compounds.

Neutrons are particularly troublesome, because they can penetrate most manmade construction; for example, a neutron can pass through five feet of concrete. The flux rate is geoposition-dependent and increases at higher altitudes due to a lower shielding effect of the atmosphere. In London, the effect is 1.2 times worse than at the equator. In Denver, with its high altitude, the effect is three times worse than at sea level in San Francisco. In an airplane, the effect can be 100 to 800 times worse than on the ground.

Thermal neutrons are major contributors to soft failures and typically have energies of approximately 25 meV. The Boron-10 isotope that occurs in large quantity in BPSG (boron-phosphor-silicate-glass) dielectric layers easily captures these low-energy neutrons. Capturing a neutron results in a fission that produces lithium, an alpha particle, and a gamma ray, which may lead to potential bit-flips. Thermal neutrons are primarily an SEU issue only if BPSG is present; eliminating the use of B-10 isotopes effectively addresses the problem.

Poor system design is the final common source of SEUs. High-performance memory devices normally comprise SRAM cells, combinational logic, and latches. In high-performance communication-memory products, the area efficiency is usually low. Past academic research shows that combinational logic is less susceptible to soft errors than memory cells because of natural resistances set up by masking. However, these natural resistances could diminish as devices scale and the number of stages in the processor pipeline increases.
... and more.

Note the sources of potentially-damaging radiation within the very materials of which computer circuits are manufactured. It's earthly rays as well as cosmic ones.
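The FIT figures in the first quote can be checked with a few lines of arithmetic. This is only a rough sketch: the function name is made up for illustration, and it assumes 1 Gbit = 1000 Mbit and a 365-day year, which the article does not state.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years (assumption)


def hours_between_errors(megabits, fits_per_megabit):
    """Mean hours between soft errors; 1 FIT = 1 failure per 1e9 device-hours."""
    total_fits = megabits * fits_per_megabit
    return 1e9 / total_fits


# Scenarios taken from the quoted article (1 Gbit treated as 1000 Mbit).
scenarios = {
    "cell phone, 4 Mbit @ 1000 FIT/Mbit": (4, 1000),
    "router, 10 Gbit @ 600 FIT/Mbit": (10_000, 600),
    "router farm, 100 Gbit @ 600 FIT/Mbit": (100_000, 600),
    "laptop in flight, 2 Gbit @ 100,000 FIT/Mbit": (2_000, 100_000),
}

for name, (mbit, fit) in scenarios.items():
    hours = hours_between_errors(mbit, fit)
    print(f"{name}: {hours:.1f} h ({hours / HOURS_PER_YEAR:.2f} years)")
```

Under those assumptions the results land close to the article's numbers: about 28.5 years for the phone, about 167 hours and 17 hours for the two router cases, and 5 hours for the laptop.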

Quote:
Originally Posted by http://www.filibeto.org/~aduritz/ecache-sram-data-parity-err.html
"Ecache SRAM Data Parity Error"

Cosmic rays actually do cause bit rot. A study in the 80s by IBM placed RAM testers in Boulder, Colorado, Leadville, New York City, and underground in Kansas City. Boulder had 5 times more errors than New York, and Leadville had ten times as many as New York. The elevation of the towns has a lot to do with it, since Leadville doesn't have as much atmosphere to absorb sub-atomic particles at 10,152ft. Boulder is at about 5,000ft. New York is at sea level. However, the shape of the earth's magnetic field has a lot to do with it, too. La Paz has a similar altitude to Leadville's, but is at a different latitude.

...

All this stuff can be found in an IBM research journal somewhere. Specifically, the IBM Journal of Research and Development, Volume 40, Number 1.
So keep your LL-testers in the basement ... unless your area, like mine, happens to have significant amounts of radium and radon in the ground, in which case the attic might be safer, especially if your roof has lead shingles ... or move (by surface transportation, please) to New York City or Kansas City ... or near the equator, like Rio de Janeiro -- yeah, that's it: some equatorial seacoast. Oh, wait -- there's the South Atlantic Anomaly (http://en.wikipedia.org/wiki/South_Atlantic_Anomaly). Hmmm ... Mumbai!

Last fiddled with by akruppa on 2007-04-12 at 09:21 Reason: South Atlantic Anomaly (fixed 106->10^6, Alex)
Old 2007-04-12, 08:01   #4
cheesehead
 
"Richard B. Woods"
Aug 2002
Wisconsin USA

2^2·3·641 Posts
Default

Quote:
Originally Posted by cheesehead View Post
Quote:
Originally Posted by http://www.edn.com/article/CA454636.html
"Soft errors' impact on system reliability"

...

These impurities release alpha particles with energy of 2 to 9 MeV (million electron volts). The energy required to form an electron-hole pair in silicon is 3.6 eV. Therefore, alpha particles can generate approximately 106 electron-hole pairs.
"approximately 106 electron-hole pairs" should be "approximately 10^6 electron-hole pairs."
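The order of magnitude can be checked from the two figures in the quote (2 to 9 MeV per alpha particle, 3.6 eV per electron-hole pair). A minimal sketch, assuming the particle's entire energy goes into creating pairs, which is an upper bound rather than anything the article claims; the function name is only illustrative.

```python
PAIR_ENERGY_EV = 3.6  # energy to create one electron-hole pair in silicon (from the quote)


def pairs_generated(alpha_energy_mev):
    """Upper-bound pair count if all of the alpha particle's energy creates pairs."""
    return alpha_energy_mev * 1e6 / PAIR_ENERGY_EV


for energy_mev in (2, 9):
    print(f"{energy_mev} MeV -> {pairs_generated(energy_mev):.2e} pairs")
```

That gives roughly 0.56 million to 2.5 million pairs, so "approximately a million" is the right ballpark and a literal 106 is not.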
Old 2007-04-13, 02:46   #5
Unregistered
 

1101110100100_2 Posts
Default

I'm putting the estimate at 20 primality tests that were incorrect because of cosmic rays. It sounds like a lot, but it only takes one incorrect iteration to mess up the result of a primality test. This problem will continue to get worse as the length of a test increases.
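The "longer tests are more exposed" point can be sketched numerically. An LL test of exponent p runs p - 2 iterations, so if each iteration independently goes wrong with some small probability, the chance of at least one bad iteration is 1 - (1 - q)^(p - 2). The per-iteration probability below is a made-up placeholder, not a measured value, and the function name is hypothetical.

```python
import math


def p_test_ruined(exponent, p_bad_iteration):
    """Probability that at least one of the exponent - 2 LL iterations goes wrong,
    assuming independent errors with the same probability in every iteration."""
    iterations = exponent - 2
    # 1 - (1 - q)^n, written with log1p/expm1 so tiny q does not lose precision
    return -math.expm1(iterations * math.log1p(-p_bad_iteration))


P_BAD = 1e-10  # hypothetical per-iteration error probability, NOT a measured value

for exponent in (10_000_000, 20_000_000, 40_000_000):
    print(exponent, f"{p_test_ruined(exponent, P_BAD):.4%}")
```

For small probabilities the risk grows almost linearly with the exponent. This ignores the fact that each iteration of a longer test also takes more time and memory, so the real growth is probably steeper.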

Does anyone agree/disagree with me?
Old 2007-04-13, 13:51   #6
jasonp
Tribal Bullet
 
Oct 2004

6725_8 Posts
Default

Quote:
Originally Posted by Unregistered View Post
I'm putting the estimate at 20 primality tests that were incorrect because of cosmic rays. It sounds like a lot, but it only takes one incorrect iteration to mess up the result of a primality test. This problem will continue to get worse as the length of a test increases.

Does anyone agree/disagree with me?
My guess is the number is much higher. If GIMPS has a ~1% error rate for LL tests, far more than 20 tests would be ruined because of cosmic rays. It doesn't take just one incorrect iteration, it takes one incorrect bit to spoil an entire LL test.

jasonp
Old 2007-04-13, 17:01   #7
Mini-Geek
Account Deleted
 
"Tim Sorbera"
Aug 2006
San Antonio, TX USA

10253_8 Posts
Default

Quote:
Originally Posted by jasonp View Post
My guess is the number is much higher. If GIMPS has a ~1% error rate for LL tests, far more than 20 tests would be ruined because of cosmic rays. It doesn't take just one incorrect iteration, it takes one incorrect bit to spoil an entire LL test.

jasonp
Sure, there's a ~1% error rate, but how many of those are caused by cosmic rays is the question.
Old 2007-04-13, 18:04   #8
cheesehead
 
"Richard B. Woods"
Aug 2002
Wisconsin USA

2^2·3·641 Posts
Default

My guess is that almost all LL errors are either "hard" (repeatable) errors due to hardware problems such as bad RAM, or soft errors due to hardware/setup flaws such as inadequate cooling or excessive overclocking, and that soft errors due to radiation (including, but not limited to, cosmic rays) are definitely a minority. I think hard errors due to radiation (i.e., radiation-damaged circuitry) are much less common than soft errors due to radiation, but have no actual data on that.

So, in answer to the thread question: I think a small, but not totally negligible, fraction (~3%, say) of the ~1% of LL results that are erroneous are due to cosmic rays, and those would be concentrated among those testers who were at high altitudes or, to a lesser extent, high latitudes.

If we knew enough testers' latitudes (and longitudes of those nearest the South Atlantic Anomaly) and altitudes, we might be able to tease out a (faint) correlation between error rate and latitude/altitude.

Last fiddled with by cheesehead on 2007-04-13 at 18:21
Old 2007-04-14, 00:37   #9
brunoparga
 
Feb 2006
Brasília, Brazil

325_8 Posts
Default

Wow. I'd never heard of that "South Atlantic Anomaly". Let me check that out.

The anomaly is roughly ellipse-shaped...
the minor axis is something between 0 and -40 degrees...
the major axis is around -20 to -90...

I'm in -23, -46... that explains a lot: my tail, for instance.

I wonder if any of the few tests I've done/factors I've found are wrong.
Old 2007-04-16, 00:06   #10
hhh
 
Jun 2005

373_10 Posts
Default

Quote:
Originally Posted by cheesehead View Post
Hmmm ... Mumbai!
Mally is there. Mally and his pyramids. Nobody knows what kind of impact these have: they concentrate the cosmic rays, causing brain tumors, leading to human error in chip engineering like the Pentium division bug, making the "P-90 year" unit inaccurate, driving the stats-monkeys nuts, so they see no other way than overclocking to keep their testosterone high enough, and get unstable computers.
H.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.