#12
Undefined
"The unspeakable one"
Jun 2006
My evil lair
1A89₁₆ Posts
Those results are not based upon real-world usage. They were exposed systems deliberately arranged to capture as much external influence as possible IIRC. Any normal system with a small cross section of DRAM, enclosed within a steel surrounding case, inside a concrete building, receives very little in the way of cosmic ray caused events.
Last fiddled with by retina on 2018-03-23 at 05:13
#13
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴·3·163 Posts
Quote:
Cosmic ray arrival rate at a dime-sized area at ground level is about 14/second, as indicated by a Geiger counter. The main floor of a multistory brick school building had a significant flux, as I recall from conducting the experiment in high school. I've worked recently with scientists who need to put their experiments in mines or other locations thousands of feet underground to get the cosmic ray background low enough. (See LBNE, NUMI, DAYA BAY, ICE CUBE, etc.) "Cosmic rays dominate" was my summary impression after reading the entire soft error article, which looks well referenced to credible sources: IBM, ACM, IEEE, Cypress Semiconductor, etc.

Cosmic rays are 90% protons, and rates at the earth's surface are a strong function of particle energy. A 100 MeV proton has a range of ~14 mm in steel. The rate of particles of about that energy striking my roof would be about 20/cm²/s. Asphalt shingles, wood roof decking, a couple of layers of drywall, 1" of flooring, and a mm of steel computer case wouldn't amount to the equivalent of 14 mm of steel in stopping power. See the NIST tables and charts available from https://physics.nist.gov/PhysRefData...ext/PSTAR.html

There are other sources of energetic particles, though. Trace radioactivity is all over. The Mechanical Engineering Building on the UW-Madison campus contains a little uranium, in the stone it was built of. Humans and a lot of foods contain potassium, some of which is K40. Leave some traces inside the computer case by fingerprints, or dust, and the 1.3 or 1.4 MeV decay particles do not need to penetrate the computer case, only the plastic chip package. There are no concrete-walled-and-roofed buildings in my neighborhood, but the concrete itself is likely to be slightly radioactive, and produces higher-energy particles than K40 does. https://www.cdc.gov/nceh/radiation/building.html

Cosmic rays arrive from above.
Laptops typically have their memory in a horizontal plane, approximately maximizing the chip target area, except that the memory modules may stack over each other. Checking one of my tower cases, those memory chips are also oriented in a horizontal plane. The towers have CPUs oriented in a vertical plane, and the big aluminum heatsink/fan assemblies would provide some shielding. GPU chips in the tower cases are in a horizontal plane and are large; heatsink shielding there is less, since the cards are only one or two slot-widths thick overall. Orientation won't matter much, though, unless the system is in a basement near a wall, since cosmic rays are nearly isotropic at ground level before solid shielding is considered. http://lss.fnal.gov/conf2/C990817/o1_3_04.pdf
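A rough back-of-the-envelope using the flux figure quoted above can be sketched as follows (the 1 cm × 1 cm chip footprint is an assumed illustrative value, and note that the vast majority of strikes deposit too little charge to actually flip a bit):

```python
# Particle arrival rate on one DRAM chip, using the ~20/cm^2/s roof
# flux quoted above for ~100 MeV protons. The chip footprint is an
# assumption for illustration; an actual upset also requires enough
# charge deposited in a sensitive node, so hits >> bit flips.
flux_per_cm2_s = 20.0        # quoted flux at the roof
chip_area_cm2 = 1.0          # assumed horizontal chip footprint
hits_per_second = flux_per_cm2_s * chip_area_cm2
hits_per_day = hits_per_second * 86_400
print(f"{hits_per_second:.0f} hits/s  ->  {hits_per_day:,.0f} hits/day per chip")
```

This is only the geometric arrival rate; shielding and the per-hit upset probability reduce the observed soft-error rate by many orders of magnitude.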
#14
"Mihai Preda"
Apr 2015
5AC₁₆ Posts
Quote:
For example, in Nvidia's camp there was the "carry" bug that kept being fixed and coming back. In AMD's camp there are multiple compiler issues being tracked and fixed in the open (which is nice): https://github.com/RadeonOpenCompute/ROCm/issues Since some of these issues trigger in rare and seemingly non-deterministic circumstances, they may look like memory bit-flips, but they aren't fixed by ECC. OTOH the situation on CPUs is much better, and there most of the errors are probably ECC-corrected.
#15
"Jacob"
Sep 2006
Brussels, Belgium
2×977 Posts
I am not convinced of the need for ECC memory nowadays. When I look at the 39,000,000–40,000,000 range, the error rate for LL tests is relatively low at 2.8%. I also think that current hardware (let us say from DDR3 on) is much less error-prone: for instance, I have a machine that has done 1350 double checks in 32 months, and all results were correct. The software has improved as well: we now have the Jacobi error check.
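The 1350-correct-results observation can be checked against the population error rate with a short sketch. Assuming (for the sake of argument) that every test independently failed at the population-average 2.8% rate, a run of 1350 consecutive correct double checks would be astronomically unlikely, which supports the view that error rates vary strongly by machine rather than being uniform:

```python
# Probability of 1350 consecutive correct results if every test
# independently had the population-average 2.8% error rate.
p_err = 0.028
n_checks = 1350
p_all_correct = (1 - p_err) ** n_checks
print(f"P(all {n_checks} correct at {p_err:.1%} error rate) = {p_all_correct:.2e}")
```

The result is on the order of 10⁻¹⁷, so a healthy machine's per-test error rate must be far below the 2.8% population average.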
Jacob
#16
Undefined
"The unspeakable one"
Jun 2006
My evil lair
1101010001001₂ Posts
#17
"Forget I exist"
Jul 2009
Dartmouth NS
8,461 Posts
#18
"Victor de Hollander"
Aug 2011
the Netherlands
3²×131 Posts
In practice, there are a few machines with a lot of bad results that skew the error rate; most machines are almost error-free.
#19
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1E90₁₆ Posts
#20
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴·3·163 Posts
Last fiddled with by kriesel on 2018-03-23 at 17:48
#21
"Jacob"
Sep 2006
Brussels, Belgium
7A2₁₆ Posts
Of course it is too high, but you cannot limit participation in GIMPS to server-grade hardware only. And that range was initially tested a long time ago.
Jacob
#22
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴·3·163 Posts
Quote:
If the wrong-residue rate is 2.8% in the double-check population over the same completed exponent interval; the first and second tests per exponent are completely independent, on independent hardware, so there are no systematic hardware errors; and there is no systematic error in the algorithm or code that duplicates wrong residues on the same exponent (an important design goal), so that wrong residues always differ; then the wrong-residue rate is 2.8% of attempts, and whether or not the wrong residues sometimes occur at the same exponent, the mismatch rate is 2.8% of residues. Per exponent, the wrong-residue rate is 5.6% or very near it, before the subsequent needed third checks are run.

Take an interval containing 1,000,000 prime exponents that have survived trial factoring and P-1 factoring attempts, and run a first and second test on each. There will be about 28,000 wrong residues among the first tests, and about 28,000 among the double checks. For a given exponent, under these assumptions:

a) First test correct, double check correct; both match. Probability 0.972² = 94.4784% of exponents.

b) First test wrong, double check right; the pair mismatches, one residue is wrong. Probability 2.8% × 97.2% = 2.7216% of exponents. A triple check probably clears it up; the triple check is subject to the same 2.8% likelihood of being wrong, and affects the exponents-with-mismatches rate but not the wrong-residue rate. We hope the random shift and other error checking prevent the third or fourth check from matching the wrong residue(s).

c) First test correct, double check wrong; the pair mismatches, one residue is wrong. See b). Probability 2.7216% of exponents.

d) First test wrong, double check wrong; the pair mismatches, both residues are wrong, and they differ. 0.028 × 0.028 = 784 ppm of exponents. A third check, if correct, will need a fourth check to probably get a match. We hope the random shift and other error checking prevent the third or fourth check from matching one of the wrong residues.

e) First test wrong, double check returning the same wrong residue, counting as a match but misleading; this is excluded by the premises, so ~0 ppm of exponents. If this case occurs, it reduces d)'s probability. I think George et al. have considered this possibility; I recall Madpoo posting about doing triple checks of all very low exponents.

f) Total: 1,000,000 ppm of exponents; check.

Number of wrong residues divided by number of exponents is 5.6% before triple checking, because two residues have been run per exponent, so there are twice as many residues as exponents.

Suppose the first test has gone wrong; that's 2.8% of exponents. If the error is specific to a particular offset, and the exponent is ~50M, there's a ~20 ppb chance of reusing the same offset in the double check: 0.028 × 20 ppb ≈ 5.6×10⁻¹⁰. Further out, at exponent ~2000M, there's a ~0.5 ppb chance of randomly picking the same offset.

Last fiddled with by kriesel on 2018-03-23 at 19:10
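The case analysis above can be tallied with a short sketch, using the post's premises (independent tests, each wrong with probability 2.8%, two wrong residues never agreeing):

```python
# Per-exponent outcome probabilities under the premises above.
p = 0.028                     # wrong-residue rate per test
q = 1 - p

both_right = q * q            # a) residues match, both correct
one_wrong  = 2 * p * q        # b) + c) mismatch, exactly one residue wrong
both_wrong = p * p            # d) mismatch, both wrong, residues differ
total = both_right + one_wrong + both_wrong   # should cover all exponents

exponents = 1_000_000
print(f"a)   both correct : {both_right:.4%} of exponents")
print(f"b+c) one wrong    : {one_wrong:.4%}")
print(f"d)   both wrong   : {both_wrong * 1e6:.0f} ppm")
print(f"wrong residues    : {2 * p * exponents:,.0f} of {2 * exponents:,} run")

# Shift-offset reuse: ~50M possible starting shifts at exponent ~50M,
# so a wrong first test reuses its shift in the double check with
# probability ~1/50e6, times the 2.8% chance of being wrong at all.
collision = p * (1 / 50e6)
print(f"same-shift wrong pair: {collision:.1e}")
```

The tally confirms the figures in the post: 94.4784% clean matches, 784 ppm double-wrong exponents, 56,000 wrong residues per 2,000,000 run (5.6% per exponent, 2.8% per residue).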