mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Hardware

2005-08-05, 04:32   #1
Citrix
Jun 2003
2×7×113 Posts
computing errors?

I am just curious: are there any conditions under which the computer may produce an invalid result for an a+b computation?

By conditions I mean things like the CPU being too hot. Please explain, if you can, why the computer would do so and what the solution to that problem is.

Citrix

2005-08-05, 14:12   #2
Dresdenboy
Apr 2003
Berlin, Germany
192 Posts

Quote:
Originally Posted by Citrix
I am just curious: are there any conditions under which the computer may produce an invalid result for an a+b computation?

By conditions I mean things like the CPU being too hot. Please explain, if you can, why the computer would do so and what the solution to that problem is.

I'll try to explain it briefly:
Temperature affects the behaviour of the materials used to build the transistors and wiring on the chip. Higher temperatures raise the electrical resistance of the metal wires that connect individual transistors and different areas of the chip, so signals take longer to travel through them. In most cases the transistors themselves also switch more slowly when hot, because the charge carriers move less freely through hot silicon. Either way, a hotter chip means slower signals, and slower signals certainly won't help us.

Now imagine the processor clock as a signal switching between a low and a high voltage (high = the operating voltage of the processor). At each such switch, many internal results are transferred to the next stage. Because the processor is pipelined, all work is processed in small steps, so that not too much has to be done during any one step; this is what allows higher clock speeds.

The internal units that do all sorts of things (an adder, a multiplier, a store unit, etc.) are designed to finish their work before the clock switches to the next step. Some operations are designed to finish in half a clock cycle (from a falling clock edge to the next rising edge), and more complex operations are designed to use a full clock cycle (from one rising edge to the next).

Chip designers work to target specifications (such as minimum/maximum temperature) within which the chip has to do its job as expected. Sometimes they have to design circuits that are close to these limits (e.g. a 64-bit addition might take 360 ps while a full clock period is 400 ps). A multiplier is more difficult, because it performs many additions in parallel, and along the way other work is necessary, such as handling the carries. If the temperature is higher than the chip is specified for, the signals in the wires can become so slow that some information, such as a final carry bit or some other bits of a multiplication result, doesn't reach the transfer latch (where results are stored to be fed into the next units) in time. Then the result is wrong. Such timing-limiting paths are called "critical paths".

Solution: Use a better cooling solution or a lower clock speed.
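A minimal worked sketch of the timing numbers in the example above, assuming the quoted figures (360 ps for the 64-bit add, 400 ps clock period) and a purely hypothetical uniform "slowdown" factor standing in for a hot chip. The function names are made up for illustration, and real timing analysis also accounts for latch setup time, clock skew and jitter, which are ignored here.

```python
# Hypothetical timing model: a path works only if its delay fits in one clock period.
# Latch setup time, clock skew and jitter are deliberately ignored.

def clock_frequency_ghz(period_ps: float) -> float:
    """Convert a clock period in picoseconds to a frequency in GHz."""
    return 1000.0 / period_ps

def slack_ps(path_delay_ps: float, period_ps: float) -> float:
    """Positive slack: the result reaches the latch before the next clock edge."""
    return period_ps - path_delay_ps

def max_slowdown(path_delay_ps: float, period_ps: float) -> float:
    """Largest fractional increase in path delay that still meets timing."""
    return period_ps / path_delay_ps - 1.0

def meets_timing(path_delay_ps: float, period_ps: float, slowdown: float = 0.0) -> bool:
    """Check timing after scaling the path delay by a hypothetical (1 + slowdown) factor."""
    return path_delay_ps * (1.0 + slowdown) <= period_ps

ADD_64_PS = 360.0   # example delay of a 64-bit addition, from the post
PERIOD_PS = 400.0   # example clock period, from the post

print(f"clock: {clock_frequency_ghz(PERIOD_PS):.2f} GHz")              # clock: 2.50 GHz
print(f"slack: {slack_ps(ADD_64_PS, PERIOD_PS):.0f} ps")                # slack: 40 ps
print(f"tolerable slowdown: {max_slowdown(ADD_64_PS, PERIOD_PS):.1%}")  # tolerable slowdown: 11.1%
print(meets_timing(ADD_64_PS, PERIOD_PS, slowdown=0.05))                # True
print(meets_timing(ADD_64_PS, PERIOD_PS, slowdown=0.15))                # False
# A lower clock speed means a longer period, which restores the margin:
print(meets_timing(ADD_64_PS, 450.0, slowdown=0.15))                    # True
```

In this simplified model the two remedies Dresdenboy suggests map directly onto the inequality: better cooling shrinks the (1 + slowdown) factor, and a lower clock speed enlarges the period on the right-hand side.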

Last fiddled with by Dresdenboy on 2005-08-05 at 14:13

2005-08-05, 18:41   #3
ewmayer
2ω=0
Sep 2002
República de California
103·113 Posts

Quote:
Originally Posted by Citrix
I am just curious: are there any conditions under which the computer may produce an invalid result for an a+b computation?

By conditions I mean things like the CPU being too hot. Please explain, if you can, why the computer would do so and what the solution to that problem is.

Citrix

When dealing with integer data, there is also a quite common problem that has nothing to do with CPU errors: the sum a+b may be larger than the integer type in question can store, resulting in integer overflow. Fortunately, that is easy to check for (alas, there are huge amounts of code out there that don't do this): do an unsigned compare of the result of a+b with either original input. If (a+b) < a (or, equivalently, (a+b) < b), there was a carry out, i.e. the add overflowed.
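A minimal sketch of the unsigned carry check described above. Python integers never overflow, so this assumes a hypothetical 64-bit register and emulates its wrap-around by masking the sum with 2^64 - 1; in C, unsigned addition already wraps modulo 2^n and the comparison is written the same way. The names `WIDTH`, `MASK` and `add_unsigned` are illustrative only.

```python
# Emulate a fixed-width unsigned register; Python ints are arbitrary precision.
WIDTH = 64
MASK = (1 << WIDTH) - 1

def add_unsigned(a: int, b: int) -> tuple[int, bool]:
    """Return the wrapped WIDTH-bit sum and whether a carry out (overflow) occurred.

    a and b are assumed to already lie in the range 0..MASK.
    """
    s = (a + b) & MASK   # what the hardware register would hold after the add
    carry = s < a        # the unsigned compare from the post; s < b works equally well
    return s, carry

print(add_unsigned(2, 3))              # (5, False)
print(add_unsigned(MASK, 1))           # (0, True)
print(add_unsigned(1 << 63, 1 << 63))  # (0, True)
```

The compare works because a sum that did not wrap is never smaller than either input, while a wrapped sum equals a + b - 2^64, which is always smaller than both.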

2005-08-05, 19:11   #4
Citrix
Jun 2003
62E₁₆ Posts

Ewmayer, so if one step is slower, won't that just slow down the whole calculation rather than produce wrong results? When can a computer produce wrong results?

Assume, for simplicity, that you are adding 1+0=1. Can the computer return 0 for this in some odd circumstance?

Citrix

2005-08-06, 06:46   #5
cheesehead
"Richard B. Woods"
Aug 2002
Wisconsin USA
2²×3×641 Posts

Quote:
Originally Posted by Citrix
So if one step is slower, won't that just slow down the whole calculation rather than produce wrong results?
No. What Dresdenboy was explaining is that the CPU operations proceed in discrete steps according to the clock cycles, and the circuitry is designed to be able to fully complete each operation in the specified number of clock cycles as long as the environment is within specified limits, such as temperature being less than XX degrees. If a too-high temperature causes some signals to be late, then the next clock cycle starts anyway (assuming the temperature hasn't affected the clock rate) with incorrect data, so it gives wrong results.

Quote:
Assume, for simplicity, that you are adding 1+0=1. Can the computer return 0 for this in some odd circumstance?
Yes. E.g., if the 1-bit is too late getting to the adder, the adder could add 0 + 0 instead of 1 + 0.
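A toy sketch of the 1 + 0 example, assuming a hypothetical latch model in which every bit has its own arrival time and a bit that arrives after the sampling clock edge keeps its previous value. The function name `latch`, the 400 ps edge and all arrival times are made up for illustration; real hardware can also capture an undefined (metastable) value rather than a clean stale bit.

```python
# Hypothetical model of a latch sampling its inputs at a clock edge.
# Each bit's new value arrives after some delay; a bit that arrives too late
# is captured with its previous (stale) value instead.

def latch(new_value: int, old_value: int, arrival_ps: list[int], edge_ps: int) -> int:
    """Capture a value bit by bit; bit i comes from new_value only if it arrived in time."""
    captured = 0
    for i, t in enumerate(arrival_ps):
        source = new_value if t <= edge_ps else old_value
        captured |= ((source >> i) & 1) << i
    return captured

EDGE_PS = 400  # assumed time of the clock edge at which the inputs are sampled

# Operand a = 1 is loaded into a one-bit input latch that previously held 0; b = 0.
a_on_time = latch(new_value=1, old_value=0, arrival_ps=[350], edge_ps=EDGE_PS)
a_too_late = latch(new_value=1, old_value=0, arrival_ps=[420], edge_ps=EDGE_PS)
b = 0

print(a_on_time + b)   # 1: the 1-bit made it, so the adder computes 1 + 0
print(a_too_late + b)  # 0: the 1-bit missed the edge, so the adder computes 0 + 0

# A 4-bit sum whose top (carry) bit settles last, as in the "critical path" description:
correct = 0b0111 + 0b0001
late_carry = latch(correct, 0b0000, arrival_ps=[300, 330, 360, 410], edge_ps=EDGE_PS)
print(bin(correct), bin(late_carry))  # 0b1000 0b0
```

Nothing in the model "slows down" the calculation: the clock edge arrives on schedule regardless, which is why a late bit turns into a wrong answer rather than a delayed one.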

2005-08-06, 13:56   #6
Citrix
Jun 2003
2·7·113 Posts

Thanks cheesehead, that makes sense! Can anything other than heating also cause this problem?

Citrix

Last fiddled with by Citrix on 2005-08-06 at 13:58

2005-08-08, 23:59   #7
cheesehead
"Richard B. Woods"
Aug 2002
Wisconsin USA
2²×3×641 Posts

Quote:
Originally Posted by Citrix
Can anything other than heating also cause this problem?
Yes, several things.

For instance: cosmic rays. Seriously.

Example: A high-energy particle from a cosmic-ray shower (at ground level, usually a neutron) smashes into your CPU chip one morning. It smacks straight into a silicon atom in one of that CPU's gates and knocks it around so hard that the atom knocks a number of others out of place and scatters a bunch of electrons ... just as the adder was outputting the pulse for the 1-bit result of 0 + 1 through that gate. The abnormal electron spray disrupts the circuit pulse so that it looks like a 0 instead of a 1. Everything settles down quickly, and a couple of displaced Si atoms don't really have a lasting effect ... but the erroneous output pulse makes 0 + 1 = 0 that one particular time.

Over a few years, more cosmic rays mess up the nice neat silicon lineup in a memory chip until ... one day bit 2 of the byte at location 1F88843 gets permanently disrupted so that it never again reads out a 0-bit but always gives a 1, no matter what data was written to 1F88843. So a memory chip goes bad after working perfectly for years.
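A minimal sketch of the permanent fault in this story, plus a simple write/read-back check that exposes it. It assumes that 1F88843 is a hexadecimal address and that "bit 2" counts from the least significant bit (bit 0); the class `FaultyMemory` and the function `failing_addresses` are hypothetical names, and the memory is stored sparsely so the large address needs no real allocation.

```python
# Hypothetical model of a memory with one bit stuck at 1, and a pattern test.

class FaultyMemory:
    """Sparse byte-addressed memory in which one bit of one byte is stuck at 1."""

    def __init__(self, stuck_addr: int, stuck_bit: int):
        self.cells: dict[int, int] = {}
        self.stuck_addr = stuck_addr
        self.stuck_mask = 1 << stuck_bit

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = value & 0xFF

    def read(self, addr: int) -> int:
        value = self.cells.get(addr, 0)
        if addr == self.stuck_addr:
            value |= self.stuck_mask  # stuck-at-1: this bit always reads back as 1
        return value

def failing_addresses(mem: FaultyMemory, addrs, patterns=(0x00, 0xFF, 0x55, 0xAA)) -> list[int]:
    """Write each pattern to every address, read it back, and report any mismatches."""
    bad = set()
    for pattern in patterns:
        for addr in addrs:
            mem.write(addr, pattern)
            if mem.read(addr) != pattern:
                bad.add(addr)
    return sorted(bad)

mem = FaultyMemory(stuck_addr=0x1F88843, stuck_bit=2)
mem.write(0x1F88843, 0x00)
print(hex(mem.read(0x1F88843)))  # 0x4: bit 2 reads as 1 although 0 was written
print([hex(a) for a in failing_addresses(mem, range(0x1F88840, 0x1F88848))])  # ['0x1f88843']
```

The alternating patterns 0x55 and 0xAA are included so that every bit is tried holding both a 0 and a 1; here 0x00 and 0xAA expose the stuck bit, while 0xFF and 0x55 happen to agree with it. A one-time upset like the one in the previous paragraph would, by contrast, show up on a single read only.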

(BTW this type of problem will happen slightly more often to a CPU in mile-high Denver than to a CPU in sea-level Miami.)

(Egads! It'll also happen more often here in SE Wisconsin where the soil has a higher-than-average radium content and breathing radon in your basement is a significant lung cancer risk.)

Last fiddled with by cheesehead on 2005-08-09 at 00:11

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.