|
|
#1 |
|
Jun 2003
2²×11×37 Posts |
Some of my computers (all of which are identical) report the following error:
Bit: 2176/1842096, ERROR: ROUND OFF (0.465637207) > 0.40
Possible hardware failure, consult the readme file.
Continuing from last save file.
What does it mean? Second, what should I do? Third, are the residues from the same machines that were not affected by the error correct or not? Also, for n=1842096, since the machine started from scratch, is the residue trustworthy?
Thanks,
Citrix
Edit: I was using LLRnet.
Last fiddled with by Citrix on 2006-01-04 at 02:08 |
|
|
|
|
|
#2 |
|
May 2004
FRANCE
2⁴×3×13 Posts |
This is very probably not a hardware error, and if it is not, it is harmless.
Another LLR user encountered this problem in November 2005, and I discussed it with George Woltman at the time. George wrote:
"Hi,
At 10:00 AM 11/22/2005, Darren Bedwell wrote:
Bit: 291584/410094, ERROR: ROUND OFF (0.4375) > 0.40
Possible hardware failure, consult the readme file.
Continuing from last save file.
The error is harmless. It means that you are testing a number near the limit of this FFT size. If the 0.4375 was 0.5 or 0.4999... then the cause is a hardware problem. LLR should handle this more gracefully -- not entering an infinite loop resuming from the last save file."
That is what I will do in the next release: if the exact same error is reproduced at the exact same bit number, the program will say "disregard that error message, all is OK" and continue.
For now, if the error occurs on an SSE2 machine, you can work around it by setting "CpuSupportsSSE2=0" in the .ini file and redoing the test on the problematic number; the x87 gwnums code being totally different, it will work.
Note that this error does not affect any other test done on the same machine.
I am sorry for the inconvenience...
Jean |
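The handling Jean describes for the next release can be sketched roughly as follows. This is only an illustration in Python, not LLR's actual code: the function name `classify_roundoff`, its return labels, and the way the previous warning is remembered are all assumptions made for the example. Only the 0.40 threshold and the "same error at the same bit" rule come from the posts above.

```python
from typing import Optional, Tuple

# Threshold quoted in the error message ("> 0.40").
ROUNDOFF_LIMIT = 0.40


def classify_roundoff(
    bit: int,
    error: float,
    previous: Optional[Tuple[int, float]],
) -> str:
    """Decide what to do after the round-off check of one iteration.

    `previous` is the (bit, error) pair of the last warning seen before
    resuming from the save file, or None if there was no earlier warning.
    All names and return labels here are hypothetical.
    """
    if error <= ROUNDOFF_LIMIT:
        return "ok"
    if previous is not None and previous == (bit, error):
        # The exact same error at the exact same bit after a restart:
        # the value is reproducible, so disregard the message and go on
        # instead of looping back to the save file forever.
        return "reproducible"
    # First time this warning is seen: resume from the last save file
    # and check whether the same value comes back.
    return "retry"
```

For instance, the first warning from post #1 would give `classify_roundoff(2176, 0.465637207, None) == "retry"`, and seeing the same pair again after the restart would give `"reproducible"`.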
|
|
|
|
|
#3 |
|
"Erling B."
Dec 2005
3×5×7 Posts |
So I am not the only one with this problem.
Bit: 1825443/1825449, ERROR: ROUND OFF (0.4999542236) > 0.40
Possible hardware failure, consult the readme file.
And I have had this one too:
Bit: 478553/1825215, ERROR: SUM(INPUTS) != SUM(OUTPUTS), 3300737325341117 != 1080654913947812
Possible hardware failure, consult the readme file.
Then I replaced the AMD CPU on the motherboard, without any success. |
|
|
|
|
|
#4 | |
|
Jun 2003
2²×11×37 Posts |
Quote:
Citrix |
|
|
|
|
|
|
#5 |
|
Jun 2003
2²·11·37 Posts |
Lars,
What happened to the residues with this error that I emailed you a few months back? Were they double checked? Are the computers safe to put more ranges on? Do they still need double/triple checking? If you could post the numbers, I am sure someone will volunteer to double check them.
Thanks! |
|
|
|
|
|
#6 |
|
Jun 2005
373₁₀ Posts |
Sure, I'll do it. But Lars is on vacation now. Do you still have these numbers?
BUT: from what I have read, the round-off error really is harmless. The mistake is in fact to call it an error. The program expects a value, and if the result does not quite match its expectations, it restarts from the save file to see whether the same result occurs again. If it does, everything is fine, and the program can interpret the result correctly. Something like this.
It is related to the FFT thresholds. You could increase the FFT length and not get the error, but the computing time would go up. So you prefer to restart from the save file from time to time and still finish the test sooner.
Don't worry.
H. |
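The trade-off between FFT length and round-off error can be seen in a toy experiment. The sketch below is an assumption-laden illustration in Python, not the code LLR uses: it squares a vector of random "digits" with a plain floating-point FFT and measures how far the results land from the nearest integer, which is the kind of quantity the ROUND OFF message reports. The function names `fft` and `square_roundoff` and the chosen word sizes are made up for the example.

```python
import cmath
import random


def fft(a, invert=False):
    """Recursive radix-2 FFT (unscaled); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def square_roundoff(digits):
    """Square `digits` by cyclic FFT convolution.

    Returns the largest distance between a computed coefficient and the
    nearest integer. The exact coefficients are integers, so this
    distance is pure floating-point round-off.
    """
    n = len(digits)
    spectrum = fft([complex(d) for d in digits])
    conv = fft([s * s for s in spectrum], invert=True)
    worst = 0.0
    for c in conv:
        x = c.real / n  # undo the unscaled inverse transform
        worst = max(worst, abs(x - round(x)))
    return worst


if __name__ == "__main__":
    random.seed(1)
    n = 1024
    for bits in (4, 12, 20):
        digits = [random.randrange(1 << bits) for _ in range(n)]
        print(bits, "bits per word ->", square_roundoff(digits))
```

Packing more bits into each word lets a shorter (faster) FFT hold the same number, but pushes the round-off error up toward the limit. Using fewer bits per word needs a longer FFT, which is the extra computing time mentioned above.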
|
|
|
|
|
#7 | |
|
Jun 2003
1628₁₀ Posts |
Quote:
Lars does not go on vacation for another 3 days. I did some double checks myself, and more than half of the residues from the computers producing the errors failed, irrespective of whether the error occurred or not. For the residues to double check, see the attached file.
Last fiddled with by Citrix on 2006-05-24 at 18:18 |
|
|
|
|
|
|
#8 |
|
Apr 2003
2²×193 Posts |
I have not done the triple checks for your numbers so far.
I would not recommend putting these machines back to work, as they produced wrong results when there was no warning from the client. At the moment I have 100 triple checks in my queue, but I don't know when I will find the time (resources) to finish them. Most of the checks are needed because one residue was done with PRP or an old llrnet version and the other one with the new llrnet.
Lars |
|
|
|
|
|
#9 |
|
Jun 2003
2²·11·37 Posts |
If you post the numbers, I am sure we can divide the work.
As for my numbers, all of them were done on the new LLR client. (I never used PRP or the old llrclient to do these numbers.)
|
|
|
|
|
|
#10 |
|
Apr 2003
2²·193 Posts |
The different client is the reason for most of the other "errors" and is not specific to your ranges.
The attached file contains all tests that need a third test. The client used for these tests should be llr3.5 or later, or llrnet3.5.
Cheers,
Lars
Last fiddled with by ltd on 2006-05-24 at 19:55 |
|
|
|
|
|
#11 |
|
Jun 2005
565₈ Posts |
So it seems to be an issue with the machines, not with the round-off error, which is not an error but more like a sign that the program is working correctly.
PM me a file with those numbers, and I'll put them on my P4 on Friday. It ran stable the last time. Why don't you run a torture test on all of your systems, and then try swapping the hardware parts among the faulty ones?
H.
Edit: I started the tests today. Guess it'll take a week or so.
H.
Last fiddled with by hhh on 2006-05-26 at 12:57 |
|
|
|