SUM(INPUTS) != SUM(OUTPUTS)
Hi,
Today I found the following error in my logs:

.............
[Aug 17 22:12] Iteration: 1500000 / 34746359 [4.31%]. Per iteration time: 0.079 sec.
Iteration: 1504331/34746359, ERROR: SUM(INPUTS) != SUM(OUTPUTS), 1.351257879789209e+16 != 1.335852066589149e+16
Possible hardware failure, consult the readme.txt file.
Continuing from last save file.
Waiting five minutes before restarting.
Resuming primality test of M34746359 at iteration 1504257 [4.32%]
[Aug 17 23:27] Iteration: 1550000 / 34746359 [4.46%]. Per iteration time: 0.083 sec
..............

It's the only error that has happened on this PIV machine, and all the standard stress tests have passed without problems so far.

My question is: does it make sense to let this LL test keep running, or should I restart it? I wonder why it resumed the test from just 74 iterations before the error occurred. Is it even likely that the error occurred within those last iterations, or wouldn't it be better to resume from the previous (30 minutes older) backup file instead?
Hi, rudi_m!
As can be seen from the excerpt you posted, P95 continued from the last save file, which contained the interim result of iteration 1504257:
[quote=rudi_m][b]Continuing from last save file.[/b] Waiting five minutes before restarting. Resuming primality test of M34746359 at [b]iteration 1504257[/b] [4.32%][/quote]
If the failure doesn't recur, everything should be fine.

To clarify your last question: if the "minutes between diskwrites" option is set to "30", a save file is written every 30 minutes after starting or continuing a task, overwriting the previous save file for that task. So if a failure occurs, you lose [u]at most[/u] 30 minutes of work, provided the save file isn't corrupted.

HTH
Benjamin
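To put rough numbers on the "at most 30 minutes" point, here is a small Python sketch (not part of Prime95) that turns the figures from the posted log into lost work. The function names are made up for illustration; the inputs are the 0.079 sec per-iteration time and the two iteration numbers from the log.

```python
def iterations_in(minutes: float, sec_per_iter: float) -> int:
    """Number of whole iterations that fit into the given wall-clock time."""
    return int(minutes * 60 / sec_per_iter)


def seconds_lost(error_iter: int, resume_iter: int, sec_per_iter: float) -> float:
    """Wall-clock time needed to redo the iterations between resume and error."""
    return (error_iter - resume_iter) * sec_per_iter


# Figures taken from the posted log.
SEC_PER_ITER = 0.079
ERROR_ITER = 1504331
RESUME_ITER = 1504257

# Worst case: the error hits just before the next 30-minute save.
worst_case = iterations_in(30, SEC_PER_ITER)
# What actually had to be redone in this incident.
actual = seconds_lost(ERROR_ITER, RESUME_ITER, SEC_PER_ITER)

print(f"worst case: about {worst_case} iterations redone")
print(f"this incident: {ERROR_ITER - RESUME_ITER} iterations, about {actual:.1f} s")
```

With these figures the worst case is roughly 22,800 iterations, while this particular incident cost only 74 iterations, or just under 6 seconds of work.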
[QUOTE=rudi_m]Is it even likely that the error occurred within those last iterations, or wouldn't it be better to resume from the previous (30 minutes older) backup file instead?[/QUOTE]
The program actually checks that sum(outputs) = sum(inputs) at the end of every iteration, and it issues the error message immediately when the difference exceeds a threshold. So going back even just one iteration before the erroneous one reaches a point at which the calculation was still okay. The last save file is always at least one iteration before the erroneous one.
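As a rough illustration of such a per-iteration check, the Python sketch below compares two sums against a relative threshold. It is not Prime95's code: the names `sums_agree` and `relative_difference` and the tolerance of 1e-6 are assumptions picked for this example, and the real program's threshold and exact comparison may well differ. The two values are the ones from the error message in the log.

```python
def relative_difference(a: float, b: float) -> float:
    """Absolute difference of a and b, scaled by the larger magnitude."""
    scale = max(abs(a), abs(b))
    if scale == 0.0:
        return 0.0  # both values are exactly zero
    return abs(a - b) / scale


def sums_agree(sum_inputs: float, sum_outputs: float, rel_tol: float = 1e-6) -> bool:
    """Return True if the two sums match within rel_tol.

    rel_tol is a hypothetical threshold chosen for this example only.
    """
    return relative_difference(sum_inputs, sum_outputs) <= rel_tol


# Values copied from the error message in the log.
SUM_INPUTS = 1.351257879789209e+16
SUM_OUTPUTS = 1.335852066589149e+16

if not sums_agree(SUM_INPUTS, SUM_OUTPUTS):
    diff = relative_difference(SUM_INPUTS, SUM_OUTPUTS)
    print(f"ERROR: SUM(INPUTS) != SUM(OUTPUTS), relative difference {diff:.2%}")
```

For the logged values the mismatch is a bit over 1%, which is far more than floating-point rounding noise would normally explain and fits the "possible hardware failure" wording of the message.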
[QUOTE=S80780]If the failure doesn't recur, everything should be fine.[/QUOTE]
OK, thanks. I hope the computer just had a bad day and is fine again now :)

[QUOTE]To clarify your last question: if the "minutes between diskwrites" option is set to "30", it means that every 30 minutes after starting/continuing a task, a save file is stored, overwriting the former save file for this task. So if a failure occurs, you lose [u]at most[/u] 30 minutes, if the save file isn't corrupted.[/QUOTE]
OK, I know, but before overwriting, it makes a second backup file (TwoBackupFiles=1 in prime.ini). So I thought it would be safer to resume from the older backup.

-rw-r--r-- 1 rudi users 8388630 Aug 22 17:23 pY746359
-rw-r--r-- 1 rudi users 8388630 Aug 22 16:53 qY746359

(The second file is the backup that is 30 minutes older.)