|
|
#1 |
|
Dec 2007
22 Posts |
Hi,
For some reason, my mostly x64 Q6600 quad system (I am multibooting different OSes; XP64 and Vista64 seem to be the problem) has a habit of bombing out deep into my primenet jobs. This week I lost 4 exponents that were around 80% done, with a variety of SUMOUTs; all the gory details are in http://ructrash.googlepages.com/results.txt . I am running 25.5 now; I think I "upgraded" a couple of months ago because of similar earlier losses. My system withstands torture tests and anything else I throw at it with ease, and it is not overclocked or anything.

My two main suspicions are that primenet being down triggered a bug, and/or that running other demanding programs (multithreaded chess, for example) triggers some other insanity. The latest misery happened in XP64: prime95 was stuck in repeated SUMOUTs and seemed to recover only just before(!) or after I changed the permissions on all of those files to a permissive 666 (for those unfamiliar with Unix, I am not a satanist). Anyway, I can't see how to help debug this ultra-random and rare behaviour, which has cost me months of lost processing time, but my real beef is in the next paragraph.

Even with the standard "two backup files", the "recovery" code in prime95 does more harm than good: as soon as an error is detected, it pauses and falls back to a backup, and if the error repeats, it discards that backup. So after 2 or 3 consecutive errors both backup files are gone: not just all the work lost, but no possibility of a postmortem either. This behaviour is whack! Also, going through the documentation, I am not sure there is any manual recovery information out there. Say I managed to undelete the q3465785 file: would it be enough to put it in the work directory, or should it be renamed to p3465785, or what?

I am nearly devastated by all this. I had no idea prime95 had been stabbing me in the back for so long; my account report shows I have submitted only one exponent after months of multi-computer, multi-processor work. So, fellow mersenners, backup, backup, backup! And someone please have a look at the recovery process. |
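One way to act on the "backup, backup, backup" advice is a small script that copies the save files somewhere prime95 will not touch them. This is only a sketch: the `p<exponent>` / `q<exponent>` file names are taken from the example in the post, while the function name and both directory arguments are hypothetical. It is probably safest to run it while prime95 is stopped, so that a half-written file is not copied.

```python
import shutil
import time
from pathlib import Path


def backup_save_files(work_dir, backup_root, exponents):
    """Copy p<exponent>/q<exponent> save files into a timestamped folder.

    Assumption: save files are named "p" or "q" followed by the exponent,
    as in the q3465785 / p3465785 example from the post.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / stamp
    copied = []
    for exponent in exponents:
        for prefix in ("p", "q"):
            source = Path(work_dir) / f"{prefix}{exponent}"
            if source.is_file():
                # Create the folder lazily so empty backups leave no clutter.
                target.mkdir(parents=True, exist_ok=True)
                shutil.copy2(source, target / source.name)
                copied.append(target / source.name)
    return copied
```

Calling `backup_save_files("C:/prime95", "D:/prime95-backups", [3465785])` (both paths are placeholders) returns the list of copies it made, which is empty if no matching save file exists.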
|
|
|
|
|
#2 |
|
"Mike"
Aug 2002
2^5·257 Posts |
From undoc.txt:
You can have the program generate save files every n iterations. The files will have a .XXX extension where XXX equals the current iteration divided by n. In prime.ini enter: InterimFiles=n |
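To make the naming rule concrete, here is a minimal sketch of how the extension could be derived from the iteration count. The function name is made up for illustration, and zero-padding the quotient to three digits is an assumption based on the ".XXX" wording; undoc.txt as quoted does not say what happens once the quotient exceeds 999.

```python
def interim_extension(iteration, n):
    """Return the interim save-file extension for an iteration, per the quoted rule."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Quoted rule: XXX equals the current iteration divided by n.
    # Assumption: the quotient is zero-padded to three digits.
    return f".{iteration // n:03d}"
```

With `InterimFiles=1000000`, the file written at iteration 3,000,000 would then carry the extension `.003` under these assumptions.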
|
|
|
|
|
#3 |
|
Dec 2007
22 Posts |
I have activated the interim files, but I am still not sure whether they are used for recovery automagically or whether I will have to do some manipulation. Anyway, prime95 bombed out on all 4 threads again with SUMOUTs, so the permissions on the files are not what is breaking it. I still think other high-maintenance processes may be breaking it; one suspect is Google Desktop, which on my system has to deep-index quite a ton of documents.
|
|
|
|
|
|
#4 |
|
"Mike"
Aug 2002
2^5·257 Posts |
From undoc.txt:
PauseWhileRunning=prog1,prog2,prog3,etc
Note that prime95 will pause if the program name matches any part of the running program's file name. That is, "foobar" will match "c:\foobar.exe", "C:\FOOBAR\name.exe", and even "C:\myfoobarprog.exe". By default, prime95 will check the list of running programs every 64 iterations, but not more frequently than every 10 seconds. You can adjust the time period with this prime.ini setting:
PauseCheckInterval=n
where n is the number of seconds between checking which programs are running. |
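The matching rule quoted above amounts to a case-insensitive substring test against the full path, which is why a short name can pause prime95 for more programs than intended. The sketch below only illustrates that rule; it is not prime95's code, and both function names are hypothetical.

```python
def parse_pause_list(value):
    """Split a PauseWhileRunning value such as "prog1,prog2" into names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def matches_pause_list(program_path, pause_list):
    """Return True if any listed name occurs anywhere in the program's path.

    The comparison ignores case, mirroring the quoted examples where
    "foobar" matches "C:\\FOOBAR\\name.exe".
    """
    path = program_path.lower()
    return any(name.lower() in path for name in pause_list)
```

For example, a list containing `foobar` matches all three paths from the quote, including `C:\FOOBAR\name.exe`, where only the directory name matches. Choosing a longer, more specific name (say, the full executable name) keeps unrelated programs from triggering a pause.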
|
|
|
|
|
#5 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
2·53·71 Posts |
That is a nasty-looking results file. I suspect there are some serious hardware issues going on. If true, LL testing is likely a waste of your CPU resources.
Last fiddled with by Prime95 on 2007-12-18 at 02:31 |
|
|
|