mersenneforum.org  

Old 2007-12-16, 12:04   #1
abstractius
 
Dec 2007

22 Posts
Prime95's backups broken?

Hi,

For some reason, my mostly x64 Q6600 quad system (I am multibooting different OSes; XP64 and Vista64 seem to be the problem) has a habit of bombing out quite deep into my PrimeNet jobs. This week I lost 4 exponents that were around 80% done, with a variety of SUMOUT errors; all the gory details are in http://ructrash.googlepages.com/results.txt . I am running 25.5 now; I think I "upgraded" a couple of months ago because of similar earlier losses. My system withstands torture tests and anything else I throw at it with ease, and it is not overclocked or anything. My two main suspicions are that PrimeNet being down triggered a bug, and/or that running other demanding proggies (multithreaded chess, for example) triggers some other insanity. The latest misery happened in XP64, where it was stuck in multiple SUMOUTs and seemed to recover only just before(!) or after I changed the permissions on all of those files to a permissive 666 (for those unfamiliar with Unix, I am not a satanist). Anyway, I can't see how to help debug this ultra-random and rare behaviour, which cost me months of lost processing time, but my real beef is in the next paragraph.

Even working with the standard "two backup files", the "recovery" code in prime95 is doing more harm than good: as soon as an error is detected, it pauses, uses a backup and, if the error repeats, it discards the backup. So after 2 or 3 consecutive errors both backup files are gone; not only is all the work lost, there is no possibility of a postmortem either. This behaviour is whack! Also, going through the documentation, I am not sure there is any manual recovery information out there. Say I happened to undelete the q3465785 file: would it be enough to put it in the work directory, or should it be renamed to p3465785, or what?

I am nearly devastated by all this. I had no idea prime95 had been stabbing me in the back for so long; my account report shows I have submitted only one exponent after months of multi-computer, multi-processor work. So, fellow mersenners, backup, backup, backup! And someone please have a look at the recovery process.
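
For anyone who wants to act on the "backup, backup, backup" advice, here is a minimal sketch of a manual backup script. It is not part of Prime95. It assumes save files are named `p` or `q` followed by digits (as with q3465785 above), optionally with an extension; the function name `backup_save_files` and the timestamped folder layout are made up for illustration.

```python
import shutil
import time
from pathlib import Path


def backup_save_files(work_dir, backup_root):
    """Copy files that look like Prime95 save files into a timestamped folder.

    Assumption: save files are named 'p' or 'q' followed by digits,
    optionally with an extension (for example q3465785).
    """
    work_dir = Path(work_dir)
    dest = Path(backup_root) / time.strftime("%Y%m%d-%H%M%S")
    copied = []
    for path in sorted(work_dir.iterdir()):
        # Ignore any extension when deciding whether the name matches.
        stem = path.name.split(".")[0]
        if path.is_file() and stem[:1] in ("p", "q") and stem[1:].isdigit():
            dest.mkdir(parents=True, exist_ok=True)
            # copy2 keeps timestamps, which helps a later postmortem.
            shutil.copy2(path, dest / path.name)
            copied.append(path.name)
    return dest, copied
```

Run it from a scheduled task against the Prime95 work directory. The originals are only copied, never moved or deleted, so the program itself is not affected.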
Old 2007-12-16, 13:15   #2
Xyzzy
 
 
Aug 2002

22·2,089 Posts

From undoc.txt:

You can have the program generate save files every n iterations. The files will have a .XXX extension where XXX equals the current iteration divided by n. In prime.ini enter:

InterimFiles=n
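
To make the naming rule concrete, here is a small sketch of the arithmetic, assuming something like `InterimFiles=1000000` (an illustrative value). The function names are hypothetical, and the zero-padding to three digits is an assumption; the quoted text only says that XXX is the current iteration divided by n.

```python
def interim_extension(iteration, n):
    """Return the extension for the interim file written at `iteration`.

    Follows the quoted rule: XXX = current iteration divided by n.
    Assumption: the number is zero-padded to three digits.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of iterations")
    return ".{:03d}".format(iteration // n)


def latest_interim_before(bad_iteration, n):
    """Return (extension, iteration) of the last interim file written
    strictly before `bad_iteration`, or None if none exists yet."""
    if n <= 0:
        raise ValueError("n must be a positive number of iterations")
    index = (bad_iteration - 1) // n
    if index < 1:
        return None
    return ".{:03d}".format(index), index * n


# With n = 1,000,000 an error at iteration 2,750,000 leaves the file
# written at iteration 2,000,000 as the most recent interim save.
print(latest_interim_before(2_750_000, 1_000_000))
```

A smaller n gives more restart points at the cost of more files on disk.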
Old 2007-12-17, 21:34   #3
abstractius
 
Dec 2007

416 Posts
OK

I have activated the interim files, but I am still not sure whether they are used for recovery automagically or whether I will have to do some manipulation. Anyway, prime95 bombed out on all 4 threads again with SUMOUTs, so the permissions on the files are not what is breaking it. I still think other high-maintenance processes may be breaking it; one suspect is Google Desktop, which has to deep-index quite a ton of documents on my system.
Old 2007-12-18, 00:25   #4
Xyzzy
 
 
Aug 2002

22·2,089 Posts

From undoc.txt:

PauseWhileRunning=prog1,prog2,prog3,etc

Note that prime95 will pause if the program name matches any part of the running program's file name. That is "foobar" will match "c:\foobar.exe", "C:\FOOBAR\name.exe", and even "C:\myfoobarprog.exe". By default, prime95 will check the list of running programs every 64 iterations, but not more frequently than every 10 seconds. You can adjust the time period with this prime.ini setting:

PauseCheckInterval=n

where n is the number of seconds between checking which programs are running.
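
The matching rule is easy to misread, so here is a rough sketch of it as the examples above imply: a case-insensitive substring test against each running program's path. This is an illustration only, not Prime95's actual code; `parse_pause_list` and `should_pause` are hypothetical names.

```python
def parse_pause_list(value):
    """Split a PauseWhileRunning value such as 'prog1,prog2' into names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def should_pause(pause_names, running_paths):
    """Return True if any name occurs anywhere in any running program's path.

    Assumption: matching is a case-insensitive substring test, as the
    quoted examples suggest ('foobar' matches 'C:\\FOOBAR\\name.exe').
    """
    lowered = [path.lower() for path in running_paths]
    return any(name.lower() in path for name in pause_names for path in lowered)


running = [r"C:\FOOBAR\name.exe", r"C:\tools\editor.exe"]
print(should_pause(parse_pause_list("foobar, chess"), running))
```

Because any substring matches, a short name can pause the program more often than intended, so the most specific part of the executable name is the safer choice.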
Old 2007-12-18, 02:31   #5
Prime95
P90 years forever!
 
 
Aug 2002
Yeehaw, FL

170018 Posts

That is a nasty-looking results file. I suspect there are some serious hardware issues going on. If true, LL testing is likely a waste of your CPU resources.

Last fiddled with by Prime95 on 2007-12-18 at 02:31