mersenneforum.org  

Old 2016-04-09, 02:11   #2586
ixfd64
Bemusing Prompter
 
"Danny"
Dec 2002
California

7^4 Posts

I was factoring an exponent from 73 to 74 bits when the computer completely froze, requiring a hard reboot. After starting up mfaktc again, I noticed that the factoring had started over, suggesting that the save file had been corrupted. I thought about uploading a copy of it here in case someone wanted to investigate the issue, but the file had already been overwritten. Just putting this out there.

On the subject of which, maybe it would be a good idea to give mfaktc the ability to create backup save files like Prime95 does?

Last fiddled with by ixfd64 on 2016-04-09 at 02:33
Old 2016-04-11, 12:24   #2587
TheJudger
 
"Oliver"
Mar 2005
Germany

11×101 Posts

Hi,

Quote:
Originally Posted by ixfd64
I was factoring an exponent from 73 to 74 bits when the computer completely froze, requiring a hard reboot. After starting up mfaktc again, I noticed that the factoring had started over, suggesting that the save file had been corrupted. I thought about uploading a copy of it here in case someone wanted to investigate the issue, but the file had already been overwritten. Just putting this out there.

On the subject of which, maybe it would be a good idea to give mfaktc the ability to create backup save files like Prime95 does?
Short answer: No!

Longer answer: No! The checkpoints are most likely an atomic write to the filesystem (actually fopen(), a single fprintf() (less than 512 bytes) and an fclose()). Because the fprintf() is atomic, it is very unlikely that this will yield a corrupted checkpoint. It could be an empty checkpoint file, but that isn't very likely either. If such things happen, I would fix the computer before doing anything useful.

Maybe there was just no checkpoint because not much work had been done on that step prior to the system lockup?

Oliver
Old 2016-04-11, 13:59   #2588
axn
 
Jun 2003

5087_10 Posts

Quote:
Originally Posted by TheJudger
The checkpoints are most likely an atomic write to the filesystem (actually fopen(), a single fprintf() (less than 512 bytes) and an fclose()). Because the fprintf() is atomic, it is very unlikely that this will yield a corrupted checkpoint. It could be an empty checkpoint file, but that isn't very likely either.
That doesn't sound atomic. If something goes wrong between fopen and fprintf, or more likely, if the OS hasn't actually propagated the write from memory to disk even after fopen->fprintf->fclose has completed, you'll end up with an empty checkpoint. It is rare, but it can happen even if everything is working exactly as expected. Hence the advantage of multiple checkpoint files: even if one fails, the other one(s) will be there, and loss of work can be minimized.
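
The multiple-checkpoint idea can be sketched in C as follows. This is only a minimal illustration, not mfaktc's actual code: the function names `save_checkpoint`, `read_checkpoint`, and `load_checkpoint` are hypothetical, as are the file paths passed to them, and a real loader would validate the contents rather than only checking for an empty file.

```c
#include <stdio.h>

/* Hypothetical helper: keep the previous checkpoint as a backup, then
 * write the new one. Returns 0 on success, -1 on failure. */
static int save_checkpoint(const char *primary, const char *backup,
                           const char *data)
{
    /* Move the old checkpoint aside. Failure is not fatal (there may be
     * no checkpoint yet). Assumes POSIX rename(), which replaces an
     * existing backup file. */
    (void)rename(primary, backup);

    FILE *f = fopen(primary, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(data, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: read the first line of `path` into `buf`.
 * Returns 1 if a non-empty line was read, 0 otherwise. */
static int read_checkpoint(const char *path, char *buf, int size)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return 0;
    int ok = fgets(buf, size, f) != NULL && buf[0] != '\0' && buf[0] != '\n';
    fclose(f);
    return ok;
}

/* Try the primary checkpoint first, then the backup.
 * Returns the path that was used, or NULL if neither was usable. */
static const char *load_checkpoint(const char *primary, const char *backup,
                                   char *buf, int size)
{
    if (read_checkpoint(primary, buf, size))
        return primary;
    if (read_checkpoint(backup, buf, size))
        return backup;
    return NULL;
}
```

With this layout, a crash that leaves the primary file empty costs at most the work done since the previous checkpoint, because the loader falls back to the backup.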
Old 2016-04-11, 15:15   #2589
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

D63_16 Posts

Quote:
Originally Posted by TheJudger
atomic write to the filesystem (actually fopen(), a single fprintf() (less than 512 bytes) and a fclose())
Quote:
Originally Posted by axn
That doesn't sound atomic.
A relatively minor change, writing to Mxxxxx.tmp and then renaming it over Mxxxxx.ckp after the write is complete, would make it atomic. You'll either have the new checkpoint or, in the worst case, if a crash happens during the checkpoint-write process, you'll have the previous checkpoint and a temp file (which may or may not be correctly written).
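
The write-then-rename approach could look roughly like this in C. It is a sketch under stated assumptions, not mfaktc's implementation: `write_checkpoint_atomic` is a hypothetical helper, and the code assumes a POSIX system, where `fsync()` is available and `rename()` atomically replaces an existing target. On Windows, `rename()` does not replace an existing file, so a different call would be needed there.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for fileno() and fsync() */
#endif
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write `data` to `tmp_path`, force it to disk,
 * then rename it over `final_path`. Returns 0 on success, -1 on failure.
 * On failure the previous checkpoint at `final_path` is left untouched. */
static int write_checkpoint_atomic(const char *tmp_path,
                                   const char *final_path,
                                   const char *data)
{
    FILE *f = fopen(tmp_path, "w");
    if (f == NULL)
        return -1;

    int ok = fputs(data, f) >= 0       /* write the payload              */
          && fflush(f) == 0            /* flush the stdio buffer to the OS */
          && fsync(fileno(f)) == 0;    /* ask the OS to write it to disk */

    if (fclose(f) != 0)
        ok = 0;

    if (!ok) {
        remove(tmp_path);              /* discard the incomplete temp file */
        return -1;
    }

    /* Only now replace the old checkpoint with the complete new one. */
    return rename(tmp_path, final_path) == 0 ? 0 : -1;
}
```

The `fflush()` and `fsync()` calls address the separate concern that `fclose()` returning does not by itself mean the data has reached the disk.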
Old 2016-04-11, 16:07   #2590
TheJudger
 
"Oliver"
Mar 2005
Germany

457_16 Posts

This is not rocket science; let's keep it simple. One could argue that if you can't write a simple checkpoint reliably on your machine, you won't trust the main calculation either.

Oliver
Old 2016-04-11, 17:12   #2591
ixfd64
Bemusing Prompter
 
"Danny"
Dec 2002
California

7^4 Posts

Quote:
Originally Posted by TheJudger
Maybe there was just no checkpoint because not much work had been done on that step prior to the system lockup?
When I started up mfaktc before the crash, the assignment was already about 37% done, so I'm pretty sure there was a checkpoint.

On the subject of which, is there any way to tell mfaktc to start at a certain class other than by hacking the save file?

Last fiddled with by ixfd64 on 2016-04-11 at 17:13
Old 2016-04-11, 21:04   #2592
TheJudger
 
"Oliver"
Mar 2005
Germany

1111_10 Posts

Quote:
Originally Posted by ixfd64
On the subject of which, is there any way to tell mfaktc to start at a certain class other than by hacking the save file?
Except by hacking the code? No (for obvious reasons).

Oliver
Old 2016-04-11, 21:33   #2593
TObject
 
Feb 2012

3^4·5 Posts

Windows 7 had a nice feature called “Previous Versions” (Windows 8 and later replaced it with something called “File History”, which is not as good).

You just right click on a file and you can see or restore a previous version.

This functionality is also usually available with NAS configured to make automatic snapshots.
Or, at the very least, checkpoint files can be restored from daily backups, manually.

Last time I checked, mfaktc was open source. The encryption routine is just a few lines long. Modifying checkpoint files is useful when one wants to split a bit level among several GPUs.
Old 2016-04-12, 09:44   #2594
TheJudger
 
"Oliver"
Mar 2005
Germany

11×101 Posts

Hi,

Quote:
Originally Posted by TObject
Last time I checked, mfaktc was open source. The encryption routine is just a few lines long. Modifying checkpoint files is useful when one wants to split a bit level among several GPUs.
It is not encryption, it is just a checksum.
Is there any good reason why you would split a single assignment across multiple GPUs on a regular basis?
I'm afraid this discussion leads to a "how to forge false results" guide, even if not intended by you.

Oliver
Old 2016-04-12, 13:42   #2595
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

2^6·151 Posts

We are with Oliver here.
We used mfaktc for years and never had problems with checkpoint files. [edit: we checkpoint every 30 minutes or so]

Also, if really needed for an assignment that would take ages, splitting one exponent over many cards is no problem: one simple pari or perl script can create the checkpoint file to start with some predetermined class. [edit: you still have to watch them to know when to stop each of them, except the last, which stops by itself after the last class]

Last fiddled with by LaurV on 2016-04-12 at 13:44
Old 2016-06-07, 10:27   #2596
bayanne
 
"Tony Gott"
Aug 2002
Yell, Shetland, UK

2^2×83 Posts
0.21 for a Mac

Has anyone compiled mfaktc 0.21 for a Mac?

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.