I was factoring an exponent from 73 to 74 bits when the computer completely froze, requiring a hard reboot. After starting up mfaktc again, I noticed that the factoring had started over, suggesting that the save file had been corrupted. I thought about uploading a copy of it here in case someone wanted to investigate the issue, but the file had already been overwritten. Just putting this out there.
On the subject of which, maybe it would be a good idea to give mfaktc the ability to create backup save files like Prime95 does? |
Hi,
[QUOTE=ixfd64;431080]I was factoring an exponent from 73 to 74 bits when the computer completely froze, requiring a hard reboot. After starting up mfaktc again, I noticed that the factoring had started over, suggesting that the save file had been corrupted. I thought about uploading a copy of it here in case someone wanted to investigate the issue, but the file had already been overwritten. Just putting this out there. On the subject of which, maybe it would be a good idea to give mfaktc the ability to create backup save files like Prime95 does?[/QUOTE]
Short answer: No! Longer answer: No! The checkpoints are most likely an atomic write to the filesystem (actually fopen(), a single fprintf() (less than 512 bytes) and an fclose()). Because the fprintf() is atomic, it is very unlikely that this will yield a corrupted checkpoint. It could be an empty checkpoint file, but that isn't very likely either. If such things happen, I would fix the computer before doing anything useful. Maybe there was just no checkpoint because not much work had been done on that step prior to the system lockup? Oliver |
[QUOTE=TheJudger;431306]The checkpoints are most likely an atomic write to the filesystem (actually fopen(), a single fprintf() (less than 512 bytes) and an fclose()). Because the fprintf() is atomic, it is very unlikely that this will yield a corrupted checkpoint. It could be an empty checkpoint file, but that isn't very likely either.[/QUOTE]
That doesn't sound atomic. If something goes wrong between fopen and fprintf, or more likely, if the OS hasn't actually propagated the write from memory to disk even after fopen->fprintf->fclose has completed, you'll end up with an empty checkpoint. It is rare, but it can happen even if everything is working exactly as expected. Hence the advantage of multiple checkpoint files -- even if one fails, the other one(s) will be there, and loss of work can be minimized. |
[QUOTE=TheJudger;431306]atomic write to the filesystem (actually fopen(), a single fprintf() (less than 512 bytes) and a fclose())[/QUOTE][QUOTE=axn;431312]That doesn't sound atomic.[/QUOTE]A relatively minor change, writing to [i]Mxxxxx.tmp[/i] and then renaming it over [i]Mxxxxx.ckp[/i] once the write is complete, would make it atomic. You'll either have the new checkpoint or, in the worst case, if a crash happens during the checkpoint-write process, you'll have the previous checkpoint and a temp file (which may or may not be correctly written).
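For anyone who wants to see what that looks like, here is a minimal sketch of the write-to-temp-then-rename idea. It is not taken from the mfaktc source: the function name save_checkpoint_atomically, the ".tmp" suffix handling and the plain-text payload are all assumptions made for illustration. It also assumes a POSIX system, where rename() replaces an existing target atomically and fsync() is available; on Windows, rename() fails if the target already exists, so a different call would be needed there.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>   /* fsync(); POSIX only */

/* Hypothetical helper, NOT mfaktc's actual checkpoint code.
 * Writes `payload` to "<path>.tmp", forces it to disk, then renames the
 * temp file over `path`. Returns 0 on success, -1 on failure. On failure
 * the previous checkpoint at `path` (if any) is left untouched. */
int save_checkpoint_atomically(const char *path, const char *payload)
{
    char tmp_path[4096];
    FILE *f;
    size_t len;
    int n;

    assert(path != NULL && payload != NULL);

    n = snprintf(tmp_path, sizeof tmp_path, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp_path)
        return -1;                      /* path too long for the buffer */

    f = fopen(tmp_path, "w");
    if (f == NULL)
        return -1;

    len = strlen(payload);
    if (fwrite(payload, 1, len, f) != len
        || fflush(f) != 0               /* stdio buffer -> kernel */
        || fsync(fileno(f)) != 0) {     /* kernel cache -> storage device */
        fclose(f);
        remove(tmp_path);
        return -1;
    }
    if (fclose(f) != 0) {
        remove(tmp_path);
        return -1;
    }

    /* The old checkpoint is only replaced once the new one is complete. */
    if (rename(tmp_path, path) != 0) {
        remove(tmp_path);
        return -1;
    }
    return 0;
}
```

The fflush() and fsync() calls address the point about the OS not having propagated the write to disk yet. Without them the rename can still reach the disk before the data does on some filesystems.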
|
This is not rocket science; let's keep it simple. One [B]could[/B] argue that if you can't write a simple checkpoint reliably on your machine, you can't trust the main calculation either.
Oliver |
[QUOTE=TheJudger;431306]Maybe there was just no checkpoint because prior the system lockup there wasn't much work done on that step?[/QUOTE]
When I started up mfaktc before the crash, the assignment was already about 37% done, so I'm pretty sure there was a checkpoint. On the subject of which, is there any way to tell mfaktc to start at a certain class other than by hacking the save file? |
[QUOTE=ixfd64;431328]On the subject of which, is there any way to tell mfaktc to start at a certain class other than by hacking the save file?[/QUOTE]
Except by hacking the code? No (for obvious reasons). Oliver |
Windows 7 had a nice feature called “Previous Versions” (Windows 8 and later replaced it with something called “File History”, which is not as good).
You just right-click on a file and you can see or restore a previous version. This functionality is also usually available with a NAS configured to make automatic snapshots. Or, at the very least, checkpoint files can be restored manually from daily backups. Last time I checked, mfaktc was open source. The encryption routine is just a few lines long. Modifying checkpoint files is useful when one wants to split a bit level among several GPUs. |
Hi,
[QUOTE=TObject;431343]Last time I checked, mfaktc was open source. The encryption routine is just a few lines long. Modifying checkpoint files is useful when one wants to split a bit level among several GPUs.[/QUOTE]
It is [B]not encryption[/B]; it is just a [B]checksum[/B]. Is there any good reason why you would split a single assignment across multiple GPUs on a regular basis? I'm afraid this discussion leads to a "how to forge false results" guide, even if you don't intend it to. Oliver |
We are with Oliver here.
We have used mfaktc for years and never had problems with checkpoint files. [edit: we do checkpoint every 30 minutes or so] Also, if really needed, for an assignment that would take ages, splitting one exponent over many cards is no problem: one simple PARI or Perl script can create the checkpoint file to start with some predetermined class. [edit: you still have to watch them to know when to stop each of them, except the last one, which stops by itself after the last class] |
0.21 for a Mac
Has anyone compiled mfaktc 0.21 for a Mac?
|