[QUOTE=kladner;305707] Obviously, I need to slow the write timing way down.[/QUOTE]
Use the -c switch with a higher value (I use 100k, 400k, etc.). Writing to the files goes together with writing to the screen. As for the Ctrl-C problem, it is not a bug; it is how Windows works. Ctrl-C was the "break" command in the old DOS days, used to forcibly terminate tasks. A double Ctrl-C (i.e. pressing break again while the first break interrupt is still being serviced) acts as "Ctrl-Break", which is the same as killing the process from Task Manager. If this happens while CL is writing the file (quite likely: the files are big and are written immediately when you press the first Ctrl-C), then the file is lost. You have to delete the cxxxxxx file, keep only the txxxxxx file, and resume from one step back. You would have to be really unlucky to destroy both files, but without special precautions in the software it is still possible. Don't double-press Ctrl-C. Press it once and be patient. :smile: And use backups, as kladner said. Disasters happen... better to be protected.
[QUOTE=Dubslow;305702]Here's the current pseudo-[URL="http://sourceforge.net/p/cudalucas/code/37/tree/trunk/CUDALucas.cu?force=True"]code[/URL] (Ctrl+F "write_checkpoint" if you're curious):
[code]delete t-checkpoint
move c-ckp to t-ckp
write current data to c-ckp[/code] What about that would you change?[/QUOTE]
[code]if IsValidSaveFile(c-ckp) then begin
  delete t-checkpoint
  move c-ckp to t-ckp
  write current data to c-ckp
end else
  deal with it...[/code]
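A minimal C sketch of the guarded rotation TObject proposes, assuming the checkpoint is a single buffer of known length. The paths are plain parameters rather than CUDALucas's real c/t file names, and `is_valid_save_file` is a hypothetical stand-in that only checks that the file exists and has the expected size; a real check would also verify a checksum. `remove()` is called before `rename()` because on Windows `rename()` fails when the target already exists.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical validity check: the file exists and holds exactly
 * `expected` bytes. A real implementation would also verify a checksum. */
static int is_valid_save_file(const char *path, long expected)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    int ok = fseek(f, 0, SEEK_END) == 0 && ftell(f) == expected;
    fclose(f);
    return ok;
}

/* Rotate c -> t only when c is known to be good, then write the new c.
 * Returns 0 on success, -1 on failure. */
static int write_checkpoint(const char *c_ckp, const char *t_ckp,
                            const void *data, size_t len)
{
    assert(c_ckp != NULL && t_ckp != NULL && data != NULL);
    if (is_valid_save_file(c_ckp, (long)len)) {
        remove(t_ckp);                 /* drop the oldest checkpoint */
        if (rename(c_ckp, t_ckp) != 0) /* last good checkpoint becomes t */
            return -1;
    }
    /* else: leave t-ckp untouched, it may be the only good copy left */
    FILE *f = fopen(c_ckp, "wb");
    if (f == NULL)
        return -1;
    size_t written = fwrite(data, 1, len, f);
    int closed = fclose(f);
    return (written == len && closed == 0) ? 0 : -1;
}
```

If c-ckp fails the check, t-ckp is never deleted; the new data simply overwrites the bad c-ckp.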
So, check whether the last checkpoint is good? Why would it do that? I don't think that would help with this problem at all...
Well, right now the t-file is useless since it is being overwritten by the corrupted c-file.
[QUOTE=TObject;305713]Well, right now the t-file is useless since it is being overwritten by the corrupted c-file.[/QUOTE]
I think you misunderstand... Let's look at it this way: A is the data to be written on quit, B is the last checkpoint, and C is the checkpoint before that, which starts out in t-ckp. First, t-ckp is deleted, so the C data is gone. Then the B data is moved from c-ckp to t-ckp. Then the current (A) data is written to c-ckp. So even if the fwrite()s get interrupted as LaurV described, the B data can still be found in t-ckp. I think the better feature, as we've discussed, would be to try the backup t-ckp (which contains the B data) before aborting because of corruption (of the A data). If you guys think that's critical, I can wedge it into a second beta release once the file-locking issue is fixed. (It should be fairly easy to change the logic without introducing bugs.)
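The fallback Dubslow describes (try t-ckp before aborting) could look roughly like this in C. It is a sketch under assumptions: `read_checkpoint` and `load_with_fallback` are hypothetical names, and "valid" here just means the file holds exactly the expected number of bytes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical reader: fills buf with exactly len bytes from path.
 * Returns 0 only if the file holds exactly len bytes; anything shorter
 * or longer is treated as corrupt. */
static int read_checkpoint(const char *path, void *buf, size_t len)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(buf, 1, len, f);
    int extra = fgetc(f);          /* EOF expected right after the payload */
    fclose(f);
    return (got == len && extra == EOF) ? 0 : -1;
}

/* Try c-ckp (newest) first, then t-ckp (one step behind).
 * Returns 'c' or 't' for the file that was used, or 0 if both failed. */
static int load_with_fallback(const char *c_ckp, const char *t_ckp,
                              void *buf, size_t len)
{
    assert(buf != NULL);
    if (read_checkpoint(c_ckp, buf, len) == 0)
        return 'c';
    fprintf(stderr, "%s is unreadable or corrupt, trying %s\n", c_ckp, t_ckp);
    if (read_checkpoint(t_ckp, buf, len) == 0)
        return 't';
    return 0;  /* only now give up and abort */
}
```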
It is not critical. Dropping the habit I picked up from mfaktc of hitting Ctrl-C twice to stop the test eliminates the issue.
I have daily backups, so I lose 24 hours' worth of work at most. I can also use kladner's suggestion.
[QUOTE=Dubslow;305715]I think you misunderstand...[/QUOTE]
It does not really matter whether we agree on exactly how it happens. All I am saying is: if checking for corrupted data can be done relatively quickly, it would be a good idea to do that before erasing the last good t-ckp.
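To give a feel for how cheap such a check can be: if each save file carried a checksum, validating it would take one sequential pass over the data, about the cost of reading the file once. The sketch below assumes a hypothetical layout (payload followed by a 32-bit FNV-1a hash, least significant byte first); this is not the CUDALucas checkpoint format, and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* 32-bit FNV-1a hash (offset basis 2166136261, prime 16777619). */
static uint32_t fnv1a(const unsigned char *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical layout: payload, then its FNV-1a hash as 4 bytes,
 * least significant byte first. */
static int save_with_checksum(const char *path, const void *data, size_t len)
{
    assert(path != NULL && data != NULL);
    uint32_t h = fnv1a(data, len);
    unsigned char trailer[4] = {
        (unsigned char)h, (unsigned char)(h >> 8),
        (unsigned char)(h >> 16), (unsigned char)(h >> 24)
    };
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    int ok = fwrite(data, 1, len, f) == len && fwrite(trailer, 1, 4, f) == 4;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* One pass: read len payload bytes into buf, recompute the hash and
 * compare it with the stored trailer. Returns 1 if the file is valid. */
static int is_valid_save_file(const char *path, unsigned char *buf, size_t len)
{
    unsigned char trailer[4];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    int ok = fread(buf, 1, len, f) == len && fread(trailer, 1, 4, f) == 4
             && fgetc(f) == EOF;
    fclose(f);
    if (!ok)
        return 0;
    uint32_t stored = (uint32_t)trailer[0] | (uint32_t)trailer[1] << 8
                    | (uint32_t)trailer[2] << 16 | (uint32_t)trailer[3] << 24;
    return stored == fnv1a(buf, len);
}
```

Because every FNV-1a step is invertible for a fixed input byte, changing any single payload byte always changes the final hash, so a flipped byte is always caught; truncation is caught by the length checks.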
[QUOTE=Dubslow;305708]I for one welcome any advice they might have. :razz:
[/QUOTE] Is CL re-inserting the signal handler for ^C once it has been hit? Windows has the nasty habit of discarding the signal handler once it has been invoked. Therefore, the first thing mfaktc's signal handler does on Windows is to re-register itself. In addition, to make checkpoint writing signal-proof, you'd need to enclose each write in a loop and keep trying until the desired number of bytes is written. Or disable signal delivery while writing a checkpoint... hmm, not sure if Windows knows about sigprocmask.
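A sketch of both of Bdot's suggestions in portable C. On the Microsoft C runtime, `signal()` resets the disposition to `SIG_DFL` before calling the handler, so a second Ctrl-C gets the default action (terminate) unless the handler re-installs itself first; the C standard explicitly allows calling `signal()` for the same signal from inside its handler. The handler here only sets a flag and leaves the checkpoint write to the main loop. `write_all` is a hypothetical helper that retries a short `fwrite()` interrupted with `EINTR`; how a partial `fwrite()` interacts with the stream buffer differs between C libraries, so treat it as illustrative. On Windows the Ctrl-C handler runs in its own thread, so re-installing the handler is probably the part that matters there.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>

static volatile sig_atomic_t quit_requested = 0;

/* SIGINT handler: re-install first, because the MS CRT has already reset
 * SIGINT to SIG_DFL, and a second Ctrl-C would otherwise kill the process,
 * possibly in the middle of a checkpoint write. */
static void sigint_handler(int sig)
{
    signal(sig, sigint_handler);
    quit_requested = 1;   /* main loop writes the checkpoint and exits */
}

/* Keep calling fwrite until all len bytes are written, retrying when the
 * stream reports an interrupted call (EINTR). Returns 0 on success. */
static int write_all(FILE *f, const void *data, size_t len)
{
    assert(f != NULL);
    const unsigned char *p = data;
    while (len > 0) {
        size_t n = fwrite(p, 1, len, f);
        p += n;
        len -= n;
        if (len > 0) {
            if (ferror(f) && errno == EINTR) {
                clearerr(f);     /* interrupted: retry the remainder */
                continue;
            }
            return -1;           /* a real I/O error */
        }
    }
    return fflush(f) == 0 ? 0 : -1;
}
```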
[QUOTE=kladner;305707]But there is a multi-backup function available.....[/QUOTE]
I just experienced the error you reported, TObject. I believe it happened as part of a BSOD, which I think was in turn caused by near-brownout conditions combined with a PSU running near its limit. In any case, I got the same message about corrupted checkpoint files. I then realized that I didn't know exactly how to use the backups. After a little study, I decided the obvious thing was to delete the tiny corrupt files, rename the last save file from s##### to c#####, and delete the period and everything after it. This seems to have restored the run to a point just a few minutes before the crash.
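kladner's manual recovery, expressed as a small C sketch. The exact suffix after the period in the s-file name isn't given in the thread, so the example names are made up, and `backup_to_checkpoint_name` and `restore_from_backup` are hypothetical helpers. It assumes the backup and checkpoint files live in the current directory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the c-file name from a backup name of the form "s<digits>.<suffix>":
 * swap the leading 's' for 'c' and drop the first '.' and everything after it.
 * Returns 0 on success, -1 if the name does not look like a backup. */
static int backup_to_checkpoint_name(const char *backup, char *out, size_t outsz)
{
    assert(backup != NULL && out != NULL);
    if (backup[0] != 's')
        return -1;
    size_t stem = strcspn(backup, ".");   /* length up to the first '.' */
    if (stem < 2 || stem >= outsz)
        return -1;
    memcpy(out, backup, stem);
    out[0] = 'c';
    out[stem] = '\0';
    return 0;
}

/* Remove the corrupt c-file (if still present), then rename the backup. */
static int restore_from_backup(const char *backup)
{
    char target[256];
    if (backup_to_checkpoint_name(backup, target, sizeof target) != 0)
        return -1;
    remove(target);
    return rename(backup, target) == 0 ? 0 : -1;
}
```

For example, a hypothetical backup named `s26012345.1234567` would be restored as `c26012345`.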
[QUOTE=Bdot;305826]In addition, to make checkpoint writing signal-proof, you'd need to enclose each write in a loop and keep trying until the desired number of bytes is written.
Or disable signal delivery while writing a checkpoint... hmm, not sure if Windows knows about sigprocmask.[/QUOTE] Yech. Too darn hard, too much work for such a stupid problem.
[QUOTE=Bdot;305826]Is CL re-inserting the signal handler for ^C once it has been hit? Windows has the nasty habit of discarding the signal handler once it has been invoked. Therefore, the first thing mfaktc's signal handler does on Windows is to re-register itself.[/QUOTE] Discards the signal handler? Why in the... So the first line of code in the handler should re-install it? Hmm... stupid WinBloze. I'll add it in. How's the file (un)locking coming along?
[QUOTE=Dubslow;305869]
How's the file (un)locking coming along?[/QUOTE] Oops, I didn't realize you were waiting for me... Since the only NVIDIA card on which I can try CL is in my workstation at work, I haven't been able to investigate any further. At the moment I really have no time at all. Sorry about that.