There is a 2.03 Stable version and a (better, but still in development) 2.04 Beta version, both on the [URL="https://sourceforge.net/projects/cudalucas/files"]SourceForge[/URL] page. I personally use the Beta right now. The math is identical in both; the differences are only "cosmetic". The Beta has some "improvements", some of which already work and some of which are still being worked on. :smile:
CUDALucas 2.03
I accidentally hit Ctrl-C twice in the CUDALucas window, and as a result, when I continued the test I got the following message: “The checkpoint doesn’t match current test. Current test will be restarted.” This is bad. It shouldn’t be that easy to lose all the work, especially since some people may be accustomed to hitting Ctrl-C twice in an mfaktc window to exit immediately.
[QUOTE=TObject;305680]CUDALucas 2.03
I accidentally hit Ctrl-C twice in the CUDALucas window, and as a result, when I continued the test I got the following message: “The checkpoint doesn’t match current test. Current test will be restarted.” This is bad. It shouldn’t be that easy to lose all the work, especially since some people may be accustomed to hitting Ctrl-C twice in an mfaktc window to exit immediately.[/QUOTE]
One of the problems (and one of the changes in 2.04) is that this message could mean a variety of things: the metadata was corrupted, the exponents didn't match, or (most likely, I think) the main data was corrupted.

The ^C doesn't do anything by itself except set a global quitting variable, which is in turn checked every iteration. A double ^C should therefore have no effect, except perhaps printing the quitting message twice. The only explanation I can think of is that the second ^C arrived while one of the various fwrite() calls was executing, and that this somehow corrupted something. I'll defer to more experienced programmers on that. FWIW, I couldn't replicate it in 2.04 Beta.
[code]Iteration 7680000 M( 26661529 )C, 0x6a13e9d50b44c72e, n = 1440K, CUDALucas v2.04 Beta err = 0.1523 (1:48 real, 5.4042 ms/iter, ETA 28:29:30)
^C
SIGINT caught, writing checkpoint.
Estimated time spent so far: 11:44:11
bill@Gravemind:~/CUDALucas∰∂ ^C
bill@Gravemind:~/CUDALucas∰∂ CUDALucas
Continuing work from a partial result of M26661529 fft length = 1440K iteration = 7689302
^C^C
SIGINT caught, writing checkpoint.
SIGINT caught, writing checkpoint.
Estimated time spent so far: 11:44:11
bill@Gravemind:~/CUDALucas∰∂ CUDALucas
Continuing work from a partial result of M26661529 fft length = 1440K iteration = 7689345
Iteration 7700000 M( 26661529 )C, 0x5e6f65ddfa011c0a, n = 1440K, CUDALucas v2.04 Beta err = 0.1406 (0:59 real, 2.9549 ms/iter, ETA 15:33:44)
Iteration 7720000 M( 26661529 )C, 0x572d5b0fd4b87e69, n = 1440K, CUDALucas v2.04 Beta err = 0.1523 (1:52 real, 5.5877 ms/iter, ETA 29:23:50)
Iteration 7740000 M( 26661529 )C, 0xa9f5f7180a3fd8c2, n = 1440K, CUDALucas v2.04 Beta err = 0.1543 (1:50 real, 5.4870 ms/iter, ETA 28:50:14)
Iteration 7760000 M( 26661529 )C, 0x65353774d697b137, n = 1440K, CUDALucas v2.04 Beta err = 0.1453 (1:49 real, 5.4559 ms/iter, ETA 28:38:36)
Iteration 7780000 M( 26661529 )C, 0x474870feb62f6ea0, n = 1440K, CUDALucas v2.04 Beta err = 0.1504 (1:49 real, 5.4690 ms/iter, ETA 28:40:55)
Iteration 7800000 M( 26661529 )C, 0x00e7204a64ae247d, n = 1440K, CUDALucas v2.04 Beta err = 0.1484 (1:50 real, 5.4655 ms/iter, ETA 28:37:59)
^C^C
SIGINT caught, writing checkpoint.
SIGINT caught, writing checkpoint.
Estimated time spent so far: 11:55:57
bill@Gravemind:~/CUDALucas∰∂ CUDALucas
Continuing work from a partial result of M26661529 fft length = 1440K iteration = 7817985[/code]
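For readers who haven't seen the pattern, here is a minimal sketch of flag-based ^C handling as described above, using only the standard C `<signal.h>` API. The names `quitting`, `on_sigint`, `install_sigint_handler` and `run_iterations` are hypothetical and this is not CUDALucas's actual code. One portability detail may be worth ruling out, given that the problem reproduces with the Windows .exe but not on Linux: the C standard allows a runtime to reset a handler to SIG_DFL just before calling it, and Microsoft's CRT documents that its signal() does this. A handler that does not re-install itself can therefore let a second ^C take the default action and terminate the process, possibly in the middle of a write. Whether that is what happened here is not established.

```c
#include <assert.h>
#include <signal.h>

/* Polled once per iteration. volatile sig_atomic_t is the type the C
   standard allows a signal handler to write to. */
static volatile sig_atomic_t quitting = 0;

static void on_sigint(int sig)
{
    /* Some C runtimes reset the handler to SIG_DFL before calling it;
       re-arming here means a second ^C lands here again instead of
       killing the process (possibly in the middle of an fwrite()). */
    signal(sig, on_sigint);
    quitting = 1;                /* a second ^C just stores 1 again */
}

static void install_sigint_handler(void)
{
    quitting = 0;
    signal(SIGINT, on_sigint);
}

/* Hypothetical stand-in for the LL loop: stops early once quitting is set
   and returns the iteration the caller should checkpoint at.
   interrupt_at < 0 means "never simulate a ^C". */
static long run_iterations(long first, long last, long interrupt_at)
{
    long iter;
    assert(first <= last);
    for (iter = first; iter < last; iter++) {
        if (iter == interrupt_at) {
            raise(SIGINT);       /* simulate ^C pressed twice in a row */
            raise(SIGINT);
        }
        if (quitting)
            break;
        /* ... one squaring step would go here ... */
    }
    return iter;
}
```

On POSIX systems, sigaction() without SA_RESETHAND sidesteps the reset question entirely; the re-arming call above is the portable fallback when only signal() is available.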
Thank you for the explanation. Hopefully 2.04 has already fixed it. With 2.03 I can reliably reproduce the issue: every time I press Ctrl-C twice in quick succession, this error appears on the next start.
I upgraded to the beta version CUDALucas-2.04 Beta-4.1-sm_21-x64.exe, and I can confirm that the error I reported in post [url=http://www.mersenneforum.org/showpost.php?p=305680&postcount=1498]1498[/url] has been fixed.
Thank you. Edit: I spoke too soon. The error is still there, although it took a few tries to reproduce it with 2.04.
The error message in 2.04 is “The checkpoint appears to be corrupt. Current test will be restarted.”
A few thoughts:
a) Obviously, it would be nice if the corruption did not happen in the first place.
b) Do not overwrite the backup save file until it has been confirmed that the main file is in good shape.
c) Instead of restarting the test from the beginning, attempt to restart it from the backup save file.
d) Consider asking “Do you want to restart the test?” rather than restarting automatically. Some people may be able to restore save files from backups or restore points, so they would answer no and go looking for a good version of the save file before newer versions are piled on top.
These are just friendly suggestions, not complaints. Thank you for your hard work on the application.
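Suggestions (b) and (c), together with the earlier point that one error message covered several different failure modes, point toward a loader that reports what is wrong with a checkpoint and falls back to the backup before giving up. The sketch below assumes an invented in-memory layout (magic, exponent, iteration, payload length, checksum, then payload words); it is not CUDALucas's real file format, and `ckp_check`, `ckp_choose` and the FNV-style checksum are hypothetical choices for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, NOT CUDALucas's real format:
   [0] magic, [1] exponent, [2] iteration, [3] payload length n,
   [4] checksum of the payload, then n payload words. */
#define CKP_MAGIC 0x434C636Bu
#define CKP_HDR 5

typedef enum {
    CKP_OK,
    CKP_TRUNCATED,        /* header missing, or length disagrees with header */
    CKP_BAD_MAGIC,        /* not a checkpoint file at all */
    CKP_WRONG_EXPONENT,   /* a valid checkpoint, but for another test */
    CKP_BAD_CHECKSUM      /* complete, but the residue data is damaged */
} ckp_status;

/* FNV-1a-style hash over 32-bit words: cheap, and any single changed
   word is guaranteed to change the result. */
static uint32_t ckp_checksum(const uint32_t *w, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= w[i];
        h *= 16777619u;
    }
    return h;
}

/* Report *why* a checkpoint is unusable instead of one generic message. */
static ckp_status ckp_check(const uint32_t *buf, size_t len, uint32_t p)
{
    assert(buf != NULL || len == 0);
    if (len < CKP_HDR)
        return CKP_TRUNCATED;
    if (buf[0] != CKP_MAGIC)
        return CKP_BAD_MAGIC;
    if (buf[1] != p)
        return CKP_WRONG_EXPONENT;
    if (len != CKP_HDR + (size_t)buf[3])
        return CKP_TRUNCATED;
    if (ckp_checksum(buf + CKP_HDR, buf[3]) != buf[4])
        return CKP_BAD_CHECKSUM;
    return CKP_OK;
}

/* Suggestion (c): use the main file if it is good, else the backup;
   NULL means neither is usable and the test must restart. */
static const uint32_t *ckp_choose(const uint32_t *main_buf, size_t main_len,
                                  const uint32_t *bak_buf, size_t bak_len,
                                  uint32_t p)
{
    if (ckp_check(main_buf, main_len, p) == CKP_OK)
        return main_buf;
    if (ckp_check(bak_buf, bak_len, p) == CKP_OK)
        return bak_buf;
    return NULL;
}
```

A real loader would first read each file into such a buffer. Printing the `ckp_status` would then let a report like this one say which failure mode actually occurred, and a NULL from `ckp_choose` is the natural place to ask the user before restarting, as in suggestion (d).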
Hmm... I've no idea what might be causing it. Any experts want to weigh in?
a) Well, yeah :smile:
b) You mean when writing checkpoints?
c) Already on the to-do list for 2.05.
d) Anyone with that ability can restore the checkpoint regardless of whether the test is restarted; just delete or overwrite the new (short, restarted) checkpoint.
[QUOTE=Dubslow;305696]
b) You mean when writing checkpoints?[/QUOTE]
Every time the backup save file is overwritten, if it does not cause too much of a performance hit. The idea is to have something to prevent a good backup save file from being overwritten by a corrupted save file.
[QUOTE=TObject;305699]Every time the backup save file is overwritten, if it does not cause too much of a performance hit. The idea is to have something to prevent a good backup save file from being overwritten by a corrupted save file.[/QUOTE]
Here's the current pseudo-[URL="http://sourceforge.net/p/cudalucas/code/37/tree/trunk/CUDALucas.cu?force=True"]code[/URL] (Ctrl+F "write_checkpoint" if you're curious):
[code]delete t-ckp
move c-ckp to t-ckp
write current data to c-ckp[/code]
What about that would you change?
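With the order above, a write that fails or is cut short leaves a damaged c-ckp in place, and the next checkpoint then moves that damaged file over the good t-ckp. One way to address point (b) is to write the new data into a temporary file, check that every write, flush and close succeeded, and only then rotate. This is a minimal sketch under those assumptions; `write_fully`, `write_checkpoint_safely` and the file paths are hypothetical and not CUDALucas's code.

```c
#include <assert.h>
#include <stdio.h>

/* Write all `size` bytes to `path`; succeed only if the data was fully
   written, flushed to the OS, and the file closed without error. */
static int write_fully(const char *path, const void *data, size_t size)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t written = fwrite(data, 1, size, f);
    int flush_err = fflush(f);
    int close_err = fclose(f);
    return (written == size && flush_err == 0 && close_err == 0) ? 0 : -1;
}

/* Hypothetical replacement for "delete t, move c to t, write c":
   the rotation happens only after the new data is safely in tmp_path. */
static int write_checkpoint_safely(const char *c_path, const char *t_path,
                                   const char *tmp_path,
                                   const void *data, size_t size)
{
    assert(data != NULL || size == 0);
    if (write_fully(tmp_path, data, size) != 0) {
        remove(tmp_path);          /* c and t are left exactly as they were */
        return -1;
    }
    remove(t_path);                /* rename() may refuse to overwrite (Windows) */
    (void)rename(c_path, t_path);  /* fails harmlessly on the very first checkpoint */
    return rename(tmp_path, c_path) == 0 ? 0 : -1;
}
```

fflush() only hands the data to the operating system. Surviving a crash or power loss as well would additionally need fsync() on POSIX, or _commit()/FlushFileBuffers() on Windows, on the temporary file before the renames.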
I just got CL up and running by itself on the GTX 460. With "SaveAllCheckpoints=1" set, I already have 143 saved checkpoints in 3h27m. Obviously, I need to slow the write timing way down.
I mention this to ask: do you have this set? Did the error wreck all the checkpoints? Does it happen before more than one has been created? Sorry if this has already been addressed; I don't see it in the previous posts (or didn't understand them). I agree that the error should not be happening, but there is a multi-backup function available that would greatly mitigate its effects until the coding gods figure out what's going on. :smile:
[QUOTE=kladner;305707]until the coding gods figure out what's going on.:smile:[/QUOTE]
I for one welcome any advice they might have. :razz: That said, as kladner points out, SaveAllCheckpoints is a decent workaround.