CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

flashjh 2013-11-26 14:51

Can it be detected before it errors out and stops?

Possible fix after detection: stop the workers, move back to the last save, and continue? It has happened for a long time, but I never knew whether it was my system or not, and with all the other issues I just restarted every time it stopped.

I have tested on enough systems to know that the systems are not the problem.

Now CUDALucas seems stable enough for a full run in beta testing, but I keep having to restart after the error. It wastes a lot of time and prevents me from knowing whether it will complete a good run all the way through.

owftheevil 2013-11-26 15:08

Yes, when it detects the error it doesn't have to exit the program. It could reset the device and restart from the last checkpoint. I tried to do this way back in February, but then my CUDA skills were only a small part of the meager set I have now, and I couldn't get it to work. Maybe it's time to retry that.

In the meantime, I just run it from a shell script that loops on a nonzero exit value.
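
A minimal sketch of such a restart wrapper, written in Python rather than as a shell script. It assumes that CUDALucas exits with a nonzero code when it hits the error and with zero when it finishes normally, which is what looping on a nonzero exit value implies. The names `run_until_success`, `max_restarts`, and `delay_s` are made up for this illustration and are not part of CUDALucas.

```python
import subprocess
import time


def run_until_success(cmd, max_restarts=100, delay_s=5.0,
                      run=subprocess.call, sleep=time.sleep):
    """Re-run cmd while it exits nonzero; return how many restarts were needed.

    Assumption: a nonzero exit code means "crashed, try again" and
    zero means "finished normally".
    """
    restarts = 0
    while True:
        code = run(cmd)  # blocks until the program exits
        if code == 0:
            return restarts
        if restarts >= max_restarts:
            raise RuntimeError(
                f"gave up after {restarts} restarts (last exit code {code})")
        restarts += 1
        print(f"exit code {code}, restart #{restarts}")
        sleep(delay_s)  # short pause before relaunching


# Example call (the executable name and the -d flag are taken from the
# batch files later in this thread):
#   run_until_success(["CudaLucas.exe", "-d", "1"])
```

Unlike an unconditional loop, this stops once the program exits cleanly, and the cap on restarts keeps a permanently broken setup from looping forever.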

flashjh 2013-11-26 15:25

Good idea, I'll set up one of those tonight.

Did you read specific books or just learn it?

owftheevil 2013-11-26 15:29

I learned much of it by studying the CUDALucas code, the rest by reading online articles and forums, and playing around a lot.

I'm not a programmer, just a mathematician with a compiler and internet access.

Manpowre 2013-11-26 20:26

I also run a shell script.. works just fine.

Manpowre 2013-11-27 10:12

[QUOTE=Manpowre;360365]I also run a shell script.. works just fine.[/QUOTE]

On Windows:
Make a .bat file in the same folder as your CUDALucas .exe file, then put the following inside it:
[CODE]:loop
rem Relaunch unconditionally whenever the program exits, whatever the exit code
echo "Starting Cudalucas:"
CudaLucas.exe -d 1
GOTO loop[/CODE]
This will ensure CUDALucas starts again if it crashes.

flashjh 2013-11-27 12:25

I did get one working. It is a little more complicated on the command line because I want to increment a counter to keep track of how many times it restarts. Thanks.

Manpowre 2013-11-27 14:10

[QUOTE=flashjh;360425]I did get one working a little more complicated for the command line because I want to increase a counter to keep track of how many times it restarts. Thanks.[/QUOTE]

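No problem, this writes the number of restarts to the log.txt file as well as to the console.

[CODE]rem count is incremented before every launch, so the first start is logged as 1
Set count=0
:loop
Set /A count+=1
echo "Starting Cudalucas: "
rem ">" overwrites log.txt, so the file always holds the latest count
echo %count% > log.txt
echo %count%
CudaLucas2.03.exe -d 1
GOTO loop[/CODE]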

flashjh 2013-11-27 19:19

What is the current format recognized for CUDALucas results?

EDIT:

For example, with this as the worktodo:
Test=10061
Test=10061
DoubleCheck=10061
DoubleCheck=10061

I get this,
[CODE]M10061, 0x56eb9bb91825b188, offset = 9029, n = 1K, CUDALucas v2.05 Beta
M10061, 0x56eb9bb91825b188, offset = 9029, n = 1K, CUDALucas v2.05 Beta
M10061, 0x56eb9bb91825b188, offset = 4000, n = 1K, CUDALucas v2.05 Beta
M10061, 0x56eb9bb91825b188, offset = 4000, n = 1K, CUDALucas v2.05 Beta[/CODE]
Then again:
[CODE]M10061, 0x56eb9bb91825b188, offset = 4052, n = 1K, CUDALucas v2.05 Beta
M10061, 0x56eb9bb91825b188, offset = 4054, n = 1K, CUDALucas v2.05 Beta
M10061, 0x56eb9bb91825b188, offset = 4054, n = 1K, CUDALucas v2.05 Beta
M10061, 0x56eb9bb91825b188, offset = 9086, n = 1K, CUDALucas v2.05 Beta
[/CODE]
Also, I did a DC on another exponent and after [I]many[/I] stops due to the memory error, restarts, troubleshooting, recompiles, and FFT length changes (up & down) the result is a match. I would like to see DoubleCheck results from others. As it looks like things are stable, can we move to allow DoubleChecks from CUDALucas now?
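
The result lines above can be checked mechanically: every run of the same exponent should report the same residue, whatever the offset. Here is a minimal Python sketch, assuming the line layout is exactly what the pasted output shows (this is not an official format specification). The names `parse_result` and `summarize` are hypothetical helpers, not part of CUDALucas or PrimeNet.

```python
import re
from collections import defaultdict

# Pattern modelled on the lines pasted above; assumed, not an official spec.
LINE_RE = re.compile(
    r"^M(?P<exponent>\d+),\s*0x(?P<residue>[0-9a-fA-F]{16}),\s*"
    r"offset\s*=\s*(?P<offset>\d+),\s*n\s*=\s*(?P<fft>\w+),\s*(?P<program>.+)$"
)


def parse_result(line):
    """Return the fields of one result line as a dict, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "exponent": int(m.group("exponent")),
        "residue": m.group("residue").lower(),
        "offset": int(m.group("offset")),
        "fft": m.group("fft"),
        "program": m.group("program").strip(),
    }


def summarize(lines):
    """Group parsed lines by exponent, collecting distinct residues and all offsets."""
    summary = defaultdict(lambda: {"residues": set(), "offsets": []})
    for line in lines:
        result = parse_result(line)
        if result is None:
            continue  # skip anything that is not a result line
        entry = summary[result["exponent"]]
        entry["residues"].add(result["residue"])
        entry["offsets"].append(result["offset"])
    return dict(summary)


SAMPLE = [
    "M10061, 0x56eb9bb91825b188, offset = 9029, n = 1K, CUDALucas v2.05 Beta",
    "M10061, 0x56eb9bb91825b188, offset = 4000, n = 1K, CUDALucas v2.05 Beta",
]
```

Calling `summarize(SAMPLE)` gives one entry for exponent 10061 with a single residue and two offsets; more than one residue for an exponent would mean the runs disagree.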

LaurV 2013-11-28 03:48

[QUOTE=flashjh;360454] can we move to allow DoubleChecks from CUDALucas now?[/QUOTE]
:shock: They have been allowed for ages, since 1.48 (the first stable one), a few years ago. How do you think we made those huge credits? I think you may be missing a couple of parentheses from the report, which might confuse James' script on the PrimeNet server. I have to go home to make sure (no reports here at work), but someone else may confirm in the meantime.

Edit: sorry, let me be stupid for a few minutes each day... No coffee yet this morning. I thought you were trying to send a report and the server refused it. After more reading and trying to understand, I think you were talking about the new feature implementing the "shifting", weren't you? Well... I haven't moved to 2.05 yet, as 2.04 works better and a bit faster with cc 2.0 cards. Besides the "shifts", are there any reasons to switch?

Edit 2: some simple mechanism to protect against fraud is still missing, so I would[U] vote against[/U] accepting "first-time LL" [B][U]and[/U][/B] "DC" from CUDALucas for the same exponent. What stops me from editing the "offset" parameter to get the credit twice? You will find after 20 years that we missed a prime because of some idiot credit-whore (I learned the word here on the forum, as someone called it, sorry). At least with P95 it is not so easy for childish individuals to fake a report, due to the we1 checksum, etc. Some simple security mechanism should be implemented, besides shifting, to make it safer. Don't get me wrong, no disrespect for your work; shifting is an [B][U]immense[/U][/B] improvement to guard against software errors (FFT bugs), for which I am very grateful.
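
For readers unfamiliar with the shifting discussed here: the idea is to run the Lucas-Lehmer test on a value rotated by some number of bits, so that two runs with different shifts push different bit patterns through the arithmetic yet should end at the same residue. The sketch below is a minimal Python illustration using textbook LL arithmetic on plain big integers (no FFT, and nothing taken from the CUDALucas source). The function name `ll_residue` and its `shift` parameter are made up for this example, and the thread does not say whether the reported "offset" is the initial or the final shift.

```python
def ll_residue(p, shift=0):
    """Final Lucas-Lehmer residue of M_p = 2**p - 1, computed on a shifted value.

    The stored value is x = s * 2**k mod M_p, where s is the ordinary
    LL term and k the current shift. Because 2**p == 1 (mod M_p),
    shifts can be tracked modulo p.
    """
    m = (1 << p) - 1
    k = shift % p
    x = (4 << k) % m  # s_0 = 4, shifted left by k bits
    for _ in range(p - 2):
        k = (2 * k) % p  # squaring doubles the shift
        # (s * 2**k_old)**2 - 2 * 2**k == (s**2 - 2) * 2**k  (mod M_p)
        x = (x * x - (2 << k)) % m
    # Undo the shift: multiplying by 2**(p - k) is dividing by 2**k mod M_p.
    return (x << ((p - k) % p)) % m


def res64(p, shift=0):
    """Low 64 bits of the final residue, formatted as 16 hex digits."""
    return format(ll_residue(p, shift) & ((1 << 64) - 1), "016x")
```

With this sketch, `ll_residue(7, shift)` is 0 for every shift (M7 is prime) and `ll_residue(11, shift)` is the same nonzero value for every shift. That shift independence is why identical residues at different offsets are reassuring about the software, and also why the offset alone cannot prove that two reports came from two separate runs.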

Edit 3: (BTW, after updating the drivers, I am also getting negative iteration times and negative ETAs, which are very accurate if you multiply them by (about) minus 28 (!?!??!) and read them as minutes, not hours :smile:. This is with the "good old version" 2.04, untouched since Dubslow made it. But the residues are right, and it is about 1% faster, so I let it be.)

flashjh 2013-11-28 03:50

That's why I was asking about the correct format. I don't want to change the result line, but I can update CUDALucas to output the correct format.


All times are UTC.