mersenneforum.org  

2013-11-26, 14:51   #2025
flashjh
"Jerry"
Nov 2011
Vancouver, WA
1,123 Posts

Can it be detected before it errors out and stops?

Possible fix after detection: stop the workers, move back to the last save, and continue? It has happened for a long time, but I never knew whether it was my system or not, and with all the other issues I just restarted every time it stopped.

I have tested on enough systems to know that the systems are not the problem.

Now, CUDALucas seems stable enough for a full run in beta testing, but I keep having to restart after the error. It ends up wasting a lot of time and prevents me from knowing whether it will complete a good run all the way through.

2013-11-26, 15:08   #2026
owftheevil
"Carl Darby"
Oct 2012
Spring Mountains, Nevada
3²·5·7 Posts

Yes, when detecting the error it doesn't have to exit the program. It could reset the device and restart from the last checkpoint. I tried to do this way back in February, but then my CUDA skills were only a small part of the meager set I have now, and I couldn't get it to work. Maybe it's time to retry that.

In the meantime, I just run from a shell script that loops on a non-zero exit value.
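
A loop of that kind could look like the sketch below. It is only an illustration, not the actual script: the function name `run_until_success` and the restart cap are assumptions made here, and it relies on the program returning a non-zero exit status when it hits the error.

```shell
#!/bin/sh
# Re-run a command until it exits with status 0.
# Usage: run_until_success MAX_RESTARTS COMMAND [ARGS...]
# MAX_RESTARTS is a hypothetical safety cap so a persistent failure
# cannot loop forever. The variable "restarts" holds the restart count
# when the function returns.
run_until_success() {
    max_restarts=$1
    shift
    restarts=0
    until "$@"; do
        status=$?   # exit status of the command that just failed
        if [ "$restarts" -ge "$max_restarts" ]; then
            echo "giving up after $restarts restarts (last exit status $status)" >&2
            return 1
        fi
        restarts=$((restarts + 1))
        echo "exit status $status; restart $restarts of $max_restarts" >&2
    done
    return 0
}
```

It might be invoked as `run_until_success 100 ./CUDALucas -d 1`, where the binary path is a placeholder and `-d 1` is the same argument used elsewhere in this thread. Because the program resumes from its last checkpoint on start, each restart should lose only the work done since that checkpoint.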

2013-11-26, 15:25   #2027
flashjh
"Jerry"
Nov 2011
Vancouver, WA
10001100011₂ Posts

Good idea, I'll set up one of those tonight.

Did you read specific books or just learn it?

2013-11-26, 15:29   #2028
owftheevil
"Carl Darby"
Oct 2012
Spring Mountains, Nevada
473₈ Posts

I learned much of it by studying the cudalucas code, the rest by reading online articles and forums, and playing around a lot.

I'm not a programmer, just a mathematician with a compiler and internet access.

Last fiddled with by owftheevil on 2013-11-26 at 15:30

2013-11-26, 20:26   #2029
Manpowre
"Svein Johansen"
May 2013
Norway
3·67 Posts

I also run a shell script.. works just fine.

2013-11-27, 10:12   #2030
Manpowre
"Svein Johansen"
May 2013
Norway
3×67 Posts

Quote:
Originally Posted by Manpowre
I also run a shell script.. works just fine.

On Windows:
Make a .bat file in the same folder as your CUDALucas .exe file, then put the following inside it:
Code:
rem Restart CUDALucas whenever it exits; the GOTO is unconditional, so a normal exit restarts it too.
:loop
echo "Starting Cudalucas:"
CudaLucas.exe -d 1
GOTO loop

This ensures CUDALucas starts again if it crashes.

2013-11-27, 12:25   #2031
flashjh
"Jerry"
Nov 2011
Vancouver, WA
1,123 Posts

I did get one working; it is a little more complicated on the command line because I want to increment a counter to keep track of how many times it restarts. Thanks.

Last fiddled with by flashjh on 2013-11-27 at 12:26

2013-11-27, 14:10   #2032
Manpowre
"Svein Johansen"
May 2013
Norway
201₁₀ Posts

Quote:
Originally Posted by flashjh
I did get one working; it is a little more complicated on the command line because I want to increment a counter to keep track of how many times it restarts. Thanks.

No problem. This writes the number of restarts to the log.txt file as well as to the console.
Code:
rem count is incremented before every start, so the first run shows 1
Set count=0
:loop
Set /A count+=1
echo "Starting Cudalucas: "
rem ">" overwrites log.txt, so the file always holds the latest count
echo %count% > log.txt
echo %count%
CudaLucas2.03.exe -d 1
GOTO loop

2013-11-27, 19:19   #2033
flashjh
"Jerry"
Nov 2011
Vancouver, WA
1,123 Posts

What is the current format recognized for CUDALucas results?

EDIT:

For example, with this as the worktodo:
Test=10061
Test=10061
DoubleCheck=10061
DoubleCheck=10061

I get this,
Code:
M10061, 0x56eb9bb91825b188, offset = 9029, n = 1K, CUDALucas v2.05 Beta
M10061, 0x56eb9bb91825b188, offset = 9029, n = 1K, CUDALucas v2.05 Beta
M10061, 0x56eb9bb91825b188, offset = 4000, n = 1K, CUDALucas v2.05 Beta
M10061, 0x56eb9bb91825b188, offset = 4000, n = 1K, CUDALucas v2.05 Beta
then again:
Code:
M10061, 0x56eb9bb91825b188, offset = 4052, n = 1K, CUDALucas v2.05 Beta
M10061, 0x56eb9bb91825b188, offset = 4054, n = 1K, CUDALucas v2.05 Beta
M10061, 0x56eb9bb91825b188, offset = 4054, n = 1K, CUDALucas v2.05 Beta
M10061, 0x56eb9bb91825b188, offset = 9086, n = 1K, CUDALucas v2.05 Beta
Also, I did a DC on another exponent, and after many stops due to the memory error, restarts, troubleshooting, recompiles, and FFT length changes (up and down), the result is a match. I would like to see DoubleCheck results from others. Since things look stable now, can we move to allow DoubleChecks from CUDALucas?
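
One way to compare result lines like those above is to ignore the offset and check that every line reports the same exponent and residue. The sketch below does that with awk; the function names are made up for illustration, and it assumes the comma-separated layout shown above and an input that holds results for a single exponent.

```shell
#!/bin/sh
# Print each distinct "exponent residue" pair found on stdin.
# Expects lines such as:
#   M10061, 0x56eb9bb91825b188, offset = 9029, n = 1K, CUDALucas v2.05 Beta
# Lines that do not match this layout are skipped.
distinct_residues() {
    awk -F', *' '/^M[0-9]+, 0x[0-9a-fA-F]+,/ { print $1, $2 }' | sort -u
}

# Succeed only if the result lines on stdin agree on exactly one
# exponent/residue pair, whatever their offsets are.
residues_agree() {
    distinct_residues | awk 'END { exit NR != 1 }'
}
```

For example, `residues_agree < results.txt && echo match`, where `results.txt` is a placeholder file name. Empty input counts as a mismatch here, which is a design choice of this sketch rather than anything CUDALucas defines.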

Last fiddled with by flashjh on 2013-11-27 at 19:28

2013-11-28, 03:48   #2034
LaurV
Romulan Interpreter
Jun 2011
Thailand
2⁶×151 Posts

Quote:
Originally Posted by flashjh
can we move to allow DoubleChecks from CUDALucas now?

They have been allowed for ages, since 1.48 (the first stable version), a few years ago. How do you think we earned those huge credits? I think you may be missing a couple of parentheses from the report, which might confuse James' script on the PrimeNet server. I have to go home to make sure (no reports here at work), but someone else may confirm in the meantime.

Edit: sorry, let me be stupid for a few minutes each day... No coffee yet this morning. I thought you were trying to send a report and the server was refusing it. After more reading and trying to understand, I think you were talking about the new feature implementing the "shifting", weren't you? Well, I haven't moved to 2.05 yet, as 2.04 works better and a bit faster with cc 2.0 cards. Besides the "shifts", are there any reasons to switch?

Edit 2: some simple mechanism to protect against fraud is still missing, so I would vote against accepting both the "first-time LL" and the "DC" from cudaLucas for the same exponent. What stops me from editing the "offset" parameter to get the credit twice? You will find after 20 years that we missed a prime because of some idiot credit-whore (I learned the word here on the forum, as someone used it, sorry). At least with P95 it is not so easy for childish individuals to fake a report, due to the we1 checksum, etc. Some simple security mechanism should be implemented, besides shifting, to make it safer. Don't get me wrong, no disrespect for your work; shifting is an immense improvement to guard against software errors (FFT bugs), for which I am very grateful.

Edit 3: (BTW, after updating the drivers, I am also getting negative iteration times and negative ETAs, which are very accurate if you multiply them by (about) minus 28 (!?!??!) and read them as minutes, not hours, using the "good old version" 2.04, untouched since Dubslow made it. But the residues are right, and it is about 1% faster, so I let it be.)

Last fiddled with by LaurV on 2013-11-28 at 04:06

2013-11-28, 03:50   #2035
flashjh
"Jerry"
Nov 2011
Vancouver, WA
1,123 Posts

That's why I was asking about the correct format. I don't want to change the result line, but I can update CUDALucas to output the correct format.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.