It is not your batch file. I found this behavior too, when the exponent is finished. It did not bother me, apart from the fact that the message "could not find a file to resume from" is written at the end (after the testing is done), which I already mentioned. I think there is a mis-synchronization between CPU and GPU: in trying to do things faster, the latter does not wait for the former to finish its part first (printing, returning the exit codes), and then things go wild. It does not affect the computation, only the cosmetics. For me it was a bit bothersome that I have to edit the batch file (to eliminate the first row) and restart every time an exponent finishes. Right now I am LL-ing at the LL front, so it will take about 5 days per exponent. I can live with doing that editing every 5 days.
|
[QUOTE=LaurV;289526]It is not your batch file. I found this behavior too, when the exponent is finished. It did not bother me, apart from the fact that the message "could not find a file to resume from" is written at the end (after the testing is done), which I already mentioned. I think there is a mis-synchronization between CPU and GPU: in trying to do things faster, the latter does not wait for the former to finish its part first (printing, returning the exit codes), and then things go wild. It does not affect the computation, only the cosmetics. For me it was a bit bothersome that I have to edit the batch file (to eliminate the first row) and restart every time an exponent finishes. Right now I am LL-ing at the LL front, so it will take about 5 days per exponent. I can live with doing that editing every 5 days.[/QUOTE]
I will have to switch to LL for the same reason. Right now DCs finish every 24 hours, so I lose a lot of time when I can't be there to get it started again. Thanks for the info. |
[QUOTE=flashjh;289525]I just finished another exponent with 1.50 and it force closed again. Maybe it's my batch file?[/QUOTE]
I still think this is related to a problem in the code. I reviewed the 1.50 source just posted, and I believe we need to add a line with *infp = NULL; after each of the fclose(*infp) calls at lines 1525, 1587, and possibly 1494 of rw.cu. Since I'm not running or compiling the code currently I can't be sure, but it was needed before (see the code in [url]http://mersenneforum.org/showpost.php?p=251358&postcount=405[/url]). If this isn't the correct fix in the current code, it's still a place to start looking: a file was being closed, but since the file pointer wasn't being set to NULL, other code was trying to use it. Chaos resulted. If I remember correctly, it didn't always happen: either it happened when restarting from a checkpoint but not from scratch, or vice versa. I wish I could remember which one was the problem, but it's been a while. If someone can save a checkpoint file that's nearly finished, it would make this easier to test. |
1 Attachment(s)
[QUOTE=kjaget;289576]I still think this is related to a problem in the code. I reviewed the 1.50 source just posted, and I believe we need to add a line with
*infp = NULL; after each of the fclose(*infp) calls at lines 1525, 1587, and possibly 1494 of rw.cu.[/QUOTE] I feel addressed... Find attached a zip containing an experimental exe and the modified rw.cu. I have no time for testing... :-( |
[QUOTE=Brain;289600]I feel addressed...
Find attached a zip containing experimental exe and modified rw.cu. I have no time testing... :-([/QUOTE] I'll test tonight. Can you also compile a 4.1/2.1? Thanks. |
[QUOTE=kjaget;289576]
If someone can save a checkpoint file that's nearly finished it would make this easier to test.[/QUOTE] I can save a file. Who would like a copy? |
1 Attachment(s)
[QUOTE=flashjh;289608]I'll test tonight. Can you also compile a 4.1/2.1? Thanks.[/QUOTE]
Here it is. |
My invoking cudalucas scripts:
Start with a file 'exp' (list of candidates to check):
[CODE]25431071
27961807
25019459[/CODE]
Invoking command to 'kick' things off:
[CODE]tail -n 1 -F exp | ./runme.sh[/CODE]
All that does is run a permanent 'tail' on the exponents list. You can add additional exponents like this:
[CODE]echo 25888663 >> exp[/CODE]
You can do this even while a previous exponent is running, or even after the last exponent has stopped. With the capital 'F' option, you can even delete the exp file if it gets too cumbersome. Tail will re-read the new file as long as it has the same name.

runme.sh:
[CODE]#!/bin/bash
while read line
do
  echo "Starting M$line"
  ./start.sh "$line"
done
[/CODE]
Basically, this takes an exponent from tail and kicks off this script (start.sh):
[CODE]#!/bin/bash
GPU=`cat limit.gpu`
./CUDALucas.cuda3.2.sm_13.WIN64.exe -c10000 -D$GPU -t $1
[/CODE]
This is where you set the options. Note: There's also a file called limit.gpu which sets the GPU number.

To make it more extensible, you could do something like:
[CODE]#!/bin/bash
OPTIONS=`cat limit.options`
./CUDALucas.cuda3.2.sm_13.WIN64.exe $OPTIONS $1
[/CODE]
This would completely abstract the options, so you could change them for the next exponent. It saves stopping the process. Note: I haven't tested this.

Pros:
You can add exponents any time.
No need to ctrl-c to redo a batch file or stop the code in any way. It keeps on going as long as you have sufficient exponents in the 'exp' file.

Cons:
A little complicated. Maybe hard to understand.
Yes, it's a hack - I didn't say it's pretty.
I'm not a coder, so the coding style may offend :) (My apologies)
I'm guessing it could work under Linux. Under Windows it uses cygwin tools, so you need to install that.

No warranty whatsoever. I'm sorry, I can't support this if it doesn't work for you. I provide this as information only.
-- Craig |
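The untested limit.options idea above can be tried without a GPU. What follows is only a minimal sketch: `fake_lucas` and the `LUCAS_CMD` variable are hypothetical stand-ins for the real CUDALucas executable, and the value `-c20000` is made up for the demonstration. It shows that reading the options file inside the per-exponent step makes an edit take effect on the next exponent.

```shell
#!/bin/bash
# Hypothetical stand-in for the real binary: it only prints its arguments.
fake_lucas() { printf '%s\n' "$*"; }
# Assumption: set LUCAS_CMD to the actual CUDALucas executable for real use.
LUCAS_CMD=${LUCAS_CMD:-fake_lucas}

# Run one exponent, re-reading the options file on every call.
# $1 = exponent, $2 = options file (limit.options in the post above).
run_one() {
    exponent=$1
    options_file=$2
    options=$(cat "$options_file")
    # $options is deliberately unquoted so "-c10000 -t" splits into two flags.
    "$LUCAS_CMD" $options "$exponent"
}

# Demonstration in a throwaway directory.
demo_dir=$(mktemp -d)
printf '%s\n' '-c10000 -t' > "$demo_dir/limit.options"
first_run=$(run_one 25431071 "$demo_dir/limit.options")

# Edit the options between exponents; the next call picks up the change.
printf '%s\n' '-c20000' > "$demo_dir/limit.options"
second_run=$(run_one 27961807 "$demo_dir/limit.options")

echo "$first_run"
echo "$second_run"
rm -r "$demo_dir"
```

The design point is simply where the `cat` happens: inside the function that runs once per exponent, the file is read again each time, so no restart of the outer loop is needed.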
I bet those who are familiar with cmd.exe can put something together for Windows that is similar, so that it doesn't necessitate cygwin. The only major drawback I see is that you'd spawn a child process for each script, though they would be idle and shouldn't affect performance in any way, but if you watch your process list carefully, it'll be annoying.
Perhaps an alternate runme.sh:
[code]#!/bin/bash
FLAGS=`cat [insert options file here, relative to above directory]`
while read line
do
  echo "Starting M$line"
  ./CUDALucas.[blah blah insert proper name].exe $FLAGS $line
done
[/code]
This file would be invoked with the same tail command (I don't know how that works, so I can't provide support there):
[code]tail -n 1 -F exp | ./runme.sh[/code]
And add exponents to the file as described above.

Edit: When I tried that tail command with the example exp file, it printed the last line, not the first line. That means the first line of exp will never get tested unless you let the exp file get really low. Also, how do you remove lines once a test is done? And now that I think about it, I don't understand how getting to the end of the while loop will cause the tail command to run again and input another exponent. |
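On the questions in the edit: `tail -n 1` prints only the last line by design, and with `-F` tail never exits, so the `read` in the loop simply blocks until a new line is appended to the file; tail is not run again. One possible alternative that always starts from the first line and removes each line once its test returns is sketched below. This is only a sketch under assumptions: `fake_lucas` and `LUCAS_CMD` are hypothetical stand-ins for the real binary, and treating a non-zero exit status as "not finished" is a guess, since the thread does not say how CUDALucas reports failure. Unlike the tail pipeline, this loop ends when the queue is empty.

```shell
#!/bin/bash
# Hypothetical stand-in for the real binary: it only reports its argument.
fake_lucas() { printf 'tested %s\n' "$1"; }
LUCAS_CMD=${LUCAS_CMD:-fake_lucas}

# Process the queue file ($1) from the top, one exponent per line.
process_queue() {
    queue_file=$1
    # -s is true while the file exists and is non-empty.
    while [ -s "$queue_file" ]; do
        exponent=$(head -n 1 "$queue_file")
        if [ -n "$exponent" ]; then
            echo "Starting M$exponent"
            # Assumption: a non-zero exit status means "not finished",
            # so stop here and keep the line for the next run.
            "$LUCAS_CMD" "$exponent" || return 1
        fi
        # Test done: keep line 2 onward, i.e. drop the first line.
        tail -n +2 "$queue_file" > "$queue_file.tmp" &&
            mv "$queue_file.tmp" "$queue_file"
    done
}

# Demonstration with the example exponents in a throwaway directory.
queue_dir=$(mktemp -d)
printf '%s\n' 25431071 27961807 25019459 > "$queue_dir/exp"
queue_log=$(process_queue "$queue_dir/exp")
if [ -s "$queue_dir/exp" ]; then queue_empty=no; else queue_empty=yes; fi
echo "$queue_log"
rm -r "$queue_dir"
```

Because the line is removed only after the command returns, an exponent that is interrupted stays at the top of the file and is picked up again on the next start.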
[QUOTE=Brain;289613]Here it is.[/QUOTE]
Works great for batch file, have to test further. It tested successfully:
[CODE]M( 216091 )P, n = 524288, CUDALucas v1.50
M( 216103 )C, 0xd27223d7dbf3febf, n = 524288, CUDALucas v1.50
[/CODE]
Thanks. |
CUDALucasWatchdog
[QUOTE=flashjh;289652]Works great for batch file, have to test further.
[/QUOTE]
So the code change did have an effect?

New topic: I'm currently developing a proof-of-concept CUDALucasWatchdog, just out of curiosity. It is written in Java and does the following:
1) Check via "taskmgr.exe" / "ps -e" if CUDALucas is running.
2) If not, go to worktodo.txt (new) and grab the top assignment line.
3) Check in mersarch.txt if this exponent has ever been finished.
4a) If yes, delete the assignment line in worktodo and go to 2).
4b) If no, launch CUDALucas via a command line call the old-fashioned way.
5) Quit.
This program would have to be executed periodically by the user/system, the way the perl submit spider works. I'm not sure yet if I will ever publish the code, but I would like to know whether a Cygwin-based solution or a Java solution would be preferred. I have no pros and cons yet. CUDALucasWatchdog would be designed to run on Windows, Linux (and Mac). Example call:
[CODE]java -jar CUDALucasWatchdog "F:\Computing\CUDALucas\CUDALucas.x.y.z.exe"[/CODE] |
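For comparison with the Java plan, the same watchdog steps can be sketched as a shell script of the kind that would run under Cygwin or Linux. This is not the CUDALucasWatchdog code, only an illustration under stated assumptions: worktodo.txt is taken to hold one bare exponent per line, mersarch.txt is taken to hold result lines in the `M( 216091 )P, ...` form quoted earlier in the thread, and `launch_stub` plus the process name `no_such_process_xyz` are hypothetical placeholders so the demonstration runs without CUDALucas.

```shell
#!/bin/bash
# Step 3: succeed if the archive ($2) already holds a result for exponent $1.
# Assumption: archive lines start like "M( 216091 )P, n = ...".
is_finished() {
    grep -qF "M( $1 )" "$2" 2>/dev/null
}

# Steps 2 to 4a: print the first unfinished exponent in worktodo ($1),
# deleting finished lines from the top. Returns 1 if nothing is left.
next_assignment() {
    worktodo=$1
    archive=$2
    while [ -s "$worktodo" ]; do
        candidate=$(head -n 1 "$worktodo")
        if [ -n "$candidate" ] && ! is_finished "$candidate" "$archive"; then
            echo "$candidate"
            return 0
        fi
        tail -n +2 "$worktodo" > "$worktodo.tmp" && mv "$worktodo.tmp" "$worktodo"
    done
    return 1
}

# Steps 1, 4b and 5: one watchdog pass.
# $1 = process name, $2 = launch command, $3 = worktodo, $4 = archive
watchdog_pass() {
    # Step 1: do nothing if the process is already running.
    if ps -e | grep -v grep | grep -qF -- "$1"; then return 0; fi
    exponent=$(next_assignment "$3" "$4") || return 0
    "$2" "$exponent"   # Step 4b: launch, then fall through to quit.
}

# Demonstration with a stub launcher in a throwaway directory.
launch_stub() { echo "would launch CUDALucas on $1"; }
wd_dir=$(mktemp -d)
printf '%s\n' 216091 216103 25431071 > "$wd_dir/worktodo.txt"
printf '%s\n' 'M( 216091 )P, n = 524288, CUDALucas v1.50' \
    'M( 216103 )C, 0xd27223d7dbf3febf, n = 524288, CUDALucas v1.50' \
    > "$wd_dir/mersarch.txt"
watchdog_output=$(watchdog_pass no_such_process_xyz launch_stub \
    "$wd_dir/worktodo.txt" "$wd_dir/mersarch.txt")
worktodo_left=$(cat "$wd_dir/worktodo.txt")
echo "$watchdog_output"
rm -r "$wd_dir"
```

As with the Java version, a single pass quits after at most one launch, so something like cron or the Windows task scheduler would have to call it periodically.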