Just wondering since I haven't used gpuLucas... is that project set to replace CUDALucas or the other way around?
Is it feasible to combine the best of both into one, so as not to spend time on two separate projects with the same goal? Maybe aaronhaviland and msft can talk about it? |
[QUOTE=flashjh;291621]The -s switch in 1.63 does this[/QUOTE]
That is very good news. I will switch to it after this test is finished (about 10 hours to go). It may fill up the hdd, but is a gold mine when you debug mismatches.... |
[QUOTE=LaurV;291626]...but is a gold mine when you debug mismatches....[/QUOTE]
Completely agree! |
I think a lot more work needs to be done on both. As of yet, gpuLucas is not ready for production runs, which most versions of CUDALucas are. With the new speed in 1.63, msft may have closed the gap. There'll need to be extensive testing of both to determine which is better.
(Keep in mind the claims about gpuLucas being so much better were made when CUDALucas supported only power-of-2 FFTs; for those FFTs, gpuLucas was the same speed. Now that CUDALucas does support non-p-o-2 FFTs, and is much more mature, the programs are on even footing.) |
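For anyone wondering what "non-power-of-2 FFTs" buys: the transform length only needs to be smooth (small prime factors), so lengths like 2^8 * 3^6 * 7 = 1306368 become usable and the exponent no longer has to be padded up to the next power of two. A rough C sketch of how such a length might be chosen — the 7-smooth factor set and the bits-per-word threshold are illustrative assumptions, not CUDALucas's actual selection rule:

```c
#include <stddef.h>

/* Return 1 if n's only prime factors are 2, 3, 5 or 7 ("7-smooth"). */
static int is_smooth(unsigned long n) {
    const unsigned long primes[] = {2, 3, 5, 7};
    size_t i;
    if (n == 0) return 0;
    for (i = 0; i < 4; i++)
        while (n % primes[i] == 0)
            n /= primes[i];
    return n == 1;
}

/* Smallest 7-smooth FFT length that keeps the exponent at or below
 * max_bits bits per FFT word. Threshold is illustrative only. */
static unsigned long pick_fft_length(unsigned long exponent, double max_bits) {
    unsigned long n = (unsigned long)((double)exponent / max_bits);
    while (!is_smooth(n))
        n++;
    return n;
}
```

With a cap around 18.5 bits per word, M24036583 lands on a 7-smooth length no larger than the power-of-2-times-5 length 1310720 that appears in the logs later in this thread.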
[QUOTE=Dubslow;291633]I think a lot more work needs to be done on both. As of yet, gpuLucas is not ready for production runs, which most versions of CUDALucas are. With the new speed in 1.63, msft may have closed the gap. There'll need to be extensive testing of both to determine which is better.
(Keep in mind the claims about gpuLucas being so much better were made when CUDALucas supported only power-of-2 FFTs; for those FFTs, gpuLucas was the same speed. Now that CUDALucas does support non-p-o-2 FFTs, and is much more mature, the programs are on even footing.)[/QUOTE] I see... that's good news. The gains of having two separate projects are good because you get a lot more ideas, different algorithms, etc. In the end though, I'd think one program would be best. Like you said, lots of testing and development ahead. [BREAK] So far 1.63 has survived several stops and restarts. Only problem so far is I get [B]random[/B] force closes... it's like before when it would finish and force close before it would start the next exponent, only now it's doing it randomly. msft, any ideas what could be causing this? I didn't get any errors or warning during compile? |
[QUOTE=flashjh;291625]Is it feasible to combine the best of both into one, so as not to spend time on two separate projects with the same goal?
[/QUOTE] I can make a combined version. |
[QUOTE=flashjh;291634]msft, any ideas what could be causing this? I didn't get any errors or warning during compile?[/QUOTE]
Please wait for the other guy's report. |
The test with 1.61 is finished for 26026433. I had one mismatch which was not reproducible, and it was NOT on the same place as Jerry got it. After restoring the checkpoint and rerun, everything went fine. So, too much overclock, or heat, or bad memory, or cosmic rays, whatever caused it first time, it did not cause it again.
I lowered the clock to factory default and began testing v1.63 on the same expo, gtx580. Up to now it is running stable, all partial residues matching. I love the -s switch; I would love it more if I could specify (or have built in hardcoded, no special need to change it) a [B]subfolder[/B], see the idea in my former post with the "backup" folder. That is because I usually have many other things in the cudalucas folder, and all the sXXXX files are flooding it, making the periodic maintenance (deleting, etc.) difficult. Putting them all in a "backup" subfolder would be great. edit ps: I am using 4.1/2.0 from Jerry's build |
1 Attachment(s)
Ver 1.64
1) Make backup dir. 2) Add all-iteration round off error check option; when err > 0.48, exit. Performance 2% down. [code]
$ ./CUDALucas
Usage: ./CUDALucas [-d device_number] [-c checkpoint_iteration] [-f fft_length] [-s] [-t] -r|exponent|input_filename
-f set fft length
-s save all checkpoint files
-t check round off error all iterations,when err > 0.49 exit
$ ./CUDALucas -s 24036583
DEVICE:0------------------------
name GeForce GTX 460
~~~
start M24036583 fft length = 1310720
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.64 (1:00 real, 5.9464 ms/iter, ETA 39:40:31)
^C caught. Writing checkpoint.
$ ls backup/
s24036583.10001 s24036583.11377
$ ./CUDALucas -t 24036583
DEVICE:0------------------------
name GeForce GTX 460
~~~
start M24036583 fft length = 1310720
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.64 (1:01 real, 6.0964 ms/iter, ETA 40:40:36)
[/code] |
1 Attachment(s)
Got mismatch with 1.58 but had several err/increasing messages. Could be bad handling of that...
|
1.64 Binaries
1 Attachment(s)
[QUOTE=msft;291688]Ver 1.64
1) Make backup dir. 2) Add all-iteration round off error check option; when err > 0.48, exit. Performance 2% down. [code]
$ ./CUDALucas
Usage: ./CUDALucas [-d device_number] [-c checkpoint_iteration] [-f fft_length] [-s] [-t] -r|exponent|input_filename
-f set fft length
-s save all checkpoint files
-t check round off error all iterations,when err > 0.49 exit
$ ./CUDALucas -s 24036583
DEVICE:0------------------------
name GeForce GTX 460
~~~
start M24036583 fft length = 1310720
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.64 (1:00 real, 5.9464 ms/iter, ETA 39:40:31)
^C caught. Writing checkpoint.
$ ls backup/
s24036583.10001 s24036583.11377
$ ./CUDALucas -t 24036583
DEVICE:0------------------------
name GeForce GTX 460
~~~
start M24036583 fft length = 1310720
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.64 (1:01 real, 6.0964 ms/iter, ETA 40:40:36)
[/code][/QUOTE] Attached v1.64 x64 binaries (untested):[LIST][*]CUDA 4.0 / SM 2.0[*]CUDA 4.1 / SM 2.0[*]CUDA 4.1 / SM 2.1[/LIST] |
1 Attachment(s)
[QUOTE=apsen;267365]I figured it out (kind of). When restarting from checkpoint and finishing the test it will then try to read more input from the same file although it has already been closed. It will loop endlessly until it crashes (haven't really figured out the exact point and reason for crash but it happens in "input" function when it enters endless loop).
As I tried to figure it out I have cut out a lot of unused code, removed K&R style prototypes, etc. Also added some timing output. I haven't touched anything related to calculations but it would be prudent to be cautious - it needs a lot of testing before production use. Most likely bugs would be in parsing command line. That code suffered most nontrivial change. I'm attaching the modified source code with Win64 executable compiled for sm_13.[/QUOTE] apsen/msft, 1.2b was the last build that included a win32 makefile. I modified my current makefile for win32, but it does not compile: lots of errors while nvcc processes CUDALucas.cu. Has 32-bit compatibility been removed, or do I need some extra includes? Edit: I included the screen output with the errors. Thanks. |
[QUOTE=flashjh;291706]apsen/msft,
1.2b was the last build that included a win32 makefile. I modified my current makefile for win32, but it does not compile. Lots of errors during nvcc processing CUDALucas.cu. Has 32 bit compatability been removed or do I need some extra includes? Edit: I included the screen output with the errors.[/QUOTE] From the looks of things, all those errors can be tied back to the lack of definition of "BOOL" which I'm assuming is a pre-processor macro. (searching the internet...) on windows, you should probably "#include <WinDef.h>" at some point. I cannot confirm this due to a lack of a windows environment. |
[QUOTE=aaronhaviland;291712]From the looks of things, all those errors can be tied back to the lack of definition of "BOOL" which I'm assuming is a pre-processor macro. (searching the internet...) on windows, you should probably "#include <WinDef.h>" at some point. I cannot confirm this due to a lack of a windows environment.[/QUOTE]
Thanks! I'll take a look. |
[QUOTE=flashjh;291713]Thanks! I'll take a look.[/QUOTE]
I took a look at CUDALucas.cu: the file has been edited since the last win32 build, so some changes need to be made; any help is appreciated. I'll edit it later. |
[QUOTE=apsen;291701]Got mismatch with 1.58 but had several err/increasing messages. Could be bad handling of that...[/QUOTE]
[code]
err = 0.371603, increasing n from 1572864
continuing work from a partial result
err = 0.371603, increasing n from 1572864
continuing work from a partial result
err = 0.371603, increasing n from 1572864
continuing work from a partial result
err = 0.371603, increasing n from 1572864
continuing work from a partial result
Iteration 27390000 M( 29198173 )C, 0x0ae3a28bd9f1003c, n = 1572864, CUDALucas v1.58 (1:29 real, 8.9455 ms/iter, ETA 4:28:21)
[/code] Interesting. Why did it exit the infinite loop? |
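The repeating "err = 0.371603, increasing n" lines look like a retry path that restores the same state and then hits the same error again. A common guard against that — just a sketch of the general pattern, not CUDALucas's actual code — is to cap the number of retries so a reproducible failure bails out instead of spinning:

```c
#include <stdio.h>

/* Failure budget for the demo step below (how many times it fails). */
static int fail_count;

/* A stand-in "iteration" that fails fail_count times, then succeeds. */
static int flaky_step(void) {
    return (fail_count-- > 0) ? -1 : 0;
}

/* Retry a step after an error, but give up after max_retries extra
 * attempts, so that an identical, reproducible failure cannot loop
 * forever the way the repeated err lines above suggest happened. */
static int retry_with_limit(int (*step)(void), int max_retries) {
    int attempt;
    for (attempt = 0; attempt <= max_retries; attempt++) {
        if (step() == 0)
            return 0;                        /* success */
        fprintf(stderr, "err, retrying (attempt %d)\n", attempt + 1);
    }
    return -1;                               /* persistent error: bail out */
}
```

A transient hardware glitch recovers within a retry or two; a deterministic software error fails every attempt and should terminate the loop with an error code rather than re-reading the same checkpoint indefinitely.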
What version is this?[code]bill@Gravemind:~/CUDALucas∰∂ CUDALucas -v
CUDALucas version information: $Id: MacLucasFFTW.c,v 8.1 2007/06/23 22:33:35 wedgingt Exp $ wedgingt@acm.org ^C^C ^C[/code]Note that ^C doesn't have an effect. The tarball in the directory says 1.3. |
[QUOTE=Dubslow;291775]What version is this?[code]bill@Gravemind:~/CUDALucas∰∂ CUDALucas -v
CUDALucas version information: $Id: MacLucasFFTW.c,v 8.1 2007/06/23 22:33:35 wedgingt Exp $ wedgingt@acm.org ^C^C ^C[/code]Note that ^C doesn't have an effect. The tarball in the directory says 1.3.[/QUOTE] ^C works on newer versions :smile: |
Is 1.64 a good bet for a (wink-wink-nudge-nudge) test?
Or is there a more recommended version (but with 3M FFT size)? |
[QUOTE=Batalov;291777]Is 1.64 a good bet for a (wink-wink-nudge-nudge) test?
Or is there a more recommended version (but with 3M FFT size)?[/QUOTE] 1.64 has completed a successful test for me, and that is saying a lot. It went through multiple ^C and restarts. It's a *little* slower (like 3.8% on my 580) than 1.63, but it also did not force close. I'd say you're good to go. (Of course, if you don't want to run it, you can pass it to me :wink:) |
Fat chance, man. :devil:
I am checking the first residues on another computer to have an early warning if something is afoul. (The first and last iterations are the most dangerous because of initial non-randomness. I am sure that this has been taken care of by the smart code, but I'll run checks anyway.) |
[QUOTE=Batalov;291781]Fat chance, man. :devil:[/QUOTE]
Always worth a shot. |
I was asking about version, because I'd be willing to run the power-of-2 FFT on my 460, if necessary. Whatever version I have (still have no clue which it is) is already compiled and ready to go (been sitting idle for the last few months). Such would be necessary if 1.64 and MPrime disagree.
|
[QUOTE=Batalov;291781]Fat chance, man. :devil:[/QUOTE]
A quick quiz... You're a Geek if it's Saturday Night, and you're more interested in a possible MP than the moist female making eyes beside you... True, or false? :wink: |
[QUOTE=chalsall;291791]A quick quiz...
You're a Geek if it's Saturday Night, and you're more interested in a possible MP than the moist female making eyes beside you... True, or false? :wink:[/QUOTE] I only read the forum while waiting for the screen to load a new area while playing Star Wars. |
[QUOTE=Dubslow;291775]What version is this?[code]bill@Gravemind:~/CUDALucas∰∂ CUDALucas -v
CUDALucas version information: $Id: MacLucasFFTW.c,v 8.1 2007/06/23 22:33:35 wedgingt Exp $ wedgingt@acm.org ^C^C ^C[/code]Note that ^C doesn't have an effect. The tarball in the directory says 1.3.[/QUOTE] Please try: $ CUDALucas 216091 |
[QUOTE=apsen;291701]Got mismatch with 1.58 but had several err/increasing messages. Could be bad handling of that...[/QUOTE]
The restart-from-checkpoint problem was a SW error; the residue mismatch was a HW error (I guess). |
[QUOTE=msft;291854]Please.
$ CUDALucas 216091[/QUOTE] Ah yes, now I remember why I gave up on CUDALucas running on Linux... [code]
bill@Gravemind:~/CUDALucas∰∂ CUDALucas 216091
CUDALucas: Could not find a checkpoint file to resume from
device_number >= device_count ... exiting
[/code] I never did figure this out. I have 270.xx drivers with a GTX 460; mfaktc 0.17 is running fine. (0.18 requires later drivers (like 285.xx), but for the life of me I have not been able to get any new drivers to install properly.) |
[QUOTE=flashjh;291634]
Only problem so far is I get [B]random[/B] force closes... it's like before when it would finish and force close before it would start the next exponent, only now it's doing it randomly. msft, any ideas what could be causing this? I didn't get any errors or warning during compile?[/QUOTE] Sorry, I don't understand (my poor English). Please send more information (a log, or ...). |
Yay!
[CODE]
Processing result: M( 26206723 )C, 0x19fb046af3bf4bce, n = 1572864, CUDALucas v1.64
LL test successfully completes double-check of M26206723
[/CODE] |
I saw I had 3 results this morning before leaving for work, two from 1.64 and one from 1.63, but could not report them (the server was down, as everybody already knows). No idea if they matched, and unfortunately they will have to wait about 6 more hours, till I can reach that computer again. I will keep you informed about the result, in 6 hours.
|
When PrimeNet's down, you could use Mersenne-aries, because it has everything up to at least the day before.
|
[QUOTE=Dubslow;291936]When PrimeNet's down, you could use Mersenne-aries, because it has everything up to at least the day before.[/QUOTE]
Good point |
[QUOTE=Dubslow;291936]When PrimeNet's down, you could use Mersenne-aries, because it has everything up to at least the day before.[/QUOTE]
Too late now. I will know it next time when we find another prime and the server will be down :P |
Perfect, again
[CODE]Processing result: M( 29217869 )C, 0x1b92a9eb0e596b54, n = 1572864, CUDALucas v1.61
LL test successfully completes double-check of M29217869[/CODE]Switching to 1.64. |
1 Attachment(s)
[QUOTE=apsen;291701]Got mismatch with 1.58 but had several err/increasing messages. Could be bad handling of that...[/QUOTE]
I have rerun 29198173 with 1.63 and got another mismatch. BTW no err messages on 1.63 side and residues matched between 1.58 and 1.63 up to the point of the first 1.58 err message. Running 29027371 with 1.63 now. ETA 33 hours. Will rerun 29198173 with older version once 29027371 completes. |
[QUOTE=aaronhaviland;291712]From the looks of things, all those errors can be tied back to the lack of definition of "BOOL" which I'm assuming is a pre-processor macro. (searching the internet...) on windows, you should probably "#include <WinDef.h>" at some point. I cannot confirm this due to a lack of a windows environment.[/QUOTE]
Agree on the BOOL issue. You need to be careful with the includes so that winsock does not get included, as it will prevent CudaLucas.cu from compiling. I believe BOOL is defined as int, so you could just change the signature correspondingly. I'll see what the best way to fix that is when I get to my VS computer. |
[QUOTE=apsen;291995]
I'll see what is the best way to fix that when I get to my VS computer.[/QUOTE] Actually that code does not need ifdef. You could compile it using unix semantics. |
It is moving
[QUOTE=Brain;291986]Switching to 1.64.[/QUOTE]
Ehm, is it getting faster for you, too? GTX 560 Ti @ clock speeds, 4.1/2.1 exe: [CODE]
1.58: M( 28652753 ), n = 1572864, CUDALucas v1.58: 6.03ms/Iter
1.61: M( 29217869 ), n = 1572864, CUDALucas v1.61: 5.73ms/Iter
1.64: M( 29013107 ), n = 1572864, CUDALucas v1.64: 5.08ms/Iter
[/CODE]I thought 1.64 had a 2% slowdown... I like it. :wink: |
[QUOTE=Brain;292075]I thought 1.64 had 2% slowdown...
:wink:[/QUOTE] Only with the dirty switch. I love it WITH the switch, in fact... I prefer slower and safer. |
[CODE]
LL test successfully completes double-check of M26177689
LL test successfully completes double-check of M26128457
[/CODE]So, version 1.64 is VERY close to being a good and stable version :D |
Reporting results
Sorry for this dumb question.
I'm running CUDALucas for the first time: everything fine, smooth and all, but... what is the correct way to report the result? I mean, does CUDALucas write the result to some text file that can then be uploaded through the server Manual Pages, or... In any case I think the result will have to be manually uploaded, and no security code is in place to prevent faked results, is there? |
[QUOTE=lycorn;292187] does Cudalucas write the result to some text file that can then be uploaded through the server Manual Pages, or...
In any case I think the result will have to be manually uploaded, and no security code is in place to prevent faked results, is there?[/QUOTE] Older versions used to create/update a file called mersarch.txt. With the newer versions you should look for a plain-text result.txt. Every time an expo is finished, the last residue is appended to the file. You can edit the file (I always do, to add some info like the date found, etc.). The new lines added by CudaLucas you have to copy and paste into the mersenne.org report form. Be careful to be logged in, otherwise your results will be reported as anonymous and you won't be credited. |
Thanks a bunch.
|
CUDALucas1.58.cuda4.0.sm_20.WIN64.29198173.log
[code]
Iteration 8160000 M( 29198173 )C, 0xd3d8e5229c0ef595, n = 1572864, CUDALucas v1.58 (1:06 real, 6.6116 ms/iter, ETA 38:37:22)
Iteration 8170000 M( 29198173 )C, 0x191e1edf48d3246acontinuing work from a partial result
Iteration 8220000 M( 29198173 )C, 0x8b29efc1039a9f62, n = 1572864, CUDALucas v1.58 (1:50 real, 11.0380 ms/iter, ETA 64:17:47)
[/code]
CUDALucas1.63.cuda4.1.sm_20.WIN64.29198173.log
[code]
Iteration 8160000 M( 29198173 )C, 0xd3d8e5229c0ef595, n = 1572864, CUDALucas v1.63 (0:53 real, 5.2994 ms/iter, ETA 30:57:26)
Iteration 8170000 M( 29198173 )C, 0x191e1edf48d3246a, n = 1572864, CUDALucas v1.63 (0:53 real, 5.2998 ms/iter, ETA 30:56:41)
Iteration 8180000 M( 29198173 )C, 0x38a7e6a8dbc3b325, n = 1572864, CUDALucas v1.63 (0:53 real, 5.2994 ms/iter, ETA 30:55:41)
Iteration 8190000 M( 29198173 )C, 0x850cdcbc97f98d2a, n = 1572864, CUDALucas v1.63 (0:53 real, 5.2993 ms/iter, ETA 30:54:45)
Iteration 8200000 M( 29198173 )C, 0xe6cd792c103f8994, n = 1572864, CUDALucas v1.63 (0:53 real, 5.2995 ms/iter, ETA 30:53:56)
Iteration 8210000 M( 29198173 )C, 0xdd1b0a482e81a352, n = 1572864, CUDALucas v1.63 (0:53 real, 5.2992 ms/iter, ETA 30:52:57)
Iteration 8220000 M( 29198173 )C, 0x9652c73462421f58, n = 1572864, CUDALucas v1.63 (0:53 real, 5.2994 ms/iter, ETA 30:52:08)
[/code]
My Ver 1.58:
[code]
Iteration 8160000 M( 29198173 )C, 0xd3d8e5229c0ef595, n = 1572864, CUDALucas v1.58 (1:28 real, 8.7039 ms/iter, ETA 50:50:43)
Iteration 8170000 M( 29198173 )C, 0x191e1edf48d3246a, n = 1572864, CUDALucas v1.58 (1:27 real, 8.7058 ms/iter, ETA 50:49:55)
Iteration 8180000 M( 29198173 )C, 0x38a7e6a8dbc3b325, n = 1572864, CUDALucas v1.58 (1:27 real, 8.7038 ms/iter, ETA 50:47:47)
Iteration 8190000 M( 29198173 )C, 0x850cdcbc97f98d2a, n = 1572864, CUDALucas v1.58 (1:27 real, 8.7049 ms/iter, ETA 50:46:43)
Iteration 8200000 M( 29198173 )C, 0xe6cd792c103f8994, n = 1572864, CUDALucas v1.58 (1:27 real, 8.7074 ms/iter, ETA 50:46:08)
Iteration 8210000 M( 29198173 )C, 0xdd1b0a482e81a352, n = 1572864, CUDALucas v1.58 (1:27 real, 8.7073 ms/iter, ETA 50:44:40)
Iteration 8220000 M( 29198173 )C, 0x9652c73462421f58, n = 1572864, CUDALucas v1.58 (1:27 real, 8.7054 ms/iter, ETA 50:42:31)
[/code]
The roundoff error check runs only every 100 iterations; it cannot catch every round off error. |
[QUOTE=msft;292278]CUDALucas1.58.cuda4.0.sm_20.WIN64.29198173.log
CUDALucas1.63.cuda4.1.sm_20.WIN64.29198173.log My Ver1.58 roundoff error check per 100 iterations. can not find all round off error.[/QUOTE] Did you get a mismatch? Or is it still running? |
Possibly not a round off error: I've found a few instances where CUDALucas uses cudaMalloc / malloc with a size larger than the data that is put there. You may need to call cudaMemset on g_x before copying x to it.
[CODE]
                        v1.57              v1.64
size of cudaMalloc g_x  1.5*n + 2*STRIDE   1.25*n
size of memcpy x->g_x   1*n                1*n
[/CODE]I have not checked any of your other arrays to see if they have similar issues. They might need cudaMemset as well? I had a similar problem in gpuLucas, and am currently clearing all the arrays on initialization just to be safe. |
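The same hazard is easy to reproduce on the host side: when the allocation is bigger than the data copied into it, the pad region holds garbage unless it is cleared first. A minimal C illustration of the allocate/clear/copy pattern suggested above, with plain malloc/memset/memcpy standing in for cudaMalloc/cudaMemset/cudaMemcpy (the sizes are made up, not CUDALucas's actual layout):

```c
#include <stdlib.h>
#include <string.h>

/* Allocate a padded buffer, zero all of it, then copy the n live
 * elements in. Without the memset, the pad beyond the copied data
 * (padded_n - n elements here) would be uninitialized garbage. */
double *alloc_padded(const double *x, size_t n, size_t padded_n) {
    double *g = malloc(padded_n * sizeof *g);
    if (!g)
        return NULL;
    memset(g, 0, padded_n * sizeof *g);   /* analogous to cudaMemset(g_x, ...) */
    memcpy(g, x, n * sizeof *g);          /* analogous to cudaMemcpy(x -> g_x) */
    return g;
}
```

On the device the order is the same: cudaMalloc does not zero memory, so a cudaMemset between the allocation and the first partial copy guarantees the tail of g_x is defined.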
[QUOTE=aaronhaviland;292330]I have not checked and of your other arrays to see if they have similar issues. They might need cudaMemset as well? I had a similar problem in gpuLucas, and am currently clearing all the arrays on initialization just to be safe.[/QUOTE]
Data initialization is no problem. Roundoff error was HW issue. |
[QUOTE=apsen;291992]I have rerun 29198173 with 1.63 and got another mismatch. BTW no err messages on 1.63 side and residues matched between 1.58 and 1.63 up to the point of the first 1.58 err message.
Running 29027371 with 1.63 now. ETA 33 hours. Will rerun 29198173 with older version once 29027371 completes.[/QUOTE] Got mismatch for 29027371 with 1.63. :sad: |
Got another match with 1.64.
|
[QUOTE=Batalov;292443]Got another match with 1.64.[/QUOTE]
Have you gotten enough matches to at least entertain the idea that 1.64 is well behaved? |
Just these two (26059867 and the "M48"). I am not interested enough to make it a regular activity, I was only interested to see if this program works. My plan is to run gpuLucas on the same two numbers, but later, because the end result is somewhat underwhelmingly predictable.:smile:
|
[QUOTE=apsen;292433]Got mismatch for 29027371 with 1.63. :sad:[/QUOTE]
Ver 1.64 has the -s and -t options, please try them. [code]
$ ./CUDALucas
Usage: ./CUDALucas [-d device_number] [-c checkpoint_iteration] [-f fft_length] [-s] [-t] -r|exponent|input_filename
-f set fft length
-s save all checkpoint files
-t check round off error all iterations,when err > 0.49 exit
[/code] |
[QUOTE=Batalov;292464]Just these two (26059867 and the "M48"). I am not interested enough to make it a regular activity, I was only interested to see if this program works. My plan is to run gpuLucas on the same two numbers, but later, because the end result is somewhat underwhelmingly predictable.:smile:[/QUOTE]
I look forward to those results, when they happen. |
I re-ran a series of DC exponents which gave me bad residues in the past (sorry for poaching, they were already re-assigned). I have another 2 matches (4 in total) using v1.64 with -s and -t. It seems to be stable and "well behaved". I am trying 3 expos on the first-time-LL (45M+) front; I will not report the results before DC-ing them with P95 to make sure. CudaLucas v1.64, ~10 hours to go for the first two.
By the way, I found a small cosmetic bug: when the -c/-s/-t switches are paired together with -r, CL will crash. There should be a small description, or a different pairing of the square brackets in the command-line help, or, better, an improved implementation of the command-line parser... |
Replying to myself... that's a bit odd... :D
I was quite optimistic... I just reached home after work (my working Saturday today) and found: [CODE]
>cl1644120w64 -d 0 -c 250000 -s -t 45130601
...<snip>...
Iteration 33750000 M( 45130601 )C, ....
err = 0.499978,round off err exiting.
[/CODE]It seems to be reproducible, and I will try to isolate it by calling CL with -c 1k (it was called with -c 250k -s -t). Two observations came out of this. First, you can see the utility of -t; as I said before, better safe than sorry. The safety highly compensates for the slowness: imagine I had to repeat the whole 33M iterations again... Now only the last 250k (at most) need to be repeated, which is in fact A GAIN OF SPEED, despite the 2% penalty. I love -t! :D I would like to have it settable (like 0.45, 0.37, whatever I like; a parameter of the -t switch?). And the second observation: [B]would it be possible to print the iteration number and save a checkpoint file with the previous residue when this round off error happens[/B]? This would save me from re-running the last 250k (or 500k, or 1M) iterations to isolate the error. I cannot use lower values for -c, because the hard disk space would be eaten up very fast by the save files, so 250k is a perfect value for a residue-comparison process, but in case of a roundoff error it takes longer to isolate it or repeat the test. |
Regarding what is a safe round off error: when testing which FFT size to use, Prime95 requires that the average round off be less than .24 (roughly), NOT less than .49. Here's what George had to say:
[url]http://www.mersenneforum.org/showpost.php?p=291942&postcount=102[/url] That would presumably prevent errors like LaurV just encountered. |
George refers to the AVERAGE roundoff error (over 100 or 1k iterations, or so), which indeed would need to be much lower to catch the spikes going over 0.5. If you check it at every iteration (which is what -t does), then comparing it with 0.5 would be enough. But we skeptic being, to set it lower we would like... (paraphrasing master Yoda..)
|
[QUOTE=LaurV;292485]By the way, I found a small cosmetic bug: when -c/-s/-t switches are paired together with -r, CL will crash. There should be a small description, or different pairing of the square brackets in the command line help, or better an improved implementation of the command line parser...[/QUOTE]
Please send more information. With Linux it is OK. |
The roundoff error has two causes: fft length or HW error.
A HW error needs a down-clock. |
Perfect, again
Still no mismatch here:
[CODE]
Processing result: M( 29013107 )C, 0xd9e76769f7b81b52, n = 1572864, CUDALucas v1.64
LL test successfully completes double-check of M29013107
[/CODE] |
[QUOTE=apsen;292326]Did you get a mismatch? Or is it still running?[/QUOTE]
[code] M( 29198173 )C, 0x6fd7e4d6557f5b77, n = 1572864, CUDALucas v1.58 [/code] This one is correct. |
I had a match but the assignment [URL="http://www.mersenneforum.org/showthread.php?p=292493#post292493"]had already been turned in[/URL]. Good news is the original LL was bad because my 1.64 matched David's CUDALucas run.
M( 26002063 )C, 0x1c5e4ca283b033__, n = 1572864, CUDALucas v1.64 |
[QUOTE=LaurV;292500]Which indeed would need to be much lower to catch the spikes going over 0.5. If you check it at every iteration (what -t is doing) then comparing it with 0.5 would be enough.[/QUOTE]
This is not correct. A roundoff error of 0.49 is harmless, but a roundoff error of 0.51 is deadly. The problem is the program will correctly report both as 0.49. So if CUDALucas reports a round off error of 0.49, how confident are you that it really wasn't a deadly roundoff of 0.51??? This is why PFGW aborts (actually switches to a larger FFT length) when the roundoff error exceeds 0.45. Prime95 retries the iteration if the roundoff exceeds 0.40. |
[QUOTE=Prime95;292511]This is not correct. A roundoff error of 0.49 is harmless, but a roundoff error of 0.51 is deadly. The problem is the program will correctly report both as 0.49.
So if CUDALucas reports a round off error of 0.49, how confident are you that it really wasn't a deadly roundoff of 0.51??? This is why PFGW aborts (actually switches to a larger FFT length) when the roundoff error exceeds 0.45. Prime95 retries the iteration if the roundoff exceeds 0.40.[/QUOTE] That was EXACTLY what I was talking about. You may not get that if you only read post 931, but please read my post 929 carefully [edit: the first observation, last part]. |
[QUOTE=Prime95;292511]So if CUDALucas reports a round off error of 0.49, how confident are you that it really wasn't a deadly roundoff of 0.51??? This is why PFGW aborts (actually switches to a larger FFT length) when the roundoff error exceeds 0.45. Prime95 retries the iteration if the roundoff exceeds 0.40.[/QUOTE]
[code]
Ver 1.64 default:
  if((iteration % 100) == 0 || iteration < 1000)
    if(roundoff > 0.35) increasing fft length
-t option:
  if(roundoff > 0.49) exit program
  else if(roundoff > 0.35) increasing fft length
[/code]
[code]
if(roundoff > 0.49) exit program
[/code]
this is experimental code. |
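msft's pseudocode above translates into host-side C roughly as follows. The thresholds come from the post; the function name and enum are mine, and the real per-iteration check naturally lives next to the GPU code rather than in a tidy helper like this:

```c
/* Possible outcomes of the round off check, per the v1.64 rules above. */
typedef enum { ROE_OK, ROE_GROW_FFT, ROE_FATAL } roe_action;

/* With -t (t_option != 0) every iteration is checked: err > 0.49 aborts,
 * err > 0.35 grows the FFT. Without -t only every 100th iteration (or
 * any iteration before 1000) is checked against the 0.35 limit. */
roe_action check_roundoff(double err, long iteration, int t_option) {
    if (t_option) {
        if (err > 0.49) return ROE_FATAL;
        if (err > 0.35) return ROE_GROW_FFT;
        return ROE_OK;
    }
    if ((iteration % 100) == 0 || iteration < 1000) {
        if (err > 0.35) return ROE_GROW_FFT;
    }
    return ROE_OK;
}
```

This also makes George's point concrete: without -t, a spike of 0.40 at iteration 1234 returns ROE_OK because it simply is not looked at.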
Another good one
Another 1.64 success
[CODE]
Processing result: M( 26134351 )C, 0xb9d6a5672486c791, n = 1572864, CUDALucas v1.64
LL test successfully completes double-check of M26134351
[/CODE] |
[QUOTE=flashjh;292579]Another 1.64 success
[CODE] Processing result: M( 26134351 )C, 0xb9d6a5672486c791, n = 1572864, CUDALucas v1.64 LL test successfully completes double-check of M26134351 [/CODE][/QUOTE] Encouraging. |
[QUOTE=msft;292504][code]
M( 29198173 )C, 0x6fd7e4d6557f5b77, n = 1572864, CUDALucas v1.58 [/code] correct.[/QUOTE] That does not match first time test. I guess I better rerun it with P95. |
[QUOTE=apsen;292627]That does not match first time test. I guess I better rerun it with P95.[/QUOTE]
You should submit the result to PrimeNet, it may be correct. |
I finished first-time LL for 45130601 and 4520386. The tests were done with CL 1.64, with -s and -t, so the intermediate residues and all checkpoint files (every 250k iterations) are available if someone wants to do the double check with P95. When (and if) my cores become less loaded, I will attempt a DC with P95 myself, but that will not be in the coming weeks.
Currently I am testing another expo in the same range (45221537) with 2 cards at the same time, no overclocking. This is to be sure that CL 1.64 is "reliable" in the 45M range area (in fact, this is more a test of whether the "cheap" gtx580 cards with 1.5 GB memory which I use are "reliable" from the hardware point of view; at the factory speed of 782 MHz they should get the same results, no matter if the software is mathematically correct or not). Up to now, 19M iterations done on both (they have about the same speed; one is a bit slower, maybe because it is used as the primary display(?!)) and both residues are matching.

edit: Roughly 40 hours to go. I don't use -s and -t; in fact this is the idea, to see how reliable it is without checking every iteration, but I am saving the checkpoints (using my batch file posted before) every 30 minutes, so in case there is a mismatch I avoid starting everything from the beginning. Without the -t switch, CL is faster, as discussed before. Anyhow, if two copies are testing the same exponent (in two different folders) then [B]-s cannot be used[/B], as they will try writing the [B]SAME[/B] checkpoint files. The idea with the "backup" subfolder was to have it [B]in the current folder[/B], and not in the root of the disk... Like ".\backup\......." and not "c:\backup\....." Anyhow, you could argue that no one will test the same expo with more copies of CL at the same time, but in the case you re-test the same expo later using -s, the checkpoint files will be overwritten too... Why not let the user customize the output path? |
Responsibility
[QUOTE=James Heinrich;292430]I just started experimenting with CUDAlucas yesterday. First impressions: it uses zero CPU, but the GPU usage is more aggressive than mfaktc. Normal Windows usage is fine, I can't watch even DVD-quality video smoothly with CUDAlucas whereas it's only 1080 video I have to switch mfaktc off for. Most likely I'll go back to mfaktc, partly for usability, but also because the extra two cores don't scale so well with the new AVX cores in Prime95 (iteration times when running 6 workers are significantly slower than 4 workers).[/QUOTE]
I cannot even run low-res playback with 1.64, because of lags / bad responsiveness. I suggest, again, a command line switch: for example --polite or --aggressive, where --polite would be the default. This would insert an artificial CUDA wait loop where other apps (playback) have a go. The problem was introduced when an unnecessary cudaMemcpy was killed. |
[QUOTE=Brain;292671]I cannot even run low res playback with 1.64. Because of lags / bad responsiveness. I suggest - again - a command line switch: for example: --polite or --agressive where --polite would be default. This would insert an artificial CUDA wait loop where other apps (playback) have a go.
It was introduced when an unnecessary CudaMemCpy was killed.[/QUOTE] Or a CL option to control threads and blocks. This way, it's up to the user to decide whether to run at max performance or at some gpu-idle state. |
[QUOTE=LaurV;292662]Anyhow, if two copies are testing the same exponent (in two different folders) then [B]-s cannot be used[/B], as they will try writing the [B]SAME[/B] checkpoint files. The idea with the "backup" subfolder was to have it [B]in the current folder[/B], and not in the root of the disk... like in ".\backup\......." and not "c:\backup\.....". You could argue that no one will test the same expo with more copies of CL at the same time, but if you re-test the same expo later using -s, the checkpoint files will be overwritten too... Why not let the user customize the output path?[/QUOTE]
It is a bug with Windows. Can someone fix this bug? |
[QUOTE=Brain;292671]I cannot even run low res playback with 1.64. Because of lags / bad responsiveness. ...[/QUOTE] Does that mean cudaMemcpy fixed the bad responsiveness on Windows? |
[QUOTE=msft;292682]It is a bug with Windows.
Can someone fix this bug?[/QUOTE] I never heard of such a Windows bug, so I went to have a look into the CL 1.64 source. In CUDALucas.cu, at line 1330, we can see: [CODE]#ifdef linux
  mode_t mode = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;
  if (mkdir ("./backup", mode) != 0)
    printf ("mkdir: cannot create directory `backup': File exists\n");
#else
  if (_mkdir ("\\backup") != 0)
    printf ("mkdir: cannot create directory `backup': File exists\n");
#endif
[/CODE]which, when not on Linux, will create a backup folder in the ROOT of the current Windows disk. This isn't what was intended, is it? [edit: for Jerry or other builders, the double backslash should be eliminated in front of backup, to create the backup dir as a subfolder of the current folder. It also appears at line 1079, where the checkpoint is written.] |
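For what it's worth, a minimal sketch of the fix could look like this. This is my own illustration, not CUDALucas code - the function name `ensure_backup_dir` is made up - but it shows the two points above: pass a relative path (never a root-anchored "\\backup"), and treat "already exists" as success instead of printing the misleading message:

```c
#include <assert.h>
#include <errno.h>
#ifdef _WIN32
#include <direct.h>
#define make_dir(p) _mkdir(p)        /* Windows CRT: no mode argument */
#else
#include <sys/stat.h>
#include <sys/types.h>
#define make_dir(p) mkdir((p), 0775) /* POSIX: needs a mode */
#endif

/* Create the checkpoint backup folder relative to the current working
 * directory. "backup" (or ".\\backup") stays inside the CL folder;
 * "\\backup" would resolve to the root of the current drive on Windows,
 * which is the bug described above. Returns 0 if the folder exists
 * afterwards (freshly created or already there), -1 on a real error. */
static int ensure_backup_dir(const char *path)
{
    if (make_dir(path) == 0)
        return 0;                    /* created it */
    return errno == EEXIST ? 0 : -1; /* already existing is fine too */
}
```

With a user-configurable folder (the [-s folder] idea), the same call just takes whatever path the user passed on the command line. |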
[QUOTE=msft;292683]Does that mean cudaMemcpy fixed the bad responsiveness on Windows?[/QUOTE]
cudaMemcpy was removed by ethan in his 1.3:
--> GPU usage went from 97% to 99% (good, faster)
--> But at the same time, displays became laggy (not good)
Suggestion: reduce GPU usage a little bit to allow other apps to access the device. Formerly, the unneeded cudaMemcpy did this waiting... |
[QUOTE=Brain;292693]CudaMemCpy was removed by ethan in his 1.3 --> GPU usage went from 97% to 99% (good, faster) --> But same time, displays became laggy (not good) ...[/QUOTE] This happened in reverse when CL jumped from 1.58 to 1.63 (I don't know exactly where 1.61 falls):

-- GPU usage went from 99% to 92-95% (GTX 580, Tesla). The "bit slower" was not visible due to a better FFT size, changing some temp vars from double to float, etc. - compensatory stuff which in fact made CL 1.63 and 1.64 faster (leaving aside the -t switch, which slows things down 2-3 percent; we are talking about the speed comparison without -t).
-- And at the same time the computer (display, CPU) became more responsive.

Running with -t will lower the speed further, but at the same time the GPU will be less busy (I believe checking the errors at every iteration makes the GPU wait longer) - like 88% instead of 95%, or even lower, 83% - so the computer will be even more responsive and the process will be safer, not to mention the consumed energy and the produced heat, which would also be lower. So why don't you try -t? For me it works nicely. If this "GPU-busy percent" could be parametrized, it would be even better. |
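The artificial wait Brain asks for could be as simple as this sketch. This is my own illustration, not anything in CUDALucas - the function name is hypothetical and the 1 ms figure is a guess; so far the real throttling has only been a side effect of the blocking cudaMemcpy of the error value:

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Proposed --polite behaviour: after each batch of iterations, sleep
 * briefly so the display driver and other GPU apps get a turn.
 * With -agressive this is skipped and CL runs flat out. */
static void polite_yield(int polite)
{
    if (!polite)
        return;                           /* aggressive: no artificial wait */
    struct timespec ts = { 0, 1000000L }; /* ~1 ms; value is illustrative */
    nanosleep(&ts, NULL);
}
```

A "GPU-busy percent" parameter would then just scale the sleep length relative to the measured per-iteration time. |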
Do you like?
[code]
cudalucas.1.65$ ./CUDALucas
Usage: ./CUDALucas [-d device_number] [-threads 32|64|128|256|512|1024] [-c checkpoint_iteration] [-f fft_length] [-s] [-t] [-agressive] -r|exponent|input_filename
  -threads  set threads number (default = 256)
  -f        set fft length
  -s        save all checkpoint files
  -t        check round off error all iterations
[/code] |
[QUOTE=msft;292703]Do you like?
[code]Usage: ./CUDALucas ... [-s] [-t] [-agressive] -r|exponent|input_filename ...[/code][/QUOTE] yeaaa... (where the hack is the salivating smiley???) :smile:

edit: well, [-s folder] would sound perfect... :razz: |
[QUOTE=msft;292703]Do you like?
[code]Usage: ./CUDALucas ... [-s] [-t] [-agressive] -r|exponent|input_filename ...[/code][/QUOTE] Love it! |
1 Attachment(s)
Ver 1.65
1) change behavior on round off error:
   iterations < 1000: increasing fft length
   iterations >= 1000: exit program
2) print maxerror
3) change -s option
4) add -agressive option
5) add -threads option
[code]
cudalucas.1.65$ ./CUDALucas
Usage: ./CUDALucas [-d device_number] [-threads 32|64|128|256|512|1024] [-c checkpoint_iteration] [-f fft_length] [-s folder] [-t] [-agressive] -r|exponent|input_filename
  -threads    set threads number (default = 256)
  -f          set fft length
  -s          save all checkpoint files
  -t          check round off error all iterations
  -agressive  GPU agressive (default polite)

cudalucas.1.65$ ./CUDALucas -threads 1024 -r
DEVICE:0------------------------
name GeForce GTX 460
~~~
Iteration 10000 M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, CUDALucas v1.65 err = 0.04723 (0:20 real, 1.9987 ms/iter, ETA 3:51:51)
Iteration 10000 M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, CUDALucas v1.65 err = 0.03019 (0:39 real, 3.9262 ms/iter, ETA 14:40:07)
Iteration 10000 M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, CUDALucas v1.65 err = 0.09749 (0:54 real, 5.3697 ms/iter, ETA 31:17:36)
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.65 err = 0.1996 (1:03 real, 6.2895 ms/iter, ETA 41:57:54)
Iteration 10000 M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, CUDALucas v1.65 err = 0.01873 (1:17 real, 7.7218 ms/iter, ETA 55:39:40)
Iteration 10000 M( 30402457 )C, 0x0b8600ef47e69d27, n = 1835008, CUDALucas v1.65 err = 0.02155 (1:26 real, 8.6305 ms/iter, ETA 72:51:20)
Iteration 10000 M( 32582657 )C, 0x02751b7fcec76bb1, n = 1835008, CUDALucas v1.65 err = 0.1181 (1:27 real, 8.6291 ms/iter, ETA 78:04:10)
err = 0.441193, increasing n from 1966080
Iteration 10000 M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, CUDALucas v1.65 err = 0.1117 (1:35 real, 9.4234 ms/iter, ETA 97:13:04)
Iteration 10000 M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, CUDALucas v1.65 err = 0.1871 (1:50 real, 10.9708 ms/iter, ETA 129:54:47)
Iteration 10000 M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, CUDALucas v1.65 err = 0.2798 (1:50 real, 10.9809 ms/iter, ETA 131:27:57)
[/code] |
v1.65 x64 binaries (untested)
1 Attachment(s)
[QUOTE=msft;292729]Ver 1.65
1) change behavior on round off error 2) print maxerror 3) change -s option 4) add -agressive option 5) add -threads option ...[/QUOTE] Attached v1.65 x64 binaries (untested): [LIST][*]CUDA 4.0 / SM 2.0[*]CUDA 4.1 / SM 2.0[*]CUDA 4.1 / SM 2.1[/LIST]EDIT: Just tried running the 1.65 4.1 | 2.0 build and it quit right after displaying the initial startup stuff. I switched back to 1.64 because I have to go to work. |
[CODE]>CUDALucas.exe -d 1 -threads 512 -c 10000 -t 216091
DEVICE:1------------------------
name GeForce GTX 480
totalGlobalMem 1610612736
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
totalConstMem 65536
major.minor 2.0
clockRate 1640000
textureAlignment 512
deviceOverlap 1
multiProcessorCount 15
too small Exponent 216091
>pause
Press any key to continue . . .[/CODE]
Why? CUDALucas no longer accepts small exponents? |
[QUOTE=flashjh;292731]Attached v1.65 x64 binaries (untested): [LIST][*]CUDA 4.0 / SM 2.0[*]CUDA 4.1 / SM 2.0[*]CUDA 4.1 / SM 2.1[/LIST]EDIT: Just tried running 1.65 4.1 | 2.0 and it quit right after displaying the initial startup stuff. I switched back to 1.64 because I have to go to work.[/QUOTE]
Up and running, thanks to both of you, msft and Jerry.

Did all possible testing combinations for threads; the fastest on the GTX 580 is the one with 512. The 1024 setting brings a small penalty, no idea why; theoretically the threads would also queue for 512, despite the fact that there are 512 cores - they would never all work at the same time for CL only.

The -agressive switch is OK, works perfectly; it brings a bit more speed (as in 15% more!!) but the computer is less responsive (as argued before). I love the default variant (polite): it is slower, but the computer is more responsive and at least Mrs LaurV can write her mails... :P so this part of the headache is gone... :smile:. Because of that, we will ignore the fact that the spelling of aggressive is wrong :P (anyhow, "-a" would suffice too).

We started the testing of 45221537 (see discussion before) on TWO cards (cheap GTX 580s, 1.5G mem, 782 MHz clock, no OC).

[B]We love the output format![/B] (it shows err=, real time, ETA, 4 decimals for ms/iter - wonderful!)
[B]We love the [-s backup] switch[/B]; we can arrange things as we like in the folder now. It is working, we tried it.
[B]We love the speed[/B]: we get between 4.5 and 5.1 ms/iter, without -t, with 512 threads and the default FFT size. That is faster than before, where we got 5.3-5.6 ms/iter on average.
[B]We don't love the fact that older checkpoints are not compatible with the new ones[/B]. That is why the test had to be restarted from scratch (it was close to finishing, maybe 10-20 hours to go; now we need to wait again about 60 hours or more). This is minor; there will not be many cases where one restarts "old tests". If you have old tests running, better let them finish before updating if they are, say, half through. Otherwise it is worth restarting - v1.65 is soooo much nicer!
[B]It is no longer possible to test small expos[/B]. We don't love that either, but considering that we only need to test big expos... to take the billion-digit prize, say... that is satisfactory :smile:
[B][COLOR=Red]We don't love the -f switch[/COLOR][/B], because we don't know what values are allowed. Some documentation would be nice if not all values are accepted. We understand the "use at your own risk" idea, but we don't like programs crashing... We tried random values smaller than the default, based on the idea that since the error for this expo is 0.07 (at its default FFT size of 2621440), a somewhat smaller FFT - one for which the error could go to 0.1 or even 0.2 or thereabouts - would speed things up a bit, but all the values we tried resulted in CL crashing with "unhandled exception, please report to microsoft".

The good news is that since we started to write this mail, we got 30 rows of text in each window, and all residues match what we saved in the previous run with 1.64. We think we will stop one card and give her some mfaktc to do, and let only one finish this expo. |
[QUOTE=LaurV;292756][B][COLOR=Red]We don't love -f switch[/COLOR][/B], because we don't know what values are allowed. Some documentation would be nice, if not all values are accepted.[/QUOTE]
Multiples of 32768 (threads=256), multiples of 65536 (threads=512), multiples of 131072 (threads=1024). |
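So a quick validity check for -f would look like this (my own sketch, not code from CL; the rule is simply "multiple of threads * 128"):

```c
#include <assert.h>

/* A -f value is usable only if the FFT length is a multiple of
 * threads * 128: 32768 for 256 threads, 65536 for 512 threads,
 * 131072 for 1024 threads. */
static int fft_length_ok(unsigned fft, unsigned threads)
{
    return fft != 0 && fft % (threads * 128u) == 0;
}
```

Printing this rule (or the nearest valid values) instead of crashing on a bad -f would answer LaurV's complaint above. |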
From CUDALucas.cu: smallest exponent is now 6,972,593
[CODE]
if (q < 6972593)
  printf (" too small Exponent %d\n", q);
[/CODE] |
[QUOTE=Karl M Johnson;292742]Why?
CUDALucas no longer accepts small exponents?[/QUOTE] [code]
normalize2_kernel <<< N / threads / 128, 128 >>>
  (g_x, threads, bigAB, bigAB, g_err, g_carry, N, error_log,
   g_inv, g_ttp, g_ttmp, g_inv2, g_ttp2, g_ttmp2,
   g_inv3, g_ttp3, g_ttmp3);
[/code]
threads = 1024
1024 * 128 = 131072
131072 is the minimum fft length. |
[QUOTE=apsen;292627]That does not match the first-time test. I guess I'd better rerun it with P95.[/QUOTE]
[code]
Verified test results
Exponent   User name   Computer name    Residue            Date found
29198173   msft        Manual testing   6FD7E4D6557F5B77   2012-03-12 00:48
29198173   msft                         6FD7E4D6557F5B77   2012-03-13 02:50
[/code] What's on your mind? |
[QUOTE=msft;292827][code]
normalize2_kernel <<< N / threads / 128, 128 >>> (g_x, threads, ...);
[/code]
threads = 1024
1024 * 128 = 131072
131072 is min fft length.[/QUOTE] Two suggestions:

1) Check the thread count before checking the exponent (so that with threads=512 you can do a 64K FFT, or a 32K FFT with 256 threads).
1b) Select the total number of threads after getting the exponent to test (perhaps with a warning about low GPU utilization).
2) Even if 1024 threads is selected, you could just continue the test anyway (but perhaps warn the user that below a certain threshold the efficiency will drop massively).

(Obviously how these choices would interact with a manually selected FFT size or thread count would have to be figured out, but this is just to get the ball rolling.)

[QUOTE=msft;292830][code]
Verified test results
29198173   msft   Manual testing   6FD7E4D6557F5B77   2012-03-12 00:48
29198173   msft                    6FD7E4D6557F5B77   2012-03-13 02:50
[/code] What's on your mind?[/QUOTE] Were those both done on the GPU, or was one done with Prime95? (When apsen first posted his reply, only one of your tests was visible, so no one was able to tell that there was a match.) |
[QUOTE=flashjh;292731]EDIT: Just tried running 1.65 4.1 | 2.0 and it quit right after displaying the initial startup stuff. I switched back to 1.64 because I have to go to work.[/QUOTE]
I have everything running on 1.65 now. Who knows why it wouldn't work? I was in a hurry so I probably had a switch set wrong. Thanks for the updates msft. |
Ok, so now the first exponent to run DCs on is 6972593.
Okay:smile: That's 2h here. |
Used the latest binaries, CUDA 4.1, sm_20 (thanks!)
[CODE]M( 6972593 )P, n = 393216, CUDALucas v1.65 [/CODE] |
CL 1.65 success
1 Attachment(s)
[CODE]Processing result: M( 26071663 )C, 0x48620a8eaadcaeb7, n = 1572864, CUDALucas v1.65
LL test successfully completes double-check of M26071663 [/CODE] EDIT: Attached the full-run .txt file with all results. EDIT2: It's working really well now, msft. Thanks everyone for all the work on this! |
[QUOTE=msft;292830][code]
Verified test results
Exponent   User name   Computer name    Residue            Date found
29198173   msft        Manual testing   6FD7E4D6557F5B77   2012-03-12 00:48
29198173   msft                         6FD7E4D6557F5B77   2012-03-13 02:50
[/code] What's on your mind?[/QUOTE] I mean that GIMPS reports this: [CODE]Unverified LL 3F6F8AA0E00307__ by "Olaf Fiebig"[/CODE] |
Meanwhile, I've had another good DC with 1.64; switching to 1.65.

I looked into the 1.65 code for the "ag(g)ressive" setting, line 691: [CODE]if (!agressive_f)
  cutilSafeCall (cudaMemcpy (&l_err, g_err, sizeof (double), cudaMemcpyDeviceToHost));[/CODE]So the implementation doesn't use a wait timer - it works as before in CL 1.2. Basically, it calls a method only to do the waiting... :cmd: Nevertheless, I like it. Haven't tried the a(g)gressive param yet. Will do that when the 2nd GPU (GTX 680) is here. |
1 Attachment(s)
[QUOTE=Brain;292995]Meanwhile, I've had another good DC with 1.64, switching to 1.65. ...[/QUOTE] Per request, attached CUDALucas 1.65 with the x64 MAKEFILE included. |