mersenneforum.org

CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

flashjh 2012-03-02 20:00

Just wondering since I haven't used gpuLucas... is that project set to replace CUDALucas or the other way around?

Is it feasible to combine the best of both into one, so as not to spend time on two separate projects with the same goal?

Maybe aaronhaviland and msft can talk about it?

LaurV 2012-03-02 20:07

[QUOTE=flashjh;291621]The -s switch in 1.63 does this[/QUOTE]
That is very good news. I will switch to it after this test is finished (about 10 hours to go). It may fill up the hdd, but is a gold mine when you debug mismatches....

flashjh 2012-03-02 20:09

[QUOTE=LaurV;291626]...but is a gold mine when you debug mismatches....[/QUOTE]

Completely agree!

Dubslow 2012-03-02 21:12

I think a lot more work needs to be done on both. As of yet, gpuLucas is not ready for production runs, which most versions of CUDALucas are. With the new speed in 1.63, msft may have closed the gap. There'll need to be extensive testing of both to determine which is better.

(Keep in mind the claims about gpuLucas being so much better were made when CUDALucas supported only power-of-2 FFTs; for those FFTs, gpuLucas was the same speed. Now that CUDALucas does support non-p-o-2 FFTs, and is much more mature, the programs are on even footing.)
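
To make the power-of-2 distinction concrete, here is a minimal C sketch that tests whether an FFT length is a power of two and, if it is not, extracts the odd factor. The helper names are hypothetical and are not taken from either program; the lengths mentioned in the note below are the ones that show up in the CUDALucas logs later in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if n is a power of two, 0 otherwise. */
int is_power_of_two(uint32_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Strips every factor of two from n and returns what is left.
   A power of two yields 1. n must be nonzero. */
uint32_t odd_part(uint32_t n)
{
    assert(n != 0);
    while ((n & 1u) == 0)
        n >>= 1;
    return n;
}
```

With these helpers, 1572864 has odd part 3 (it is 3 * 2^19) and 1310720 has odd part 5 (it is 5 * 2^18), so neither is a power-of-2 length.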

flashjh 2012-03-02 21:17

[QUOTE=Dubslow;291633]I think a lot more work needs to be done on both. As of yet, gpuLucas is not ready for production runs, which most versions of CUDALucas are. With the new speed in 1.63, msft may have closed the gap. There'll need to be extensive testing of both to determine which is better.

(Keep in mind the claims about gpuLucas being so much better were made when CUDALucas supported only power-of-2 FFTs; for those FFTs, gpuLucas was the same speed. Now that CUDALucas does support non-p-o-2 FFTs, and is much more mature, the programs are on even footing.)[/QUOTE]

I see... that's good news. The gains of having two separate projects are good because you get a lot more ideas, different algorithms, etc. In the end though, I'd think one program would be best.

Like you said, lots of testing and development ahead.

[BREAK]

So far 1.63 has survived several stops and restarts.

Only problem so far is I get [B]random[/B] force closes... it's like before when it would finish and force close before it would start the next exponent, only now it's doing it randomly.

msft, any ideas what could be causing this? I didn't get any errors or warnings during compile.

msft 2012-03-02 22:19

[QUOTE=flashjh;291625]Is it feasible to combine the best of both into one, so as not to spend time on two separate projects with the same goal?
[/QUOTE]
I can make a combined version.

msft 2012-03-02 22:22

[QUOTE=flashjh;291634]msft, any ideas what could be causing this? I didn't get any errors or warning during compile?[/QUOTE]
Please wait for another user's report.

LaurV 2012-03-03 04:27

The test with 1.61 is finished for 26026433. I had one mismatch which was not reproducible, and it was NOT at the same place where Jerry got it. After restoring the checkpoint and rerunning, everything went fine. So, too much overclock, or heat, or bad memory, or cosmic rays, whatever caused it the first time, it did not cause it again.

I lowered the clock to factory default and began testing v1.63 on the same expo, gtx580. Up to now it is running stable, with all partial residues matching. I love the -s switch, and I would love it more if I could specify (or have hardcoded, no special need to change) a [B]subfolder[/B]; see the idea in my former post with the "backup" folder. That is because I usually have many other things in the cudalucas folder, and all the sXXXX files are flooding it, making periodic maintenance (deleting, etc.) difficult. Putting them all in a "backup" subfolder would be great.

edit ps: I am using 4.1/2.0 from jerry's build

msft 2012-03-03 09:36

1 Attachment(s)
Ver 1.64
1)Make backup dir.
2)Add option to check round off err on all iterations.
when err > 0.49 exit.
performance 2% down.
[code]
$ ./CUDALucas
Usage: ./CUDALucas [-d device_number] [-c checkpoint_iteration] [-f fft_length] [-s] [-t] -r|exponent|input_filename
-f set fft length
-s save all checkpoint files
-t check round off error all iterations,when err > 0.49 exit
$ ./CUDALucas -s 24036583
DEVICE:0------------------------
name GeForce GTX 460
~~~

start M24036583 fft length = 1310720
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.64 (1:00 real, 5.9464 ms/iter, ETA 39:40:31)
^C caught. Writing checkpoint.
$ ls backup/
s24036583.10001 s24036583.11377
$ ./CUDALucas -t 24036583
DEVICE:0------------------------
name GeForce GTX 460
~~~

start M24036583 fft length = 1310720
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.64 (1:01 real, 6.0964 ms/iter, ETA 40:40:36)
[/code]

apsen 2012-03-03 13:18

1 Attachment(s)
Got mismatch with 1.58 but had several err/increasing messages. Could be bad handling of that...

flashjh 2012-03-03 13:29

1.64 Binaries
 
1 Attachment(s)
[QUOTE=msft;291688]Ver 1.64
1)Make backup dir.
2)Add option to check round off err on all iterations.
when err > 0.49 exit.
performance 2% down.
[code]
$ ./CUDALucas
Usage: ./CUDALucas [-d device_number] [-c checkpoint_iteration] [-f fft_length] [-s] [-t] -r|exponent|input_filename
-f set fft length
-s save all checkpoint files
-t check round off error all iterations,when err > 0.49 exit
$ ./CUDALucas -s 24036583
DEVICE:0------------------------
name GeForce GTX 460
~~~

start M24036583 fft length = 1310720
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.64 (1:00 real, 5.9464 ms/iter, ETA 39:40:31)
^C caught. Writing checkpoint.
$ ls backup/
s24036583.10001 s24036583.11377
$ ./CUDALucas -t 24036583
DEVICE:0------------------------
name GeForce GTX 460
~~~

start M24036583 fft length = 1310720
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.64 (1:01 real, 6.0964 ms/iter, ETA 40:40:36)
[/code][/QUOTE]

Attached v1.64 x64 binaries (untested):[LIST][*]CUDA 4.0 / SM 2.0[*]CUDA 4.1 / SM 2.0[*]CUDA 4.1 / SM 2.1[/LIST]

flashjh 2012-03-03 14:00

1 Attachment(s)
[QUOTE=apsen;267365]I figured it out (kind of). When restarting from checkpoint and finishing the test it will then try to read more input from the same file although it has already been closed. It will loop endlessly until it crashes (haven't really figured out the exact point and reason for crash but it happens in "input" function when it enters endless loop).

As I tried to figure it out I have cut out a lot of unused code, removed K&R style prototypes, etc. Also added some timing output. I haven't touched anything related to calculations but it would be prudent to be cautious - it needs a lot of testing before production use. Most likely bugs would be in parsing command line. That code suffered most nontrivial change.

I'm attaching the modified source code with Win64 executable compiled for sm_13.[/QUOTE]

apsen/msft,

1.2b was the last build that included a win32 makefile. I modified my current makefile for win32, but it does not compile. Lots of errors during nvcc processing CUDALucas.cu. Has 32 bit compatibility been removed or do I need some extra includes?

Edit: I included the screen output with the errors.

Thanks.

aaronhaviland 2012-03-03 14:55

[QUOTE=flashjh;291706]apsen/msft,

1.2b was the last build that included a win32 makefile. I modified my current makefile for win32, but it does not compile. Lots of errors during nvcc processing CUDALucas.cu. Has 32 bit compatibility been removed or do I need some extra includes?

Edit: I included the screen output with the errors.[/QUOTE]

From the looks of things, all those errors can be tied back to the lack of definition of "BOOL" which I'm assuming is a pre-processor macro. (searching the internet...) on windows, you should probably "#include <WinDef.h>" at some point. I cannot confirm this due to a lack of a windows environment.

flashjh 2012-03-03 14:59

[QUOTE=aaronhaviland;291712]From the looks of things, all those errors can be tied back to the lack of definition of "BOOL" which I'm assuming is a pre-processor macro. (searching the internet...) on windows, you should probably "#include <WinDef.h>" at some point. I cannot confirm this due to a lack of a windows environment.[/QUOTE]

Thanks! I'll take a look.

flashjh 2012-03-03 16:08

[QUOTE=flashjh;291713]Thanks! I'll take a look.[/QUOTE]

Having taken a look at CUDALucas.cu: the file has been edited since the last win32 build, so some changes need to be made. Any help is appreciated. I'll edit it later.

msft 2012-03-03 16:25

[QUOTE=apsen;291701]Got mismatch with 1.58 but had several err/increasing messages. Could be bad handling of that...[/QUOTE]
[code]
err = 0.371603, increasing n from 1572864
continuing work from a partial result
err = 0.371603, increasing n from 1572864
continuing work from a partial result
err = 0.371603, increasing n from 1572864
continuing work from a partial result
err = 0.371603, increasing n from 1572864
continuing work from a partial result
Iteration 27390000 M( 29198173 )C, 0x0ae3a28bd9f1003c, n = 1572864, CUDALucas v1.58 (1:29 real, 8.9455 ms/iter, ETA 4:28:21)
[/code]
Interesting.
Why did it exit the infinite loop?

Dubslow 2012-03-03 21:51

What version is this?[code]bill@Gravemind:~/CUDALucas∰∂ CUDALucas -v
CUDALucas version information:
$Id: MacLucasFFTW.c,v 8.1 2007/06/23 22:33:35 wedgingt Exp $ wedgingt@acm.org
^C^C

^C[/code]Note that ^C doesn't have an effect. The tarball in the directory says 1.3.

flashjh 2012-03-03 21:53

[QUOTE=Dubslow;291775]What version is this?[code]bill@Gravemind:~/CUDALucas∰∂ CUDALucas -v
CUDALucas version information:
$Id: MacLucasFFTW.c,v 8.1 2007/06/23 22:33:35 wedgingt Exp $ wedgingt@acm.org
^C^C

^C[/code]Note that ^C doesn't have an effect. The tarball in the directory says 1.3.[/QUOTE]

^C works on newer versions :smile:

Batalov 2012-03-03 21:54

Is 1.64 a good bet for a (wink-wink-nudge-nudge) test?
Or is there a more recommended version (but with 3M FFT size)?

flashjh 2012-03-03 21:59

[QUOTE=Batalov;291777]Is 1.64 a good bet for a (wink-wink-nudge-nudge) test?
Or is there a more recommended version (but with 3M FFT size)?[/QUOTE]


1.64 has completed a successful test for me and that is saying a lot. It went thru multiple ^c and restarts. It's a *little* slower (like 3.8% on my 580) than 1.63, but it also did not force close.

I'd say you're good-to-go. (Of course, if you don't want to run it, you can pass it to me :wink:)

Batalov 2012-03-03 22:08

Fat chance, man. :devil:

I am checking the first residues on another computer to have an early warning if something is afoul. (The first and last iterations are the most dangerous because of initial non-randomness. I am sure that this has been taken care of by the smart code, but I'll run checks anyway.)

flashjh 2012-03-03 22:11

[QUOTE=Batalov;291781]Fat chance, man. :devil:[/QUOTE]

Always worth a shot.

Dubslow 2012-03-03 22:13

I was asking about version, because I'd be willing to run the power-of-2 FFT on my 460, if necessary. Whatever version I have (still have no clue which it is) is already compiled and ready to go (been sitting idle for the last few months). Such would be necessary if 1.64 and MPrime disagree.

chalsall 2012-03-03 22:22

[QUOTE=Batalov;291781]Fat chance, man. :devil:[/QUOTE]

A quick quiz...

You're a Geek if it's Saturday Night, and you're more interested in a possible MP than the moist female making eyes beside you...

True, or false? :wink:

bcp19 2012-03-03 22:48

[QUOTE=chalsall;291791]A quick quiz...

You're a Geek if it's Saturday Night, and you're more interested in a possible MP than the moist female making eyes beside you...

True, or false? :wink:[/QUOTE]

I only read the forum while waiting for the screen to load a new area while playing Star Wars.

msft 2012-03-04 11:10

[QUOTE=Dubslow;291775]What version is this?[code]bill@Gravemind:~/CUDALucas∰∂ CUDALucas -v
CUDALucas version information:
$Id: MacLucasFFTW.c,v 8.1 2007/06/23 22:33:35 wedgingt Exp $ wedgingt@acm.org
^C^C

^C[/code]Note that ^C doesn't have an effect. The tarball in the directory says 1.3.[/QUOTE]
Please try:
$ CUDALucas 216091

msft 2012-03-04 11:51

[QUOTE=apsen;291701]Got mismatch with 1.58 but had several err/increasing messages. Could be bad handling of that...[/QUOTE]
The restart from the checkpoint file was a SW error.
The residue mismatch was a HW error (I guess).

Dubslow 2012-03-04 19:31

[QUOTE=msft;291854]Please try:
$ CUDALucas 216091[/QUOTE]

Ah yes, now I remember why I gave up on CUDALucas running on Linux...
[code]bill@Gravemind:~/CUDALucas∰∂ CUDALucas 216091
CUDALucas: Could not find a checkpoint file to resume from
device_number >= device_count ... exiting[/code]

I never did figure this out. I have 270.xx drivers with a GTX 460; mfaktc 0.17 is running fine. (0.18 requires later drivers (like 285.xx), but for the life of me I have not been able to get any new drivers to properly install.)

msft 2012-03-05 03:59

[QUOTE=flashjh;291634]
Only problem so far is I get [B]random[/B] force closes... it's like before when it would finish and force close before it would start the next exponent, only now it's doing it randomly.

msft, any ideas what could be causing this? I didn't get any errors or warning during compile?[/QUOTE]
Sorry, I don't understand because of my poor English.
Please give more information (log or ...).

flashjh 2012-03-05 04:08

Yay!

[CODE]
Processing result: M( 26206723 )C, 0x19fb046af3bf4bce, n = 1572864, CUDALucas v1.64
LL test successfully completes double-check of M26206723
[/CODE]

LaurV 2012-03-05 04:46

I saw I had 3 results this morning before leaving for work, two from 1.64 and one from 1.63, but could not report them (the server was down, as everybody already knows). No idea if they matched, and unfortunately, they will have to wait about 6 hours more, till I can reach that computer again. I will keep you informed about the result, in 6 hours.

Dubslow 2012-03-05 05:18

When PrimeNet's down, you could use Mersenne-aries, because it has everything up to at least the day before.

flashjh 2012-03-05 05:56

[QUOTE=Dubslow;291936]When PrimeNet's down, you could use Mersenne-aries, because it has everything up to at least the day before.[/QUOTE]

Good point

LaurV 2012-03-05 06:26

[QUOTE=Dubslow;291936]When PrimeNet's down, you could use Mersenne-aries, because it has everything up to at least the day before.[/QUOTE]
Too late now. I will know it next time when we find another prime and the server will be down :P

Brain 2012-03-05 16:42

Perfect, again
 
[CODE]Processing result: M( 29217869 )C, 0x1b92a9eb0e596b54, n = 1572864, CUDALucas v1.61
LL test successfully completes double-check of M29217869[/CODE]Switching to 1.64.

apsen 2012-03-05 16:51

1 Attachment(s)
[QUOTE=apsen;291701]Got mismatch with 1.58 but had several err/increasing messages. Could be bad handling of that...[/QUOTE]

I have rerun 29198173 with 1.63 and got another mismatch. BTW no err messages on 1.63 side and residues matched between 1.58 and 1.63 up to the point of the first 1.58 err message.

Running 29027371 with 1.63 now. ETA 33 hours. Will rerun 29198173 with older version once 29027371 completes.

apsen 2012-03-05 16:57

[QUOTE=aaronhaviland;291712]From the looks of things, all those errors can be tied back to the lack of definition of "BOOL" which I'm assuming is a pre-processor macro. (searching the internet...) on windows, you should probably "#include <WinDef.h>" at some point. I cannot confirm this due to a lack of a windows environment.[/QUOTE]

Agree on the BOOL issue. You need to be careful with includes so that the Winsock headers do not get included, as that will prevent CudaLucas.cu from compiling. I believe BOOL is defined as int, so you could just change the signature correspondingly.

I'll see what is the best way to fix that when I get to my VS computer.
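
A minimal sketch of that idea, assuming the only thing the code needs from the Windows headers is the BOOL type itself (in WinDef.h it is a typedef for int rather than a macro). The guard macro HAVE_WINDOWS_BOOL and the example function are hypothetical and are not taken from CUDALucas.cu.

```c
#include <assert.h>

/* Fallback definition for builds that do not pull in WinDef.h.
   HAVE_WINDOWS_BOOL is a hypothetical macro that a Windows build
   would define after including the real header. */
#ifndef HAVE_WINDOWS_BOOL
typedef int BOOL;
#endif

#ifndef TRUE
#define TRUE 1
#endif
#ifndef FALSE
#define FALSE 0
#endif

/* Compile-time check that the fallback matches the Windows definition. */
static_assert(sizeof(BOOL) == sizeof(int), "BOOL is expected to be int-sized");

/* Hypothetical function whose signature uses BOOL:
   reports whether a checkpoint interval is usable. */
BOOL is_valid_checkpoint_interval(int iterations)
{
    return iterations > 0 ? TRUE : FALSE;
}
```

Because the fallback is a plain int, changing the affected signatures to int directly would behave the same way.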

apsen 2012-03-05 20:23

[QUOTE=apsen;291995]
I'll see what is the best way to fix that when I get to my VS computer.[/QUOTE]

Actually that code does not need ifdef. You could compile it using unix semantics.

Brain 2012-03-06 05:44

It is moving
 
[QUOTE=Brain;291986]Switching to 1.64.[/QUOTE]
Ehm, is it getting faster for you, too?
GTX 560 Ti @ stock speeds, 4.1/2.1 exe:
[CODE]1.58: M( 28652753 ), n = 1572864, CUDALucas v1.58: 6.03ms/Iter
1.61: M( 29217869 ), n = 1572864, CUDALucas v1.61: 5.73ms/Iter
1.64: M( 29013107 ), n = 1572864, CUDALucas v1.64: 5.08ms/Iter[/CODE]I thought 1.64 had 2% slowdown...
I like it. :wink:

LaurV 2012-03-06 05:48

[QUOTE=Brain;292075]I thought 1.64 had 2% slowdown...
:wink:[/QUOTE]
Only with the dirty switch. I love it WITH the switch, in fact... I prefer slower and safer.

LaurV 2012-03-06 11:25

[CODE]
.
LL test successfully completes double-check of M26177689
LL test successfully completes double-check of M26128457[/CODE]So, version 1.64 is VERY close to being a good and stable version :D

lycorn 2012-03-07 12:13

Reporting results
 
Sorry for this dumb question.
I'm running Cudalucas for the first time: everything fine, smooth and all, but... what is the correct way to report the result? I mean, does Cudalucas write the result to some text file that can then be uploaded through the server Manual Pages, or...
In any case I think the result will have to be manually uploaded, and no security code is in place to avoid faking results, is there?

LaurV 2012-03-07 14:26

[QUOTE=lycorn;292187] does Cudalucas write the result to some text file that can then be uploaded through the server Manual Pages, or...
In any case I think the result will have to be manually uploaded, and no security code is in place to avoid faking results, isn't it?[/QUOTE]
Older versions used to create/update a file called mersarch.txt. With the newer versions you should look for a plain text result.txt. Every time an expo is finished, the last residue is appended to the file. You can edit the file (I always do, to add some info like date found etc). You have to copy the new lines added by CudaLucas and paste them into the mersenne.org report form. Be careful to be logged in, otherwise your results will end up reported as anonymous and you won't be credited.

lycorn 2012-03-07 18:16

Thanks a bunch.

msft 2012-03-08 01:01

CUDALucas1.58.cuda4.0.sm_20.WIN64.29198173.log
[code]
Iteration 8160000 M( 29198173 )C, 0xd3d8e5229c0ef595, n = 1572864, CUDALucas v1.58 (1:06 real, 6.6116 ms/iter, ETA 38:37:22)
Iteration 8170000 M( 29198173 )C, 0x191e1edf48d3246acontinuing work from a partial result
Iteration 8220000 M( 29198173 )C, 0x8b29efc1039a9f62, n = 1572864, CUDALucas v1.58 (1:50 real, 11.0380 ms/iter, ETA 64:17:47)
[/code]
CUDALucas1.63.cuda4.1.sm_20.WIN64.29198173.log
[code]
Iteration 8160000 M( 29198173 )C, 0xd3d8e5229c0ef595, n = 1572864, CUDALucas v1.63 (0:53 real, 5.2994 ms/iter, ETA 30:57:26)
Iteration 8170000 M( 29198173 )C, 0x191e1edf48d3246a, n = 1572864, CUDALucas v1.63 (0:53 real, 5.2998 ms/iter, ETA 30:56:41)
Iteration 8180000 M( 29198173 )C, 0x38a7e6a8dbc3b325, n = 1572864, CUDALucas v1.63 (0:53 real, 5.2994 ms/iter, ETA 30:55:41)
Iteration 8190000 M( 29198173 )C, 0x850cdcbc97f98d2a, n = 1572864, CUDALucas v1.63 (0:53 real, 5.2993 ms/iter, ETA 30:54:45)
Iteration 8200000 M( 29198173 )C, 0xe6cd792c103f8994, n = 1572864, CUDALucas v1.63 (0:53 real, 5.2995 ms/iter, ETA 30:53:56)
Iteration 8210000 M( 29198173 )C, 0xdd1b0a482e81a352, n = 1572864, CUDALucas v1.63 (0:53 real, 5.2992 ms/iter, ETA 30:52:57)
Iteration 8220000 M( 29198173 )C, 0x9652c73462421f58, n = 1572864, CUDALucas v1.63 (0:53 real, 5.2994 ms/iter, ETA 30:52:08)
[/code]
My Ver1.58
[code]
Iteration 8160000 M( 29198173 )C, 0xd3d8e5229c0ef595, n = 1572864, CUDALucas v1.58 (1:28 real, 8.7039 ms/iter, ETA 50:50:43)
Iteration 8170000 M( 29198173 )C, 0x191e1edf48d3246a, n = 1572864, CUDALucas v1.58 (1:27 real, 8.7058 ms/iter, ETA 50:49:55)
Iteration 8180000 M( 29198173 )C, 0x38a7e6a8dbc3b325, n = 1572864, CUDALucas v1.58 (1:27 real, 8.7038 ms/iter, ETA 50:47:47)
Iteration 8190000 M( 29198173 )C, 0x850cdcbc97f98d2a, n = 1572864, CUDALucas v1.58 (1:27 real, 8.7049 ms/iter, ETA 50:46:43)
Iteration 8200000 M( 29198173 )C, 0xe6cd792c103f8994, n = 1572864, CUDALucas v1.58 (1:27 real, 8.7074 ms/iter, ETA 50:46:08)
Iteration 8210000 M( 29198173 )C, 0xdd1b0a482e81a352, n = 1572864, CUDALucas v1.58 (1:27 real, 8.7073 ms/iter, ETA 50:44:40)
Iteration 8220000 M( 29198173 )C, 0x9652c73462421f58, n = 1572864, CUDALucas v1.58 (1:27 real, 8.7054 ms/iter, ETA 50:42:31)
[/code]
The roundoff error is checked only once per 100 iterations,
so it cannot find every round off error.

apsen 2012-03-08 13:52

[QUOTE=msft;292278]CUDALucas1.58.cuda4.0.sm_20.WIN64.29198173.log
CUDALucas1.63.cuda4.1.sm_20.WIN64.29198173.log
My Ver1.58

roundoff error check per 100 iterations.
can not find all round off error.[/QUOTE]

Did you get a mismatch? Or is it still running?

aaronhaviland 2012-03-08 15:26

Possibly not a round off error: I've found a few instances where CUDALucas uses cudaMalloc / malloc of a size larger than the data that is put there. You may need to call cudaMemset on g_x before copying x to it.

[CODE]                         v1.57                v1.64
size of cudaMalloc g_x   1.5*n + 2*STRIDE     1.25*n
size of memcpy x->g_x    1*n                  1*n
[/CODE]
I have not checked any of your other arrays to see if they have similar issues. They might need cudaMemset as well? I had a similar problem in gpuLucas, and am currently clearing all the arrays on initialization just to be safe.
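
The mismatch in the table can be illustrated with a host-side C analogue, where plain malloc/memset/memcpy stand in for cudaMalloc/cudaMemset/cudaMemcpy. The function name is hypothetical and the code is not taken from the CUDALucas source; it only shows the padding beyond the n copied elements being zeroed explicitly instead of being left uninitialised.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Allocates n + n/4 doubles (the 1.25*n size from the table), zeroes the
   whole allocation, then copies the n valid elements of x to the front.
   Returns NULL on allocation failure; the caller frees the result. */
double *alloc_padded_copy(const double *x, size_t n, size_t *total_out)
{
    assert(x != NULL && total_out != NULL);

    size_t total = n + n / 4;
    double *g_x = malloc(total * sizeof *g_x);
    if (g_x == NULL)
        return NULL;

    /* Clear everything first so the tail past n has a defined value.
       All-zero bytes represent 0.0 for IEEE 754 doubles. */
    memset(g_x, 0, total * sizeof *g_x);
    memcpy(g_x, x, n * sizeof *x);

    *total_out = total;
    return g_x;
}
```

On the device the same order would apply: allocate, clear the full allocation, then copy the n valid elements.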

msft 2012-03-09 04:11

[QUOTE=aaronhaviland;292330]I have not checked any of your other arrays to see if they have similar issues. They might need cudaMemset as well? I had a similar problem in gpuLucas, and am currently clearing all the arrays on initialization just to be safe.[/QUOTE]
Data initialization is not the problem.
The roundoff error was a HW issue.

apsen 2012-03-09 17:08

[QUOTE=apsen;291992]I have rerun 29198173 with 1.63 and got another mismatch. BTW no err messages on 1.63 side and residues matched between 1.58 and 1.63 up to the point of the first 1.58 err message.

Running 29027371 with 1.63 now. ETA 33 hours. Will rerun 29198173 with older version once 29027371 completes.[/QUOTE]

Got mismatch for 29027371 with 1.63. :sad:

Batalov 2012-03-09 19:21

Got another match with 1.64.

kladner 2012-03-09 21:23

[QUOTE=Batalov;292443]Got another match with 1.64.[/QUOTE]

Have you gotten enough matches to at least entertain the idea that 1.64 is well behaved?

Batalov 2012-03-09 22:47

Just these two (26059867 and the "M48"). I am not interested enough to make it a regular activity, I was only interested to see if this program works. My plan is to run gpuLucas on the same two numbers, but later, because the end result is somewhat underwhelmingly predictable.:smile:

msft 2012-03-10 00:59

[QUOTE=apsen;292433]Got mismatch for 29027371 with 1.63. :sad:[/QUOTE]
Ver 1.64 has the -s and -t options; please try them.
[code]
$ ./CUDALucas
Usage: ./CUDALucas [-d device_number] [-c checkpoint_iteration] [-f fft_length] [-s] [-t] -r|exponent|input_filename
-f set fft length
-s save all checkpoint files
-t check round off error all iterations,when err > 0.49 exit
[/code]

kladner 2012-03-10 02:58

[QUOTE=Batalov;292464]Just these two (26059867 and the "M48"). I am not interested enough to make it a regular activity, I was only interested to see if this program works. My plan is to run gpuLucas on the same two numbers, but later, because the end result is somewhat underwhelmingly predictable.:smile:[/QUOTE]

I look forward to those results, when they happen.

LaurV 2012-03-10 03:11

I re-ran a series of DC exponents which gave me bad residues in the past (sorry for poaching, they were already re-assigned). I have another 2 matches (4 in total) using v1.64 with -s and -t. It seems to be stable and "well behaved". I am trying 3 expos on the first-time-LL (45M+) front, and I will not report the results before DC-ing them with P95 to make sure. CudaLucas v1.64, ~10 hours to go for the first two.

By the way, I found a small cosmetic bug: when -c/-s/-t switches are paired together with -r, CL will crash. There should be a small description, or different pairing of the square brackets in the command line help, or better an improved implementation of the command line parser...

LaurV 2012-03-10 08:01

replying to myself... that's a bit odd.. :D

I was quite optimistic... I just reached home after work (my working Saturday today) and found:

[CODE]>cl1644120w64 -d 0 -c 250000 -s -t 45130601
...<snip>...
Iteration 33750000 M( 45130601 )C, ....
err = 0.499978,round off err exiting.
[/CODE]It seems to be reproducible, and I will try to isolate it by calling CL with -c 1k (it was called with -c 250k -s -t).

Two observations came out from this:

First, you can see the utility of -t: as I said before, better safe than sorry. The safety more than compensates for the slowness; imagine if I had to repeat the whole 33M iterations again... Now only the last 250k (max) need to be repeated, which is in fact A GAIN OF SPEED, despite the 2% penalty. I love -t! :D I would like to have the threshold settable (like 0.45, 0.37, whatever I like, as a parameter of the -t switch?).

And the second observation: [B]would it be possible to print the iteration number and possibly save a checkpoint file with the previous residue when this round off error happens[/B]? This would save me from re-running the last 250k (or 500k, or 1M) iterations to isolate the error. I cannot use lower values for -c, because the hard disk space will be eaten up very fast by the save files, so 250k is a perfect value for a residue-comparison process, but in case of a roundoff error, it would take longer to isolate it or repeat the test.
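
As a rough illustration of the settable threshold, here is a hedged C sketch of how such a parameter could be parsed and validated. In 1.64 the -t switch takes no argument according to the usage text, so the function name, the accepted range and the fallback behaviour are all assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Parses a threshold such as "0.45". Only values strictly between
   0 and 0.5 are accepted; any other input yields the fallback. */
double parse_roundoff_threshold(const char *arg, double fallback)
{
    assert(arg != NULL);

    char *end = NULL;
    double value = strtod(arg, &end);

    /* Reject empty input and trailing garbage such as "0.37x". */
    if (end == arg || *end != '\0')
        return fallback;

    /* Written this way so that NaN is rejected as well. */
    if (!(value > 0.0 && value < 0.5))
        return fallback;

    return value;
}
```

A caller would pass the current hard-coded limit as the fallback, so a malformed argument leaves the behaviour unchanged.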

Dubslow 2012-03-10 08:13

Regarding what is safe round off error -- when testing which FFT size to use, Prime95 requires that the average round off be less than .24 (roughly), NOT less than .49. Here's what George had to say:
[url]http://www.mersenneforum.org/showpost.php?p=291942&postcount=102[/url]

That would presumably prevent errors like LaurV just encountered.

LaurV 2012-03-10 08:19

George refers to the AVERAGE roundoff error (for 100 or 1k iterations, or so). Which indeed would need to be much lower to catch the spikes going over 0.5. If you check it at every iteration (what -t is doing) then comparing it with 0.5 would be enough. But we skeptic being, to set it lower we would like... (paraphrasing master Yoda..)

msft 2012-03-10 08:43

[QUOTE=LaurV;292485]By the way, I found a small cosmetic bug: when -c/-s/-t switches are paired together with -r, CL will crash. There should be a small description, or different pairing of the square brackets in the command line help, or better an improved implementation of the command line parser...[/QUOTE]
Please give more information. With Linux it is OK.

msft 2012-03-10 09:30

The roundoff error has two possible causes: FFT length or a HW error.
A HW error requires lowering the clock.

Brain 2012-03-10 09:38

Perfect, again
 
Still no mismatch here:
[CODE]Processing result: M( 29013107 )C, 0xd9e76769f7b81b52, n = 1572864, CUDALucas v1.64
LL test successfully completes double-check of M29013107[/CODE]

msft 2012-03-10 10:02

[QUOTE=apsen;292326]Did you get a mismatch? Or is it still running?[/QUOTE]
[code]
M( 29198173 )C, 0x6fd7e4d6557f5b77, n = 1572864, CUDALucas v1.58
[/code]
correct.

flashjh 2012-03-10 15:06

I had a match but the assignment [URL="http://www.mersenneforum.org/showthread.php?p=292493#post292493"]had already been turned in[/URL]. Good news is the original LL was bad because my 1.64 matched David's CUDALucas run.

M( 26002063 )C, 0x1c5e4ca283b033__, n = 1572864, CUDALucas v1.64

Prime95 2012-03-10 15:12

[QUOTE=LaurV;292500]Which indeed would need to be much lower to catch the spikes going over 0.5. If you check it at every iteration (what -t is doing) then comparing it with 0.5 would be enough.[/QUOTE]

This is not correct. A roundoff error of 0.49 is harmless, but a roundoff error of 0.51 is deadly. The problem is the program will correctly report both as 0.49.

So if CUDALucas reports a round off error of 0.49, how confident are you that it really wasn't a deadly roundoff of 0.51??? This is why PFGW aborts (actually switches to a larger FFT length) when the roundoff error exceeds 0.45. Prime95 retries the iteration if the roundoff exceeds 0.40.
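
The ambiguity can be shown with a few lines of C. This is a sketch of the general idea only, not code from CUDALucas, PFGW or Prime95: a roundoff check can only measure the distance from a computed value to the nearest integer, so a true error of 0.51 lands on the wrong integer and is observed as 0.49. The function name is hypothetical.

```c
#include <assert.h>

/* Distance from x to the nearest integer, for non-negative x that fits
   in a long long. The result is always in [0, 0.5]. */
double observed_roundoff(double x)
{
    assert(x >= 0.0);

    /* Adding 0.5 and truncating rounds a non-negative value to nearest. */
    double nearest = (double)(long long)(x + 0.5);
    double diff = x - nearest;
    return diff < 0.0 ? -diff : diff;
}
```

If the correct value is 1000, a result of 1000.49 rounds back to 1000 and a result of 1000.51 rounds to 1001, yet observed_roundoff returns approximately 0.49 for both. That is why a cut-off safely below 0.5 is used.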

LaurV 2012-03-10 17:02

[QUOTE=Prime95;292511]This is not correct. A roundoff error of 0.49 is harmless, but a roundoff error of 0.51 is deadly. The problem is the program will correctly report both as 0.49.

So if CUDALucas reports a round off error of 0.49, how confident are you that it really wasn't a deadly roundoff of 0.51??? This is why PFGW aborts (actually switches to a larger FFT length) when the roundoff error exceeds 0.45. Prime95 retries the iteration if the roundoff exceeds 0.40.[/QUOTE]
That was EXACTLY what I was talking about. You may not get that if you only read post 931, but please read my post 929 carefully [edit: the first observation, last part].

msft 2012-03-10 22:20

[QUOTE=Prime95;292511]So if CUDALucas reports a round off error of 0.49, how confident are you that it really wasn't a deadly roundoff of 0.51??? This is why PFGW aborts (actually switches to a larger FFT length) when the roundoff error exceeds 0.45. Prime95 retries the iteration if the roundoff exceeds 0.40.[/QUOTE]

[code]
Ver 1.64
default:
    if((iteration % 100) == 0 || iteration < 1000)
        if(roundoff > 0.35)
            increasing fft length

-t option:
    if(roundoff > 0.49)
        exit program
    else if(roundoff > 0.35)
        increasing fft length
[/code]
[code]
if(roundoff > 0.49)
    exit program
[/code]
this is experimental code.
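
The pseudocode above can be written out as a small decision function. This is a sketch that follows the posted logic only; the enum, the function name and the check_all parameter (standing for the -t switch) are hypothetical, and the real program's structure may differ.

```c
#include <assert.h>

typedef enum {
    ACTION_CONTINUE,
    ACTION_INCREASE_FFT,
    ACTION_EXIT
} roundoff_action;

/* Decides what to do for one iteration, given its measured roundoff.
   check_all is nonzero when every iteration is checked (the -t switch). */
roundoff_action decide_action(unsigned long iteration, double roundoff,
                              int check_all)
{
    assert(roundoff >= 0.0);

    if (check_all) {
        if (roundoff > 0.49)
            return ACTION_EXIT;
        if (roundoff > 0.35)
            return ACTION_INCREASE_FFT;
        return ACTION_CONTINUE;
    }

    /* Default: look at the error only on every 100th iteration,
       or on each of the first 1000 iterations. */
    if ((iteration % 100) == 0 || iteration < 1000) {
        if (roundoff > 0.35)
            return ACTION_INCREASE_FFT;
    }
    return ACTION_CONTINUE;
}
```

In the default mode an error spike on an unchecked iteration passes unnoticed, which is the gap the -t option closes.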

flashjh 2012-03-11 02:48

Another good one
 
Another 1.64 success
[CODE]
Processing result: M( 26134351 )C, 0xb9d6a5672486c791, n = 1572864, CUDALucas v1.64
LL test successfully completes double-check of M26134351

[/CODE]

kladner 2012-03-11 03:50

[QUOTE=flashjh;292579]Another 1.64 success
[CODE]
Processing result: M( 26134351 )C, 0xb9d6a5672486c791, n = 1572864, CUDALucas v1.64
LL test successfully completes double-check of M26134351

[/CODE][/QUOTE]

Encouraging.

apsen 2012-03-11 18:35

[QUOTE=msft;292504][code]
M( 29198173 )C, 0x6fd7e4d6557f5b77, n = 1572864, CUDALucas v1.58
[/code]
correct.[/QUOTE]

That does not match first time test. I guess I better rerun it with P95.

flashjh 2012-03-11 19:29

[QUOTE=apsen;292627]That does not match first time test. I guess I better rerun it with P95.[/QUOTE]

You should submit the result to PrimeNet, it may be correct.

LaurV 2012-03-12 03:38

I finished first-time-LL for 45130601 and 4520386. The tests were done with CL1.64, with -s and -t, so the intermediate residues and all checkpoint files (every 250k iterations) are available if someone wants to do the double check with p95. When (and if) my cores become less loaded, I will attempt a DC with P95 myself, but this will not be in the coming weeks.

Currently, I am testing another expo in the same range (45221537) with 2 cards at the same time, no overclocking. This is to check whether CL 1.64 is "reliable" in the 45M range (in fact, this is more a test of whether the "cheap" gtx580 cards with 1.5Gig memory which I use are "reliable" from the hardware point of view: at the factory speed of 782MHz, they should get the same results, no matter if the software is mathematically correct or not). Up to now, 19M iterations are done on both (they have about the same speed, one is a bit slower maybe because it is used as primary display(?!)) and both residues are matching.

edit: Roughly 40 hours to go. I don't use -s and -t; in fact this is the idea, to see how reliable it is without checking every iteration, but I am saving the checkpoints (using my batch file posted before) every 30 minutes, so that in case of a mismatch I can avoid starting everything from the beginning. Without the -t switch, CL is faster, as discussed before.

Anyhow, if two copies are testing the same exponent (in two different folders) then [B]-s can not be used[/B], as they will try writing the [B]SAME[/B] checkpoint files. The idea with the "backup" subfolder was to have it [B]in the current folder[/B], and not in the root of the disk... Like in ".\backup\......." and not "c:\backup\....." Anyhow, you could argue that no one will test the same expo with more copies of CL at the same time, but in case you re-test the same expo later using -s, the checkpoint files will be overwritten too... Why not let the user customize the output path?

Brain 2012-03-12 05:33

Responsibility
 
[QUOTE=James Heinrich;292430]I just started experimenting with CUDAlucas yesterday. First impressions: it uses zero CPU, but the GPU usage is more aggressive than mfaktc. Normal Windows usage is fine, I can't watch even DVD-quality video smoothly with CUDAlucas whereas it's only 1080 video I have to switch mfaktc off for. Most likely I'll go back to mfaktc, partly for usability, but also because the extra two cores don't scale so well with the new AVX cores in Prime95 (iteration times when running 6 workers are significantly slower than 4 workers).[/QUOTE]
I cannot even run low-res playback with 1.64 because of lag and bad responsiveness. I suggest, again, a command line switch, for example --polite or --agressive, where --polite would be the default. This would insert an artificial CUDA wait loop during which other apps (playback) get a turn.

The lag was introduced when an unnecessary cudaMemcpy was removed.

Karl M Johnson 2012-03-12 06:15

[QUOTE=Brain;292671]I cannot even run low-res playback with 1.64 because of lag and bad responsiveness. I suggest, again, a command line switch, for example --polite or --agressive, where --polite would be the default. This would insert an artificial CUDA wait loop during which other apps (playback) get a turn.

The lag was introduced when an unnecessary cudaMemcpy was removed.[/QUOTE]
Or a CL option to control threads and blocks. This way, it's up to the user to decide whether to run at max performance or at some gpu-idle state.

msft 2012-03-12 06:58

[QUOTE=LaurV;292662]Anyhow, if two copies are testing the same exponent (in two different folders) then [B]-s cannot be used[/B], as they will try to write the [B]SAME[/B] checkpoint files. The idea with the "backup" subfolder was to have it [B]in the current folder[/B], and not in the root of the disk... Like ".\backup\......." and not "c:\backup\.....". You could argue that no one will test the same expo with several copies of CL at the same time, but if you re-test the same expo later using -s, the checkpoint files will be overwritten too... Why not let the user customize the output path?[/QUOTE]
It is bug with Windows.
Can someone fix this bug?

msft 2012-03-12 07:01

[QUOTE=Brain;292671]I cannot even run low-res playback with 1.64 because of lag and bad responsiveness. I suggest, again, a command line switch, for example --polite or --agressive, where --polite would be the default. This would insert an artificial CUDA wait loop during which other apps (playback) get a turn.

The lag was introduced when an unnecessary cudaMemcpy was removed.[/QUOTE]
Do you mean that cudaMemcpy fixes the bad responsiveness on Windows?

LaurV 2012-03-12 07:14

[QUOTE=msft;292682]It is bug with Windows.
Can someone fix this bug?[/QUOTE]
I never heard of such a Windows bug, so I went to have a look at the CL 1.64 source. In Cudalucas.cu at line 1330 we can see:

[CODE]#ifdef linux
      mode_t mode = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;
      if (mkdir ("./backup", mode) != 0)
        printf ("mkdir: cannot create directory `backup': File exists\n");
#else
      if (_mkdir ("\\backup") != 0)
        printf ("mkdir: cannot create directory `backup': File exists\n");
#endif
[/CODE]which, when not on Linux, creates the backup folder in the ROOT of the current Windows drive. This isn't what was intended, is it?
[edit: for Jerry or other builders, the double backslash in front of backup should be removed, so that the backup dir is created as a subfolder of the current folder. It also appears at line 1079, when written.]
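
A minimal sketch of how the directory creation could be made relative on both platforms. This is not the CUDALucas code: the helper name `make_backup_dir` is hypothetical, and treating "already exists" as success is my assumption about the desired behaviour. It only relies on `mkdir` (POSIX) and `_mkdir` (Microsoft CRT, declared in `direct.h`).

```c
#include <errno.h>
#ifdef _WIN32
#include <direct.h>     /* _mkdir */
#else
#include <sys/stat.h>   /* mkdir and the S_I* mode bits */
#include <sys/types.h>
#endif

/* Hypothetical helper: create `path` relative to the current working
   directory (pass "backup", not "\\backup").
   Returns 0 on success or if the path already exists, -1 otherwise. */
int make_backup_dir(const char *path)
{
#ifdef _WIN32
    int rc = _mkdir(path);
#else
    int rc = mkdir(path, S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH);
#endif
    if (rc != 0 && errno == EEXIST)
        return 0;   /* already there: not an error for our purpose */
    return rc;
}
```

Calling `make_backup_dir("backup")` from two different working folders would then give each run its own backup subfolder.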

Brain 2012-03-12 08:41

[QUOTE=msft;292683]Do you mean that cudaMemcpy fixes the bad responsiveness on Windows?[/QUOTE]
cudaMemcpy was removed by ethan in his 1.3
--> GPU usage went from 97% to 99% (good, faster)
--> But at the same time, displays became laggy (not good)

Suggestion: Reduce GPU usage a little bit to allow other apps to access the device.

Formerly, the unneeded cudaMemcpy did this waiting...

LaurV 2012-03-12 09:07

[QUOTE=Brain;292693]cudaMemcpy was removed by ethan in his 1.3
--> GPU usage went from 97% to 99% (good, faster)
--> But at the same time, displays became laggy (not good)
Suggestion: Reduce GPU usage a little bit to allow other apps to access the device.
Formerly, the unneeded cudaMemcpy did this waiting...[/QUOTE]
The reverse happened when CL jumped from 1.58 to 1.63 (I don't know exactly where 1.61 falls):
-- GPU usage went from 99% to 92-95% (GTX580, Tesla). The "a bit slower" was not visible because of compensating changes (better FFT sizes, changing some temp vars from double to float, etc.), which in fact made CL 1.63 and 1.64 faster (leaving aside the -t switch, which slows things down by 2-3 percent; we are talking about a speed comparison without -t).
-- And at the same time the computer (display, CPU) became more responsive.

Running with -t lowers the speed further, but at the same time the GPU is less busy (I believe that checking the errors at every iteration makes the GPU wait longer), like 88% instead of 95%, or even lower, 83%. The computer is even more responsive and the process is safer, not to mention the consumed energy and the produced heat, which would also be lower.

So, why don't you try -t? It works nicely for me. If this "GPU-busy percent" could be parametrized, it would be even better.

msft 2012-03-12 10:11

Do you like?
[code]
cudalucas.1.65$ ./CUDALucas
Usage: ./CUDALucas [-d device_number] [-threads 32|64|128|256|512|1024] [-c checkpoint_iteration] [-f fft_length] [-s] [-t] [-agressive] -r|exponent|input_filename
-threads set threads number(default=256)
-f set fft length
-s save all checkpoint files
-t check round off error all iterations
[/code]

LaurV 2012-03-12 10:17

[QUOTE=msft;292703]Do you like?
[code]
cudalucas.1.65$ ./CUDALucas
Usage: ./CUDALucas [-d device_number] [-threads 32|64|128|256|512|1024] [-c checkpoint_iteration] [-f fft_length] [-s] [-t] [-agressive] -r|exponent|input_filename
-threads set threads number(default=256)
-f set fft length
-s save all checkpoint files
-t check round off error all iterations
[/code][/QUOTE]
yeaaa... (where the heck is the salivating smiley???)
:smile:

edit: well, [-s folder] would sound perfect... :razz:

Karl M Johnson 2012-03-12 12:08

[QUOTE=msft;292703]Do you like?
[code]
cudalucas.1.65$ ./CUDALucas
Usage: ./CUDALucas [-d device_number] [-threads 32|64|128|256|512|1024] [-c checkpoint_iteration] [-f fft_length] [-s] [-t] [-agressive] -r|exponent|input_filename
-threads set threads number(default=256)
-f set fft length
-s save all checkpoint files
-t check round off error all iterations
[/code][/QUOTE]
Love it!

msft 2012-03-12 13:21

1 Attachment(s)
Ver 1.65
1) change behavior round off error
iterations < 1000:increasing fft length
iterations >= 1000:exit program
2) print maxerror
3) change -s option
4) add -agressive option
5) add -threads option
[code]
cudalucas.1.65$ ./CUDALucas
Usage: ./CUDALucas [-d device_number] [-threads 32|64|128|256|512|1024] [-c checkpoint_iteration] [-f fft_length] [-s folder] [-t] [-agressive] -r|exponent|input_filename
-threads set threads number(default=256)
-f set fft length
-s save all checkpoint files
-t check round off error all iterations
-agressive GPU agressive(default polite)
cudalucas.1.65$ ./CUDALucas -threads 1024 -r
DEVICE:0------------------------
name GeForce GTX 460
~~~
Iteration 10000 M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, CUDALucas v1.65 err = 0.04723 (0:20 real, 1.9987 ms/iter, ETA 3:51:51)
Iteration 10000 M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, CUDALucas v1.65 err = 0.03019 (0:39 real, 3.9262 ms/iter, ETA 14:40:07)
Iteration 10000 M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, CUDALucas v1.65 err = 0.09749 (0:54 real, 5.3697 ms/iter, ETA 31:17:36)
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.65 err = 0.1996 (1:03 real, 6.2895 ms/iter, ETA 41:57:54)
Iteration 10000 M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, CUDALucas v1.65 err = 0.01873 (1:17 real, 7.7218 ms/iter, ETA 55:39:40)
Iteration 10000 M( 30402457 )C, 0x0b8600ef47e69d27, n = 1835008, CUDALucas v1.65 err = 0.02155 (1:26 real, 8.6305 ms/iter, ETA 72:51:20)
Iteration 10000 M( 32582657 )C, 0x02751b7fcec76bb1, n = 1835008, CUDALucas v1.65 err = 0.1181 (1:27 real, 8.6291 ms/iter, ETA 78:04:10)
err = 0.441193, increasing n from 1966080
Iteration 10000 M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, CUDALucas v1.65 err = 0.1117 (1:35 real, 9.4234 ms/iter, ETA 97:13:04)
Iteration 10000 M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, CUDALucas v1.65 err = 0.1871 (1:50 real, 10.9708 ms/iter, ETA 129:54:47)
Iteration 10000 M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, CUDALucas v1.65 err = 0.2798 (1:50 real, 10.9809 ms/iter, ETA 131:27:57)
[/code]
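
Change 1) in the list above (round-off error handling) can be sketched in C as follows. This is only an illustration, not the actual CUDALucas code: the names `roundoff_action` and `on_roundoff_error` and the `max_err` parameter are hypothetical, and the real error limit is not stated in this post. Only the 1000-iteration cutoff comes from the changelog.

```c
/* Hypothetical names; only the 1000-iteration cutoff is from the changelog. */
enum roundoff_action {
    ROUNDOFF_CONTINUE,      /* error is acceptable, keep iterating */
    ROUNDOFF_INCREASE_FFT,  /* early in the run: retry with a larger FFT length */
    ROUNDOFF_EXIT           /* late in the run: stop the program */
};

/* Decide what to do after measuring the round-off error of one iteration.
   max_err is an assumed, caller-supplied limit. */
enum roundoff_action on_roundoff_error(double err, double max_err, long iteration)
{
    if (err <= max_err)
        return ROUNDOFF_CONTINUE;
    return (iteration < 1000) ? ROUNDOFF_INCREASE_FFT : ROUNDOFF_EXIT;
}
```

With an assumed limit of 0.35, an error like the 0.441193 in the log above would trigger an FFT length increase if it shows up in the first 1000 iterations, and an exit afterwards.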

flashjh 2012-03-12 13:47

v1.65 x64 binaries (untested)
 
1 Attachment(s)
[QUOTE=msft;292729]Ver 1.65
1) change behavior round off error
iterations < 1000:increasing fft length
iterations >= 1000:exit program
2) print maxerror
3) change -s option
4) add -agressive option
5) add -threads option
[code]
cudalucas.1.65$ ./CUDALucas
Usage: ./CUDALucas [-d device_number] [-threads 32|64|128|256|512|1024] [-c checkpoint_iteration] [-f fft_length] [-s folder] [-t] [-agressive] -r|exponent|input_filename
-threads set threads number(default=256)
-f set fft length
-s save all checkpoint files
-t check round off error all iterations
-agressive GPU agressive(default polite)
cudalucas.1.65$ ./CUDALucas -threads 1024 -r
DEVICE:0------------------------
name GeForce GTX 460
~~~
Iteration 10000 M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, CUDALucas v1.65 err = 0.04723 (0:20 real, 1.9987 ms/iter, ETA 3:51:51)
Iteration 10000 M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, CUDALucas v1.65 err = 0.03019 (0:39 real, 3.9262 ms/iter, ETA 14:40:07)
Iteration 10000 M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, CUDALucas v1.65 err = 0.09749 (0:54 real, 5.3697 ms/iter, ETA 31:17:36)
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.65 err = 0.1996 (1:03 real, 6.2895 ms/iter, ETA 41:57:54)
Iteration 10000 M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, CUDALucas v1.65 err = 0.01873 (1:17 real, 7.7218 ms/iter, ETA 55:39:40)
Iteration 10000 M( 30402457 )C, 0x0b8600ef47e69d27, n = 1835008, CUDALucas v1.65 err = 0.02155 (1:26 real, 8.6305 ms/iter, ETA 72:51:20)
Iteration 10000 M( 32582657 )C, 0x02751b7fcec76bb1, n = 1835008, CUDALucas v1.65 err = 0.1181 (1:27 real, 8.6291 ms/iter, ETA 78:04:10)
err = 0.441193, increasing n from 1966080
Iteration 10000 M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, CUDALucas v1.65 err = 0.1117 (1:35 real, 9.4234 ms/iter, ETA 97:13:04)
Iteration 10000 M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, CUDALucas v1.65 err = 0.1871 (1:50 real, 10.9708 ms/iter, ETA 129:54:47)
Iteration 10000 M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, CUDALucas v1.65 err = 0.2798 (1:50 real, 10.9809 ms/iter, ETA 131:27:57)
[/code][/QUOTE]


Attached v1.65 x64 binaries (untested): [LIST][*]CUDA 4.0 / SM 2.0[*]CUDA 4.1 / SM 2.0[*]CUDA 4.1 / SM 2.1[/LIST]EDIT: Just tried running 1.65 4.1 | 2.0 and it quit right after displaying the initial startup stuff. I switched back to 1.64 because I have to go to work.

Karl M Johnson 2012-03-12 15:26

[CODE]>CUDALucas.exe -d 1 -threads 512 -c 10000 -t 216091
DEVICE:1------------------------
name GeForce GTX 480
totalGlobalMem 1610612736
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
totalConstMem 65536
major.minor 2.0
clockRate 1640000
textureAlignment 512
deviceOverlap 1
multiProcessorCount 15
too small Exponent 216091
>pause
Press any key to continue . . .[/CODE]

Why?
CUDALucas no longer accepts small exponents?

LaurV 2012-03-12 18:01

[QUOTE=flashjh;292731]Attached v1.65 x64 binaries (untested): [LIST][*]CUDA 4.0 / SM 2.0[*]CUDA 4.1 / SM 2.0[*]CUDA 4.1 / SM 2.1[/LIST]EDIT: Just tried running 1.65 4.1 | 2.0 and it quit right after displaying the initial startup stuff. I switched back to 1.64 because I have to go to work.[/QUOTE]
Up and running, thanks to both of you, msft and Jerry. I tested all possible thread settings; the fastest on the GTX 580 is 512. 1024 brings a small penalty, no idea why: theoretically the threads would also queue with 512, since despite the fact that there are 512 cores, they would never all work at the same time for CL only.

The -agressive switch is OK and works perfectly: it brings a bit more speed (as in 15% more!!), but the computer is less responsive (as argued before).

I love the default variant (polite): it is slower, but the computer is more responsive and at least Mrs LaurV can write her mails... :P so this part of the headache is gone... :smile:. Because of that, we will ignore the fact that the spelling of aggressive is wrong :P (anyhow, "-a" would suffice too)

We started the testing of 45221537 (see discussion before) on TWO cards (cheap gtx580, 1.5G mem, 782MHz clock, no OC).

[B]We love the output format![/B] (it shows error=, real time=, eta=, 4 decimals for ms/iter, wonderful!)

[B]We love the [-s backup] switch[/B], we can arrange the things as we like in the folder now. It is working, we tried it.

[B]We love the speed[/B], we get between 4.5 and 5.1 ms/iter, without -t, and with 512 threads, with the default FFT size. That is faster than before, when we got 5.3-5.6 ms/iter on average.

[B]We don't love the fact that [/B][B]older checkpoints are not compatible with the new one[/B]. That is why the test had to be restarted from scratch (it was close to finishing, maybe 10-20 hours to go; now we need to wait again about 60 hours or more). This is minor, as there will not be many cases where one restarts "old tests". If you have old tests running that are, say, half done, better let them finish before updating. Otherwise it is worth restarting, v1.65 is soooo much nicer!

[B]It is not possible to test small expos anymore[/B]. We don't love that either, but considering that we only need to test big expos... well... to take the billion-digit prize, say... that is satisfactory :smile:

[B][COLOR=Red]We don't love -f switch[/COLOR][/B], because we don't know what values are allowed. Some documentation would be nice, if not all values are accepted. We understand the "use it at your own risk" idea, but we don't like programs crashing... We tried random values smaller than the default. The error for this expo is 0.07 at the default FFT size of 2621440, so we figured that a slightly smaller FFT, one for which the error could go to 0.1 or even 0.2 or thereabouts, would speed things up a bit, but all the values we tried resulted in CL crashing with "unhandled exception, please report to microsoft".

The good news is that since we started writing this mail, we got 30 rows of text in each window, and all residues match what we saved in the previous run with 1.64. We think we will stop one card and give it some mfaktc to do, and let only one finish this expo.

msft 2012-03-12 19:37

[QUOTE=LaurV;292756][B][COLOR=Red]We don't love -f switch[/COLOR][/B], because we don't know what values are allowed. Some documentation would be nice, if not all values are accepted.[/QUOTE]
Multiples of 32768 (threads=256)
Multiples of 65536 (threads=512)
Multiples of 131072 (threads=1024)
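
The three values listed above are all equal to threads × 128 (256 × 128 = 32768, 512 × 128 = 65536, 1024 × 128 = 131072). Assuming that is the general rule, a value for -f could be checked before use with something like the sketch below. The function name `fft_length_is_valid` is hypothetical and is not part of CUDALucas.

```c
/* Hypothetical check, assuming the rule "fft_length must be a multiple of
   threads * 128", which is inferred from the three values listed above.
   Returns 1 if the length is acceptable, 0 otherwise. */
int fft_length_is_valid(long fft_length, int threads)
{
    long step;

    if (threads <= 0 || fft_length <= 0)
        return 0;
    step = (long)threads * 128;
    return fft_length % step == 0;
}
```

For example, the default length 2621440 mentioned above is 40 × 65536, so it passes for 512 threads, while an arbitrary value such as 2500000 does not.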

flashjh 2012-03-13 01:38

From CUDALucas.cu: smallest exponent is now 6,972,593

[CODE]
if (q < 6972593)
printf (" too small Exponent %d\n", q);
[/CODE]

msft 2012-03-13 03:09

[QUOTE=Karl M Johnson;292742]Why?
CUDALucas no longer accepts small exponents?[/QUOTE]
[code]
normalize2_kernel <<< N / threads / 128, 128 >>> (g_x, threads, bigAB, bigAB, g_err, g_carry, N, error_log, g_inv, g_ttp, g_ttmp, g_inv2, g_ttp2, g_ttmp2, g_inv3, g_ttp3, g_ttmp3);
[/code]
threads = 1024
1024 * 128 = 131072
131072 is min fft length.

msft 2012-03-13 03:14

[QUOTE=apsen;292627]That does not match first time test. I guess I better rerun it with P95.[/QUOTE]
[code]
Verified test results
Exponent User name Computer name Residue Date found
29198173 msft Manual testing 6FD7E4D6557F5B77 2012-03-12 00:48
29198173 msft 6FD7E4D6557F5B77 2012-03-13 02:50
[/code]
What on your mind?

Dubslow 2012-03-13 03:16

[QUOTE=msft;292827][code]
normalize2_kernel <<< N / threads / 128, 128 >>> (g_x, threads, bigAB, bigAB, g_err, g_carry, N, error_log, g_inv, g_ttp, g_ttmp, g_inv2, g_ttp2, g_ttmp2, g_inv3, g_ttp3, g_ttmp3);
[/code]
threads = 1024
1024 * 128 = 131072
131072 is min fft length.[/QUOTE]

Two suggestions:

1) Check thread count before checking exponent (so that if threads =512, you can do a 512*64K FFT, or a 256*32K FFT for 256 threads).
1b) Select total number of threads after getting exponent to test (perhaps a warning about low GPU utilization)

2) Even if 1024 threads is selected, you can just continue the test anyways (but perhaps warn the user that below a certain threshold the efficiency will massively drop).

(Obviously how these choices would interact with a manually selected FFT size or thread count would have to be figured out, but this is just to get the ball rolling.)
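
To get the ball rolling on 1b), here is a rough C sketch that picks a thread count once the FFT length is known. It assumes the "FFT length must be a multiple of threads × 128" rule also holds for the thread counts below 256 from the usage line, which nobody has confirmed here. The name `pick_threads` is made up, and taking the largest count that fits is only for illustration, not a claim about which one is fastest.

```c
/* Hypothetical helper: return the largest thread count from the usage line
   (1024 ... 32) for which fft_length is a multiple of threads * 128,
   or 0 if none fits. */
int pick_threads(long fft_length)
{
    static const int candidates[] = { 1024, 512, 256, 128, 64, 32 };
    int i;

    if (fft_length <= 0)
        return 0;
    for (i = 0; i < (int)(sizeof candidates / sizeof candidates[0]); i++) {
        if (fft_length % ((long)candidates[i] * 128) == 0)
            return candidates[i];
    }
    return 0;
}
```

A result below some threshold (or 0) would be the place to print the low-utilization warning.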


[QUOTE=msft;292830][code]
Verified test results
Exponent User name Computer name Residue Date found
29198173 msft Manual testing 6FD7E4D6557F5B77 2012-03-12 00:48
29198173 msft 6FD7E4D6557F5B77 2012-03-13 02:50
[/code]
What on your mind?[/QUOTE]
Were those both done on GPU, or was one done on Prime95? (When apsen first posted his reply, only one of your tests was visible, so no one was able to tell that there was a match.)

flashjh 2012-03-13 04:33

[QUOTE=flashjh;292731]EDIT: Just tried running 1.65 4.1 | 2.0 and it quit right after displaying the initial startup stuff. I switched back to 1.64 because I have to go to work.[/QUOTE]

I have everything running on 1.65 now. Who knows why it wouldn't work? I was in a hurry so I probably had a switch set wrong.

Thanks for the updates msft.

Karl M Johnson 2012-03-13 07:26

Ok, so now the first exponent to run DCs on is 6972593.
Okay:smile:
That's 2h here.

Karl M Johnson 2012-03-13 09:44

Used latest binaries, cuda 4.1, sm_20(thanks!)
[CODE]M( 6972593 )P, n = 393216, CUDALucas v1.65
[/CODE]

flashjh 2012-03-14 02:46

CL 1.65 success
 
1 Attachment(s)
[CODE]Processing result: M( 26071663 )C, 0x48620a8eaadcaeb7, n = 1572864, CUDALucas v1.65
LL test successfully completes double-check of M26071663
[/CODE]

EDIT: Attached full run .txt file with all results.

EDIT2: It's working really well now msft. Thanks everyone for all the work on this!

apsen 2012-03-14 13:50

[QUOTE=msft;292830][code]
Verified test results
Exponent User name Computer name Residue Date found
29198173 msft Manual testing 6FD7E4D6557F5B77 2012-03-12 00:48
29198173 msft 6FD7E4D6557F5B77 2012-03-13 02:50
[/code]
What on your mind?[/QUOTE]

I mean GIMPS reports this:

[CODE]Unverified LL 3F6F8AA0E00307__ by "Olaf Fiebig"[/CODE]

Brain 2012-03-14 14:53

Meanwhile, I've had another good DC with 1.64, switching to 1.65.

I looked into the 1.65 code for "ag(g)ressive" setting, line 691:
[CODE]if (!agressive_f)
cutilSafeCall (cudaMemcpy
(&l_err, g_err, sizeof (double),
cudaMemcpyDeviceToHost));[/CODE]So the implementation doesn't use a wait timer - works as before in CL 1.2.
Basically, calling a method only to do the waiting... :cmd: Nevertheless, I like it. Haven't tried a(g)gressive param yet. Will do that when 2nd GPU (GTX 680) is there.

flashjh 2012-03-14 22:52

1 Attachment(s)
[QUOTE=Brain;292995]Meanwhile, I've had another good DC with 1.64, switching to 1.65.

I looked into the 1.65 code for "ag(g)ressive" setting, line 691:
[CODE]if (!agressive_f)
cutilSafeCall (cudaMemcpy
(&l_err, g_err, sizeof (double),
cudaMemcpyDeviceToHost));[/CODE]So the implementation doesn't use a wait timer - works as before in CL 1.2.
Basically, calling a method only to do the waiting... :cmd: Nevertheless, I like it. Haven't tried a(g)gressive param yet. Will do that when 2nd GPU (GTX 680) is there.[/QUOTE]

Per request, attached CUDALucas 1.65 with x64 MAKEFILE included.

