mersenneforum.org

CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

flashjh 2012-03-19 17:38

[QUOTE=Prime95;293442]I added extern "C" to make MSVC 2010 happy. Extern "C" overrides name-mangling.

Is the new version faster for you? Does it work OK?[/QUOTE]

1.66:
[CODE]
start M26193103 fft length = 1572864
Iteration 10000 M( 26193103 )C, 0x5c3d0847657d8cff, n = 1572864, CUDALucas v1.66 err = 0.02176 (0:26 real, 2.5921 ms/iter, ETA 18:51:01)
Iteration 20000 M( 26193103 )C, 0x1ef2c5a292c0fdb6, n = 1572864, CUDALucas v1.66 err = 0.02219 (0:25 real, 2.5387 ms/iter, ETA 18:27:17)
Iteration 30000 M( 26193103 )C, 0x9a07463702e8aa32, n = 1572864, CUDALucas v1.66 err = 0.02219 (0:26 real, 2.5744 ms/iter, ETA 18:42:25)
Iteration 40000 M( 26193103 )C, 0x2f16825930638d20, n = 1572864, CUDALucas v1.66 err = 0.02219 (0:27 real, 2.6960 ms/iter, ETA 19:35:00)
Iteration 50000 M( 26193103 )C, 0x41e02e29604eb893, n = 1572864, CUDALucas v1.66 err = 0.02219 (0:27 real, 2.7101 ms/iter, ETA 19:40:41)
Iteration 60000 M( 26193103 )C, 0x5609ea689ce4cf4d, n = 1572864, CUDALucas v1.66 err = 0.02219 (0:26 real, 2.5640 ms/iter, ETA 18:36:36)
[/CODE]
1.67:
[CODE]
start M26193103 fft length = 1572864
Iteration 10000 M( 26193103 )C, 0x5c3d0847657d8cff, n = 1572864, CUDALucas v1.67 err = 0.023 (0:26 real, 2.5482 ms/iter, ETA 18:31:52)
Iteration 20000 M( 26193103 )C, 0x1ef2c5a292c0fdb6, n = 1572864, CUDALucas v1.67 err = 0.023 (0:25 real, 2.5156 ms/iter, ETA 18:17:14)
Iteration 30000 M( 26193103 )C, 0x9a07463702e8aa32, n = 1572864, CUDALucas v1.67 err = 0.023 (0:24 real, 2.4494 ms/iter, ETA 17:47:56)
Iteration 40000 M( 26193103 )C, 0x2f16825930638d20, n = 1572864, CUDALucas v1.67 err = 0.023 (0:25 real, 2.5086 ms/iter, ETA 18:13:20)
Iteration 50000 M( 26193103 )C, 0x41e02e29604eb893, n = 1572864, CUDALucas v1.67 err = 0.023 (0:25 real, 2.5100 ms/iter, ETA 18:13:30)
Iteration 60000 M( 26193103 )C, 0x5609ea689ce4cf4d, n = 1572864, CUDALucas v1.67 err = 0.023 (0:25 real, 2.4933 ms/iter, ETA 18:05:48)
[/CODE]

It's faster :smile: I'll compare a full run tomorrow when this one is done.
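For anyone who wants to repeat this comparison on their own logs, here is a minimal sketch in Python. It assumes the progress-line layout shown above, uses only the first three lines of each run as sample data, and the names `LINE_RE` and `parse` are made up for this example. It checks that the interim residues agree at each shared iteration and then compares the average ms/iter.

```python
import re

# Pulls the iteration count, 64-bit residue and ms/iter out of a progress line.
LINE_RE = re.compile(
    r"Iteration (\d+) M\( \d+ \)C, (0x[0-9a-f]{16}),.*?real, ([\d.]+) ms/iter"
)

def parse(log):
    """Return {iteration: (residue, ms_per_iter)} for every progress line."""
    return {int(m.group(1)): (m.group(2), float(m.group(3)))
            for m in LINE_RE.finditer(log)}

# Sample data: the first three progress lines of each run, copied from the post.
LOG_166 = """\
Iteration 10000 M( 26193103 )C, 0x5c3d0847657d8cff, n = 1572864, CUDALucas v1.66 err = 0.02176 (0:26 real, 2.5921 ms/iter, ETA 18:51:01)
Iteration 20000 M( 26193103 )C, 0x1ef2c5a292c0fdb6, n = 1572864, CUDALucas v1.66 err = 0.02219 (0:25 real, 2.5387 ms/iter, ETA 18:27:17)
Iteration 30000 M( 26193103 )C, 0x9a07463702e8aa32, n = 1572864, CUDALucas v1.66 err = 0.02219 (0:26 real, 2.5744 ms/iter, ETA 18:42:25)
"""
LOG_167 = """\
Iteration 10000 M( 26193103 )C, 0x5c3d0847657d8cff, n = 1572864, CUDALucas v1.67 err = 0.023 (0:26 real, 2.5482 ms/iter, ETA 18:31:52)
Iteration 20000 M( 26193103 )C, 0x1ef2c5a292c0fdb6, n = 1572864, CUDALucas v1.67 err = 0.023 (0:25 real, 2.5156 ms/iter, ETA 18:17:14)
Iteration 30000 M( 26193103 )C, 0x9a07463702e8aa32, n = 1572864, CUDALucas v1.67 err = 0.023 (0:24 real, 2.4494 ms/iter, ETA 17:47:56)
"""

old, new = parse(LOG_166), parse(LOG_167)
shared = sorted(set(old) & set(new))

# Both versions should report the same interim residue at the same iteration.
residues_match = all(old[i][0] == new[i][0] for i in shared)

mean_old = sum(old[i][1] for i in shared) / len(shared)
mean_new = sum(new[i][1] for i in shared) / len(shared)

print("residues match:", residues_match)
print(f"mean ms/iter: v1.66 {mean_old:.4f}, v1.67 {mean_new:.4f}")
print(f"v1.67 change: {100.0 * (mean_new - mean_old) / mean_old:+.1f}%")
```

Matching interim residues only show that the two builds agree with each other up to that point; the full run is still needed to confirm the final result.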

Brain 2012-03-19 18:01

[QUOTE]Could we also have the device info when no parameter is entered and usage is printed? Helps finding the device number...[/QUOTE]
Dedicated param -devices will be better...

kladner 2012-03-19 18:03

[QUOTE=Svenie25;293496]Thanks a lot.

I found my error. CL created an ini file containing the number of the line at which to start. I deleted this file and then it worked.

Again, thanks a lot.[/QUOTE]

I'm glad you found the glitch.

flashjh 2012-03-20 01:20

1 Attachment(s)
[QUOTE=msft;293444]I believe 64 bit & SM1.3 is best.[/QUOTE]

Attached v1.67 x64 binaries:

- CUDA 3.2 / SM 1.3

I tested it; let me know how it works for you.

Polite and aggressive:
[CODE]
C:\CUDA\src>CUDALucas1.67.x64.3.2.sm13.exe -d 1 -r
DEVICE:1------------------------
name GeForce GTX 580
totalGlobalMem 1610612736
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
totalConstMem 65536
major.minor 2.0
clockRate 1600000
textureAlignment 512
deviceOverlap 1
multiProcessorCount 16
Iteration 10000 M( 86243 )C, 0x23992ccd735a03d9, n = 8192, CUDALucas v1.67 err = 1.788e-007 (0:03 real, 0.2453 ms/iter, ETA 0:17)
Iteration 10000 M( 132049 )C, 0x4c52a92b54635f9e, n = 8192, CUDALucas v1.67 err= 0.0004272 (0:02 real, 0.2452 ms/iter, ETA 0:29)
Iteration 10000 M( 216091 )C, 0x30247786758b8792, n = 16384, CUDALucas v1.67 err = 1.144e-005 (0:02 real, 0.2508 ms/iter, ETA 0:50)
Iteration 10000 M( 756839 )C, 0x5d2cbe7cb24a109a, n = 40960, CUDALucas v1.67 err = 0.0293 (0:03 real, 0.3425 ms/iter, ETA 4:13)
Iteration 10000 M( 859433 )C, 0x3c4ad525c2d0aed0, n = 49152, CUDALucas v1.67 err = 0.009033 (0:04 real, 0.3565 ms/iter, ETA 4:59)
Iteration 10000 M( 1257787 )C, 0x3f45bf9bea7213ea, n = 73728, CUDALucas v1.67 err = 0.00618 (0:04 real, 0.4060 ms/iter, ETA 8:23)
Iteration 10000 M( 1398269 )C, 0xa4a6d2f0e34629db, n = 73728, CUDALucas v1.67 err = 0.08594 (0:04 real, 0.4034 ms/iter, ETA 9:16)
Iteration 10000 M( 2976221 )C, 0x2a7111b7f70fea2f, n = 163840, CUDALucas v1.67 err = 0.04297 (0:05 real, 0.5785 ms/iter, ETA 28:32)
Iteration 10000 M( 3021377 )C, 0x6387a70a85d46baf, n = 163840, CUDALucas v1.67 err = 0.0625 (0:06 real, 0.5912 ms/iter, ETA 29:39)
Iteration 10000 M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, CUDALucas v1.67 err = 0.04297 (0:10 real, 0.9328 ms/iter, ETA 1:48:12)
Iteration 10000 M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, CUDALucas v1.67 err = 0.0293 (0:16 real, 1.5683 ms/iter, ETA 5:51:34)
Iteration 10000 M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, CUDALucas v1.67 err = 0.08594 (0:21 real, 2.0962 ms/iter, ETA 12:12:58)
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.67 err = 0.2031 (0:23 real, 2.2819 ms/iter, ETA 15:13:30)
Iteration 10000 M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, CUDALucas v1.67 err = 0.01807 (0:27 real, 2.7415 ms/iter, ETA 19:45:41)
Iteration 10000 M( 30402457 )C, 0x0b8600ef47e69d27, n = 1835008, CUDALucas v1.67 err = 0.04736 (0:31 real, 3.0937 ms/iter, ETA 26:06:56)
Iteration 10000 M( 32582657 )C, 0x02751b7fcec76bb1, n = 1835008, CUDALucas v1.67 err = 0.2422 (0:31 real, 3.0718 ms/iter, ETA 27:47:28)
err = 0.411133, increasing n from 1966080
Iteration 10000 M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, CUDALucas v1.67 err = 0.1099 (0:33 real, 3.2419 ms/iter, ETA 33:26:44)
Iteration 10000 M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, CUDALucas v1.67 err = 0.1953 (0:39 real, 3.8797 ms/iter, ETA 45:56:31)
Iteration 10000 M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, CUDALucas v1.67 err = 0.2656 (0:38 real, 3.7448 ms/iter, ETA 44:50:02)
C:\CUDA\src>CUDALucas1.67.x64.3.2.sm13.exe -d 1 -aggressive -r
DEVICE:1------------------------
name GeForce GTX 580
totalGlobalMem 1610612736
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
totalConstMem 65536
major.minor 2.0
clockRate 1600000
textureAlignment 512
deviceOverlap 1
multiProcessorCount 16
Iteration 10000 M( 86243 )C, 0x23992ccd735a03d9, n = 8192, CUDALucas v1.67 err = 1.788e-007 (0:01 real, 0.1059 ms/iter, ETA 0:07)
Iteration 10000 M( 132049 )C, 0x4c52a92b54635f9e, n = 8192, CUDALucas v1.67 err= 0.0004272 (0:01 real, 0.1060 ms/iter, ETA 0:12)
Iteration 10000 M( 216091 )C, 0x30247786758b8792, n = 16384, CUDALucas v1.67 err = 1.144e-005 (0:01 real, 0.1068 ms/iter, ETA 0:21)
Iteration 10000 M( 756839 )C, 0x5d2cbe7cb24a109a, n = 40960, CUDALucas v1.67 err = 0.0293 (0:01 real, 0.1468 ms/iter, ETA 1:48)
Iteration 10000 M( 859433 )C, 0x3c4ad525c2d0aed0, n = 49152, CUDALucas v1.67 err = 0.009033 (0:02 real, 0.1502 ms/iter, ETA 2:06)
Iteration 10000 M( 1257787 )C, 0x3f45bf9bea7213ea, n = 73728, CUDALucas v1.67 err = 0.00618 (0:02 real, 0.1762 ms/iter, ETA 3:38)
Iteration 10000 M( 1398269 )C, 0xa4a6d2f0e34629db, n = 73728, CUDALucas v1.67 err = 0.08594 (0:01 real, 0.1658 ms/iter, ETA 3:48)
Iteration 10000 M( 2976221 )C, 0x2a7111b7f70fea2f, n = 163840, CUDALucas v1.67 err = 0.04297 (0:04 real, 0.3241 ms/iter, ETA 15:59)
Iteration 10000 M( 3021377 )C, 0x6387a70a85d46baf, n = 163840, CUDALucas v1.67 err = 0.0625 (0:03 real, 0.3239 ms/iter, ETA 16:14)
Iteration 10000 M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, CUDALucas v1.67 err = 0.04297 (0:07 real, 0.6658 ms/iter, ETA 1:17:14)
Iteration 10000 M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, CUDALucas v1.67 err = 0.0293 (0:12 real, 1.2409 ms/iter, ETA 4:38:09)
Iteration 10000 M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, CUDALucas v1.67 err = 0.08594 (0:18 real, 1.7983 ms/iter, ETA 10:28:47)
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.67 err = 0.2031 (0:19 real, 1.9525 ms/iter, ETA 13:01:39)
Iteration 10000 M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, CUDALucas v1.67 err = 0.01807 (0:24 real, 2.3899 ms/iter, ETA 17:13:37)
Iteration 10000 M( 30402457 )C, 0x0b8600ef47e69d27, n = 1835008, CUDALucas v1.67 err = 0.04736 (0:27 real, 2.7081 ms/iter, ETA 22:51:39)
Iteration 10000 M( 32582657 )C, 0x02751b7fcec76bb1, n = 1835008, CUDALucas v1.67 err = 0.2422 (0:27 real, 2.7177 ms/iter, ETA 24:35:14)
err = 0.40625, increasing n from 1966080
Iteration 10000 M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, CUDALucas v1.67 err = 0.1099 (0:29 real, 2.9287 ms/iter, ETA 30:12:52)
Iteration 10000 M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, CUDALucas v1.67 err = 0.1953 (0:33 real, 3.3846 ms/iter, ETA 40:04:46)
Iteration 10000 M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, CUDALucas v1.67 err = 0.2656 (0:34 real, 3.3845 ms/iter, ETA 40:31:12)
[/CODE]

flashjh 2012-03-20 02:25

CUDA 3.2 sm13 is a lot faster

This is on a GTX 580: roughly 2.3 ms/iter with 3.2 sm13 versus roughly 2.6 to 2.7 ms/iter with 4.1 sm20.

4.1 sm20:
[CODE]e:\cuda2\CUDALucas1.67.cuda4.1.sm_20.x64.exe -d 1 -threads 512 -c 10000 -aggressive 26193103 >> 26193103.txt
Iteration 10880000 M( 26193103 )C, 0x7ab4bd1491575cfb, n = 1572864, CUDALucas v1.67 err = 0.0232 (0:26 real, [COLOR=red]2.6770[/COLOR] ms/iter, ETA [COLOR=red]11:23:04[/COLOR])
Iteration 10890000 M( 26193103 )C, 0x04d40edbdff48f4a, n = 1572864, CUDALucas v1.67 err = 0.0232 (0:27 real, [COLOR=red]2.6778[/COLOR] ms/iter, ETA [COLOR=red]11:22:49[/COLOR])
Iteration 10900000 M( 26193103 )C, 0xb9b9207366261cbe, n = 1572864, CUDALucas v1.67 err = 0.0232 (0:27 real, [COLOR=red]2.6783[/COLOR] ms/iter, ETA [COLOR=red]11:22:31[/COLOR])
Iteration 10910000 M( 26193103 )C, 0x2738902e2e87743d, n = 1572864, CUDALucas v1.67 err = 0.0232 (0:26 real, [COLOR=red]2.6097[/COLOR] ms/iter, ETA [COLOR=red]11:04:36[/COLOR])
^C caught. Writing checkpoint.[/CODE]

3.2 sm13:
[CODE]CUDALucas1.67.cuda3.2.sm_13.x64.exe -d 1 -threads 512 -c 10000 -aggressive 26193103 >> 26193103.txt
continuing work from a partial result M26193103 fft length = 1572864 iteration = 10919002
Iteration 10920000 M( 26193103 )C, 0xa5a7b77eb9aafd24, n = 1572864, CUDALucas v1.67 err = 0.01762 (0:02 real, 0.2327 ms/iter, ETA 59:12)
Iteration 10930000 M( 26193103 )C, 0xf8b54ad25990bc15, n = 1572864, CUDALucas v1.67 err = 0.01904 (0:23 real, [COLOR=red]2.3203[/COLOR] ms/iter, ETA [COLOR=red]9:50:07[/COLOR])
Iteration 10940000 M( 26193103 )C, 0xd6d0c49220fdb2b1, n = 1572864, CUDALucas v1.67 err = 0.02002 (0:24 real, [COLOR=red]2.3264[/COLOR] ms/iter, ETA [COLOR=red]9:51:17[/COLOR])
Iteration 10950000 M( 26193103 )C, 0xa4757f98b2a34eea, n = 1572864, CUDALucas v1.67 err = 0.02002 (0:23 real, [COLOR=red]2.3349[/COLOR] ms/iter, ETA [COLOR=red]9:53:04[/COLOR])
[/CODE]

We'll see if it matches...
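As a rough cross-check of what the ms/iter figures mean for the rest of the run, the sketch below multiplies the remaining iterations by the reported ms/iter. It assumes the test needs p - 2 squarings in total (the usual Lucas-Lehmer count) and that the printed ETA is derived in roughly this way; neither is confirmed by the program output, and the function names are made up for this example.

```python
def eta_seconds(exponent, iteration, ms_per_iter):
    """Estimate the remaining run time in seconds.

    Assumption: a Lucas-Lehmer test of M(p) performs p - 2 squarings.
    """
    remaining = (exponent - 2) - iteration
    return remaining * ms_per_iter / 1000.0

def hms(seconds):
    """Format a duration in seconds as h:mm:ss."""
    s = int(round(seconds))
    return f"{s // 3600}:{s % 3600 // 60:02d}:{s % 60:02d}"

P = 26193103

# One progress line from each build, values copied from the logs above.
eta_sm20 = eta_seconds(P, 10880000, 2.6770)  # CUDA 4.1 sm_20, logged ETA 11:23:04
eta_sm13 = eta_seconds(P, 10930000, 2.3203)  # CUDA 3.2 sm_13, logged ETA 9:50:07

print("CUDA 4.1 sm_20 estimate:", hms(eta_sm20))
print("CUDA 3.2 sm_13 estimate:", hms(eta_sm13))
```

The estimates should land within a few seconds of the logged ETAs, which suggests the gap between the two builds is on the order of an hour and a half over the second half of this exponent.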

ixfd64 2012-03-20 02:41

It's nice to know that George is starting to work on GPU programming. :smile:

Karl M Johnson 2012-03-20 05:29

I've got a match with the sm_13/CUDA 4.1 x64 build of the latest CUDALucas, using the same smallest exponent.

flashjh 2012-03-20 13:35

1.67 Success
 
CUDALucas1.67.cuda3.2.sm_13.x64.exe -d 1 -threads 512 -c 10000 -aggressive 26193103

[CODE]Processing result: M( 26193103 )C, 0x9ad25d21f58dbda8, n = 1572864, CUDALucas v1.67
LL test successfully completes double-check of M26193103
[/CODE]

flashjh 2012-03-21 02:59

1 Attachment(s)
[QUOTE=flashjh;293318]I'm sure I'm missing something, but what is the method to choose the best FFT size? Where did you get these values?[/QUOTE]

[QUOTE=msft;293340]Hi, flashjh[/QUOTE]

Attached cufftbench x64 binaries:

- CUDA 3.2 | SM 1.3
- CUDA 4.0 | SM 2.0
- CUDA 4.1 | SM 2.0
- CUDA 4.1 | SM 2.1

It only supports the first video card.

@msft: Two things. 1) Can you incorporate the -d option in this program without too much trouble? 2) I looked through the source and didn't see a way to specify a range; can you incorporate a range option? Thanks.

Edit: .h files and makefiles are for compiling - not needed to run.

msft 2012-03-21 05:52

[QUOTE=Brain;293515]Dedicated param -devices will be better...[/QUOTE]
Understand.

msft 2012-03-21 05:56

[QUOTE=flashjh;293653]@msft: Two things. 1) Can you incorporate the -d option in this program without too much trouble? 2) I looked through the source and didn't see a way to specify a range; can you incorporate a range option? Thanks.[/QUOTE]
I'll merge it into CUDALucas.
$ ./CUDALucas -d 1 -cufftbench 1048576,2097152,65536
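
The post does not say what the three comma-separated values after -cufftbench mean. One plausible reading, and it is only an assumption here, is start length, end length and step. Under that reading the FFT lengths to be benchmarked could be listed with a short Python sketch like this (the function name is hypothetical and nothing below is taken from the CUDALucas source):

```python
def cufftbench_lengths(arg):
    """Expand a 'start,end,step' string into a list of FFT lengths.

    Assumption: the three values are start, end (inclusive) and step.
    The thread does not confirm this interpretation.
    """
    start, end, step = (int(x) for x in arg.split(","))
    if step <= 0 or end < start:
        raise ValueError("expected start <= end and step > 0")
    return list(range(start, end + 1, step))

lengths = cufftbench_lengths("1048576,2097152,65536")
print(len(lengths), "lengths from", lengths[0], "to", lengths[-1])
```

If that reading is right, the range would include 1572864, the FFT length used for M26193103 earlier in the thread.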


All times are UTC.
