mersenneforum.org

CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

Karl M Johnson 2012-03-19 07:54

The results are always the same for 4 different modes: gpu0 cl1, gpu0 cl2, gpu1 cl1, gpu1 cl2.
[CODE]DEVICE:1------------------------
name GeForce GTX 480
totalGlobalMem 1610612736
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
totalConstMem 65536
major.minor 2.0
clockRate 1640000
textureAlignment 512
deviceOverlap 1
multiProcessorCount 15
Iteration 10000 M( 86243 )C, 0x23992ccd735a03d9, n = 8192, CUDALucas v1.67 err = 1.901e-007 (0:02 real, 0.2024 ms/iter, ETA 0:14)
Iteration 10000 M( 132049 )C, 0x4c52a92b54635f9e, n = 8192, CUDALucas v1.67 err = 0.0004187 (0:02 real, 0.2025 ms/iter, ETA 0:24)
Iteration 10000 M( 216091 )C, 0x30247786758b8792, n = 16384, CUDALucas v1.67 err = 1.15e-005 (0:02 real, 0.2015 ms/iter, ETA 0:40)
Iteration 10000 M( 756839 )C, 0x5d2cbe7cb24a109a, n = 40960, CUDALucas v1.67 err = 0.0317 (0:03 real, 0.2481 ms/iter, ETA 3:03)
Iteration 10000 M( 859433 )C, 0x3c4ad525c2d0aed0, n = 49152, CUDALucas v1.67 err = 0.009213 (0:02 real, 0.2503 ms/iter, ETA 3:30)
Iteration 10000 M( 1257787 )C, 0x3f45bf9bea7213ea, n = 73728, CUDALucas v1.67 err = 0.006912 (0:03 real, 0.3152 ms/iter, ETA 6:30)
Iteration 10000 M( 1398269 )C, 0xa4a6d2f0e34629db, n = 73728, CUDALucas v1.67 err = 0.08477 (0:04 real, 0.3244 ms/iter, ETA 7:27)
Iteration 10000 M( 2976221 )C, 0x2a7111b7f70fea2f, n = 163840, CUDALucas v1.67 err = 0.04649 (0:05 real, 0.4984 ms/iter, ETA 24:35)
Iteration 10000 M( 3021377 )C, 0x6387a70a85d46baf, n = 163840, CUDALucas v1.67 err = 0.06791 (0:06 real, 0.5889 ms/iter, ETA 29:32)
Iteration 10000 M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, CUDALucas v1.67 err = 0.04772 (0:10 real, 1.0405 ms/iter, ETA 2:00:41)
Iteration 10000 M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, CUDALucas v1.67 err = 0.0295 (0:18 real, 1.7384 ms/iter, ETA 6:29:41)
Iteration 10000 M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, CUDALucas v1.67 err = 0.08511 (0:22 real, 2.2505 ms/iter, ETA 13:06:55)
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.67 err = 0.2073 (0:26 real, 2.5972 ms/iter, ETA 17:19:44)
Iteration 10000 M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, CUDALucas v1.67 err = 0.01915 (0:31 real, 3.0897 ms/iter, ETA 22:16:18)
Iteration 10000 M( 30402457 )C, 0x0b8600ef47e69d27, n = 1835008, CUDALucas v1.67 err = 0.02111 (0:35 real, 3.4515 ms/iter, ETA 29:08:11)
Iteration 10000 M( 32582657 )C, 0x02751b7fcec76bb1, n = 1835008, CUDALucas v1.67 err = 0.1135 (0:35 real, 3.4586 ms/iter, ETA 31:17:25)
err = 0.378309, increasing n from 1966080
Iteration 10000 M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, CUDALucas v1.67 err = 0.1061 (0:35 real, 3.4426 ms/iter, ETA 35:30:59)
Iteration 10000 M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, CUDALucas v1.67 err = 0.1855 (0:43 real, 4.2987 ms/iter, ETA 50:54:15)
Iteration 10000 M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, CUDALucas v1.67 err = 0.2697 (0:43 real, 4.3005 ms/iter, ETA 51:29:13)[/CODE]
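Checking that runs in different modes "are always the same" comes down to comparing the 64-bit hex residue printed for each exponent and iteration. A minimal Python sketch, assuming log lines in exactly the `Iteration N M( p )C, 0x..., ...` format shown above; `parse_residues` and `compare_runs` are hypothetical helper names, not part of CUDALucas:

```python
import re

# Matches e.g. "Iteration 10000 M( 86243 )C, 0x23992ccd735a03d9, n = 8192, ..."
# (assumes the residue is always printed as 16 zero-padded hex digits, as in these logs)
LINE_RE = re.compile(r"Iteration\s+(\d+)\s+M\(\s*(\d+)\s*\)C,\s*0x([0-9a-fA-F]{16})")

def parse_residues(log_text):
    """Map (exponent, iteration) -> lowercase hex residue for every matching line."""
    residues = {}
    for m in LINE_RE.finditer(log_text):
        iteration, exponent = int(m.group(1)), int(m.group(2))
        residues[(exponent, iteration)] = m.group(3).lower()
    return residues

def compare_runs(log_a, log_b):
    """Return the (exponent, iteration) keys found in both logs whose residues differ."""
    a, b = parse_residues(log_a), parse_residues(log_b)
    return sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
```

For example, the M( 86243 ) line from this GTX 480 run and the one from LaurV's GTX 580 run in this thread both report 0x23992ccd735a03d9, so `compare_runs` returns an empty list for them even though their err values (1.901e-007 vs 1.919e-007) differ slightly.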

msft 2012-03-19 08:10

[QUOTE=Karl M Johnson;293464]The results are always the same for 4 different modes: gpu0 cl1, gpu0 cl2, gpu1 cl1, gpu1 cl2.
[/QUOTE]
Thank you for the report.

Karl M Johnson 2012-03-19 08:17

Actually, it was my mistake: GPU2, which has no monitor attached and is not in SLI, had not been stress tested.
I found out that it was unstable at a certain clock speed.

Now running a DC on the smallest exponent again.

LaurV 2012-03-19 09:23

Any sources and binaries for v1.68? (the one with interactive aggressive/polite mode). I will be home in about 2-3 hours and I am eager to try it. Anyhow, if not, I will still keep you posted with v1.65's progress. I understand that you have other things to do too, sorry for being such a pain in the butt. :blush:

Karl M Johnson 2012-03-19 12:00

DC successful!
2^6972593 - 1 is indeed prime :smile:
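A double-check (DC) like this reruns the Lucas-Lehmer test: start from s = 4, then replace s with s^2 - 2 modulo M_p = 2^p - 1, p - 2 times; for an odd prime p, M_p is prime exactly when the final value is 0. CUDALucas does the squaring with large FFTs on the GPU. The sketch below swaps in Python's built-in big integers instead, so it is only a readable illustration for small exponents, not how CUDALucas is implemented:

```python
def lucas_lehmer(p):
    """Return True if M_p = 2**p - 1 is prime. p is assumed to be prime."""
    if p == 2:
        return True  # M_2 = 3; the recurrence below is stated for odd p
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m  # Python's % keeps s in [0, m)
    return s == 0

if __name__ == "__main__":
    small_primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 61, 89, 107, 127]
    # prints [2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127]
    print([p for p in small_primes if lucas_lehmer(p)])
```

Running p = 6972593 this way in pure Python would be far too slow, which is why GPU and FFT-based programs are used for exponents of this size.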

Svenie25 2012-03-19 12:37

Hi guys.

Could someone please tell me what the input file for CL has to look like? I tried the exponents alone and the line from Prime95's worktodo.txt, but CL always tells me to start with the first exponent and then closes.

Thanks in advance.

Karl M Johnson 2012-03-19 12:51

[CODE]CUDALucas.exe -d 1 -threads 512 -c 25000 -t -aggressive 6972593[/CODE]
Run CUDALucas without arguments to see what each option means.

LaurV 2012-03-19 12:54

Version 1.67, polite and aggressive:
(still not switchable interactively)

[CODE]
CUDALucas1.67.cuda4.1.sm_20.x64.exe -d 1 -r
DEVICE:1------------------------
name GeForce GTX 580
totalGlobalMem 1610612736
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
totalConstMem 65536
major.minor 2.0
clockRate 1564000
textureAlignment 512
deviceOverlap 1
multiProcessorCount 16
Iteration 10000 M( 86243 )C, 0x23992ccd735a03d9, n = 8192, CUDALucas v1.67 err = 1.919e-007 (0:02 real, 0.2334 ms/iter, ETA 0:16)
Iteration 10000 M( 132049 )C, 0x4c52a92b54635f9e, n = 8192, CUDALucas v1.67 err = 0.0004515 (0:02 real, 0.2340 ms/iter, ETA 0:28)
Iteration 10000 M( 216091 )C, 0x30247786758b8792, n = 16384, CUDALucas v1.67 err = 1.14e-005 (0:03 real, 0.2316 ms/iter, ETA 0:46)
Iteration 10000 M( 756839 )C, 0x5d2cbe7cb24a109a, n = 40960, CUDALucas v1.67 err = 0.0295 (0:03 real, 0.2828 ms/iter, ETA 3:29)
Iteration 10000 M( 859433 )C, 0x3c4ad525c2d0aed0, n = 49152, CUDALucas v1.67 err = 0.009473 (0:02 real, 0.2930 ms/iter, ETA 4:06)
Iteration 10000 M( 1257787 )C, 0x3f45bf9bea7213ea, n = 73728, CUDALucas v1.67 err = 0.006119 (0:04 real, 0.3601 ms/iter, ETA 7:26)
Iteration 10000 M( 1398269 )C, 0xa4a6d2f0e34629db, n = 73728, CUDALucas v1.67 err = 0.09116 (0:04 real, 0.3570 ms/iter, ETA 8:12)
Iteration 10000 M( 2976221 )C, 0x2a7111b7f70fea2f, n = 163840, CUDALucas v1.67 err = 0.04841 (0:05 real, 0.5641 ms/iter, ETA 27:49)
Iteration 10000 M( 3021377 )C, 0x6387a70a85d46baf, n = 163840, CUDALucas v1.67 err = 0.06637 (0:06 real, 0.5643 ms/iter, ETA 28:18)
Iteration 10000 M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, CUDALucas v1.67 err = 0.05295 (0:11 real, 1.1262 ms/iter, ETA 2:10:38)
Iteration 10000 M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, CUDALucas v1.67 err = 0.02841 (0:19 real, 1.8848 ms/iter, ETA 7:02:30)
Iteration 10000 M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, CUDALucas v1.67 err = 0.08614 (0:25 real, 2.4236 ms/iter, ETA 14:07:26)
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.67 err = 0.216 (0:27 real, 2.6855 ms/iter, ETA 17:55:06)
Iteration 10000 M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, CUDALucas v1.67 err = 0.01812 (0:32 real, 3.1922 ms/iter, ETA 23:00:37)
Iteration 10000 M( 30402457 )C, 0x0b8600ef47e69d27, n = 1835008, CUDALucas v1.67 err = 0.02299 (0:35 real, 3.5650 ms/iter, ETA 30:05:40)
Iteration 10000 M( 32582657 )C, 0x02751b7fcec76bb1, n = 1835008, CUDALucas v1.67 err = 0.1126 (0:36 real, 3.5962 ms/iter, ETA 32:32:08)
err = 0.384875, increasing n from 1966080
Iteration 10000 M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, CUDALucas v1.67 err = 0.1081 (0:35 real, 3.5168 ms/iter, ETA 36:16:52)
Iteration 10000 M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, CUDALucas v1.67 err = 0.1898 (0:45 real, 4.4142 ms/iter, ETA 52:16:15)
Iteration 10000 M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, CUDALucas v1.67 err = 0.2643 (0:41 real, 4.1197 ms/iter, ETA 49:19:18)

>CUDALucas1.67.cuda4.1.sm_20.x64.exe -d 1 -aggressive -r
DEVICE:1------------------------
name GeForce GTX 580
totalGlobalMem 1610612736
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
totalConstMem 65536
major.minor 2.0
clockRate 1564000
textureAlignment 512
deviceOverlap 1
multiProcessorCount 16
Iteration 10000 M( 86243 )C, 0x23992ccd735a03d9, n = 8192, CUDALucas v1.67 err = 1.919e-007 (0:01 real, 0.0802 ms/iter, ETA 0:05)
Iteration 10000 M( 132049 )C, 0x4c52a92b54635f9e, n = 8192, CUDALucas v1.67 err = 0.0004515 (0:00 real, 0.0802 ms/iter, ETA 0:09)
Iteration 10000 M( 216091 )C, 0x30247786758b8792, n = 16384, CUDALucas v1.67 err = 1.14e-005 (0:01 real, 0.0792 ms/iter, ETA 0:15)
Iteration 10000 M( 756839 )C, 0x5d2cbe7cb24a109a, n = 40960, CUDALucas v1.67 err = 0.0295 (0:01 real, 0.1082 ms/iter, ETA 1:20)
Iteration 10000 M( 859433 )C, 0x3c4ad525c2d0aed0, n = 49152, CUDALucas v1.67 err = 0.009473 (0:02 real, 0.1181 ms/iter, ETA 1:39)
Iteration 10000 M( 1257787 )C, 0x3f45bf9bea7213ea, n = 73728, CUDALucas v1.67 err = 0.006119 (0:01 real, 0.1842 ms/iter, ETA 3:48)
Iteration 10000 M( 1398269 )C, 0xa4a6d2f0e34629db, n = 73728, CUDALucas v1.67 err = 0.09116 (0:02 real, 0.1939 ms/iter, ETA 4:27)
Iteration 10000 M( 2976221 )C, 0x2a7111b7f70fea2f, n = 163840, CUDALucas v1.67 err = 0.04841 (0:04 real, 0.3753 ms/iter, ETA 18:30)
Iteration 10000 M( 3021377 )C, 0x6387a70a85d46baf, n = 163840, CUDALucas v1.67 err = 0.06637 (0:04 real, 0.3770 ms/iter, ETA 18:54)
Iteration 10000 M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, CUDALucas v1.67 err = 0.05295 (0:08 real, 0.7606 ms/iter, ETA 1:28:13)
Iteration 10000 M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, CUDALucas v1.67 err = 0.02841 (0:14 real, 1.4295 ms/iter, ETA 5:20:26)
Iteration 10000 M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, CUDALucas v1.67 err = 0.08614 (0:20 real, 1.9823 ms/iter, ETA 11:33:09)
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.67 err = 0.216 (0:23 real, 2.2765 ms/iter, ETA 15:11:21)
Iteration 10000 M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, CUDALucas v1.67 err = 0.01812 (0:28 real, 2.7817 ms/iter, ETA 20:03:04)
Iteration 10000 M( 30402457 )C, 0x0b8600ef47e69d27, n = 1835008, CUDALucas v1.67 err = 0.02299 (0:31 real, 3.1177 ms/iter, ETA 26:19:07)
Iteration 10000 M( 32582657 )C, 0x02751b7fcec76bb1, n = 1835008, CUDALucas v1.67 err = 0.1126 (0:31 real, 3.1220 ms/iter, ETA 28:14:44)
err = 0.373917, increasing n from 1966080
Iteration 10000 M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, CUDALucas v1.67 err = 0.1081 (0:32 real, 3.1166 ms/iter, ETA 32:09:09)
Iteration 10000 M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, CUDALucas v1.67 err = 0.1898 (0:39 real, 3.9440 ms/iter, ETA 46:42:13)
Iteration 10000 M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, CUDALucas v1.67 err = 0.2643 (0:40 real, 3.9444 ms/iter, ETA 47:13:22)[/CODE]

All this time Prime95 was running, and CL 1.65 was crunching a DC of M26248759 on the second card (20 minutes to go).
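To put numbers on the polite vs. aggressive comparison above, the relevant figure is the ratio of the ms/iter columns. A small sketch using three pairs copied from the two v1.67 -r runs; the helper name `speedup` is ours, not a CUDALucas option:

```python
# (exponent, polite ms/iter, aggressive ms/iter), copied from the two runs above
TIMINGS = [
    (86243, 0.2334, 0.0802),
    (6972593, 1.1262, 0.7606),
    (43112609, 4.1197, 3.9444),
]

def speedup(polite_ms, aggressive_ms):
    """How many times faster aggressive mode is per iteration."""
    return polite_ms / aggressive_ms

for p, polite, aggressive in TIMINGS:
    # prints 2.91x, 1.48x and 1.04x for the three rows
    print(f"M{p}: {speedup(polite, aggressive):.2f}x")
```

On these figures the gain is largest for the smallest exponents and shrinks to a few percent at the largest FFT sizes.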

kladner 2012-03-19 14:47

[QUOTE=Svenie25;293478]Hi guys.

Could someone please tell me what the input file for CL has to look like? I tried the exponents alone and the line from Prime95's worktodo.txt, but CL always tells me to start with the first exponent and then closes.

Thanks in advance.[/QUOTE]

I just tried the following:
[CODE]E:\CUDA\CUDALucas166.x64>CUDALucas1.66.cuda4.1.sm_21.x64 -t -c10000 -threads 512 -s check worktodo.txt
DEVICE:0------------------------
name GeForce GTX 460
totalGlobalMem 1073741824
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
totalConstMem 65536
major.minor 2.1
clockRate 1700000
textureAlignment 512
deviceOverlap 1
multiProcessorCount 7
mkdir: cannot create directory `check': File exists
Start test of file 'worktodo.txt'

continuing work from a partial result M26116807 fft length = 1572864 iteration = 14178
Iteration 20000 M( 26116807 )C, 0xca672378e7d6596a, n = 1572864, CUDALucas v1.66 err = 0.02349 (0:37 real, 3.6748 ms/iter, ETA 26:37:56)
Iteration 30000 M( 26116807 )C, 0x3252f697aa7b19ce, n = 1572864, CUDALucas v1.66 err = 0.02716 (1:03 real, 6.3077 ms/iter, ETA 45:41:43)
^C caught. Writing checkpoint.[/CODE]
worktodo.txt contained two test exponents (from my completed double-checks); see attached. I also tried it with the two exponents in reversed order, and it started with the correct one. (Note that "check" is the folder I created for checkpoint files. That also seems to be working correctly.)

I hope this helps.

EDIT: In a previous post I incorrectly stated that the worktodo.txt on the command line should be preceded by -r. LaurV corrected this: "-r" runs a self-test.
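The ETA column in these logs can be reproduced approximately from the ms/iter figure: a Lucas-Lehmer test of M_p needs p - 2 iterations, so the remaining time is about (p - 2 - iteration) × ms/iter. This is our reading of the output, not CUDALucas's exact formula; for the log lines in this thread that show the iteration count, it lands within a few tens of seconds of the printed ETA. `eta_seconds` and `format_eta` are hypothetical helper names:

```python
def eta_seconds(p, iteration, ms_per_iter):
    """Approximate time left for a Lucas-Lehmer test of M_p (p - 2 iterations in total)."""
    remaining = max(p - 2 - iteration, 0)
    return remaining * ms_per_iter / 1000.0

def format_eta(seconds):
    """Format like the logs: M:SS under an hour, otherwise H:MM:SS (hours are not wrapped into days)."""
    total = int(seconds)
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}" if h else f"{m}:{s:02d}"

# The resumed run above: M26116807 at iteration 20000, 3.6748 ms/iter (printed ETA 26:37:56)
print(format_eta(eta_seconds(26116807, 20000, 3.6748)))  # prints 26:38:20
```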

Svenie25 2012-03-19 15:07

Thanks a lot.

I found my mistake. CL had created an .ini file recording the line number to start from. I deleted that file and then it worked.

Again, thanks a lot.

Brain 2012-03-19 16:17

Timings (best values)
 
[QUOTE=Prime95;293442]I added extern "C" to make MSVC 2010 happy. Extern "C" overrides name-mangling.

Is the new version faster for you? Does it work OK?[/QUOTE]
[CODE]1.65 polite : M( 29309279 )C, n = 1835008, CUDALucas v1.65 err = 0.009593 (1:01 real, [B]6.0932[/B] ms/iter, ETA 49:20:17)
1.67 polite : M( 29359303 )C, n = 1835008, CUDALucas v1.67 err = 0.009615 (0:57 real, [B]5.6353[/B] ms/iter, ETA 39:39:58)
1.67 aggressive: M( 29359303 )C, n = 1835008, CUDALucas v1.67 err = 0.009195 (0:53 real, [B]5.3320[/B] ms/iter, ETA 37:28:58)[/CODE]
[CODE]DEVICE:0------------------------
name GeForce GTX 560 Ti
totalGlobalMem 1073741824
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
totalConstMem 65536
major.minor 2.1
clockRate 1645000
textureAlignment 512
deviceOverlap 1
multiProcessorCount 8[/CODE]
Could we also have the device info printed when no parameters are given and the usage text is shown? It would help in finding the device number...


All times are UTC.