mersenneforum.org  

Old 2012-03-19, 07:54   #1024
Karl M Johnson
Mar 2010
3×137 Posts

The results are always the same for 4 different modes: gpu0 cl1, gpu0 cl2, gpu1 cl1, gpu1 cl2.
Code:
DEVICE:1------------------------
name                GeForce GTX 480
totalGlobalMem      1610612736
sharedMemPerBlock   49152
regsPerBlock        32768
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      65535,65535,65535
totalConstMem       65536
major.minor         2.0
clockRate           1640000
textureAlignment    512
deviceOverlap       1
multiProcessorCount 15
Iteration 10000 M( 86243 )C, 0x23992ccd735a03d9, n = 8192, CUDALucas v1.67 err = 1.901e-007 (0:02 real, 0.2024 ms/iter, ETA 0:14)
Iteration 10000 M( 132049 )C, 0x4c52a92b54635f9e, n = 8192, CUDALucas v1.67 err = 0.0004187 (0:02 real, 0.2025 ms/iter, ETA 0:24)
Iteration 10000 M( 216091 )C, 0x30247786758b8792, n = 16384, CUDALucas v1.67 err = 1.15e-005 (0:02 real, 0.2015 ms/iter, ETA 0:40)
Iteration 10000 M( 756839 )C, 0x5d2cbe7cb24a109a, n = 40960, CUDALucas v1.67 err = 0.0317 (0:03 real, 0.2481 ms/iter, ETA 3:03)
Iteration 10000 M( 859433 )C, 0x3c4ad525c2d0aed0, n = 49152, CUDALucas v1.67 err = 0.009213 (0:02 real, 0.2503 ms/iter, ETA 3:30)
Iteration 10000 M( 1257787 )C, 0x3f45bf9bea7213ea, n = 73728, CUDALucas v1.67 err = 0.006912 (0:03 real, 0.3152 ms/iter, ETA 6:30)
Iteration 10000 M( 1398269 )C, 0xa4a6d2f0e34629db, n = 73728, CUDALucas v1.67 err = 0.08477 (0:04 real, 0.3244 ms/iter, ETA 7:27)
Iteration 10000 M( 2976221 )C, 0x2a7111b7f70fea2f, n = 163840, CUDALucas v1.67 err = 0.04649 (0:05 real, 0.4984 ms/iter, ETA 24:35)
Iteration 10000 M( 3021377 )C, 0x6387a70a85d46baf, n = 163840, CUDALucas v1.67 err = 0.06791 (0:06 real, 0.5889 ms/iter, ETA 29:32)
Iteration 10000 M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, CUDALucas v1.67 err = 0.04772 (0:10 real, 1.0405 ms/iter, ETA 2:00:41)
Iteration 10000 M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, CUDALucas v1.67 err = 0.0295 (0:18 real, 1.7384 ms/iter, ETA 6:29:41)
Iteration 10000 M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, CUDALucas v1.67 err = 0.08511 (0:22 real, 2.2505 ms/iter, ETA 13:06:55)
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.67 err = 0.2073 (0:26 real, 2.5972 ms/iter, ETA 17:19:44)
Iteration 10000 M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, CUDALucas v1.67 err = 0.01915 (0:31 real, 3.0897 ms/iter, ETA 22:16:18)
Iteration 10000 M( 30402457 )C, 0x0b8600ef47e69d27, n = 1835008, CUDALucas v1.67 err = 0.02111 (0:35 real, 3.4515 ms/iter, ETA 29:08:11)
Iteration 10000 M( 32582657 )C, 0x02751b7fcec76bb1, n = 1835008, CUDALucas v1.67 err = 0.1135 (0:35 real, 3.4586 ms/iter, ETA 31:17:25)
err = 0.378309, increasing n from 1966080
Iteration 10000 M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, CUDALucas v1.67 err = 0.1061 (0:35 real, 3.4426 ms/iter, ETA 35:30:59)
Iteration 10000 M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, CUDALucas v1.67 err = 0.1855 (0:43 real, 4.2987 ms/iter, ETA 50:54:15)
Iteration 10000 M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, CUDALucas v1.67 err = 0.2697 (0:43 real, 4.3005 ms/iter, ETA 51:29:13)
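Checking that "the results are always the same" across runs can be automated. The following is a minimal Python sketch, not part of CUDALucas: it assumes progress lines shaped exactly like the ones quoted above, and the names `LINE_RE`, `parse_log` and `mismatches` are hypothetical helpers introduced only for illustration.

```python
import re

# Pattern for one progress line, modelled on the log lines quoted in this thread.
# The layout is an assumption taken from these v1.67 logs only.
LINE_RE = re.compile(
    r"Iteration (?P<iteration>\d+) M\( (?P<exponent>\d+) \)C, "
    r"0x(?P<residue>[0-9a-f]{16}), n = (?P<fft>\d+), "
    r"CUDALucas v(?P<version>[\d.]+) err = (?P<err>[\d.e+-]+) "
    r"\((?P<real>[\d:]+) real, (?P<ms>[\d.]+) ms/iter, ETA (?P<eta>[\d:]+)\)"
)


def parse_log(text):
    """Map (exponent, iteration) -> 64-bit residue (hex string) for each progress line."""
    residues = {}
    for m in LINE_RE.finditer(text):
        residues[(int(m["exponent"]), int(m["iteration"]))] = m["residue"]
    return residues


def mismatches(log_a, log_b):
    """Return the (exponent, iteration) keys present in both logs whose residues differ."""
    a, b = parse_log(log_a), parse_log(log_b)
    return sorted(key for key in a.keys() & b.keys() if a[key] != b[key])
```

An empty list from `mismatches(run1, run2)` means every residue the two runs have in common agrees; the err and ms/iter fields are expected to differ between cards and are ignored here.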
Old 2012-03-19, 08:10   #1025
msft
Jul 2009
Tokyo
1001100010₂ Posts

Quote:
Originally Posted by Karl M Johnson
The results are always the same for 4 different modes: gpu0 cl1, gpu0 cl2, gpu1 cl1, gpu1 cl2.
Thank you for the report.
Old 2012-03-19, 08:17   #1026
Karl M Johnson
Mar 2010
3×137 Posts

Actually, it was my mistake: GPU2, which has no monitor attached and is not in SLI, had not been stress tested.
I found out that it was unstable at a certain clock.

Now running the DC on the smallest exponent again.
Old 2012-03-19, 09:23   #1027
LaurV
Romulan Interpreter
Jun 2011
Thailand
2·5·859 Posts

Any sources and binaries for v1.68? (the one with interactive aggressive/polite mode). I will be home in about 2-3 hours and I am eager to try it. Anyhow, if not, I will still keep you posted with v1.65's progress. I understand that you have other things to do too, sorry for being such a pain in the butt.
Old 2012-03-19, 12:00   #1028
Karl M Johnson
Mar 2010
3·137 Posts

DC successful!
2^6972593 - 1 is indeed a prime
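For readers unfamiliar with what the double-check verifies: CUDALucas runs the Lucas-Lehmer test, which the sketch below shows in plain Python for small exponents. It is only an illustration of the recurrence, not CUDALucas's implementation (which does the squaring with FFTs on the GPU); `lucas_lehmer` is a name chosen here, and the function assumes its argument is a prime exponent.

```python
def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, assuming p itself is prime."""
    if p == 2:
        return True  # 2**2 - 1 = 3 is prime; the recurrence below needs p > 2
    m = (1 << p) - 1  # the Mersenne number being tested
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m  # one squaring per iteration, p - 2 iterations in total
    return s == 0


# Small exponents only: arbitrary-precision squaring like this is far too slow
# for an exponent such as 6972593.
print([p for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31) if lucas_lehmer(p)])
```

The test needs p - 2 squarings modulo 2^p - 1, which is why run time grows with both the exponent and the FFT length n shown in the logs.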
Old 2012-03-19, 12:37   #1029
Svenie25
Aug 2008
Good old Germany
141₁₀ Posts

Hi guys.

Could someone please tell me what the input file for CL has to look like? I tried the exponents alone, and also the line from P95's worktodo.txt, but CL always says it is starting with the first exponent and then closes.

Thanks in advance.
Old 2012-03-19, 12:51   #1030
Karl M Johnson
Mar 2010
3·137 Posts

Code:
CUDALucas.exe -d 1 -threads 512 -c 25000 -t -aggressive 6972593
Run CUDALucas without arguments to see what each option means.
Old 2012-03-19, 12:54   #1031
LaurV
Romulan Interpreter
Jun 2011
Thailand
2×5×859 Posts

version 1.67, polite and aggressive:
(still not interactively changeable)

Code:
CUDALucas1.67.cuda4.1.sm_20.x64.exe -d 1 -r
DEVICE:1------------------------
name                GeForce GTX 580
totalGlobalMem      1610612736
sharedMemPerBlock   49152
regsPerBlock        32768
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      65535,65535,65535
totalConstMem       65536
major.minor         2.0
clockRate           1564000
textureAlignment    512
deviceOverlap       1
multiProcessorCount 16
Iteration 10000 M( 86243 )C, 0x23992ccd735a03d9, n = 8192, CUDALucas v1.67 err = 1.919e-007 (0:02 real, 0.2334 ms/iter, ETA 0:16)
Iteration 10000 M( 132049 )C, 0x4c52a92b54635f9e, n = 8192, CUDALucas v1.67 err = 0.0004515 (0:02 real, 0.2340 ms/iter, ETA 0:28)
Iteration 10000 M( 216091 )C, 0x30247786758b8792, n = 16384, CUDALucas v1.67 err = 1.14e-005 (0:03 real, 0.2316 ms/iter, ETA 0:46)
Iteration 10000 M( 756839 )C, 0x5d2cbe7cb24a109a, n = 40960, CUDALucas v1.67 err = 0.0295 (0:03 real, 0.2828 ms/iter, ETA 3:29)
Iteration 10000 M( 859433 )C, 0x3c4ad525c2d0aed0, n = 49152, CUDALucas v1.67 err = 0.009473 (0:02 real, 0.2930 ms/iter, ETA 4:06)
Iteration 10000 M( 1257787 )C, 0x3f45bf9bea7213ea, n = 73728, CUDALucas v1.67 err = 0.006119 (0:04 real, 0.3601 ms/iter, ETA 7:26)
Iteration 10000 M( 1398269 )C, 0xa4a6d2f0e34629db, n = 73728, CUDALucas v1.67 err = 0.09116 (0:04 real, 0.3570 ms/iter, ETA 8:12)
Iteration 10000 M( 2976221 )C, 0x2a7111b7f70fea2f, n = 163840, CUDALucas v1.67 err = 0.04841 (0:05 real, 0.5641 ms/iter, ETA 27:49)
Iteration 10000 M( 3021377 )C, 0x6387a70a85d46baf, n = 163840, CUDALucas v1.67 err = 0.06637 (0:06 real, 0.5643 ms/iter, ETA 28:18)
Iteration 10000 M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, CUDALucas v1.67 err = 0.05295 (0:11 real, 1.1262 ms/iter, ETA 2:10:38)
Iteration 10000 M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, CUDALucas v1.67 err = 0.02841 (0:19 real, 1.8848 ms/iter, ETA 7:02:30)
Iteration 10000 M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, CUDALucas v1.67 err = 0.08614 (0:25 real, 2.4236 ms/iter, ETA 14:07:26)
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.67 err = 0.216 (0:27 real, 2.6855 ms/iter, ETA 17:55:06)
Iteration 10000 M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, CUDALucas v1.67 err = 0.01812 (0:32 real, 3.1922 ms/iter, ETA 23:00:37)
Iteration 10000 M( 30402457 )C, 0x0b8600ef47e69d27, n = 1835008, CUDALucas v1.67 err = 0.02299 (0:35 real, 3.5650 ms/iter, ETA 30:05:40)
Iteration 10000 M( 32582657 )C, 0x02751b7fcec76bb1, n = 1835008, CUDALucas v1.67 err = 0.1126 (0:36 real, 3.5962 ms/iter, ETA 32:32:08)
err = 0.384875, increasing n from 1966080
Iteration 10000 M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, CUDALucas v1.67 err = 0.1081 (0:35 real, 3.5168 ms/iter, ETA 36:16:52)
Iteration 10000 M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, CUDALucas v1.67 err = 0.1898 (0:45 real, 4.4142 ms/iter, ETA 52:16:15)
Iteration 10000 M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, CUDALucas v1.67 err = 0.2643 (0:41 real, 4.1197 ms/iter, ETA 49:19:18)

>CUDALucas1.67.cuda4.1.sm_20.x64.exe -d 1 -aggressive -r
DEVICE:1------------------------
name                GeForce GTX 580
totalGlobalMem      1610612736
sharedMemPerBlock   49152
regsPerBlock        32768
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      65535,65535,65535
totalConstMem       65536
major.minor         2.0
clockRate           1564000
textureAlignment    512
deviceOverlap       1
multiProcessorCount 16
Iteration 10000 M( 86243 )C, 0x23992ccd735a03d9, n = 8192, CUDALucas v1.67 err = 1.919e-007 (0:01 real, 0.0802 ms/iter, ETA 0:05)
Iteration 10000 M( 132049 )C, 0x4c52a92b54635f9e, n = 8192, CUDALucas v1.67 err = 0.0004515 (0:00 real, 0.0802 ms/iter, ETA 0:09)
Iteration 10000 M( 216091 )C, 0x30247786758b8792, n = 16384, CUDALucas v1.67 err = 1.14e-005 (0:01 real, 0.0792 ms/iter, ETA 0:15)
Iteration 10000 M( 756839 )C, 0x5d2cbe7cb24a109a, n = 40960, CUDALucas v1.67 err = 0.0295 (0:01 real, 0.1082 ms/iter, ETA 1:20)
Iteration 10000 M( 859433 )C, 0x3c4ad525c2d0aed0, n = 49152, CUDALucas v1.67 err = 0.009473 (0:02 real, 0.1181 ms/iter, ETA 1:39)
Iteration 10000 M( 1257787 )C, 0x3f45bf9bea7213ea, n = 73728, CUDALucas v1.67 err = 0.006119 (0:01 real, 0.1842 ms/iter, ETA 3:48)
Iteration 10000 M( 1398269 )C, 0xa4a6d2f0e34629db, n = 73728, CUDALucas v1.67 err = 0.09116 (0:02 real, 0.1939 ms/iter, ETA 4:27)
Iteration 10000 M( 2976221 )C, 0x2a7111b7f70fea2f, n = 163840, CUDALucas v1.67 err = 0.04841 (0:04 real, 0.3753 ms/iter, ETA 18:30)
Iteration 10000 M( 3021377 )C, 0x6387a70a85d46baf, n = 163840, CUDALucas v1.67 err = 0.06637 (0:04 real, 0.3770 ms/iter, ETA 18:54)
Iteration 10000 M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, CUDALucas v1.67 err = 0.05295 (0:08 real, 0.7606 ms/iter, ETA 1:28:13)
Iteration 10000 M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, CUDALucas v1.67 err = 0.02841 (0:14 real, 1.4295 ms/iter, ETA 5:20:26)
Iteration 10000 M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, CUDALucas v1.67 err = 0.08614 (0:20 real, 1.9823 ms/iter, ETA 11:33:09)
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.67 err = 0.216 (0:23 real, 2.2765 ms/iter, ETA 15:11:21)
Iteration 10000 M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, CUDALucas v1.67 err = 0.01812 (0:28 real, 2.7817 ms/iter, ETA 20:03:04)
Iteration 10000 M( 30402457 )C, 0x0b8600ef47e69d27, n = 1835008, CUDALucas v1.67 err = 0.02299 (0:31 real, 3.1177 ms/iter, ETA 26:19:07)
Iteration 10000 M( 32582657 )C, 0x02751b7fcec76bb1, n = 1835008, CUDALucas v1.67 err = 0.1126 (0:31 real, 3.1220 ms/iter, ETA 28:14:44)
err = 0.373917, increasing n from 1966080
Iteration 10000 M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, CUDALucas v1.67 err = 0.1081 (0:32 real, 3.1166 ms/iter, ETA 32:09:09)
Iteration 10000 M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, CUDALucas v1.67 err = 0.1898 (0:39 real, 3.9440 ms/iter, ETA 46:42:13)
Iteration 10000 M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, CUDALucas v1.67 err = 0.2643 (0:40 real, 3.9444 ms/iter, ETA 47:13:22)
All this time P95 was running, and CL 1.65 was crunching the 26248759 DC on the second card (20 minutes to go).
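The ETA column in these logs looks roughly consistent with "remaining iterations times ms/iter". Below is a small Python sketch of that arithmetic, under the assumption that a Lucas-Lehmer test of exponent p needs p - 2 iterations; `eta_seconds` and `format_hms` are hypothetical helpers, and this is not claimed to be how CUDALucas itself computes its ETA.

```python
def eta_seconds(exponent, iteration, ms_per_iter):
    """Estimate the remaining wall time in seconds from the current iteration rate."""
    remaining = (exponent - 2) - iteration  # the LL test runs p - 2 iterations
    return remaining * ms_per_iter / 1000.0


def format_hms(seconds):
    """Format seconds as h:mm:ss, or m:ss when under an hour, like the log's ETA field."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}" if hours else f"{minutes}:{secs:02d}"


# M6972593 on the GTX 580, values taken from the two self-test runs above.
polite = eta_seconds(6972593, 10000, 1.1262)
aggressive = eta_seconds(6972593, 10000, 0.7606)
print(format_hms(polite), format_hms(aggressive), round(polite / aggressive, 2))
```

For this exponent the estimate lands within a few seconds of the logged ETAs (2:10:38 polite, 1:28:13 aggressive), and the ratio of the two rates is about 1.48 in favour of aggressive mode. The gap narrows for the largest exponents in the same logs.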
Old 2012-03-19, 14:47   #1032
kladner
"Kieren"
Jul 2011
In My Own Galaxy!
2×5×991 Posts

Quote:
Originally Posted by Svenie25
Hi guys.

Could someone please tell me what the input file for CL has to look like? I tried the exponents alone, and also the line from P95's worktodo.txt, but CL always says it is starting with the first exponent and then closes.

Thanks in advance.
I just tried the following:
Code:
E:\CUDA\CUDALucas166.x64>CUDALucas1.66.cuda4.1.sm_21.x64 -t -c10000 -threads 512 -s check worktodo.txt
DEVICE:0------------------------
name                GeForce GTX 460
totalGlobalMem      1073741824
sharedMemPerBlock   49152
regsPerBlock        32768
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      65535,65535,65535
totalConstMem       65536
major.minor         2.1
clockRate           1700000
textureAlignment    512
deviceOverlap       1
multiProcessorCount 7
mkdir: cannot create directory `check': File exists
Start test of file 'worktodo.txt'

continuing work from a partial result M26116807 fft length = 1572864 iteration = 14178
Iteration 20000 M( 26116807 )C, 0xca672378e7d6596a, n = 1572864, CUDALucas v1.66 err = 0.02349 (0:37 real, 3.6748 ms/iter, ETA 26:37:56)
Iteration 30000 M( 26116807 )C, 0x3252f697aa7b19ce, n = 1572864, CUDALucas v1.66 err = 0.02716 (1:03 real, 6.3077 ms/iter, ETA 45:41:43)
^C caught.  Writing checkpoint.
worktodo.txt had two test exponents (from my completed double-checks), see attached. I also tried it with the two exponents in reversed order, and it started with the correct one. (Note that "check" is the folder I made for checkpoint files to be saved in. That also seems to be working correctly.)

I hope this helps.

EDIT: I stated incorrectly in a previous post that the worktodo.txt in the command line would be preceded by -r. LaurV corrected this error. "-r" runs a self-test.
Attached Files
File Type: txt worktodo.txt (22 Bytes, 63 views)

Last fiddled with by kladner on 2012-03-19 at 14:50
Old 2012-03-19, 15:07   #1033
Svenie25
Aug 2008
Good old Germany
3·47 Posts

Thanks a lot.

I found my error. CL had created an ini file holding the number of the line to start from. I deleted this file and then it worked.

Again, thanks a lot.
Old 2012-03-19, 16:17   #1034
Brain
Dec 2009
Peine, Germany
331₁₀ Posts
Timings (best values)

Quote:
Originally Posted by Prime95
I added extern "C" to make MSVC 2010 happy. Extern "C" overrides name-mangling.

Is the new version faster for you? Does it work OK?
Code:
1.65 polite    : M( 29309279 )C, n = 1835008, CUDALucas v1.65 err = 0.009593 (1:01 real, 6.0932 ms/iter, ETA 49:20:17)
1.67 polite    : M( 29359303 )C, n = 1835008, CUDALucas v1.67 err = 0.009615 (0:57 real, 5.6353 ms/iter, ETA 39:39:58)
1.67 aggressive: M( 29359303 )C, n = 1835008, CUDALucas v1.67 err = 0.009195 (0:53 real, 5.3320 ms/iter, ETA 37:28:58)
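To put the three timings in relative terms, here is a short Python sketch. The ms/iter figures are copied from the lines above; `percent_faster` is a helper name introduced here. The 1.65 run used a different exponent (29309279 instead of 29359303), but all three ran at the same FFT length n = 1835008, so the per-iteration times should be comparable.

```python
# ms/iter as reported above for the GTX 560 Ti, all at n = 1835008.
timings = {
    "1.65 polite": 6.0932,
    "1.67 polite": 5.6353,
    "1.67 aggressive": 5.3320,
}


def percent_faster(baseline_ms, new_ms):
    """Reduction in time per iteration relative to the baseline, in percent."""
    return 100.0 * (baseline_ms - new_ms) / baseline_ms


baseline = timings["1.65 polite"]
for label, ms in timings.items():
    print(f"{label:16s} {ms:.4f} ms/iter  {percent_faster(baseline, ms):5.2f}% faster than 1.65 polite")
```

On these numbers, 1.67 polite takes about 7.5% less time per iteration than 1.65 polite, and 1.67 aggressive about 12.5% less.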
Code:
DEVICE:0------------------------
name                GeForce GTX 560 Ti
totalGlobalMem      1073741824
sharedMemPerBlock   49152
regsPerBlock        32768
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      65535,65535,65535
totalConstMem       65536
major.minor         2.1
clockRate           1645000
textureAlignment    512
deviceOverlap       1
multiProcessorCount 8
Could we also have the device info printed when no parameter is entered and the usage is shown? It would help in finding the device number.

Last fiddled with by Brain on 2012-03-19 at 16:18 Reason: typo
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.