mersenneforum.org  

Old 2012-03-02, 15:39   #870
flashjh
 
 
"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts
Version 1.63 binaries

Quote:
Originally Posted by msft View Post
Ver 1.63
Only use complex to complex fft.
Code:
$ ./CUDALucas -r
DEVICE:0------------------------
name                GeForce GTX 460
totalGlobalMem      804454400
sharedMemPerBlock   49152
regsPerBlock        32768
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      65535,65535,65535
totalConstMem       65536
major.minor         2.1
clockRate           1350000
textureAlignment    512
deviceOverlap       1
multiProcessorCount 7
Iteration 10000 M( 756893 )C, 0xb94c673f25fe7ded, n = 65536, CUDALucas v1.63 (0:04 real, 0.3923 ms/iter, ETA 4:50)
Iteration 10000 M( 859433 )C, 0x3c4ad525c2d0aed0, n = 65536, CUDALucas v1.63 (0:04 real, 0.3830 ms/iter, ETA 5:21)
Iteration 10000 M( 1257787 )C, 0x3f45bf9bea7213ea, n = 98304, CUDALucas v1.63 (0:05 real, 0.5458 ms/iter, ETA 11:16)
Iteration 10000 M( 1398269 )C, 0xa4a6d2f0e34629db, n = 98304, CUDALucas v1.63 (0:06 real, 0.5427 ms/iter, ETA 12:28)
Iteration 10000 M( 2976221 )C, 0x2a7111b7f70fea2f, n = 163840, CUDALucas v1.63 (0:08 real, 0.7994 ms/iter, ETA 39:26)
Iteration 10000 M( 3021377 )C, 0x6387a70a85d46baf, n = 163840, CUDALucas v1.63 (0:08 real, 0.7788 ms/iter, ETA 39:04)
Iteration 10000 M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, CUDALucas v1.63 (0:19 real, 1.8962 ms/iter, ETA 3:39:57)
Iteration 10000 M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, CUDALucas v1.63 (0:37 real, 3.7644 ms/iter, ETA 14:03:50)
Iteration 10000 M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, CUDALucas v1.63 (0:51 real, 5.1375 ms/iter, ETA 29:56:25)
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.63 (1:00 real, 6.0356 ms/iter, ETA 40:16:16)
Iteration 10000 M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, CUDALucas v1.63 (1:14 real, 7.4174 ms/iter, ETA 53:28:01)
Iteration 10000 M( 30402457 )C, 0x0b8600ef47e69d27, n = 1835008, CUDALucas v1.63 (1:23 real, 8.2723 ms/iter, ETA 69:49:53)
Iteration 10000 M( 32582657 )C, 0x02751b7fcec76bb1, n = 1835008, CUDALucas v1.63 (1:23 real, 8.2783 ms/iter, ETA 74:53:45)
err = 0.367069, increasing n from 1966080
Iteration 10000 M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, CUDALucas v1.63 (1:30 real, 9.0108 ms/iter, ETA 92:57:40)
Iteration 10000 M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, CUDALucas v1.63 (1:46 real, 10.5272 ms/iter, ETA 124:39:33)
Iteration 10000 M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, CUDALucas v1.63 (1:45 real, 10.5222 ms/iter, ETA 125:58:25)
Attached v1.63 Win64 binaries:
  • CUDA 4.0 / SM 2.0
  • CUDA 4.1 / SM 2.0
  • CUDA 4.1 / SM 2.1
Attached Files
File Type: zip CUDALucas1.63.WIN64.zip (222.6 KB, 66 views)
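For anyone who wants to compare timings across cards or versions, the progress lines in the quote can be parsed mechanically. This is only a minimal Python sketch, assuming the line layout shown above. The names `parse_line` and `rough_eta_seconds` are made up for illustration and are not part of CUDALucas, and the ETA formula (remaining iterations times ms/iter) is an assumption about how the program arrives at its figure, not something taken from the source.

```python
import re

# Pattern for one progress line in the layout quoted above (assumed stable).
LINE_RE = re.compile(
    r"Iteration (?P<iter>\d+) M\( (?P<exp>\d+) \)C, "
    r"0x(?P<res>[0-9a-f]{16}), n = (?P<n>\d+), "
    r"CUDALucas v(?P<ver>[\d.]+) "
    r"\((?P<real>[\d:]+) real, (?P<ms>[\d.]+) ms/iter, ETA (?P<eta>[\d:]+)\)"
)

def parse_line(line):
    """Return the fields of one progress line as a dict, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "iteration": int(m["iter"]),
        "exponent": int(m["exp"]),
        "residue": m["res"],
        "fft_length": int(m["n"]),
        "ms_per_iter": float(m["ms"]),
        "eta": m["eta"],
    }

def rough_eta_seconds(rec):
    """Assumed estimate: iterations still to do times the reported ms/iter."""
    remaining = rec["exponent"] - rec["iteration"]
    return remaining * rec["ms_per_iter"] / 1000.0

sample = ("Iteration 10000 M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, "
          "CUDALucas v1.63 (0:19 real, 1.8962 ms/iter, ETA 3:39:57)")
rec = parse_line(sample)
print(rec["exponent"], rec["residue"], rec["fft_length"])
print(round(rough_eta_seconds(rec)), "seconds estimated; the log says", rec["eta"])
```

For this line the estimate lands within a few seconds of the printed ETA of 3:39:57, which suggests the assumption is at least close.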
Old 2012-03-02, 17:42   #871
flashjh
 
 
"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts

Quote:
Originally Posted by msft View Post
Ver 1.63
Only use complex to complex fft.
I've had a couple of random force closes so far, after 40000 iterations. If I reopen and continue, it doesn't happen again on the same exponents.

Residues are a match so far.

Any idea what change could cause the force closes?
Old 2012-03-02, 18:32   #872
LaurV
Romulan Interpreter
 
 
Jun 2011
Thailand

3×3,221 Posts

I am currently testing 26177689 and 26026433 using v1.61, and comparing the residues with the ones posted by Jerry (for his expo) and with the ones I have from the former run (for my exponent). I am about a third of the way through both expos, and so far all residues have matched.

In the process I developed the very simple batch file I was telling you about, to keep the resume files of CL. No big deal, very simple and dirty, but I found it REALLY useful and easy to use, so I would like to share it.

Code:
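@echo off
rem Checkpoint files to watch; change only the digits after the "t".
set file1=t26026433
set file2=t26177689
rem Counters used to number the saved copies.
set /A ext1=0
set /A ext2=0

:loop01
rem Wait 5 seconds: choice times out and takes the default answer.
choice /N /T 5 /D Y >nul
if exist %file1% goto rena1
if exist %file2% goto rena2
echo ... Nothing done at %TIME% ...
goto loop01

:rena1
set /A ext1=%ext1%+1
set dest=backup\%file1%_%ext1%.txt
rem Delete the original only if the copy succeeded, so nothing is lost when backup\ is missing.
copy /b %file1% %dest% >nul && (
  del %file1% >nul
  echo ... File %file1% Saved as %dest% at %TIME% ...
)
if not exist %file2% goto loop01

:rena2
set /A ext2=%ext2%+1
set dest=backup\%file2%_%ext2%.txt
copy /b %file2% %dest% >nul && (
  del %file2% >nul
  echo ... File %file2% Saved as %dest% at %TIME% ...
)
goto loop01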
To use it, first save it as a batch file, say keepresidues.bat, in the same folder where CUDALucas writes its temporary files (cXXXXX and tXXXXX). Then edit it and change the file1 and file2 lines to match the exponents you want to test. Keep the "t" and change only the digits. If you test only one expo, use a fake number for the second.

Then create a subfolder called "backup" inside the folder where you put the batch file. This is the home for the files you will save.

Then run the batch and forget the command prompt. The batch takes practically no CPU resources: every 5 seconds it checks whether a "tXXXXX" file exists, and if so, it moves the file to the backup folder, numbering each saved file with a counter. You can change the time interval by modifying the "choice" command. I used choice because it works better with Windows Vista and Windows 7. I am on Win7 64-bit right now.
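For readers not on Windows, the same polling idea can be sketched in Python. This is an illustration under assumptions, not a replacement for the batch: the function names `save_checkpoints` and `watch` are invented here, the sketch creates the "backup" folder itself (the batch expects you to create it), and the demonstration at the end runs against a throwaway directory with a fake file rather than real CUDALucas output.

```python
import shutil
import tempfile
import time
from pathlib import Path

def save_checkpoints(folder, names, counters):
    """Move each existing checkpoint file into folder/backup under a running number."""
    backup = Path(folder) / "backup"
    backup.mkdir(exist_ok=True)
    saved = []
    for name in names:
        src = Path(folder) / name
        if src.exists():
            counters[name] = counters.get(name, 0) + 1
            dest = backup / f"{name}_{counters[name]}.txt"
            shutil.move(str(src), str(dest))
            saved.append(dest)
    return saved

def watch(folder, names, interval=5.0, polls=None):
    """Poll every `interval` seconds; polls=None means keep going until interrupted."""
    counters = {}
    done = 0
    while polls is None or done < polls:
        for dest in save_checkpoints(folder, names, counters):
            print(f"... saved {dest.name} at {time.strftime('%H:%M:%S')} ...")
        done += 1
        time.sleep(interval)
    return counters

# Demonstration in a temporary directory with a fake checkpoint file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "t26026433").write_bytes(b"fake checkpoint")
    counts = watch(tmp, ["t26026433", "t26177689"], interval=0.01, polls=2)
    demo_files = sorted(p.name for p in (Path(tmp) / "backup").iterdir())
print(demo_files)
```

In real use you would call something like `watch(".", ["t26026433", "t26177689"])` from the CUDALucas folder and stop it with Ctrl+C.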

When (if) you get a residue mismatch, you will be able to re-run only the last iterations and see whether it was a hardware error or whether it is repeatable.

After you start the batch, you can launch your two copies of CUDALucas in the usual way. You can also launch them before the batch; in that case the file counter will not be aligned with your iteration count, but it does not matter, because in case of a mismatch you can still easily find out where to resume from. An alternative is to modify the two indexes (ext1 and ext2) to match your iteration count and step.

From time to time, if you see that the residues match the comparison file (assuming you have one from a previous run, yours or someone else's, like the one uploaded by Jerry, a screen copy of CL output, or one generated by P95), you can clear all or part of the backup folder. These files are huge, and if you generate one every 10k iterations, your hard disk will fill up in a few weeks.
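The comparison step itself is easy to automate. The sketch below assumes you have already collected two tables of iteration to residue (for example from saved screen output); the function name `last_good_checkpoint` is hypothetical, and apart from the first entry, which is the iteration-10000 residue quoted earlier in the thread, the residues in the example are made-up placeholders.

```python
def last_good_checkpoint(reference, current):
    """Compare two {iteration: residue} tables.

    Returns (last_matching_iteration, first_mismatching_iteration);
    either value is None if there is no such iteration.
    """
    last_good = None
    for iteration in sorted(current):
        if iteration not in reference:
            continue  # nothing to compare against at this iteration
        if current[iteration].lower() == reference[iteration].lower():
            last_good = iteration
        else:
            return last_good, iteration
    return last_good, None

# Placeholder data: only the 10000 entry comes from a real log line.
reference = {
    10000: "b94c673f25fe7ded",
    20000: "1111111111111111",
    30000: "2222222222222222",
}
current = {
    10000: "b94c673f25fe7ded",
    20000: "1111111111111111",
    30000: "ffffffffffffffff",
}
good, bad = last_good_checkpoint(reference, current)
print(good, bad)
```

With one backup per 10k iterations, the last matching iteration tells you which saved file to resume from, and everything older than it is a candidate for deletion.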

Last fiddled with by LaurV on 2012-03-02 at 18:37
Old 2012-03-02, 19:51   #873
flashjh
 
 
"Jerry"
Nov 2011
Vancouver, WA

10001100011₂ Posts

Quote:
Originally Posted by LaurV View Post
I am currently testing 26177689 and 26026433 using v1.61, and comparing the residues with the one posted by Jerry (for his expo) and with the one I have from the former run (for my exponent). I am about a third of the way for both expos, and up to now, all residues matched.

In the process I developed a very simple batch file I was telling you about, to keep the resuming files of CL. No big deal, very simple and dirty, but I found out it is REALLY useful and easy to use and I would share it.

The -s switch in 1.63 does this, but like you pointed out... it will fill up your drive fast!
Old 2012-03-02, 20:00   #874
flashjh
 
 
"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts

Just wondering since I haven't used gpuLucas... is that project set to replace CUDALucas or the other way around?

Is it feasible to combine the best of both into one, so as not to spend time on two separate projects with the same goal?

Maybe aaronhaviland and msft can talk about it?
Old 2012-03-02, 20:07   #875
LaurV
Romulan Interpreter
 
 
Jun 2011
Thailand

25BF₁₆ Posts

Quote:
Originally Posted by flashjh View Post
The -s switch in 1.63 does this
That is very good news. I will switch to it after this test is finished (about 10 hours to go). It may fill up the HDD, but it is a gold mine when you debug mismatches....
Old 2012-03-02, 20:09   #876
flashjh
 
 
"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts

Quote:
Originally Posted by LaurV View Post
...but is a gold mine when you debug mismatches....
Completely agree!
Old 2012-03-02, 21:12   #877
Dubslow
Basketry That Evening!
 
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·29·83 Posts

I think a lot more work needs to be done on both. As of yet, gpuLucas is not ready for production runs, which most versions of CUDALucas are. With the new speed in 1.63, msft may have closed the gap. There'll need to be extensive testing of both to determine which is better.

(Keep in mind the claims about gpuLucas being so much better were made when CUDALucas supported only power-of-2 FFTs; for those FFTs, gpuLucas was the same speed. Now that CUDALucas does support non-p-o-2 FFTs, and is much more mature, the programs are on even footing.)
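To make the power-of-2 point concrete, the FFT lengths `n` from the v1.63 log quoted earlier in the thread can be split into a power of two times an odd factor. This is only an illustrative Python sketch: `split_fft_length` is an invented helper, and the "bits per element" figure is simply the exponent divided by `n`, used here as a rough measure of how heavily each FFT length is loaded.

```python
def split_fft_length(n):
    """Return (k, odd) such that n == 2**k * odd and odd is not divisible by 2."""
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return k, n

# (exponent, FFT length) pairs copied from the v1.63 log lines.
runs = [
    (756893, 65536),
    (1257787, 98304),
    (2976221, 163840),
    (32582657, 1835008),
    (37156667, 2097152),
    (43112609, 2359296),
]

for exponent, n in runs:
    k, odd = split_fft_length(n)
    kind = "power of 2" if odd == 1 else f"{odd} * 2^{k}"
    print(f"M({exponent}): n = {n} ({kind}), {exponent / n:.2f} bits per element")
```

Lengths such as 98304, 163840, 1835008 and 2359296 come out as 3, 5, 7 and 9 times a power of two, which is the kind of in-between size a power-of-2-only program cannot use.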
Old 2012-03-02, 21:17   #878
flashjh
 
 
"Jerry"
Nov 2011
Vancouver, WA

1123₁₀ Posts

Quote:
Originally Posted by Dubslow View Post
I think a lot more work needs to be done on both. As of yet, gpuLucas is not ready for production runs, which most versions of CUDALucas are. With the new speed in 1.63, msft may have closed the gap. There'll need to be extensive testing of both to determine which is better.

(Keep in mind the claims about gpuLucas being so much better were made when CUDALucas supported only power-of-2 FFTs; for those FFTs, gpuLucas was the same speed. Now that CUDALucas does support non-p-o-2 FFTs, and is much more mature, the programs are on even footing.)
I see... that's good news. The gains of having two separate projects are good because you get a lot more ideas, different algorithms, etc. In the end though, I'd think one program would be best.

Like you said, lots of testing and development ahead.

[BREAK]

So far 1.63 has survived several stops and restarts.

Only problem so far is I get random force closes... it's like before when it would finish and force close before it would start the next exponent, only now it's doing it randomly.

msft, any ideas what could be causing this? I didn't get any errors or warnings during compile.
Old 2012-03-02, 22:19   #879
msft
 
 
Jul 2009
Tokyo

262₁₆ Posts

Quote:
Originally Posted by flashjh View Post
Is it feasible to combine the best of both into one, so as not to spend time on two separate projects with the same goal?
I can make a combined version.
Old 2012-03-02, 22:22   #880
msft
 
 
Jul 2009
Tokyo

1142₈ Posts

Quote:
Originally Posted by flashjh View Post
msft, any ideas what could be causing this? I didn't get any errors or warnings during compile.
Please wait for another guy's report.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.