mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

flashjh 2012-03-02 15:39

Version 1.63 binaries
 
[QUOTE=msft;291556]Ver 1.63
Only use complex to complex fft.
[code]
$ ./CUDALucas -r
DEVICE:0------------------------
name GeForce GTX 460
totalGlobalMem 804454400
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
totalConstMem 65536
major.minor 2.1
clockRate 1350000
textureAlignment 512
deviceOverlap 1
multiProcessorCount 7
Iteration 10000 M( 756893 )C, 0xb94c673f25fe7ded, n = 65536, CUDALucas v1.63 (0:04 real, 0.3923 ms/iter, ETA 4:50)
Iteration 10000 M( 859433 )C, 0x3c4ad525c2d0aed0, n = 65536, CUDALucas v1.63 (0:04 real, 0.3830 ms/iter, ETA 5:21)
Iteration 10000 M( 1257787 )C, 0x3f45bf9bea7213ea, n = 98304, CUDALucas v1.63 (0:05 real, 0.5458 ms/iter, ETA 11:16)
Iteration 10000 M( 1398269 )C, 0xa4a6d2f0e34629db, n = 98304, CUDALucas v1.63 (0:06 real, 0.5427 ms/iter, ETA 12:28)
Iteration 10000 M( 2976221 )C, 0x2a7111b7f70fea2f, n = 163840, CUDALucas v1.63 (0:08 real, 0.7994 ms/iter, ETA 39:26)
Iteration 10000 M( 3021377 )C, 0x6387a70a85d46baf, n = 163840, CUDALucas v1.63 (0:08 real, 0.7788 ms/iter, ETA 39:04)
Iteration 10000 M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, CUDALucas v1.63 (0:19 real, 1.8962 ms/iter, ETA 3:39:57)
Iteration 10000 M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, CUDALucas v1.63 (0:37 real, 3.7644 ms/iter, ETA 14:03:50)
Iteration 10000 M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, CUDALucas v1.63 (0:51 real, 5.1375 ms/iter, ETA 29:56:25)
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.63 (1:00 real, 6.0356 ms/iter, ETA 40:16:16)
Iteration 10000 M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, CUDALucas v1.63 (1:14 real, 7.4174 ms/iter, ETA 53:28:01)
Iteration 10000 M( 30402457 )C, 0x0b8600ef47e69d27, n = 1835008, CUDALucas v1.63 (1:23 real, 8.2723 ms/iter, ETA 69:49:53)
Iteration 10000 M( 32582657 )C, 0x02751b7fcec76bb1, n = 1835008, CUDALucas v1.63 (1:23 real, 8.2783 ms/iter, ETA 74:53:45)
err = 0.367069, increasing n from 1966080
Iteration 10000 M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, CUDALucas v1.63 (1:30 real, 9.0108 ms/iter, ETA 92:57:40)
Iteration 10000 M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, CUDALucas v1.63 (1:46 real, 10.5272 ms/iter, ETA 124:39:33)
Iteration 10000 M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, CUDALucas v1.63 (1:45 real, 10.5222 ms/iter, ETA 125:58:25)
[/code][/QUOTE]

Attached v1.63 Win64 binaries:
[LIST][*]CUDA 4.0 / SM 2.0[*]CUDA 4.1 / SM 2.0[*]CUDA 4.1 / SM 2.1[/LIST]
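
As a rough cross-check of the ETA column in the quoted log: a Lucas-Lehmer test of M(p) takes p - 2 iterations, so the remaining time should be close to (p - 2 - iterations done) x ms/iter. The Python sketch below parses one log line and recomputes that estimate. The regular expression and the function names are illustrative assumptions, not part of CUDALucas, and the program's own ETA may differ slightly because the log does not show exactly how it is averaged.

```python
import re

# Matches lines such as:
# Iteration 10000 M( 756893 )C, 0x..., n = 65536, CUDALucas v1.63 (0:04 real, 0.3923 ms/iter, ETA 4:50)
LINE_RE = re.compile(
    r"Iteration (\d+) M\( (\d+) \)C, 0x([0-9a-f]{16}), n = (\d+), .*?([\d.]+) ms/iter"
)

def estimate_eta_seconds(line):
    """Return the estimated remaining seconds for one log line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    done, exponent = int(m.group(1)), int(m.group(2))
    ms_per_iter = float(m.group(5))
    remaining = (exponent - 2) - done  # an LL test of M(p) needs p - 2 iterations
    return remaining * ms_per_iter / 1000.0

def format_hms(seconds):
    """Format a duration in seconds as h:mm:ss."""
    total = int(round(seconds))
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"
```

For the last line of the log (M( 43112609 ), 10.5222 ms/iter) this gives about 125:58:54, against the logged ETA of 125:58:25.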

flashjh 2012-03-02 17:42

[QUOTE=msft;291556]Ver 1.63
Only use complex to complex fft.
[/QUOTE]

I've had a couple of random force closes so far, after about 40000 iterations. When I reopen and continue, it doesn't happen again on the same exponents, at least so far.

Residues are a match so far.

Any idea what change could cause the force closes?

LaurV 2012-03-02 18:32

I am currently testing 26177689 and 26026433 using v1.61, comparing the residues with the ones posted by Jerry (for his exponent) and with the ones I have from my former run (for my exponent). I am about a third of the way through both exponents, and so far all residues match.

In the process I wrote the very simple batch file I was telling you about, to keep the resume files of CL. No big deal, very simple and dirty, but I found it REALLY useful and easy to use, so I would like to share it.

[CODE]
@echo off
rem Files to watch. Keep the "t" prefix and change only the digits.
set file1=t26026433
set file2=t26177689
rem Counters used to number the saved copies.
set /A ext1=0
set /A ext2=0

:loop01
rem Wait 5 seconds: choice times out and takes the default answer Y.
choice /N /T 5 /D Y >nul
if exist %file1% goto rena1
if exist %file2% goto rena2
echo ... Nothing done at %TIME% ...
goto loop01

:rena1
set /A ext1=%ext1%+1
set dest=backup\%file1%_%ext1%.txt
copy /b %file1% %dest% >nul
rem Delete the original only if the copy succeeded.
if errorlevel 1 (
  echo ... Could not save %file1% at %TIME%, file kept ...
) else (
  del %file1% >nul
  echo ... File %file1% Saved as %dest% at %TIME% ...
)
if not exist %file2% goto loop01

:rena2
set /A ext2=%ext2%+1
set dest=backup\%file2%_%ext2%.txt
copy /b %file2% %dest% >nul
rem Delete the original only if the copy succeeded.
if errorlevel 1 (
  echo ... Could not save %file2% at %TIME%, file kept ...
) else (
  del %file2% >nul
  echo ... File %file2% Saved as %dest% at %TIME% ...
)
goto loop01
[/CODE]
To use it, first save it as a batch file, say keepresidues.bat, in the same folder where CUDALucas writes its temporary files (cXXXXX and tXXXXX). Then edit it and change the file1 and file2 lines to match the exponents you want to test. Keep the "t" and change only the digits. If you test only one exponent, use a fake number for the second.

Then create a subfolder called "backup" inside the folder where you put the batch file. This is where the saved files go.

Then run the batch and forget about the command prompt. The batch takes almost no CPU resources: every 5 seconds it checks whether a "tXXXXX" file exists, and if so, it moves it (copy, then delete) to the backup folder, numbering the files with a counter. You can change the time interval by modifying the "choice" command. I used choice because it works better with Windows Vista and Windows 7. I am on Win7 64-bit right now.
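
For readers not on Windows, the same idea can be sketched in Python. This is a minimal sketch and not LaurV's script: the function name `rotate_checkpoint`, the use of `shutil.move`, and creating the backup folder automatically are assumptions of mine; only the tXXXXX file naming, the backup subfolder, and the numbered copies come from the post.

```python
import shutil
from pathlib import Path

def rotate_checkpoint(workdir, name, counter):
    """Move workdir/name to workdir/backup/name_<counter+1>.txt if it exists.

    Returns the new counter value (unchanged when there was nothing to move).
    """
    src = Path(workdir) / name
    if not src.exists():
        return counter
    backup = Path(workdir) / "backup"
    backup.mkdir(exist_ok=True)  # assumption: create the folder if it is missing
    counter += 1
    dest = backup / f"{name}_{counter}.txt"
    shutil.move(str(src), str(dest))
    return counter
```

Calling this in a loop with `time.sleep(5)` between calls would mimic the batch file. Like the batch file, it does not check whether the file is still being written when it is moved.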

When (if) you get a residue mismatch, you will be able to re-run only the last iterations and see whether it was a hardware glitch or whether it is repeatable.
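
Finding where two runs diverge amounts to comparing the residues iteration by iteration. Here is a hedged Python sketch, assuming the output of each run has been saved as text in the "Iteration ... M( p )C, 0x..." format shown in the v1.63 log earlier in this thread; the function names and the regular expression are illustrative only.

```python
import re

RES_RE = re.compile(r"Iteration (\d+) M\( (\d+) \)C, 0x([0-9a-f]{16})")

def residues(log_text):
    """Map iteration number -> 64-bit residue (hex string) for one run's log."""
    return {int(m.group(1)): m.group(3) for m in RES_RE.finditer(log_text)}

def first_mismatch(log_a, log_b):
    """Return the lowest iteration present in both logs whose residues differ, or None."""
    a, b = residues(log_a), residues(log_b)
    for iteration in sorted(a.keys() & b.keys()):
        if a[iteration] != b[iteration]:
            return iteration
    return None
```

If a mismatch is reported, the run can be resumed from the last saved file written before that iteration.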

After you start the batch, you can launch your two copies of CUDALucas in the usual way. You can also launch them before the batch; in that case the file counters will not line up with your iteration counts, but that does not matter much, because in case of a mismatch you can still easily find where to resume from. An alternative is to set the two counters (ext1 and ext2) to match your iteration count and step.

From time to time, if you see that the residues match the reference file (assuming you have one from a previous run, yours or someone else's, like the one uploaded by Jerry, a screen copy of CL output, or one generated by P95), you can clear all or part of the backup folder. These files are huge, and if you generate one every 10k iterations, your hard disk will fill up in a few weeks.
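
The clean-up step can also be scripted. The sketch below keeps only the highest-numbered saved copies of one exponent and deletes the rest. It assumes the `backup\tXXXXX_N.txt` naming produced by the batch file above; the function name and the `keep` parameter are hypothetical. It should only be run once the residues up to that point are confirmed to match.

```python
import re
from pathlib import Path

def prune_backups(backup_dir, name, keep):
    """Delete all but the `keep` highest-numbered backups of `name`.

    Returns the names of the deleted files, lowest number first.
    """
    pattern = re.compile(re.escape(name) + r"_(\d+)\.txt")
    numbered = []
    for p in Path(backup_dir).iterdir():
        m = pattern.fullmatch(p.name)
        if m:
            numbered.append((int(m.group(1)), p))
    numbered.sort()
    # numbered[:-0] would be empty, so keep == 0 is handled separately
    doomed = numbered[:-keep] if keep > 0 else numbered
    for _, p in doomed:
        p.unlink()
    return [p.name for _, p in doomed]
```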

flashjh 2012-03-02 19:51

[QUOTE=LaurV;291603]In the process I developed a very simple batch file I was telling you about, to keep the resuming files of CL.[/QUOTE]

The -s switch in 1.63 does this, but like you pointed out... it will fill up your drive fast!

flashjh 2012-03-02 20:00

Just wondering since I haven't used gpuLucas... is that project set to replace CUDALucas or the other way around?

Is it feasible to combine the best of both into one, so as not to spend time on two separate projects with the same goal?

Maybe aaronhaviland and msft can talk about it?

LaurV 2012-03-02 20:07

[QUOTE=flashjh;291621]The -s switch in 1.63 does this[/QUOTE]
That is very good news. I will switch to it after this test is finished (about 10 hours to go). It may fill up the hdd, but is a gold mine when you debug mismatches....

flashjh 2012-03-02 20:09

[QUOTE=LaurV;291626]...but is a gold mine when you debug mismatches....[/QUOTE]

Completely agree!

Dubslow 2012-03-02 21:12

I think a lot more work needs to be done on both. As of yet, gpuLucas is not ready for production runs, which most versions of CUDALucas are. With the new speed in 1.63, msft may have closed the gap. There'll need to be extensive testing of both to determine which is better.

(Keep in mind the claims about gpuLucas being so much better were made when CUDALucas supported only power-of-2 FFTs; for those FFTs, gpuLucas was the same speed. Now that CUDALucas does support non-p-o-2 FFTs, and is much more mature, the programs are on even footing.)
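
The FFT lengths n in the v1.63 log quoted at the top of the thread illustrate the non-power-of-2 case: several of them are a power of two multiplied by 3, 5, 7 or 9. A small Python sketch (the function name is mine, not from either program) that splits a length into its power-of-two part and its odd part:

```python
def split_fft_length(n):
    """Return (k, odd) such that n == odd * 2**k and odd is not divisible by 2."""
    if n <= 0:
        raise ValueError("FFT length must be positive")
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return k, n
```

For example, n = 98304 splits into 3 x 2^15 and n = 1179648 into 9 x 2^17, while n = 65536 and n = 2097152 are pure powers of two.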

flashjh 2012-03-02 21:17

[QUOTE=Dubslow;291633]I think a lot more work needs to be done on both. As of yet, gpuLucas is not ready for production runs, which most versions of CUDALucas are. With the new speed in 1.63, msft may have closed the gap. There'll need to be extensive testing of both to determine which is better.

(Keep in mind the claims about gpuLucas being so much better were made when CUDALucas supported only power-of-2 FFTs; for those FFTs, gpuLucas was the same speed. Now that CUDALucas does support non-p-o-2 FFTs, and is much more mature, the programs are on even footing.)[/QUOTE]

I see... that's good news. Having two separate projects is good because you get a lot more ideas, different algorithms, etc. In the end though, I'd think one program would be best.

Like you said, lots of testing and development ahead.

[BREAK]

So far 1.63 has survived several stops and restarts.

Only problem so far is I get [B]random[/B] force closes... it's like before when it would finish and force close before it would start the next exponent, only now it's doing it randomly.

msft, any ideas what could be causing this? I didn't get any errors or warnings during compile.

msft 2012-03-02 22:19

[QUOTE=flashjh;291625]Is it feasible to combine the best of both into one, so as not to spend time on two separate projects with the same goal?[/QUOTE]
I can make a combined version.

msft 2012-03-02 22:22

[QUOTE=flashjh;291634]msft, any ideas what could be causing this? I didn't get any errors or warnings during compile.[/QUOTE]
Please wait for another user's report.

