Version 1.63 binaries
1 Attachment(s)
[QUOTE=msft;291556]Ver 1.63
Only use complex to complex fft.
[code]
$ ./CUDALucas -r
DEVICE:0------------------------
name                GeForce GTX 460
totalGlobalMem      804454400
sharedMemPerBlock   49152
regsPerBlock        32768
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      65535,65535,65535
totalConstMem       65536
major.minor         2.1
clockRate           1350000
textureAlignment    512
deviceOverlap       1
multiProcessorCount 7
Iteration 10000 M( 756893 )C, 0xb94c673f25fe7ded, n = 65536, CUDALucas v1.63 (0:04 real, 0.3923 ms/iter, ETA 4:50)
Iteration 10000 M( 859433 )C, 0x3c4ad525c2d0aed0, n = 65536, CUDALucas v1.63 (0:04 real, 0.3830 ms/iter, ETA 5:21)
Iteration 10000 M( 1257787 )C, 0x3f45bf9bea7213ea, n = 98304, CUDALucas v1.63 (0:05 real, 0.5458 ms/iter, ETA 11:16)
Iteration 10000 M( 1398269 )C, 0xa4a6d2f0e34629db, n = 98304, CUDALucas v1.63 (0:06 real, 0.5427 ms/iter, ETA 12:28)
Iteration 10000 M( 2976221 )C, 0x2a7111b7f70fea2f, n = 163840, CUDALucas v1.63 (0:08 real, 0.7994 ms/iter, ETA 39:26)
Iteration 10000 M( 3021377 )C, 0x6387a70a85d46baf, n = 163840, CUDALucas v1.63 (0:08 real, 0.7788 ms/iter, ETA 39:04)
Iteration 10000 M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, CUDALucas v1.63 (0:19 real, 1.8962 ms/iter, ETA 3:39:57)
Iteration 10000 M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, CUDALucas v1.63 (0:37 real, 3.7644 ms/iter, ETA 14:03:50)
Iteration 10000 M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, CUDALucas v1.63 (0:51 real, 5.1375 ms/iter, ETA 29:56:25)
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.63 (1:00 real, 6.0356 ms/iter, ETA 40:16:16)
Iteration 10000 M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, CUDALucas v1.63 (1:14 real, 7.4174 ms/iter, ETA 53:28:01)
Iteration 10000 M( 30402457 )C, 0x0b8600ef47e69d27, n = 1835008, CUDALucas v1.63 (1:23 real, 8.2723 ms/iter, ETA 69:49:53)
Iteration 10000 M( 32582657 )C, 0x02751b7fcec76bb1, n = 1835008, CUDALucas v1.63 (1:23 real, 8.2783 ms/iter, ETA 74:53:45)
err = 0.367069, increasing n from 1966080
Iteration 10000 M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, CUDALucas v1.63 (1:30 real, 9.0108 ms/iter, ETA 92:57:40)
Iteration 10000 M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, CUDALucas v1.63 (1:46 real, 10.5272 ms/iter, ETA 124:39:33)
Iteration 10000 M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, CUDALucas v1.63 (1:45 real, 10.5222 ms/iter, ETA 125:58:25)
[/code][/QUOTE]
Attached v1.63 Win64 binaries:
[LIST]
[*]CUDA 4.0 / SM 2.0
[*]CUDA 4.1 / SM 2.0
[*]CUDA 4.1 / SM 2.1
[/LIST] |
[QUOTE=msft;291556]Ver 1.63
Only use complex to complex fft. [...][/QUOTE] Had a couple of random force closes so far after 40000 iterations. If I reopen and continue, it doesn't happen on the same exponents, so far. Residues are a match so far. Any idea what change could cause the force closes? |
I am currently testing 26177689 and 26026433 using v1.61, and comparing the residues with the one posted by Jerry (for his expo) and with the one I have from the former run (for my exponent). I am about a third of the way for both expos, and up to now, all residues matched.
In the process I developed a very simple batch file, the one I was telling you about, to keep the resume files of CL. No big deal, very simple and dirty, but I found it REALLY useful and easy to use, so I am sharing it.
[CODE]
@echo off
rem Checkpoint files to watch: keep the "t" and change only the digits.
set file1=t26026433
set file2=t26177689
set /A ext1=0
set /A ext2=0

:loop01
rem Wait 5 seconds (choice times out and takes the default answer).
choice /N /T 5 /D Y >nul
if exist %file1% goto rena1
if exist %file2% goto rena2
echo ... Nothing done at %TIME% ...
goto loop01

:rena1
set /A ext1=%ext1%+1
set dest=backup\%file1%_%ext1%.txt
copy /b %file1% %dest% >nul
del %file1% >nul
echo ... File %file1% Saved as %dest% at %TIME% ...
if not exist %file2% goto loop01

:rena2
set /A ext2=%ext2%+1
set dest=backup\%file2%_%ext2%.txt
copy /b %file2% %dest% >nul
del %file2% >nul
echo ... File %file2% Saved as %dest% at %TIME% ...
goto loop01
[/CODE]
To use it, first save it as a batch file, say keepresidues.bat, in the same folder where CudaLucas writes its temporary files (cXXXXX and tXXXXX). Then edit it and change the file1 and file2 lines to match the exponents you want to test. Keep the "t" and change the digits only. If you test only one expo, use a fake number for the second. Then create a subfolder called "backup" inside the folder where you put the batch file. This is the home for the files you will save. Then run the batch and forget about the command prompt. The batch takes practically no CPU resources: every 5 seconds it checks whether a "tXXXXX" file exists, and if so, it moves it (copy, then delete) to the backup folder, numbering the files with a counter. You can change the time interval by modifying the "choice" command. I used choice because it works better with Windows Vista and Windows 7. I am on Win7 64-bit right now. When (if) you get a residue mismatch you will be able to re-run only the last iterations, and see whether it was a hardware error or whether it is repeatable.
After you start the batch you can launch your two copies of cudalucas in the usual way. (You can also launch them before; in that case the file counter will not be aligned with your number of iterations, but that does not matter, since in case of a mismatch you can easily find out where to resume from.) An alternative is to set the two indexes (ext1 and ext2) to match your iteration count and step. From time to time, if you see that the residues match the comparison file (assuming you have one from a previous run, yours or someone else's, like the one uploaded by Jerry, a screen copy of CL output, or one generated by P95), you can clear all or part of the backup folder. These files are huge, and if you generate one every 10k iterations, your hard disk will fill up in a few weeks. |
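For anyone not on Windows, here is a rough Python sketch of the same idea: move a checkpoint file into a numbered backup, and prune old backups so the disk does not fill up. The function names `archive_checkpoint` and `prune_backups`, and the `keep_last` policy, are made up for illustration; they are not part of CUDALucas or of the batch file above.

```python
import shutil
from pathlib import Path


def archive_checkpoint(checkpoint, backup_dir, counter):
    """Move `checkpoint` to backup_dir/<name>_<counter>.txt and return the new path."""
    checkpoint = Path(checkpoint)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    dest = backup_dir / f"{checkpoint.name}_{counter}.txt"
    shutil.move(str(checkpoint), str(dest))
    return dest


def prune_backups(backup_dir, name, keep_last):
    """Delete all but the `keep_last` highest-numbered backups of `name`.

    Returns the list of deleted paths. `keep_last` is a hypothetical policy,
    not something the original batch file does.
    """
    def index(path):
        # File names look like t26026433_17.txt; the counter follows the last "_".
        return int(path.stem.rsplit("_", 1)[1])

    backups = sorted(Path(backup_dir).glob(f"{name}_*.txt"), key=index)
    doomed = backups[:-keep_last] if keep_last > 0 else backups
    for path in doomed:
        path.unlink()
    return doomed
```

A polling loop would call `archive_checkpoint` whenever the `tXXXXX` file appears, and `prune_backups` once the residues up to that point have been confirmed against a reference run.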
[QUOTE=LaurV;291603][...] In the process I developed a very simple batch file I was telling you about, to keep the resuming files of CL. [...][/QUOTE] The -s switch in 1.63 does this, but like you pointed out... it will fill up your drive fast! |
Just wondering since I haven't used gpuLucas... is that project set to replace CUDALucas or the other way around?
Is it feasible to combine the best of both into one, so as not to spend time on two separate projects with the same goal? Maybe aaronhaviland and msft can talk about it? |
[QUOTE=flashjh;291621]The -s switch in 1.63 does this[/QUOTE]
That is very good news. I will switch to it after this test is finished (about 10 hours to go). It may fill up the hdd, but is a gold mine when you debug mismatches.... |
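To illustrate the debugging side, here is a small Python sketch that finds the first iteration at which two runs disagree, so you know which saved checkpoint to resume from. It assumes both logs use the "Iteration ... M( p )C, 0x..." line format shown in the v1.63 output earlier in the thread; `parse_residues` and `first_mismatch` are names invented for this example.

```python
import re

# Matches lines such as:
# Iteration 10000 M( 756893 )C, 0xb94c673f25fe7ded, n = 65536, CUDALucas v1.63 (...)
LINE = re.compile(r"Iteration\s+(\d+)\s+M\(\s*(\d+)\s*\)C,\s*0x([0-9a-fA-F]{16})")


def parse_residues(log_text, exponent):
    """Return {iteration: residue_hex} for the given exponent."""
    residues = {}
    for match in LINE.finditer(log_text):
        iteration = int(match.group(1))
        exp = int(match.group(2))
        if exp == exponent:
            residues[iteration] = match.group(3).lower()
    return residues


def first_mismatch(run_a, run_b):
    """Return the lowest iteration logged in both runs whose residues differ, or None."""
    for iteration in sorted(run_a.keys() & run_b.keys()):
        if run_a[iteration] != run_b[iteration]:
            return iteration
    return None
```

If `first_mismatch` returns, say, iteration k, the last good checkpoint is the one saved at or before the previous matching iteration, and only the work after it has to be redone.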
[QUOTE=LaurV;291626]...but is a gold mine when you debug mismatches....[/QUOTE]
Completely agree! |
I think a lot more work needs to be done on both. As of yet, gpuLucas is not ready for production runs, which most versions of CUDALucas are. With the new speed in 1.63, msft may have closed the gap. There'll need to be extensive testing of both to determine which is better.
(Keep in mind the claims about gpuLucas being so much better were made when CUDALucas supported only power-of-2 FFTs; for those FFTs, gpuLucas was the same speed. Now that CUDALucas does support non-p-o-2 FFTs, and is much more mature, the programs are on even footing.) |
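As a quick check of the non-power-of-2 point, the Python sketch below factors the FFT lengths (the "n = ..." values) from the v1.63 log posted earlier in this thread. The prime set {2, 3, 5, 7} is an assumption chosen because it happens to cover every length in that log; it is not a statement about which radices CUDALucas or its FFT library actually use.

```python
def smooth_factorization(n, primes=(2, 3, 5, 7)):
    """Return {prime: exponent} if n factors completely over `primes`, else None."""
    factors = {}
    for p in primes:
        while n % p == 0:
            n //= p
            factors[p] = factors.get(p, 0) + 1
    return factors if n == 1 else None


def is_power_of_two(n):
    """True if n is a positive power of two."""
    return n > 0 and n & (n - 1) == 0


# FFT lengths that appear in the v1.63 log quoted in this thread.
lengths = [65536, 98304, 163840, 393216, 786432, 1179648, 1310720,
           1572864, 1835008, 1966080, 2097152, 2359296]

for n in lengths:
    kind = "power of 2" if is_power_of_two(n) else "non-power-of-2"
    print(n, smooth_factorization(n), kind)
```

Only 65536 and 2097152 in that list are powers of two; the others are a power of two times a small odd factor such as 3, 5, 7, 9 or 15.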
[QUOTE=Dubslow;291633]I think a lot more work needs to be done on both. [...] There'll need to be extensive testing of both to determine which is better.[/QUOTE] I see... that's good news. Having two separate projects is good because you get a lot more ideas, different algorithms, etc. In the end though, I'd think one program would be best. Like you said, lots of testing and development ahead. [BREAK] So far 1.63 has survived several stops and restarts. The only problem so far is that I get [B]random[/B] force closes... it's like before, when it would finish and force close before starting the next exponent, only now it's doing it randomly. msft, any ideas what could be causing this? I didn't get any errors or warnings during compile. |
[QUOTE=flashjh;291625]Is it feasible to combine the best of both into one, so as not to spend time on two separate projects with the same goal?
[/QUOTE] I can make a combined version. |
[QUOTE=flashjh;291634]msft, any ideas what could be causing this? I didn't get any errors or warnings during compile.[/QUOTE]
Please wait for another user's report. |