[QUOTE=Prime95;293442]I added extern "C" to make MSVC 2010 happy. Extern "C" overrides name-mangling.
Is the new version faster for you? Does it work OK?[/QUOTE]
1.66:
[CODE]
start M26193103 fft length = 1572864
Iteration 10000 M( 26193103 )C, 0x5c3d0847657d8cff, n = 1572864, CUDALucas v1.66 err = 0.02176 (0:26 real, 2.5921 ms/iter, ETA 18:51:01)
Iteration 20000 M( 26193103 )C, 0x1ef2c5a292c0fdb6, n = 1572864, CUDALucas v1.66 err = 0.02219 (0:25 real, 2.5387 ms/iter, ETA 18:27:17)
Iteration 30000 M( 26193103 )C, 0x9a07463702e8aa32, n = 1572864, CUDALucas v1.66 err = 0.02219 (0:26 real, 2.5744 ms/iter, ETA 18:42:25)
Iteration 40000 M( 26193103 )C, 0x2f16825930638d20, n = 1572864, CUDALucas v1.66 err = 0.02219 (0:27 real, 2.6960 ms/iter, ETA 19:35:00)
Iteration 50000 M( 26193103 )C, 0x41e02e29604eb893, n = 1572864, CUDALucas v1.66 err = 0.02219 (0:27 real, 2.7101 ms/iter, ETA 19:40:41)
Iteration 60000 M( 26193103 )C, 0x5609ea689ce4cf4d, n = 1572864, CUDALucas v1.66 err = 0.02219 (0:26 real, 2.5640 ms/iter, ETA 18:36:36)
[/CODE]
1.67:
[CODE]
start M26193103 fft length = 1572864
Iteration 10000 M( 26193103 )C, 0x5c3d0847657d8cff, n = 1572864, CUDALucas v1.67 err = 0.023 (0:26 real, 2.5482 ms/iter, ETA 18:31:52)
Iteration 20000 M( 26193103 )C, 0x1ef2c5a292c0fdb6, n = 1572864, CUDALucas v1.67 err = 0.023 (0:25 real, 2.5156 ms/iter, ETA 18:17:14)
Iteration 30000 M( 26193103 )C, 0x9a07463702e8aa32, n = 1572864, CUDALucas v1.67 err = 0.023 (0:24 real, 2.4494 ms/iter, ETA 17:47:56)
Iteration 40000 M( 26193103 )C, 0x2f16825930638d20, n = 1572864, CUDALucas v1.67 err = 0.023 (0:25 real, 2.5086 ms/iter, ETA 18:13:20)
Iteration 50000 M( 26193103 )C, 0x41e02e29604eb893, n = 1572864, CUDALucas v1.67 err = 0.023 (0:25 real, 2.5100 ms/iter, ETA 18:13:30)
Iteration 60000 M( 26193103 )C, 0x5609ea689ce4cf4d, n = 1572864, CUDALucas v1.67 err = 0.023 (0:25 real, 2.4933 ms/iter, ETA 18:05:48)
[/CODE]
It's faster :smile: I'll compare a full run tomorrow when this one is done. |
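To put a number on "it's faster", here is a minimal Python sketch that averages the ms/iter figures from the two logs above. The values are copied from the logs; the function names are hypothetical and not part of CUDALucas. Six samples per version is a small set, so the result is only a rough indication.

```python
# ms/iter values copied from the v1.66 and v1.67 logs above
# (M26193103, n = 1572864, iterations 10000 through 60000).
V166_MS_PER_ITER = [2.5921, 2.5387, 2.5744, 2.6960, 2.7101, 2.5640]
V167_MS_PER_ITER = [2.5482, 2.5156, 2.4494, 2.5086, 2.5100, 2.4933]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def relative_gain(old, new):
    """Fraction of per-iteration time saved when going from old to new."""
    return 1.0 - mean(new) / mean(old)


if __name__ == "__main__":
    print(f"v1.66 mean: {mean(V166_MS_PER_ITER):.4f} ms/iter")
    print(f"v1.67 mean: {mean(V167_MS_PER_ITER):.4f} ms/iter")
    gain = relative_gain(V166_MS_PER_ITER, V167_MS_PER_ITER)
    print(f"time saved per iteration: {gain:.1%}")
```

On these samples the saving comes out at roughly 4% per iteration, which is in line with the ETA dropping from about 18:50 to about 18:15.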
[QUOTE]Could we also have the device info when no parameter is entered and usage is printed? Helps finding the device number...[/QUOTE]Dedicated param -devices will be better...
|
[QUOTE=Svenie25;293496]Thanks a lot.
I found my error. CL created an ini file with the number of the line where to start. I deleted this file and then it worked. Again, thanks a lot.[/QUOTE] I'm glad you found the glitch. |
1 Attachment(s)
[QUOTE=msft;293444]I believe 64 bit & SM1.3 is best.[/QUOTE]
Attached v1.67 x64 binaries: [LIST][*]CUDA 3.2 / SM 1.3[/LIST]I tested, let me know how it works for you. Polite and aggressive:
[CODE]
C:\CUDA\src>CUDALucas1.67.x64.3.2.sm13.exe -d 1 -r
DEVICE:1------------------------
name GeForce GTX 580
totalGlobalMem 1610612736
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
totalConstMem 65536
major.minor 2.0
clockRate 1600000
textureAlignment 512
deviceOverlap 1
multiProcessorCount 16
Iteration 10000 M( 86243 )C, 0x23992ccd735a03d9, n = 8192, CUDALucas v1.67 err = 1.788e-007 (0:03 real, 0.2453 ms/iter, ETA 0:17)
Iteration 10000 M( 132049 )C, 0x4c52a92b54635f9e, n = 8192, CUDALucas v1.67 err= 0.0004272 (0:02 real, 0.2452 ms/iter, ETA 0:29)
Iteration 10000 M( 216091 )C, 0x30247786758b8792, n = 16384, CUDALucas v1.67 err = 1.144e-005 (0:02 real, 0.2508 ms/iter, ETA 0:50)
Iteration 10000 M( 756839 )C, 0x5d2cbe7cb24a109a, n = 40960, CUDALucas v1.67 err = 0.0293 (0:03 real, 0.3425 ms/iter, ETA 4:13)
Iteration 10000 M( 859433 )C, 0x3c4ad525c2d0aed0, n = 49152, CUDALucas v1.67 err = 0.009033 (0:04 real, 0.3565 ms/iter, ETA 4:59)
Iteration 10000 M( 1257787 )C, 0x3f45bf9bea7213ea, n = 73728, CUDALucas v1.67 err = 0.00618 (0:04 real, 0.4060 ms/iter, ETA 8:23)
Iteration 10000 M( 1398269 )C, 0xa4a6d2f0e34629db, n = 73728, CUDALucas v1.67 err = 0.08594 (0:04 real, 0.4034 ms/iter, ETA 9:16)
Iteration 10000 M( 2976221 )C, 0x2a7111b7f70fea2f, n = 163840, CUDALucas v1.67 err = 0.04297 (0:05 real, 0.5785 ms/iter, ETA 28:32)
Iteration 10000 M( 3021377 )C, 0x6387a70a85d46baf, n = 163840, CUDALucas v1.67 err = 0.0625 (0:06 real, 0.5912 ms/iter, ETA 29:39)
Iteration 10000 M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, CUDALucas v1.67 err = 0.04297 (0:10 real, 0.9328 ms/iter, ETA 1:48:12)
Iteration 10000 M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, CUDALucas v1.67 err = 0.0293 (0:16 real, 1.5683 ms/iter, ETA 5:51:34)
Iteration 10000 M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, CUDALucas v1.67 err = 0.08594 (0:21 real, 2.0962 ms/iter, ETA 12:12:58)
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.67 err = 0.2031 (0:23 real, 2.2819 ms/iter, ETA 15:13:30)
Iteration 10000 M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, CUDALucas v1.67 err = 0.01807 (0:27 real, 2.7415 ms/iter, ETA 19:45:41)
Iteration 10000 M( 30402457 )C, 0x0b8600ef47e69d27, n = 1835008, CUDALucas v1.67 err = 0.04736 (0:31 real, 3.0937 ms/iter, ETA 26:06:56)
Iteration 10000 M( 32582657 )C, 0x02751b7fcec76bb1, n = 1835008, CUDALucas v1.67 err = 0.2422 (0:31 real, 3.0718 ms/iter, ETA 27:47:28)
err = 0.411133, increasing n from 1966080
Iteration 10000 M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, CUDALucas v1.67 err = 0.1099 (0:33 real, 3.2419 ms/iter, ETA 33:26:44)
Iteration 10000 M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, CUDALucas v1.67 err = 0.1953 (0:39 real, 3.8797 ms/iter, ETA 45:56:31)
Iteration 10000 M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, CUDALucas v1.67 err = 0.2656 (0:38 real, 3.7448 ms/iter, ETA 44:50:02)

C:\CUDA\src>CUDALucas1.67.x64.3.2.sm13.exe -d 1 -aggressive -r
DEVICE:1------------------------
name GeForce GTX 580
totalGlobalMem 1610612736
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
totalConstMem 65536
major.minor 2.0
clockRate 1600000
textureAlignment 512
deviceOverlap 1
multiProcessorCount 16
Iteration 10000 M( 86243 )C, 0x23992ccd735a03d9, n = 8192, CUDALucas v1.67 err = 1.788e-007 (0:01 real, 0.1059 ms/iter, ETA 0:07)
Iteration 10000 M( 132049 )C, 0x4c52a92b54635f9e, n = 8192, CUDALucas v1.67 err= 0.0004272 (0:01 real, 0.1060 ms/iter, ETA 0:12)
Iteration 10000 M( 216091 )C, 0x30247786758b8792, n = 16384, CUDALucas v1.67 err = 1.144e-005 (0:01 real, 0.1068 ms/iter, ETA 0:21)
Iteration 10000 M( 756839 )C, 0x5d2cbe7cb24a109a, n = 40960, CUDALucas v1.67 err = 0.0293 (0:01 real, 0.1468 ms/iter, ETA 1:48)
Iteration 10000 M( 859433 )C, 0x3c4ad525c2d0aed0, n = 49152, CUDALucas v1.67 err = 0.009033 (0:02 real, 0.1502 ms/iter, ETA 2:06)
Iteration 10000 M( 1257787 )C, 0x3f45bf9bea7213ea, n = 73728, CUDALucas v1.67 err = 0.00618 (0:02 real, 0.1762 ms/iter, ETA 3:38)
Iteration 10000 M( 1398269 )C, 0xa4a6d2f0e34629db, n = 73728, CUDALucas v1.67 err = 0.08594 (0:01 real, 0.1658 ms/iter, ETA 3:48)
Iteration 10000 M( 2976221 )C, 0x2a7111b7f70fea2f, n = 163840, CUDALucas v1.67 err = 0.04297 (0:04 real, 0.3241 ms/iter, ETA 15:59)
Iteration 10000 M( 3021377 )C, 0x6387a70a85d46baf, n = 163840, CUDALucas v1.67 err = 0.0625 (0:03 real, 0.3239 ms/iter, ETA 16:14)
Iteration 10000 M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, CUDALucas v1.67 err = 0.04297 (0:07 real, 0.6658 ms/iter, ETA 1:17:14)
Iteration 10000 M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, CUDALucas v1.67 err = 0.0293 (0:12 real, 1.2409 ms/iter, ETA 4:38:09)
Iteration 10000 M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, CUDALucas v1.67 err = 0.08594 (0:18 real, 1.7983 ms/iter, ETA 10:28:47)
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.67 err = 0.2031 (0:19 real, 1.9525 ms/iter, ETA 13:01:39)
Iteration 10000 M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, CUDALucas v1.67 err = 0.01807 (0:24 real, 2.3899 ms/iter, ETA 17:13:37)
Iteration 10000 M( 30402457 )C, 0x0b8600ef47e69d27, n = 1835008, CUDALucas v1.67 err = 0.04736 (0:27 real, 2.7081 ms/iter, ETA 22:51:39)
Iteration 10000 M( 32582657 )C, 0x02751b7fcec76bb1, n = 1835008, CUDALucas v1.67 err = 0.2422 (0:27 real, 2.7177 ms/iter, ETA 24:35:14)
err = 0.40625, increasing n from 1966080
Iteration 10000 M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, CUDALucas v1.67 err = 0.1099 (0:29 real, 2.9287 ms/iter, ETA 30:12:52)
Iteration 10000 M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, CUDALucas v1.67 err = 0.1953 (0:33 real, 3.3846 ms/iter, ETA 40:04:46)
Iteration 10000 M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, CUDALucas v1.67 err = 0.2656 (0:34 real, 3.3845 ms/iter, ETA 40:31:12)
[/CODE] |
CUDA 3.2 sm13 is a lot faster
This is on a GTX 580
4.1 sm20:
[CODE]e:\cuda2\CUDALucas1.67.cuda4.1.sm_20.x64.exe -d 1 -threads 512 -c 10000 -aggressive 26193103 >> 26193103.txt
Iteration 10880000 M( 26193103 )C, 0x7ab4bd1491575cfb, n = 1572864, CUDALucas v1.67 err = 0.0232 (0:26 real, [COLOR=red]2.6770[/COLOR] ms/iter, ETA [COLOR=red]11:23:04[/COLOR])
Iteration 10890000 M( 26193103 )C, 0x04d40edbdff48f4a, n = 1572864, CUDALucas v1.67 err = 0.0232 (0:27 real, [COLOR=red]2.6778[/COLOR] ms/iter, ETA [COLOR=red]11:22:49[/COLOR])
Iteration 10900000 M( 26193103 )C, 0xb9b9207366261cbe, n = 1572864, CUDALucas v1.67 err = 0.0232 (0:27 real, [COLOR=red]2.6783[/COLOR] ms/iter, ETA [COLOR=red]11:22:31[/COLOR])
Iteration 10910000 M( 26193103 )C, 0x2738902e2e87743d, n = 1572864, CUDALucas v1.67 err = 0.0232 (0:26 real, [COLOR=red]2.6097[/COLOR] ms/iter, ETA [COLOR=red]11:04:36[/COLOR])
^C caught. Writing checkpoint.[/CODE]
3.2 sm13:
[CODE]CUDALucas1.67.cuda3.2.sm_13.x64.exe -d 1 -threads 512 -c 10000 -aggressive 26193103 >> 26193103.txt
continuing work from a partial result M26193103 fft length = 1572864 iteration = 10919002
Iteration 10920000 M( 26193103 )C, 0xa5a7b77eb9aafd24, n = 1572864, CUDALucas v1.67 err = 0.01762 (0:02 real, 0.2327 ms/iter, ETA 59:12)
Iteration 10930000 M( 26193103 )C, 0xf8b54ad25990bc15, n = 1572864, CUDALucas v1.67 err = 0.01904 (0:23 real, [COLOR=red]2.3203[/COLOR] ms/iter, ETA [COLOR=red]9:50:07[/COLOR])
Iteration 10940000 M( 26193103 )C, 0xd6d0c49220fdb2b1, n = 1572864, CUDALucas v1.67 err = 0.02002 (0:24 real, [COLOR=red]2.3264[/COLOR] ms/iter, ETA [COLOR=red]9:51:17[/COLOR])
Iteration 10950000 M( 26193103 )C, 0xa4757f98b2a34eea, n = 1572864, CUDALucas v1.67 err = 0.02002 (0:23 real, [COLOR=red]2.3349[/COLOR] ms/iter, ETA [COLOR=red]9:53:04[/COLOR])
[/CODE]
We'll see if it matches... |
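The ETA figures highlighted in the logs above can be roughly reproduced from the exponent, the current iteration, and the ms/iter value. The Python sketch below assumes about (exponent - iteration) iterations remain; that is an approximation, and the exact formula CUDALucas uses is not shown in this thread. The function names are hypothetical.

```python
def eta_seconds(exponent, iteration, ms_per_iter):
    """Estimated seconds left, ASSUMING (exponent - iteration) iterations remain."""
    return (exponent - iteration) * ms_per_iter / 1000.0


def hms(seconds):
    """Format a duration in seconds as h:mm:ss."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    # Figures taken from the two logs above for M26193103.
    print("4.1 sm20:", hms(eta_seconds(26193103, 10880000, 2.6770)))
    print("3.2 sm13:", hms(eta_seconds(26193103, 10930000, 2.3203)))
```

The estimates land within a few seconds of the logged ETAs (about 11:23 versus about 9:50), so the CUDA 3.2 sm13 build saves around an hour and a half on the remaining work.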
It's nice to know that George is starting to work on GPU programming. :smile:
|
I've got a match with the sm_13/cuda4.1 x64 version of the latest CUDALucas.
Using the same smallest exponent. |
1.67 Success
CUDALucas1.67.cuda3.2.sm_13.x64.exe -d 1 -threads 512 -c 10000 -aggressive 26193103
[CODE]Processing result: M( 26193103 )C, 0x9ad25d21f58dbda8, n = 1572864, CUDALucas v1.67
LL test successfully completes double-check of M26193103
[/CODE] |
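For readers new to the thread, the "LL test" in the result above is the Lucas-Lehmer test for Mersenne numbers M(p) = 2^p - 1. The Python sketch below shows the recurrence with plain big integers. It illustrates the mathematics only and is not how CUDALucas computes it; the logs suggest FFT-based arithmetic on the GPU, and plain integers would be far too slow for exponents of this size.

```python
def lucas_lehmer(p):
    """Return True if M(p) = 2**p - 1 is prime, for a prime exponent p.

    Recurrence: s_0 = 4, s_{k+1} = (s_k**2 - 2) mod M(p).
    M(p) is prime exactly when s_{p-2} == 0 (for odd prime p).
    """
    if p == 2:
        return True  # M(2) = 3 is prime; the recurrence applies to odd p
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13, 17, 19, 23, 31):
        verdict = "prime" if lucas_lehmer(p) else "composite"
        print(f"M({p}) is {verdict}")
```

A non-zero final value means the number is composite, which is consistent with the "C" after M( 26193103 ) in the logs.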
1 Attachment(s)
[QUOTE=flashjh;293318]I'm sure I'm missing something, but what is the method to choose the best FFT size? Where did you get these values?[/QUOTE]
[QUOTE=msft;293340]Hi, flashjh[/QUOTE] Attached cufftbench x64 binaries:
[LIST]
[*]CUDA 3.2 | SM 1.3
[*]CUDA 4.0 | SM 2.0
[*]CUDA 4.1 | SM 2.0
[*]CUDA 4.1 | SM 2.1
[/LIST]
Only supports the first video card.
@msft: Two things.
1) Can you incorporate the -d option in this program without too much trouble?
2) I looked through the source and didn't see a way to specify a range; can you incorporate a range option?
Thanks.
Edit: The .h files and makefiles are for compiling; they are not needed to run. |
[QUOTE=Brain;293515]Dedicated param -devices will be better...[/QUOTE]
Understood. |
[QUOTE=flashjh;293653]@msft: Two things. 1) Can you incorporate the -d option in this program without too much trouble? 2) I looked through the source and didn't see a way to specify a range; can you incorporate a range option? Thanks.[/QUOTE]
I'll merge it into CUDALucas.
$ ./CUDALucas -d 1 -cufftbench 1048576,2097152,65536 |
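The post does not say what the three numbers after -cufftbench mean. One plausible reading, and it is only an assumption here, is start, end, and step for the FFT lengths to benchmark. Under that reading, the Python sketch below lists the lengths the example command would cover. The function name is hypothetical and not part of CUDALucas.

```python
def fft_lengths(spec):
    """Expand a 'start,end,step' string into a list of FFT lengths.

    ASSUMPTION: the three comma-separated values are start, end (inclusive),
    and step. The thread does not confirm this interpretation.
    """
    start, end, step = (int(part) for part in spec.split(","))
    if step <= 0 or end < start:
        raise ValueError("expected start <= end and step > 0")
    return list(range(start, end + 1, step))


if __name__ == "__main__":
    lengths = fft_lengths("1048576,2097152,65536")
    print(len(lengths), "lengths from", lengths[0], "to", lengths[-1])
```

Under this assumption the example command would benchmark 17 lengths, from 1048576 to 2097152 in steps of 65536.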