I just tried r52 - no luck. I get the error "device_number >= device_count". I'm presently running CUDALucas 2.00 without problems on this Windows 7 box with a GTX 460.
|
[QUOTE=Prime95;362071]I just tried r52 - no luck. I get the error "device_number >= device_count". I'm presently running CUDALucas 2.00 without problems on this Windows 7 box with a GTX 460.[/QUOTE]I get that error if I use version 2xx.xx drivers with r52. Upgrading the drivers solved this for me.
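The message reads like a bounds check on the requested device index. A minimal sketch in C of such a check follows; the function name and parameters are hypothetical and not taken from the CUDALucas source. The assumption, consistent with the driver fix reported here, is that a driver too old for the CUDA toolkit the binary was built with leaves the program with a device count of zero, so even device 0 fails the check.

```c
#include <stdio.h>

/* Hypothetical stand-in for the kind of check that could print
 * "device_number >= device_count". Not the CUDALucas code. */
static int device_is_valid(int device_number, int device_count)
{
    if (device_number < 0 || device_number >= device_count) {
        fprintf(stderr, "device_number >= device_count (%d >= %d)\n",
                device_number, device_count);
        return 0;
    }
    return 1;
}
```

Under that assumption, `device_is_valid(0, 0)` fails while `device_is_valid(0, 1)` passes, which matches a card that works with one driver and is rejected with another.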
|
I'm getting bad selftests on a GTX460 with r52. I have never had this before with earlier versions.
[CODE]
C:\Users\John\Desktop\cudalucas>CUDALucas_205Beta_x64_r52.exe -r

------- DEVICE 0 -------
name GeForce GTX 460
Compatibility 2.1
clockRate (MHz) 1430
memClockRate (MHz) 1800
totalGlobalMem 1073741824
totalConstMem 65536
l2CacheSize 524288
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsPerMP 1536
multiProcessorCount 7
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
textureAlignment 512
deviceOverlap 1

Using threads: norm1 256, mult 128, norm2 128.
Starting self test M86243 fft length = 4K
Iteration 10000 / 86243, 0x23992ccd735a03d9, 4K, CUDALucas v2.05 Beta err = 0.26563 (0:01 real, 0.0651 ms/iter)
This residue is correct.

Using threads: norm1 256, mult 128, norm2 128.
Starting self test M132049 fft length = 8K
Iteration 10000 / 132049, 0x4c52a92b54635f9e, 8K, CUDALucas v2.05 Beta err = 0.00046 (0:01 real, 0.0709 ms/iter)
This residue is correct.

Using threads: norm1 256, mult 128, norm2 128.
Starting self test M216091 fft length = 16K
Iteration 10000 / 216091, 0x30247786758b8792, 16K, CUDALucas v2.05 Beta err = 0.00001 (0:00 real, 0.0884 ms/iter)
This residue is correct.

Using threads: norm1 256, mult 128, norm2 128.
Starting self test M756839 fft length = 40K
Iteration 10000 / 756839, 0x5d2cbe7cb24a109a, 40K, CUDALucas v2.05 Beta err = 0.03320 (0:02 real, 0.1868 ms/iter)
This residue is correct.

Using threads: norm1 256, mult 128, norm2 128.
Starting self test M859433 fft length = 48K
Iteration 10000 / 859433, 0x3c4ad525c2d0aed0, 48K, CUDALucas v2.05 Beta err = 0.01074 (0:02 real, 0.1988 ms/iter)
This residue is correct.

Using threads: norm1 256, mult 128, norm2 128.
Starting self test M1257787 fft length = 64K
Iteration 10000 / 1257787, 0x3f45bf9bea7213ea, 64K, CUDALucas v2.05 Beta err = 0.10938 (0:03 real, 0.2440 ms/iter)
This residue is correct.

Using threads: norm1 256, mult 128, norm2 128.
Starting self test M1398269 fft length = 128K
Iteration 10000 / 1398269, 0xa4a6d2f0e34629db, 128K, CUDALucas v2.05 Beta err = 0.00000 (0:04 real, 0.4409 ms/iter)
This residue is correct.

Using threads: norm1 256, mult 128, norm2 128.
Starting self test M2976221 fft length = 256K
Iteration 10000 / 2976221, 0x2a7111b7f70fea2f, 256K, CUDALucas v2.05 Beta err = 0.00001 (0:09 real, 0.8995 ms/iter)
This residue is correct.

Using threads: norm1 256, mult 128, norm2 128.
Starting self test M3021377 fft length = 256K
Iteration 10000 / 3021377, 0x6387a70a85d46baf, 256K, CUDALucas v2.05 Beta err = 0.00001 (0:09 real, 0.8994 ms/iter)
This residue is correct.

Using threads: norm1 256, mult 128, norm2 128.
Starting self test M6972593 fft length = 512K
Iteration 10000 / 6972593, 0x88f1d2640adb89e1, 512K, CUDALucas v2.05 Beta err = 0.00011 (0:18 real, 1.7766 ms/iter)
This residue is correct.

Using threads: norm1 256, mult 128, norm2 128.
Starting self test M13466917 fft length = 1024K
Iteration 10000 / 13466917, 0x9fdc1f4092b15d69, 1024K, CUDALucas v2.05 Beta err = 0.00009 (0:37 real, 3.6937 ms/iter)
This residue is correct.

The fft length 2048K is too large for exponent 20996011, decreasing to 1024K
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M20996011 fft length = 1024K
Iteration 10000 / 20996011, 0x2a354d3a0f96e64e, 1024K, CUDALucas v2.05 Beta err = 0.50000 (0:37 real, 3.6876 ms/iter)
[COLOR=red]Expected residue [5fc58920a821da11] does not match actual residue [2a354d3a0f96e64e][/COLOR]

The fft length 2048K is too large for exponent 24036583, decreasing to 1024K
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M24036583 fft length = 1024K
Iteration 10000 / 24036583, 0x47fba1785d32a924, 1024K, CUDALucas v2.05 Beta err = 1.00000 (0:51 real, 5.1785 ms/iter)
[COLOR=red]Expected residue [cbdef38a0bdc4f00] does not match actual residue [47fba1785d32a924][/COLOR]

Using threads: norm1 256, mult 128, norm2 128.
Starting self test M25964951 fft length = 2048K
Iteration 10000 / 25964951, 0x62eb3ff0a5f6237c, 2048K, CUDALucas v2.05 Beta err = 0.00008 (1:14 real, 7.4363 ms/iter)
This residue is correct.

Using threads: norm1 256, mult 128, norm2 128.
Starting self test M30402457 fft length = 2048K
Iteration 10000 / 30402457, 0x0b8600ef47e69d27, 2048K, CUDALucas v2.05 Beta err = 0.00131 (1:15 real, 7.4195 ms/iter)
This residue is correct.

Using threads: norm1 256, mult 128, norm2 128.
Starting self test M32582657 fft length = 2048K
Iteration 10000 / 32582657, 0x02751b7fcec76bb1, 2048K, CUDALucas v2.05 Beta err = 0.00537 (1:14 real, 7.4358 ms/iter)
This residue is correct.

Using threads: norm1 256, mult 128, norm2 128.
Starting self test M37156667 fft length = 2048K
Iteration 10000 / 37156667, 0x67ad7646a1fad514, 2048K, CUDALucas v2.05 Beta err = 0.11719 (1:14 real, 7.4356 ms/iter)
This residue is correct.

The fft length 4096K is too large for exponent 42643801, decreasing to 2048K
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M42643801 fft length = 2048K
Iteration 10000 / 42643801, 0x93ec1e0141513b57, 2048K, CUDALucas v2.05 Beta err = 1.00000 (1:15 real, 7.4357 ms/iter)
[COLOR=red]Expected residue [8f90d78d5007bba7] does not match actual residue [93ec1e0141513b57][/COLOR]

The fft length 4096K is too large for exponent 43112609, decreasing to 2048K
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M43112609 fft length = 2048K
Iteration 10000 / 43112609, 0x93f526f2d01c1686, 2048K, CUDALucas v2.05 Beta err = 1.00000 (1:14 real, 7.4352 ms/iter)
[COLOR=red]Expected residue [e86891ebf6cd70c4] does not match actual residue [93f526f2d01c1686][/COLOR]

Using threads: norm1 256, mult 128, norm2 128.
Starting self test M57885161 fft length = 4096K
Iteration 10000 / 57885161, 0x76c27556683cd84d, 4096K, CUDALucas v2.05 Beta err = 0.00076 (2:37 real, 15.7022 ms/iter)
This residue is correct.

[COLOR=red]Error: There were 4 bad selftests![/COLOR]

C:\Users\John\Desktop\cudalucas>pause
Press any key to continue . . .
[/CODE] |
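The pass/fail lines in the log above come down to comparing a 64-bit residue at iteration 10000 against a known value. A minimal sketch of that comparison in C is shown here; the helper names are hypothetical and this is not the CUDALucas implementation, only an illustration using the hex strings as they appear in the log.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse two hex strings (an optional "0x" prefix is
 * accepted by strtoull with base 16) and compare them as 64-bit values. */
static int residue_matches(const char *expected_hex, const char *actual_hex)
{
    uint64_t expected = strtoull(expected_hex, NULL, 16);
    uint64_t actual   = strtoull(actual_hex, NULL, 16);
    return expected == actual;
}

/* Hypothetical helper: count mismatches over n self tests, mirroring the
 * "There were N bad selftests" summary line. */
static int count_bad_selftests(const char *const expected[],
                               const char *const actual[], size_t n)
{
    int bad = 0;
    for (size_t i = 0; i < n; i++) {
        if (!residue_matches(expected[i], actual[i]))
            bad++;
    }
    return bad;
}
```

For example, the M20996011 pair from the log (`5fc58920a821da11` expected, `0x2a354d3a0f96e64e` actual) would count as one bad self test. In this log, all four mismatches follow a "decreasing to" message, where the FFT length was reduced before the test started.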
I can't speak for the bad self test yet, but the other problems are probably from the driver version, as stated above. I build with CUDA 5.5 now. If you need a different version let me know and I'll try to build one. Otherwise, updating to the newest drivers should fix the problem.
The bad self test may have something to do with FFT selection. We'll look at it. |
[QUOTE=mognuts;362083]I get that error if I use version 2xx.xx drivers with r52. Upgrading the drivers solved this for me.[/QUOTE]
I'm using driver 311.06. I'll try a newer one. |
[QUOTE=mognuts;362084]I'm getting bad selftests on a GTX460 with r52. I have never had this before with earlier versions.[/QUOTE]
FWIW, my GTX460 passes the selftest. I do have one minor bug. I ran "-cufftbench 2000 4100 1". It ran all the benches successfully, but the file to mail to james contained only one line for FFT length 2048K. |
[QUOTE=Prime95;362117]FWIW, my GTX460 passes the selftest.
I do have one minor bug. I ran "-cufftbench 2000 4100 1". It ran all the benches successfully, but the file to mail to james contained only one line for FFT length 2048K.[/QUOTE] -cufftbench is broken for me with r52. It crashes but doesn't bring down the driver. Makes no difference if I'm benchmarking a range of FFTs, or threads for a given FFT. r50 was fine. |
A lot of code was rewritten for r52, so it will need some debugging. Keep posting errors and bugs, thanks :smile:
|
[QUOTE=mognuts;362084]I'm getting bad selftests on a GTX460 with r52. I have never had this before with earlier versions.[/QUOTE] This should be fixed with r53. Forgot to reinitialize a pointer after freeing the memory. |
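For readers unfamiliar with this class of bug, here is a minimal sketch in C of the general pattern. The struct and function names are hypothetical and this is not the r53 change itself: it only shows why a pointer should be reset after `free` when a buffer is reallocated between tests, for instance when the FFT length changes.

```c
#include <stdlib.h>

/* Hypothetical per-test state; not CUDALucas's actual data structure. */
typedef struct {
    double *buf;  /* work buffer sized for the current FFT length */
    size_t  len;  /* number of elements in buf, 0 when unallocated */
} fft_state;

/* Free the buffer and reset the fields so stale values cannot be reused. */
static void fft_state_release(fft_state *s)
{
    free(s->buf);
    s->buf = NULL;  /* without this, later checks would see a dangling pointer */
    s->len = 0;
}

/* Make sure the buffer holds new_len elements; returns 0 on success. */
static int fft_state_resize(fft_state *s, size_t new_len)
{
    if (s->buf != NULL && s->len == new_len)
        return 0;                      /* existing buffer already fits */
    fft_state_release(s);
    s->buf = calloc(new_len, sizeof *s->buf);
    if (s->buf == NULL)
        return -1;                     /* allocation failed */
    s->len = new_len;
    return 0;
}
```

If `fft_state_release` left `buf` pointing at freed memory, the `s->buf != NULL` test in `fft_state_resize` could wrongly treat the old buffer as usable after a length change.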
[QUOTE=Prime95;362117]FWIW, my GTX460 passes the selftest.
I do have one minor bug. I ran "-cufftbench 2000 4100 1". It ran all the benches successfully, but the file to mail to james contained only one line for FFT length 2048K.[/QUOTE] Found the problem. I was making the silly assumption that limits would always be powers of 2. I should have the time to fix it tonight. |
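To illustrate the assumption mentioned above, here is a short sketch in C. The names and the sample table are hypothetical, and the actual r52 code is not shown in this thread. It checks whether a limit is a power of two and selects benchmarked lengths by comparison against the limits, so arbitrary bounds such as 2000 and 4100 work as well.

```c
#include <stddef.h>

/* Returns 1 if n is a power of two, 0 otherwise. */
static int is_power_of_two(unsigned n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Count entries of lengths_k (FFT lengths in K) that fall inside
 * [lo_k, hi_k]. Comparing against the bounds avoids assuming that the
 * limits themselves are powers of two or appear in the table. */
static size_t count_lengths_in_range(const unsigned *lengths_k, size_t n,
                                     unsigned lo_k, unsigned hi_k)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (lengths_k[i] >= lo_k && lengths_k[i] <= hi_k)
            count++;
    }
    return count;
}
```

Neither 2000 nor 4100 is a power of two, so code that relies on `is_power_of_two(limit)` being true would mishandle the reported command line. With a hypothetical table of `{1024, 2048, 2304, 2592, 3072, 3584, 4096, 8192}`, the range 2000 to 4100 selects six entries.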
[QUOTE=mognuts;362127]-cufftbench is broken for me with r52. It crashes but doesn't bring down the driver. Makes no difference if I'm benchmarking a range of FFTs, or threads for a given FFT. r50 was fine.[/QUOTE]
Crashes how? |