mersenneforum.org

CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

Prime95 2013-12-15 03:40

I just tried r52 - no luck. I get the error "device_number >= device_count". I'm presently running CUDALucas 2.00 without problems on this Windows 7 box with a GTX 460.

mognuts 2013-12-15 10:32

[QUOTE=Prime95;362071]I just tried r52 - no luck. I get the error "device_number >= device_count". I'm presently running CUDALucas 2.00 without problems on this Windows 7 box with a GTX 460.[/QUOTE]

I get that error if I use version 2xx.xx drivers with r52. Upgrading the drivers solved this for me.

mognuts 2013-12-15 11:45

I'm getting bad selftests on a GTX460 with r52. I have never had this before with earlier versions.

[CODE]
C:\Users\John\Desktop\cudalucas>CUDALucas_205Beta_x64_r52.exe -r
------- DEVICE 0 -------
name GeForce GTX 460
Compatibility 2.1
clockRate (MHz) 1430
memClockRate (MHz) 1800
totalGlobalMem 1073741824
totalConstMem 65536
l2CacheSize 524288
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsPerMP 1536
multiProcessorCount 7
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
textureAlignment 512
deviceOverlap 1
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M86243 fft length = 4K
Iteration 10000 / 86243, 0x23992ccd735a03d9, 4K, CUDALucas v2.05 Beta err = 0.26563 (0:01 real, 0.0651 ms/iter)
This residue is correct.
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M132049 fft length = 8K
Iteration 10000 / 132049, 0x4c52a92b54635f9e, 8K, CUDALucas v2.05 Beta err = 0.00046 (0:01 real, 0.0709 ms/iter)
This residue is correct.
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M216091 fft length = 16K
Iteration 10000 / 216091, 0x30247786758b8792, 16K, CUDALucas v2.05 Beta err = 0.00001 (0:00 real, 0.0884 ms/iter)
This residue is correct.
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M756839 fft length = 40K
Iteration 10000 / 756839, 0x5d2cbe7cb24a109a, 40K, CUDALucas v2.05 Beta err = 0.03320 (0:02 real, 0.1868 ms/iter)
This residue is correct.
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M859433 fft length = 48K
Iteration 10000 / 859433, 0x3c4ad525c2d0aed0, 48K, CUDALucas v2.05 Beta err = 0.01074 (0:02 real, 0.1988 ms/iter)
This residue is correct.
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M1257787 fft length = 64K
Iteration 10000 / 1257787, 0x3f45bf9bea7213ea, 64K, CUDALucas v2.05 Beta err = 0.10938 (0:03 real, 0.2440 ms/iter)
This residue is correct.
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M1398269 fft length = 128K
Iteration 10000 / 1398269, 0xa4a6d2f0e34629db, 128K, CUDALucas v2.05 Beta err = 0.00000 (0:04 real, 0.4409 ms/iter)
This residue is correct.
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M2976221 fft length = 256K
Iteration 10000 / 2976221, 0x2a7111b7f70fea2f, 256K, CUDALucas v2.05 Beta err = 0.00001 (0:09 real, 0.8995 ms/iter)
This residue is correct.
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M3021377 fft length = 256K
Iteration 10000 / 3021377, 0x6387a70a85d46baf, 256K, CUDALucas v2.05 Beta err = 0.00001 (0:09 real, 0.8994 ms/iter)
This residue is correct.
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M6972593 fft length = 512K
Iteration 10000 / 6972593, 0x88f1d2640adb89e1, 512K, CUDALucas v2.05 Beta err = 0.00011 (0:18 real, 1.7766 ms/iter)
This residue is correct.
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M13466917 fft length = 1024K
Iteration 10000 / 13466917, 0x9fdc1f4092b15d69, 1024K, CUDALucas v2.05 Beta err = 0.00009 (0:37 real, 3.6937 ms/iter)
This residue is correct.
The fft length 2048K is too large for exponent 20996011, decreasing to 1024K
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M20996011 fft length = 1024K
Iteration 10000 / 20996011, 0x2a354d3a0f96e64e, 1024K, CUDALucas v2.05 Beta err = 0.50000 (0:37 real, 3.6876 ms/iter)
Expected residue [5fc58920a821da11] does not match actual residue [2a354d3a0f96e64e]
The fft length 2048K is too large for exponent 24036583, decreasing to 1024K
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M24036583 fft length = 1024K
Iteration 10000 / 24036583, 0x47fba1785d32a924, 1024K, CUDALucas v2.05 Beta err = 1.00000 (0:51 real, 5.1785 ms/iter)
Expected residue [cbdef38a0bdc4f00] does not match actual residue [47fba1785d32a924]
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M25964951 fft length = 2048K
Iteration 10000 / 25964951, 0x62eb3ff0a5f6237c, 2048K, CUDALucas v2.05 Beta err = 0.00008 (1:14 real, 7.4363 ms/iter)
This residue is correct.
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M30402457 fft length = 2048K
Iteration 10000 / 30402457, 0x0b8600ef47e69d27, 2048K, CUDALucas v2.05 Beta err = 0.00131 (1:15 real, 7.4195 ms/iter)
This residue is correct.
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M32582657 fft length = 2048K
Iteration 10000 / 32582657, 0x02751b7fcec76bb1, 2048K, CUDALucas v2.05 Beta err = 0.00537 (1:14 real, 7.4358 ms/iter)
This residue is correct.
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M37156667 fft length = 2048K
Iteration 10000 / 37156667, 0x67ad7646a1fad514, 2048K, CUDALucas v2.05 Beta err = 0.11719 (1:14 real, 7.4356 ms/iter)
This residue is correct.
The fft length 4096K is too large for exponent 42643801, decreasing to 2048K
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M42643801 fft length = 2048K
Iteration 10000 / 42643801, 0x93ec1e0141513b57, 2048K, CUDALucas v2.05 Beta err = 1.00000 (1:15 real, 7.4357 ms/iter)
Expected residue [8f90d78d5007bba7] does not match actual residue [93ec1e0141513b57]
The fft length 4096K is too large for exponent 43112609, decreasing to 2048K
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M43112609 fft length = 2048K
Iteration 10000 / 43112609, 0x93f526f2d01c1686, 2048K, CUDALucas v2.05 Beta err = 1.00000 (1:14 real, 7.4352 ms/iter)
Expected residue [e86891ebf6cd70c4] does not match actual residue [93f526f2d01c1686]
Using threads: norm1 256, mult 128, norm2 128.
Starting self test M57885161 fft length = 4096K
Iteration 10000 / 57885161, 0x76c27556683cd84d, 4096K, CUDALucas v2.05 Beta err = 0.00076 (2:37 real, 15.7022 ms/iter)
This residue is correct.
Error: There were 4 bad selftests!
C:\Users\John\Desktop\cudalucas>pause
Press any key to continue . . .
[/CODE]

flashjh 2013-12-15 12:03

I can't speak for the bad self test yet, but the other problems are probably from the driver version, as stated above. I build with CUDA 5.5 now. If you need a different version let me know and I'll try to build one. Otherwise, updating to the newest drivers should fix the problem.

The bad self test may have something to do with FFT selection. We'll look at it.

Prime95 2013-12-15 16:47

[QUOTE=mognuts;362083]I get that error if I use version 2xx.xx drivers with r52. Upgrading the drivers solved this for me.[/QUOTE]

I'm using driver 311.06. I'll try a newer one.

Prime95 2013-12-15 18:00

[QUOTE=mognuts;362084]I'm getting bad selftests on a GTX460 with r52. I have never had this before with earlier versions.[/QUOTE]

FWIW, my GTX460 passes the selftest.

I do have one minor bug. I ran "-cufftbench 2000 4100 1". It ran all the benches successfully, but the file to mail to james contained only one line for FFT length 2048K.

mognuts 2013-12-15 18:53

[QUOTE=Prime95;362117]FWIW, my GTX460 passes the selftest.

I do have one minor bug. I ran "-cufftbench 2000 4100 1". It ran all the benches successfully, but the file to mail to james contained only one line for FFT length 2048K.[/QUOTE]

-cufftbench is broken for me with r52. It crashes but doesn't bring down the driver. Makes no difference if I'm benchmarking a range of FFTs, or threads for a given FFT. r50 was fine.

flashjh 2013-12-15 19:12

A lot of code was rewritten for r52, so it will need some debugging. Keep posting errors and bugs, thanks :smile:

owftheevil 2013-12-16 14:48

[QUOTE=mognuts;362084]I'm getting bad selftests on a GTX460 with r52. I have never had this before with earlier versions.[/QUOTE]

This should be fixed with r53. Forgot to reinitialize a pointer after freeing the memory.
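
For anyone following along, one common form of this class of bug can be sketched in plain C. This is only an illustration and not CUDALucas source: the work_buf type and both function names are hypothetical, and the real code works with GPU allocations rather than calloc/free. The point is that a pointer left holding its old value after a free still looks valid to later checks, so resetting it makes the "not allocated" state explicit.

```c
#include <stdlib.h>

/* Hypothetical work buffer; not a structure from the CUDALucas source. */
typedef struct {
    double *data;   /* NULL means "not allocated" */
    size_t  len;    /* number of elements in data */
} work_buf;

/* Release the buffer and reset the pointer so later checks see it as empty. */
static void work_buf_release(work_buf *b)
{
    free(b->data);      /* free(NULL) is a no-op, so this is always safe */
    b->data = NULL;     /* the step that is easy to forget */
    b->len  = 0;
}

/* Make sure the buffer holds at least n elements (n > 0); returns 0 on success. */
static int work_buf_ensure(work_buf *b, size_t n)
{
    if (b->data != NULL && b->len >= n)
        return 0;                   /* existing allocation is big enough */
    work_buf_release(b);            /* drop the old allocation, if any */
    b->data = calloc(n, sizeof *b->data);
    if (b->data == NULL)
        return -1;                  /* allocation failed */
    b->len = n;
    return 0;
}
```

Without the reset in work_buf_release, a later call to work_buf_ensure could pass the NULL check and hand back memory that has already been freed, which would be one way to get wrong results only after a size change.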

owftheevil 2013-12-16 15:00

[QUOTE=Prime95;362117]FWIW, my GTX460 passes the selftest.

I do have one minor bug. I ran "-cufftbench 2000 4100 1". It ran all the benches successfully, but the file to mail to james contained only one line for FFT length 2048K.[/QUOTE]

Found the problem. I was making the silly assumption that limits would always be powers of 2. I should have the time to fix it tonight.
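
As an illustration of walking a range whose limits are arbitrary, here is a minimal sketch in plain C. It is not the CUDALucas implementation: the function names are hypothetical, and the choice of candidate lengths (multiples of 1K whose size in K has only the prime factors 2, 3, 5 and 7) is an assumption made for the example. The loop takes the limits as given instead of expecting them to be powers of 2.

```c
#include <stddef.h>

/* Returns 1 if n has no prime factor other than 2, 3, 5 and 7. */
static int is_7_smooth(unsigned n)
{
    static const unsigned primes[] = { 2, 3, 5, 7 };
    size_t i;

    if (n == 0)
        return 0;
    for (i = 0; i < sizeof primes / sizeof primes[0]; i++)
        while (n % primes[i] == 0)
            n /= primes[i];
    return n == 1;
}

/*
 * Collect candidate FFT lengths (in K) from the inclusive range
 * [lo_k, hi_k]. At most cap values are stored in out; the return value
 * is the total number of candidates found. Assumes hi_k < UINT_MAX.
 */
static size_t fft_candidates(unsigned lo_k, unsigned hi_k,
                             unsigned *out, size_t cap)
{
    size_t count = 0;
    unsigned k;

    for (k = lo_k; k <= hi_k; k++) {
        if (!is_7_smooth(k))
            continue;
        if (count < cap)
            out[count] = k;
        count++;
    }
    return count;
}
```

Called with the limits 2000 and 4100 from the report, this visits every candidate between the two limits, including 2000 itself and 4096, rather than only 2048.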

owftheevil 2013-12-16 15:02

[QUOTE=mognuts;362127]-cufftbench is broken for me with r52. It crashes but doesn't bring down the driver. Makes no difference if I'm benchmarking a range of FFTs, or threads for a given FFT. r50 was fine.[/QUOTE]

Crashes how?


All times are UTC.