mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

msft 2012-03-12 06:58

[QUOTE=LaurV;292662]Anyhow, if two copies are testing the same exponent (in two different folders) then [B]-s cannot be used[/B], as they will try writing the [B]SAME[/B] checkpoint files. The idea with the "backup" subfolder was to have it [B]in the current folder[/B], and not in the root of the disk, like ".\backup\......." and not "c:\backup\.....". Anyhow, you could argue that no one will test the same exponent with several copies of CL at the same time, but if you re-test the same exponent later using -s, the checkpoint files will be overwritten too. Why not let the user customize the output path?[/QUOTE]
It is a bug on Windows.
Can someone fix this bug?

msft 2012-03-12 07:01

[QUOTE=Brain;292671]I cannot even run low-res playback with 1.64 because of lag / bad responsiveness. I suggest, again, a command line switch, for example --polite or --agressive, where --polite would be the default. This would insert an artificial CUDA wait loop during which other apps (playback) get a turn.

It was introduced when an unnecessary CudaMemCpy was killed.[/QUOTE]
Does this mean CudaMemCpy fixes the bad responsiveness on Windows?

LaurV 2012-03-12 07:14

[QUOTE=msft;292682]It is a bug on Windows.
Can someone fix this bug?[/QUOTE]
I have never heard of such a Windows bug, so I went to have a look at the CL 1.64 source. In Cudalucas.cu, at line 1330, we can see:

[CODE]#ifdef linux
    mode_t mode = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;
    if (mkdir ("./backup", mode) != 0)
        printf ("mkdir: cannot create directory `backup': File exists\n");
#else
    if (_mkdir ("\\backup") != 0)
        printf ("mkdir: cannot create directory `backup': File exists\n");
#endif
[/CODE]
which, when not built for Linux, creates the backup folder in the ROOT of the current Windows disk. This isn't what was intended, is it?
[edit: for Jerry or other builders: the double backslash in front of backup should be removed, so that the backup dir is created as a subfolder of the current folder. It also appears at line 1079, where the files are written.]
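
For illustration only, here is a minimal sketch of how the directory could be created relative to the current folder on both platforms. It is not the CUDALucas source: the helper name `ensure_backup_dir`, the `MAKE_DIR` macro and the use of `_WIN32` (instead of the `linux` test above) are assumptions made for this example. It also treats "already exists" as success, so the message is only printed on a real failure.

```c
#include <errno.h>
#include <stdio.h>

#ifdef _WIN32
#include <direct.h>
/* Relative path, no leading backslash: resolved against the current folder. */
#define MAKE_DIR(path) _mkdir(path)
#else
#include <sys/stat.h>
#include <sys/types.h>
/* Same permission bits as in the quoted Linux branch. */
#define MAKE_DIR(path) \
    mkdir((path), S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH)
#endif

/* Hypothetical helper: returns 0 if the directory was created or
   already exists, -1 on any other error. */
int ensure_backup_dir(const char *path)
{
    if (MAKE_DIR(path) == 0)
        return 0;               /* freshly created */
    if (errno == EEXIST)
        return 0;               /* already there, not an error */
    perror("mkdir");            /* report the real reason */
    return -1;
}
```

Calling `ensure_backup_dir("backup")` would then give ".\backup" on Windows and "./backup" on Linux, both under the folder CL was started from.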

Brain 2012-03-12 08:41

[QUOTE=msft;292683]Does this mean CudaMemCpy fixes the bad responsiveness on Windows?[/QUOTE]
CudaMemCpy was removed by ethan in his 1.3
--> GPU usage went from 97% to 99% (good, faster)
--> But at the same time, the display became laggy (not good)

Suggestion: Reduce GPU usage a little bit to allow other apps to access the device.

Formerly, the unneeded CudaMemCpy did this waiting...

LaurV 2012-03-12 09:07

[QUOTE=Brain;292693]CudaMemCpy was removed by ethan in his 1.3
--> GPU usage went from 97% to 99% (good, faster)
--> But at the same time, the display became laggy (not good)
Suggestion: Reduce GPU usage a little bit to allow other apps to access the device.
Formerly, the unneeded CudaMemCpy did this waiting...[/QUOTE]
The reverse happened when CL jumped from 1.58 to 1.63 (I don't know exactly where 1.61 falls):
-- GPU usage went from 99% to 92-95% (GTX580, Tesla). The "a bit slower" was not visible because of compensating changes (better FFT size, changing some temp vars from double to float, etc.), which in fact made CL 1.63 and 1.64 faster. This leaves aside the -t switch, which slows things down by 2-3 percent; we are talking about a speed comparison without -t.
-- At the same time the computer (display, CPU) became more responsive.

Running with -t lowers the speed further, but at the same time the GPU is less busy (I believe that checking the errors at every iteration makes the GPU wait longer): 88% instead of 95%, or even as low as 83%. The computer is even more responsive and the process is safer, not to mention the consumed energy and the produced heat, which are also lower.

So why not try -t? It works nicely for me. If this "GPU-busy percent" could be made a parameter, it would be even better.
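
To make the "GPU-busy percent" idea concrete, here is a rough sketch of the arithmetic only. Nothing like this is in CL as far as this thread shows; the function name and the idea of sleeping on the host after each batch of iterations are assumptions for the example. If a batch keeps the GPU busy for `busy_us` microseconds and the target is `target_percent` busy, the idle time to add is busy * (100 - target) / target.

```c
#include <stdint.h>

/* Hypothetical helper: idle time in microseconds to add after a batch
   that kept the GPU busy for busy_us, so that
   busy / (busy + idle) is about target_percent / 100.
   Returns 0 (no throttling) for targets outside 1..99. */
uint64_t idle_us_for_target(uint64_t busy_us, unsigned target_percent)
{
    if (target_percent == 0 || target_percent >= 100)
        return 0;
    return busy_us * (100u - target_percent) / target_percent;
}
```

For example, a batch that took 9000 us with a 90% target would need about 1000 us of idle time, giving 9000 / 10000 busy.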

msft 2012-03-12 10:11

Do you like it?
[code]
cudalucas.1.65$ ./CUDALucas
Usage: ./CUDALucas [-d device_number] [-threads 32|64|128|256|512|1024] [-c checkpoint_iteration] [-f fft_length] [-s] [-t] [-agressive] -r|exponent|input_filename
-threads set threads number(default=256)
-f set fft length
-s save all checkpoint files
-t check round off error all iterations
[/code]

LaurV 2012-03-12 10:17

[QUOTE=msft;292703]Do you like?
[code]
cudalucas.1.65$ ./CUDALucas
Usage: ./CUDALucas [-d device_number] [-threads 32|64|128|256|512|1024] [-c checkpoint_iteration] [-f fft_length] [-s] [-t] [-agressive] -r|exponent|input_filename
-threads set threads number(default=256)
-f set fft length
-s save all checkpoint files
-t check round off error all iterations
[/code][/QUOTE]
yeaaa... (where the heck is the salivating smiley???)
:smile:

edit: well, [-s folder] would sound perfect... :razz:

Karl M Johnson 2012-03-12 12:08

[QUOTE=msft;292703]Do you like?
[code]
cudalucas.1.65$ ./CUDALucas
Usage: ./CUDALucas [-d device_number] [-threads 32|64|128|256|512|1024] [-c checkpoint_iteration] [-f fft_length] [-s] [-t] [-agressive] -r|exponent|input_filename
-threads set threads number(default=256)
-f set fft length
-s save all checkpoint files
-t check round off error all iterations
[/code][/QUOTE]
Love it!

msft 2012-03-12 13:21

1 Attachment(s)
Ver 1.65
1) changed behavior on round-off error:
   iterations < 1000: increase FFT length
   iterations >= 1000: exit program
2) print max error
3) changed -s option (now takes a folder)
4) added -agressive option
5) added -threads option
[code]
cudalucas.1.65$ ./CUDALucas
Usage: ./CUDALucas [-d device_number] [-threads 32|64|128|256|512|1024] [-c checkpoint_iteration] [-f fft_length] [-s folder] [-t] [-agressive] -r|exponent|input_filename
-threads set threads number(default=256)
-f set fft length
-s save all checkpoint files
-t check round off error all iterations
-agressive GPU agressive(default polite)
cudalucas.1.65$ ./CUDALucas -threads 1024 -r
DEVICE:0------------------------
name GeForce GTX 460
~~~
Iteration 10000 M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, CUDALucas v1.65 err = 0.04723 (0:20 real, 1.9987 ms/iter, ETA 3:51:51)
Iteration 10000 M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, CUDALucas v1.65 err = 0.03019 (0:39 real, 3.9262 ms/iter, ETA 14:40:07)
Iteration 10000 M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, CUDALucas v1.65 err = 0.09749 (0:54 real, 5.3697 ms/iter, ETA 31:17:36)
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.65 err = 0.1996 (1:03 real, 6.2895 ms/iter, ETA 41:57:54)
Iteration 10000 M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, CUDALucas v1.65 err = 0.01873 (1:17 real, 7.7218 ms/iter, ETA 55:39:40)
Iteration 10000 M( 30402457 )C, 0x0b8600ef47e69d27, n = 1835008, CUDALucas v1.65 err = 0.02155 (1:26 real, 8.6305 ms/iter, ETA 72:51:20)
Iteration 10000 M( 32582657 )C, 0x02751b7fcec76bb1, n = 1835008, CUDALucas v1.65 err = 0.1181 (1:27 real, 8.6291 ms/iter, ETA 78:04:10)
err = 0.441193, increasing n from 1966080
Iteration 10000 M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, CUDALucas v1.65 err = 0.1117 (1:35 real, 9.4234 ms/iter, ETA 97:13:04)
Iteration 10000 M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, CUDALucas v1.65 err = 0.1871 (1:50 real, 10.9708 ms/iter, ETA 129:54:47)
Iteration 10000 M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, CUDALucas v1.65 err = 0.2798 (1:50 real, 10.9809 ms/iter, ETA 131:27:57)
[/code]
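
Item 1 of the changelog above can be read as a small decision rule. The sketch below is only an illustration of that rule, not the 1.65 code: the enum, the function name and the `err_limit` parameter are made up for the example, and the actual error threshold used by the program is not stated in this post.

```c
/* Possible reactions to the round-off error of one iteration. */
typedef enum {
    ROUNDOFF_OK,            /* error acceptable, keep going        */
    ROUNDOFF_INCREASE_FFT,  /* early in the run: retry with larger n */
    ROUNDOFF_EXIT           /* late in the run: stop the program   */
} roundoff_action;

/* Hypothetical helper. err_limit is an assumed parameter; whether the
   real comparison is > or >= is also an assumption. */
roundoff_action on_roundoff(double err, double err_limit,
                            unsigned long iteration)
{
    if (err <= err_limit)
        return ROUNDOFF_OK;
    if (iteration < 1000)
        return ROUNDOFF_INCREASE_FFT;  /* iterations < 1000  */
    return ROUNDOFF_EXIT;              /* iterations >= 1000 */
}
```

The "err = 0.441193, increasing n from 1966080" line in the log would correspond to the second branch.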

flashjh 2012-03-12 13:47

v1.65 x64 binaries (untested)
 
1 Attachment(s)
[QUOTE=msft;292729]Ver 1.65
[/QUOTE]


Attached v1.65 x64 binaries (untested):
[LIST]
[*]CUDA 4.0 / SM 2.0
[*]CUDA 4.1 / SM 2.0
[*]CUDA 4.1 / SM 2.1
[/LIST]
EDIT: I just tried running the 1.65 CUDA 4.1 / SM 2.0 build and it quit right after displaying the initial startup output. I switched back to 1.64 because I have to go to work.

Karl M Johnson 2012-03-12 15:26

[CODE]>CUDALucas.exe -d 1 -threads 512 -c 10000 -t 216091
DEVICE:1------------------------
name GeForce GTX 480
totalGlobalMem 1610612736
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
totalConstMem 65536
major.minor 2.0
clockRate 1640000
textureAlignment 512
deviceOverlap 1
multiProcessorCount 15
too small Exponent 216091
>pause
Press any key to continue . . .[/CODE]

Why?
Does CUDALucas no longer accept small exponents?

