[QUOTE=LaurV;292662]Anyhow, if two copies are testing the same exponent (in two different folders) then [B]-s cannot be used[/B], as they will try writing the [B]SAME[/B] checkpoint files. The idea with the "backup" subfolder was to have it [B]in the current folder[/B], and not in the root of the disk... Like in ".\backup\......." and not "c:\backup\....." Anyhow, you could argue that no one will test the same exponent with several copies of CL at the same time, but if you re-test the same exponent later using -s, the checkpoint files will be overwritten too... Why not let the user customize the output path?[/QUOTE]
It is a bug with Windows. Can someone fix this bug? |
[QUOTE=Brain;292671]I cannot even run low-res playback with 1.64 because of lags / bad responsiveness. I suggest - again - a command line switch, for example --polite or --agressive, where --polite would be the default. This would insert an artificial CUDA wait loop during which other apps (playback) get a turn.
The lag was introduced when an unnecessary CudaMemCpy was killed.[/QUOTE] Does this mean CudaMemCpy fixed the bad responsiveness on Windows? |
[QUOTE=msft;292682]It is a bug with Windows.
Can someone fix this bug?[/QUOTE] I have never heard of such a Windows bug, so I went to have a look at the CL 1.64 source. In Cudalucas.cu at line 1330 we can see:
[CODE]#ifdef linux
  mode_t mode = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;
  if (mkdir ("./backup", mode) != 0)
    printf ("mkdir: cannot create directory `backup': File exists\n");
#else
  if (_mkdir ("\\backup") != 0)
    printf ("mkdir: cannot create directory `backup': File exists\n");
#endif
[/CODE]
When not built for linux, this creates the backup folder in the ROOT of the current Windows disk. This isn't what was intended, is it?
[edit: for Jerry or other builders, the double backslash in front of backup should be removed, so that the backup dir is created as a subfolder of the current folder. It also appears at line 1079, where the file is written.] |
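A minimal sketch of the fix described above, assuming the goal is simply a "backup" subfolder of the current folder on both platforms. The helper name `ensure_dir` and the `MAKE_DIR` macro are hypothetical and not taken from the CUDALucas source. It also checks `errno`, so that "File exists" is not reported for every kind of failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#ifdef _WIN32
#include <direct.h>    /* _mkdir */
#define MAKE_DIR(path) _mkdir(path)
#else
#include <sys/stat.h>  /* mkdir and the S_I* permission bits */
#include <sys/types.h>
#define MAKE_DIR(path) \
    mkdir((path), S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH)
#endif

/* Hypothetical helper (an assumption, not CUDALucas code).
 * Creates `path`; a relative path such as "backup" is resolved against
 * the current folder, not the root of the disk.
 * Returns 0 if the directory was created or the name already exists,
 * -1 on any other error. A stricter version would also verify that an
 * existing name really is a directory. */
int ensure_dir(const char *path)
{
    assert(path != NULL);
    if (MAKE_DIR(path) == 0)
        return 0;
    if (errno == EEXIST)
        return 0;
    fprintf(stderr, "mkdir: cannot create directory '%s': %s\n",
            path, strerror(errno));
    return -1;
}
```

Calling `ensure_dir("backup")` from the folder where CL runs would then give each working folder its own backup subfolder.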
[QUOTE=msft;292683]Does this mean CudaMemCpy fixed the bad responsiveness on Windows?[/QUOTE]
CudaMemCpy was removed by ethan in his 1.3
--> GPU usage went from 97% to 99% (good, faster)
--> But at the same time, displays became laggy (not good)
Suggestion: Reduce GPU usage a little bit to allow other apps to access the device. Formerly, the unneeded CudaMemCpy did this waiting... |
[QUOTE=Brain;292693]CudaMemCpy was removed by ethan in his 1.3
--> GPU usage went from 97% to 99% (good, faster)
--> But at the same time, displays became laggy (not good)
Suggestion: Reduce GPU usage a little bit to allow other apps to access the device. Formerly, the unneeded CudaMemCpy did this waiting...[/QUOTE] The reverse happened when CL jumped from 1.58 to 1.63 (I don't know exactly where 1.61 falls):
-- GPU usage went from 99% to 92-95% (GTX580, Tesla). The "a bit slower" part was not visible because of compensating changes (better FFT sizes, some temp vars changed from double to float, etc.), which in fact made CL 1.63 and 1.64 faster. (I am leaving aside the -t switch, which slows things down by 2-3 percent; this is a speed comparison without -t.)
-- At the same time, the computer (display, CPU) became more responsive.
Running with -t lowers the speed further, but at the same time the GPU is less busy (I believe that checking the errors at every iteration makes the GPU wait longer), like 88% instead of 95%, or even lower, 83%. The computer will be even more responsive and the process will be safer, not to mention the consumed energy and the produced heat, which would also be lower. So, why don't you try -t? For me it works nicely. If this "GPU-busy percent" could be parametrized, it would be even better. |
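The "GPU-busy percent" parameter mentioned above comes down to simple arithmetic: if a batch of work keeps the GPU busy for `busy_ms`, then an idle pause of `busy_ms * (100 - p) / p` gives a duty cycle of p percent. The sketch below only shows that calculation. The function name is hypothetical, and how (or whether) CUDALucas would actually insert such a pause is an assumption.

```c
#include <assert.h>

/* Hypothetical helper (an assumption, not CUDALucas code).
 * Solves busy / (busy + idle) = target_percent / 100 for idle.
 * busy_ms:        measured time the GPU spent on one batch of work
 * target_percent: desired GPU-busy share, in the range (0, 100]
 * Returns the idle time in milliseconds to insert after the batch. */
double idle_ms_for_target(double busy_ms, double target_percent)
{
    assert(busy_ms >= 0.0);
    assert(target_percent > 0.0 && target_percent <= 100.0);
    return busy_ms * (100.0 - target_percent) / target_percent;
}
```

For example, a 95 ms batch with a 95% target needs a 5 ms pause, and a 100% target needs no pause at all.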
Do you like?
[code]
cudalucas.1.65$ ./CUDALucas
Usage: ./CUDALucas [-d device_number] [-threads 32|64|128|256|512|1024] [-c checkpoint_iteration] [-f fft_length] [-s] [-t] [-agressive] -r|exponent|input_filename
 -threads set threads number(default=256)
 -f set fft length
 -s save all checkpoint files
 -t check round off error all iterations
[/code] |
[QUOTE=msft;292703]Do you like?
[code]
cudalucas.1.65$ ./CUDALucas
Usage: ./CUDALucas [-d device_number] [-threads 32|64|128|256|512|1024] [-c checkpoint_iteration] [-f fft_length] [-s] [-t] [-agressive] -r|exponent|input_filename
 -threads set threads number(default=256)
 -f set fft length
 -s save all checkpoint files
 -t check round off error all iterations
[/code][/QUOTE] yeaaa... (where the heck is the salivating smiley???) :smile:
edit: well, [-s folder] would sound perfect... :razz: |
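A `-s folder` option along the lines suggested here mostly means building every checkpoint path from a user-supplied folder. A minimal sketch, assuming a hypothetical helper `checkpoint_path` and a made-up file name, since the real CUDALucas checkpoint naming is not shown in this thread:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (an assumption, not CUDALucas code).
 * Writes "<folder>/<name>" into out. A forward slash is used because
 * the Windows file APIs accept it as well as the backslash.
 * Returns 0 on success, -1 if the result would not fit in out. */
int checkpoint_path(char *out, size_t out_size,
                    const char *folder, const char *name)
{
    assert(out != NULL && folder != NULL && name != NULL);
    /* folder + '/' + name + terminating NUL must fit */
    if (strlen(folder) + 1 + strlen(name) + 1 > out_size)
        return -1;
    snprintf(out, out_size, "%s/%s", folder, name);
    return 0;
}
```

With a separate folder per run, two copies testing the same exponent would no longer write the same checkpoint files.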
[QUOTE=msft;292703]Do you like?
[code]
cudalucas.1.65$ ./CUDALucas
Usage: ./CUDALucas [-d device_number] [-threads 32|64|128|256|512|1024] [-c checkpoint_iteration] [-f fft_length] [-s] [-t] [-agressive] -r|exponent|input_filename
 -threads set threads number(default=256)
 -f set fft length
 -s save all checkpoint files
 -t check round off error all iterations
[/code][/QUOTE] Love it! |
1 Attachment(s)
Ver 1.65
1) changed behavior on round-off error:
 iterations < 1000: increase the FFT length
 iterations >= 1000: exit the program
2) print maxerror
3) changed the -s option
4) added the -agressive option
5) added the -threads option
[code]
cudalucas.1.65$ ./CUDALucas
Usage: ./CUDALucas [-d device_number] [-threads 32|64|128|256|512|1024] [-c checkpoint_iteration] [-f fft_length] [-s folder] [-t] [-agressive] -r|exponent|input_filename
 -threads set threads number(default=256)
 -f set fft length
 -s save all checkpoint files
 -t check round off error all iterations
 -agressive GPU agressive(default polite)
cudalucas.1.65$ ./CUDALucas -threads 1024 -r
DEVICE:0------------------------
name GeForce GTX 460
~~~
Iteration 10000 M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, CUDALucas v1.65 err = 0.04723 (0:20 real, 1.9987 ms/iter, ETA 3:51:51)
Iteration 10000 M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, CUDALucas v1.65 err = 0.03019 (0:39 real, 3.9262 ms/iter, ETA 14:40:07)
Iteration 10000 M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, CUDALucas v1.65 err = 0.09749 (0:54 real, 5.3697 ms/iter, ETA 31:17:36)
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.65 err = 0.1996 (1:03 real, 6.2895 ms/iter, ETA 41:57:54)
Iteration 10000 M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, CUDALucas v1.65 err = 0.01873 (1:17 real, 7.7218 ms/iter, ETA 55:39:40)
Iteration 10000 M( 30402457 )C, 0x0b8600ef47e69d27, n = 1835008, CUDALucas v1.65 err = 0.02155 (1:26 real, 8.6305 ms/iter, ETA 72:51:20)
Iteration 10000 M( 32582657 )C, 0x02751b7fcec76bb1, n = 1835008, CUDALucas v1.65 err = 0.1181 (1:27 real, 8.6291 ms/iter, ETA 78:04:10)
err = 0.441193, increasing n from 1966080
Iteration 10000 M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, CUDALucas v1.65 err = 0.1117 (1:35 real, 9.4234 ms/iter, ETA 97:13:04)
Iteration 10000 M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, CUDALucas v1.65 err = 0.1871 (1:50 real, 10.9708 ms/iter, ETA 129:54:47)
Iteration 10000 M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, CUDALucas v1.65 err = 0.2798 (1:50 real, 10.9809 ms/iter, ETA 131:27:57)
[/code] |
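The new round-off behavior in item 1 can be read as a small decision rule. The sketch below is one possible reading, not the actual 1.65 code: the names are hypothetical, the error limit is left as a parameter because the thread does not state the threshold, and it is an assumption that the iteration count is measured from the start of the test.

```c
#include <assert.h>

/* Hypothetical names (assumptions, not CUDALucas code). */
typedef enum {
    RO_CONTINUE,      /* error is acceptable, keep iterating          */
    RO_INCREASE_FFT,  /* early in the run: retry with a larger FFT    */
    RO_EXIT           /* late in the run: stop the program            */
} roundoff_action;

/* err:       round-off error measured for the current iteration
 * err_limit: caller-supplied limit (the real threshold is not given
 *            in the thread, so none is hard-coded here)
 * iteration: number of iterations completed so far                   */
roundoff_action on_roundoff(double err, double err_limit, long iteration)
{
    assert(err >= 0.0 && err_limit > 0.0 && iteration >= 0);
    if (err <= err_limit)
        return RO_CONTINUE;
    return (iteration < 1000) ? RO_INCREASE_FFT : RO_EXIT;
}
```

Under this reading, the "err = 0.441193, increasing n from 1966080" line in the log would correspond to the `RO_INCREASE_FFT` branch.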
v1.65 x64 binaries (untested)
1 Attachment(s)
[QUOTE=msft;292729]Ver 1.65
1) changed behavior on round-off error:
 iterations < 1000: increase the FFT length
 iterations >= 1000: exit the program
2) print maxerror
3) changed the -s option
4) added the -agressive option
5) added the -threads option
[code]
cudalucas.1.65$ ./CUDALucas
Usage: ./CUDALucas [-d device_number] [-threads 32|64|128|256|512|1024] [-c checkpoint_iteration] [-f fft_length] [-s folder] [-t] [-agressive] -r|exponent|input_filename
 -threads set threads number(default=256)
 -f set fft length
 -s save all checkpoint files
 -t check round off error all iterations
 -agressive GPU agressive(default polite)
cudalucas.1.65$ ./CUDALucas -threads 1024 -r
DEVICE:0------------------------
name GeForce GTX 460
~~~
Iteration 10000 M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, CUDALucas v1.65 err = 0.04723 (0:20 real, 1.9987 ms/iter, ETA 3:51:51)
Iteration 10000 M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, CUDALucas v1.65 err = 0.03019 (0:39 real, 3.9262 ms/iter, ETA 14:40:07)
Iteration 10000 M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, CUDALucas v1.65 err = 0.09749 (0:54 real, 5.3697 ms/iter, ETA 31:17:36)
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v1.65 err = 0.1996 (1:03 real, 6.2895 ms/iter, ETA 41:57:54)
Iteration 10000 M( 25964951 )C, 0x62eb3ff0a5f6237c, n = 1572864, CUDALucas v1.65 err = 0.01873 (1:17 real, 7.7218 ms/iter, ETA 55:39:40)
Iteration 10000 M( 30402457 )C, 0x0b8600ef47e69d27, n = 1835008, CUDALucas v1.65 err = 0.02155 (1:26 real, 8.6305 ms/iter, ETA 72:51:20)
Iteration 10000 M( 32582657 )C, 0x02751b7fcec76bb1, n = 1835008, CUDALucas v1.65 err = 0.1181 (1:27 real, 8.6291 ms/iter, ETA 78:04:10)
err = 0.441193, increasing n from 1966080
Iteration 10000 M( 37156667 )C, 0x67ad7646a1fad514, n = 2097152, CUDALucas v1.65 err = 0.1117 (1:35 real, 9.4234 ms/iter, ETA 97:13:04)
Iteration 10000 M( 42643801 )C, 0x8f90d78d5007bba7, n = 2359296, CUDALucas v1.65 err = 0.1871 (1:50 real, 10.9708 ms/iter, ETA 129:54:47)
Iteration 10000 M( 43112609 )C, 0xe86891ebf6cd70c4, n = 2359296, CUDALucas v1.65 err = 0.2798 (1:50 real, 10.9809 ms/iter, ETA 131:27:57)
[/code][/QUOTE] Attached v1.65 x64 binaries (untested):
[LIST]
[*]CUDA 4.0 / SM 2.0
[*]CUDA 4.1 / SM 2.0
[*]CUDA 4.1 / SM 2.1
[/LIST]
EDIT: Just tried running the 1.65 CUDA 4.1 / SM 2.0 build and it quit right after displaying the initial startup stuff. I switched back to 1.64 because I have to go to work. |
[CODE]>CUDALucas.exe -d 1 -threads 512 -c 10000 -t 216091
DEVICE:1------------------------
name GeForce GTX 480
totalGlobalMem 1610612736
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
totalConstMem 65536
major.minor 2.0
clockRate 1640000
textureAlignment 512
deviceOverlap 1
multiProcessorCount 15
too small Exponent 216091
>pause
Press any key to continue . . .[/CODE]
Why? Does CUDALucas no longer accept small exponents? |