Does anyone have any Linux (64bit) binaries?
If not, what SDK version do I need, and how exactly do I use the makefile? (Just 'make'? I've sometimes seen fancier things, like 'make && make install' or some such, so I want to be sure.)
[QUOTE=Dubslow;295197]Does anyone have any Linux (64bit) binaries?
If not, what SDK version do I need, and how exactly do I use the makefile? (Just 'make'? I've sometimes seen fancier things, like 'make && make install' or some such, so I want to be sure.)[/QUOTE]
You do not need the SDK. Just 'make':
[code]
$ cat Makefile
CUDALucas: CUDALucas.o
	g++ -O2 -fPIC -o CUDALucas CUDALucas.o -L/usr/local/cuda/lib64 -lcufft -lm

CUDALucas.o: CUDALucas.cu
	/usr/local/cuda/bin/nvcc -O2 -arch=sm_13 -I/usr/local/include CUDALucas.cu -c

clean:
	-rm *.o CUDALucas
$ make
/usr/local/cuda/bin/nvcc -O2 -arch=sm_13 -I/usr/local/include CUDALucas.cu -c
g++ -O2 -fPIC -o CUDALucas CUDALucas.o -L/usr/local/cuda/lib64 -lcufft -lm
$ ./CUDALucas -r
Iteration 10000 M( 86243 )C, 0x23992ccd735a03d9, n = 4608, CUDALucas v2.00 err = 0.01074 (0:20 real, 2.0263 ms/iter, ETA 2:21)
[/code]
[QUOTE=msft;295230]You do not need the SDK.
...[/QUOTE]
I downloaded the 2.00 version from [url]http://www.mersenneforum.org/showpost.php?p=294046&postcount=1098[/url], and here's my makefile:
[code]
bill@Gravemind:~/CUDALucas/2.00∰∂ cat Makefile
NVIDIA_SDK = $(HOME)/NVIDIA_GPU_Computing_SDK

CUDALucas: CUDALucas.o
	g++ -O2 -fPIC -o CUDALucas CUDALucas.o -L/usr/local/cuda/lib64 -lcufft -lm

CUDALucas.o: CUDALucas.cu
	/usr/local/cuda/bin/nvcc -O2 -arch=sm_13 -I/usr/local/include -I$(NVIDIA_SDK)/C/common/inc CUDALucas.cu -c

clean:
	-rm *.o CUDALucas CUDALucas.cu~
[/code]
Also, even in yours, it calls /usr/local/cuda/bin/nvcc, which I don't have without the SDK. (And also cuda/lib64...)
[QUOTE=Dubslow;295235]Also, even in yours, it calls /usr/local/cuda/bin/nvcc, which I don't have without the SDK. (And also cuda/lib64...)[/QUOTE]
nvcc and the libs aren't in the SDK; they're in the CUDA Toolkit.
I'm a bit late to this game, but I took a look at the best FFT sizes on my GTX 480 using a 64-bit Linux binary and both CUDA 3.2 and CUDA 4.1:
[CODE]
CUDA 3.2               CUDA 4.1
Size      Time (ms)    Size      Time (ms)
1179648   0.737176     1179648   0.757979
1310720   0.869311     1310720   0.912768
1474560   0.972916     1474560   0.964209
1572864   1.047643     1605632   1.067629
1605632   1.072745     1638400   1.172933
1638400   1.190849     1769472   1.206898
1769472   1.216339     2097152   1.340003
1835008   1.248738     2293760   1.612199
2097152   1.296626     2359296   1.617333
2359296   1.522869     2654208   1.791644
2621440   1.760007     2752512   1.978546
2654208   1.784613     2949120   2.053249
2949120   2.100391     3211264   2.292879
3145728   2.111622     3276800   2.457472
3211264   2.369003     3538944   2.529949
3670016   2.552411     3670016   2.624719
4194304   2.814626     4194304   2.849048
4423680   3.067510     4423680   3.36062
4718592   3.135987     4718592   3.720233
5242880   3.531422     4816896   3.86769
5308416   3.911258     5242880   3.875832
5505024   4.235169     5308416   4.06778
5898240   4.466444     5734400   4.509711
6193152   4.464647     6193152   4.648901
[/CODE]
Although there is a little variation, the best FFT sizes are mostly the same for the two versions of CUDA. Overall, CUDA 3.2 is slightly faster than CUDA 4.1, and in the FFT region around 4.2M-5M, CUDA 3.2 is significantly faster.
I don't know why, but I've had an extremely bad streak of luck with 2.00.
It took me 5 runs to get a good DC on 26101843:
[CODE]
M( 26101843 )C, 0x2e20628d2010b7__, n = 1474560, CUDALucas v2.00
M( 26101843 )C, 0x9699f17722194e__, n = 1474560, CUDALucas v2.00
M( 26101843 )C, 0xa6c3cd3038b506__, n = 1474560, CUDALucas v2.00
M( 26101843 )C, 0x44c5575f619091__, n = 1474560, CUDALucas v2.00
M( 26101843 )C, 0x4c96108b152c6266, n = 1474560, CUDALucas v2.00
[/CODE]
I thought it had something to do with me remoting into the system, but when I got home it took 2 more runs. I also just had a bad run on 26120921:
[CODE]M( 26120921 )C, 0xe34c177a793b96__, n = 1474560, CUDALucas v2.00[/CODE]
Here's my typical run line:
[CODE]e:\cuda2\cuda20032 -d 1 -threads 512 -c 10000 -f 1474560 -t -polite 0 26101843 >> 26101843.txt[/CODE]
This brings me to 6 bad and 8 good with 2.00. I'm not sure the change with -t is working. Anyone else have similar results?
[QUOTE=flashjh;295365]I don't know why, but I've had an extremely bad streak of luck with 2.00.
It took me 5 runs to get a good DC on 26101843: ... This brings me to 6 bad and 8 good with 2.00. I'm not sure the change with -t is working. Anyone else have similar results?[/QUOTE]
I'm batting 8 for 8 successful double-checks: 3 using my 560Ti, 5 using my 570.
[code]
./CUDALucas -t -d 1 cudal560.txt
or
./CUDALucas -t -d 0 cudal570.txt
[/code]
[code]
M( 28376339 )C, 0xb3e29f7739547b38, n = 1572864, CUDALucas v2.00
M( 28573841 )C, 0x64c4cbb92a9c8f47, n = 1572864, CUDALucas v2.00
M( 29462357 )C, 0x3df0d8cf19726aad, n = 1835008, CUDALucas v2.00
M( 29462599 )C, 0x60e55f600332f5cd, n = 1835008, CUDALucas v2.00
M( 29462387 )C, 0x5eacbd9aaa0cca16, n = 1835008, CUDALucas v2.00
M( 29465929 )C, 0x828cde7005d78b0d, n = 1835008, CUDALucas v2.00
M( 29462623 )C, 0x8a91307e8d3531b6, n = 1835008, CUDALucas v2.00
M( 29465977 )C, 0xf49765b40d2129ae, n = 1835008, CUDALucas v2.00
[/code]
Unless you have a flaky card or a flaky compilation, the next most obvious suspect is the FFT size. Mine are a bit higher than yours. (These were auto-selected by CUDALucas.) If you care to try to reproduce my results, the oldest residues (one from the 560Ti and one from the 570) that haven't scrolled off are:
[CODE]
Iteration 13900000 M( 29462623 )C, 0xefef88bdb9a24848, n = 1835008, CUDALucas v2.00 err = 0.0127 (0:53 real, 5.3098 ms/iter, ETA 22:56:59)
and
Iteration 10170000 M( 29465977 )C, 0xf8d0abc8611c8221, n = 1835008, CUDALucas v2.00 err = 0.01367 (0:36 real, 3.5695 ms/iter, ETA 19:07:36)
[/CODE]
BTW, the final reported err= values were 0.01367 and 0.01416, respectively, which gives these results a pretty wide safety margin.
Just got another good run with 2.00. Who knows, maybe my card is failing?
On this one:
[CODE]
[URL="http://www.mersenne.org/report_exponent/?exp_lo=26231297&exp_hi=&B1=Get+status"]26231297[/URL]
No factors below 2^69
P-1 B1=405000
[COLOR=red]Bad LL[/COLOR] 75A6F23E6769F0DA by "David Glynn"
[COLOR=seagreen]Verified[/COLOR] LL 61CF09FB162017FF by "Scotch&Gloves_RUS" on 2011-06-15
[COLOR=red]Bad LL[/COLOR] BFD7978ACFBF14BF by "linded" on 2011-09-12
[COLOR=seagreen]Verified[/COLOR] LL 61CF09FB162017FF by "Jerry Hallett" on 2012-04-05
History 61CF09FB162017__ by "Scotch&Gloves_RUS" on 2011-06-15
History BFD7978ACFBF14__ by "linded" on 2011-09-12
History no factor for M26231297 from 2^67 to 2^68 [mfaktc 0.16 barrett79_mul32] by "Carsten Kossendey" on 2011-11-17
History no factor for M26231297 from 2^68 to 2^69 [mfaktc 0.17-Win barrett79_mul32] by "David Campeau" on 2011-11-24
History 61cf09fb162017__ by "Jerry Hallett" on 2012-04-05
[/CODE]
Scotch&Gloves_RUS's LL was Suspect before I submitted; it turned out to be correct.
[QUOTE=aaronhaviland;295323]nvcc and the libs aren't in the SDK; they're in the Cuda Toolkit.[/QUOTE]
Ok, thanks. I just downloaded 4.1 and tried to compile. Unfortunately, I've always (always, ALWAYS) had problems finding CUDA libs, no matter what I put in LD_LIBRARY_PATH.
[quote]
* Please make sure your PATH includes /usr/local/cuda/bin
* Please make sure your LD_LIBRARY_PATH
*   for 32-bit Linux distributions includes /usr/local/cuda/lib
*   for 64-bit Linux distributions includes /usr/local/cuda/lib64:/usr/local/cuda/lib
* OR
*   for 32-bit Linux distributions add /usr/local/cuda/lib
*   for 64-bit Linux distributions add /usr/local/cuda/lib64 and /usr/local/cuda/lib
* to /etc/ld.so.conf and run ldconfig as root
[/quote]
I tried just the ldconfig thing, but that didn't work, so I tried again with LD_LIBRARY_PATH, but that still doesn't work. To use mfaktc, I had to make a copy of the libs in the mfaktc folder, then set LD_LIBRARY_PATH to the mfaktc folder. I just can't get it to work right with the nVidia location for the life of me.
[code]
bill@Gravemind:~∰∂ tail .bashrc
fi

# set PATH so it includes user's private bin if it exists
if [ -d $HOME/bin ]; then
    PATH=$PATH:$HOME/bin:$HOME/bin/c:$HOME/bin/py
fi
[U]PATH=$PATH:/usr/local/cuda/bin
LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/lib[/U]
PYTHONPATH=$HOME/bin/py
[/code]
Despite having run ldconfig as they suggested and setting LD_LIBRARY_PATH correctly, I always got this:
[code]
bill@Gravemind:~/CUDALucas/2.00∰∂ make
g++ -O2 -fPIC -o CUDALucas CUDALucas.o -L/usr/local/cuda/lib64 -lcufft -lm
/usr/bin/ld: warning: libcudart.so.4, needed by /usr/local/cuda/lib64/libcufft.so, not found (try using -rpath or -rpath-link)
<snip>
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaPeekAtLastError'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `__cudaRegisterVar'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaGetLastError'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaMemcpyToSymbolAsync'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaStreamWaitEvent'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaGetDevice'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaGetExportTable'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaFuncSetCacheConfig'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaUnbindTexture'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `__cudaRegisterTexture'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaCreateChannelDesc'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaBindTexture'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaFuncGetAttributes'
collect2: ld returned 1 exit status
make: *** [CUDALucas] Error 1
[/code]
[code]
bill@Gravemind:/usr/local/cuda/lib64∰∂ ls
libcublas.so         libcufft.so         libcurand.so         libnpp.so
libcublas.so.4       libcufft.so.4       libcurand.so.4       libnpp.so.4
libcublas.so.4.1.21  libcufft.so.4.1.21  libcurand.so.4.1.21  libnpp.so.4.1.21
libcudart.so         libcuinj.so         libcusparse.so
libcudart.so.4       libcuinj.so.4       libcusparse.so.4
libcudart.so.4.1.21  libcuinj.so.4.1.21  libcusparse.so.4.1.21
[/code]
[QUOTE=Dubslow;295519]Ok, thanks. I just downloaded 4.1 and tried to compile. Unfortunately, I've always (always ALWAYS) had problem finding CUDA libs, no matter what I put in LD_LIBRARY_PATH.
I tried just the ldconfig thing, but that didn't work, so I tried again with LD_LIBRARY_PATH, but that still doesn't work.
[code]
g++ -O2 -fPIC -o CUDALucas CUDALucas.o -L/usr/local/cuda/lib64 -lcufft -lm
/usr/bin/ld: warning: libcudart.so.4, needed by /usr/local/cuda/lib64/libcufft.so, not found (try using -rpath or -rpath-link)
<snip>
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaPeekAtLastError'
<snip>
[/code][/QUOTE]
The environment variable $LD_LIBRARY_PATH has no bearing on compilation: it only affects how the run-time linker works, and has nothing to do with the compile-time link. Likewise, ldconfig updates symlinks and the cache for the run-time linker, and has no bearing on compilation.

Based on your output, you should only need to do one thing to fix the compilation:
- It says "warning: libcudart.so.4 ... not found"
- Add "-lcudart" to the g++ line in the Makefile

For reference, here is the variant of the Makefile that I use with CUDALucas. If nvcc is in your path, you don't need to specify /usr/local/bin. Likewise, if the libs are in a system directory (e.g. /usr/local/lib/ or lib64, depending on your configuration) you may not need to specify -L/usr/local/lib:
[code]
NVCC_ARCHES += -gencode arch=compute_20,code=compute_20
NVCC_ARCHES += -gencode arch=compute_13,code=compute_13

OPT        = -O3
CFLAGS     = $(OPT) -Wall
NVCC_FLAGS = $(OPT) -use_fast_math $(NVCC_ARCHES) --compiler-options="$(CFLAGS) -fno-strict-aliasing" --ptxas-options=-v

CUDALucas: CUDALucas.o
	g++ -fPIC -o CUDALucas CUDALucas.o -lcufft -lm -lcudart -Wl,-O1 -Wl,--as-needed $(CFLAGS)

CUDALucas.o: CUDALucas.cu cuda_safecalls.h
	nvcc CUDALucas.cu -c $(NVCC_FLAGS)

clean:
	-rm CUDALucas *.o *~
[/code]
[QUOTE=aaronhaviland;295523]The environment variable $LD_LIBRARY_PATH has no bearing on compilation: it only affects how the run-time linker works, and has nothing to do with the compile-time link. Likewise, ldconfig updates symlinks for the run-time linker, and has no bearing on compilation.
Based on your output, you should only need to do one thing to fix the compilation:
- It says "warning: libcudart.so.4 ... not found"
- Add "-lcudart" to the g++ line in the Makefile[/QUOTE]
Thanks, I got it compiled; however, despite all the trouble with LD_LIBRARY_PATH etc., runtime linking still fails. I can work around that, though.

Edit: Could you help me understand why the latter works, but the former doesn't?
[code]
bill@Gravemind:~/CUDALucas∰∂ echo $LD_LIBRARY_PATH
./lib
bill@Gravemind:~/CUDALucas∰∂ ./CUDALucas
./CUDALucas: error while loading shared libraries: libcufft.so.4: cannot open shared object file: No such file or directory
bill@Gravemind:~/CUDALucas∰∂ LD_LIBRARY_PATH=./lib ./CUDALucas
$ CUDALucas [-d device_number] [-threads 32|64|128|256|512|1024] [-c checkpoint_iteration] [-f fft_length] [-s folder] [-t] [-polite iteration] [-k] exponent|input_filename
$ CUDALucas [-d device_number] [-threads 32|64|128|256|512|1024] [-t] [-polite iteration] -r
$ CUDALucas [-d device_number] -cufftbench start end distance
    -threads      set threads number (default=256)
    -f            set fft length (if round off error then exit)
    -s            save all checkpoint files
    -t            check round off error all iterations
    -polite       GPU polite per iteration (default -polite 1) -polite 0 GPU aggressive
    -cufftbench   exec CUFFT benchmark (Ex. $ ./CUDALucas -d 1 -cufftbench 1179648 6291456 32768 )
    -r            exec residue test.
    -k            enable keys (p change -polite, t disable -t, s change -s)
bill@Gravemind:~/CUDALucas∰∂
[/code]
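A likely explanation (an educated guess from the transcript, not something verifiable remotely): a plain assignment, e.g. in .bashrc, sets a shell-local variable, and without `export` it never reaches child processes such as ./CUDALucas, whereas the one-shot `VAR=value command` form puts the variable into that one command's environment. A minimal sketch with a stand-in variable:

```shell
# DEMO_PATH stands in for LD_LIBRARY_PATH; the mechanics are identical.
unset DEMO_PATH
DEMO_PATH=./lib                                   # set, but NOT exported
sh -c 'echo "child sees: ${DEMO_PATH-<unset>}"'   # child sees: <unset>

export DEMO_PATH                                  # now in the environment
sh -c 'echo "child sees: ${DEMO_PATH-<unset>}"'   # child sees: ./lib

# The one-shot form exports the variable for that single command only,
# which is why "LD_LIBRARY_PATH=./lib ./CUDALucas" works at the prompt.
DEMO_PATH=./other sh -c 'echo "child sees: ${DEMO_PATH-<unset>}"'
```

Note also that a relative path like ./lib is resolved against the current working directory, so it only works when the program is launched from the directory that contains lib/.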
[QUOTE=Dubslow;295527]Thanks, I got it compiled, however despite all the trouble with LDLP etc., runtime linking still fails. I can workaround that though.
[/QUOTE]
Attached are Linux x86-64 binaries built with the standard makefile, which I believe means sm_13; they require the 4.x .so files, which are in the zip. Well, actually, whenever I try to upload the file it fails, so I'll make it available here: [url]http://dubslow.tk/gimps/CUDALucas2.00_sm13_4.1.tar.gz[/url] If for some reason that doesn't work, I'll try to upload it here again.

Edit: Can't change fft size mid-run?
By the way, top shows that CUDALucas consistently uses around 15+% of a core. Is there any reason for this, or a way to stop it?
Does this look right so far?
>cudalucas.2.00]$ ./cul -d 0 -f 524288 -t 49845883
DEVICE:0------------------------
name                 GeForce GT 430
totalGlobalMem       1072889856
sharedMemPerBlock    49152
regsPerBlock         32768
warpSize             32
memPitch             2147483647
maxThreadsPerBlock   1024
maxThreadsDim[3]     1024,1024,64
maxGridSize[3]       65535,65535,65535
totalConstMem        65536
major.minor          2.1
clockRate            1400000
textureAlignment     512
deviceOverlap        1
multiProcessorCount  2
start M49845883 fft length = 524288
Iteration 10000 M( 49845883 )C, 0xffffffff80000000, n = 524288, CUDALucas v2.00 err = 0 (1:12 real, 7.2164 ms/iter, ETA 99:53:12)
Iteration 20000 M( 49845883 )C, 0xffffffff80000000, n = 524288, CUDALucas v2.00 err = 0 (1:12 real, 7.2018 ms/iter, ETA 99:39:55)
...no. Those numbers after the 0x should be different; they are the interim residues. Try increasing the fft size, or leave it blank and let CL choose. You might also consider running -r, a built-in self test.
Thanks - I suspected something was wrong. The -r test output is below:
>cudalucas.2.00]$ ./cul -d 0 -r
DEVICE:0------------------------
name                 GeForce GT 430
totalGlobalMem       1072889856
sharedMemPerBlock    49152
regsPerBlock         32768
warpSize             32
memPitch             2147483647
maxThreadsPerBlock   1024
maxThreadsDim[3]     1024,1024,64
maxGridSize[3]       65535,65535,65535
totalConstMem        65536
major.minor          2.1
clockRate            1400000
textureAlignment     512
deviceOverlap        1
multiProcessorCount  2
Iteration 10000 M( 86243 )C, 0x23992ccd735a03d9, n = 4608, CUDALucas v2.00 err = 0.01367 (0:02 real, 0.2141 ms/iter, ETA 0:14)
Iteration 10000 M( 132049 )C, 0x4c52a92b54635f9e, n = 7168, CUDALucas v2.00 err = 0.01025 (0:03 real, 0.3181 ms/iter, ETA 0:38)
Iteration 10000 M( 216091 )C, 0x30247786758b8792, n = 12288, CUDALucas v2.00 err = 0.004517 (0:03 real, 0.3145 ms/iter, ETA 1:02)
Iteration 10000 M( 756839 )C, 0x5d2cbe7cb24a109a, n = 40960, CUDALucas v2.00 err = 0.03125 (0:06 real, 0.6416 ms/iter, ETA 7:54)
Iteration 10000 M( 859433 )C, 0x3c4ad525c2d0aed0, n = 49152, CUDALucas v2.00 err = 0.009399 (0:07 real, 0.7422 ms/iter, ETA 10:23)
Iteration 10000 M( 1257787 )C, 0x3f45bf9bea7213ea, n = 65536, CUDALucas v2.00 err = 0.1086 (0:09 real, 0.8838 ms/iter, ETA 18:15)
Iteration 10000 M( 1398269 )C, 0xa4a6d2f0e34629db, n = 73728, CUDALucas v2.00 err = 0.08594 (0:10 real, 1.0161 ms/iter, ETA 23:22)
Iteration 10000 M( 2976221 )C, 0x2a7111b7f70fea2f, n = 163840, CUDALucas v2.00 err = 0.04883 (0:21 real, 2.1135 ms/iter, ETA 1:44:15)
Iteration 10000 M( 3021377 )C, 0x6387a70a85d46baf, n = 163840, CUDALucas v2.00 err = 0.06543 (0:21 real, 2.1132 ms/iter, ETA 1:46:00)
Iteration 10000 M( 6972593 )C, 0x88f1d2640adb89e1, n = 393216, CUDALucas v2.00 err = 0.04785 (0:57 real, 5.6621 ms/iter, ETA 10:56:48)
Iteration 10000 M( 13466917 )C, 0x9fdc1f4092b15d69, n = 786432, CUDALucas v2.00 err = 0.02905 (1:56 real, 11.6230 ms/iter, ETA 43:25:29)
Iteration 10000 M( 20996011 )C, 0x5fc58920a821da11, n = 1179648, CUDALucas v2.00 err = 0.08691 (2:42 real, 16.2728 ms/iter, ETA 94:50:02)
Iteration 10000 M( 24036583 )C, 0xcbdef38a0bdc4f00, n = 1310720, CUDALucas v2.00 err = 0.2031 (3:09 real, 18.8810 ms/iter, ETA 125:58:42)
iteration = 22 < 1000 && err = 0.31543 >= 0.25, increasing n from 1310720
Wow... I went to a bigger FFT (84*32768); the residues are now changing, but the iteration rate has slowed down dramatically. Does this look right?
>cudalucas.2.00]$ ./cul -d 0 -f "$((84*2**15))" -t 49845883
DEVICE:0------------------------
name                 GeForce GT 430
totalGlobalMem       1072889856
sharedMemPerBlock    49152
regsPerBlock         32768
warpSize             32
memPitch             2147483647
maxThreadsPerBlock   1024
maxThreadsDim[3]     1024,1024,64
maxGridSize[3]       65535,65535,65535
totalConstMem        65536
major.minor          2.1
clockRate            1400000
textureAlignment     512
deviceOverlap        1
multiProcessorCount  2
start M49845883 fft length = 2752512
Iteration 10000 M( 49845883 )C, 0xbb8661cd90463e94, n = 2752512, CUDALucas v2.00 err = 0.2422 (6:52 real, 41.1366 ms/iter, ETA 569:23:56)
Iteration 20000 M( 49845883 )C, 0xf1d53981f966befa, n = 2752512, CUDALucas v2.00 err = 0.25 (6:51 real, 41.1303 ms/iter, ETA 569:11:49)
First of all, put all copied output in [ code] [ /code] tags, without the spaces; it makes the post easier to read. Quote my post and you can see how I did it.
Secondly, I just figured out why your first try was bogus: the fft (~500,000) was way too small. Your second attempt seems about right. Try running CuLu without specifying an FFT size, and see what it chooses. (As for the times, look at the -r results; 41 ms/iter for a 49M exponent matches well with the other results there.) After you do that, I would run [code]./CL -cufftbench 32768 3276800 32768[/code] Then look at the list that's produced and see if you can find an fft size that's roughly the same as or a bit smaller than the one chosen by CuLu, that has good times, and run the test with that.
Results of -cufftbench option for GT 430 attached as a graph.
[QUOTE=zs6nw;295703]Results of -cufftbench option for GT 430 attached as a graph.[/QUOTE]I'm sure it's been discussed before, but FFT sizes whose 32k multiplier is prime have horrible timings. It seems the more factors the multiplier has, the better the performance, e.g.:
[code]
95*32k = 3.13ms [5*19]
96*32k = 2.30ms [2^5*3]
97*32k = 6.82ms [97]
[/code]
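For anyone who wants to check other multipliers, coreutils' `factor` makes the pattern quick to see (the timings above are measured on a GPU; the factorizations below are plain arithmetic):

```shell
# Print each 32k multiplier's factorization next to the resulting FFT size.
# Smooth multipliers (only small prime factors) give fast FFTs; a large
# prime factor such as 97 gives a slow one.
for m in 95 96 97; do
    printf 'fft = %d   ' $((m * 32768))
    factor "$m"
done
```

factor prints e.g. `95: 5 19`, matching the bracketed factorizations above.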
1280MB VRAM and 332M exponents
I just tried CL 2.00 to run M(332,192,831) on my GTX 560Ti 448 Cores (GF110) with 1280MB VRAM:
[CODE]
F:\Eigene Dateien\Computing\CUDALucas\cudalucas.2.00\D0\bin>CUDALucas2.00.cuda4.0.sm_20.x64.exe 332192831
[COLOR=Red]over specifications Grid = 65536[/COLOR]
[/CODE]
FAILURE. :cry:

@msft/others: What does this error message mean, exactly? Formerly I got "allocation errors", so I'm a bit surprised... It does work up to ~270,000,000. With former CL versions, my GTX 560Ti (GF114) with 1024MB VRAM ran up to ~290,000,000.

Has anybody ever been able to run M(332,192,831)? What VRAM size was available? Which CL version was it? If not: what were the max exponents you could run, with which CL version and VRAM? I'd like to add these limits to the GPU Computing Guide. I will probably add a warning that the probability of erroneous results and energy waste is higher than with DCs.
[QUOTE=Brain;295705]Has anybody ever been able to run M(332,192,831)? What VRAM size was available? Which CL version was it?[/QUOTE]Continuing my observation from 2 posts above: not only do prime FFT-size multipliers run a lot slower, they also use a lot more VRAM. Running the benchmark:
[code]
C:\Prime95\cudalucas>cudalucas_200_41_20 -cufftbench 14680064 16777216 32768
CUFFT bench start = 14680064 end = 16777216 distance = 32768
CUFFT_Z2Z size= 14680064 time= 11.371921 msec
CUFFT_Z2Z size= 14712832 time= 65.277473 msec
CUFFT_Z2Z size= 14745600 time= 12.608562 msec
CUFFT_Z2Z size= 14778368 time= 23.158552 msec
CUFFT_Z2Z size= 14811136 time= 39.372547 msec
CUFFT_Z2Z size= 14843904 time= 65.161163 msec
CUFFT_Z2Z size= 14876672 time= 65.470688 msec
[/code]
The FFT sizes that run around ~10ms use ~400MB of VRAM; the ones around ~65ms use ~1100MB. The largest supported FFT size appears to be 511*32k = 16744448 (larger than that and you get the "over grid" error). That one runs on my GTX 570 1280MB, but fails with too high an error:
[code]
C:\Prime95\cudalucas>cudalucas_200_41_20 -f 16744448 332192831
start M332192831 fft length = 16744448
iteration = 1001 >= 1000 && err = 0.75 >= 0.35,fft length = 16744448
not write checkpoint file and exit.(when disable -t option)
[/code]
[QUOTE=Brain;295705]I just tried CL 2.00 to run M(332,192,831) on my GTX 560Ti 448 Cores (GF110) with 1280MB VRAM:
[CODE]
F:\Eigene Dateien\Computing\CUDALucas\cudalucas.2.00\D0\bin>CUDALucas2.00.cuda4.0.sm_20.x64.exe 332192831
[COLOR=Red]over specifications Grid = 65536[/COLOR]
[/CODE]FAILURE. :cry: @msft/others: What does this error message exactly mean? ...[/QUOTE]
I have a hunch it has something to do with grid size.
[QUOTE]Maximum x-, y-, or z-dimension of a grid of thread blocks for GPUs of CC 1.x - 2.x = 65535[/QUOTE]
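If the hunch is right, and assuming CUDALucas launches one block per `threads` FFT elements (an assumption about the internals, not confirmed here), the arithmetic lines up with the 511*32k limit reported above: at 256 threads per block, 512*32k is the first length whose grid exceeds 65535. A hypothetical sketch:

```shell
# Hypothetical model: grid = fft_length / threads_per_block, with a grid
# dimension cap of 65535 on CC 1.x - 2.x hardware.
threads=256
limit=65535
for mult in 511 512; do
    fft=$((mult * 32768))
    grid=$((fft / threads))
    if [ "$grid" -le "$limit" ]; then
        echo "fft=$fft grid=$grid  ok"
    else
        echo "fft=$fft grid=$grid  over specifications"
    fi
done
```

511*32k gives grid = 65408 (fits) and 512*32k gives grid = 65536, which matches the "Grid = 65536" in the error message above.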
[QUOTE=James Heinrich;295704]I'm sure it's been discussed before, but prime 32k multiple FFT sizes have horrible timings. It seems the more factors the multiplier has the better the performance[/QUOTE]
Absolutely (from the CUFFT documentation):
[QUOTE]A general DFT can be implemented as a matrix-vector multiplication that requires O(N^2) operations. However, the CUFFT Library employs the Cooley-Tukey algorithm ([URL="http://en.wikipedia.org/wiki/Cooley%E2%80%93Tukey_FFT_algorithm"]http://en.wikipedia.org/wiki/Cooley–Tukey_FFT_algorithm[/URL]) to reduce the number of required operations to optimize the performance of particular transform sizes. This algorithm expresses a DFT recursively in terms of smaller DFT building blocks. The CUFFT Library implements the following DFT building blocks: radix-2, radix-3, radix-5, and radix-7. Hence the performance of any transform size that can be factored as 2^a * 3^b * 5^c * 7^d (where a, b, c, and d are non-negative integers) is optimized in the CUFFT library.[/QUOTE]I've been testing CUFFT timings for other lengths than just multiples of 32768. I've excluded the timings because they're not run exactly as CUDALucas would run them, but the fact that they are "optimal lengths" should still apply. Eff% is calculated similarly to the prior examples here, but scaled so that the results are all within the range 0-100. Very few lengths have Eff% between 15%-75%; the majority of inefficient lengths ran around 9-10%. These have all been excluded. Some of the 70-80% efficient run-lengths have also been excluded because they are smaller than a larger+faster length.
[COLOR=Blue]Note the exponents in blue, which would be skipped over if only looking at multiples of 32768[/COLOR]:
[CODE]
               Exponent of:
FFT Size   Eff%    2  3  5  7
=============================
 1048576  97.23   20  0  0  0
[COLOR=Blue] 1105920  88.82   13  3  1  0[/COLOR]
 1179648  91.20   17  2  0  0
[COLOR=Blue] 1204224  82.49   13  1  0  2[/COLOR]
 1310720  89.06   18  0  1  0
[COLOR=Blue] 1327104  90.86   14  4  0  0[/COLOR]
 1376256  85.13   16  1  0  1
 1474560  89.14   15  2  1  0
[COLOR=Blue] 1548288  89.05   13  3  0  1[/COLOR]
 1572864  89.23   19  1  0  0
 1605632  88.84   15  0  0  2
 1769472  92.58   16  3  0  0
 1835008  89.17   18  0  0  1
 2097152  95.87   21  0  0  0
[COLOR=Blue] 2211840  87.81   14  3  1  0[/COLOR]
 2359296  89.84   18  2  0  0
[COLOR=Blue] 2370816  80.62    8  3  0  3
 2408448  81.08   14  1  0  2[/COLOR]
 2621440  87.60   19  0  1  0
 2654208  85.52   15  4  0  0
[COLOR=Blue] 2709504  82.21   11  3  0  2
 2809856  82.38   13  0  0  3
 2985984  87.28   12  6  0  0
 3096576  85.87   14  3  0  1[/COLOR]
 3145728  85.74   20  1  0  0
 3211264  82.12   16  0  0  2
[COLOR=Blue] 3317760  82.69   13  4  1  0
 3359232  74.71    9  8  0  0
 3386880  71.31    9  3  1  2[/COLOR]
 3932160  80.93   18  1  1  0
[COLOR=Blue] 4014080  80.65   14  0  1  2[/COLOR]
 4096000  73.66   15  0  3  0
 4194304  95.87   22  0  0  0
 4423680  87.81   15  3  1  0
 4718592  89.84   19  2  0  0
[COLOR=Blue] 4741632  80.62    9  3  0  3[/COLOR]
 4816896  81.08   15  1  0  2
 5242880  87.60   20  0  1  0
 5308416  85.52   16  4  0  0
[COLOR=Blue] 5419008  82.21   12  3  0  2
 5619712  82.38   14  0  0  3
 5971968  87.28   13  6  0  0[/COLOR]
 6193152  85.87   15  3  0  1
 6291456  85.74   21  1  0  0
 6422528  82.12   17  0  0  2
[COLOR=Blue] 6635520  82.69   14  4  1  0
 6718464  74.71   10  8  0  0
 6773760  71.31   10  3  1  2[/COLOR]
 7864320  80.93   19  1  1  0
 8028160  80.65   15  0  1  2
 8192000  73.66   16  0  3  0
[/CODE]
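The 2^a*3^b*5^c*7^d criterion is mechanical to apply. A small sketch using coreutils' `factor` (the 8192 step and the 1M-1.5M range are arbitrary choices for illustration) lists candidate lengths that a 32768-only scan would skip:

```shell
# True when every prime factor of $1 is <= 7 (CUFFT's supported radices).
is_smooth() {
    factor "$1" | awk -F: '{ n = split($2, f, " ");
                             for (i = 1; i <= n; i++) if (f[i] > 7) exit 1 }'
}

# List 7-smooth FFT lengths between 1M and 1.5M in steps of 8192;
# this picks up sizes like 1105920 that a 32768-only scan misses.
for len in $(seq 1048576 8192 1572864); do
    is_smooth "$len" && echo "$len"
done
```

Blue rows from the table above such as 1105920 and 1204224 show up this way; whether a given smooth length is actually fast still has to be measured, as the Eff% column shows.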
[QUOTE=Brain;295705]@msft/others: What does this error message exactly mean? Formerly, I got "allocation errors" so I'm a bit surprised...
[/QUOTE]
[code]
$ ./CUDALucas -threads 512 332220523
DEVICE:0------------------------
name            GeForce GTX 550 Ti
totalGlobalMem  1072889856
...
start M332220523 fft length = 18874368
err = 0.35937, increasing n from 18874368
start M332220523 fft length = 18874368
err = 0.35937, increasing n from 18874368
start M332220523 fft length = 20971520
Iteration 10000 M( 332220523 )C, 0x1a313d709bfa6663, n = 20971520, CUDALucas v1.66 err = 0.03358 (22:30 real, 134.9292 ms/iter, ETA 12451:20:29)
Iteration 20000 M( 332220523 )C, 0x73dc7a5c8b839081, n = 20971520, CUDALucas v1.66 err = 0.03358 (22:26 real, 134.5456 ms/iter, ETA 12415:34:17)
[/code]
[QUOTE=msft;295750][code]
$ ./CUDALucas -threads 512 332220523
...
Iteration 10000 M( 332220523 )C, 0x1a313d709bfa6663, n = 20971520, CUDALucas v1.66 err = 0.03358 (22:30 real, 134.9292 ms/iter, ETA 12451:20:29)
...
[/code][/QUOTE]
Why is that 1.66 instead of 2.00?
[QUOTE=msft;295750]CUDALucas [b]-threads 512[/b] 332220523[/QUOTE]That also works fine here on v2.00 Windows. VRAM usage is 893MB (minus the 126MB idle at desktop = 767MB used).
What is the default value of "-threads"? Is 512 larger or smaller than default?
[QUOTE=msft;292776]multiples 32768 (threads=256)
multiples 65536 (threads=512)
multiples 131072 (threads=1024)[/QUOTE]
[QUOTE=aaronhaviland;295742]Absolutely (from the CUFFT documentation): ... I've been testing CUFFT timings for other lengths than just multiples of 32768. ... <table snipped>[/QUOTE]
[url]http://www.mersenneforum.org/showpost.php?p=292776&postcount=959[/url]
So with threads >= 256, you must have multiples of 32K. (Threads lower than that would significantly impact performance, I would think.)
[QUOTE=Dubslow;295754][URL]http://www.mersenneforum.org/showpost.php?p=292776&postcount=959[/URL]
So with threads >= 256, you must have multiples of 32K. (Threads lower than that would significantly impact performance, I would think.)[/QUOTE] I believe that's a recommendation, not a requirement. There *is* a requirement that the length be a multiple of threads. For example, I just ran a quick test and got a valid result for:
[code]./CUDALucas -threads 512 -f 175616 2700067 -t
M( 2700067 )C, 0x787c1272dc144ba2, n = 175616, CUDALucas v2.00[/code]
Granted, 175616 isn't one of the faster lengths, but it is (2^9)*(7^3), i.e. not a multiple of 32768. |
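Aaron's example checks out: per the CUFFT documentation, a usable length only needs to factor into 2s, 3s, 5s and 7s, and CUDALucas additionally requires it to be divisible by the thread count. A quick sketch (helper names invented for illustration):

```python
# Hypothetical helpers, not part of CUDALucas: check whether a candidate FFT
# length is 7-smooth (factors only 2, 3, 5, 7, per CUFFT) and divisible by
# the -threads value.
def factor_2357(n):
    """Return exponents of 2, 3, 5, 7 in n, or None if other factors remain."""
    exps = {}
    for p in (2, 3, 5, 7):
        exps[p] = 0
        while n % p == 0:
            n //= p
            exps[p] += 1
    return exps if n == 1 else None

def usable_length(length, threads):
    return factor_2357(length) is not None and length % threads == 0

print(factor_2357(175616))         # {2: 9, 3: 0, 5: 0, 7: 3}, i.e. (2^9)*(7^3)
print(usable_length(175616, 512))  # True: not a multiple of 32768, but still valid
```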
[QUOTE=apsen;295165]I've been assigned triple check and got mismatch with the first two checks for 28982959.
I've run the check twice with different FFT lengths (and -t both times) and all residues match. Could someone run it through P95? Thanks, Andriy[/QUOTE] @apsen My P95 run is complete: [CODE] UID: flashjh/TF2, M28982959 is not prime. Res64: 5B3274500F7D17__. We4: 858095B2,16603096,00000000 [/CODE] If it matches yours, let me know so we can submit the results together. If it doesn't match, let me know what you'd like to do. |
Aaron, thanks for that great insight, and would you believe, available for reading in a manual. :smile:
This immediately interested me in FFT_size = 4194304, which is 95.87% efficient, but unfortunately the program terminated with error too large. [QUOTE=aaronhaviland;295742]Absolutely (from the CUFFT documentation): I've been testing CUFFT timings for lengths other than just multiples of 32768. <snip> [/QUOTE] |
So I picked FFT_size = 2985984 (2^12 * 3^6).
Does 45ms/iteration look OK for a GT 430? [code]$ ./cul -d 0 -f 2985984 -t 49845883
DEVICE:0------------------------
name GeForce GT 430
clockRate 1400000
start M49845883 fft length = 2985984
Iteration 10000 M( 49845883 )C, 0xbb8661cd90463e94, n = 2985984, CUDALucas v2.00 err = 0.03711 (7:38 real, 45.7782 ms/iter, ETA 633:38:49)
Iteration 20000 M( 49845883 )C, 0xf1d53981f966befa, n = 2985984, CUDALucas v2.00 err = 0.03711 (7:39 real, 45.8859 ms/iter, ETA 635:00:37)[/code] |
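Those ETAs are at least self-consistent: at ~45.8 ms/iter, the remaining iterations of M49845883 work out to roughly the reported 633 hours. A quick back-of-envelope check (assuming an LL test of M(p) takes p-2 squarings):

```python
# Sanity check of the reported ETA: remaining iterations times ms/iter.
# Assumption: an LL test of M(p) needs p - 2 iterations in total.
p = 49845883
ms_per_iter = 45.7782
done = 10000
remaining = (p - 2) - done
eta_hours = remaining * ms_per_iter / 1000 / 3600
print(f"{eta_hours:.0f} hours")  # prints "634 hours", matching ETA 633:38:49
```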
[QUOTE=zs6nw;295768]Aaron, thanks for that great insight, and would you believe, available for reading in a manual. :smile:
This immediately interested me in FFT_size = 4194304 which is 95.87% efficient, but unfortunately the program terminated with error too large.[/QUOTE] The eff% numbers are artificially scaled so that the numbers are all in the range of 0-100%. It's better to assume they represent the efficiency as compared to the theoretical maximum efficiency at that given size, although I am not sure what the actual theoretical maximum would be... I don't know why that FFT size is "too large" as it doesn't take much memory, and seems to work just fine for me. What's the specific error message? |
[QUOTE=aaronhaviland;295773]
I don't know why that FFT size is "too large" as it doesn't take much memory, and seems to work just fine for me. What's the specific error message?[/QUOTE]I think he just meant round-off error, not an FFT/mem error. |
[QUOTE=Dubslow;295774]I think he just meant round-off error, not an FFT/mem error.[/QUOTE]
Pardon my inconcise-ness... It was indeed a round-off error. If you do get a round-off error "deep" in, is all your previous work wasted? |
[QUOTE=zs6nw;295779]
If you do get a round-off error "deep" in, is all your previous work wasted?[/QUOTE] Somewhere from maybe to probably. Prime95 automatically skips back to the most recent checkpoint, which defaults to every half hour; that older work may still be good, and is worth finishing. (Around half of so-called "Suspect" tests turn out to be correct, mostly because of such error handling.) I'm not sure how CUDALucas handles the errors, but I do know that if you use the -s option, it writes all checkpoint files separately, meaning you can choose one and manually revert to an older save file. In that case it is worth finishing the test, but it may or may not be good. |
CUDALucas Submission Spider
In the vein of chalsall's submission spider, and on the lookout to test my Python-foo, I decided to write a CUDALucas result submission spider; more than one exponent has had a mismatch, but we couldn't run Prime95 ourselves because the exponent was submitted before checking for a match. That's what this script is designed to do automatically; it checks the exponent status page and, under the right conditions, submits the exponent result, or else prints a warning.
It is used in the same way as chalsall's spider (and draws much of its design from there): modify the variables at the top as necessary, and run the spider in the directory containing "result.txt", or provide the correct directory as an argument (e.g. for crontabs). Errors automatically terminate the program, and include such things as a bad internet connection or a bad response from PrimeNet. If an error occurs, the current "result.txt" is moved to "failed_(NOW).txt", where result.txt contains everything it did before the script started, i.e. no information is lost (excepting an OS error, but then you've got other problems). Warnings are issued if the script can't parse a line, or can't decide what to do with an exponent; an appropriate message is logged, the offending line is also logged, and the script moves on to the next line. If an exponent result is correctly parsed, it's passed into the "decide" function, which decides what's appropriate; it has the following logic: [code]
if( 2*"Verified LL" is in the expo status page ) {
    submit anyway, just in case
} else if( there is a string of 14 lowercase hex digits [or all decimals] ) {
    if( all decimals ) {
        print warning; ask user to check exponent manually
    } else { // we know there's a CuLu test; exponent will not be submitted
        if( your residue matches another ) { print "match"; do not submit }
        else { print no current matches, use Prime95! do not submit }
    }
} else { // no previous CuLu, and not DCed
    if( there is a matching residue ) { submit! }
    else { print warning: no match; do not submit }
}[/code] If you're reading this far, then presumably you're at least slightly interested; the code is viewable (in browser, no download necessary) at [url]www.dubslow.tk/gimps/CuLuSpider.txt[/url], or you can download it directly from [url]www.dubslow.tk/gimps/CuLuSpider.py[/url].
Unfortunately, due to a paucity of exponents, I have not tested every case; I do know that the "Verified LL" portion works as advertised (thanks msft for posting that 2.00 result you had :smile:), and that it works in the basic case of a match with no prior CuLu result, however the other scenarios remain untested (but *should* work). If you would like to try running this, you need Python 3, which can be downloaded here: [url]http://python.org/download/releases/3.2.2/#download[/url] In Windows, the installer will associate .py extensions with the interpreter; in *nix, the hashbang should be good enough, assuming you have the interpreter in your PATH. As always, PLEASE report any bugs or unhandled-exceptions, etc. For the first two or three results, I'd recommend checking the exponent status page, and then watching the script run. And, because I'm not sure that the rest works, please also report any successes, especially for cases I haven't been able to test yet. (Yes, yes, this is overkill, but now my Python-foo is that much better :smile:) |
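For reference, that decide logic could be re-stated as a rough Python sketch (the function name, variable names, and exact matching rules here are illustrative guesses, not the spider's actual code):

```python
import re

# Rough re-statement of the "decide" logic described above. Names and the
# exact matching rules are invented for this sketch.
def decide(status_page, my_residue):
    if status_page.count("Verified LL") >= 2:
        return "submit"  # already verified; submit anyway, just in case
    prior = re.search(r"\b[0-9a-f]{14}\b", status_page)
    if prior:  # page shows a prior (truncated) residue
        if prior.group().isdigit():
            return "warn: all decimals, check exponent manually"
        if my_residue[:14] in status_page:
            return "match: do not submit"
        return "no current matches, use Prime95! do not submit"
    # no previous CuLu test, and not double-checked
    if my_residue[:14] in status_page:
        return "submit"
    return "warn: no match, do not submit"
```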
Success
[QUOTE=msft;295750][code]
$ ./CUDALucas -threads 512 332220523
DEVICE:0------------------------
name GeForce GTX 550 Ti
totalGlobalMem 1072889856
...
start M332220523 fft length = 18874368
err = 0.35937, increasing n from 18874368
start M332220523 fft length = 18874368
err = 0.35937, increasing n from 18874368
start M332220523 fft length = 20971520
Iteration 10000 M( 332220523 )C, 0x1a313d709bfa6663, n = 20971520, CUDALucas v1.66 err = 0.03358 (22:30 real, 134.9292 ms/iter, ETA 12451:20:29)
Iteration 20000 M( 332220523 )C, 0x73dc7a5c8b839081, n = 20971520, CUDALucas v1.66 err = 0.03358 (22:26 real, 134.5456 ms/iter, ETA 12415:34:17)
[/code][/QUOTE] You're my man. Increasing threads from the default 256 to 512 helped. Thanks. |
[QUOTE=flashjh;295762]If it matches yours, let me know so we can submit the results together. It it doesn't match, let me know what you'd like to do.[/QUOTE]
Yes, that matches the one I got. |
[QUOTE=Dubslow;295613]By the way, top is showing that CUDALucas is consistently using around 15+% of a core. Is there any reason for this, or a way to stop it?[/QUOTE] It seems to be the -k option. Without -k option, my machine is at 1-2% of a core, with -k option it rises to 10-20%.
Also would appreciate a confirmation whether 45ms/iteration is normal for a GT 430: [code]Iteration 1000000 M( 49845883 )C, 0x656fca42e4bb67e3, n = 2985984, CUDALucas v2.00 err = 0.03711 (1:14:57 real, 44.9724 ms/iter, ETA 609:37:34)[/code] |
[QUOTE=zs6nw;295831]It seems to be the -k option. Without -k option, my machine is at 1-2% of a core, with -k option it rises to 10-20%.[/quote]That's depressing, I was having fun with 'p'. I guess I'll turn it off.
[QUOTE=zs6nw;295831] Also would appreciate a confirmation whether 45ms/iteration is normal for a GT 430: [code]Iteration 1000000 M( 49845883 )C, 0x656fca42e4bb67e3, n = 2985984, CUDALucas v2.00 err = 0.03711 (1:14:57 real, 44.9724 ms/iter, ETA 609:37:34)[/code][/QUOTE] That seems about right; that's a low-end card, and you're working on the main LL wave, whose tests are 3-4x as much work as the double checks that others here are doing. For 26M, my GTX 460 gets ~6 ms/iter, so ~15 ms/iter would be my guesstimate for a 49M test on the same card. The 430 being roughly a third the speed of the 460, that works out to about what you're seeing. Have you fiddled with other FFT lengths? (Maybe mess with -threads?) |
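That guesstimate can be reproduced with the usual n·log n scaling of FFT cost (a rough model and an assumption on my part, not a measurement; the function name is invented):

```python
import math

# Rough scaling model (an assumption, not a measurement): per-iteration time
# on the same card grows roughly like n*log(n) with FFT size n.
def scale_ms(ms_known, n_known, n_target):
    cost = lambda n: n * math.log(n)
    return ms_known * cost(n_target) / cost(n_known)

# ~6 ms/iter at n=1474560 (a 26M exponent) extrapolated to n=2985984 (49M):
print(round(scale_ms(6.0, 1474560, 2985984), 1))  # ~12.8, in the ~15 ms ballpark
```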
[QUOTE=zs6nw;295831]It seems to be the -k option. Without -k option, my machine is at 1-2% of a core, with -k option it rises to 10-20%.
Also would appreciate a confirmation whether 45ms/iteration is normal for a GT 430: [code]Iteration 1000000 M( 49845883 )C, 0x656fca42e4bb67e3, n = 2985984, CUDALucas v2.00 err = 0.03711 (1:14:57 real, 44.9724 ms/iter, ETA 609:37:34)[/code][/QUOTE] It is close, James' site estimates 580.1 hours for a 50M exp, so you're within 5%. |
[QUOTE=apsen;295827]Yes, that matches the one I got.[/QUOTE]
Do you want me to submit my LL and you can submit the DC? |
[QUOTE=flashjh;295837]Do you want me to submit my LL and you can submit the DC?[/QUOTE]
Go ahead. |
[QUOTE=apsen;295872]Go ahead.[/QUOTE]
All done, thanks. |
[QUOTE=zs6nw;295831]It seems to be the -k option. Without -k option, my machine is at 1-2% of a core, with -k option it rises to 10-20%.
[/QUOTE] Hmm, I've turned off -k but I'm still seeing 15-20% CPU usage. [code]LD_LIBRARY_PATH=~/CUDALucas/lib ~/CUDALucas/CUDALucas -c 10000 -f 1474560 -polite 0 worktodo.txt[/code] |
The -polite 0 option seems to be the actual culprit.
[QUOTE=Dubslow;295890]Hmm, I've turned off -k but I'm still seeing 15-20% CPU usage. [code]LD_LIBRARY_PATH=~/CUDALucas/lib ~/CUDALucas/CUDALucas -c 10000 -f 1474560 -polite 0 worktodo.txt[/code][/QUOTE] |
[QUOTE=zs6nw;295931]The -polite 0 option seems to be the actual culprit.[/QUOTE]
Indeed, I can confirm this. msft, do you know why it does that? (Btw, if you DLed the script between 10 minutes before this post and ~1 hr after the original post, there was a typo that is now fixed that prevented it from working properly.) |
[QUOTE=Dubslow;295933]Indeed, I can confirm this. msft, do you know why it does that?[/QUOTE]
Side effect. |
[QUOTE=msft;295965]Side effect.[/QUOTE]
I suppose the better question is, is it possible to get aggressive (p=0) performance without adding the extra cpu time, or is that just the way it is? |
[QUOTE=Dubslow;295971]I suppose the better question is, is it possible to get aggressive (p=0) performance without adding the extra cpu time, or is that just the way it is?[/QUOTE]
I haven't looked carefully at the CUDALucas code. However, this sounds very similar to the standard NVIDIA problem of using spin-loops. @msft: I sent a patch to Cyril for the GPU version of gmp-ecm that cures this problem for his application. xilman acknowledged in another thread ( [URL]http://www.mersenneforum.org/showpost.php?p=295541&postcount=63[/URL] ) that it worked for him. It's only a dozen+ lines. Since you are using the CUFFT library, there may be some complications, but if you are interested and if you think it may be the NVIDIA spin-loops, PM me, and I'll send you the technique. |
[QUOTE=Dubslow;295971]I suppose the better question is, is it possible to get aggressive (p=0) performance without adding the extra cpu time, or is that just the way it is?[/QUOTE]
"-polite 64" Good balance on my linux box. |
[QUOTE=msft;295993]"-polite 64" Good balance on my linux box.[/QUOTE]
Well I'll be. What exactly does the 64 mean? I thought it was just a binary switch. |
1 Attachment(s)
Graph of GTX460 cufftbench timing data.
[QUOTE=Prime95;294530]Attached is my cufftbench for a GTX460. I flagged with a "Y" the FFT sizes that make sense.[/QUOTE] |
[QUOTE=Dubslow;295994]Well I'll be. What exactly does the 64 mean? I thought it was just a binary switch.[/QUOTE]
"-polite x" will be polite... every x iterations. It says in the help. :P There was an example with 100 earlier in this thread. So, the higher the number, the more aggressive it becomes. Use 0 for "infinite aggressive" (never do the wait loop). Use 1 for the most polite (do the wait loop every "1" iterations). |
[QUOTE=Dubslow;295786]CuLu Spider...
If an exponent result is correctly parsed, it's passed into the "decide" function, which decides what's appropriate; it has the following logic: <snip> Unfortunately, due to a paucity of exponents, I have not tested every case; I do know that the "Verified LL" portion works as advertised (thanks msft for posting that 2.00 result you had :smile:), and that it works in the basic case of a match with no prior CuLu result, however the other scenarios remain untested (but *should* work).[/QUOTE] I can now confirm that it will successfully notice a previous CUDALucas test, and will therefore not submit such an exponent. The underlined parts of the logic have now been tested at least once: [code]
[U]if( 2*"Verified LL" is in the expo status page ) {
    submit anyway, just in case[/U]
} [U]else if( there is a string of 14 lowercase hex digits [or all decimals] ) {[/U]
    if( all decimals ) {
        print warning; ask user to check exponent manually
    } [U]else { // we know there's a CuLu test; exponent will not be submitted
        if( your residue matches another ) { print "match"; do not submit }[/U]
        else { print no current matches, use Prime95! do not submit }
    }
} else { [U]// no previous CuLu, and not DCed
    if( there is a matching residue ) { submit! }
    else { print warning: no match; do not submit }[/U]
}[/code] Edit: Version 0.02 now available; fixed a logging bug; no change to major functionality. I have now also tested that a mismatch with no previous CuLu result is properly detected. Chart above updated as such. ([url]http://dubslow.tk/gimps/CuLuSpider.txt[/url] View in browser) ([url]http://dubslow.tk/gimps/CuLuSpider.py[/url] Download) |
All this double accounting is dubious. Who is served by having some unobservable wrong or right residue? (Perhaps some misplaced pride? Well, in that case, it would be better served by tuning the card to work right, not just "look ma! no hands! 5GHz!!")
I've submitted a non-matching residue long ago and never thought twice about it but bookmarked the result to revisit later. [URL="http://www.mersenne.org/report_exponent/?exp_lo=27402559&exp_hi=10000&B1=Get+status"]Et voila[/URL] - CUDA was right. (I've looked back at the version, it was CUDALucas v1.48.) |
LaurV, flash, me, and others prefer to be notified, so that 1) the TC is run on P95 (which would probably happen anyway, but just to be sure), and 2) with the various changes and bad residues in 1.55-1.6x, people weren't very trusting of the prog, though I think it's gotten better since 2.00. (And besides, if you look back to the original post, I definitely said it was overkill :razz:)
|
I am simply observing that this is a fractionism not very much dissimilar from rcv's: "I'll take the glue and the plywood and strings and I'll build my own Wright Stuff plane (or my own catapult) at home and I won't show you guys anything --::raspberry::-- until the school competition" ...and then at the school competition <left for the reader to fill in>.
It is a bit different from John Galsworthy/Groucho Marx's "I don’t care to belong to any club that will have me as a member". But I may be wrong, who knows. |
[QUOTE=Batalov;296169]All this double accounting is dubious. [/QUOTE]
I agree, there [URL="http://www.mersenne.org/report_exponent/?exp_lo=26556359&exp_hi=&B1=Get+status"]are[/URL] [URL="http://www.mersenne.org/report_exponent/?exp_lo=36000199&exp_hi=&B1=Get+status"]instances[/URL] where I've submitted a CUDALucas residue that didn't agree with a previous residue, but were later confirmed, or one of two previous differing residues were confirmed correct. Then there are a [URL="http://www.mersenne.org/report_exponent/?exp_lo=36500089&exp_hi=&B1=Get+status"]couple[/URL] [URL="http://www.mersenne.org/report_exponent/?exp_lo=36500119&exp_hi=&B1=Get+status"]where[/URL] the CUDALucas residue was wrong. I believe that if you've got a residue, submit it! |
[QUOTE=Dubslow;296171]and bad residues in 1.55-1.6x[/QUOTE]
They were all caused by hardware. |
1 Attachment(s)
[QUOTE=Batalov;296169]All this double accounting is dubious. Who is served by having some unobservable wrong or right residue? (Perhaps some misplaced pride? Well, in that case, it would be better served by tuning the card to work right, not just "look ma! no hands! 5GHz!!")
I've submitted a non-matching residue long ago and never thought twice about it but bookmarked the result to revisit later. [URL="http://www.mersenne.org/report_exponent/?exp_lo=27402559&exp_hi=10000&B1=Get+status"]Et voila[/URL] - CUDA was right. (I've looked back at the version, it was CUDALucas v1.48.)[/QUOTE] There is nothing about pride (well... a little bit :P) but about altruism :smile:. If I submit a DC with CL which does not match a previous P95 FIRST check, the exponent will CONTINUE to be assigned to other [U]CudaLucas[/U] workers, who will NOT be notified, and - in case [B]MY residue was correct[/B] - will WASTE their time and resources. It is not about "setting the card right". If I overclock and my residue was wrong, then the only one wasting time is ME, because the third worker will have his TC [B]accepted[/B] (matching the original [U]P95[/U] residue). The real problem -- you as a TC worker, wasting your time -- is when I DO NOT overclock (or I use the Teslas, I have 2 of them) and my residues ARE correct. That is why we have (had) threads like "do not DC them with CL" etc., to avoid wasting the time of the TC-ers. You don't know how many others wasted their time between you reporting the mismatched DC and the "et voila". Maybe another 1, 2, 10 etc. tried it with CL and reported, but their result was refused by the PrimeNet DB and is nowhere recorded. And yes, I also bookmark my mismatches, and always revisit them to update the bookmark list, and if one stays there too long I will queue it myself in P95 or a CL TC just to make sure. See former discussions here around. And here is a screen snap to prove it: |
[QUOTE=LaurV;296227]There is nothing about pride (well... a little bit :P) but about altruism :smile:. <snip> And here is a screen snap to prove it:[/QUOTE] I'm pretty sure the server records all bad residues, example: [URL]http://www.mersenne.org/report_exponent/?exp_lo=22545883&exp_hi=22545883&B1=Get+status[/URL], where it took a QC (quadruple check) in order to get a match. So, turning in your result, whether it matches or not, is as Batalov says. After all, there really is no difference between overclocking and getting a bad residue, and not overclocking and getting a good residue while the original residue is bad: you either have LL=good, DC=bad, TC=good or LL=bad, DC=good, TC=good. If you do not turn it in, waiting for someone else to run it to 'double check you', you could run into: LL=bad, DC=good, TC=bad, QC=good or LL=good, DC=bad, TC=bad, QC=good or LL=bad, DC=bad, TC=good, QC=good, etc. If, in the example I pasted above, the original run was bad, you would have ended up with LL=bad, DC=bad, TC=bad, QC=good, QTC (quintuple check)=good. Primenet will hand the exponent out until it gets a match, and record all attempts. IMO, it's better to submit regardless of match/mismatch, and let primenet take care of it automatically, which is less time consuming. |
That is all gibberish. LL bad, DC (by CL) good; then all the other TC, QC, whatever, by CL, are a WASTE of time. They are either bad (which means wasted time, but you get credit) or good (matching the DC), in which case they are refused by the server (as "same result by third party program"), no credit is given, and the report is NOWHERE recorded. Try reporting the same CL result two times and see what happens, then talk about the subject when you know it.
|
[QUOTE=LaurV;296251]That is all gibberish. <snip>[/QUOTE]
Maybe I am misunderstanding you due to the differences in language understanding. You are either saying 1) you tested and submitted a CL result that was a mismatch, so you reran the test and resubmitted a matching result and the server refused to take it, "same result by third party program", or 2) you tested and submitted a mismatch, and someone else using the same program tested it and submitted a result matching yours and was told "same result by third party program". If case 1 is correct, I would expect this from the server, since it should NOT accept the same result more than once from the same person, regardless of the program used. It would be too easy for a person to take the results from P95 and create a CL result and get double credit. If case 2 is correct, then it sounds like a problem, because refusing two separate people's matching results just because they happened to use the same program would, to me anyway, not make sense. |
[QUOTE=LaurV;296251]That is all gibberish. <snip>[/QUOTE]
I still don't get the point. Let's assume the first run is by P95 and the second run is by CL and doesn't match. Now there are 4 cases, the CL run is submitted or not, and valid or not: 1. Don't submit and not valid: A TC can be done and submitted by either CL or P95. 2. Submit and not valid: Again, a TC can be done and submitted by either CL or P95. So if the residue is not valid, it makes [B]no difference[/B] whether you submit or not. 3. Don't submit and valid: A TC must be done by P95. Although it can be run again with CL, the server will reject the TC when both valid CL runs are submitted. 4. Submit and valid: A TC must be done by P95. Although it can be run again with CL, the server will reject it. So if the residue is valid, it makes [B]no difference[/B] whether you submit or not. Therefore, always submit. :smile: Now, the interesting point is that if a number has two non-matching residues, one done by CL, the number should not be run again with CL. It could be either case 2 or 4, so the CL run may be wasted. But if you have already completed a run with CL, there is no reason to not submit it. Greg |
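Greg's four cases can be enumerated mechanically to show that, under his assumptions, the submitted flag never changes what has to happen next (an illustrative sketch; the labels and function name are invented):

```python
# Toy enumeration of Greg's four cases (submitted x valid). The point is
# that 'submitted' never affects the required next step.
def next_step(submitted, valid):
    # Invalid CL residue: a triple check by either CL or P95 settles it.
    # Valid CL residue: the triple check must be done by Prime95 either way.
    return "TC by Prime95" if valid else "TC by either program"

for submitted in (False, True):
    for valid in (False, True):
        print(f"submit={submitted} valid={valid} -> {next_step(submitted, valid)}")
# Same next step in each column regardless of 'submitted' -- hence: always submit.
```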
In the vein of other mostly useless projects...
1 Attachment(s)
I hacked some mfaktc code so that CUDALucas can read "standard" GIMPS assignment lines ("Test=AID,exponent" and "DoubleCheck=..."), and remove each one from worktodo.txt as it's completed. What this really means is that I copied Christenson's (?) code, modified it slightly, and pasted it into CUDALucas.cu. At any rate, it works for me :smile:
It's now a few hundred lines longer, but *in theory* (in the vaguest sense of the phrase) it means that CUDALucas will be easier to automate, if Christenson ever decides to put his precious efforts towards that task :smile:. It also means that you can now copy and paste work straight from PrimeNet/GPU272 without having to delete all the information except the exponent. On the other hand, it might require a re-licensing; mfaktc is under the GPL, though I'm not aware of what the current CUDALucas license is (if any), or if anybody cares enough to bother :razz:. At the very least, I'm proud that it works, even if I hardly wrote anything :razz::razz: (thank you Christenson!) It is compatible with 2.00, meaning that you call it with exactly the same command, and it'll resume just the same as before. It (temporarily) renders cudalucas.ini useless; this was the first half of my project to hack even more mfaktc code (:smile:) to get basic .ini functionality, and perhaps be able to specify FFT length in the worktodo line. (Note that the version string is "lol" at the moment :smile:) I also modified some of the messages printed to be slightly more grammatically correct; that's the only liberty I took with the existing code, besides modifying main(). Fortunately for me, everything like resuming/writing checkpoints etc. was abstracted from main() (in check()) so I didn't mess with anything critical. The only change to main() was the part that reads in assignments. I did add a few declarations above main() (but below the rest of the existing code); all other additions are below main(), which is to say below all previously existing code. For the convenience of anybody checking my hacking (it hardly qualifies as coding), all my comments are preceded by hashes, e.g. "//#" or "/*#", so that a Ctrl+F should be sufficient to find all my comments, and therefore all my changes. New CUDALucas.cu is attached; it should compile just fine with the old Makefile. 
(I can't test Windows compiling, but g++/nvcc didn't complain at me.) (I did test that resuming 2.00 stuff works, because I'm running my current expo with this new version, while it was started with 2.00.) [/more useless spam] Edit: Here's a copy/paste of my "production" terminal: [code]Iteration 16330000 M( 26273341 )C, 0x66b743a75bcbccea, n = 1474560, CUDALucas v2.00 err = 0.1162 (0:54 real, 5.4614 ms/iter, ETA 15:04:46)
Iteration 16340000 M( 26273341 )C, 0x780b400cb7e3ef0b, n = 1474560, CUDALucas v2.00 err = 0.1162 (0:56 real, 5.5237 ms/iter, ETA 15:14:10)
Iteration 16350000 M( 26273341 )C, 0xe1cba399ba32200e, n = 1474560, CUDALucas v2.00 err = 0.1162 (0:57 real, 5.7331 ms/iter, ETA 15:47:52)
^C^C caught. Writing checkpoint.
bill@Gravemind:~/CUDALucas∰∂ CUDALucas -c 10000 -f 1474560 -polite 64 worktodo.txt
WARNING: ignoring line 1 in "worktodo.txt"! Reason: doesn't begin with Test= or DoubleCheck=
WARNING: ignoring line 2 in "worktodo.txt"! Reason: doesn't begin with Test= or DoubleCheck=
WARNING: ignoring line 3 in "worktodo.txt"! Reason: doesn't begin with Test= or DoubleCheck=
WARNING: ignoring line 4 in "worktodo.txt"! Reason: doesn't begin with Test= or DoubleCheck=
WARNING: ignoring line 5 in "worktodo.txt"! Reason: doesn't begin with Test= or DoubleCheck=
WARNING: ignoring line 6 in "worktodo.txt"! Reason: doesn't begin with Test= or DoubleCheck=
No valid assignment found.
bill@Gravemind:~/CUDALucas∰∂ nano worktodo.txt
bill@Gravemind:~/CUDALucas∰∂ CUDALucas -c 10000 -f 1474560 -polite 64 worktodo.txt
continuing work from a partial result
M26273341 fft length = 1474560
iteration = 16358145
Iteration 16360000 M( 26273341 )C, 0x523ba68f8a9962ce, n = 1474560, CUDALucas vlol err = 0.09326 (0:12 real, 1.1720 ms/iter, ETA 3:13:34)
Iteration 16370000 M( 26273341 )C, 0x2e8afa1230a7ce30, n = 1474560, CUDALucas vlol err = 0.09766 (0:56 real, 5.6254 ms/iter, ETA 15:28:11)
Iteration 16380000 M( 26273341 )C, 0x22b7d6757e8729a1, n = 1474560, CUDALucas vlol err = 0.1016 (0:55 real, 5.4859 ms/iter, ETA 15:04:15)[/code] Where it says "ignoring line...", that's where I forgot to take the list of exponents and convert them to proper GIMPS format :smile: |
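The worktodo parsing the hack performs could be sketched in Python like this (illustrative only; the actual hack is C code inside CUDALucas.cu, and the helper name and exact field handling here are invented):

```python
# Sketch of parsing a "standard" GIMPS worktodo line. Real assignment lines
# look like "Test=AID,exponent,..." or "DoubleCheck=..."; anything else is
# ignored with a warning, as in the terminal paste above.
def parse_worktodo_line(line):
    line = line.strip()
    for prefix in ("Test=", "DoubleCheck="):
        if line.startswith(prefix):
            fields = line[len(prefix):].split(",")
            # AID first, then the exponent; trailing fields (if any) ignored here
            aid, exponent = fields[0], int(fields[1])
            return prefix.rstrip("="), aid, exponent
    return None  # "doesn't begin with Test= or DoubleCheck="

print(parse_worktodo_line("DoubleCheck=ABC123DEF,26273341,72,1"))
print(parse_worktodo_line("26273341"))  # None: a bare exponent is rejected
```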
@Dubslow- I'm impressed. You seem to have progressed a lot in code tweaking.
I may have missed something in the last few days, but what does "-polite 64" indicate? Does the '64' relate to '-polite', or not? :question: |
[QUOTE=kladner;296392]I may have missed something in the last few days, but what does "-polite 64" indicate?[/QUOTE]You [url=http://www.mersenneforum.org/showpost.php?p=296091&postcount=1219]did miss it[/url].
|
[QUOTE=James Heinrich;296405]You [URL="http://www.mersenneforum.org/showpost.php?p=296091&postcount=1219"]did miss it[/URL].[/QUOTE]
Thanks James. That [I]is[/I] an interesting capacity. |
msft, speaking from a commercial software developer's point of view, I certainly appreciate the masterpiece of CUDA source code that is CUDALucas.cu v1.66.
Of course, you did a little more than merely correct the spelling of agressive->aggressive. Thank you. |
Found a Bug in my hack
I found a bug (design error?) in my hack; it deletes the current assignment from worktodo.txt regardless of whether or not the test is finished or if it was just paused. I'm sure there's an easy fix, but it's 0519 local, and I found the bug while setting up my comp for sleep :sleep:, so I'm definitely not going to investigate right now :razz:
|
1 Attachment(s)
[QUOTE=Dubslow;296442]I found a bug (design error?) in my hack; it deletes the current assignment from worktodo.txt regardless of whether or not the test is finished or if it was just paused. I'm sure there's an easy fix, but it's 0519 local, and I found the bug while setting up my comp for sleep :sleep:, so I'm definitely not going to investigate right now :razz:[/QUOTE]
Easy fix. New file attached; version string changed to "v2.00a". I tried to add some of Prime95's torture test cases, but as it turns out, all those have a variety of different checkpoints, none of which are 10,000; and worse, only the 33rd-64th bits of the residue are actually checked, not all 64, so there's incomplete data for msft's check() function. |
Bad News: I got a DC mismatch.
Good News: My script detected the mismatch.
Good News: I'm quite sure my run is correct, barring cosmic errors; it's my first mismatch (out of 3 recent tests and many more 1.2/1.3 DCs). Anyone want to run it through Prime95? (Edit: Expo is 26409557)
Bad News: The prog didn't automatically continue with the next assignment.
Good News: "Stupid Programmer Error": Somehow I failed in understanding how a "Test=..." line is constructed, and counted the wrong number of commas (but it worked fine for the first few assignments). Either way, stupid error == easy fix; change line 1772 from [code]if ((2!=number_of_commas)) // must have 2 commas...[/code] to [code]if ((2!=number_of_commas) && (3!=number_of_commas)) // must have 2 or 3 commas...[/code] and all will be good :razz: |
[QUOTE=Dubslow;296584]Anyone want to run it through Prime95? (Edit: Expo is 26409557)[/QUOTE]
I will queue it second on my list (after current DC expo finishes in 6-7 hours) on one core. ETA ~3 days. Should I automatically report it? Or check with you first? (in this case please remind me in few days, otherwise it may remain unreported). |
Go ahead and report it. I'll claim the Anon, so when you report it the assignment isn't closed.
|
[QUOTE=Dubslow;296584]Somehow I failed in understanding how a "Test=..." line is constructed, and counted the wrong number of commas[code]// must have 2 or 3 commas...[/code][/QUOTE]It can potentially have 4 commas, at least in Prime95, which may record the FFT size in use if it's close to the transition point. This is the regex I use:[code]^(Test|DoubleCheck)=(|N/A,|[A-F0-9]{32},)(FFT2=[0-9]+[KM],)?([0-9]{4,10}),([0-9]{1,2}),([01])$[/code]Which would match something like[quote]
Test=[color=orangered]ABCDEF01234567890123456789ABCDEF,[/color][color=blue]FFT2=2880K,[/color][color=darkgreen]45678901,[/color][color=darkorange]72,[/color][color=purple]1[/color]
Test=[color=orangered]n/a,[/color][color=blue]FFT2=2880K,[/color][color=darkgreen]45678901,[/color][color=darkorange]72,[/color][color=purple]1[/color]
Test=[color=blue]FFT2=2880K,[/color][color=darkgreen]45678901,[/color][color=darkorange]72,[/color][color=purple]1[/color]
Test=[color=orangered]ABCDEF01234567890123456789ABCDEF,[/color][color=darkgreen]45678901,[/color][color=darkorange]72,[/color][color=purple]1[/color]
Test=[color=darkgreen]45678901,[/color][color=darkorange]72,[/color][color=purple]1[/color]
Test=[color=orangered]n/a,[/color][color=darkgreen]45678901,[/color][color=darkorange]72,[/color][color=purple]1[/color]
[/quote]The first two elements (assignment ID and FFT size) are either-or optional (one or both could be missing). |
Nuts. Thanks, though as I recall FFT2=... is only inserted by Prime95 whenever it starts an assignment, such that PrimeNet would never assign them that way?
|
[QUOTE=Dubslow;296624]Nuts. Thanks, though as I recall FFT2=... is only inserted by Prime95 whenever it starts an assignment, such that PrimeNet would never assign them that way?[/QUOTE]Correct. That portion only appears when Prime95 is uncertain which FFT size to use and runs a few iterations before the test to determine the best FFT size. It then records the FFT size used in case the test is interrupted/resumed.
|
[QUOTE=James Heinrich;296625]Correct. That portion only appears when Prime95 is uncertain which FFT size to use and runs a few iterations before the test to determine the best FFT size. It then records the FFT size used in case the test is interrupted/resumed.[/QUOTE]
Then as it stands, it's safe, I think... maybe not. I think the AID is required, even if it's blank, such that Test=2945063,69,1 wouldn't be parsed properly... crap. (And how do you do regexes in C? My lack-of-true-skill makes itself apparent.) |
Your experiences
I've had a dozen successful matches, but since 2.00 I've had 6 mismatches and only 2 matches. :-( All mismatches happened when CL started with fftlength=1572864 and increased after less than 100 iters to 1835008... Can anybody confirm this observation?
Additionally, when I insert a new expo into worktodo.txt (CL already running) it doesn't resume: cudalucas.ini is +1 too high in that case: Line 4 contains the next expo, cudalucas.ini says 4 but would require 3... Am I missing something? |
Umm... for the former problem, if you're testing in the 26M range, most of us have found that a smaller size of n = 1474560 works much better.
As for cudalucas.ini, it should only increase the count iff the exponent quits without the quit flag being set, i.e. the exponent's test is finished; if you quit with ^C, it should NOT increment; I believe that had been my experience, but obviously I'm no longer using a version that uses cudalucas.ini. |
[QUOTE=Dubslow;296746]Umm... for the former problem, if you're testing in the 26M range, most of us have found that a smaller size of n = 1474560 works much better.[/QUOTE]
Good tests are in the 27M range; the bad tests begin in the upper half of 28M. I'm testing with standard CL fft sizes to represent the default CL newby. [QUOTE=Dubslow;296746] As for cudalucas.ini, it should only increase the count iff the exponent quits without the quit flag being set, i.e. the exponent's test is finished; if you quit with ^C, it should NOT increment; I believe that had been my experience, but obviously I'm no longer using a version that uses cudalucas.ini.[/QUOTE] I wasn't at the PC. It ran, got a new assignment, then finished the running one, and when I came home it hadn't started the new expo. All I found was a too-high count in cl.ini. No Ctrl-C was harmed during this test (harmed=pressed). |
Hmm... that's how it would behave if it didn't find anything else in worktodo.txt, even though you had added something. Here's the code:
[code]while (fgets (str, 132, fp) != NULL && quitting == 0) //fp is the workfile
  {
    if (sscanf (str, "%u", &q) == 1) //If we've read in an exponent
      {
        if (q < 86243)
          fprintf (stderr, " too small Exponent %d\n", q);
        else
          check (q, 0); //This is the main LL function
      }
    if (quitting == 0)
      //To get here, check() has returned either because we've finished and quitting==0,
      //or because ^C was hit and quitting!=0.
      //If we are not quitting, then increment the line and write it to cl.ini
      ++currentLine;
    fpi = fopen ("cudalucas.ini", "w");
    if (fpi != NULL)
      {
        fprintf (fpi, "%d\n", currentLine);
        fclose (fpi);
      }
  } //Go around the while loop again, if reading the next workfile line is not NULL
fclose (fp); //Close the work file and exit main(), terminating the program
[/code] |
[QUOTE=Dubslow;296752]Hmm... that's how it would behave if it didn't find anything else in worktodo.txt, even though you had added something. Here's the code:[/QUOTE]
Just shooting in the dark here, since I haven't reviewed all of the CUDALucas code... But does CL close the FH (fp in the code) for the worktodo.txt file while it is working, and re-open it only when it needs to check for additional work? The quoted code suggests not, and this is probably the problem. |
[QUOTE=Dubslow;296752]Hmm... that's how it would behave if it didn't find anything else in worktodo.txt, even though you had added something. Here's the code:
[/QUOTE] I will try adding exponents only when CL is not running. I'll see if this makes a difference. I'm more worried about the mismatches. @Dub: Maybe you could post your personally trusted fft sizes and expo borders..? |
[QUOTE=chalsall;296754]Just shooting in the dark here, since I haven't reviewed all of the CUDALucas code...
But does CL close the FH (fp in the code) for the worktodo.txt file while it is working, and re-open it only when it needs to check for additional work? The quoted code suggests not, and this is probably the problem.[/QUOTE] Good point. I thought the same but am not very experienced with C file handles... |
[QUOTE=Brain;296756]Good point. I thought the same but am not very experienced with C file handles...[/QUOTE]
Modifying open files is a Really Bad Idea [SUP](TM)[/SUP]. Undefined behavior without IPC. (And, depending on the file system, even with IPC.) This is why worktodo.ADD functionality would be so useful.... |
I hate worktodo.add functions; I would vastly prefer if it just closed the dang file. It doesn't need it to be open for the whole damn test.
@Brain: The length I posted is my only trusted length :razz: I've only done 26M work, and am now moving it to 25M work (where the same length should run just fine; in fact a smaller one might do as well :P). I doubt it'd work for 28M expos, but maybe it's worth a shot? |
[QUOTE=Dubslow;296758]I hate worktodo.add functions;[/QUOTE]
From the perspective of a programmer, please explain why. [QUOTE=Dubslow;296758]I would vastly prefer if it just closed the dang file. It doesn't need it to be open for the whole damn test.[/QUOTE] Agreed. But without IPC (for example, a file lock or some other semaphore) you still have a race condition if any program other than CUDALucas touches that file. Getting extremely geeky... Yes, you also have a race condition with a worktodo.add file. But there's a trick we Unix geeks often use: if you "mv" (move) a file onto another while the first file is open, the reader will still have access to the old contents, unmodified, until it closes it. (I believe (but don't know) that the same behavior exists under Windows.) Thus, if you know you only have one reader and one writer to worktodo.add (or whatever) (for example, a work assignment spider) you can be absolutely sure of a sane state even without IPC. |
[QUOTE=chalsall;296761]From the perspective of a programmer, please explain why.[/quote]I prefer to know that exponents are added exactly in the order I want them ordered. With worktodo.add the order of adding is at the mercy of the programmer. (I realize I'm not the "typical" use case, but that doesn't change the fact that the "typical" solution still bugs me.)
[QUOTE=chalsall;296761] Agreed. But without IPC (for example, a file lock or some other semaphore) you still have a race condition if any program other than CUDALucas touches that file. [/quote]How? When CUDALucas finishes the test, it opens the file again, reads the first "good" assignment out, closes the file again, and works on that. Thus you can modify the file however you want in between, and CUDALucas will always test the first line in the file. (You could get a race condition if you modify the file while it's between tests, but that window lasts for less than a second and is easily avoidable if the user just uses a little common sense; a lock/semaphore could solve that minor issue programming-side as well, and it would only exist for less than a second, as opposed to existing the whole time that CUDALucas is running.) Note that I'm 99.999% sure that v2.00a, with the hacked mfaktc functions, [i]does[/i] close the file during tests; it uses a get_next_assignment() function that closes the file after getting a good assignment. (It would be trivial to check this by downloading the mfaktc source, and I'll check it myself later today.) |
[QUOTE=Dubslow;296762]How? When CUDALucas finishes the test, it opens the file again, and reads the first "good" assignment out, closes the file again, and works on that. Thus you can modify the file however you want in between, and CUDALucas will always test the first line in the file. (You could get a race condition if you modify the file while it's between tests, but that lasts for less than a second...)[/QUOTE]
That's the race condition I'm talking about. Remember I've talked before about the "once a month" or "once a year" bug. This is an example of how such bugs manifest. The programmer thinks "oh, that won't happen very often"... And, as an aside, the "once a month" bug issue was really driven home for me a few years ago when I was deploying a wireless network here in Barbados. I was using Linux boxen with ATM cards as base stations. The ATM kernel driver had a bug which would manifest about once a month, locking up the entire machine. This became a real problem when we had thirty base stations deployed.... :cry: |
[QUOTE=chalsall;296764]That's the race condition I'm talking about.
Remember I've talked before about the "once a month" or "once a year" bug. This is an example of how such bugs manifest. The programmer thinks "oh, that won't happen very often"... [/QUOTE]It's an easily solvable race condition, either with common sense on the user's part, or a lock from the program/mer. The advantage here over worktodo.add is that the lock would be [i]very[/i] short, only the time it takes to read in an assignment, as opposed to being locked for the entire duration of running the program (i.e., indefinitely) with the .add method. |
[QUOTE=Dubslow;296765]It's an easily solvable race condition, either with common sense on the user's part...[/QUOTE]
Assuming that a human is in the loop... [QUOTE=Dubslow;296765]... or a lock from the program/mer. The advantage here over worktodo.add is that the lock would be [i]very[/i] short, only the time it takes to read in an assignment, as opposed to being locked for the entire duration of running the program (i.e., indefinitely) with the .add method.[/QUOTE] You might be forgetting that file locking is a cooperative agreement (a type of semaphore) between the programs which use the file(s) in question. And file locking doesn't work on some file systems (for example, many network file systems). Additionally, having CUDALucas close the file while it works does not really enter the equation from the perspective of a fully defined automated run-time environment. To achieve your particular requirements (even if it closed the file when not needed), you would need to slave CUDALucas under another program, and "kill" it inbetween modifications to the worktodo.txt file, then restart it. (Or, have a human in the loop. But humans are so damn unreliable.... :wink:) |
[QUOTE=chalsall;296767]
You might be forgetting that file locking is a cooperative agreement (a type of semaphore) between the programs which use the file(s) in question. [/quote]This problem is not specific to my solution, but is also a problem for worktodo.add; again, I think my way is safer because the lock time is (much) shorter.[QUOTE=chalsall;296767] And file locking doesn't work on some file systems (for example, many network file systems).[/quote]Again, this also applies to worktodo.add. [QUOTE=chalsall;296767] Additionally, having CUDALucas close the file while it works does not really enter the equation from the perspective of a fully defined automated run-time environment.[/quote]I'm not sure what you mean by "fully defined automated run-time environment" (this is just my own ignorance of this term). [QUOTE=chalsall;296767] To achieve your particular requirements (even if it closed the file when not needed), you would need to slave CUDALucas under another program, and "kill" it inbetween modifications to the worktodo.txt file, then restart it. (Or, have a human in the loop. But humans are so damn unreliable.... :wink:)[/QUOTE]I don't understand this. My requirements are that I can modify worktodo.txt as I like while CUDALucas is running an assignment, and that when it needs work, it reads the first/top assignment in worktodo.txt at that time. I don't see why I would need to slave it to another program--closing the file is only intended to make sure it gets the most recent version without clobbering any changes I made while it was working on the previous test. |
[QUOTE=Dubslow;296772]This problem is not specific to my solution, but also is a problem for worktodo.add, and again, I think that my way is safer because the lock time is (much) shorter. Again, this also applies to worktodo.add.[/QUOTE]
Incorrect. If you have a program which reads and writes a file ("worktodo.txt" in this case), plus reads another file and then deletes it (if it exists; "worktodo.add"), then you can create a system where another program can write to the second file in such a way that everything is sane. One simple solution is for the writer to the second file to check whether worktodo.add exists. If it does, it simply exits. Or, it moves the worktodo.add file to a temporary location and waits for some period of time. If it still exists, then it merges what it wants to add, and moves the file back. But short of some form of cooperation (like a worktodo.add facility, passing data through an SQL table, etc), without stopping the first program it can never be entirely safe to modify worktodo.txt. And all of this assumes that there will only be one writer to worktodo.add. [QUOTE=Dubslow;296772]I'm not sure what you mean by "fully defined automated run-time environment" (this is just my own ignorance of this term).[/QUOTE] Probably not; it's my own term. It's meant to be descriptive, not a formal or official term. [QUOTE=Dubslow;296772]I don't understand this. My requirements are that I can modify worktodo.txt as I like while CUDALucas is running an assignment, and that when it needs work, it reads the first/top assignment in worktodo.txt at that time. I don't see why I would need to slave it to another program--closing the file is only intended to make sure it gets the most recent version without clobbering any changes I made while it was working on the previous test.[/QUOTE] We're looking at this from different perspectives. You want to be able to modify worktodo.add in the manner you desire, knowing that you are able to observe the state of CUDALucas, and know that you have enough time to do so based on how much longer CUDALucas is going to take. 
I'm looking at it from the perspective of how does one create a program which modifies another program's file(s) without the knowledge of when the second program is going to modify the file(s). This is a difficult problem which occurs a lot. Think Mail Transport Agents (MTAs), for example. Or, closer to "home", how about "mfakt*" where the potential writer (human or 'bot) has no way of knowing when the file will be modified because a factor could be found at any time. |
[QUOTE=chalsall;296774]Incorrect.
If you have a program which reads and writes a file ("worktodo.txt" in this case), plus reads another file and then deletes it (if it exists; "worktodo.add"), then you can create a system where another program can write to the second file in such a way that everything is sane. One simple solution is for the writer to the second file to check whether worktodo.add exists. If it does, it simply exits. Or, it moves the worktodo.add file to a temporary location and waits for some period of time. If it still exists, then it merges what it wants to add, and moves the file back. But short of some form of cooperation (like a worktodo.add facility, passing data through an SQL table, etc), without stopping the first program it can never be entirely safe to modify worktodo.txt. And all of this assumes that there will only be one writer to worktodo.add. [/quote]I mostly don't understand this. If the first program is not using the file in any way, how can it be unsafe to modify worktodo.txt? [QUOTE=chalsall;296774] We're looking at this from different perspectives. You want to be able to modify worktodo[strike].add[/strike] [i].txt[/i] in the manner you desire, knowing that you are able to observe the state of CUDALucas, and know that you have enough time to do so based on how much longer CUDALucas is going to take.[/quote]Right, given that what I corrected was, in fact, a typo... [QUOTE=chalsall;296774] I'm looking at it from the perspective of how does one create a program which modifies another program's file(s) without the knowledge of when the second program is going to modify the file(s). This is a difficult problem which occurs a lot. Think Mail Transport Agents (MTAs), for example. Or, closer to "home", how about "mfakt*" where the potential writer (human or 'bot) has no way of knowing when the file will be modified because a factor could be found at any time.[/QUOTE]Okay, fair. 
But, in CUDALucas' case, we [i]do[/i] know exactly when CUDALucas will modify a file (and if it didn't delete test history, it wouldn't do any writing at all, just reading). That's why I'm saying I don't like worktodo.add in this case, because CUDALucas doesn't fall into this class of program that requires worktodo.add. (That's why I've been continuing this debate -- because this is the CUDALucas thread, so I assumed we were excepting this class of program.) (This sort of reasoning also applies to Prime95 with all GIMPS worktypes except TF, which is why it frustrates me that I have to kill it to modify worktodo.txt.) [Somewhat OT] Additionally, with mfaktc, the added benefit (for me) of being able to modify worktodo.txt out of order (i.e. not with .add) outweighs the risk that mfaktc will write to worktodo.txt, given that factors are printed on screen (and the comment about deleting history also applies here). That's why I didn't like the talk about having mfakt* apply a lock indefinitely; at the very least I'd prefer that to be a disable-able option.[/OT] |