[QUOTE=msft;295230]You do not need the SDK.
[code]$ cat Makefile
CUDALucas: CUDALucas.o
	g++ -O2 -fPIC -o CUDALucas CUDALucas.o -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -lcufft -lm
CUDALucas.o: CUDALucas.cu
	/usr/local/cuda/bin/nvcc -O2 -arch=sm_13 -I/usr/local/include CUDALucas.cu -c
clean:
	-rm *.o CUDALucas
$ make
/usr/local/cuda/bin/nvcc -O2 -arch=sm_13 -I/usr/local/include CUDALucas.cu -c
g++ -O2 -fPIC -o CUDALucas CUDALucas.o -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -lcufft -lm
$ ./CUDALucas -r
Iteration 10000 M( 86243 )C, 0x23992ccd735a03d9, n = 4608, CUDALucas v2.00 err = 0.01074 (0:20 real, 2.0263 ms/iter, ETA 2:21)
[/code][/QUOTE]
I downloaded the 2.00 version from [url]http://www.mersenneforum.org/showpost.php?p=294046&postcount=1098[/url], and here's my make file:
[code]bill@Gravemind:~/CUDALucas/2.00∰∂ cat Makefile
NVIDIA_SDK = $(HOME)/NVIDIA_GPU_Computing_SDK
CUDALucas: CUDALucas.o
	g++ -O2 -fPIC -o CUDALucas CUDALucas.o -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -lcufft -lm
CUDALucas.o: CUDALucas.cu
	/usr/local/cuda/bin/nvcc -O2 -arch=sm_13 -I/usr/local/include -I$(NVIDIA_SDK)/C/common/inc CUDALucas.cu -c
clean:
	-rm *.o CUDALucas CUDALucas.cu~[/code]
Also, even in yours, it calls /usr/local/cuda/bin/nvcc, which I don't have without the SDK. (And also cuda/lib64...) |
[QUOTE=Dubslow;295235]Also, even in yours, it calls /usr/local/cuda/bin/nvcc, which I don't have without the SDK. (And also cuda/lib64...)[/QUOTE]
nvcc and the libs aren't in the SDK; they're in the CUDA Toolkit. |
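A quick way to confirm where nvcc actually lives is to look for it directly. The following is only a sketch: it assumes the default toolkit prefix /usr/local/cuda that the Makefiles above use, and both the function name `find_cuda_tool` and the `CUDA_HOME` variable are illustrative, not anything CUDALucas itself reads.

```shell
# Report where a CUDA Toolkit binary lives, or say that it is missing.
# CUDA_HOME is an assumed variable; it defaults to the prefix used above.
find_cuda_tool() {
    tool=$1
    prefix=${CUDA_HOME:-/usr/local/cuda}
    if [ -x "$prefix/bin/$tool" ]; then
        # Found under the toolkit prefix.
        echo "$prefix/bin/$tool"
    elif command -v "$tool" >/dev/null 2>&1; then
        # Found somewhere else on PATH.
        command -v "$tool"
    else
        echo "$tool not found (is the CUDA Toolkit installed?)"
        return 1
    fi
}

find_cuda_tool nvcc || true
```

If this reports that nvcc is missing, installing the SDK alone will not help; the Toolkit has to be installed as well.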
I'm a bit late to this game, but I took a look at the best FFT sizes on my GTX 480 using a 64-bit Linux binary and both CUDA 3.2 and CUDA 4.1:
[CODE]CUDA 3.2              CUDA 4.1
Size     Time (ms)    Size     Time (ms)
1179648  0.737176     1179648  0.757979
1310720  0.869311     1310720  0.912768
1474560  0.972916     1474560  0.964209
1572864  1.047643     1605632  1.067629
1605632  1.072745     1638400  1.172933
1638400  1.190849     1769472  1.206898
1769472  1.216339     2097152  1.340003
1835008  1.248738     2293760  1.612199
2097152  1.296626     2359296  1.617333
2359296  1.522869     2654208  1.791644
2621440  1.760007     2752512  1.978546
2654208  1.784613     2949120  2.053249
2949120  2.100391     3211264  2.292879
3145728  2.111622     3276800  2.457472
3211264  2.369003     3538944  2.529949
3670016  2.552411     3670016  2.624719
4194304  2.814626     4194304  2.849048
4423680  3.067510     4423680  3.36062
4718592  3.135987     4718592  3.720233
5242880  3.531422     4816896  3.86769
5308416  3.911258     5242880  3.875832
5505024  4.235169     5308416  4.06778
5898240  4.466444     5734400  4.509711
6193152  4.464647     6193152  4.648901[/CODE]
Although there is a little variation, the best FFT sizes are mostly the same for the two versions of CUDA. Overall, CUDA 3.2 is slightly faster than CUDA 4.1 except for an FFT region around 4.2M-5M, where CUDA 3.2 is significantly faster. |
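To put numbers on "slightly" versus "significantly" faster, the gap can be computed for any size that appears in both columns. This is a minimal sketch using awk; the helper name `pct_slower` is made up for illustration, and the timings are copied from the table above.

```shell
# Percentage by which time b exceeds time a (positive means b is slower).
pct_slower() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b / a - 1) * 100 }'
}

# 4718592-point FFT: CUDA 3.2 = 3.135987 ms, CUDA 4.1 = 3.720233 ms
pct_slower 3.135987 3.720233   # prints 18.6
# 1179648-point FFT: CUDA 3.2 = 0.737176 ms, CUDA 4.1 = 0.757979 ms
pct_slower 0.737176 0.757979   # prints 2.8
```

So CUDA 4.1 is about 3% slower at the small end of the table but nearly 19% slower at 4718592.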
I don't know why, but I've had an extremely bad streak of luck with 2.00:
It took me 5 runs to get a good DC on 26101843:
[CODE]M( 26101843 )C, 0x2e20628d2010b7__, n = 1474560, CUDALucas v2.00
M( 26101843 )C, 0x9699f17722194e__, n = 1474560, CUDALucas v2.00
M( 26101843 )C, 0xa6c3cd3038b506__, n = 1474560, CUDALucas v2.00
M( 26101843 )C, 0x44c5575f619091__, n = 1474560, CUDALucas v2.00
M( 26101843 )C, 0x4c96108b152c6266, n = 1474560, CUDALucas v2.00[/CODE]
I thought it had something to do with me remoting into the system, but when I got home it took 2 more runs. I also just had a bad run on 26120921:
[CODE]M( 26120921 )C, 0xe34c177a793b96__, n = 1474560, CUDALucas v2.00[/CODE]
Here's my typical run line:
[CODE]e:\cuda2\cuda20032 -d 1 -threads 512 -c 10000 -f 1474560 -t -polite 0 26101843 >> 26101843.txt[/CODE]
This brings me to 6 bad and 8 good with 2.00. I'm not sure the change with -t is working. Anyone else have similar results? |
[QUOTE=flashjh;295365]I don't know why, but I've had an extremely bad streak of luck with 2.00:
It took me 5 runs to get a good DC on 26101843: ... This brings me to 6 bad and 8 good with 2.00. I'm not sure the change with -t is working. Anyone else have similar results?[/QUOTE]
I'm batting 8 for 8 successful double-checks. 3 using my 560Ti. 5 using my 570.
[code]./CUDALucas -t -d 1 cudal560.txt
or
./CUDALucas -t -d 0 cudal570.txt[/code]
[code]M( 28376339 )C, 0xb3e29f7739547b38, n = 1572864, CUDALucas v2.00
M( 28573841 )C, 0x64c4cbb92a9c8f47, n = 1572864, CUDALucas v2.00
M( 29462357 )C, 0x3df0d8cf19726aad, n = 1835008, CUDALucas v2.00
M( 29462599 )C, 0x60e55f600332f5cd, n = 1835008, CUDALucas v2.00
M( 29462387 )C, 0x5eacbd9aaa0cca16, n = 1835008, CUDALucas v2.00
M( 29465929 )C, 0x828cde7005d78b0d, n = 1835008, CUDALucas v2.00
M( 29462623 )C, 0x8a91307e8d3531b6, n = 1835008, CUDALucas v2.00
M( 29465977 )C, 0xf49765b40d2129ae, n = 1835008, CUDALucas v2.00[/code]
Unless you have a flaky card or a flaky compilation, next most obvious is the FFT-size. Mine are a bit higher than yours. (These were auto-selected by cudalucas.) If you care to try to reproduce my results, the oldest residues (one from the 560Ti and one from the 570) that haven't scrolled off are:
[CODE]Iteration 13900000 M( 29462623 )C, 0xefef88bdb9a24848, n = 1835008, CUDALucas v2.00 err = 0.0127 (0:53 real, 5.3098 ms/iter, ETA 22:56:59)
and
Iteration 10170000 M( 29465977 )C, 0xf8d0abc8611c8221, n = 1835008, CUDALucas v2.00 err = 0.01367 (0:36 real, 3.5695 ms/iter, ETA 19:07:36)[/CODE]
BTW, the final reported err= values were 0.01367 and 0.01416, respectively, which gives these results a pretty wide safety margin. |
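The ETA figures in those status lines can be sanity-checked by hand: a Lucas-Lehmer test of M(p) takes p - 2 squarings, so the remaining time is roughly (p - 2 - iteration) times the ms/iter figure. The sketch below does that arithmetic with awk. The function name `ll_eta` is invented here, and it assumes the reported ms/iter stays constant for the rest of the run, which is only an approximation of however CUDALucas computes its own ETA.

```shell
# Estimate remaining time as (p - 2 - current_iteration) * ms_per_iter,
# printed as H:MM:SS. Assumes a constant per-iteration time.
ll_eta() {
    awk -v p="$1" -v iter="$2" -v ms="$3" 'BEGIN {
        secs = int((p - 2 - iter) * ms / 1000)
        printf "%d:%02d:%02d\n", int(secs / 3600), int(secs % 3600 / 60), secs % 60
    }'
}

ll_eta 29462623 13900000 5.3098   # 560Ti line, reported ETA 22:56:59
ll_eta 29465977 10170000 3.5695   # 570 line, reported ETA 19:07:36
```

Both estimates land within about half a minute of the reported ETAs, which is what you would expect if the program averages its timing over a slightly different window.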
Just got another good run with 2.00. Who knows, maybe my card is failing?
On this one:
[CODE][URL="http://www.mersenne.org/report_exponent/?exp_lo=26231297&exp_hi=&B1=Get+status"]26231297[/URL]
No factors below 2^69
P-1 B1=405000
[COLOR=red]Bad LL[/COLOR] 75A6F23E6769F0DA by "David Glynn"
[COLOR=seagreen]Verified[/COLOR] LL 61CF09FB162017FF by "Scotch&Gloves_RUS" on 2011-06-15
[COLOR=red]Bad LL[/COLOR] BFD7978ACFBF14BF by "linded" on 2011-09-12
[COLOR=seagreen]Verified[/COLOR] LL 61CF09FB162017FF by "Jerry Hallett" on 2012-04-05
History 61CF09FB162017__ by "Scotch&Gloves_RUS" on 2011-06-15
History BFD7978ACFBF14__ by "linded" on 2011-09-12
History no factor for M26231297 from 2^67 to 2^68 [mfaktc 0.16 barrett79_mul32] by "Carsten Kossendey" on 2011-11-17
History no factor for M26231297 from 2^68 to 2^69 [mfaktc 0.17-Win barrett79_mul32] by "David Campeau" on 2011-11-24
History 61cf09fb162017__ by "Jerry Hallett" on 2012-04-05[/CODE]
Scotch&Gloves_RUS's LL was Suspect before I submitted, turned out to be correct. |
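Because the report masks the last two hex digits of each residue as `__` and mixes upper and lower case, comparing residues by eye is error-prone. The following is a small sketch of a comparison helper; `residues_match` and `norm_residue` are hypothetical names, and the helper only compares the first 14 hex digits, so a match is strong evidence rather than proof that the full 64-bit residues agree.

```shell
# Normalise a residue: upper-case the hex digits, drop an optional 0x prefix,
# and keep only the first 14 hex digits (the last two may be masked as "__").
norm_residue() {
    printf '%s' "$1" | tr 'a-f' 'A-F' | sed 's/^0[xX]//' | cut -c1-14
}

# Succeed if two residues agree on their first 14 hex digits.
residues_match() {
    [ "$(norm_residue "$1")" = "$(norm_residue "$2")" ]
}

residues_match 61cf09fb162017__ 61CF09FB162017FF && echo "match"
residues_match 0x4c96108b152c6266 0x2e20628d2010b7__ || echo "mismatch"
```

The first call compares the masked history entry with the verified residue from the report above; the second compares a good and a bad run of 26101843 from earlier in the thread.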
[QUOTE=aaronhaviland;295323]nvcc and the libs aren't in the SDK; they're in the Cuda Toolkit.[/QUOTE]
Ok, thanks. I just downloaded 4.1 and tried to compile. Unfortunately, I've always (always ALWAYS) had problem finding CUDA libs, no matter what I put in LD_LIBRARY_PATH.
[quote]* Please make sure your PATH includes /usr/local/cuda/bin
* Please make sure your LD_LIBRARY_PATH
*   for 32-bit Linux distributions includes /usr/local/cuda/lib
*   for 64-bit Linux distributions includes /usr/local/cuda/lib64:/usr/local/cuda/lib
* OR
*   for 32-bit Linux distributions add /usr/local/cuda/lib
*   for 64-bit Linux distributions add /usr/local/cuda/lib64 and /usr/local/cuda/lib
* to /etc/ld.so.conf and run ldconfig as root[/quote]
I tried just the ldconfig thing, but that didn't work, so I tried again with LD_L_P, but that still doesn't work. To use mfaktc, I had to make a copy of the libs in the mfaktc folder, then set LD_L_P to the mfaktc folder. I just can't get it to work right with the nVidia location for the life of me.
[code]bill@Gravemind:~∰∂ tail .bashrc
fi
# set PATH so it includes user's private bin if it exists
if [ -d $HOME/bin ]; then
    PATH=$PATH:$HOME/bin:$HOME/bin/c:$HOME/bin/py
fi
[U]PATH=$PATH:/usr/local/cuda/bin
LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/lib[/U]
PYTHONPATH=$HOME/bin/py[/code]
Despite having run ldconfig as they suggested and setting LDLP correctly, I always got this:
[code]bill@Gravemind:~/CUDALucas/2.00∰∂ cat Makefile
NVIDIA_SDK = $(HOME)/NVIDIA_GPU_Computing_SDK
CUDALucas: CUDALucas.o
	g++ -O2 -fPIC -o CUDALucas CUDALucas.o -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -lcufft -lm
CUDALucas.o: CUDALucas.cu
	/usr/local/cuda/bin/nvcc -O2 -arch=sm_13 -I/usr/local/include -I$(NVIDIA_SDK)/C/common/inc CUDALucas.cu -c
clean:
	-rm *.o CUDALucas CUDALucas.cu~
bill@Gravemind:~/CUDALucas/2.00∰∂ make
g++ -O2 -fPIC -o CUDALucas CUDALucas.o -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -lcufft -lm
/usr/bin/ld: warning: libcudart.so.4, needed by /usr/local/cuda/lib64/libcufft.so, not found (try using -rpath or -rpath-link)
<snip>
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaPeekAtLastError'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `__cudaRegisterVar'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaGetLastError'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaMemcpyToSymbolAsync'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaStreamWaitEvent'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaGetDevice'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaGetExportTable'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaFuncSetCacheConfig'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaUnbindTexture'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `__cudaRegisterTexture'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaCreateChannelDesc'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaBindTexture'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaFuncGetAttributes'
collect2: ld returned 1 exit status
make: *** [CUDALucas] Error 1
bill@Gravemind:~/CUDALucas/2.00∰∂[/code]
[code]bill@Gravemind:/usr/local/cuda/lib64∰∂ ls
libcublas.so         libcufft.so         libcurand.so           libnpp.so
libcublas.so.4       libcufft.so.4       libcurand.so.4         libnpp.so.4
libcublas.so.4.1.21  libcufft.so.4.1.21  libcurand.so.4.1.21    libnpp.so.4.1.21
libcudart.so         libcuinj.so         libcusparse.so
libcudart.so.4       libcuinj.so.4       libcusparse.so.4
libcudart.so.4.1.21  libcuinj.so.4.1.21  libcusparse.so.4.1.21
bill@Gravemind:/usr/local/cuda/lib64∰∂[/code] |
[QUOTE=Dubslow;295519]Ok, thanks. I just downloaded 4.1 and tried to compile. Unfortunately, I've always (always ALWAYS) had problem finding CUDA libs, no matter what I put in LD_LIBRARY_PATH.
I tried just the ldconfig thing, but that didn't work, so I tried again with LD_L_P, but that still doesn't work.
[code]g++ -O2 -fPIC -o CUDALucas CUDALucas.o -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -lcufft -lm
/usr/bin/ld: warning: libcudart.so.4, needed by /usr/local/cuda/lib64/libcufft.so, not found (try using -rpath or -rpath-link)
<snip>
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaPeekAtLastError'
<snip>[/code][/QUOTE]
The environment variable $LD_LIBRARY_PATH has no bearing on compilation: it only affects how the run-time linker works, and has nothing to do with the compile-time link. Likewise, ldconfig updates symlinks for the run-time linker, and has no bearing on compilation.
Based on your output, you should only need to do one thing to fix the compilation:
- It says "warning: libcudart.so.4 ... not found"
- Add "-lcudart" to the g++ line in the Makefile
For reference, here is the variant of the Makefile that I use with CUDALucas. If nvcc is in your path, you don't need to specify /usr/local/bin. Likewise, if the libs are in a system directory (e.g. /usr/local/lib/ or lib64 depending on your configuration) you may not need to specify -L/usr/local/lib:
[code]NVCC_ARCHES += -gencode arch=compute_20,code=compute_20
NVCC_ARCHES += -gencode arch=compute_13,code=compute_13
OPT = -O3
CFLAGS = $(OPT) -Wall
NVCC_FLAGS = $(OPT) -use_fast_math $(NVCC_ARCHES) --compiler-options="$(CFLAGS) -fno-strict-aliasing" --ptxas-options=-v
CUDALucas: CUDALucas.o
	g++ -fPIC -o CUDALucas CUDALucas.o -lcufft -lm -lcudart -Wl,-O1 -Wl,--as-needed $(CFLAGS)
CUDALucas.o: CUDALucas.cu cuda_safecalls.h
	nvcc CUDALucas.cu -c $(NVCC_FLAGS)
clean:
	-rm CUDALucas *.o *~
[/code] |
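Once the link succeeds, it is worth checking separately whether the run-time loader can find every library, since that is a different lookup from the one `g++` just did. A minimal sketch, assuming a glibc-style `ldd` whose unresolved entries look like `libcufft.so.4 => not found`; the helper name `missing_libs` is made up for this example.

```shell
# Print the names of shared libraries that the run-time loader cannot find.
# Reads `ldd` output on stdin, e.g.:  ldd ./CUDALucas | missing_libs
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Illustrative input in the format ldd prints (not output from a real build):
printf '\tlibcufft.so.4 => not found\n\tlibm.so.6 => /lib/libm.so.6 (0x0000)\n' | missing_libs
```

If anything is listed, either point the loader at the directory at run time, or embed the search path at link time with the GNU ld option `-Wl,-rpath,/usr/local/cuda/lib64` on the g++ line, which is what the linker warning's "try using -rpath" hint refers to.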
[QUOTE=aaronhaviland;295523]The environment variable $LD_LIBRARY_PATH has no bearing on compilation: it only affects how the run-time linker works, and has nothing to do with the compile-time link. Likewise, ldconfig updates symlinks for the run-time linker, and has no bearing on compilation.
Based on your output, you should only need to do one thing to fix the compilation:
- It says "warning: libcudart.so.4 ... not found"
- Add "-lcudart" to the g++ line in the Makefile [/QUOTE]
Thanks, I got it compiled, however despite all the trouble with LDLP etc., runtime linking still fails. I can workaround that though.
Edit: Could you help me understand why the latter works, but the former doesn't?
[code]bill@Gravemind:~/CUDALucas∰∂ echo $LD_LIBRARY_PATH
./lib
bill@Gravemind:~/CUDALucas∰∂ ./CUDALucas
./CUDALucas: error while loading shared libraries: libcufft.so.4: cannot open shared object file: No such file or directory
bill@Gravemind:~/CUDALucas∰∂ LD_LIBRARY_PATH=./lib ./CUDALucas
$ CUDALucas [-d device_number] [-threads 32|64|128|256|512|1024] [-c checkpoint_iteration] [-f fft_length] [-s folder] [-t] [-polite iteration] [-k] exponent|input_filename
$ CUDALucas [-d device_number] [-threads 32|64|128|256|512|1024] [-t] [-polite iteration] -r
$ CUDALucas [-d device_number] -cufftbench start end distance
    -threads set threads number(default=256)
    -f set fft length(if round off error then exit)
    -s save all checkpoint files
    -t check round off error all iterations
    -polite GPU polite per iteration(default -polite 1)
    -polite 0 GPU aggressive
    -cufftbench exec CUFFT benchmark (Ex. $ ./CUDALucas -d 1 -cufftbench 1179648 6291456 32768 )
    -r exec residue test.
    -k enable keys (p change -polite,t disable -t,s change -s)
bill@Gravemind:~/CUDALucas∰∂ [/code] |
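One explanation that fits this transcript (an assumption, since the full shell setup isn't shown) is that LD_LIBRARY_PATH was assigned but never exported. `echo` expands the variable inside the current shell, so it shows `./lib`, but a child process such as `./CUDALucas` only inherits exported variables. Prefixing the command with `LD_LIBRARY_PATH=./lib` places the value directly in that one command's environment, which is why the second form works. The sketch below demonstrates the difference with a throwaway variable; `DEMO_LIB_PATH` and `show_child_view` are illustrative names.

```shell
# Ask a child shell what value, if any, it inherited for DEMO_LIB_PATH.
show_child_view() {
    sh -c 'echo "child sees: ${DEMO_LIB_PATH:-<unset>}"'
}

unset DEMO_LIB_PATH
DEMO_LIB_PATH=./lib     # plain assignment: visible to this shell only
echo "shell sees: $DEMO_LIB_PATH"
show_child_view         # the child does not inherit it

export DEMO_LIB_PATH    # now it is part of the environment
show_child_view         # the child inherits ./lib
```

If that is the cause, changing the .bashrc line to `export LD_LIBRARY_PATH=...` should make the plain `./CUDALucas` invocation behave like the prefixed one.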
[QUOTE=Dubslow;295527]Thanks, I got it compiled, however despite all the trouble with LDLP etc., runtime linking still fails. I can workaround that though.
[/QUOTE] Attached are Linux x86-64 binaries built with the standard make file, which I believe targets sm_13; however, they require the CUDA 4.x .so files, which are included in the archive. Well, actually, whenever I try to upload the file, it fails, so I'll make it available here: [url]http://dubslow.tk/gimps/CUDALucas2.00_sm13_4.1.tar.gz[/url] If for some reason that doesn't work, I'll try to upload it here again. Edit: Is it not possible to change the FFT size mid-run? |
By the way, top shows CUDALucas consistently using 15% or more of a CPU core. Is there any reason for this, or a way to stop it?
|