mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

Dubslow 2012-04-03 03:51

[QUOTE=msft;295230]You do not need the SDK.
[code]
$ cat Makefile
CUDALucas: CUDALucas.o
g++ -O2 -fPIC -o CUDALucas CUDALucas.o -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -lcufft -lm
CUDALucas.o: CUDALucas.cu
/usr/local/cuda/bin/nvcc -O2 -arch=sm_13 -I/usr/local/include CUDALucas.cu -c
clean:
-rm *.o CUDALucas
$ make
/usr/local/cuda/bin/nvcc -O2 -arch=sm_13 -I/usr/local/include CUDALucas.cu -c
g++ -O2 -fPIC -o CUDALucas CUDALucas.o -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -lcufft -lm
$ ./CUDALucas -r
Iteration 10000 M( 86243 )C, 0x23992ccd735a03d9, n = 4608, CUDALucas v2.00 err = 0.01074 (0:20 real, 2.0263 ms/iter, ETA 2:21)
[/code][/QUOTE]

I downloaded the 2.00 version from [url]http://www.mersenneforum.org/showpost.php?p=294046&postcount=1098[/url], and here's my make file:
[code]bill@Gravemind:~/CUDALucas/2.00$ cat Makefile
NVIDIA_SDK = $(HOME)/NVIDIA_GPU_Computing_SDK
CUDALucas: CUDALucas.o
g++ -O2 -fPIC -o CUDALucas CUDALucas.o -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -lcufft -lm
CUDALucas.o: CUDALucas.cu
/usr/local/cuda/bin/nvcc -O2 -arch=sm_13 -I/usr/local/include -I$(NVIDIA_SDK)/C/common/inc CUDALucas.cu -c
clean:
-rm *.o CUDALucas CUDALucas.cu~[/code]
Also, even in yours, it calls /usr/local/cuda/bin/nvcc, which I don't have without the SDK. (And also cuda/lib64...)

aaronhaviland 2012-04-03 22:29

[QUOTE=Dubslow;295235]Also, even in yours, it calls /usr/local/cuda/bin/nvcc, which I don't have without the SDK. (And also cuda/lib64...)[/QUOTE]

nvcc and the libs aren't in the SDK; they're in the CUDA Toolkit.

frmky 2012-04-03 23:32

1 Attachment(s)
I'm a bit late to this game, but I took a look at the best FFT sizes on my GTX 480 using a 64-bit Linux binary and both CUDA 3.2 and CUDA 4.1:

[CODE]CUDA 3.2           CUDA 4.1
Size    Time (ms)  Size    Time (ms)
1179648 0.737176 1179648 0.757979
1310720 0.869311 1310720 0.912768
1474560 0.972916 1474560 0.964209
1572864 1.047643 1605632 1.067629
1605632 1.072745 1638400 1.172933
1638400 1.190849 1769472 1.206898
1769472 1.216339 2097152 1.340003
1835008 1.248738 2293760 1.612199
2097152 1.296626 2359296 1.617333
2359296 1.522869 2654208 1.791644
2621440 1.760007 2752512 1.978546
2654208 1.784613 2949120 2.053249
2949120 2.100391 3211264 2.292879
3145728 2.111622 3276800 2.457472
3211264 2.369003 3538944 2.529949
3670016 2.552411 3670016 2.624719
4194304 2.814626 4194304 2.849048
4423680 3.067510 4423680 3.36062
4718592 3.135987 4718592 3.720233
5242880 3.531422 4816896 3.86769
5308416 3.911258 5242880 3.875832
5505024 4.235169 5308416 4.06778
5898240 4.466444 5734400 4.509711
6193152 4.464647 6193152 4.648901[/CODE]

Although there is a little variation, the best FFT sizes are mostly the same for the two versions of CUDA. Overall, CUDA 3.2 is slightly faster than CUDA 4.1, and in the FFT region around 4.2M-5M it is significantly faster.

flashjh 2012-04-04 07:57

I don't know why, but I've had an extremely bad streak of luck with 2.00:

It took me 5 runs to get a good DC on 26101843:

[CODE]
M( 26101843 )C, 0x2e20628d2010b7__, n = 1474560, CUDALucas v2.00
M( 26101843 )C, 0x9699f17722194e__, n = 1474560, CUDALucas v2.00
M( 26101843 )C, 0xa6c3cd3038b506__, n = 1474560, CUDALucas v2.00
M( 26101843 )C, 0x44c5575f619091__, n = 1474560, CUDALucas v2.00
M( 26101843 )C, 0x4c96108b152c6266, n = 1474560, CUDALucas v2.00
[/CODE]
I thought it had something to do with me remoting into the system, but when I got home it took 2 more runs.

I also just had a bad run on 26120921:
[CODE]M( 26120921 )C, 0xe34c177a793b96__, n = 1474560, CUDALucas v2.00[/CODE]

Here's my typical run line:
[CODE]e:\cuda2\cuda20032 -d 1 -threads 512 -c 10000 -f 1474560 -t -polite 0 26101843 >> 26101843.txt[/CODE]

This brings me to 6 bad and 8 good with 2.00. I'm not sure the change with -t is working. Anyone else have similar results?

rcv 2012-04-04 11:17

[QUOTE=flashjh;295365]I don't know why, but I've had an extremely bad streak of luck with 2.00:

It took me 5 runs to get a good DC on 26101843:
...

This brings me to 6 bad and 8 good with 2.00. I'm not sure the change with -t is working. Anyone else have similar results?[/QUOTE]

I'm batting 8 for 8 on successful double-checks: 3 using my 560Ti, 5 using my 570.

[code]
./CUDALucas -t -d 1 cudal560.txt
or
./CUDALucas -t -d 0 cudal570.txt[/code][code]
M( 28376339 )C, 0xb3e29f7739547b38, n = 1572864, CUDALucas v2.00
M( 28573841 )C, 0x64c4cbb92a9c8f47, n = 1572864, CUDALucas v2.00
M( 29462357 )C, 0x3df0d8cf19726aad, n = 1835008, CUDALucas v2.00
M( 29462599 )C, 0x60e55f600332f5cd, n = 1835008, CUDALucas v2.00
M( 29462387 )C, 0x5eacbd9aaa0cca16, n = 1835008, CUDALucas v2.00
M( 29465929 )C, 0x828cde7005d78b0d, n = 1835008, CUDALucas v2.00
M( 29462623 )C, 0x8a91307e8d3531b6, n = 1835008, CUDALucas v2.00
M( 29465977 )C, 0xf49765b40d2129ae, n = 1835008, CUDALucas v2.00[/code]Unless you have a flaky card or a flaky compilation, the next most obvious suspect is the FFT size. Mine are a bit higher than yours. (These were auto-selected by CUDALucas.)

If you care to try to reproduce my results, the oldest residues (one from the 560Ti and one from the 570) that haven't scrolled off are:
[CODE]Iteration 13900000 M( 29462623 )C, 0xefef88bdb9a24848, n = 1835008, CUDALucas v2.00 err = 0.0127 (0:53 real, 5.3098 ms/iter, ETA 22:56:59)
and
Iteration 10170000 M( 29465977 )C, 0xf8d0abc8611c8221, n = 1835008, CUDALucas v2.00 err = 0.01367 (0:36 real, 3.5695 ms/iter, ETA 19:07:36)[/CODE]BTW, the final reported err= values were 0.01367 and 0.01416, respectively, which gives these results a pretty wide safety margin.

flashjh 2012-04-05 00:29

Just got another good run with 2.00. Who knows, maybe my card is failing?

On this one:
[CODE][URL="http://www.mersenne.org/report_exponent/?exp_lo=26231297&exp_hi=&B1=Get+status"]26231297[/URL]
No factors below 2^69
P-1 B1=405000
[COLOR=red]Bad LL[/COLOR] 75A6F23E6769F0DA by "David Glynn"
[COLOR=seagreen]Verified[/COLOR] LL 61CF09FB162017FF by "Scotch&Gloves_RUS" on 2011-06-15
[COLOR=red]Bad LL[/COLOR] BFD7978ACFBF14BF by "linded" on 2011-09-12
[COLOR=seagreen]Verified[/COLOR] LL 61CF09FB162017FF by "Jerry Hallett" on 2012-04-05
History 61CF09FB162017__ by "Scotch&Gloves_RUS" on 2011-06-15
History BFD7978ACFBF14__ by "linded" on 2011-09-12
History no factor for M26231297 from 2^67 to 2^68 [mfaktc 0.16 barrett79_mul32] by "Carsten Kossendey" on 2011-11-17
History no factor for M26231297 from 2^68 to 2^69 [mfaktc 0.17-Win barrett79_mul32] by "David Campeau" on 2011-11-24
History 61cf09fb162017__ by "Jerry Hallett" on 2012-04-05 [/CODE]
Scotch&Gloves_RUS's LL was marked Suspect before I submitted; it turned out to be correct.

Dubslow 2012-04-06 00:15

[QUOTE=aaronhaviland;295323]nvcc and the libs aren't in the SDK; they're in the Cuda Toolkit.[/QUOTE]

Ok, thanks. I just downloaded 4.1 and tried to compile. Unfortunately, I've always (always ALWAYS) had problems finding the CUDA libs, no matter what I put in LD_LIBRARY_PATH.
[quote]* Please make sure your PATH includes /usr/local/cuda/bin
* Please make sure your LD_LIBRARY_PATH
* for 32-bit Linux distributions includes /usr/local/cuda/lib
* for 64-bit Linux distributions includes /usr/local/cuda/lib64:/usr/local/cuda/lib
* OR
* for 32-bit Linux distributions add /usr/local/cuda/lib
* for 64-bit Linux distributions add /usr/local/cuda/lib64 and /usr/local/cuda/lib
* to /etc/ld.so.conf and run ldconfig as root[/quote]
I tried just the ldconfig thing, but that didn't work, so I tried again with LD_L_P, but that still didn't work. To use mfaktc, I had to copy the libs into the mfaktc folder and then point LD_L_P at the mfaktc folder. I just can't get it to work with the nVidia location for the life of me.
[code]bill@Gravemind:~$ tail .bashrc
fi

# set PATH so it includes user's private bin if it exists
if [ -d $HOME/bin ]; then
PATH=$PATH:$HOME/bin:$HOME/bin/c:$HOME/bin/py
fi

[U]PATH=$PATH:/usr/local/cuda/bin
LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/lib[/U]
PYTHONPATH=$HOME/bin/py[/code]
Despite having run ldconfig as they suggested and setting LDLP correctly, I always got this:
[code]bill@Gravemind:~/CUDALucas/2.00$ cat Makefile
NVIDIA_SDK = $(HOME)/NVIDIA_GPU_Computing_SDK
CUDALucas: CUDALucas.o
g++ -O2 -fPIC -o CUDALucas CUDALucas.o -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -lcufft -lm
CUDALucas.o: CUDALucas.cu
/usr/local/cuda/bin/nvcc -O2 -arch=sm_13 -I/usr/local/include -I$(NVIDIA_SDK)/C/common/inc CUDALucas.cu -c
clean:
-rm *.o CUDALucas CUDALucas.cu~
bill@Gravemind:~/CUDALucas/2.00$ make
g++ -O2 -fPIC -o CUDALucas CUDALucas.o -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -lcufft -lm
/usr/bin/ld: warning: libcudart.so.4, needed by /usr/local/cuda/lib64/libcufft.so, not found (try using -rpath or -rpath-link)
<snip>
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaPeekAtLastError'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `__cudaRegisterVar'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaGetLastError'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaMemcpyToSymbolAsync'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaStreamWaitEvent'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaGetDevice'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaGetExportTable'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaFuncSetCacheConfig'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaUnbindTexture'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `__cudaRegisterTexture'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaCreateChannelDesc'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaBindTexture'
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaFuncGetAttributes'
collect2: ld returned 1 exit status
make: *** [CUDALucas] Error 1
bill@Gravemind:~/CUDALucas/2.00$[/code][code]bill@Gravemind:/usr/local/cuda/lib64$ ls
libcublas.so libcufft.so libcurand.so libnpp.so
libcublas.so.4 libcufft.so.4 libcurand.so.4 libnpp.so.4
libcublas.so.4.1.21 libcufft.so.4.1.21 libcurand.so.4.1.21 libnpp.so.4.1.21
libcudart.so libcuinj.so libcusparse.so
libcudart.so.4 libcuinj.so.4 libcusparse.so.4
libcudart.so.4.1.21 libcuinj.so.4.1.21 libcusparse.so.4.1.21
bill@Gravemind:/usr/local/cuda/lib64$[/code]

aaronhaviland 2012-04-06 00:45

[QUOTE=Dubslow;295519]Ok, thanks. I just downloaded 4.1 and tried to compile. Unfortunately, I've always (always ALWAYS) had problem finding CUDA libs, no matter what I put in LD_LIBRARY_PATH.

I tried just the ldconfig thing, but that didn't work, so I tried again with LD_L_P, but that still doesn't work.
[code]g++ -O2 -fPIC -o CUDALucas CUDALucas.o -L/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 -lcufft -lm
/usr/bin/ld: warning: libcudart.so.4, needed by /usr/local/cuda/lib64/libcufft.so, not found (try using -rpath or -rpath-link)
<snip>
/usr/local/cuda/lib64/libcufft.so: undefined reference to `cudaPeekAtLastError'
<snip>[/code][/QUOTE]

The environment variable $LD_LIBRARY_PATH has no bearing on compilation: it only affects how the run-time linker works, and has nothing to do with the compile-time link. Likewise, ldconfig updates symlinks for the run-time linker, and has no bearing on compilation.

Based on your output, you should only need to do one thing to fix the compilation:
- It says "warning: libcudart.so.4 ... not found"
- Add "-lcudart" to the g++ line in the Makefile


For reference, here is the variant of the Makefile that I use with CUDALucas. If nvcc is in your path, you don't need to specify /usr/local/bin. Likewise, if the libs are in a system directory (e.g. /usr/local/lib/ or lib64 depending on your configuration) you may not need to specify -L/usr/local/lib:

[code]NVCC_ARCHES += -gencode arch=compute_20,code=compute_20
NVCC_ARCHES += -gencode arch=compute_13,code=compute_13

OPT = -O3
CFLAGS = $(OPT) -Wall
NVCC_FLAGS = $(OPT) -use_fast_math $(NVCC_ARCHES) --compiler-options="$(CFLAGS) -fno-strict-aliasing" --ptxas-options=-v

CUDALucas: CUDALucas.o
g++ -fPIC -o CUDALucas CUDALucas.o -lcufft -lm -lcudart -Wl,-O1 -Wl,--as-needed $(CFLAGS)
CUDALucas.o: CUDALucas.cu cuda_safecalls.h
nvcc CUDALucas.cu -c $(NVCC_FLAGS)
clean:
-rm CUDALucas *.o *~
[/code]

Dubslow 2012-04-06 01:14

[QUOTE=aaronhaviland;295523]The environment variable $LD_LIBRARY_PATH has no bearing on compilation: it only affects how the run-time linker works, and has nothing to do with the compile-time link. Likewise, ldconfig updates symlinks for the run-time linker, and has no bearing on compilation.

Based on your output, you should only need to do one thing to fix the compilation:
- It says "warning: libcudart.so.4 ... not found"
- Add "-lcudart" to the g++ line in the Makefile
[/QUOTE]

Thanks, I got it compiled; however, despite all the trouble with LDLP etc., runtime linking still fails. I can work around that, though.


Edit: Could you help me understand why the latter works, but the former doesn't?
[code]bill@Gravemind:~/CUDALucas$ echo $LD_LIBRARY_PATH
./lib
bill@Gravemind:~/CUDALucas$ ./CUDALucas
./CUDALucas: error while loading shared libraries: libcufft.so.4: cannot open shared object file: No such file or directory
bill@Gravemind:~/CUDALucas$ LD_LIBRARY_PATH=./lib ./CUDALucas
$ CUDALucas [-d device_number] [-threads 32|64|128|256|512|1024] [-c checkpoint_iteration] [-f fft_length] [-s folder] [-t] [-polite iteration] [-k] exponent|input_filename
$ CUDALucas [-d device_number] [-threads 32|64|128|256|512|1024] [-t] [-polite iteration] -r
$ CUDALucas [-d device_number] -cufftbench start end distance
-threads set threads number(default=256)
-f set fft length(if round off error then exit)
-s save all checkpoint files
-t check round off error all iterations
-polite GPU polite per iteration(default -polite 1) -polite 0 GPU aggressive
-cufftbench exec CUFFT benchmark (Ex. $ ./CUDALucas -d 1 -cufftbench 1179648 6291456 32768 )
-r exec residue test.
-k enable keys (p change -polite,t disable -t,s change -s)
bill@Gravemind:~/CUDALucas$ [/code]
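(Editor's note, a likely answer: the .bashrc shown earlier assigns LD_LIBRARY_PATH without `export`, so it is a shell-local variable that child processes never inherit, which is why `echo` sees it but `./CUDALucas` does not; the one-shot `VAR=value ./prog` prefix form does export it, for just that command. A minimal demonstration, using made-up variable names:)

```shell
# A plain assignment is local to the current shell; children only
# inherit variables that have been exported to the environment.
DEMO_PATH=./lib                                   # shell-local assignment
sh -c 'printf "[%s]\n" "$DEMO_PATH"'              # child prints: []
export DEMO_PATH                                  # now in the environment
sh -c 'printf "[%s]\n" "$DEMO_PATH"'              # child prints: [./lib]

# The prefix form exports for that single command only:
DEMO_ONCE=./lib sh -c 'printf "[%s]\n" "$DEMO_ONCE"'   # prints: [./lib]
```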

Dubslow 2012-04-06 04:31

[QUOTE=Dubslow;295527]Thanks, I got it compiled, however despite all the trouble with LDLP etc., runtime linking still fails. I can workaround that though.
[/QUOTE]
Attached are Linux x86-64 binaries built with the standard makefile (as sm_13, I believe); however, they require the 4.x .so files, which are included in the zip.


Well, actually, whenever I try to upload the file, it fails, so I'll make it available here:

[url]http://dubslow.tk/gimps/CUDALucas2.00_sm13_4.1.tar.gz[/url]

If for some reason that doesn't work, I'll try uploading it here again.

Edit: Can't change FFT size mid-run?

Dubslow 2012-04-06 19:36

By the way, top shows CUDALucas consistently using around 15% or more of a core. Is there any reason for this, or a way to stop it?


All times are UTC.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.