mersenneforum.org  

Old 2011-12-28, 04:19   #650
msft
 
Jul 2009
Tokyo

2·5·61 Posts
Default

Ver 1.41
Fixed a minor bug.
Attached Files
File Type: bz2 CUDALucas.1.4.1.tar.bz2 (27.2 KB, 88 views)
Old 2011-12-28, 04:21   #651
msft
 
Jul 2009
Tokyo

2×5×61 Posts
Default

1.41 binary for Linux64.
Attached Files
File Type: bz2 CUDALucas.1.4.1.cuda4.0.Linux64.tar.bz2 (32.7 KB, 91 views)
Old 2011-12-28, 20:28   #652
Brain
 
Dec 2009
Peine, Germany

331 Posts
Default Help

I can now successfully compile a 32 bit version but not 64 bit.

What I have done so far:
32 bit:
1. Installed a WinXP 32 bit VM
2. Installed the NVIDIA GPU Computing Toolkit and GPU Computing SDK (e.g. version 4.0)
3. Installed Make for Windows
4. Installed MS Visual Studio 2010 Express Edition
5. Set the Path for nvcc, make and cl.exe (from VS/bin)
6. Adapted the makefiles from apsen's 1.2b for my needs. (He posted several sets for CUDA 3.2/4.0 and 32/64 bit.)
7. Compiled with make ... --> SUCCESS. But the test with 216091 failed due to a wrong residue.

64 bit:
1. Used my real Win7 64 bit
2.-6. Same as for 32 bit, except for the special 64 bit flags (-m64 and -Dx64, see apsen's makefiles)
7. Compiled with make ... --> FAILED:
Code:
nvcc fatal due to (null) configuration file
I could track this back to MS Visual Studio 2010 Express Edition not natively supporting 64 bit targets. The solution is supposed to be installing Windows SDK 7.1, but that installation completely fails for me. As far as I know, the paid Visual Studio editions support 64 bit natively, but I don't have one.

For now, I have to give up.

So again: could anybody with a full MS VS (2010) edition please compile CUDALucas, or does anybody have a suggestion for me?
Attached Files
File Type: txt Makefile32bit.txt (975 Bytes, 109 views)
Old 2011-12-28, 23:33   #653
Brain
 
Dec 2009
Peine, Germany

331 Posts
Default

An addition to step 7:
For 32 bit, I had to call vcvars32.bat (this solves the missing mspdb100.dll error).
For 64 bit, I have nothing locally, only vcvarsall.bat with dead references to the other batch files.
My main wish is to be able to compile CUDALucas for Windows myself without spending money...
Old 2011-12-29, 13:43   #654
Brain
 
Dec 2009
Peine, Germany

331 Posts
Default

New day, new try. I'm still trying to compile CUDALucas 1.4.1 with CUDA 4.0 for Win64.

First of all, I've installed the Visual Studio 2010 Professional Trial. The former error ('null') has vanished.

Then I found apsen's compile readme (attached below). As a consequence, I switched to the Visual Studio x64 Command Prompt.
I also made some changes to my makefile: I changed my includes to "-I$(CUDA)/include" "-I$(CUDA)/include/cudart".

Makefile:
Code:
CUDA_VERSION = 4.0
CUDA_ARCH = sm_21
BIT = WIN64
CUDA_BIT = x64
CUDA = C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v$(CUDA_VERSION)
NVIDIA_SDK = C:/ProgramData/NVIDIA Corporation/NVIDIA GPU Computing SDK $(CUDA_VERSION)
VCINSTALLDIR = C:/Program Files (x86)/Microsoft Visual Studio 10.0/VC

LIBS = "$(CUDA)/lib/$(CUDA_BIT)/cudart.lib" "$(CUDA)/lib/$(CUDA_BIT)/cufft.lib"

CUFLAGS = -m64 --ptxas-options=-v "-ccbin=$(VCINSTALLDIR)/bin" -D$(BIT)  -Xcompiler /EHsc,/W3,/nologo,/Ox,/Oy,/GL -arch=$(CUDA_ARCH) -DMERS_PACKAGE -DBIT_SIEVE -DTESTING_SMALL_EXPONENTS -DSIEVE_SIZE_IN_BYTES=32 -DNUM_SMALL_PRIMES=32768 -DDO_NOT_USE_LONG_DOUBLE  "-I$(CUDA)/include" "-I$(CUDA)/include/cudart" "-I$(NVIDIA_SDK)/C/common/inc"  -D__x86_64_ -O3

LINK = link
LFLAGS = /nologo /LTCG #/ltcg:pgo

CUSRC = CUDALucas.cu setup.cu rw.cu balance.cu zero.cu

CUOBJS = $(CUSRC:.cu=.obj)

CUDALucas.exe: $(CUOBJS)
    $(LINK) $(LFLAGS) $^ $(LIBS) /out:$@

%.obj: %.cu
    nvcc -c $< -o $@ $(CUFLAGS)
1. There's a directory /include but no /include/cudart on my hard disk.
2. If I try to compile with these includes, I get lots of errors, see below.
3. If I compile without these includes, it compiles successfully (see below) but does not run correctly: the residues do not match, see below.

Compile failing with includes:
Code:
nvcc -c CUDALucas.cu -o CUDALucas.obj -m64 --ptxas-options=-v "-ccbin=C:/Program Files (x86)/Microsoft Visual Studio 10.0/VC/bin" -DWIN64  -Xcompiler /EHsc,/W3,/nologo,/Ox,/Oy,/GL -arch=sm_21 -DMERS_PACKAGE -DBIT_SIEVE -DTESTING_SMALL_EXPONENTS -DSIEVE_SIZE_IN_BYTES=32 -DNUM_SMALL_PRIMES=32768 -DDO_NOT_USE_LONG_DOUBLE  "-IC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v4.0/include" "-IC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v4.0/include/cudart" "-IC:/ProgramData/NVIDIA Corporation/NVIDIA GPU Computing SDK 4.0/C/common/inc"  -D__x86_64_ -O3

c:\program files\nvidia gpu computing toolkit\cuda\v4.0\include\driver_types.h(3
87): error: "cudaErrorSetOnActiveProcess" has already been declared in the curre
nt scope

[...many more...]

c:\program files\nvidia gpu computing toolkit\cuda\v4.0\include\driver_types.h(8
09): error: "cudaComputeMode" has already been declared in the current scope

Error limit reached.
100 errors detected in the compilation of "C:/Users/FAMILI~1/AppData/Local/Temp/
tmpxft_000013c4_00000000-7_CUDALucas.cpp2.i".
Compilation terminated.
make: *** [CUDALucas.obj] Error 4
Compile succeeds without includes:
Code:
nvcc -c CUDALucas.cu -o CUDALucas.obj -m64 --ptxas-options=-v "-ccbin=C:/Program Files (x86)/Microsoft Visual Studio 10.0/VC/bin" -DWIN64  -Xcompiler /EHsc,/W3,/nologo,/Ox,/Oy,/GL -arch=sm_21 -DMERS_PACKAGE -DBIT_SIEVE -DTESTING_SMALL_EXPONENTS -DSIEVE_SIZE_IN_BYTES=32 -DNUM_SMALL_PRIMES=32768 -DDO_NOT_USE_LONG_DOUBLE "-IC:/ProgramData/NVIDIA Corporation/NVIDIA GPU Computing SDK 4.0/C/common/inc"  -D__x86_64_ -O3
tmpxft_000010bc_00000000-14_CUDALucas.ii
CUDALucas.cu(602) : warning C4018: '<' : signed/unsigned mismatch
CUDALucas.cu(609) : warning C4018: '<' : signed/unsigned mismatch
CUDALucas.cu(669) : warning C4018: '>=' : signed/unsigned mismatch
CUDALucas.cu(793) : warning C4244: 'argument' : conversion from 'float' to 'size_t', possible loss of data
nvcc -c setup.cu -o setup.obj -m64 --ptxas-options=-v "-ccbin=C:/Program Files (x86)/Microsoft Visual Studio 10.0/VC/bin" -DWIN64  -Xcompiler /EHsc,/W3,/nologo,/Ox,/Oy,/GL -arch=sm_21 -DMERS_PACKAGE -DBIT_SIEVE -DTESTING_SMALL_EXPONENTS -DSIEVE_SIZE_IN_BYTES=32 -DNUM_SMALL_PRIMES=32768 -DDO_NOT_USE_LONG_DOUBLE "-IC:/ProgramData/NVIDIA Corporation/NVIDIA GPU Computing SDK 4.0/C/common/inc"  -D__x86_64_ -O3
tmpxft_000002c0_00000000-14_setup.ii
nvcc -c rw.cu -o rw.obj -m64 --ptxas-options=-v "-ccbin=C:/Program Files (x86)/Microsoft Visual Studio 10.0/VC/bin" -DWIN64  -Xcompiler /EHsc,/W3,/nologo,/Ox,/Oy,/GL -arch=sm_21 -DMERS_PACKAGE -DBIT_SIEVE -DTESTING_SMALL_EXPONENTS -DSIEVE_SIZE_IN_BYTES=32 -DNUM_SMALL_PRIMES=32768 -DDO_NOT_USE_LONG_DOUBLE "-IC:/ProgramData/NVIDIA Corporation/NVIDIA GPU Computing SDK 4.0/C/common/inc"  -D__x86_64_ -O3
tmpxft_0000015c_00000000-14_rw.ii
rw.cu(1479) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data
rw.cu(1479) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data
rw.cu(1479) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data
rw.cu(1479) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data
rw.cu(1479) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data
rw.cu(1479) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data
rw.cu(1615) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data
rw.cu(1622) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data
rw.cu(1629) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data
rw.cu(1637) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data
rw.cu(1645) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data
rw.cu(1653) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data
rw.cu(1826) : warning C4018: '>' : signed/unsigned mismatch
rw.cu(1917) : warning C4018: '>' : signed/unsigned mismatch
nvcc -c balance.cu -o balance.obj -m64 --ptxas-options=-v "-ccbin=C:/Program Files (x86)/Microsoft Visual Studio 10.0/VC/bin" -DWIN64  -Xcompiler /EHsc,/W3,/nologo,/Ox,/Oy,/GL -arch=sm_21 -DMERS_PACKAGE -DBIT_SIEVE -DTESTING_SMALL_EXPONENTS -DSIEVE_SIZE_IN_BYTES=32 -DNUM_SMALL_PRIMES=32768 -DDO_NOT_USE_LONG_DOUBLE "-IC:/ProgramData/NVIDIA Corporation/NVIDIA GPU Computing SDK 4.0/C/common/inc"  -D__x86_64_ -O3
tmpxft_00001048_00000000-14_balance.ii
nvcc -c zero.cu -o zero.obj -m64 --ptxas-options=-v "-ccbin=C:/Program Files (x86)/Microsoft Visual Studio 10.0/VC/bin" -DWIN64  -Xcompiler /EHsc,/W3,/nologo,/Ox,/Oy,/GL -arch=sm_21 -DMERS_PACKAGE -DBIT_SIEVE -DTESTING_SMALL_EXPONENTS -DSIEVE_SIZE_IN_BYTES=32 -DNUM_SMALL_PRIMES=32768 -DDO_NOT_USE_LONG_DOUBLE "-IC:/ProgramData/NVIDIA Corporation/NVIDIA GPU Computing SDK 4.0/C/common/inc"  -D__x86_64_ -O3
tmpxft_00001244_00000000-14_zero.ii
link /nologo /LTCG  CUDALucas.obj setup.obj rw.obj balance.obj zero.obj "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v4.0/lib/x64/cudart.lib" "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v4.0/lib/x64/cufft.lib" /out:CUDALucas.exe
Generating code
Finished generating code
CUDALucas runs incorrectly:
Code:
F:\Eigene Dateien\Computing\cudalucas.1.4.1\win64\CUDA4.0\sm_21>CUDALucas.exe 216091
Iteration 10000 2.5 msec/Iter M( 216091 )C, 0xfffffffffffffffe, n = 524288, CUDALucas v1.41
Iteration 20000 2.8 msec/Iter M( 216091 )C, 0xfffffffffffffffe, n = 524288, CUDALucas v1.41
Iteration 30000 2.5 msec/Iter M( 216091 )C, 0xfffffffffffffffe, n = 524288, CUDALucas v1.41
Iteration 40000 2.6 msec/Iter M( 216091 )C, 0xfffffffffffffffd, n = 524288, CUDALucas v1.41
Iteration 50000 2.6 msec/Iter M( 216091 )C, 0xfffffffffffffffe, n = 524288, CUDALucas v1.41
Iteration 60000 2.6 msec/Iter M( 216091 )C, 0xfffffffffffffffd, n = 524288, CUDALucas v1.41
Iteration 70000 2.6 msec/Iter M( 216091 )C, 0xfffffffffffffffe, n = 524288, CUDALucas v1.41
Iteration 80000 2.5 msec/Iter M( 216091 )C, 0xfffffffffffffffe, n = 524288, CUDALucas v1.41
Iteration 90000 2.6 msec/Iter M( 216091 )C, 0xfffffffffffffffd, n = 524288, CUDALucas v1.41
Iteration 100000 2.6 msec/Iter M( 216091 )C, 0xfffffffffffffffe, n = 524288, CUDALucas v1.41
Iteration 110000 2.6 msec/Iter M( 216091 )C, 0xfffffffffffffffd, n = 524288, CUDALucas v1.41
Iteration 120000 2.5 msec/Iter M( 216091 )C, 0xfffffffffffffffe, n = 524288, CUDALucas v1.41
Iteration 130000 2.5 msec/Iter M( 216091 )C, 0xfffffffffffffffe, n = 524288, CUDALucas v1.41
Iteration 140000 2.5 msec/Iter M( 216091 )C, 0xfffffffffffffffd, n = 524288, CUDALucas v1.41
Iteration 150000 2.5 msec/Iter M( 216091 )C, 0xfffffffffffffffd, n = 524288, CUDALucas v1.41
Iteration 160000 2.6 msec/Iter M( 216091 )C, 0xfffffffffffffffe, n = 524288, CUDALucas v1.41
Iteration 170000 2.5 msec/Iter M( 216091 )C, 0xfffffffffffffffd, n = 524288, CUDALucas v1.41
Iteration 180000 2.5 msec/Iter M( 216091 )C, 0xfffffffffffffffd, n = 524288, CUDALucas v1.41
Iteration 190000 2.6 msec/Iter M( 216091 )C, 0xfffffffffffffffe, n = 524288, CUDALucas v1.41
Iteration 200000 2.6 msec/Iter M( 216091 )C, 0xfffffffffffffffd, n = 524288, CUDALucas v1.41
Iteration 210000 2.6 msec/Iter M( 216091 )C, 0xfffffffffffffffd, n = 524288, CUDALucas v1.41
M( 216091 )C, 0xfffffffffffffffd, n = 524288, CUDALucas v1.41
CUDALucas: Could not find a checkpoint file to resume from
The shader model (-arch) had no influence. I tried sm_13 and sm_21.

Any ideas?
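
For sanity-checking a build against known answers, here is a minimal CPU-only sketch of the Lucas-Lehmer test in Python. It uses plain big integers instead of an FFT, so it is only practical for small exponents, and it is not taken from the CUDALucas source; the function name `lucas_lehmer` is made up for this example. A prime exponent such as 216091 has to finish with a zero residue, so a final line ending in `C` for that exponent points at a broken build rather than a real result.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if the Mersenne number 2**p - 1 is prime.

    Assumes p is an odd prime. Plain big-integer arithmetic, no FFT,
    so this is only meant for small exponents.
    """
    m = (1 << p) - 1          # the Mersenne number 2**p - 1
    s = 4                     # standard starting value
    for _ in range(p - 2):    # p - 2 squarings in total
        s = (s * s - 2) % m
    return s == 0             # zero residue means prime


if __name__ == "__main__":
    # "P" / "C" mirror the prime / composite flag in the program output.
    for p in (3, 5, 7, 11, 13, 17, 19, 23, 31):
        print(p, "P" if lucas_lehmer(p) else "C")
```

Running it on a handful of small exponents gives a quick reference for which ones must come out as prime before trusting a longer GPU run.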
Attached Files
File Type: txt README.TXT (1.5 KB, 173 views)
Old 2011-12-29, 18:16   #655
Brain
 
Dec 2009
Peine, Germany

331 Posts
Default I've made fire

No comments please: my self-made makefile must have an error. I took apsen's without modifications; the paths fit without changes. Now it seems to work:

GTX 560 Ti @ default clock:
Code:
F:\Eigene Dateien\Computing\cudalucas.1.4.1\win64\CUDA4.0\sm_13>CUDALucas.cuda4.0.sm_13.WIN64.exe 216091
Iteration 10000 2.4 msec/Iter M( 216091 )C, 0x30247786758b8792, n = 524288, CUDALucas v1.41
Iteration 20000 2.4 msec/Iter M( 216091 )C, 0x13e968bf40fda4d7, n = 524288, CUDALucas v1.41
Iteration 30000 2.4 msec/Iter M( 216091 )C, 0x540772c2abb7833a, n = 524288, CUDALucas v1.41
Iteration 40000 2.3 msec/Iter M( 216091 )C, 0xc26da9695ac418c1, n = 524288, CUDALucas v1.41
Iteration 50000 2.4 msec/Iter M( 216091 )C, 0x95ce3ff44abdd1e5, n = 524288, CUDALucas v1.41
Iteration 60000 2.5 msec/Iter M( 216091 )C, 0x99aa87c495daffe7, n = 524288, CUDALucas v1.41
Iteration 70000 2.4 msec/Iter M( 216091 )C, 0x505d249be3145893, n = 524288, CUDALucas v1.41
Iteration 80000 2.5 msec/Iter M( 216091 )C, 0xddf612c72037b8a1, n = 524288, CUDALucas v1.41
Iteration 90000 2.4 msec/Iter M( 216091 )C, 0xb5d8309a1ce9e2b6, n = 524288, CUDALucas v1.41
Iteration 100000 2.4 msec/Iter M( 216091 )C, 0x4de7f101ee1cb7a5, n = 524288, CUDALucas v1.41
Iteration 110000 2.5 msec/Iter M( 216091 )C, 0x10aa3286c0b03369, n = 524288, CUDALucas v1.41
Iteration 120000 2.4 msec/Iter M( 216091 )C, 0x3981b56788b529e2, n = 524288, CUDALucas v1.41
Iteration 130000 2.4 msec/Iter M( 216091 )C, 0x80438af231f8fccd, n = 524288, CUDALucas v1.41
Iteration 140000 2.4 msec/Iter M( 216091 )C, 0x669382faea06df89, n = 524288, CUDALucas v1.41
Iteration 150000 2.4 msec/Iter M( 216091 )C, 0x1b73cb121df7d6fa, n = 524288, CUDALucas v1.41
Iteration 160000 2.6 msec/Iter M( 216091 )C, 0xb391010f29c70ee1, n = 524288, CUDALucas v1.41
Iteration 170000 2.4 msec/Iter M( 216091 )C, 0x04055d84a77be1d8, n = 524288, CUDALucas v1.41
Iteration 180000 2.5 msec/Iter M( 216091 )C, 0xe3d74c104f02967d, n = 524288, CUDALucas v1.41
Iteration 190000 2.4 msec/Iter M( 216091 )C, 0x54b2a8b9cb149f9f, n = 524288, CUDALucas v1.41
Iteration 200000 2.4 msec/Iter M( 216091 )C, 0xf433496947b7b103, n = 524288, CUDALucas v1.41
Iteration 210000 2.4 msec/Iter M( 216091 )C, 0xcfe091c8f59f8a7b, n = 524288, CUDALucas v1.41
M( 216091 )P, n = 524288, CUDALucas v1.41
CUDALucas: Could not find a checkpoint file to resume from
I shouldn't think for myself...
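
One visible difference between the self-made makefile above and the command line produced by apsen's makefile (see the CUDA 3.2 build log later in this thread) is the last define: `-D__x86_64_` with one trailing underscore versus `-D__x86_64__` with two. Whether that is what caused the wrong residues is only a guess, but differences like this are easy to spot mechanically. A small Python sketch (the helper name `defines` is made up here) that compares the `-D` defines of two flag strings copied from the logs:

```python
def defines(cmdline: str) -> set[str]:
    """Collect the preprocessor defines (-DNAME or -DNAME=VALUE) from a flag string."""
    return {tok[2:] for tok in cmdline.split() if tok.startswith("-D")}


# Defines copied from the two nvcc command lines in this thread.
SELF_MADE = (
    "-DWIN64 -DMERS_PACKAGE -DBIT_SIEVE -DTESTING_SMALL_EXPONENTS "
    "-DSIEVE_SIZE_IN_BYTES=32 -DNUM_SMALL_PRIMES=32768 "
    "-DDO_NOT_USE_LONG_DOUBLE -D__x86_64_"
)
APSEN = (
    "-DWIN64 -DMERS_PACKAGE -DBIT_SIEVE -DTESTING_SMALL_EXPONENTS "
    "-DSIEVE_SIZE_IN_BYTES=32 -DNUM_SMALL_PRIMES=32768 "
    "-DDO_NOT_USE_LONG_DOUBLE -D__x86_64__"
)

if __name__ == "__main__":
    # Symmetric difference: defines present in only one of the two command lines.
    print(sorted(defines(SELF_MADE) ^ defines(APSEN)))
```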

Starting to test this compile...

P.S.: We should try the non-power-of-2 FFT sizes.
Attached Files
File Type: exe CUDALucas.cuda4.0.sm_13.WIN64.exe (181.5 KB, 90 views)
Old 2011-12-29, 18:40   #656
Brain
 
Dec 2009
Peine, Germany

331 Posts
Default Shader Model 2.1

And one compiled for Compute Capability 2.1 (sm_21) instead of 1.3. I don't know yet whether this is relevant or not. The file size differs by about 3 KB, and the performance:
Code:
F:\Eigene Dateien\Computing\cudalucas.1.4.1\win64\CUDA4.0\sm_21>CUDALucas.cuda4.0.sm_21.WIN64.exe 216091
Iteration 10000 2.4 msec/Iter M( 216091 )C, 0x30247786758b8792, n = 524288, CUDALucas v1.41
Iteration 20000 2.3 msec/Iter M( 216091 )C, 0x13e968bf40fda4d7, n = 524288, CUDALucas v1.41
Iteration 30000 2.5 msec/Iter M( 216091 )C, 0x540772c2abb7833a, n = 524288, CUDALucas v1.41
Iteration 40000 2.4 msec/Iter M( 216091 )C, 0xc26da9695ac418c1, n = 524288, CUDALucas v1.41
Iteration 50000 2.4 msec/Iter M( 216091 )C, 0x95ce3ff44abdd1e5, n = 524288, CUDALucas v1.41
Iteration 60000 2.4 msec/Iter M( 216091 )C, 0x99aa87c495daffe7, n = 524288, CUDALucas v1.41
Iteration 70000 2.5 msec/Iter M( 216091 )C, 0x505d249be3145893, n = 524288, CUDALucas v1.41
Iteration 80000 2.5 msec/Iter M( 216091 )C, 0xddf612c72037b8a1, n = 524288, CUDALucas v1.41
Iteration 90000 2.4 msec/Iter M( 216091 )C, 0xb5d8309a1ce9e2b6, n = 524288, CUDALucas v1.41
Iteration 100000 2.4 msec/Iter M( 216091 )C, 0x4de7f101ee1cb7a5, n = 524288, CUDALucas v1.41
Iteration 110000 2.6 msec/Iter M( 216091 )C, 0x10aa3286c0b03369, n = 524288, CUDALucas v1.41
Iteration 120000 2.5 msec/Iter M( 216091 )C, 0x3981b56788b529e2, n = 524288, CUDALucas v1.41
Iteration 130000 2.4 msec/Iter M( 216091 )C, 0x80438af231f8fccd, n = 524288, CUDALucas v1.41
Iteration 140000 2.4 msec/Iter M( 216091 )C, 0x669382faea06df89, n = 524288, CUDALucas v1.41
Iteration 150000 2.4 msec/Iter M( 216091 )C, 0x1b73cb121df7d6fa, n = 524288, CUDALucas v1.41
Iteration 160000 2.4 msec/Iter M( 216091 )C, 0xb391010f29c70ee1, n = 524288, CUDALucas v1.41
Iteration 170000 2.4 msec/Iter M( 216091 )C, 0x04055d84a77be1d8, n = 524288, CUDALucas v1.41
Iteration 180000 2.4 msec/Iter M( 216091 )C, 0xe3d74c104f02967d, n = 524288, CUDALucas v1.41
Iteration 190000 2.4 msec/Iter M( 216091 )C, 0x54b2a8b9cb149f9f, n = 524288, CUDALucas v1.41
Iteration 200000 2.5 msec/Iter M( 216091 )C, 0xf433496947b7b103, n = 524288, CUDALucas v1.41
Iteration 210000 2.3 msec/Iter M( 216091 )C, 0xcfe091c8f59f8a7b, n = 524288, CUDALucas v1.41
M( 216091 )P, n = 524288, CUDALucas v1.41
CUDALucas: Could not find a checkpoint file to resume from
GPU utilisation here is 89% to 90% for both 1.41 versions.
Attached Files
File Type: exe CUDALucas.cuda4.0.sm_21.WIN64.exe (179.0 KB, 102 views)
Old 2011-12-29, 22:32   #657
flashjh
 
"Jerry"
Nov 2011
Vancouver, WA

463₁₆ Posts
Default

Quote:
Originally Posted by Brain View Post
And one compiled for Compute Capability 2.1 (sm_21) instead of 1.3. I don't know yet whether this is relevant or not. The file size differs by about 3 KB, and the performance:
.
.
.

GPU utilisation here is 89% to 90% for both 1.41 versions.
Awesome! I'll let you know how it works. Thanks for compiling it!

EDIT: The test works well. The command line changed from 1.2b: -t is no longer needed, and to select GPU 1 or 2 I had to use -D00 or -D01. I'll compare results to my old runs in a bit.

Last fiddled with by flashjh on 2011-12-29 at 23:18
Old 2011-12-30, 00:09   #658
flashjh
 
"Jerry"
Nov 2011
Vancouver, WA

463₁₆ Posts
Default

Quote:
Originally Posted by flashjh View Post
Awesome! I'll let you know how it works. Thanks for compiling it!

EDIT: The test works well. The command line changed from 1.2b: -t is no longer needed, and to select GPU 1 or 2 I had to use -D00 or -D01. I'll compare results to my old runs in a bit.
Ok, I did some more testing with different driver versions. The newest beta drivers need -D0 or -D1 for GPU affinity, but even then, with the -D1 switch it sometimes still runs on GPU 1. I don't know why.

The 1.41 build is slower than 1.2b for me. I had to install CUDA 4.0 to get the latest .dll files. Times are below.

1.2b with nVidia 290.53
Code:
C:\CUDA>cuda12b -t 216091
CUDALucas: Could not find a checkpoint file to resume from
Iteration 10000 M( 216091 )C, 0x30247786758b8792, n = 524288, CUDALucas v1.2b (0:13 real, 1.3126 ms/iter, ETA 4:22)
Iteration 20000 M( 216091 )C, 0x13e968bf40fda4d7, n = 524288, CUDALucas v1.2b (0:13 real, 1.2928 ms/iter, ETA 4:05)
Iteration 30000 M( 216091 )C, 0x540772c2abb7833a, n = 524288, CUDALucas v1.2b (0:13 real, 1.2923 ms/iter, ETA 3:52)
Iteration 40000 M( 216091 )C, 0xc26da9695ac418c1, n = 524288, CUDALucas v1.2b (0:13 real, 1.2925 ms/iter, ETA 3:39)
Iteration 50000 M( 216091 )C, 0x95ce3ff44abdd1e5, n = 524288, CUDALucas v1.2b (0:13 real, 1.2923 ms/iter, ETA 3:26)
Iteration 60000 M( 216091 )C, 0x99aa87c495daffe7, n = 524288, CUDALucas v1.2b (0:13 real, 1.2924 ms/iter, ETA 3:13)
Iteration 70000 M( 216091 )C, 0x505d249be3145893, n = 524288, CUDALucas v1.2b (0:13 real, 1.2924 ms/iter, ETA 3:00)
Iteration 80000 M( 216091 )C, 0xddf612c72037b8a1, n = 524288, CUDALucas v1.2b (0:13 real, 1.2920 ms/iter, ETA 2:47)
Iteration 90000 M( 216091 )C, 0xb5d8309a1ce9e2b6, n = 524288, CUDALucas v1.2b (0:13 real, 1.2923 ms/iter, ETA 2:35)
Iteration 100000 M( 216091 )C, 0x4de7f101ee1cb7a5, n = 524288, CUDALucas v1.2b (0:12 real, 1.2921 ms/iter, ETA 2:22)
Iteration 110000 M( 216091 )C, 0x10aa3286c0b03369, n = 524288, CUDALucas v1.2b (0:13 real, 1.2924 ms/iter, ETA 2:09)
Iteration 120000 M( 216091 )C, 0x3981b56788b529e2, n = 524288, CUDALucas v1.2b (0:13 real, 1.2924 ms/iter, ETA 1:56)
Iteration 130000 M( 216091 )C, 0x80438af231f8fccd, n = 524288, CUDALucas v1.2b (0:13 real, 1.2924 ms/iter, ETA 1:43)
Iteration 140000 M( 216091 )C, 0x669382faea06df89, n = 524288, CUDALucas v1.2b (0:13 real, 1.2925 ms/iter, ETA 1:30)
Iteration 150000 M( 216091 )C, 0x1b73cb121df7d6fa, n = 524288, CUDALucas v1.2b (0:13 real, 1.2923 ms/iter, ETA 1:17)
Iteration 160000 M( 216091 )C, 0xb391010f29c70ee1, n = 524288, CUDALucas v1.2b (0:13 real, 1.2952 ms/iter, ETA 1:04)
Iteration 170000 M( 216091 )C, 0x04055d84a77be1d8, n = 524288, CUDALucas v1.2b (0:13 real, 1.3023 ms/iter, ETA 0:52)
Iteration 180000 M( 216091 )C, 0xe3d74c104f02967d, n = 524288, CUDALucas v1.2b (0:13 real, 1.2959 ms/iter, ETA 0:38)
Iteration 190000 M( 216091 )C, 0x54b2a8b9cb149f9f, n = 524288, CUDALucas v1.2b (0:13 real, 1.2948 ms/iter, ETA 0:25)
Iteration 200000 M( 216091 )C, 0xf433496947b7b103, n = 524288, CUDALucas v1.2b (0:13 real, 1.2940 ms/iter, ETA 0:12)
Iteration 210000 M( 216091 )C, 0xcfe091c8f59f8a7b, n = 524288, CUDALucas v1.2b (0:13 real, 1.2941 ms/iter, ETA 0:00)
M( 216091 )P, n = 524288, CUDALucas v1.2b
no more input
1st run, 1.41 with nVidia driver 285.62
Code:
 
C:\CUDA>cuda411 216091
Iteration 10000 1.5 msec/Iter M( 216091 )C, 0x30247786758b8792, n = 524288, CUDALucas v1.41
Iteration 20000 1.5 msec/Iter M( 216091 )C, 0x13e968bf40fda4d7, n = 524288, CUDALucas v1.41
Iteration 30000 1.5 msec/Iter M( 216091 )C, 0x540772c2abb7833a, n = 524288, CUDALucas v1.41
Iteration 40000 1.5 msec/Iter M( 216091 )C, 0xc26da9695ac418c1, n = 524288, CUDALucas v1.41
Iteration 50000 1.4 msec/Iter M( 216091 )C, 0x95ce3ff44abdd1e5, n = 524288, CUDALucas v1.41
Iteration 60000 1.5 msec/Iter M( 216091 )C, 0x99aa87c495daffe7, n = 524288, CUDALucas v1.41
Iteration 70000 1.5 msec/Iter M( 216091 )C, 0x505d249be3145893, n = 524288, CUDALucas v1.41
Iteration 80000 1.5 msec/Iter M( 216091 )C, 0xddf612c72037b8a1, n = 524288, CUDALucas v1.41
Iteration 90000 1.5 msec/Iter M( 216091 )C, 0xb5d8309a1ce9e2b6, n = 524288, CUDALucas v1.41
Iteration 100000 1.5 msec/Iter M( 216091 )C, 0x4de7f101ee1cb7a5, n = 524288, CUDALucas v1.41
Iteration 110000 1.5 msec/Iter M( 216091 )C, 0x10aa3286c0b03369, n = 524288, CUDALucas v1.41
Iteration 120000 1.4 msec/Iter M( 216091 )C, 0x3981b56788b529e2, n = 524288, CUDALucas v1.41
Iteration 130000 1.5 msec/Iter M( 216091 )C, 0x80438af231f8fccd, n = 524288, CUDALucas v1.41
Iteration 140000 1.5 msec/Iter M( 216091 )C, 0x669382faea06df89, n = 524288, CUDALucas v1.41
Iteration 150000 1.5 msec/Iter M( 216091 )C, 0x1b73cb121df7d6fa, n = 524288, CUDALucas v1.41
Iteration 160000 1.5 msec/Iter M( 216091 )C, 0xb391010f29c70ee1, n = 524288, CUDALucas v1.41
Iteration 170000 1.4 msec/Iter M( 216091 )C, 0x04055d84a77be1d8, n = 524288, CUDALucas v1.41
Iteration 180000 1.5 msec/Iter M( 216091 )C, 0xe3d74c104f02967d, n = 524288, CUDALucas v1.41
Iteration 190000 1.5 msec/Iter M( 216091 )C, 0x54b2a8b9cb149f9f, n = 524288, CUDALucas v1.41
Iteration 200000 1.5 msec/Iter M( 216091 )C, 0xf433496947b7b103, n = 524288, CUDALucas v1.41
Iteration 210000 1.4 msec/Iter M( 216091 )C, 0xcfe091c8f59f8a7b, n = 524288, CUDALucas v1.41
M( 216091 )P, n = 524288, CUDALucas v1.41
CUDALucas: Could not find a checkpoint file to resume from
2nd run, 1.41 with nVidia driver 290.53
Code:
 
C:\CUDA>cuda411 -D1 216091
Iteration 10000 1.5 msec/Iter M( 216091 )C, 0x30247786758b8792, n = 524288, CUDALucas v1.41
Iteration 20000 1.4 msec/Iter M( 216091 )C, 0x13e968bf40fda4d7, n = 524288, CUDALucas v1.41
Iteration 30000 1.4 msec/Iter M( 216091 )C, 0x540772c2abb7833a, n = 524288, CUDALucas v1.41
Iteration 40000 1.4 msec/Iter M( 216091 )C, 0xc26da9695ac418c1, n = 524288, CUDALucas v1.41
Iteration 50000 1.5 msec/Iter M( 216091 )C, 0x95ce3ff44abdd1e5, n = 524288, CUDALucas v1.41
Iteration 60000 1.4 msec/Iter M( 216091 )C, 0x99aa87c495daffe7, n = 524288, CUDALucas v1.41
Iteration 70000 1.4 msec/Iter M( 216091 )C, 0x505d249be3145893, n = 524288, CUDALucas v1.41
Iteration 80000 1.4 msec/Iter M( 216091 )C, 0xddf612c72037b8a1, n = 524288, CUDALucas v1.41
Iteration 90000 1.5 msec/Iter M( 216091 )C, 0xb5d8309a1ce9e2b6, n = 524288, CUDALucas v1.41
Iteration 100000 1.4 msec/Iter M( 216091 )C, 0x4de7f101ee1cb7a5, n = 524288, CUDALucas v1.41
Iteration 110000 1.4 msec/Iter M( 216091 )C, 0x10aa3286c0b03369, n = 524288, CUDALucas v1.41
Iteration 120000 1.5 msec/Iter M( 216091 )C, 0x3981b56788b529e2, n = 524288, CUDALucas v1.41
Iteration 130000 1.4 msec/Iter M( 216091 )C, 0x80438af231f8fccd, n = 524288, CUDALucas v1.41
Iteration 140000 1.4 msec/Iter M( 216091 )C, 0x669382faea06df89, n = 524288, CUDALucas v1.41
Iteration 150000 1.4 msec/Iter M( 216091 )C, 0x1b73cb121df7d6fa, n = 524288, CUDALucas v1.41
Iteration 160000 1.5 msec/Iter M( 216091 )C, 0xb391010f29c70ee1, n = 524288, CUDALucas v1.41
Iteration 170000 1.4 msec/Iter M( 216091 )C, 0x04055d84a77be1d8, n = 524288, CUDALucas v1.41
Iteration 180000 1.4 msec/Iter M( 216091 )C, 0xe3d74c104f02967d, n = 524288, CUDALucas v1.41
Iteration 190000 1.4 msec/Iter M( 216091 )C, 0x54b2a8b9cb149f9f, n = 524288, CUDALucas v1.41
Iteration 200000 1.5 msec/Iter M( 216091 )C, 0xf433496947b7b103, n = 524288, CUDALucas v1.41
Iteration 210000 1.4 msec/Iter M( 216091 )C, 0xcfe091c8f59f8a7b, n = 524288, CUDALucas v1.41
M( 216091 )P, n = 524288, CUDALucas v1.41
CUDALucas: Could not find a checkpoint file to resume from
Does anyone know why the newer one is slower, or whether there is something I can do to make it faster? Does anyone know how to get the ETA back in 1.41?
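
To put numbers on the comparison, here is a small Python sketch that averages the per-iteration times from pasted log output and estimates the remaining time. The helper names (`mean_ms_per_iter`, `eta_seconds`) and the assumption that a full test takes p - 2 iterations are mine, not taken from CUDALucas; the ETA here is simply the remaining iterations times the average, which may not be how 1.2b computed its ETA column. The sample lines are excerpts from the logs above.

```python
import re

# Matches both timing formats seen in this thread:
#   v1.41: "Iteration 10000 1.5 msec/Iter M( ..."
#   v1.2b: "... (0:13 real, 1.3126 ms/iter, ETA 4:22)"
TIMING = re.compile(r"([0-9]+(?:\.[0-9]+)?) (?:msec/Iter|ms/iter)")

LOG_12B = """\
Iteration 10000 M( 216091 )C, 0x30247786758b8792, n = 524288, CUDALucas v1.2b (0:13 real, 1.3126 ms/iter, ETA 4:22)
Iteration 20000 M( 216091 )C, 0x13e968bf40fda4d7, n = 524288, CUDALucas v1.2b (0:13 real, 1.2928 ms/iter, ETA 4:05)
"""

LOG_141 = """\
Iteration 10000 1.5 msec/Iter M( 216091 )C, 0x30247786758b8792, n = 524288, CUDALucas v1.41
Iteration 20000 1.5 msec/Iter M( 216091 )C, 0x13e968bf40fda4d7, n = 524288, CUDALucas v1.41
"""


def mean_ms_per_iter(log: str) -> float:
    """Average all per-iteration timings (in ms) found in a pasted log."""
    values = [float(m.group(1)) for m in TIMING.finditer(log)]
    if not values:
        raise ValueError("no timing values found in log")
    return sum(values) / len(values)


def eta_seconds(p: int, iteration: int, ms_per_iter: float) -> float:
    """Remaining time in seconds, assuming p - 2 iterations in total."""
    remaining = max(p - 2 - iteration, 0)
    return remaining * ms_per_iter / 1000.0


if __name__ == "__main__":
    old = mean_ms_per_iter(LOG_12B)
    new = mean_ms_per_iter(LOG_141)
    print(f"1.2b: {old:.4f} ms/iter, 1.41: {new:.4f} ms/iter, ratio {new / old:.2f}")
    print(f"ETA at iteration 10000 with 1.41: {eta_seconds(216091, 10000, new):.0f} s")
```

Pasting the complete logs instead of the two-line excerpts gives a steadier average, since the 1.41 output is only printed to one decimal place.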

Last fiddled with by flashjh on 2011-12-30 at 00:12
Old 2011-12-30, 13:47   #659
Brain
 
Dec 2009
Peine, Germany

513₈ Posts
Default No milk today

Quote:
Originally Posted by flashjh View Post
Does anyone know why the newer one is slower, or whether there is something I can do to make it faster? Does anyone know how to get the ETA back in 1.41?
I would be willing to compile CUDALucas 1.41 for CUDA 3.2, as I guess the CUDA version is the reason for the slight performance drop. I installed the whole CUDA 3.2 stuff, and guess what:
Code:
F:\Eigene Dateien\Computing\CUDALucas\cudalucas.1.4.1\src>make
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2/bin/nvcc" -c CUDALucas.
cu -o CUDALucas.cuda3.2.sm_13.WIN64.obj -m64 --ptxas-options=-v "-ccbin=C:\Progr
am Files (x86)\Microsoft Visual Studio 10.0\VC\/bin" -DWIN64  -Xcompiler /EHsc,/
W3,/nologo,/Ox,/Oy,/GL -arch=sm_13 -DMERS_PACKAGE -DBIT_SIEVE -DTESTING_SMALL_EX
PONENTS -DSIEVE_SIZE_IN_BYTES=32 -DNUM_SMALL_PRIMES=32768 -DDO_NOT_USE_LONG_DOUB
LE  "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2/include" "-IC:\Pr
ogram Files\NVIDIA GPU Computing Toolkit\CUDA\v3.2/include/cudart" "-IC:/Program
Data/NVIDIA Corporation/NVIDIA GPU Computing SDK 3.2/C/common/inc"  -D__x86_64__
 -O3
nvcc fatal   : nvcc cannot find a supported cl version. Only MSVC 8.0 and MSVC 9
.0 are supported
make: *** [CUDALucas.cuda3.2.sm_13.WIN64.obj] Error -1
I need another MS Visual Studio installation (2005 or 2008). I won't do that as I am "saturated"... By the way, CUDA 3.2 is "retro style". ;-)

Maybe one of our precious CUDA experts (TheJudger...) has an opinion...?

There have been attempts to speed CUDALucas up. User Ethan (EO) eliminated one cudaMemcpy() call, but that made no great difference. For me, it only made the system laggy.
I cannot run the CUDA profiler (yet), but I'm wondering about the comparatively weak utilisation of 90%...
Old 2011-12-30, 14:18   #660
Brain
 
Dec 2009
Peine, Germany

331 Posts
Default Tests with non-power-of-2-FFTs

Code:
F:\Eigene Dateien\Computing\CUDALucas\cudalucas.1.4.1\win64\CUDA4.0\sm_21>CUDALucas.cuda4.0.sm_21.WIN64.exe -c10000 44128291
err = 0.363031, increasing n from 2359296
Iteration 10000 11.6 msec/Iter M( 44128291 )C, 0xed435ed5c3d2f2d9, n = 2621440, CUDALucas v1.41
Iteration 20000 11.1 msec/Iter M( 44128291 )C, 0xa13d48abcc4f92bc, n = 2621440, CUDALucas v1.41
Iteration 30000 10.9 msec/Iter M( 44128291 )C, 0xa3d71bee2f8c9ebd, n = 2621440, CUDALucas v1.41
Iteration 40000 10.8 msec/Iter M( 44128291 )C, 0x8c175ce356b74bf2, n = 2621440, CUDALucas v1.41
Seems to run (length 2621440 = 2560K) but...

... it now takes one full core of my Core i5-750 (the task demands ~20% of my 4 core CPU). Further tests show that this only seems to happen if the "-c" switch is given on the command line. *relieved*
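
The arithmetic behind the FFT lengths in the log above, as a small Python sketch (the helper names are made up for this post). It splits an FFT length into an odd factor times a power of two and computes the average number of exponent bits each FFT element has to carry, which is presumably why the program moved from 2359296 to 2621440 after reporting err = 0.363031.

```python
def describe_fft_length(n: int) -> str:
    """Write n as odd * 2**k and in units of K (1K = 1024 elements)."""
    odd, k = n, 0
    while odd % 2 == 0:
        odd //= 2
        k += 1
    kind = "power of 2" if odd == 1 else "non-power-of-2"
    return f"{n} = {odd} * 2^{k} = {n // 1024}K ({kind})"


def bits_per_word(exponent: int, n: int) -> float:
    """Average number of bits of the exponent carried per FFT element."""
    return exponent / n


if __name__ == "__main__":
    for n in (2097152, 2359296, 2621440):
        print(describe_fft_length(n))
    for n in (2359296, 2621440):
        print(f"M(44128291) at n = {n}: {bits_per_word(44128291, n):.2f} bits/word")
```

The larger length lowers the load from roughly 18.7 to roughly 16.8 bits per element, at the cost of a longer transform.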

Code:
F:\Eigene Dateien\Computing\CUDALucas\cudalucas.1.4.1\win64\CUDA4.0\sm_21>CUDALucas.cuda4.0.sm_21.WIN64.exe -c10000 3750
0083
CUDALucas: Could not find a checkpoint file to resume from

F:\Eigene Dateien\Computing\CUDALucas\cudalucas.1.4.1\win64\CUDA4.0\sm_21>CUDALucas.cuda4.0.sm_21.WIN64.exe 37500083
CUDALucas: Resuming from checkpoint file c37500083
caso 2
Iteration 10000 0.3 msec/Iter M( 37500083 )C, 0x1c79920a5816ec18, n = 2097152, CUDALucas v1.41
Iteration 20000 8.3 msec/Iter M( 37500083 )C, 0xd14e10268cf86636, n = 2097152, CUDALucas v1.41

F:\Eigene Dateien\Computing\CUDALucas\cudalucas.1.4.1\win64\CUDA4.0\sm_21>CUDALucas.cuda4.0.sm_21.WIN64.exe 37500083
CUDALucas: Resuming from checkpoint file c37500083
caso 2
Iteration 30000 2.4 msec/Iter M( 37500083 )C, 0x4a2607f9f4ed9248, n = 2097152, CUDALucas v1.41
Iteration 40000 8.4 msec/Iter M( 37500083 )C, 0x914ca98d0383db54, n = 2097152, CUDALucas v1.41
Iteration 50000 8.3 msec/Iter M( 37500083 )C, 0xc9bdc23854802d33, n = 2097152, CUDALucas v1.41
As CUDALucas auto-resumes from checkpoint files, we should recommend not using "-c" any more, shouldn't we? I will have to update the GPU guide in the new year...

By the way, the first iteration time after a resume is so low because the run did not cover a complete 10000-iteration interval. Kind of a bug.
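
A sketch in Python of how the reported time could be computed so that a resumed run is not skewed. This rests on an assumption about the cause (elapsed time divided by the full 10000-iteration interval even though fewer iterations ran after the resume), which has not been checked against the CUDALucas source; the function name and the numbers in the example are hypothetical.

```python
def ms_per_iter(elapsed_s: float, iteration: int, resumed_at: int,
                interval: int = 10000) -> float:
    """Average time per iteration over the iterations actually run.

    elapsed_s  : seconds since the last report or since the resume
    iteration  : iteration number being reported now
    resumed_at : iteration stored in the checkpoint the run started from
    """
    # The timed span starts at the previous report or at the resume point,
    # whichever is later.
    start = max(resumed_at, iteration - interval)
    done = iteration - start
    if done <= 0:
        raise ValueError("no iterations were run in this interval")
    return 1000.0 * elapsed_s / done


if __name__ == "__main__":
    # Hypothetical: resumed at iteration 9600, reached 10000 after 3.3 s.
    naive = 1000.0 * 3.3 / 10000
    print(f"naive: {naive:.2f} ms/iter, corrected: {ms_per_iter(3.3, 10000, 9600):.2f} ms/iter")
```

With these made-up numbers the naive figure comes out near the 0.3 msec/Iter seen right after the resume, while the corrected one is in line with the following intervals.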

Last but not least, utilisation for state-of-the-art exponents (40M range) is 97% as before. Low utilisation is understandable for small FFT sizes...

Last fiddled with by Brain on 2011-12-30 at 14:22 Reason: Last but not least, utilisation

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.