mersenneforum.org

CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

owftheevil 2013-05-13 19:48

Hi Oliver,

No problem, as soon as I get home from work.

Carl

TheJudger 2013-05-13 21:09

Hi Carl,

I like your memtest. Today I had one K20 (ECC enabled) where your memtest indicated (lots of) errors while nvidia-smi didn't report any memory errors. After lowering the GPU clock rate (on the Tesla K20 you do "GPU Boost" by hand; this is called the "application clock"), the errors disappeared.

Oliver

owftheevil 2013-05-13 21:17

Excellent, that's exactly the kind of thing I wanted it to be able to do.

owftheevil 2013-05-13 22:01

1 Attachment(s)
Here's version 0.13 with Oliver's requested fflush(NULL) statements.

Oliver, thanks for showing me that.
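
For anyone wondering why the flushes matter: when stdout goes to a pipe (for example into tee) rather than a terminal, C stdio typically switches to full buffering, so progress lines can sit in the buffer for a long time. Below is a minimal sketch of the idea, not the actual memtest source. The helper names format_progress and report_progress are made up for illustration; the line format is copied from the output posted in this thread.

```c
#include <stdio.h>

/* Hypothetical helper: format one progress line in the style of the
   memtest output shown in this thread. Returns snprintf's result. */
int format_progress(char *buf, size_t len, int position, long iteration,
                    int errors, double percent)
{
    return snprintf(buf, len,
                    "Position %d, Iteration %ld, Errors: %d, completed %.2f%%\n",
                    position, iteration, errors, percent);
}

/* Hypothetical helper: print the line, then flush immediately. */
void report_progress(int position, long iteration, int errors, double percent)
{
    char line[128];
    format_progress(line, sizeof line, position, iteration, errors, percent);
    fputs(line, stdout);
    /* fflush(NULL) flushes all open output streams, so a reader on the
       other end of a pipe (tee, a log file) sees the line right away. */
    fflush(NULL);
}
```

Calling report_progress(0, 1000, 0, 2.56) after each reporting interval should make the line show up in tee immediately instead of only when the buffer fills or the program exits.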

chalsall 2013-05-13 23:04

[QUOTE=owftheevil;340301]Here's version 0.13 with Oliver's requested fflush(NULL) statements.[/QUOTE]

I'm a Droid compared to Oliver, but some additional suggestions...

At the start of the run, print out the environment within which the code finds itself running.

On each output line, include the empirical data available, such as memory used, temperature, etc.

Not to be negative -- you're doing a [I]great[/I] job!

[CODE][chalsall@hobbit memtest013]$ tar -xzvf memtest-0.13.tar.gz
readme
memtest.cu
Makefile
cuda_safecalls.h
[chalsall@hobbit memtest013]$ vi readme
[chalsall@hobbit memtest013]$ make
/usr/local/cuda/bin/nvcc -O3 --generate-code arch=compute_13,code=sm_13 --generate-code arch=compute_20,code=sm_20 --generate-code arch=compute_35,code=sm_35 --compiler-options=-Wall -I/usr/local/cuda/include -c memtest.cu
gcc memtest.o -O3 -Wall -fPIC -L/usr/local/cuda/lib64 -lcufft -lcudart -lm -o memtest
[chalsall@hobbit memtest013]$ ./memtest 39 1000 1 | tee 201305131841.txt

Initializing test using 975MiB of memory on device 1

memtest.cu(207) : cudaSafeCall() Runtime API error 10: invalid device ordinal.
[chalsall@hobbit memtest013]$ ./memtest 39 1000 0 | tee 201305131841.txt

Initializing test using 975MiB of memory on device 0

Beginning test.

Position 0, Iteration 1000, Errors: 0, completed 2.56%
...
Position 38, Iteration 1000, Errors: 0, completed 100.00%
[chalsall@hobbit memtest013]$ ./memtest 70 10000 0 | tee 201305131842.txt

Initializing test using 1750MiB of memory on device 0

Beginning test.

Position 0, Iteration 10000, Errors: 0, completed 1.43%
...
Position 69, Iteration 10000, Errors: 0, completed 100.00%
[chalsall@hobbit memtest013]$ ./memtest 74 100000 0 | tee 201305131857.txt

Initializing test using 1850MiB of memory on device 0

Beginning test.

....[/CODE]

[CODE]Mon May 13 19:02:13 2013
+------------------------------------------------------+
| NVIDIA-SMI 4.313.30   Driver Version: 313.30         |
|-------------------------------+----------------------+----------------------+
| GPU  Name                     | Bus-Id        Disp.  | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 560          | 0000:03:00.0     N/A |                  N/A |
| 67%   82C  N/A     N/A /  N/A |  95% 1947MB / 2047MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
+-----------------------------------------------------------------------------+[/CODE]

chalsall 2013-05-14 00:58

On my card, which has been demonstrated to be [I]just[/I] unstable...

[CODE][chalsall@hobbit memtest013]$ ./memtest 74 100000 0 | tee 201305131857.txt
Initializing test using 1850MiB of memory on device 0

Beginning test.

Position 0, Iteration 10000, Errors: 0, completed 0.14%
Position 0, Iteration 20000, Errors: 0, completed 0.27%
...
Position 57, Iteration 70000, Errors: 0, completed 77.97%
Position 57, Iteration 80000, Errors: 0, completed 78.11%
Position 57, Iteration 90000, Errors: 1, completed 78.24%
Position 57, Iteration 100000, Errors: 1, completed 78.38%
...
Position 65, Iteration 40000, Errors: 1, completed 88.38%
Position 65, Iteration 50000, Errors: 1, completed 88.51%
Position 65, Iteration 60000, Errors: 2, completed 88.65%
Position 65, Iteration 70000, Errors: 2, completed 88.78%
...
Position 73, Iteration 90000, Errors: 2, completed 99.86%
Position 73, Iteration 100000, Errors: 2, completed 100.00%
[/CODE]
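
The numbers in these runs are consistent with each position covering 25 MiB (39 -> 975MiB, 70 -> 1750MiB, 74 -> 1850MiB) and with the "completed" figure counting finished reports out of all reports. The following is only a sketch inferred from the output, not code taken from memtest.cu; the function names and the reports_per_position parameter are assumptions.

```c
/* Inferred from the transcripts: 39 positions -> 975 MiB, 74 -> 1850 MiB. */
#define CHUNK_MIB 25UL

/* Hypothetical helper: memory under test for a given number of positions. */
unsigned long memory_mib(int positions)
{
    return (unsigned long)positions * CHUNK_MIB;
}

/* Hypothetical helper: percentage completed after report number
   report_index (1-based) at the given position (0-based). */
double percent_complete(int position, int report_index,
                        int positions, int reports_per_position)
{
    long done  = (long)position * reports_per_position + report_index;
    long total = (long)positions * reports_per_position;
    return 100.0 * (double)done / (double)total;
}
```

For the run above (74 positions, 10 reports per position), the seventh report at position 57 gives 577/740, which matches the 77.97% in the log.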

Manpowre 2013-05-14 01:22

CUDALucas with HyperQ-enabled code

Good news: after 7 hours tonight, I finally got the HyperQ code running fine with 32 simultaneous threads here on my Titan.

I am using 31670941 as the test exponent, and normal CUDALucas on my Titan card runs this with an ETA of 21h 22m.

With 32 HyperQ threads, I actually manage to get an ETA of 12h 44m. That's a lot, almost half the time. There is some overhead here, which I expected, but still, it's tremendously powerful.

I still have some work to do: some iterations need to be checked, and I also have to run the code on all known Mersenne primes to check that this version of CUDALucas actually iterates through and reports a prime for each and every one.

The good thing is that I got the HyperQ code to run and the first test results came out.

owftheevil 2013-05-14 01:30

There's a quicker way to see if you are getting correct results. Run CuLu with the -r option. This goes through 10000 iterations of many of the known primes and compares the result to known residues.
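
To illustrate the idea of a residue check at toy scale: the Lucas-Lehmer sequence is s_0 = 4, s_{k+1} = s_k^2 - 2 (mod 2^p - 1), and 2^p - 1 is prime exactly when the residue after p - 2 steps is 0 (for odd prime p). The sketch below is plain CPU code for tiny exponents only, and it is not how CuLu implements -r (the real code uses FFT multiplication on the GPU). The names ll_residue and self_test and the small table are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Residue of the Lucas-Lehmer sequence modulo 2^p - 1 after the given
   number of iterations. Only valid for 3 <= p <= 31, so that s * s
   fits in 64 bits. */
uint64_t ll_residue(unsigned p, unsigned iterations)
{
    const uint64_t m = ((uint64_t)1 << p) - 1;
    uint64_t s = 4 % m;
    for (unsigned i = 0; i < iterations; i++)
        s = (s * s % m + m - 2) % m;   /* s^2 - 2 without underflow */
    return s;
}

struct residue_check { unsigned p, iterations; uint64_t expected; };

/* Hypothetical self-test: compare computed residues with known values.
   Returns the number of mismatches (0 means everything agreed). */
int self_test(void)
{
    static const struct residue_check table[] = {
        { 3,  1,  0 },   /* M3  = 7 is prime */
        { 5,  3,  0 },   /* M5  = 31 is prime */
        { 7,  3, 42 },   /* intermediate residue: 4, 14, 67, 42, ... */
        { 7,  5,  0 },   /* M7  = 127 is prime */
        { 13, 11, 0 },   /* M13 = 8191 is prime */
    };
    int mismatches = 0;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (ll_residue(table[i].p, table[i].iterations) != table[i].expected)
            mismatches++;
    return mismatches;
}
```

Checking an intermediate residue (like the 42 above) is the useful part: a modified multiplication path that goes wrong shows up after a fixed, small number of iterations instead of only at the end of a full test.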

owftheevil 2013-05-14 01:34

Thanks chalsall. Suggestions for improvement are always welcome.

Manpowre 2013-05-14 01:39

Titan HyperQ
 
1 HyperQ thread = 22h 27m = the same as CUDALucas normal mode
4 HyperQ threads = 10h 40m
8 HyperQ threads = 8h 12m
16 HyperQ threads = 7h 22m
24 HyperQ threads = 21h 28m
32 HyperQ threads = 13h

Very interesting to see that 4, 8, and 16 threads scale well on the Titan with the CUDALucas code.

I'm going to run the code now on a Mersenne prime which doesn't take more than a few hours to go through, then I'll report back tomorrow.
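
To put numbers on the scaling, the ETAs listed above can be turned into speedups relative to the 1-thread run. This is just arithmetic on the reported figures, written as a small sketch; the helper names are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: convert an ETA of the form "Hh Mm" to minutes. */
int eta_minutes(int hours, int minutes)
{
    return hours * 60 + minutes;
}

/* Hypothetical helper: speedup of a run relative to the baseline ETA. */
double speedup(int baseline_minutes, int run_minutes)
{
    return (double)baseline_minutes / (double)run_minutes;
}

/* Print the speedup for each thread count reported in the post. */
void print_speedups(void)
{
    static const int threads[] = { 1, 4, 8, 16, 24, 32 };
    static const int etas[][2] = {
        { 22, 27 }, { 10, 40 }, { 8, 12 }, { 7, 22 }, { 21, 28 }, { 13, 0 }
    };
    const int base = eta_minutes(etas[0][0], etas[0][1]);
    for (int i = 0; i < 6; i++) {
        int m = eta_minutes(etas[i][0], etas[i][1]);
        printf("%2d threads: %4d min, speedup %.2f\n",
               threads[i], m, speedup(base, m));
    }
}
```

By these figures, 4, 8, and 16 threads come out at about 2.1x, 2.7x, and 3.0x, while 24 threads is barely faster than a single thread and 32 threads is about 1.7x.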

Manpowre 2013-05-14 01:42

[QUOTE=owftheevil;340330]There's a quicker way to see if you are getting correct results. Run CuLu with the -r option. This goes through 10000 iterations of many of the known primes and compares the result to known residues.[/QUOTE]

Haha, I got a nice access violation with -r. Beautiful. As I expected, there are iterations here which aren't safe to run for a longer time. I can already see 2 places with IF checks where the iterations are not in sync with the number of HyperQ threads being spawned.

But, in progress. I will continue with this tomorrow.. off to bed :)

Thanks for the great tip with -r..
