mersenneforum.org

CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW)

Brain 2012-01-03 21:35

[QUOTE=f11ksx;284236]v1.41 is running pretty well, just like v1.42 :smile:

9.3 ms/iter for a 54M exponent on a GTX 580 card with v1.41
8.6 ms/iter with v1.42

Domo (thanks)[/QUOTE]
Is 8.6 ms still correct? Was the 580 running at stock clock speed?

Brain 2012-01-05 19:55

First testing results
 
First two results:
1. Found a checkpoint failure:
[CODE]F:\Eigene Dateien\Computing\CUDALucas\cudalucas.1.4.2\win64\CUDA4.0\sm_21>CUDALucas.cuda4.0.sm_21.WIN64.exe -c1000 -t 72000503
err = 0.369665, increasing n from 3932160
Iteration 10000 18.2 msec/Iter M( 72000503 )C, 0xc4177e6840e2ebe5, n = 4718592, CUDALucas v1.4.2
CUDALucas: Could not find a checkpoint file to resume from

F:\Eigene Dateien\Computing\CUDALucas\cudalucas.1.4.2\win64\CUDA4.0\sm_21>CUDALucas.cuda4.0.sm_21.WIN64.exe -c10000 -t 72000503
CUDALucas: Resuming from checkpoint file c72000503
something wrong; error message, if any, already printed
[COLOR=Red]CUDALucas: inconsistent binary checkpoint; bad FFT run length of 4718592[/COLOR]
0:00 real[/CODE]
2. Testing iteration times with M(72000503):
a) CUDALucas 1.2 with CUDA 3.2 --> 4096K FFT --> 13.4 ms per iteration
b) CUDALucas 1.4.2 with CUDA 4.0 --> 4608K FFT --> 17.9 ms per iteration
If I force b) down to a 4096K FFT --> 15.9 ms per iteration

Going from 13.4 ms to 15.9 ms at the same 4096K FFT length means CUDALucas 1.4.2 is about 19% slower than 1.2. :sad:

Any thoughts on that? And why does CL 1.4.2 choose a longer FFT (4608K instead of 4096K)?
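The 19% figure is simply the ratio of per-iteration times at equal FFT length. A quick check of the numbers above, where the K sizes are the `n =` values CUDALucas prints divided by 1024 (e.g. 4718592 / 1024 = 4608K); the helper names are illustrative only:

```python
def fft_length_k(n: int) -> float:
    """Convert an FFT length in words (the 'n =' value in the log) to K."""
    return n / 1024


def percent_slower(new_ms: float, old_ms: float) -> float:
    """Extra time per iteration of the new build over the old one, in percent."""
    return (new_ms / old_ms - 1.0) * 100.0


# M(72000503) timings from the post above
print(f"n = 4718592 -> {fft_length_k(4718592):.0f}K FFT")
print(f"1.4.2 @ 4608K vs 1.2 @ 4096K: {percent_slower(17.9, 13.4):.1f}% slower")  # 33.6%
print(f"1.4.2 @ 4096K vs 1.2 @ 4096K: {percent_slower(15.9, 13.4):.1f}% slower")  # 18.7%
```

So the like-for-like slowdown is about 18.7%; the longer 4608K FFT adds roughly another 15 percentage points on top of that.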

flashjh 2012-01-06 00:08

With 1.4.2, is there any way to get an ETA?

Jerry
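An ETA can be estimated by hand from the msec/Iter value that 1.4.2 reports: a Lucas-Lehmer test of M(p) runs p - 2 squaring iterations, so the remaining time is roughly (p - 2 - iterations done) × time per iteration. A minimal sketch, assuming the per-iteration time stays constant for the whole run; the function names are hypothetical and not part of CUDALucas:

```python
def ll_eta_seconds(p: int, ms_per_iter: float, done: int = 0) -> float:
    """Rough remaining time for a Lucas-Lehmer test of M(p).

    The LL test runs p - 2 iterations; `done` is the number of
    iterations already completed (e.g. from the last progress line).
    """
    remaining = max(p - 2 - done, 0)
    return remaining * ms_per_iter / 1000.0


def format_duration(seconds: float) -> str:
    """Format a duration as days, hours and minutes."""
    days, rest = divmod(int(seconds), 86400)
    hours, rest = divmod(rest, 3600)
    minutes = rest // 60
    return f"{days}d {hours:02d}h {minutes:02d}m"


# Example: M(72000503) at 17.9 ms/iter with 10000 iterations done
print(format_duration(ll_eta_seconds(72000503, 17.9, done=10000)))  # 14d 21h 57m
```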

kdgehman 2012-01-06 01:16

I set up a Visual Studio 2008 / CUDA 3.2 environment and compiled v1.4.2.

Timings for M36200419 on a GTX570:
CUDALucas 1.4.2 with CUDA 4.0 --> 5.35 ms per Iter
CUDALucas 1.4.2 with CUDA 3.2 --> 5.16 ms per Iter
CUDALucas 1.2 with CUDA 3.2 --> 4.50 ms per Iter

LaurV 2012-01-06 02:57

I tried one of the binaries attached here against v1.3 (beta), CUDA 4.0, cc 2.0 (which I am currently using). I see the same thing the earlier posters reported: performance drops by 15% to 20% with v1.4.2 :no:
Therefore I am staying on v1.3 for now.

apsen 2012-01-06 18:43

What's new in 1.4.2
 
I haven't been watching this thread for a while and I see a lot has happened.

I've noticed that 1.4.2 is supposed to support non-power-of-2 FFT lengths. What else is different from 1.2?

Thanks,
Andriy

PS: Could we get rid of the K&R-style prototypes in the code to make it more readable?

Brain 2012-01-06 21:01

Status report
 
msft coded:
1.3: ms per iteration output
1.4: non-power-of-2 FFTs
1.4.1: minor bugfix
1.4.2: CPU issue fix and some speedup

A separate 1.3 was also coded by ethan (eo) some time ago.

I think msft started again from his 1.2, but replaced the "caso" prints with more descriptive messages.

[B]The main problem is the performance slowdown.[/B] We need some GPU debugging.

I don't think msft removed the CUDA safe call that ethan removed, but that is fine with me, as graphics had become laggy for me.

msft was very kind in helping with the 1.4 problems.

msft 2012-01-07 04:57

Hi,
I can't debug rw.cu.
I think I will rewrite the checkpoint routine.
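In Brain's log the resume fails with "bad FFT run length of 4718592", which is the same length the first run was using. That length is 9 × 2^19, not a power of two. One plausible reading (an assumption, not confirmed in this thread) is a sanity check that only accepts power-of-2 lengths. A hypothetical sketch of a length check that also accepts the non-power-of-2 sizes seen in the 1.4.x logs, here assumed to be even lengths whose only prime factors are 2, 3, 5 and 7; the names and the `max_len` limit are illustrative:

```python
def is_power_of_two(n: int) -> bool:
    """The kind of check that would reject 1.4.x's non-power-of-2 lengths."""
    return n > 0 and (n & (n - 1)) == 0


def is_plausible_fft_length(n: int, max_len: int = 1 << 25) -> bool:
    """Hypothetical check: accept even lengths up to max_len whose
    prime factors are all in {2, 3, 5, 7}."""
    if n <= 0 or n % 2 or n > max_len:
        return False
    for prime in (2, 3, 5, 7):
        while n % prime == 0:
            n //= prime
    return n == 1


# The length from the failing resume, and the one 1.4.2 started from
for n in (4718592, 3932160):
    print(n, is_power_of_two(n), is_plausible_fft_length(n))
```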

msft 2012-01-07 11:03

Ver 1.44
Fixed checkpoint issue.
Ver 1.44:
[code]
Iteration 10000 6.7 msec/Iter M( 10000000 )C, 0x55318a84ffd14bc7, n = 1048576, CUDALucas v1.44
Iteration 10000 10.7 msec/Iter M( 20000000 )C, 0xb6475f8cb0888740, n = 1572864, CUDALucas v1.44
Iteration 10000 13.8 msec/Iter M( 30000000 )C, 0xbf70feed29774eba, n = 2097152, CUDALucas v1.44
Iteration 10000 17.6 msec/Iter M( 40000000 )C, 0x2318fe9e59886055, n = 2621440, CUDALucas v1.44
Iteration 10000 21.3 msec/Iter M( 50000000 )C, 0x80dabfda58bb63db, n = 3145728, CUDALucas v1.44
Iteration 10000 24.6 msec/Iter M( 60000000 )C, 0x8db3527512f3559b, n = 3670016, CUDALucas v1.44
Iteration 10000 28 msec/Iter M( 70000000 )C, 0x652d4a670f44317e, n = 4194304, CUDALucas v1.44
Iteration 10000 31.1 msec/Iter M( 80000000 )C, 0xa2dfe07c9f24275d, n = 4718592, CUDALucas v1.44
Iteration 10000 35.5 msec/Iter M( 90000000 )C, 0xf0703f404c4eb47a, n = 5242880, CUDALucas v1.44
[/code]
Ver 1.3:
[code]
Iteration 10000 5.9 msec/Iter M( 10000000 )C, 0x55318a84ffd14bc7, n = 1048576, CUDALucas v1.3
Iteration 10000 11.6 msec/Iter M( 30000000 )C, 0xbf70feed29774eba, n = 2097152, CUDALucas v1.3
Iteration 10000 25 msec/Iter M( 70000000 )C, 0x652d4a670f44317e, n = 4194304, CUDALucas v1.3
Iteration 10000 50.4 msec/Iter M( 90000000 )C, 0xf0703f404c4eb47a, n = 8388608, CUDALucas v1.3
[/code]
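Dividing the exponent by the FFT length n gives the average number of bits each FFT word has to carry. At 90M, v1.3 has only power-of-2 lengths available and steps up to 8192K (about 10.7 bits/word), while v1.44 can stop at 5120K (about 17.2 bits/word), which matches its 35.5 ms against 50.4 ms. A small script using only the values logged above:

```python
# (FFT length n, ms/iter) per exponent, copied from the v1.3 and v1.44 logs above
V13 = {10_000_000: (1048576, 5.9), 30_000_000: (2097152, 11.6),
       70_000_000: (4194304, 25.0), 90_000_000: (8388608, 50.4)}
V144 = {10_000_000: (1048576, 6.7), 30_000_000: (2097152, 13.8),
        70_000_000: (4194304, 28.0), 90_000_000: (5242880, 35.5)}


def bits_per_word(p: int, n: int) -> float:
    """Average number of residue bits held in each of the n FFT words."""
    return p / n


for p in sorted(V13):
    for name, table in (("v1.3", V13), ("v1.44", V144)):
        n, ms = table[p]
        print(f"M({p}) {name:5}: {n // 1024}K FFT, "
              f"{bits_per_word(p, n):.1f} bits/word, {ms} ms/iter")
```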

msft 2012-01-07 12:30

Ver 1.45
Fixed performance issue.
[code]
Iteration 10000 6 msec/Iter M( 10000000 )C, 0x55318a84ffd14bc7, n = 1048576, CUDALucas v1.45
Iteration 10000 11.8 msec/Iter M( 30000000 )C, 0xbf70feed29774eba, n = 2097152, CUDALucas v1.45
Iteration 10000 25.2 msec/Iter M( 70000000 )C, 0x652d4a670f44317e, n = 4194304, CUDALucas v1.45
[/code]
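Comparing against the v1.3 timings msft posted for the same exponents and the same power-of-2 FFT lengths shows how much of the 1.4.x slowdown the fix recovered: v1.44 is 12% to 19% slower than v1.3 there, v1.45 is within about 2%. A quick tabulation using only the posted numbers:

```python
# ms/iter at identical FFT lengths (1024K, 2048K, 4096K), from msft's logs
TIMINGS = {
    "v1.3":  {10_000_000: 5.9, 30_000_000: 11.6, 70_000_000: 25.0},
    "v1.44": {10_000_000: 6.7, 30_000_000: 13.8, 70_000_000: 28.0},
    "v1.45": {10_000_000: 6.0, 30_000_000: 11.8, 70_000_000: 25.2},
}


def relative_to(baseline: str, version: str) -> dict:
    """Per-exponent extra time of `version` over `baseline`, in percent."""
    base = TIMINGS[baseline]
    return {p: (TIMINGS[version][p] / base[p] - 1.0) * 100.0 for p in base}


for version in ("v1.44", "v1.45"):
    deltas = relative_to("v1.3", version)
    print(version, ", ".join(f"M({p}): {d:+.1f}%" for p, d in deltas.items()))
```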

Brain 2012-01-07 13:08

Win64 SM 13 compile
 
1.45 Win64 SM 1.3 compile, untested.


All times are UTC.