mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

Dubslow 2012-01-11 05:55

Triple post? 55 mins later?

Dubslow 2012-01-11 23:50

:razz:


Okay, I'm curious now.
[url]http://andrewthall.org/papers/gpuMersenne2011MKII.pdf[/url]
The research paper on gpuLucas. I read it and found that the massive 2x performance gains only come at exponents that fall between the power-of-2 FFT lengths; CUDALucas had roughly the same performance at equal FFT lengths. Now that non-p-o-2 FFT lengths HAVE been implemented in CUDALucas (albeit at an apparent penalty of 5% or so), how do the programs compare? There is a list of timings in the paper. Anyone have a 480? (Or a Tesla 2050?)
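
To illustrate why the in-between exponents matter, here is a rough sketch in Python (not code from either program). It assumes a fixed cap on the number of bits stored per FFT word, `max_bits_per_word`, which is a made-up illustrative parameter; real programs choose a length by watching the round-off error instead. It compares the smallest power-of-2 length with the smallest length whose only prime factors are 2, 3, 5 and 7.

```python
import math

def min_length(exponent, max_bits_per_word):
    """Fewest FFT words needed so each word holds at most max_bits_per_word bits."""
    return math.ceil(exponent / max_bits_per_word)

def next_power_of_two(n):
    """Smallest power of 2 that is >= n."""
    length = 1
    while length < n:
        length *= 2
    return length

def is_smooth(n, primes=(2, 3, 5, 7)):
    """True if n is a positive integer with no prime factors outside `primes`."""
    if n < 1:
        return False
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1

def next_smooth(n, primes=(2, 3, 5, 7)):
    """Smallest length >= n built only from the given small primes."""
    length = max(n, 1)
    while not is_smooth(length, primes):
        length += 1
    return length

def length_ratio(exponent, max_bits_per_word):
    """How much longer the power-of-2 FFT is than the smooth-length FFT."""
    need = min_length(exponent, max_bits_per_word)
    return next_power_of_two(need) / next_smooth(need)
```

For example, a required length of 3145728 (3 * 2^20) is already smooth, while the next power of 2 is 4194304, a third more words to transform. When the required length sits right at a power of 2, the ratio is 1 and there is nothing to gain.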

Brain 2012-01-12 17:38

1 Attachment(s)
[QUOTE=Dubslow;286000]:razz:


Okay, I'm curious now.
[URL]http://andrewthall.org/papers/gpuMersenne2011MKII.pdf[/URL]
The research paper on gpuLucas. I read it and found that the massive 2x performance gains only come at exponents that fall between the power-of-2 FFT lengths; CUDALucas had roughly the same performance at equal FFT lengths. Now that non-p-o-2 FFT lengths HAVE been implemented in CUDALucas (albeit at an apparent penalty of 5% or so), how do the programs compare? There is a list of timings in the paper. Anyone have a 480? (Or a Tesla 2050?)[/QUOTE]
See timing comparison from the gpuLucas paper.

I can confirm the 5% penalty. I'd like to know why... It seems that CUDA 4.0, compared to 3.2, is the reason?

Dubslow 2012-01-12 21:00

Yes, I know; in fact, I mentioned that in my post. My question is: how do the other FFT sizes compare now that there are non-p-o-2 FFT lengths? That's where gpuLucas claimed its advantage on something like M46: purely in the FFT sizes. So how do the others compare then?

msft 2012-01-13 05:28

Hi,
Ver 1.46 uses 100% CPU time on Linux.
Can someone try it on Windows?

flashjh 2012-01-13 06:03

[QUOTE=msft;286124]Hi,
Ver 1.46 uses 100% CPU time on Linux.
Can someone try it on Windows?[/QUOTE]

I've tested two double-checks from GPU-to-72 so far on 1.46. Double-checks were good and the CPU usage is almost nothing on Win7 64. Using the "1.46 Win64 SM 2.1 compile, untested."

Jerry

msft 2012-01-13 06:14

Hi, flashjh
[QUOTE=flashjh;286131]I've tested two double-checks from GPU-to-72 so far on 1.46. Double-checks were good and the CPU usage is almost nothing on Win7 64. Using the "1.46 Win64 SM 2.1 compile, untested."

Jerry[/QUOTE]
Thank you for the report.

msft 2012-01-14 03:41

1 Attachment(s)
Ver 1.48
Fixed the issue with resuming and increasing n.
[code]
Iteration 2390000 20.3 msec/Iter M( 57794413 )C, 0x5bbf93c5ffdc08d6, n = 3145728, CUDALucas v1.46
Iteration 2400000 20.4 msec/Iter M( 57794413 )C, 0xdf8ebc9bb563c1b2, n = 3145728, CUDALucas v1.46
err = 0.370362, increasing n from 3145728
CUDALucas.cu(1513) : cudaSafeCall() Runtime API error : invalid argument.
[/code]
Fixed the 100% CPU time issue on Linux.
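
For context on the log above: the `err` value is presumably the usual round-off check for floating-point FFT squaring, where every result word should land very close to an integer. A minimal sketch of that idea in Python (this is not CUDALucas's actual code; the `threshold` value and the candidate length list are assumptions for illustration):

```python
def max_roundoff_error(words):
    """Largest distance between any word and its nearest integer (at most 0.5)."""
    return max(abs(w - round(w)) for w in words)

def choose_length(err, n, candidates, threshold=0.35):
    """Keep n if err is acceptable, else move to the next larger candidate length.

    threshold=0.35 is an assumed value, not taken from CUDALucas.
    """
    if err <= threshold:
        return n
    larger = sorted(c for c in candidates if c > n)
    if not larger:
        raise ValueError("no larger FFT length available")
    return larger[0]

def bits_per_word(exponent, n):
    """Average number of bits of M(exponent) stored in each of the n words."""
    return exponent / n
```

With the numbers from the log, `bits_per_word(57794413, 3145728)` is about 18.37, and an error of 0.370362 is above the assumed threshold, so `choose_length` would step up to the next length in whatever candidate list it is given.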

Brain 2012-01-14 08:16

1 Attachment(s)
1.48 Win64 SM 1.3 compile, untested.

Brain 2012-01-14 08:18

1 Attachment(s)
1.48 Win64 SM 2.1 compile, untested.

flashjh 2012-01-14 14:34

[QUOTE=apsen;266882]BTW, does anyone experience crashes with the program? It seems to finish the work fine but crashes after that, so the results are apparently not affected, but it is not nice to crash.[/QUOTE]

I haven't finished a test with 1.48, but I'm still having this problem with 1.46. I use the -D01 command line switch to select my 2nd GPU.

Anyone had any luck fixing the issue?

