|
Triple post? 55 mins later?
|
:razz:
Okay, I'm curious now. [url]http://andrewthall.org/papers/gpuMersenne2011MKII.pdf[/url] is the research paper concerning gpuLucas. I read it and found that the massive 2x performance gains only come at exponents in between the power-of-2 FFT lengths; CUDALucas had roughly the same performance at equal FFT lengths. Now that non-p-o-2 FFT lengths HAVE been implemented in CUDALucas (albeit at an apparent 5% or so penalty), how do the programs compare? There is a list of timings in the paper. Anyone have a 480? (Or a Tesla 2050?) |
1 Attachment(s)
[QUOTE=Dubslow;286000]:razz:
Okay, I'm curious now. [URL]http://andrewthall.org/papers/gpuMersenne2011MKII.pdf[/URL] is the research paper concerning gpuLucas. I read it and found that the massive 2x performance gains only come at exponents in between the power-of-2 FFT lengths; CUDALucas had roughly the same performance at equal FFT lengths. Now that non-p-o-2 FFT lengths HAVE been implemented in CUDALucas (albeit at an apparent 5% or so penalty), how do the programs compare? There is a list of timings in the paper. Anyone have a 480? (Or a Tesla 2050?)[/QUOTE]
See the attached timing comparison from the gpuLucas paper. I can confirm the 5% penalty. I'd like to know why... It seems the move from CUDA 3.2 to CUDA 4.0 is the reason? |
Yes, I know; in fact I mentioned that in my post. My question is, how do the other FFT sizes compare now that there are non-p-o-2 FFT lengths? On something like M46, gpuLucas claimed its advantage only because of the FFT sizes. So how do the others compare now?
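To make the FFT-length point concrete, here is a minimal sketch of why a program limited to power-of-2 lengths loses out in between them. It assumes the FFT library handles lengths whose only prime factors are 2, 3, 5 and 7; that assumption and the helper names `is_smooth`, `next_pow2` and `next_smooth` are illustrative only and are not taken from CUDALucas or gpuLucas.

```python
def is_smooth(n, primes=(2, 3, 5, 7)):
    """Return True if n has no prime factors other than those in primes."""
    for p in primes:
        while n % p == 0:
            n //= p
    return n == 1


def next_pow2(n):
    """Smallest power of two that is >= n."""
    length = 1
    while length < n:
        length *= 2
    return length


def next_smooth(n):
    """Smallest length >= n built only from the factors 2, 3, 5 and 7."""
    while not is_smooth(n):
        n += 1
    return n


# The length reported in the CUDALucas log in this thread is 3 * 2^20,
# which is not a power of two.
n = 3145728
print(n == 3 * 2**20, is_smooth(n))

# A power-of-2-only program would have to jump to the next power of two.
print(next_pow2(n) / n)
```

The ratio printed at the end is the extra transform length a power-of-2-only program carries for an exponent that would fit in 3145728 words, which is where a large speed difference between the two approaches could come from.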
|
Hi,
Ver 1.46 uses 100% CPU time on Linux. Can someone try it on Windows? |
[QUOTE=msft;286124]Hi,
Ver 1.46 uses 100% CPU time on Linux. Can someone try it on Windows?[/QUOTE]
I've tested two double-checks from GPU-to-72 so far on 1.46. The double-checks were good and the CPU usage is almost nothing on Win7 64, using the "1.46 Win64 SM 2.1 compile, untested." build.
Jerry |
Hi, flashjh
[QUOTE=flashjh;286131]I've tested two double-checks from GPU-to-72 so far on 1.46. Double-checks were good and the CPU usage is almost nothing on Win7 64. Using the "1.46 Win64 SM 2.1 compile, untested." Jerry[/QUOTE]
Thank you for the report. |
1 Attachment(s)
Ver 1.48
Fixed the issue with resuming and increasing n.
[code]
Iteration 2390000 20.3 msec/Iter M( 57794413 )C, 0x5bbf93c5ffdc08d6, n = 3145728, CUDALucas v1.46
Iteration 2400000 20.4 msec/Iter M( 57794413 )C, 0xdf8ebc9bb563c1b2, n = 3145728, CUDALucas v1.46
err = 0.370362, increasing n from 3145728
CUDALucas.cu(1513) : cudaSafeCall() Runtime API error : invalid argument.
[/code]
Fixed the 100% CPU time issue on Linux. |
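For readers wondering what the `err = 0.370362, increasing n` line is about, here is a minimal sketch of a round-off check of the kind used with floating-point FFT multiplication. It is not CUDALucas's actual code: the 0.35 limit and the function names `max_roundoff` and `needs_longer_fft` are assumptions made for illustration.

```python
ERR_LIMIT = 0.35  # hypothetical threshold, not taken from CUDALucas


def max_roundoff(values):
    """Largest distance between any value and its nearest integer."""
    return max(abs(v - round(v)) for v in values)


def needs_longer_fft(values, limit=ERR_LIMIT):
    """True if the results are too far from integers to be trusted."""
    return max_roundoff(values) > limit


# After the inverse FFT every word should be almost exactly an integer.
# A word that lands 0.370362 away from one, as in the log, would trip
# the assumed limit and call for a longer FFT.
print(needs_longer_fft([12.0, 7.98, 5.370362]))

# Average number of bits stored per FFT word for the run in the log.
bits_per_word = 57794413 / 3145728
print(round(bits_per_word, 2))
```

The more bits packed into each word, the closer the rounding error gets to 0.5, so switching to a larger n trades speed for safety.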
1.48 Win64 SM 1.3 compile, untested.
1 Attachment(s)
|
1.48 Win64 SM 2.1 compile, untested.
1 Attachment(s)
|
[QUOTE=apsen;266882]BTW does anyone experience crashes with the program. It seem to finish the work fine but crashes after that so the results are apparently not affected but it is not nice to crash.[/QUOTE]
I haven't finished a test with 1.48, but I'm still having this problem with 1.46. I use the -D01 command line switch to select my 2nd GPU. Has anyone had any luck fixing the issue? |