mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW) (https://www.mersenneforum.org/showthread.php?t=12576)

owftheevil 2015-02-21 21:14

How much trouble would it be to compile the source locally? My build environment might be too different from yours.

ET_ 2015-02-22 10:50

[QUOTE=owftheevil;396034]How much trouble would it be to compile the source locally? My build environment might be too different from yours.[/QUOTE]

I've compiled other CUDA sources before, I suppose I can manage it with a sufficient makefile, thanks.

Luigi

diep 2015-02-22 12:10

Good Afternoon!

[url]http://www.anandtech.com/show/8069/nvidia-releases-geforce-gtx-titan-z[/url]

The 780 Ti is 8x slower in double precision than the Titan.
That's a hardware lobotomization. The card's memory bandwidth, of course, is great.

How come, in this table: [url]http://www.mersenne.ca/cudalucas.php[/url]
the GeForce 780 looks fast at all and beats cards that certainly have more double-precision resources?

Was this a 780 where someone modified the chip and enabled the double-precision resources, like the modification they managed at tomshardware?

That is not the same sort of "GeForce 780" you buy in a shop, which is a factor of 8 slower there.
Is it fair to put it in the table like this?

axn 2015-02-22 12:31

I don't believe the 780/780 Ti numbers are from modified chips. CUDALucas computation is very sensitive to memory bandwidth. The Titan has at least 6x the DP FLOPS of a 580 and is only 2x as fast. Are you suggesting that even the 580 numbers are "modified"?

EDIT: From the table here ([url]http://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#GeForce_900_Series[/url]), the 980 Ti should be slightly faster than the 780 Ti in CuLu.

diep 2015-02-22 12:36

If it's just bandwidth-dependent, then that would explain it.

That's a junk FFT implementation, of course.

A double-precision floating-point value is 8 bytes.
So a Tesla K20X can deliver, for example, 1.31 TFLOPS double precision,
yet in terms of bandwidth you won't get even 10% of that out of the card. Ditto for the Titan.

And I am always worried when benchmarks in the table show only the single-precision performance,
whereas FFT/DWT is a double-precision exercise.
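To put a rough number on that claim, here's a back-of-envelope calculation of the DP throughput a K20X could sustain if every operand had to be streamed from device memory. The 1.31 TFLOPS figure comes from the post above; the ~250 GB/s memory bandwidth is the commonly published K20X spec and is an assumption here, not a measurement from this thread.

```python
# Back-of-envelope: bandwidth-limited DP throughput on a Tesla K20X.
# Worst case assumed: one 8-byte load per floating-point operation.

peak_dp_flops = 1.31e12   # double-precision FLOP/s (from the post)
mem_bandwidth = 250e9     # bytes/s (published K20X spec; an assumption)
bytes_per_double = 8

streaming_flops = mem_bandwidth / bytes_per_double
fraction_of_peak = streaming_flops / peak_dp_flops

print(f"bandwidth-limited: {streaming_flops / 1e9:.2f} GFLOP/s "
      f"({fraction_of_peak:.1%} of peak)")
```

Under these assumptions the card sustains only a few percent of its DP peak, consistent with the "not even 10%" estimate.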

James Heinrich 2015-02-22 12:50

I haven't had any benchmarks submitted that deviate more than ~10% from expected according to my chart.

Of course, I've had very few benchmarks submitted. I would encourage anyone reading this thread to please submit new benchmarks using CUDALucas v2.05.

ET_ 2015-02-22 14:36

[QUOTE=James Heinrich;396071]I haven't had any benchmarks submitted that deviate more than ~10% from expected according to my chart.

Of course, I've had very few benchmarks submitted. I would encourage anyone reading this thread to please submit new benchmarks using CUDALucas v2.05[/QUOTE]

I will send you mine on the GTX 980 as soon as I grab some code... :smile:

Luigi

axn 2015-02-22 16:00

[QUOTE=diep;396070]That's a junk FFT implementation of course. [/QUOTE]

FWIW, CuLu uses the cuFFT library provided by Nvidia itself for the FFT. The real problem, I think, is that Nvidia GPUs (pre-Maxwell) have very little L2 cache and hence are much more reliant on memory bandwidth.

In fact, we're seeing a similar thing on the CPU side as well: with the advent of AVX, "large FFT" performance scales almost perfectly with memory bandwidth. But thanks to the relatively large L3 caches, the effect is somewhat countered at smaller FFT sizes.
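A quick size comparison illustrates why caches stop helping at the FFT lengths seen in this thread. The working set of a 3136K-point double-precision transform is far larger than the cache figures assumed below (roughly 1.5 MB of L2 on a Kepler GPU, 8 MB of L3 on a desktop CPU of that era; those are illustrative assumptions, not measurements from this thread):

```python
# Working-set size of the FFT lengths mentioned in this thread,
# versus typical cache sizes of the period (assumed figures).

BYTES_PER_DOUBLE = 8

def fft_working_set_mib(fft_len_k):
    """MiB occupied by one length-N array of doubles, N given in K (1024s)."""
    return fft_len_k * 1024 * BYTES_PER_DOUBLE / 2**20

for fft_k in (1024, 3136):
    print(f"{fft_k}K FFT: {fft_working_set_mib(fft_k):.1f} MiB working set")

# 3136K doubles occupy 24.5 MiB -- larger than either assumed cache, so
# every pass over the data streams from DRAM and speed tracks bandwidth.
```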

tenethor 2015-02-25 20:02

V2.05.1 Fails all self tests
 
Hello all

Just got a new (to me) Nvidia GTX 590 and am working on getting CUDALucas running. After some fighting with drivers and toolkits, I got it running, but it fails every self-test.

[CODE]Using threads: square 256, splice 128.
Starting self test M57885161 fft length = 3136K
Iteration 10000 / 57885161, 0x0000000000000000, 3136K, CUDALucas v2.05.1, error = 0.00000, real: 0:49, 4.9186 ms/iter
Expected residue [76c27556683cd84d] does not match actual residue [0000000000000000][/CODE]

Just an example. All twenty return the same residue of 0.

I finally settled on Nvidia driver 340.29 and CUDA toolkit 6.5.

Any suggestions?

Thanks

tenethor 2015-02-26 17:22

Well, disregard that one. After a little more poking around, I found the place in the makefile where you have to set the compute version you want to compile for. Fixed that, and now we're cranking out LL tests.
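For anyone hitting the same all-zero residues: a GTX 590 is a Fermi part (compute capability 2.0), so the nvcc invocation in the makefile needs to target that architecture. A sketch of the relevant line, assuming the makefile passes flags to nvcc this way (the variable name and surrounding flags in the actual CUDALucas makefile may differ):

```makefile
# Illustrative fragment -- variable name is an assumption, not taken
# from the CUDALucas makefile. Build for Fermi (compute capability 2.0):
NVCCFLAGS = -O3 -gencode arch=compute_20,code=sm_20
```

Building for a compute capability the card doesn't support typically produces kernels that silently do nothing, which matches the `0x0000000000000000` residues above.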

flashjh 2015-02-26 18:06

I was going to post that it looked like you were missing the correct compute capability for your card.

Happy Hunting!

