mersenneforum.org


storm5510 2016-11-28 08:26

Prime95 vs. CUDALucas
 
1 Attachment(s)
This is in regard to the following assignment:

[QUOTE]Test=A5992806202E1212029B4F9445CE945D,79437629,75,1[/QUOTE]
I was expecting Culu (CUDALucas) to run much faster than Prime95. The attached image compares the two running the same test: Culu runs only 13% faster than Prime95. Does anyone have any ideas as to why this is happening?

Edit: I am using an nVidia GTX-750Ti and CUDA 8.
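For readers unfamiliar with the assignment line quoted above: assuming the usual Prime95 worktodo field order, a `Test=` entry lists the assignment ID, the exponent p of the Mersenne number 2^p - 1, the bit depth to which p has already been trial factored, and a flag saying whether P-1 factoring has been done. The minimal Python sketch below splits such a line into those fields. The `LLAssignment` and `parse_test_line` names are hypothetical, and the sketch handles only this four-field form, not every variant Prime95 accepts.

```python
from typing import NamedTuple


class LLAssignment(NamedTuple):
    assignment_id: str
    exponent: int   # p in M_p = 2**p - 1
    tf_bits: int    # assumed: already trial factored up to 2**tf_bits
    pm1_done: bool  # assumed: True if P-1 factoring has already been run


def parse_test_line(line: str) -> LLAssignment:
    """Parse a four-field 'Test=' worktodo line (hypothetical helper)."""
    kind, _, fields = line.strip().partition("=")
    if kind != "Test":
        raise ValueError(f"not a Test= line: {line!r}")
    aid, exponent, tf_bits, pm1 = fields.split(",")
    return LLAssignment(aid, int(exponent), int(tf_bits), pm1 == "1")


a = parse_test_line("Test=A5992806202E1212029B4F9445CE945D,79437629,75,1")
print(a.exponent, a.tf_bits, a.pm1_done)
```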

fivemack 2016-11-28 08:44

And what processor are you comparing it against?

Modern graphics cards do not have exceptionally good double-precision performance; a GTX1080 peaks at about 256 double-precision gigaflops, the same peak as a quad-core 4GHz Haswell. The GTX750Ti peaks at about 40 gigaflops, so it is slower than a single core of a 4GHz Haswell.
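These peak figures come from units x clock x FLOPs per unit per cycle. The rough Python check below takes a few things as assumptions: a Haswell core with AVX2 issues two 4-wide double-precision FMAs per cycle (16 FLOPs), the GTX1080 has 2560 CUDA cores at a 1607 MHz base clock, the GTX750Ti has 640 CUDA cores at 1020 MHz, and both GPUs run double precision at 1/32 of their single-precision rate. These are theoretical peaks only; real LL throughput also depends on memory bandwidth and the FFT implementation.

```python
def peak_gflops(units: int, clock_ghz: float, flops_per_unit_per_cycle: float) -> float:
    """Theoretical peak: execution units x clock (GHz) x FLOPs per unit per cycle."""
    return units * clock_ghz * flops_per_unit_per_cycle


# Assumption: a Haswell core issues two 4-wide double-precision FMAs per cycle,
# and each FMA counts as 2 FLOPs (multiply + add): 2 * 4 * 2 = 16.
HASWELL_DP_PER_CYCLE = 2 * 4 * 2

haswell_core = peak_gflops(1, 4.0, HASWELL_DP_PER_CYCLE)  # 64.0
haswell_quad = peak_gflops(4, 4.0, HASWELL_DP_PER_CYCLE)  # 256.0


def gpu_dp_gflops(cuda_cores: int, clock_ghz: float, dp_ratio: float = 1 / 32) -> float:
    # Assumption: each CUDA core retires one single-precision FMA (2 FLOPs)
    # per cycle, and double precision runs at dp_ratio of that rate.
    return peak_gflops(cuda_cores, clock_ghz, 2) * dp_ratio


gtx1080_dp = gpu_dp_gflops(2560, 1.607)  # about 257
gtx750ti_dp = gpu_dp_gflops(640, 1.020)  # about 40.8

print(f"Haswell 1 core: {haswell_core:.0f}, 4 cores: {haswell_quad:.0f} GFLOPS")
print(f"GTX1080 DP: {gtx1080_dp:.0f}, GTX750Ti DP: {gtx750ti_dp:.1f} GFLOPS")
```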

ATH 2016-11-28 10:51

The last Nvidia cards with "good" double precision performance were the GTX 580/590, then the original Titan from 2013 and the Titan Black / Titan Z from 2014 in the 700 series.

By "good" I mean DP performance at 1/3 of SP performance. All consumer cards since then have DP performance of 1/24 or 1/32 of their SP performance.

[url]http://www.mersenne.ca/cudalucas.php[/url]


Maybe you should use the SP performance for factoring with mfaktc instead.

Your 750Ti has 1306 GFLOPs SP and 40.8 GFLOPs DP:
[url]https://en.wikipedia.org/wiki/GeForce_700_series[/url]

storm5510 2016-11-28 16:54

[QUOTE=fivemack;447949]And what processor are you comparing it against?[/QUOTE]

i5-3570 @ 3.4 GHz.

[QUOTE=ATH]Maybe you should use the SP performance for factoring with mfaktc instead.

Your 750Ti has 1306 GFLOPs SP and 40.8 GFLOPs DP:[/QUOTE]

So, you are saying there are two different ways to run [I]Culu[/I] and [I]mfaktc[/I]? I am not familiar with SP and DP. How do I do this?

CRGreathouse 2016-11-28 17:16

[QUOTE=storm5510;447979]So, you are saying there are two different ways to run [I]Culu[/I] and [I]mfaktc[/I]? I am not familiar with SP and DP. How do I do this?[/QUOTE]

SP is single precision (32 bits), DP is double precision (64 bits). Your GPU can do single-precision operations 32 times faster than double-precision operations, so you'd be better off doing work that requires only single precision. (If SP were, say, only 4 times faster than DP, you'd be better off doing DP work.)
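To make the 32-bit versus 64-bit distinction concrete: an IEEE 754 single-precision number carries a 24-bit significand and a double a 53-bit one, so integers above 2^24 start losing their low bits in SP. In this small Python sketch, Python floats are doubles, and `struct` is used only to round a value to single precision.

```python
import struct
import sys


def to_float32(x: float) -> float:
    # Python floats are IEEE 754 doubles; packing with "f" rounds to single.
    return struct.unpack("<f", struct.pack("<f", x))[0]


print(sys.float_info.mant_dig)                # 53 significand bits in a double
print(to_float32(2.0**24 + 1) == 2.0**24)     # True: 16777217 is not representable in SP
print(float(2**24 + 1) == 2**24 + 1)          # True: DP holds it exactly
print(to_float32(1.0 + 2.0**-23) == 1.0 + 2.0**-23)  # True: SP step just above 1.0 is 2**-23
```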

I [i]think[/i] that ATH was suggesting that you use mfaktc instead of Culu, rather than switching modes of one or the other.

ATH 2016-11-28 17:28

Yes, CUDALucas requires double precision, and it is therefore slow because it is running at only 1/32 of your card's single precision performance.
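For context, the Lucas-Lehmer test that CUDALucas runs is short to state: with s_0 = 4 and s_{i+1} = s_i^2 - 2 mod M_p, the Mersenne number M_p = 2^p - 1 (p an odd prime) is prime exactly when s_{p-2} = 0. The sketch below uses Python's exact big integers, which is only practical for small p. For an exponent like 79437629, a program such as CUDALucas instead performs each squaring as a floating-point FFT multiplication, which is where the double-precision requirement comes from.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if M_p = 2**p - 1 is prime; p must be an odd prime."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):  # compute s_1 .. s_{p-2}
        s = (s * s - 2) % m
    return s == 0


print([p for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31) if lucas_lehmer(p)])
# [3, 5, 7, 13, 17, 19, 31]
```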

It would probably be more beneficial for GIMPS and for the amount of GHz-days accumulating on your account (if you care about that) if you do factoring on the card with mfaktc (single precision) instead of LL tests with CUDALucas (double precision).
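Trial factoring, the kind of work mfaktc does, relies on the fact that any factor q of M_p (p an odd prime) has the form q = 2kp + 1 with q ≡ ±1 (mod 8), and q divides M_p exactly when 2^p ≡ 1 (mod q). The minimal Python sketch below shows that search; mfaktc itself sieves candidates and tests many of them in parallel on the GPU, so this only illustrates the underlying test. The `trial_factor` name and its `max_bits` parameter are illustrative, not mfaktc's interface.

```python
from typing import Optional


def trial_factor(p: int, max_bits: int) -> Optional[int]:
    """Smallest factor q < 2**max_bits of M_p = 2**p - 1, or None (p an odd prime)."""
    k = 1
    while True:
        q = 2 * k * p + 1  # every factor of M_p has this form
        if q.bit_length() > max_bits:
            return None
        # Factors are also +1 or -1 mod 8; q divides M_p iff 2**p == 1 (mod q).
        if q % 8 in (1, 7) and pow(2, p, q) == 1:
            return q
        k += 1


print(trial_factor(11, 10), trial_factor(29, 12), trial_factor(31, 20))
```

Because candidates are tried in increasing order, the first q found is always prime: any composite divisor of M_p would have a smaller prime factor of the same form that the loop reaches first.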

storm5510 2016-11-28 17:36

[QUOTE=ATH;447984]
It would probably be more beneficial for GIMPS and for the amount of GHz-days accumulating on your account (if you care about that) if you do factoring on the card with mfaktc (single precision)...[/QUOTE]

This is primarily what I have been doing. I wanted to see how [I]CUDALucas[/I] would perform on this hardware. Obviously, not as well as on other cards. Case closed.

Magellan3s 2022-04-17 16:21

[QUOTE=ATH;447984]Yes, CUDALucas requires double precision, and it is therefore slow because it is running at only 1/32 of your card's single precision performance.

It would probably be more beneficial for GIMPS and for the amount of GHz-days accumulating on your account (if you care about that) if you do factoring on the card with mfaktc (single precision) instead of LL tests with CUDALucas (double precision).[/QUOTE]

If we could get CUDALucas to work with single precision the performance would be 32 times higher!

VBCurtis 2022-04-17 17:49

[QUOTE=Magellan3s;604133]If we could get CUDALucas to work with single precision the performance would be 32 times higher![/QUOTE]

No, because doing single-precision FFT would require many many more operations per iteration to keep error levels low enough for the computation to be correct.

It's not impossible, just less efficient.
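A small illustration of the point about error levels: when a big number is squared by convolving its digits in floating point, every partial sum must fit in the significand to come out exact. The Python sketch below uses a plain schoolbook convolution rather than an FFT (FFTs add rounding error of their own) and simulates single precision by rounding each product and each running sum to binary32. With 16-bit digits the products need up to 32 bits, so single precision gives a wrong square while double precision is exact. Dropping to 8-bit digits makes single precision exact again, but at the cost of twice as many digits and four times as many products.

```python
import struct


def to_float32(x: float) -> float:
    # Round a binary64 Python float to the nearest binary32 value.
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_digits(x: int, base: int) -> list:
    # Little-endian digits of x in the given base.
    digits = []
    while x:
        x, d = divmod(x, base)
        digits.append(d)
    return digits or [0]


def square_by_convolution(digits: list, base: int, single: bool) -> int:
    # Schoolbook self-convolution in floating point (simulated SP or native DP).
    rnd = to_float32 if single else float
    acc = [0.0] * (2 * len(digits) - 1)
    for i, a in enumerate(digits):
        for j, b in enumerate(digits):
            acc[i + j] = rnd(acc[i + j] + rnd(float(a) * float(b)))
    # Round each coefficient to an integer and let Python's exact
    # integers handle the carries.
    return sum(round(c) * base**k for k, c in enumerate(acc))


x = 2**64 - 1
exact = x * x
print(square_by_convolution(to_digits(x, 2**16), 2**16, single=False) == exact)  # True
print(square_by_convolution(to_digits(x, 2**16), 2**16, single=True) == exact)   # False
print(square_by_convolution(to_digits(x, 2**8), 2**8, single=True) == exact)     # True
```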

Magellan3s 2022-04-17 21:08

[QUOTE=VBCurtis;604146]No, because doing single-precision FFT would require many many more operations per iteration to keep error levels low enough for the computation to be correct.

It's not impossible, just less efficient.[/QUOTE]

Ah, how many more times faster would it be though?

VBCurtis 2022-04-18 05:11

Per iteration, slower. That's what I mean by "less efficient". Otherwise it would have been implemented by now.


All times are UTC.
