Prime95 vs. CUDALucas
1 Attachment(s)
This is in regard to the following assignment:
[QUOTE]Test=A5992806202E1212029B4F9445CE945D,79437629,75,1[/QUOTE] I was thinking Culu would run much faster than Prime95. The attached image shows a comparison between the two running the same test: Culu runs only 13% faster than Prime95. Does anyone have any ideas as to why this is happening? Edit: I am using an nVidia GTX-750Ti and CUDA 8. |
And what processor are you comparing it against?
Modern graphics cards do not have exceptionally good double-precision performance; a GTX1080 is 256 gigaflops peak, which is the same peak as a quad-core 4GHz Haswell. The GTX750Ti is about 40 gigaflops peak, so slower than a single core of a 4GHz Haswell. |
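A rough sketch of where the CPU side of that comparison comes from. The figure of 16 double-precision operations per cycle per core for Haswell (two FMA units, four doubles each, two operations per FMA) is an assumption not stated in the post, and the function name is only illustrative.

```python
def peak_dp_gflops(cores: int, clock_ghz: float, dp_flops_per_cycle: int) -> float:
    """Theoretical peak double-precision GFLOPS: cores x GHz x flops per cycle."""
    return cores * clock_ghz * dp_flops_per_cycle

# Assumption: Haswell with AVX2/FMA does 16 DP flops per cycle per core.
HASWELL_DP_FLOPS_PER_CYCLE = 16

quad_core_haswell = peak_dp_gflops(4, 4.0, HASWELL_DP_FLOPS_PER_CYCLE)
single_core_haswell = peak_dp_gflops(1, 4.0, HASWELL_DP_FLOPS_PER_CYCLE)

# GPU figure taken from the post ("about 40 gigaflops peak").
GTX_750TI_DP_GFLOPS = 40.0

print(quad_core_haswell)    # 256.0, the figure quoted for the GTX1080
print(single_core_haswell)  # 64.0
print(GTX_750TI_DP_GFLOPS < single_core_haswell)  # True
```

These are peak numbers only; real LL-test throughput also depends on memory bandwidth and how well the FFT code uses the hardware.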
The last Nvidia cards with "good" double-precision performance were the GTX 580/590, then the original Titan from 2013 and the Titan Black / Titan Z from 2014 in the 700 series.
By "good" I mean 1/3rd of the card's single-precision performance. All consumer cards since have DP performance of 1/24th or 1/32nd of their SP performance. [url]http://www.mersenne.ca/cudalucas.php[/url] Maybe you should use the SP performance for factoring with mfaktc instead. Your 750Ti has 1306 GFLOPs SP and 40.8 GFLOPs DP: [url]https://en.wikipedia.org/wiki/GeForce_700_series[/url] |
[QUOTE=fivemack;447949]And what processor are you comparing it against?[/QUOTE]
i5-3570 @ 3.4 GHz. [QUOTE=ATH]Maybe you should use the SP performance for factoring with mfaktc instead. Your 750Ti has 1306 GFLOPs SP and 40.8 GFLOPs DP:[/QUOTE] So, you are saying there are two different ways to run [I]Culu[/I] and [I]mfaktc[/I]? I am not familiar with SP and DP. How do I do this? |
[QUOTE=storm5510;447979]So, you are saying there are two different ways to run [I]Culu[/I] and [I]mfaktc[/I]? I am not familiar with SP and DP. How do I do this?[/QUOTE]
SP is single-precision (32 bits), DP is double-precision (64 bits). Your GPU can do single-precision operations 32 times faster than it can do double-precision operations, so you'd be better off doing work that requires only single-precision. (If SP was, say, only 4 times faster than DP you'd be better off doing DP work.) I [i]think[/i] that ATH was suggesting that you use mfaktc instead of Culu, rather than switching modes of one or the other. |
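As a quick check of the "32 times" figure, here is the arithmetic using the 750Ti numbers quoted earlier in the thread (1306 GFLOPs SP, 40.8 GFLOPs DP). The variable names are just for illustration.

```python
# Peak throughput figures for the GTX 750Ti as quoted in the thread.
SP_GFLOPS = 1306.0  # single precision (32-bit floats)
DP_GFLOPS = 40.8    # double precision (64-bit floats)

# How many times faster the card is at SP than at DP.
sp_to_dp_ratio = SP_GFLOPS / DP_GFLOPS

print(f"SP is about {sp_to_dp_ratio:.1f}x faster than DP")
```

So a program that needs double precision, like CUDALucas, can only ever use about 1/32 of the card's headline floating-point rate.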
Yes, CUDALucas requires double precision, and it is therefore slow because it is running at only 1/32 of your card's single-precision performance.
It would probably be more beneficial for GIMPS and for the amount of GHz-days accumulating on your account (if you care about that) if you do factoring on the card with mfaktc (single precision) instead of LL tests with CUDALucas (double precision). |
[QUOTE=ATH;447984]
It would probably be more beneficial for GIMPS and for the amount of GHz-days accumulating on your account (if you care about that) if you do factoring on the card with mfaktc (single precision)...[/QUOTE] This is primarily what I have been doing. I wanted to see how [I]CUDALucas[/I] would perform on this hardware. Obviously, not as good as others. Case closed. |
[QUOTE=ATH;447984]Yes, CUDALucas requires double precision, and it is therefore slow because it is running at only 1/32 of your card's single-precision performance.
It would probably be more beneficial for GIMPS and for the amount of GHz-days accumulating on your account (if you care about that) if you do factoring on the card with mfaktc (single precision) instead of LL tests with CUDALucas (double precision).[/QUOTE] If we could get CUDALucas to work with single precision the performance would be 32 times higher! |
[QUOTE=Magellan3s;604133]If we could get CUDALucas to work with single precision the performance would be 32 times higher![/QUOTE]
No, because doing single-precision FFT would require many many more operations per iteration to keep error levels low enough for the computation to be correct. It's not impossible, just less efficient. |
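A minimal sketch of why precision matters here. FFT-based squaring has to recover exact integers from floating-point results, and a single-precision float only holds integers exactly up to 2^24, versus 2^53 for a double. The 18-bit word size below is an illustrative assumption, not a value taken from CUDALucas, and the helper names are hypothetical.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", x))[0]

def survives_single(n: int) -> bool:
    """True if the integer n is exactly representable in single precision."""
    return to_float32(float(n)) == float(n)

def survives_double(n: int) -> bool:
    """True if the integer n is exactly representable in double precision."""
    return int(float(n)) == n

# Assumption for illustration: each FFT element carries an 18-bit word.
word = 2 ** 18 - 1
product = word * word  # roughly 2**36, one term of the convolution

print(survives_single(2 ** 24 + 1))  # False: past the 24-bit significand
print(survives_single(product))      # False: does not fit in a single
print(survives_double(product))      # True: fits easily in a double
```

With only 24 bits to work with, each element could carry far fewer bits of the number, so the transform would need many more elements (and operations) to hold the same exponent.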
[QUOTE=VBCurtis;604146]No, because doing single-precision FFT would require many many more operations per iteration to keep error levels low enough for the computation to be correct.
It's not impossible, just less efficient.[/QUOTE] Ah, how many times faster would it be, though? |
Per iteration, slower. That's what I mean by "less efficient". Otherwise it would have been implemented by now.