mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

Dubslow 2014-09-19 20:07

[QUOTE=TheJudger;383456]
Stock/reference GTX 980[/QUOTE]

Where did you get that? It was only launched today!

VictordeHolland 2014-09-19 20:54

[QUOTE=Mark Rose;383458]What exactly determines/affects mfaktc performance on a given GPU?[/QUOTE]
From what I can tell:
- Compute capability (higher is not always better)
- Number of CUDA cores
- Core/shader clock speed

Memory clock/bandwidth has little to no effect.

But I guess you already knew that and want a more specific, architectural answer?

James Heinrich 2014-09-19 20:56

[QUOTE=TheJudger;383456]Seems we have a new high score for energy-efficient trial factoring: stock/reference GTX 980[/QUOTE]I just added the 980 to my [url=http://www.mersenne.ca/mfaktc.php]benchmark chart[/url] yesterday, but your numbers are exactly 20% higher than my data-starved prediction (expected: 420.5 * (1215/1126) = ~453.7 GHz-d/day).
What Compute version does the 980 claim to be? (NVIDIA hasn't updated [url=https://developer.nvidia.com/cuda-gpus]their chart[/url] yet)
If you can, a benchmark submission would be most welcome:
[url]http://www.mersenne.ca/mfaktc.php#benchmark[/url]
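The prediction above assumes mfaktc throughput scales linearly with core clock within a generation. A minimal sketch of that arithmetic, using only the numbers from the post (420.5 GHz-d/day baseline at 1126 MHz, 1215 MHz for the new card):

```python
# Clock-linear throughput prediction: scale a known card's mfaktc
# throughput (GHz-d/day) by the ratio of core clocks.
def predicted_ghd(baseline_ghd, baseline_clock_mhz, new_clock_mhz):
    """Assumes throughput is proportional to core clock."""
    return baseline_ghd * (new_clock_mhz / baseline_clock_mhz)

# Numbers from the post: 420.5 GHz-d/day at 1126 MHz, scaled to 1215 MHz.
expected = predicted_ghd(420.5, 1126, 1215)
print(round(expected, 1))  # ~453.7 GHz-d/day
```

The actual GTX 980 result came in ~20% above this, which is why it looked like an architectural jump rather than a clock bump.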

Mark Rose 2014-09-19 21:01

I figured it was CUDA cores x core clock. What's worse about the higher compute capabilities/versions, though? Do instructions on later versions sometimes take more clock cycles? Do later compute versions allow anything to be done more efficiently?

James Heinrich 2014-09-19 22:08

I can't tell you what's better or worse about the different versions, but in terms of performance this is how many GFLOPS you need to get 1GHz-day/day of throughput (therefore lower is better):[code]NVIDIA:
1.x => 14.00 // horrible
2.0 => 3.65 // awesome
2.1 => 5.35 // pretty good
3.0 => 10.50 // not great
3.5 => 11.20 // getting worse

AMD:
VLIW5 => 11.3
VLIW4 => 10.5
GCN => 9.3[/code]So in terms of compute throughput NVIDIA seems to get worse with each revision (except, as noted above, the GTX 980 seems to have jumped 20% in the good direction from what I was expecting based on the previous generation). Which is why the relatively ancient GTX 580 (Compute 2.0) is still very competitive in terms of single-GPU throughput so many years later. AMD, on the other hand, seems to get more mfakto-efficient with each generation.
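To turn that table into a throughput estimate: divide a card's single-precision GFLOPS by the architecture's divisor. A sketch using the figures above (the ~1581 GFLOPS figure for a stock GTX 580 is my assumption, taken from its published specs, not from this thread):

```python
# GFLOPS needed per 1 GHz-d/day of mfaktc/mfakto throughput,
# per architecture (table from the post; lower divisor = better).
GFLOPS_PER_GHD = {
    "1.x": 14.00, "2.0": 3.65, "2.1": 5.35,
    "3.0": 10.50, "3.5": 11.20,          # NVIDIA compute capabilities
    "VLIW5": 11.3, "VLIW4": 10.5, "GCN": 9.3,  # AMD architectures
}

def estimated_ghd(sp_gflops, arch):
    """Estimated trial-factoring throughput in GHz-d/day."""
    return sp_gflops / GFLOPS_PER_GHD[arch]

# e.g. a stock GTX 580 (Compute 2.0) at ~1581 SP GFLOPS:
print(round(estimated_ghd(1581, "2.0"), 1))  # ~433.2 GHz-d/day
```

This makes it easy to see why Compute 2.0 cards punched so far above their raw-GFLOPS weight compared to Kepler.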

TheJudger 2014-09-19 23:52

[QUOTE=Mark Rose;383458]What exactly determines/effects mfaktc performance on a given GPU?[/QUOTE]

Integer instruction throughput (see some pages back in this thread).

[QUOTE=Dubslow;383459]Where did you get that? It was only launched today![/QUOTE]

It was a hard launch; I just bought one at a local shop here.

[QUOTE=James Heinrich;383463]What Compute version does the 980 claim to be?[/QUOTE]
5.2

Oliver

kladner 2014-09-19 23:54

[QUOTE=Dubslow;383459]Where did you get that? It was only launched today![/QUOTE]

Long time no see! :smile:

Mark Rose 2014-09-20 00:05

[QUOTE=James Heinrich;383469]I can't tell you what's better or worse about the different versions, but in terms of performance this is how many GFLOPS you need to get 1GHz-day/day of throughput (therefore lower is better):[code]NVIDIA:
1.x => 14.00 // horrible
2.0 => 3.65 // awesome
2.1 => 5.35 // pretty good
3.0 => 10.50 // not great
3.5 => 11.20 // getting worse

AMD:
VLIW5 => 11.3
VLIW4 => 10.5
GCN => 9.3[/code][/quote]

Thanks for that table! I was curious what the factors were.

[quote]
So in terms of compute throughput NVIDIA seems to get worse with each revision (except, as noted above, the GTX 980 seems to have jumped 20% in the good direction from what I was expecting based on the previous generation). Which is why the relatively ancient GTX 580 (Compute 2.0) is still very competitive in terms of single-GPU throughput so many years later. AMD, on the other hand, seems to get more mfakto-efficient with each generation.[/QUOTE]

Over the last two months I bought a couple of used GTX 580s to contribute to the project ([url=http://www.gpu72.com/reports/worker_exact/ea39a75de82cd896610be22735054fc5/]see the bumps[/url]) because I saw they were so awesome. It seemed strange that 4-year-old cards were still some of the best, but hey, they were cheap to acquire ($130 and $150). It also explains why the equally old GT 430s and GT 520s (both Compute 2.1) I have crunching away are still worth bothering with (160 GHz-d/day total).

I'm really tempted to sell the GTX 760 in my home desktop (might get $150) and replace it with a GTX 980. The power requirements are basically the same and I wouldn't need to upgrade anything else. I find the GTX 760 struggles to keep up with 2560x1440 resolution in games.

ET_ 2014-09-20 11:30

I read that the 980 has 96KB of shared memory instead of the 48KB-64KB of previous generations.

I don't know if this would account for the improved efficiency, as I suppose mfaktc doesn't dynamically check for the shared memory's presence/quantity.

BTW, has anybody done benchmarks for CUDALucas/CUDAP-1? From what I can see, there should still be the 1/16 ratio issue between doubles and floats, but maybe cc5.2 and the higher number of cores and clocks may show some interesting surprises... :smile:

Luigi

kracker 2014-09-20 14:19

[QUOTE=ET_;383527]I read that the 980 has 96KB of shared memory instead of the 48KB-64KB of previous generations.

I don't know if this would account for the improved efficiency, as I suppose mfaktc doesn't dynamically check for the shared memory's presence/quantity.

BTW, has anybody done benchmarks for CUDALucas/CUDAP-1? From what I can see, there should still be the 1/16 ratio issue between doubles and floats, but maybe cc5.2 and the higher number of cores and clocks may show some interesting surprises... :smile:

Luigi[/QUOTE]

Maxwell has 1/32 DP.
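That correction matters for CUDALucas/CUDAP-1 estimates: effective double-precision rate is just the single-precision rate divided by the architecture's SP:DP ratio. A toy sketch (the ~4612 SP GFLOPS figure for a stock GTX 980 is my assumption, from 2048 cores x 2 ops x 1126 MHz; it is not from this thread):

```python
# Effective double-precision throughput given a SP:DP ratio.
# ET_ assumed 1/16; Maxwell is actually 1/32 per the post above.
def dp_gflops(sp_gflops, ratio):
    """SP throughput divided by the SP:DP ratio (e.g. 32 for Maxwell)."""
    return sp_gflops / ratio

# Assumed stock GTX 980: 2048 cores * 2 ops/clock * 1126 MHz ~= 4612 GFLOPS SP.
print(dp_gflops(4612, 32))  # 144.125 GFLOPS DP
```

So despite the much higher core count and clocks, the 1/32 ratio leaves Maxwell with modest DP throughput for LL testing compared to its trial-factoring strength.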

Mark Rose 2014-09-20 23:36

[QUOTE=TheJudger;383476]Integer instruction throughput (see some pages back in this thread).
[/QUOTE]

So I spent six hours today reading the whole thread. It cleared up a lot.

From what I read, it's possible to create a kernel that uses floating-point instructions instead. Is it still worth investigating?



Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.