[QUOTE=TheJudger;383456]
Stock/reference GTX 980[/QUOTE] Where did you get that? It was only launched today! |
[QUOTE=Mark Rose;383458]What exactly determines/effects mfaktc performance on a given GPU?[/QUOTE]
From what I can tell:
- Compute capability (higher is not always better)
- Number of CUDA cores
- Core/shader clock speed

Memory clock/bandwidth has little to no effect. But I guess you already knew that and want a more specific/architectural answer? |
[QUOTE=TheJudger;383456]Seems we have a new high score for energy-efficient trial factoring: Stock/reference GTX 980[/QUOTE]I just added the 980 to my [url=http://www.mersenne.ca/mfaktc.php]benchmark chart[/url] yesterday, but your numbers are exactly 20% higher than predicted by my extrapolation from limited data (expected: 420.5 * (1215/1126) = 453.7 GHz-d/day).
What Compute version does the 980 claim to be? (NVIDIA hasn't updated [url=https://developer.nvidia.com/cuda-gpus]their chart[/url] yet) If you can, a benchmark submission would be most welcome: [url]http://www.mersenne.ca/mfaktc.php#benchmark[/url] |
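As a rough sketch of the extrapolation above (numbers taken straight from the post; this scales a known card's throughput by core-clock ratio alone, which is exactly the estimate the 980 beat by ~20%):

```python
# Extrapolate expected mfaktc throughput from a known card by clock ratio.
# 420.5 GHz-d/day at 1126 MHz is the prior-generation data point from the post.
known_ghd = 420.5      # GHz-d/day of the reference card
known_clock = 1126.0   # MHz
new_clock = 1215.0     # MHz (GTX 980 boost clock, as quoted in the post)

expected = known_ghd * (new_clock / known_clock)
print(round(expected, 1))  # ~453.7 GHz-d/day; the actual 980 result was ~20% higher
```

Clock-ratio scaling only works within an architecture; the Maxwell 980's jump shows why a per-architecture efficiency factor (as in the table below in the thread) is also needed.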
I figured it was CUDA cores x core clock. What's worse about the higher compute capability/versions though? Do instructions on later version sometimes take more clock cycles? Do later compute versions allow anything to be done more efficiently?
|
I can't tell you what's better or worse about the different versions, but in terms of performance this is how many GFLOPS you need to get 1 GHz-day/day of throughput (therefore lower is better):[code]NVIDIA:
1.x   => 14.00 // horrible
2.0   =>  3.65 // awesome
2.1   =>  5.35 // pretty good
3.0   => 10.50 // not great
3.5   => 11.20 // getting worse

AMD:
VLIW5 => 11.3
VLIW4 => 10.5
GCN   =>  9.3[/code]So in terms of compute throughput NVIDIA seems to get worse with each revision (except, as noted above, the GTX 980 seems to have jumped 20% in the good direction from what I was expecting based on the previous generation). Which is why the relatively ancient GTX 580 (Compute 2.0) is still very competitive in terms of single-GPU throughput so many years later. AMD, on the other hand, seems to get more mfakto-efficient with each generation. |
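The table above can be turned into a rough throughput estimator. This is a sketch, not from the thread: it assumes peak single-precision GFLOPS = cores × clock × 2 (the usual FMA-counts-as-2-FLOPs rating), and the GTX 580 specs used in the example (512 cores, 1544 MHz shader clock) are published figures, not benchmark data.

```python
# Hypothetical estimator: mfaktc throughput (GHz-d/day) from rated SP GFLOPS
# divided by the per-architecture "GFLOPS per GHz-d/day" factors quoted above.

GFLOPS_PER_GHD = {  # empirical divisors from the table (lower is better)
    "1.x": 14.00, "2.0": 3.65, "2.1": 5.35, "3.0": 10.50, "3.5": 11.20,
    "VLIW5": 11.3, "VLIW4": 10.5, "GCN": 9.3,
}

def sp_gflops(cores: int, clock_mhz: float) -> float:
    """Rated single-precision GFLOPS: cores * clock(GHz) * 2 (FMA = 2 FLOPs)."""
    return cores * clock_mhz / 1000.0 * 2.0

def est_ghd_per_day(cores: int, clock_mhz: float, arch: str) -> float:
    """Estimated trial-factoring throughput in GHz-d/day."""
    return sp_gflops(cores, clock_mhz) / GFLOPS_PER_GHD[arch]

# Example: GTX 580 (Compute 2.0, 512 cores, 1544 MHz shader clock)
print(round(est_ghd_per_day(512, 1544, "2.0"), 1))  # ~433 GHz-d/day
```

This lines up with why the Compute 2.0 cards punch so far above their age: the same GFLOPS on a Compute 3.5 card would yield roughly a third of the trial-factoring throughput.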
[QUOTE=Mark Rose;383458]What exactly determines/effects mfaktc performance on a given GPU?[/QUOTE]
Integer instruction throughput (some pages ago in this thread). [QUOTE=Dubslow;383459]Where did you get that? It was only launched today![/QUOTE] It was a hard launch; I just bought it at a local shop here. [QUOTE=James Heinrich;383463]What Compute version does the 980 claim to be?[/QUOTE] 5.2 Oliver |
[QUOTE=Dubslow;383459]Where did you get that? It was only launched today![/QUOTE]
Long time no see! :smile: |
[QUOTE=James Heinrich;383469]I can't tell you what's better or worse about the different versions, but in terms of performance this is how many GFLOPS you need to get 1 GHz-day/day of throughput (therefore lower is better):[code]NVIDIA:
1.x   => 14.00 // horrible
2.0   =>  3.65 // awesome
2.1   =>  5.35 // pretty good
3.0   => 10.50 // not great
3.5   => 11.20 // getting worse

AMD:
VLIW5 => 11.3
VLIW4 => 10.5
GCN   =>  9.3[/code][/quote] Thanks for that table! I was curious what the factors were.

[quote]So in terms of compute throughput NVIDIA seems to get worse with each revision (except, as noted above, the GTX 980 seems to have jumped 20% in the good direction from what I was expecting based on the previous generation). Which is why the relatively ancient GTX 580 (Compute 2.0) is still very competitive in terms of single-GPU throughput so many years later. AMD, on the other hand, seems to get more mfakto-efficient with each generation.[/QUOTE] Over the last two months I bought a couple of used GTX 580's to contribute to the project ([url=http://www.gpu72.com/reports/worker_exact/ea39a75de82cd896610be22735054fc5/]see the bumps[/url]) because I saw they were so awesome. It seemed strange that 4-year-old cards were still some of the best, but hey, they were cheap to acquire ($130 and $150). It also explains why the equally old GT 430's and GT 520's (both Compute 2.1) I have crunching away are still worth bothering with (160 GHz-d/day total).

I'm really tempted to sell the GTX 760 in my home desktop (might get $150) and replace it with a GTX 980. The power requirements are basically the same and I wouldn't need to upgrade anything else. I find the GTX 760 struggles to keep up with 2560x1440 resolution in games. |
I read that the 980 has 96KB of shared memory instead of the 48KB-64KB of previous versions. I don't know whether this would account for the improved efficiency, as I suppose mfaktc doesn't dynamically check for shared memory presence/quantity.

BTW, has anybody done benchmarks for CUDALucas/CUDA-P-1? From what I see, there should still be the 1/16 ratio between doubles and floats, but maybe cc 5.2 and the higher number of cores and clocks may show some interesting surprises... :smile: Luigi |
[QUOTE=ET_;383527]I read that the 980 has 96KB of shared memory instead of the 48KB-64KB of previous versions. I don't know whether this would account for the improved efficiency, as I suppose mfaktc doesn't dynamically check for shared memory presence/quantity.

BTW, has anybody done benchmarks for CUDALucas/CUDA-P-1? From what I see, there should still be the 1/16 ratio between doubles and floats, but maybe cc 5.2 and the higher number of cores and clocks may show some interesting surprises... :smile: Luigi[/QUOTE] Maxwell has 1/32 DP. |
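The DP ratio matters because CUDALucas does its FFTs in double precision, so its ceiling is SP GFLOPS times the ratio. A sketch (not from the thread; core counts and clocks are published specs, and the ratios are the architectural caps: 1/32 on Maxwell GeForce parts, 1/8 on Fermi GeForce parts):

```python
# Estimate peak double-precision GFLOPS from SP throughput and the DP ratio.
# Specs below are published figures, used only for illustration.

def dp_gflops(cores: int, clock_mhz: float, dp_ratio: float) -> float:
    sp = cores * clock_mhz / 1000.0 * 2.0  # rated SP GFLOPS (FMA = 2 FLOPs)
    return sp * dp_ratio

# GTX 980: 2048 cores @ 1126 MHz base clock, 1/32 DP (Maxwell GeForce)
# GTX 580:  512 cores @ 1544 MHz shader clock, 1/8 DP (Fermi GeForce)
print(round(dp_gflops(2048, 1126, 1 / 32), 1))  # ~144 DP GFLOPS
print(round(dp_gflops(512, 1544, 1 / 8), 1))    # ~198 DP GFLOPS
```

By this estimate the four-year-old GTX 580 out-muscles the brand-new 980 in double precision, which is why the 980's trial-factoring gains wouldn't be expected to carry over to CUDALucas.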
[QUOTE=TheJudger;383476]Integer instruction throughput (some pages ago in this thread).
[/QUOTE] So I spent six hours today reading the whole thread. It cleared up a lot. From what I read, it's possible to create a kernel that uses floating-point instructions instead. Is it still worth investigating? |