mersenneforum.org

RTX 4090/4080 spec/price (https://www.mersenneforum.org/showthread.php?t=28087)

James Heinrich 2023-05-29 12:17

[QUOTE=firejuggler;631468]FP64 is what matters.[/QUOTE]For TF, it's FP[B]32[/B] ("Single Precision") GFLOPS that matters.
For gpuowl etc., I have low confidence in my ability to predict performance from either FP32 or FP64 theoretical throughput, but TF performance always translates directly from FP32.

All the mfaktx performance numbers on [URL="https://www.mersenne.ca/mfaktc.php"]my chart[/URL] are derived from FP32 GFLOPS and a magic multiplier for each architecture (for NVIDIA, this generally corresponds to the CUDA compute capability):[code]$TF_GFLOPS_per_GHzDayPerDay = array(
'N' => array(
10 => 0.00,
11 => 14.00,
12 => 14.00,
13 => 14.00,
20 => 3.65,
21 => 5.35,
30 => 10.75,
35 => 11.55,
37 => 11.05, // Tesla K80 -- single benchmark, note that K80 is dual-GPU model
50 => 9.00,
52 => 9.00,
60 => 9.70, // Tesla P100
61 => 7.90,
70 => 3.58, // Titan V100 -- only one benchmark so far
75 => 3.30, // RTX 20x0
80 => 2.90, // A100-SXM4
86 => 6.15, // RTX 30?0/A?000
89 => 6.35, // RTX 40x0
),
'A' => array(
1 => 11.3, // VLIW5
2 => 11.0, // VLIW4
10 => 9.3, // GCN 1.0
11 => 9.3, // GCN 1.1
12 => 9.3, // GCN 1.2
13 => 10.9, // GCN 1.3
14 => 10.9, // GCN 1.3
15 => 10.8, // GCN 1.5
20 => 13.0, // RDNA 1 (RX 5700)
30 => 11.0, // RDNA 2 (RX 6600/6700/6800/6900)
40 => 15.0, // RDNA 3 (RX 7x00)
),
'I' => array(
10 => 9.5, // Arc A380, A770
),
);[/code]
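A minimal Python sketch of how such a table might be applied. It assumes (from the array's name) that each value is FP32 GFLOPS per GHz-day/day, so estimated TF throughput = FP32 GFLOPS / multiplier. Only a few NVIDIA entries are copied. The RTX 3060 Ti inputs (4864 CUDA cores, 2 FP32 operations per core per clock via fused multiply-add, 1410 MHz base clock, compute capability 8.6) are assumptions for illustration, and the function names are hypothetical.

```python
# Hypothetical sketch: estimate TF throughput (GHz-days/day) from FP32 GFLOPS.
# Assumption: each multiplier is "FP32 GFLOPS per GHz-day/day", so we divide by it.
# Only a subset of the NVIDIA ('N') entries from the table above is copied here.
TF_GFLOPS_PER_GHZD_PER_DAY = {
    "N": {75: 3.30, 80: 2.90, 86: 6.15, 89: 6.35},
}


def fp32_gflops(cores: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Theoretical FP32 GFLOPS; 2 FLOPs/clock assumes one FMA per core per cycle."""
    return cores * flops_per_clock * clock_mhz / 1000.0


def estimate_tf_ghzd_per_day(gflops: float, vendor: str, arch: int) -> float:
    """Estimated TF throughput = FP32 GFLOPS / architecture multiplier."""
    return gflops / TF_GFLOPS_PER_GHZD_PER_DAY[vendor][arch]


if __name__ == "__main__":
    # Assumed RTX 3060 Ti figures: 4864 CUDA cores, 1410 MHz base clock, CC 8.6.
    g = fp32_gflops(4864, 1410)
    print(f"{g:.0f} GFLOPS -> {estimate_tf_ghzd_per_day(g, 'N', 86):.0f} GHzd/d")
```

With those assumed inputs the estimate comes out at about 2230 GHzd/d, the same theoretical figure quoted for the 3060 Ti further down the thread.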

Jurzal 2023-05-29 13:11

[QUOTE=James Heinrich;631470]For TF, it's FP[B]32[/B] ("Single Precision") GFLOPS that matters.
[/QUOTE]

Neat, thanks for the information. FP32 it is!
Sorry if I missed it before, but how do I submit my 3060 Ti benchmark? Mine averages around 3000 GHzd/d, while the benchmark table shows an average of 2100.

EDIT: Never mind, I found it and uploaded my benchmark.

James Heinrich 2023-05-29 17:44

[QUOTE=Jurzal;631474]how do I submit my 3060 Ti benchmark? Mine averages around 3000 GHzd/d, while the benchmark table shows an average of 2100.[/QUOTE]That's pretty normal these days. The performance table is based on nominal "stock" clock speeds, while in reality, with decent cooling, you're likely to see both Boost and any manufacturer overclock on top of that, often by a significant margin. Your [URL="https://www.mersenne.ca/mfaktc.php?filter=3060+Ti"]3060 Ti[/URL], for example, has a stock clock of [URL="https://www.nvidia.com/en-us/geforce/graphics-cards/30-series/rtx-3060-3060ti/"]1410 MHz[/URL], and your submitted benchmark showed 1965 MHz, nearly 40% higher. So the theoretical 2230 GHzd/d * 1.394 ≈ 3108 GHzd/d, which is in line with your reported performance.
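The clock adjustment above is simple linear scaling. A sketch in Python, assuming TF throughput is proportional to core clock (which only holds while the card isn't power- or thermal-limited); `scale_by_clock` is a hypothetical helper name:

```python
def scale_by_clock(theoretical_ghzd: float, stock_mhz: float, actual_mhz: float) -> float:
    """Scale a stock-clock TF estimate linearly to the observed core clock.

    Assumes throughput is proportional to clock; power or thermal limits break this.
    """
    return theoretical_ghzd * actual_mhz / stock_mhz


ratio = 1965 / 1410  # about 1.394, i.e. nearly 40% above stock
print(f"clock ratio {ratio:.3f}, estimate {scale_by_clock(2230, 1410, 1965):.0f} GHzd/d")
```

Using the exact clock ratio rather than the rounded 1.394 gives about 3108 GHzd/d.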

Jurzal 2023-05-30 12:08

[QUOTE=James Heinrich;631486]That's pretty normal these days. The performance table is based on nominal "stock" clock speeds, while in reality, with decent cooling, you're likely to see both Boost and any manufacturer overclock on top of that, often by a significant margin. Your [URL="https://www.mersenne.ca/mfaktc.php?filter=3060+Ti"]3060 Ti[/URL], for example, has a stock clock of [URL="https://www.nvidia.com/en-us/geforce/graphics-cards/30-series/rtx-3060-3060ti/"]1410 MHz[/URL], and your submitted benchmark showed 1965 MHz, nearly 40% higher. So the theoretical 2230 GHzd/d * 1.394 ≈ 3108 GHzd/d, which is in line with your reported performance.[/QUOTE]

Thanks for confirming!
NVIDIA cards respond very well to proper undervolting plus overclocking. You can gain about 40% more performance while cutting power consumption by roughly 20%.
The 1965 MHz clock (up from the 1410 MHz base) runs at 165 W instead of the 200 W default.
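A quick Python check of what those figures imply. It treats 1410 MHz at 200 W as the baseline (as the post does) and assumes throughput scales with clock. On that basis the power cut works out to 17.5% and performance per watt improves by roughly 1.7x:

```python
stock_mhz, tuned_mhz = 1410, 1965
stock_w, tuned_w = 200, 165

perf_gain = tuned_mhz / stock_mhz - 1          # relative throughput increase
power_cut = 1 - tuned_w / stock_w              # relative power reduction
perf_per_watt = (tuned_mhz / tuned_w) / (stock_mhz / stock_w)  # efficiency ratio

print(f"+{perf_gain:.1%} throughput, -{power_cut:.1%} power, {perf_per_watt:.2f}x perf/W")
```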

henryzz 2023-05-30 13:00

[QUOTE=Jurzal;631518]Thanks for confirming!
NVIDIA cards respond very well to proper undervolting plus overclocking. You can gain about 40% more performance while cutting power consumption by roughly 20%.
The 1965 MHz clock (up from the 1410 MHz base) runs at 165 W instead of the 200 W default.[/QUOTE]

Seems strange that they would release cards like that.

preda 2023-05-31 07:50

[QUOTE=Jurzal;631518]
NVIDIA cards respond very well to proper undervolting plus overclocking. You can gain about 40% more performance while cutting power consumption by roughly 20%.
The 1965 MHz clock (up from the 1410 MHz base) runs at 165 W instead of the 200 W default.[/QUOTE]

BUT does it compute correctly at that overclock+undervolt?

In my experience, especially with TF, it's very easy to overlook incorrect computation: you simply don't find factors, and there is no other indication that the GPU is not working correctly.

So if your GPU undervolts and overclocks fantastically, you should spend significant effort making sure it still works correctly before jumping into serious TF. To check, run TF on exponents with known factors and verify that every factor is detected, without exception.
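To illustrate what a known-factor check means, here is a toy trial-factoring routine in Python. It relies on standard facts about Mersenne numbers: any factor q of M_p = 2^p - 1 (p prime) has the form q = 2kp + 1 with q ≡ ±1 (mod 8), and q divides M_p exactly when 2^p ≡ 1 (mod q). This is not mfaktc's algorithm (no sieving, no GPU kernels); `trial_factor` and the `KNOWN` list are hypothetical, chosen from small Mersenne numbers with well-known factorizations.

```python
def trial_factor(p: int, k_max: int) -> list[int]:
    """Return divisors q = 2*k*p + 1 (1 <= k <= k_max) of M_p = 2**p - 1.

    Toy version: no sieving of candidates, just modular exponentiation.
    """
    found = []
    for k in range(1, k_max + 1):
        q = 2 * k * p + 1
        if q % 8 not in (1, 7):      # factors of Mersenne numbers are +/-1 mod 8
            continue
        if pow(2, p, q) == 1:        # q divides 2**p - 1
            found.append(q)
    return found


# Known factors of small Mersenne numbers, as (p, known factor, k_max to search).
KNOWN = [(11, 89, 10), (23, 47, 5), (29, 2089, 40)]

for p, factor, k_max in KNOWN:
    status = "OK" if factor in trial_factor(p, k_max) else "MISSED"
    print(f"M{p}: known factor {factor} -> {status}")
```

On faulty hardware the failure mode is exactly the one described above: a known factor quietly comes back as "MISSED" with no other error.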

James Heinrich 2023-05-31 11:10

[QUOTE=preda;631570]To check, run TF on exponents with known factors and verify that every factor is detected, without exception.[/QUOTE]The easiest way to do this is [c]mfaktc -st2[/c], which tests a large number of known factors of all sizes across multiple kernels and exponent sizes, and gives you a confirmation at the end:[code]Selftest statistics
number of tests 26192
successfull tests 26192

kernel             | success |   fail
-------------------+---------+-------
UNKNOWN kernel     |       0 |      0
71bit_mul24        |    2586 |      0
75bit_mul32        |    2682 |      0
95bit_mul32        |    2867 |      0
barrett76_mul32    |    1096 |      0
barrett77_mul32    |    1114 |      0
barrett79_mul32    |    1153 |      0
barrett87_mul32    |    1066 |      0
barrett88_mul32    |    1069 |      0
barrett92_mul32    |    1084 |      0
75bit_mul32_gs     |    2420 |      0
95bit_mul32_gs     |    2597 |      0
barrett76_mul32_gs |    1079 |      0
barrett77_mul32_gs |    1096 |      0
barrett79_mul32_gs |    1130 |      0
barrett87_mul32_gs |    1044 |      0
barrett88_mul32_gs |    1047 |      0
barrett92_mul32_gs |    1062 |      0

selftest PASSED![/code]
(There is also the less extensive [c]-st[/c] test, which does the same thing on a smaller scale; it's useful because -st2 can easily take several hours on a slower GPU.)
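For unattended runs, a small script can check the saved output. A minimal Python sketch that assumes the `kernel | success | fail` summary layout shown above; `check_selftest` is a hypothetical helper, not part of mfaktc, and it only totals the table, so mfaktc's own "selftest PASSED!" line remains the authoritative result.

```python
import re

# One table row: kernel name, success count, fail count, separated by '|'.
ROW = re.compile(r"^\s*(\S.*?)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*$")


def check_selftest(output: str) -> tuple[int, int]:
    """Return (total successes, total failures) from the selftest summary table."""
    ok = failed = 0
    for line in output.splitlines():
        m = ROW.match(line)
        if m:  # header and separator lines don't match (non-numeric columns)
            ok += int(m.group(2))
            failed += int(m.group(3))
    return ok, failed


sample = """kernel | success | fail
-------------------+---------+-------
UNKNOWN kernel | 0 | 0
71bit_mul24 | 2586 | 0
barrett92_mul32_gs | 1062 | 0
selftest PASSED!"""

ok, failed = check_selftest(sample)
print(f"{ok} passed, {failed} failed -> {'PASS' if failed == 0 else 'FAIL'}")
```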

Jurzal 2023-05-31 17:49

[Code]
Selftest statistics
number of tests 26192
successfull tests 26192

kernel             | success |   fail
-------------------+---------+-------
UNKNOWN kernel     |       0 |      0
71bit_mul24        |    2586 |      0
75bit_mul32        |    2682 |      0
95bit_mul32        |    2867 |      0
barrett76_mul32    |    1096 |      0
barrett77_mul32    |    1114 |      0
barrett79_mul32    |    1153 |      0
barrett87_mul32    |    1066 |      0
barrett88_mul32    |    1069 |      0
barrett92_mul32    |    1084 |      0
75bit_mul32_gs     |    2420 |      0
95bit_mul32_gs     |    2597 |      0
barrett76_mul32_gs |    1079 |      0
barrett77_mul32_gs |    1096 |      0
barrett79_mul32_gs |    1130 |      0
barrett87_mul32_gs |    1044 |      0
barrett88_mul32_gs |    1047 |      0
barrett92_mul32_gs |    1062 |      0

selftest PASSED![/Code]

Jurzal 2023-05-31 17:53

I have completed 4001 assignments and found 52 factors, mostly at the 75-77 bit wavefront.

I've added screenshots of my GPU72 config.
I will rerun exponent 168785003, which recently had a factor found at 76-77 bits, to see whether I find the same factor.
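For context on what a 76-to-77-bit run covers: TF candidates for M_p have the form q = 2kp + 1, so the bit range fixes a range of k. A minimal Python sketch (`k_range` is a hypothetical helper; it ignores the mod-8 and small-prime sieving that mfaktc applies, so the count is an upper bound on candidates actually tested):

```python
def k_range(p: int, bits_from: int, bits_to: int) -> tuple[int, int]:
    """Smallest and largest k with 2**bits_from <= 2*k*p + 1 < 2**bits_to."""
    lo, hi = 2**bits_from, 2**bits_to
    k_min = -(-(lo - 1) // (2 * p))      # ceil((lo - 1) / (2p))
    k_max = (hi - 2) // (2 * p)          # largest k with 2kp + 1 <= hi - 1
    return k_min, k_max


p = 168785003
k_min, k_max = k_range(p, 76, 77)
print(f"k from {k_min} to {k_max}: about {k_max - k_min + 1:.3e} raw candidates")
```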

Jurzal 2023-05-31 19:33

[QUOTE=preda;631570]BUT does it compute correctly at that overclock+undervolt?

In my experience, especially with TF, it's very easy to overlook incorrect computation: you simply don't find factors, and there is no other indication that the GPU is not working correctly.

So if your GPU undervolts and overclocks fantastically, you should spend significant effort making sure it still works correctly before jumping into serious TF. To check, run TF on exponents with known factors and verify that every factor is detected, without exception.[/QUOTE]


Test successful, factor found.
This GPU reaches 2175 MHz at 1.081 V core voltage and 1965 MHz at 0.925 V. The latter has a safe margin, since it is also stable at 0.900 V.

I know my overclocks. Cheers.


All times are UTC.
