mersenneforum.org  

Old 2020-11-01, 19:13   #2564
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5029₁₀ Posts

Quote:
Originally Posted by axn
I'm hoping that it [6900xt] will achieve 90%+ performance of R VII. Of course, at $999, it is still too expensive
Sounds optimistic.
The 6900XT has half the memory bandwidth (512GB/sec) and 1.44 GFLOPS FP64, vs 1TB/sec & 3.36 GFLOPS for Radeon VII.
https://www.techpowerup.com/gpu-spec...-6900-xt.c3481
https://www.techpowerup.com/gpu-specs/radeon-vii.c3358
As a point of reference, the 5700XT (448GB/s memory bandwidth, 0.6 TFLOPS FP64) runs at around 40-50% of Radeon VII speed in gpuowl, and that's with the benefit of power limits imposed on the Radeon VIIs here while the 5700XT runs free.
https://www.techpowerup.com/gpu-spec...-5700-xt.c3339

Last fiddled with by kriesel on 2020-11-01 at 19:32
Old 2020-11-01, 20:14   #2565
moebius
 
Jul 2009
Germany

547 Posts

Quote:
Originally Posted by kriesel
Sounds optimistic.
The 6900XT has half the memory bandwidth (512GB/sec) and 1.44 GFLOPS FP64, vs 1TB/sec & 3.36 GFLOPS for Radeon VII.
https://www.techpowerup.com/gpu-spec...-6900-xt.c3481
https://www.techpowerup.com/gpu-specs/radeon-vii.c3358
As a point of reference, the 5700XT (448GB/s memory bandwidth, 0.6 TFLOPS FP64) runs at around 40-50% of Radeon VII speed in gpuowl, and that's with the benefit of power limits imposed on the Radeon VIIs here while the 5700XT runs free.
https://www.techpowerup.com/gpu-spec...-5700-xt.c3339
GigaFLOPS (GFLOPS) = 10^9 FLOPS = 1.000.000.000 FLOPS
TeraFLOPS (TFLOPS) = 10^12 FLOPS = 1.000.000.000.000 FLOPS
TFLOPS= GFLOPS / 10^3
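A minimal sketch of the conversion above, applied to the FP64 figures in the quote (the thread later restates them as 1.44 and 3.36 TFLOPS, i.e. 1440 and 3360 GFLOPS). The helper names are hypothetical, for illustration only:

```python
GIGA = 10**9   # 1 GFLOPS = 10^9 FLOP/s
TERA = 10**12  # 1 TFLOPS = 10^12 FLOP/s

def gflops_to_tflops(gflops: float) -> float:
    """Convert GFLOPS to TFLOPS (equivalent to gflops / 10**3)."""
    return gflops * GIGA / TERA

def tflops_to_gflops(tflops: float) -> float:
    """Convert TFLOPS to GFLOPS (equivalent to tflops * 10**3)."""
    return tflops * TERA / GIGA

# FP64 figures from the thread, shown in both units
for name, tflops in [("6900XT", 1.44), ("Radeon VII", 3.36)]:
    print(f"{name}: {tflops} TFLOPS = {tflops_to_gflops(tflops):.0f} GFLOPS")
```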

Last fiddled with by moebius on 2020-11-01 at 20:18
Old 2020-11-01, 21:23   #2566
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

47×107 Posts

Still sounds optimistic.
The 6900XT has half the memory bandwidth (512 GB/sec) and 1.44 TFLOPS FP64, vs 1 TB/sec & 3.36 TFLOPS for Radeon VII.
https://www.techpowerup.com/gpu-spec...-6900-xt.c3481
https://www.techpowerup.com/gpu-specs/radeon-vii.c3358
As a point of reference, the 5700XT (448 GB/s memory bandwidth, 0.6 TFLOPS FP64) runs at around 40-50% of Radeon VII speed in gpuowl, and that's with the benefit of power limits imposed on the Radeon VIIs here while the 5700XT runs free.
https://www.techpowerup.com/gpu-spec...-5700-xt.c3339
(using this convention: https://www.mathsisfun.com/definitio...mal-point.html)

Radeon VII, 100M P-1: 862 us/it; 5700XT: 2186 us/it; so ~39%.
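Since time per iteration is inversely proportional to throughput, the ~39% figure is just the ratio of the two timings. A small sketch of that arithmetic (the function name is illustrative, not part of gpuowl):

```python
def relative_speed(us_per_iter_ref: float, us_per_iter_other: float) -> float:
    """Throughput of 'other' as a fraction of 'ref'.

    Lower us/it means faster, so the speed ratio is ref_time / other_time.
    """
    return us_per_iter_ref / us_per_iter_other

radeon_vii = 862   # us/it, 100M P-1 (figure from the post)
rx_5700xt = 2186   # us/it, same workload
print(f"5700XT runs at {relative_speed(radeon_vii, rx_5700xt):.1%} of Radeon VII speed")
```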

Last fiddled with by kriesel on 2020-11-01 at 21:45
Old 2020-11-01, 23:08   #2567
Viliam Furik
 
"Viliam Furík"
Jul 2018
Martin, Slovakia

2·223 Posts

Quote:
Originally Posted by kriesel
The 6900XT has half the memory bandwidth 512 GB/sec...
You didn't account for the Infinity Cache: 128 MiB with pretty high bandwidth (maybe even 1.5 TB/s), though I couldn't find the exact numbers.
Old 2020-11-01, 23:50   #2568
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1001110100101₂ Posts

The claim is that the cache is 2.17x faster than the GDDR6. 128MB sounds like a lot of cache, but it's not going to hold the whole footprint of a wavefront Gpuowl primality test, much less P-1 stage 2. With prime95 we see GHz-days/day drop off at larger FFT lengths as the cache becomes less effective.
https://www.dailystuff.org/2020/10/2...rting-usd-549/
0.39 x 2.17 ≈ 0.85 is still not the 90% hoped for earlier, and that's with the benefit of unequal power limits.
It will be interesting to see how it does in real world conditions.
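The estimate above scales the 5700XT's relative speed by the claimed cache multiplier. A sketch of that back-of-envelope arithmetic, under the same simplifying assumption the estimate makes: throughput scales linearly with effective memory bandwidth (the helper name is hypothetical):

```python
def scaled_estimate(base_fraction: float, bandwidth_multiplier: float) -> float:
    """Scale a relative-speed fraction by an effective-bandwidth multiplier.

    Assumes the workload is purely memory-bandwidth bound, so speed
    scales linearly with effective bandwidth.
    """
    return base_fraction * bandwidth_multiplier

estimate = scaled_estimate(0.39, 2.17)  # 5700XT fraction x claimed cache speedup
target = 0.90                           # "90%+ of R VII" hoped for earlier
verdict = "meets" if estimate >= target else "falls short of"
print(f"estimate {estimate:.2f} vs target {target:.2f}: {verdict} the target")
```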

Last fiddled with by kriesel on 2020-11-01 at 23:53
Old 2020-11-02, 00:14   #2569
moebius
 
Jul 2009
Germany

547 Posts

Quote:
Originally Posted by Viliam Furik
You didn't account for the Infinity Cache: 128 MiB with pretty high bandwidth (maybe even 1.5 TB/s), though I couldn't find the exact numbers.
The Radeon Instinct MI100 with the CDNA architecture (2020 release) actually has 1229 GB/s of memory bandwidth, 32 GB of HBM2 and 4.098 TFLOPS FP64, but it is of course a professional-sector card and unaffordable. I think only the A100 will compete with it.
https://www.techpowerup.com/gpu-spec...ct-mi100.c3496

Last fiddled with by moebius on 2020-11-02 at 00:17
Old 2020-11-02, 00:28   #2570
Viliam Furik
 
"Viliam Furík"
Jul 2018
Martin, Slovakia

2·223 Posts

Quote:
Originally Posted by kriesel
The claim is that the cache is 2.17x faster than the GDDR6. 128MB sounds like a lot of cache, but it's not going to hold the whole footprint of a wavefront Gpuowl primality test, much less P-1 stage 2. With prime95 we see GHz-days/day drop off at larger FFT lengths as the cache becomes less effective.
https://www.dailystuff.org/2020/10/2...rting-usd-549/
0.39 x 2.17 ≈ 0.85 is still not the 90% hoped for earlier, and that's with the benefit of unequal power limits.
It will be interesting to see how it does in real world conditions.
The claim specifically refers to GDDR6 on a 384-bit bus, which most probably means the memory bandwidth of the RTX 3090 (936 GB/s), compared against the combined bandwidth of the Infinity Cache and GDDR6 on a 256-bit bus (image attached).

Infinity Cache bandwidth can be estimated by subtracting the 256-bit GDDR6 bandwidth (~512 GB/s) from 2.17 times 936 GB/s, giving about 1.5 TB/s, roughly 1.5 times the Radeon VII's.
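The same estimate as a short sketch. It assumes, as the post does, that the vendor's "effective bandwidth" figure is a simple sum of cache and GDDR6 bandwidth; the Radeon VII value is taken as 1024 GB/s (the "1 TB/s" quoted earlier in the thread). The function name is hypothetical:

```python
def infinity_cache_bandwidth(claim_multiplier: float,
                             reference_gbps: float,
                             gddr6_gbps: float) -> float:
    """Estimate cache bandwidth (GB/s) from an effective-bandwidth claim.

    Assumes effective = cache + GDDR6, so the cache share is the claimed
    effective figure minus the 256-bit GDDR6 bandwidth.
    """
    return claim_multiplier * reference_gbps - gddr6_gbps

cache = infinity_cache_bandwidth(2.17, 936, 512)  # 2.17 x RTX 3090 - 256-bit GDDR6
radeon_vii = 1024                                 # GB/s, HBM2
print(f"cache ~{cache:.0f} GB/s, {cache / radeon_vii:.2f}x Radeon VII")
```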

Also, 128 MiB should be plenty for the current wavefront (100M to 110M), as the 64 MiB of L3 on a Ryzen 9 3900X is still enough for it. Based on my testing (results graph attached), 64 MiB holds up until the 6400K FFT length. Further calculations predict that 128 MiB should suffice for FFT lengths up to about 16128K (exponents of about 297M), assuming the same behaviour as on the 3900X.
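For a feel of how FFT length maps to exponent size, a rough sketch: the largest exponent an FFT can handle is about (FFT length in words) x (bits stored per word). The constant 18 bits/word below is an assumption chosen because it reproduces the 16128K ≈ 297M figure above; real per-word limits shrink slowly as FFT length grows, so treat this as an approximation only:

```python
BITS_PER_WORD = 18.0  # assumption; actual limits vary with FFT length

def max_exponent(fft_len_k: int, bits_per_word: float = BITS_PER_WORD) -> float:
    """Rough largest exponent for an FFT length given in K (x1024 words)."""
    return fft_len_k * 1024 * bits_per_word

for fft_k in (6400, 16128):
    print(f"{fft_k}K FFT -> ~{max_exponent(fft_k) / 1e6:.0f}M")
```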
Attached Thumbnails: 75953_09_what-is-amds-new-rdna-2-feature-infinity-cache-and-does-it-do_full.png; Ryzen 9 3900X - extended (ln).png

Last fiddled with by Viliam Furik on 2020-11-02 at 00:28
Old 2020-11-02, 00:36   #2571
moebius
 
Jul 2009
Germany

547 Posts

it's been dealt with....

Last fiddled with by moebius on 2020-11-02 at 00:44
Old 2020-11-02, 01:08   #2572
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5029₁₀ Posts

Radeon VII and 5700XT running 100M P-1 stage 1 show 1-2 GB of GPU RAM occupancy in GPU-Z with gpuowl V6.11-380.
PRP on an RX480 and an RX550 at ~180M exponents shows 1-1.5 GB occupancy.
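A rough sketch comparing these sizes with a 128 MiB cache. It assumes one FFT word is stored as an 8-byte double and uses 6400K (the FFT length mentioned in the thread) as an example; a real gpuowl run keeps several such buffers plus trig tables, and GPU-Z reports total allocation rather than the hot working set, so these numbers only bound the picture. The occupancy figures are treated as GiB for simplicity, and the helper name is hypothetical:

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
CACHE_BYTES = 128 * MIB  # Infinity Cache size

def fft_buffer_bytes(fft_len_k: int, bytes_per_word: int = 8) -> int:
    """Size of one FFT-length array, assuming one 8-byte double per word."""
    return fft_len_k * 1024 * bytes_per_word

buf = fft_buffer_bytes(6400)
print(f"one 6400K buffer: {buf / MIB:.0f} MiB ({buf / CACHE_BYTES:.0%} of the cache)")
for gib in (1.0, 2.0):  # range of occupancy reported in GPU-Z
    print(f"{gib:.0f} GiB occupancy = {gib * GIB / CACHE_BYTES:.0f}x the cache")
```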

The article refers to "traditional" GDDR6. An RTX 3090, which is still too scarce to buy, doesn't seem very traditional to me, nor does its 384-bit bus width.

If the cache does pan out, the 6800XT looks like a better buy to me than the 6900XT.

Last fiddled with by kriesel on 2020-11-02 at 01:48
Old 2020-11-02, 02:52   #2573
xx005fs
 
"Eric"
Jan 2018
USA

11010100₂ Posts

Quote:
Originally Posted by kriesel
Still sounds optimistic.
The 6900XT has half the memory bandwidth (512 GB/sec) and 1.44 TFLOPS FP64, vs 1 TB/sec & 3.36 TFLOPS for Radeon VII.
https://www.techpowerup.com/gpu-spec...-6900-xt.c3481
https://www.techpowerup.com/gpu-specs/radeon-vii.c3358
As a point of reference, the 5700XT (448 GB/s memory bandwidth, 0.6 TFLOPS FP64) runs at around 40-50% of Radeon VII speed in gpuowl, and that's with the benefit of power limits imposed on the Radeon VIIs here while the 5700XT runs free.
https://www.techpowerup.com/gpu-spec...-5700-xt.c3339
(using this convention: https://www.mathsisfun.com/definitio...mal-point.html)

Radeon VII, 100M P-1: 862 us/it; 5700XT: 2186 us/it; so ~39%.

Actually, if those numbers are accurate, it is highly probable that the 6900XT will deliver more than 2x the performance of the 5700XT. Assuming RDNA2 keeps the same FP32-to-FP64 ratio, the 6900XT's FP32 performance is 23.04 TFLOPS (divide by 16 to get FP64) versus only 9.75 TFLOPS for the 5700XT, so about 2.36x the performance. With the 128MB cache providing huge effective bandwidth for wavefront FFT testing, I think there might be a chance.
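The arithmetic in this argument, sketched out. The 1/16 FP64 rate is the post's stated assumption (RDNA2 keeping RDNA1's ratio); this compares peak compute only and says nothing about memory bandwidth, which is the other side of the debate. The helper name is illustrative:

```python
FP64_RATIO = 16  # assumed FP32:FP64 rate ratio for these consumer cards

def fp64_tflops(fp32_tflops: float, ratio: int = FP64_RATIO) -> float:
    """Peak FP64 throughput derived from peak FP32 throughput."""
    return fp32_tflops / ratio

rx_6900xt_fp32 = 23.04  # TFLOPS, from the post
rx_5700xt_fp32 = 9.75   # TFLOPS, from the post
print(f"6900XT FP64 ~{fp64_tflops(rx_6900xt_fp32):.2f} TFLOPS")
print(f"5700XT FP64 ~{fp64_tflops(rx_5700xt_fp32):.2f} TFLOPS")
print(f"peak compute ratio ~{rx_6900xt_fp32 / rx_5700xt_fp32:.2f}x")
```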

Last fiddled with by xx005fs on 2020-11-02 at 02:52
Old 2020-11-02, 10:11   #2574
M344587487
 
"Composite as Heck"
Oct 2017

789₁₀ Posts

Quote:
Originally Posted by axn
...
I'm hoping that it will achieve 90%+ performance of R VII. Of course, at $999, it is still too expensive but 6800 & 6800XT might be good value. All pure speculation currently, obviously.
Allow me to be the pessimistic one. The recent 5700XT numbers posted weren't terrible, and they give some hope that doubling the CU count plus an incremental improvement over Navi, despite staying on the same node, might get Big Navi within spitting distance of R7. But that ignores power consumption, and memory bandwidth hasn't improved in step with the cores. If Infinity Cache is the miracle we need, then Big Navi could be in the same league as R7, provided power consumption can be ignored. Of course I hope that they've seriously overvolted the cards to get a 5% performance gain for +50% power consumption, which we can promptly reverse, but I have a feeling that to catch up like they have, they've had to run wider and better on the efficiency curve by default. Judging by prior hardware, they're working towards squeezing out 95% of what a card has to offer by default instead of relying on overclocks; how far along they are with that goal remains to be seen. Infinity Cache is also not going to be free from a power perspective, and 16GB of GDDR6 consumes way more than HBM2 did.

I still maintain that the type of gamer who bought an R7 will want to upgrade to Big Navi, so there may be a glut of R7s on the secondhand market. If Big Navi doesn't perform for compute, or uses too much power to do so, then hoovering up R7s is a good plan B.

Of the Big Navi cards, the 6800XT is the one to get: 8 extra CUs and a better bin (6900XT) are likely not worth $350, but $70 for 12 extra CUs (over the 6800) seems like a no-brainer.
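The value argument comes down to marginal cost per extra compute unit. A sketch using the price and CU differences from the post (the helper name is hypothetical; it ignores clocks, binning and power, which the post also mentions):

```python
def dollars_per_extra_cu(price_delta: float, extra_cus: int) -> float:
    """Marginal cost of each additional compute unit between two SKUs."""
    return price_delta / extra_cus

step_to_6800xt = dollars_per_extra_cu(70, 12)   # 6800 -> 6800XT
step_to_6900xt = dollars_per_extra_cu(350, 8)   # 6800XT -> 6900XT
print(f"6800 -> 6800XT: ${step_to_6800xt:.2f} per extra CU")
print(f"6800XT -> 6900XT: ${step_to_6900xt:.2f} per extra CU")
```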

All times are UTC.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.