mersenneforum.org  

Old 2018-11-07, 12:47   #1
M344587487
 
 
"Composite as Heck"
Oct 2017

3B6₁₆ Posts
Vega 20 announced with 7.64 TFlops of FP64

Horizon event announcement (bad audio quality):

https://youtu.be/GwX13bo0RDQ?t=3270


Nice summary:

https://www.youtube.com/watch?v=YmPimQp7xLE&t=350


Highlights:
  • 1:2 FP64 performance at 7.64 TFlops
  • 32GB HBM2 at ~2Gbps
  • ~1TB/s memory bandwidth
  • ECC
  • Probably expensive as hell as a professional card

What could these specs translate to for prime hunting, particularly PRP? The memory bandwidth is just over double that of Vega64; does that cap the gain at roughly double Vega64 performance, regardless of what the improved FP64 rate might offer? Am I right in thinking that gpuowl already uses FP64 despite the 1:16 ratio of current Vega, so that, if the card can be kept fed, throughput could naively be said to multiply by ~8, all other things being equal? If FP32 is involved I can't make even an uneducated guess. Does anyone knowledgeable want to hazard a guess at the performance uplift versus Vega64? How about versus a Titan V (which has ~650GB/s memory bandwidth and ~6.9TFlops of FP64 performance)?
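
A rough way to frame the question is to compare the compute ratio with the bandwidth ratio and take the smaller of the two as a naive ceiling. This is only a back-of-envelope sketch, not a benchmark: the Vega 20 and Titan V figures are the ones quoted above, the Vega64 figures (~484GB/s and ~0.79TFlops of FP64) are assumed from its published specs, and `naive_ceiling` is a made-up helper name.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Gpu:
    name: str
    fp64_tflops: float  # peak FP64 throughput, TFlops
    mem_gb_s: float     # peak memory bandwidth, GB/s


def naive_ceiling(new: Gpu, old: Gpu) -> Tuple[float, float, float]:
    """Return (compute ratio, bandwidth ratio, smaller of the two)."""
    compute = new.fp64_tflops / old.fp64_tflops
    bandwidth = new.mem_gb_s / old.mem_gb_s
    return compute, bandwidth, min(compute, bandwidth)


VEGA20 = Gpu("Vega 20", 7.64, 1000.0)  # figures from the announcement
TITAN_V = Gpu("Titan V", 6.9, 650.0)   # figures quoted in this post
VEGA64 = Gpu("Vega64", 0.79, 484.0)    # ASSUMED from published specs

if __name__ == "__main__":
    for old in (VEGA64, TITAN_V):
        c, b, ceiling = naive_ceiling(VEGA20, old)
        print(f"vs {old.name}: compute x{c:.2f}, bandwidth x{b:.2f}, "
              f"naive ceiling x{ceiling:.2f}")
```

With these inputs the bandwidth ratio is the smaller one against Vega64 and the compute ratio is the smaller one against the Titan V. The real uplift depends on how memory-bound the FFT kernels actually are.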
Old 2018-11-07, 14:17   #2
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

7,823 Posts

Quote:
Originally Posted by M344587487 View Post
Highlights:
  • 1:2 FP64 performance at 7.64 TFlops
  • 32GB HBM2 at ~2Gbps
  • ~1TB/s memory bandwidth
  • ECC
  • Probably expensive as hell as a professional card
You're correct that PRP makes use of FP64, as do LL and P-1. That, in combination with the large 32GB of RAM, would make this card outstanding for P-1.
Old 2018-11-07, 21:47   #3
xx005fs
 
"Eric"
Jan 2018
USA

223 Posts

I have both a Vega 56 and a Titan V. The Titan V should never need to go above 1300MHz on the core, even with the memory at 1040MHz, which yields about 770GB/s of bandwidth. Vega is quite memory-limited unless the memory is overclocked, even at its 1:16 FP64 ratio, so that 1TB/s will definitely help. I would guess it will be around 30% faster than the Titan V, which would put it about 230% faster than an overclocked Vega 56.
Old 2018-11-08, 16:25   #4
M344587487
 
 
"Composite as Heck"
Oct 2017

2·5²·19 Posts

Official AMD Horizon video with better audio: https://youtu.be/kC3ny3LBfi4?t=3090

A presentation they did after that talk going into more detail about MI50 and MI60: https://www.youtube.com/watch?v=m0h6-VfH3Xo

Highlights of the second talk:
  • MI50 and MI60 are the two GPUs
  • MI50 has 60 CUs, MI60 has 64 CUs, which translates to ~10% peak theoretical differences
  • Slide shows MI50 has 6.7TFlops and MI60 has 7.4TFlops of FP64
  • MI50 has 16GB HBM2, MI60 has 32GB HBM2
  • Both have 1TB/s memory bandwidth and 4096 bit bus
  • Both have 300W TDP
  • ECC extends beyond HBM and includes registers which it didn't before
  • Lower latency cache than current gen Vega
  • The first we're likely to get our hands on these cards is via renting from cloud vendors


If memory is our main bottleneck, the two cards should be functionally the same for PRP; trial factoring and P-1 may be a different matter. The pricing will be interesting, not that they're going to be available to the likes of us for a while.
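
The ~10% gap in peak FP64 can be split into a CU-count part and a clock part. The sketch below backs an implied clock out of the slide's FP64 figures, assuming GCN's 64 lanes per CU, 2 FLOPs per lane per clock (one FMA) and the 1:2 FP64 rate. The clocks are derived, not taken from the talk, and `implied_clock_ghz` is a made-up helper.

```python
LANES_PER_CU = 64   # GCN stream processors per compute unit (assumption)
FLOPS_PER_FMA = 2   # one fused multiply-add counts as 2 FLOPs
FP64_RATE = 0.5     # 1:2 FP64:FP32 rate, per the announcement


def implied_clock_ghz(fp64_tflops: float, cus: int) -> float:
    """Clock (GHz) at which `cus` compute units reach `fp64_tflops`."""
    fp64_flops_per_clock = cus * LANES_PER_CU * FLOPS_PER_FMA * FP64_RATE
    return fp64_tflops * 1e12 / fp64_flops_per_clock / 1e9


if __name__ == "__main__":
    mi50 = implied_clock_ghz(6.7, 60)  # slide figures for MI50
    mi60 = implied_clock_ghz(7.4, 64)  # slide figures for MI60
    print(f"MI50 implied clock: {mi50:.3f} GHz")
    print(f"MI60 implied clock: {mi60:.3f} GHz")
    print(f"CU ratio {64 / 60:.3f} x clock ratio {mi60 / mi50:.3f} "
          f"= FP64 ratio {7.4 / 6.7:.3f}")
```

Under these assumptions roughly 7% of the gap comes from the extra CUs and the remaining 3 to 4% from a slightly higher implied clock on the MI60.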
Old 2018-11-08, 16:56   #5
xx005fs
 
"Eric"
Jan 2018
USA

223 Posts

AMD didn't publicly disclose the int32 performance, so trial factoring is going to be weak if it stays the same as on current-gen Vega and every other GCN GPU (compared to Volta and Turing). However, if the memory is overclockable then the MI60 definitely has an edge over the MI50: as tested on a Titan V at 800GB/s of bandwidth (overclocked a bit further), the memory system saturates at a core clock of about 1230MHz, which yields about 6TFLOP of DP. Scaling that up to 1TB/s (assuming the same FFT scaling between the Nvidia and AMD architectures) means it would use 7.2TFLOP of DP, which saturates the MI50. However, the difference should be negligible.
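
The scaling step is a simple proportion, sketched here under the same assumption that the FP64 rate needed to saturate memory grows linearly with bandwidth. The inputs are the rounded figures from this thread and `fp64_to_saturate` is a made-up helper. Note that the rounded 6TFLOP input gives 7.5TFLOP rather than 7.2, so treat either number as a rough estimate.

```python
def fp64_to_saturate(ref_tflops: float, ref_gb_s: float,
                     target_gb_s: float) -> float:
    """Scale a measured saturation point linearly with memory bandwidth."""
    return ref_tflops * target_gb_s / ref_gb_s


# Titan V reference point from this thread: ~6 TFLOP DP at ~800 GB/s.
needed = fp64_to_saturate(6.0, 800.0, 1000.0)

# Peak FP64 figures from the AMD slide.
PEAK_FP64 = {"MI50": 6.7, "MI60": 7.4}

if __name__ == "__main__":
    print(f"FP64 needed to saturate 1 TB/s: {needed:.2f} TFLOP")
    for name, peak in PEAK_FP64.items():
        limit = "compute" if needed > peak else "memory"
        print(f"{name}: peak {peak} TFLOP, limited by {limit}")
```

Either estimate exceeds the MI50's 6.7TFLOP; whether it also exceeds the MI60's 7.4TFLOP depends on the rounding of the Titan V figure.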

Last fiddled with by xx005fs on 2018-11-08 at 16:57
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
