mersenneforum.org Nvidia Pascal, a third of DP

 2016-02-21, 22:21   #1
firejuggler

Apr 2010
Over the rainbow

3²·5·59 Posts

Nvidia Pascal, a third of DP

http://www.techtimes.com/articles/13...sing-power.htm

Based on a number of slides from an independent researcher, the Nvidia Pascal GP100 features stacked DRAM (1 TB/s), giving it as much as 12 TFLOPS of single-precision (FP32) compute performance. The flagship GPU is purportedly able to provide 4 TFLOPS of double-precision (FP64) compute performance as well.

Last fiddled with by wblipp on 2016-02-21 at 23:40
 2016-02-22, 02:13   #2
LaurV
Romulan Interpreter

"name field"
Jun 2011
Thailand

9,787 Posts

Yarrrr !!!
2016-02-22, 04:04   #3
0PolarBearsHere

Oct 2015

412₈ Posts

Quote:
 So far, we know the following aspects about Nvidia's upcoming flagship Pascal GP100 graphics processing unit:
 - Pascal graphics architecture
 - Compared to Maxwell, Pascal delivers two-fold performance per watt
 - Successor to the GM200 GPU found in the GTX Titan X and GTX 980 Ti
 - Features about 17 billion transistors, two times more than the GM200
 - Uses a 16nm FinFET base from TSMC
 - 4096-bit bus interface, comparable to the Fiji GPU from the AMD Fury models
 - Has half-precision FP16 compute at double the rate of full-precision FP32
 - Will sport four 4-Hi HBM2 stacks, giving it 16 GB of VRAM, as well as 8-Hi stacks amounting to 32 GB in professional SKUs
 - Exclusive compatibility with next-gen IBM PowerPC server processors due to NVLink
 - DirectX 12 feature level will be 12_1 or higher
 - Launch is scheduled for the second half of 2016
So it'll have more VRAM than many computers have normal RAM.

2016-02-22, 07:58   #4
ATH
Einyen

Dec 2003
Denmark

2×7×227 Posts

12 TFLOPS FP32 and 4 TFLOPS FP64 *drool* It will be hard to decide whether to run factoring or LL on it.

Quote:
 - Has half-precision FP16 compute at double the rate of full-precision FP32
Can FP16 be used for anything useful? That would be crazy at ~24 TFLOPS.

2016-02-22, 08:37   #5
LaurV
Romulan Interpreter

"name field"
Jun 2011
Thailand

9,787 Posts

Quote:
 Originally Posted by ATH Can FP16 be used for anything useful? That would be crazy at ~24 TFLOPS.
Games... 3D rotations, translations, tessellation... whatever...
Trial factoring... (maybe... you can use 8 of those to keep a precision of 80 to 88 bits, depending on content, but you will need about 72 multiplications to multiply those 8x8 "digits", with some Karatsuba-like stuff (edit: can it be done in 72 multiplications?? don't forget that you multiply 11 bits by 11 bits and get an 11-bit result, not 22 bits), so you will only get a third of a teraflop, like a GTX 560 or so. OTOH, I assume they will make a killing at integer arithmetic too, so FP16 will not be the best choice for TF either...)
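LaurV's limb idea can be sketched in plain Python (a hypothetical illustration, not his actual scheme): split each operand into eight 11-bit "digits" (FP16's significand width), and note that every partial product must be split into a low and a high 11-bit half, since a hardware FP16 multiply would return only 11 result bits — which is why the count exceeds the 64 partial products of plain schoolbook multiplication.

```python
BITS = 11
MASK = (1 << BITS) - 1
NLIMBS = 8  # 8 * 11 = 88 bits of precision

def to_limbs(n):
    """Split n into NLIMBS little-endian 11-bit limbs."""
    return [(n >> (BITS * i)) & MASK for i in range(NLIMBS)]

def from_limbs(limbs):
    """Recombine any list of 11-bit limbs into an integer."""
    return sum(l << (BITS * i) for i, l in enumerate(limbs))

def mul_limbs(a, b):
    """Schoolbook multiply of two 8-limb numbers.

    Each of the 64 partial products is split into its low and high
    11-bit halves, mimicking hardware that keeps only 11 bits per
    multiply; carries are propagated at the end."""
    out = [0] * (2 * NLIMBS)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            p = ai * bj                # full 22-bit product
            out[i + j] += p & MASK     # low 11 bits -> limb i+j
            out[i + j + 1] += p >> BITS  # high 11 bits -> limb i+j+1
    for k in range(2 * NLIMBS - 1):    # carry propagation
        out[k + 1] += out[k] >> BITS
        out[k] &= MASK
    return out

x, y = 0x1234567890ABCDEF, 0xFEDCBA0987654321
assert from_limbs(mul_limbs(to_limbs(x), to_limbs(y))) == x * y
```

On real FP16 hardware the low/high split would itself cost extra operations, which is where an estimate like "about 72 multiplications" (rather than 64) comes from.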

Last fiddled with by LaurV on 2016-02-22 at 08:40

 2016-02-22, 09:27   #6
axn

Jun 2003

3×17×101 Posts

I am gonna go out on a limb and predict that there will be no consumer/prosumer version that offers 4 TFLOPS DP (i.e. not even a $1000 Titan variety will offer 4 TFLOPS).

 2016-02-22, 12:23   #7
mackerel

Feb 2016
UK

2³·5·11 Posts

Looking at previous high-rate DP cards and their release dates:
- R9 280X, ~1 TFLOP, Oct. 2013
- Titan, ~1.5 TFLOP, Feb. 2013
- Titan Black, 1.7 TFLOP, Feb. 2014

Could they manage to get it up to 4 in 2 years? I think there's more than a possibility they can, if they want to, in a higher-end card. Especially now that they're finally moving to a smaller manufacturing process again.

 2016-02-22, 12:43   #8
ATH
Einyen

Dec 2003
Denmark

6152₈ Posts

Quote:
 Originally Posted by axn I am gonna go out on a limb and predict that there will be no consumer/prosumer version that offers 4 TFLOPS DP (i.e. not even a $1000 Titan variety will offer 4 TFLOPS)
Yeah unfortunately if it seems too good to be true, it most often is. But we can hope....

2016-02-22, 12:47   #9
axn

Jun 2003

3·17·101 Posts

Quote:
 Originally Posted by mackerel Could they manage to get it up to 4 in 2 years?
Yes, they could.
Quote:
 Originally Posted by mackerel if they want to
I am saying they don't want to, since it will cannibalize their compute line of offerings.

2016-02-22, 15:19   #10
tServo

"Marv"
May 2009
near the Tannhäuser Gate

1235₈ Posts

Quote:
 Originally Posted by axn I am gonna go out on a limb and predict that there will be no consumer/prosumer version that offers 4 TFLOPS DP (i.e. not even a $1000 Titan variety will offer 4 TFLOPS)
I agree completely with AXN on this, as I explained in my post of a week ago:
http://www.mersenneforum.org/showpos...&postcount=604

Do you want 3 TFLOPS FP64 from an Nvidia board? It's been available for a year on their Tesla K80! The catch is that it costs $5,000 and requires the intense cooling found in servers located in frigid computer rooms. You can bet your boots that Pascal chips with gobs of FP64 are destined for Teslas and not for the great unwashed masses (us). Even Nvidia's expected April announcement will probably be a "tease" in that regard.

BTW, FP16 is there for deep-learning neural nets, which are the hottest thing in AI right now. Researchers have done some truly amazing things with these, such as driving cars and beating a Go master. Nvidia has very nice libraries for them. They require zillions of small FP values for all the weights used during training. They can tolerate the loss of precision FP16 brings, and are willing to trade it for fitting twice as many values in memory as FP32.
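The precision trade-off is easy to see from Python's standard struct module, which can round-trip a value through the IEEE 754 half-precision format (a quick sketch of my own, not from the thread):

```python
import struct

def to_fp16(x):
    """Round a float to the nearest IEEE 754 half-precision value."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

# FP16 keeps only 11 significant bits, so integers above 2048 start
# losing exactness, and the largest finite value is 65504.
assert to_fp16(2049.0) == 2048.0    # spacing between FP16 values at 2048 is 2
assert to_fp16(65504.0) == 65504.0  # largest finite FP16 value
assert to_fp16(0.1) != 0.1          # only ~3 decimal digits survive the round-trip
```

Neural-net weights typically sit in a small range around zero, where this resolution is usually adequate, which is why halving the storage per value is an attractive trade.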

 2016-02-22, 15:20   #11
mackerel

Feb 2016
UK

2³×5×11 Posts

Go on then, I'll take the optimistic route that they will put this in a consumer device, perhaps a future Titan something. The fastest single-chip compute device they make is the K40, which appears to use the same chip as the Titan Black. They can still differentiate between the products in other ways. It's not like they're going to stand still in compute either, and they can't risk AMD leaving its offering uncrippled and making them look bad.

