#23
Romulan Interpreter
"name field"
Jun 2011
Thailand
41·251 Posts
As usual, a lot of blah-blah and false claims....
#24
Jun 2003
2³×683 Posts
Quote:
#25
"Mihai Preda"
Apr 2015
2²×3×11² Posts
Quote:
It should be possible to do a 1:1 conversion to CUDA if desired. Or use it as inspiration, or just as a starting point.
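For illustration, assuming the source is OpenCL (which a 1:1 conversion to CUDA implies), here is a hypothetical sketch of how mechanical such a port can be. The kernel below is a made-up element-wise multiply-add, not actual gpuOwl code:

[CODE]
// Hypothetical illustration (not real gpuOwl code): an element-wise
// fused multiply-add kernel, first in OpenCL, then its line-for-line
// CUDA translation.
//
// OpenCL original:
//   __kernel void fma_kernel(__global const float *a,
//                            __global const float *b,
//                            __global float *out, int n) {
//       int i = get_global_id(0);
//       if (i < n) out[i] = fma(a[i], b[i], out[i]);
//   }
//
// CUDA translation; the mapping is mostly mechanical:
//   __kernel          -> __global__
//   __global pointers -> plain device pointers
//   get_global_id(0)  -> blockIdx.x * blockDim.x + threadIdx.x
__global__ void fma_kernel(const float *a, const float *b, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = fmaf(a[i], b[i], out[i]);
}
[/CODE]

Most of the port is renaming qualifiers and index builtins; the real work is usually on the host side (clEnqueueNDRangeKernel vs. the <<<grid, block>>> launch syntax) and in anything that leans on vendor-specific extensions.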
#26
"Sam Laur"
Dec 2018
Turku, Finland
317 Posts
See https://docs.nvidia.com/cuda/turing-...ide/index.html
Quote:
1.4.1.4. Integer Arithmetic
Similar to Volta, the Turing SM includes dedicated FP32 and INT32 cores. This enables simultaneous execution of FP32 and INT32 operations. Applications can interleave pointer arithmetic with floating-point computations. For example, each iteration of a pipelined loop could update addresses and load data for the next iteration while simultaneously processing the current iteration at full FP32 throughput.

Not talking about tensor cores now; those just do an FP16 4x4 matrix multiply into an FP32 result matrix. Probably useless for our purposes, but who knows.
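To make the pipelined-loop remark concrete, here is a minimal, hypothetical CUDA sketch (the kernel name and grid-stride structure are my own illustration, not from NVIDIA's guide or any real program). Each iteration exposes INT32 work (address math and the load for the next element) alongside FP32 work (the FMA on the current element), which is exactly the mix the two dedicated pipes can overlap:

[CODE]
// Hypothetical sketch of a software-pipelined grid-stride loop.
__global__ void pipelined_axpy(const float *x, float *y, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // INT32 index math
    int stride = gridDim.x * blockDim.x;
    if (i >= n) return;
    float cur = x[i];                     // load for the first iteration
    for (int next = i + stride; next < n; i = next, next += stride) {
        float nxt = x[next];  // INT32 pipe: address math + load for the NEXT element
        y[i] += a * cur;      // FP32 pipe: FMA on the CURRENT element
        cur = nxt;
    }
    y[i] += a * cur;                      // drain the final element
}
[/CODE]

Whether the hardware actually overlaps them is up to the compiler and the warp scheduler; the source just has to expose both kinds of independent work in every iteration.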
#27
Sep 2006
The Netherlands
3×269 Posts
Quote:
The definition of 'executing at the same time' on a GPU is fuzzy, because even GPU generations from 12 years ago and earlier need a long time for an instruction to get through the execution units, so there is a nonstop state of 'executing at the same time'. Furthermore, if I run 8-20 warps of 32 CUDA cores on a 128-CUDA-core SIMD (the 900 and 1000 series) or a 192-CUDA-core SIMD (Kepler), then warps obviously already get executed 'at the same time'. If the claim now is that, when the same kernel with n warps mixes integer and floating-point instructions, you can get a higher IPC out of a SIMD by interleaving the floating-point and integer instructions, then that is implicitly more of the same.
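For what it's worth, the 8-20 warps figure is about hiding latency with occupancy, and the CUDA runtime exposes a query for how many blocks (hence warps) of a given kernel can be resident per SM. A minimal host-side sketch using the real cudaOccupancyMaxActiveBlocksPerMultiprocessor API; the dummy kernel and the 128-thread block size are just assumptions for illustration:

[CODE]
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel; a real kernel's register and shared-memory usage
// would change the occupancy numbers.
__global__ void dummy_kernel(float *p) { p[threadIdx.x] = 0.0f; }

int main() {
    int blockSize = 128;   // 4 warps per block (arbitrary choice)
    int numBlocks = 0;
    // CUDA runtime occupancy query: max resident blocks of this kernel per SM.
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&numBlocks, dummy_kernel,
                                                  blockSize, 0);
    printf("resident blocks/SM: %d, warps/SM: %d\n",
           numBlocks, numBlocks * blockSize / 32);
    return 0;
}
[/CODE]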
Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post |
| Memory Bandwidth | Fred | Hardware | 12 | 2016-02-01 18:29 |
| High Bandwidth Memory | tha | GPU Computing | 4 | 2015-07-31 00:21 |
| configuration for max memory bandwidth | smartypants | Hardware | 11 | 2015-07-26 09:16 |
| P-1 memory bandwidth | TheMawn | Hardware | 1 | 2013-06-15 23:15 |
| Parallel memory bandwidth | fivemack | Factoring | 14 | 2008-06-11 20:43 |