#89 |
|
Sep 2006
The Netherlands
1100100111₂ Posts |
Ok, I'm digging into older articles.
https://www.anandtech.com/show/4455/...-for-compute/4 Here I see, for the first time in my life, a picture with 16 ALUs per SIMD as the description of older GCN.
#90 |
|
Sep 2006
The Netherlands
3·269 Posts |
Ok, if I understand it correctly, AMD moved to 16 ALUs in each SIMD, which is pathetically few, to get around an old problem they had: they couldn't execute different kernels (wavefronts) at the same time.
With 4 separate SIMDs in each CU, different wavefronts can now execute within the same CU. So while one SIMD handles instructions with a throughput of more than 1 clock, the other execution units can still execute simple instructions instead of having to wait a long time.

edit: Ok, so I need to dig more into how many wavefronts should ideally execute simultaneously (not the maximum, as that's clear) to still get good IPC. Without looking it up, I would gamble on 8 for GCN.

Last fiddled with by diep on 2017-12-08 at 00:05
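The per-CU numbers in this post can be sketched as a few lines of arithmetic. This is only an illustration: the 16 lanes per SIMD and 4 SIMDs per CU come from the post, while the wavefront size of 64 work-items is an assumption taken from public GCN descriptions, and the function names are made up for this sketch.

```python
# Rough sketch of the GCN compute-unit arithmetic discussed above.
# Assumption (not stated in the post): a wavefront is 64 work-items wide.
WAVEFRONT_SIZE = 64   # work-items per wavefront (assumed)
LANES_PER_SIMD = 16   # ALUs per SIMD, as in the post
SIMDS_PER_CU = 4      # SIMDs per compute unit, as in the post


def lanes_per_cu(simds=SIMDS_PER_CU, lanes=LANES_PER_SIMD):
    """Total ALU lanes in one compute unit."""
    return simds * lanes


def cycles_per_wavefront_instruction(wavefront_size=WAVEFRONT_SIZE,
                                     lanes=LANES_PER_SIMD):
    """Clocks one SIMD needs to push a full wavefront through a simple
    one-lane-per-clock instruction (ceiling division)."""
    return -(-wavefront_size // lanes)


if __name__ == "__main__":
    print("lanes per CU:", lanes_per_cu())
    print("cycles per wavefront instruction:",
          cycles_per_wavefront_instruction())
```

Under these assumptions a CU has 64 lanes in total, and each SIMD takes 4 clocks to issue one simple instruction for a whole wavefront, which is why four SIMDs working on different wavefronts keep the CU busy.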
#91 |
|
Sep 2006
The Netherlands
3·269 Posts |
Ok, that opens possibilities! By my estimate here, more than 4 wavefronts on 64 stream cores doesn't seem to make sense.
That means 64 KB / 4 = 16 KB of LDS available per wavefront. For the internal iterations of the FFT that's more than sufficient: 16 KB holds 2K doubles, and 2K = 2^11, so 11 iterations.
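The LDS estimate above is easy to check for other wavefront counts. The following is a minimal sketch, assuming 64 KB of LDS per CU and 8-byte doubles as in the post, and counting one radix-2 pass per power of two; `lds_budget` and its parameters are hypothetical names for this example only.

```python
# Sketch of the LDS budget estimate from the post.
# Assumptions: 64 KB LDS per CU, 8-byte doubles, radix-2 FFT passes.
LDS_BYTES_PER_CU = 64 * 1024
BYTES_PER_DOUBLE = 8


def lds_budget(wavefronts_per_cu,
               lds_bytes=LDS_BYTES_PER_CU,
               elem_bytes=BYTES_PER_DOUBLE):
    """Return (bytes per wavefront, elements that fit, radix-2 passes).

    The pass count is floor(log2(elements)), i.e. the number of radix-2
    FFT iterations that fit entirely in one wavefront's share of LDS.
    """
    bytes_per_wavefront = lds_bytes // wavefronts_per_cu
    elements = bytes_per_wavefront // elem_bytes
    passes = elements.bit_length() - 1  # floor(log2(elements))
    return bytes_per_wavefront, elements, passes


if __name__ == "__main__":
    for n in (4, 8):
        size, elems, passes = lds_budget(n)
        print(f"{n} wavefronts: {size} bytes, {elems} doubles, {passes} passes")
```

With 4 wavefronts this gives 16384 bytes, 2048 doubles and 11 passes, matching the estimate above; with 8 wavefronts it drops to 1024 doubles and 10 passes. If each FFT point is stored as a complex double (16 bytes), pass `elem_bytes=16` and the counts halve accordingly.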