#12
Sep 2006
The Netherlands
3·269 Posts
Quote:
The least significant 32 bits come back in one unsigned integer and the top 16 bits likewise in another. In short, that eats 2 cycles.

Now I hope they provide, as some AMD engineer proposed, a way to expose this in OpenCL, if that request is still in time for the next APP SDK. That might mean APP SDK 2.5 also gives access to the top 16 bits. Right now only a slower instruction is available: the 32x32-bit multiply that returns the top 32 bits.

So it should be possible (no guarantees) to study a transform for the AMD GPU in OpenCL using 24-bit multiplications. I must admit I haven't studied this very well yet; I'm sure TheJudger has done that a lot better, as he uses this for TF (trial factoring) as well, and the 72-bit version is his fastest code there.

Adding with carry isn't so easy on AMD. It also seems to me that using multiply-add is rather complicated, as the low part comes back as a 32-bit result rather than 24 bits, so it will probably be impossible to get a throughput above 880 MHz * 1536 = 1.35 Tflop.

With 24-bit multiplications it is possible, however, to emulate numbers bigger than 64 bits, I'd argue. Up to 72 bits is maybe possible. Note that 72-bit numbers would still be rather slow; 48 bits would go a lot faster, I'd guess. This will require puzzling!

Of course, in contrast to 32-bit math, where there is no room to overflow into a 33rd bit, all this 24-bit stuff in reality sits in an integer that has 8 bits left, so overflow is nearly impossible. So the full 48 or 72 bits can get used. Karatsuba-type shifting operations are also easier then.

Or maybe even 96 bits: 4 x 24 bits. If we multiply 96 x 96 bits using 24 bits at a time, we need 9 multiplications using Karatsuba.

I'll generate a bunch of primes close to those limits.
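To make the 24-bit limb idea a bit more concrete, here is a rough Python model of it, not GPU code. It assumes the fast multiply returns the low 32 bits of a 24x24-bit product and that a companion instruction returning the top 16 bits exists; the names `mul24_lo32`, `mul24_hi16`, `to_limbs`, `from_limbs` and `mul_limbs` are made up for this sketch. It uses plain schoolbook multiplication rather than Karatsuba, and only shows how the 48-bit product gets re-split at the 24-bit boundary and why the 8 spare bits are enough for the column sums.

```python
MASK24 = (1 << 24) - 1
MASK32 = (1 << 32) - 1


def mul24_lo32(a, b):
    """Low 32 bits of a 24x24-bit product (models the fast 24-bit multiply)."""
    assert 0 <= a <= MASK24 and 0 <= b <= MASK24
    return (a * b) & MASK32


def mul24_hi16(a, b):
    """Top 16 bits of the 48-bit product (hypothetical companion instruction)."""
    assert 0 <= a <= MASK24 and 0 <= b <= MASK24
    return (a * b) >> 32


def to_limbs(x, n):
    """Split x into n little-endian 24-bit limbs."""
    return [(x >> (24 * i)) & MASK24 for i in range(n)]


def from_limbs(limbs):
    """Recombine little-endian 24-bit limbs into one Python integer."""
    return sum(limb << (24 * i) for i, limb in enumerate(limbs))


def mul_limbs(a, b):
    """Schoolbook product of two limb lists; returns len(a) + len(b) limbs."""
    cols = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            lo32 = mul24_lo32(ai, bj)
            hi16 = mul24_hi16(ai, bj)
            # Re-split the 48-bit product at the 24-bit limb boundary.
            cols[i + j] += lo32 & MASK24
            cols[i + j + 1] += (hi16 << 8) | (lo32 >> 24)
    # Each column sums only a handful of 24-bit values, so for 3 or 4 limbs
    # it stays well inside a 32-bit word: this is the 8 bits of headroom.
    assert all(c <= MASK32 for c in cols)
    out, carry = [], 0
    for c in cols:
        c += carry
        out.append(c & MASK24)
        carry = c >> 24
    assert carry == 0
    return out


# Arbitrary 72-bit test values (3 limbs each), checked against Python's ints.
a = (1 << 72) - 93
b = (1 << 71) + 12345
product = from_limbs(mul_limbs(to_limbs(a, 3), to_limbs(b, 3)))
print(product == a * b)
```

For 3 limbs this costs 9 low and 9 high multiplies, and for 4 limbs 16 of each, so Karatsuba would only start to matter in the 96-bit case.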