#12 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴·3·163 Posts |
#13 |
|
Undefined
"The unspeakable one"
Jun 2006
My evil lair
1101010001001₂ Posts |
Quote:
Nobody breaks encryption nowadays anyway. It's all about the metadata. And five dollar wrenches. But actually what are some real uses for quad precision? I mean really, where are they required? |
#14 |
|
Sep 2016
17C₁₆ Posts |
It would make GIMPS more efficient by reducing the memory (and bandwidth) requirement for LL.
#15 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴·3·163 Posts |
Efficacy toward the stated purpose is not a requirement of a government program or position. "However, as those facts did not support Bomber Commands’ philosophy, they were suppressed at the time." http://www.reformationsa.org/index.php/history/174-the-bombing-of-cities-in-world-war-ii
#16 |
|
Feb 2016
UK
111000000₂ Posts |
Quote:
If FP128 were a thing, presumably it would have higher limits still, and if implemented in a SIMD format, all the better.

I found it interesting when comparing x87 tasks between Ryzen and Intel that Ryzen's throughput was about half that of Intel's. I wonder if x87 is somehow tied into their AVX2/FMA FPU units, which similarly have about half the potential throughput, and whether Zen 2's implementation will catch up with consumer Intel CPUs as expected from that.

Edit: random thought: would there be an advantage for our use cases if more bits were put towards the significand as opposed to the exponent? So still 64-bit overall, but the question is where to put them.

Last fiddled with by mackerel on 2019-05-30 at 17:50 |
#17 |
|
Dec 2012
The Netherlands
2²×3³×17 Posts |
#18 |
|
Sep 2016
17C₁₆ Posts |
Quote:
The only thing a super-efficient x87 FPU would get you is an amazing SuperPi or PiFast score - and those aren't really used much anymore for benchmarking. Quote:
What we see here is strong evidence of a complete revamp of the SIMD unit in Skylake while the x87 FPU remains untouched. The most logical explanation is that the x87 FPU is completely separate from the SIMD unit as of Skylake (and possibly going back a bit further).

Likewise, if you look at the die shot for Skylake, you can see 8 identical squares in a 2x4 pattern (16 in a 4x4 on Skylake X). Each square is a 64-bit FP-FMA lane. They are all identical and the same size. There's no "special" one which looks a bit bigger - as would be needed for a 64-bit multiplier instead of a 53-bit multiplier.

-----

There isn't enough evidence to say much about Ryzen. The x87 FPU latencies are 5 cycles for both fadd and fmul. That's higher than all the SIMD latencies except for the FMA. So it's hard to say whether Ryzen has a dedicated x87 FPU, or whether one of the SIMD lanes is "bigger" to accommodate the 64-bit multiplier that's needed. And as I mentioned before, there's the possibility of the x87 FPU sharing the 64-bit multiplier with the scalar integer unit. But someone needs to test this. Even though a 64-bit multiplier would be large in silicon, it's not obvious from the die shots, since there's only one of them, as opposed to the SIMD lanes, which are the same thing copy-pasted multiple times. Quote:
Last fiddled with by Mysticial on 2019-05-30 at 18:38 |
#19 |
|
Oct 2007
Manchester, UK
5³·11 Posts |
Quote:
Edit: I should add that JPL explicitly state they use quad precision during simulation when generating their planetary ephemeris data sets, so I presume they do not use x86. Last fiddled with by lavalamp on 2019-05-30 at 19:48 |
#20 |
|
∂²ω=0
Sep 2002
República de California
2²×2,939 Posts |
A few thoughts re. x87 80-bit FP and general 128-bit FP usage:
o Like Alex (Mysticial), I've long been an advocate for instruction sharing of expensive hardware, mainly multipliers. For the x87, having both floating and integer MUL share a single 64x64 hardware multiplier makes perfect sense. But I strongly believe that Intel and AMD have simply 'frozen' their current legacy-core-on-die tech - it'll keep benefiting from process size reductions, but it's a fixed block of IP; they are not dedicating any engineering resources to changing anything in there.

o Another instruction which GIMPS makes huge use of, and for which hardware support would be useful (but will likely not happen), is paired add/subtract, a +- b. This is a huge component in transform arithmetic. The idea is that in an FP context, the add/sub of the 2 significands is only part of a longer execution pipeline, which proceeds (in simplified form, ignoring under- and overflow handling) something like this for FADD:

1. Unpack the FP inputs to extract sign, exponent and significand (restoring the hidden bit in the latter if normalized, i.e. not underflowed);
2. Compute the absolute difference of a_exp and b_exp, and right-shift the significand of the smaller-exponent datum |a_exp - b_exp| bits to align the data;
3. Add the aligned significands;
4. Round the low bits of the sum (hardware will have several extra bits at the low end to support IEEE rounding rules). We round before checking for an add-carryout because of the possibility of a carry rippling all the way from the least- to the most-significant bit on rounding;
5. If a carry bit results, shift the sum rightward one place and add one to the larger of the 2 exponent fields, yielding the exponent field of the output;
6. Repack the output into IEEE64 form.

For paired add/sub, steps 1 and 2 can be done just once, at which point copies of the resulting unpacked/aligned data get sent to the dedicated add and subtract logic needed for steps 3-5.
The shared-computation savings would be even greater for an FMA-based add/sub butterfly, a*b +- c, because the MUL need only be done once.

o The main use cases I know of for 128-bit FP are in finance and scientific computation. I come from the latter milieu, so understand that better. The old Cray supercomputers, before Cray moved to building around commodity microprocessors, had hardware support for 128-bit quad-precision FP. But even there the economics were such that it now makes more sense to use commodity microprocessors and support 128-bit FP via emulation in software.

o The old DEC VAX-11 architecture had two distinct 64-bit FP types: G_floating, which is more or less the same as today's IEEE64 with 11 bits for the scaled exponent, and D_floating, which gained 3 significand bits at the expense of the exponent; the latter's 8 bits only support an operand range of approximately plus or minus 2.9E-39 to plus or minus 1.7E+38. But it's expensive to support even just a single float type in hardware, so it makes sense to pick one which gives a broadest-general-usage-optimized balance of precision and range. Having broadly standardized industry-wide rules for these, as well as for rounding, is vitally important in the era of ubiquitous computation. That's why IEEE stepped in decades ago, and why we have just a single industry-wide IEEE64 floating-point standard. And there is also a codified IEEE FP128 standard known as "binary128", with 15 exponent bits and 112 explicit significand bits (i.e. 113 bits, including the hidden bit). That Wikipage also has a section re. hardware support -- it looks like specialty IBM hardware is the only kind which currently offers such.

Last fiddled with by ewmayer on 2019-05-30 at 20:15 |
#21 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
17220₈ Posts |
Quote:
Last fiddled with by kriesel on 2019-05-30 at 20:57 |
#22 |
|
Feb 2016
UK
2⁶×7 Posts |
Quote:
Quote: