mersenneforum.org Long Double support on Ryzen?

 2019-05-29, 15:23 #1 lavalamp     Oct 2007 Manchester, UK 2514_8 Posts Long Double support on Ryzen? Hi guys, I've tried to find this out from reading around the net, but can't. Basically I have a niche use for the 80-bit extended precision data type on Intel x86 CPUs, sometimes called long double, but I don't know whether it is included in Ryzen CPUs. I know it's an older and slower part of the core from yesteryear, but it's still of some use to me. Anyway, I was hoping that someone with a Ryzen CPU might run the following C++ code and test it out directly. ryzen_test.cpp: Code:
#include <iostream>
#include <iomanip>
#include <limits>
#include <cmath>

int main()
{
    std::cout << "Char size: " << sizeof(char) << std::endl;
    std::cout << "Short size: " << sizeof(short) << std::endl;
    std::cout << "Int size: " << sizeof(int) << std::endl;
    std::cout << "Long Int size: " << sizeof(long) << std::endl;
    std::cout << "Long Long Int size: " << sizeof(long long) << std::endl << std::endl;

    std::cout << "Float size: " << sizeof(float) << std::endl;
    std::cout << "Double size: " << sizeof(double) << std::endl;
    std::cout << "Long Double size: " << sizeof(long double) << std::endl << std::endl;

    std::cout << "Float fraction: " << std::numeric_limits<float>::digits << std::endl;
    std::cout << "Double fraction: " << std::numeric_limits<double>::digits << std::endl;
    std::cout << "Long Double fraction: " << std::numeric_limits<long double>::digits << std::endl << std::endl;

    std::cout << std::setprecision(20);
    std::cout << std::sqrt(5.0F) << std::endl;
    std::cout << std::sqrt(5.0) << std::endl;
    std::cout << std::sqrt(5.0L) << std::endl << std::endl;

    return 0;
}
It can be compiled and run with the following command on Linux: Code: g++ ryzen_test.cpp -march=native -o ryzen_test && ./ryzen_test &> ryzen_test_results.txt I shouldn't think it matters whether it's a first- or second-generation Ryzen, but I'd appreciate it if you'd post which CPU was used.
 2019-05-29, 15:31 #2 Mysticial     Sep 2016 2·167 Posts The x87 FPU instructions still exist on Ryzen. There's way too much legacy software that still uses it. So they can't get rid of it. Though it probably will get progressively slower as time goes on. So the real question here is whether the compiler still supports generating these instructions.
2019-05-29, 16:07   #3
lavalamp

Oct 2007
Manchester, UK

2^2·3·113 Posts

Quote:
 Originally Posted by Mysticial There's way too much legacy software that still uses it. So they can't get rid of it.
On a purely selfish note, yay! (At least until 128 bit floats are natively supported.)

The compilers still definitely support those instructions, I've tested with Intel, GCC and Clang compilers.

I have wondered about the possibility of removing instructions from x86, but it honestly seems too difficult. Perhaps when mainstream computing moves to another architecture (RISC-V?) 80 bit floats will die.

2019-05-29, 19:17   #4
TheJudger

"Oliver"
Mar 2005
Germany

11×101 Posts

Quote:
 Originally Posted by lavalamp I have wondered about the possibility of removing instructions from x86, but it honestly seems too difficult.
AFAIK most (all except prefetch?) of AMD's 3DNow! instructions are no longer available on recent AMD CPUs. Not sure whether you'd call this x86 or not.

Oliver

2019-05-29, 19:55   #5
ewmayer
2ω=0

Sep 2002
República de California

2D7F_16 Posts

Quote:
 Originally Posted by TheJudger AFAIK most (all expect prefetch?) of AMDs 3Dnow instructions are no longer available on recent AMD CPUs. Not sure whether you call this x86 or not. Oliver
By 'legacy x86/x87' we typically mean the common core instruction set which predates the introduction of the various SIMD extensions, with regard to which Intel and AMD did a great job of fragmenting and mutually incompatible-izing things. A small bit of inline assembly or a compiler intrinsic for parsing CPUID is handy for sorting out which features are or are not available on a particular x86-based CPU.

 2019-05-29, 19:59 #6 retina Undefined     "The unspeakable one" Jun 2006 My evil lair 2^4×389 Posts I think the FPU code will live forever. By that I mean the code will live and work, but not necessarily the hardware behind it. For now the hardware still supports it, so no problem. But in the future, if good emulator code becomes available, it might be decided to let the OS do it in software to make room for some other new killer hardware feature.
2019-05-29, 21:10   #7
Mysticial

Sep 2016

2×167 Posts

Quote:
 Originally Posted by retina I think the FPU code will live forever. And I mean that the code will live and work, but not necessarily the hardware behind it. For now the hardware is still supporting it, so no problem. But in the future, if there is some good emulator code available, it might be decided to let the OS do it in software to make room for some other new killer hardware feature.
Based on the latencies of the x87 and SIMD instructions on Skylake, I can deduce that the x87 FPU hardware is likely its own execution unit that's separate from the SIMD. This allows them to make an optimized and "canonical" SIMD lane that can be easily copy-pasted for wider SIMD. (we can already see this from the Skylake die shots)

So Intel is already in the process of gutting the x87 hardware. Whether it actually disappears at some point is a harder call, since the only reason to get rid of it is if it becomes a burden on the rest of the chip.
• The area required for the x87 FPU shouldn't be an issue. If there's enough dark silicon to implement all those specialized instructions, there should be enough to keep around the (now dedicated) x87 FPU. Furthermore, the x87 FPU can use the same 64-bit multiplier as the scalar integer unit.*
• The logistics of maintaining the x87 FPU stack and registers in the execution engine could become costly enough to force Intel or AMD to get rid of it at some point. If and when they do kill off x87, MMX will probably go with it since it uses the same set of registers.

*Speaking of which, the integer multiply and the x87 fmul go into different execution ports. So it's possible to write a benchmark to determine whether "fmul" uses the same 64-bit multiplier as "mul/imul/mulx". Anyone want to volunteer to try this benchmark?

 2019-05-30, 09:38 #8 mackerel     Feb 2016 UK 2^2×109 Posts x87 has been included in CPU hardware since... the 486DX? Over 25 years ago... and optionally as a co-processor long before that. Still I wonder how much code today depends on it that can't be done adequately with FP64. The only software I use that could fit that would be genefer. What would be the performance cost to transparently emulate it in either hardware or software (assuming the OS has the capability to do so)? I also wonder why we've seemingly been stuck at 64 bits for so long. No 128-bit data size? I mean, excluding SIMD-like operations on multiple smaller sizes at once. More so in GPUs we see the ability to split FP64 units into smaller sizes; can't they go the other way? Combine two 64-bit units to perform as a single 128-bit unit? I assume I'm ignorant of some implementation cost making it undesirable for the few use cases it might serve. On Mysticial's last question, it's way beyond my ability, but it sounds like something that might be in Agner Fog's architecture guide.
 2019-05-30, 12:48 #9 retina Undefined     "The unspeakable one" Jun 2006 My evil lair 6224_10 Posts Re: 128 bits You'd need to convince Intel/AMD of the need for 128-bit values. I don't imagine there are any hard problems with widening the arithmetic units. Aside from allocating some more transistors and wiring them up, there isn't much to do to support it. The basics of making wider adders and multipliers are well known. But no company will do that if the use case is only a tiny fraction of applications.
2019-05-30, 15:11   #10
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2^4×3×113 Posts

Quote:
 Originally Posted by mackerel x87 has been included in CPU hardware since... 486DX? Over 25 years ago... and optionally as co-pro long before that.
Yes; the 8086 or 8088 plus the 8087, designed 1977-1980; the original IBM PC released in 1981 had a separate empty socket for the 8087. Through the 80386 (and the 486SX), it was a separate coprocessor. I still have a 386 and 387 from 1990. https://en.wikipedia.org/wiki/Intel_8087 Cyrix made a faster drop-in replacement for the 387, the Cyrix FasMath 83D87, which displaced mine. https://en.wikipedia.org/wiki/Cyrix. Along with a TI 486SLC-66, a double-clocked 486SX core on a tiny circuit board that interfaced to a 386's pinout, to boost the performance of a 386/33 system. https://arstechnica.com/civis/viewto...33353&start=40

Last fiddled with by kriesel on 2019-05-30 at 15:18

2019-05-30, 15:50   #11
Mysticial

Sep 2016

2·167 Posts

Quote:
 Originally Posted by mackerel I also wonder why we've seemingly stuck at 64 bits for a long time. No 128-bit data size? I mean, excluding SIMD-like operations on multiple smaller sizes at once. More so in GPUs we see the ability to split FP64 units into smaller sizes, can't they go the other way? Combine two 64-bit units to perform as a single 128-bit unit? I assume I'm ignorant of some implementation cost making it undesirable for the few use cases it might be used for.
Because of all the DL/AI stuff, it's going in the opposite direction: FP16, BF16, and 8-bit stuff...

 At one of the conferences I went to in the past year, they were talking about using 1-bit precision for DL.

Quote:
 On Mysticial's last question, it's way beyond my ability but sounds like something that might be in Agner Fog's architecture guide.
Agner has the port information. But he doesn't normally test for EU-sharing across different ports.

Quote:
 Originally Posted by retina You'd need to convince Intel/AMD of the need for 128 bit values. I don't imagine there are any hard problems with widening the arithmetic units. Aside from allocating some more transistors and wiring them up there isn't much to do to support it. The basics of making wider adders and multipliers is well known. But no company will do that if the usage case is only a tiny fraction of applications.
To get something like a full-throughput "vmulpq" (quad-precision FP multiply), you'd need to double the area and power consumption for that, and you'd get half as many lanes as double-precision since the data width is now 128 bits.

Current Intel chips seem to be designed natively around 64-bit data widths. No execution unit can cross a 64-bit boundary except the shuffle unit.

Looking at die shots of Skylake, a single FMA lane is already pretty big. If they doubled it up in size, we're looking at maybe 20% of the entire core, including all the L1. And having all that silicon that's only for 128-bit FP would probably hurt yields as well.

Last fiddled with by Mysticial on 2019-05-30 at 15:50

