mersenneforum.org  

Old 2019-05-29, 15:23   #1
lavalamp
 
Oct 2007
London, UK

2²·5²·13 Posts
Long Double support on Ryzen?

Hi guys, I've tried to find this out from reading around the net but can't. Basically I have a niche use for the 80 bit extended precision data type on Intel x86 CPUs, sometimes called Long Double, but I don't know if this is included in Ryzen CPUs. I know it's an older and slower part of the core from yesteryear, but still of some use to me.

Anyway, I was hoping that someone with a Ryzen CPU might run the following C++ code and just test it out directly.

ryzen_test.cpp:
Code:
#include <cmath>
#include <iomanip>
#include <iostream>
#include <limits>

int main()
{
    std::cout << "Char size: " << sizeof(char) << std::endl;
    std::cout << "Short size: " << sizeof(short) << std::endl;
    std::cout << "Int size: " << sizeof(int) << std::endl;
    std::cout << "Long Int size: " << sizeof(long) << std::endl;
    std::cout << "Long Long Int size: " << sizeof(long long) << std::endl
              << std::endl;

    std::cout << "Float size: " << sizeof(float) << std::endl;
    std::cout << "Double size: " << sizeof(double) << std::endl;
    std::cout << "Long Double size: " << sizeof(long double) << std::endl
              << std::endl;

    std::cout << "Float fraction: " << std::numeric_limits<float>::digits
              << std::endl;
    std::cout << "Double fraction: " << std::numeric_limits<double>::digits
              << std::endl;
    std::cout << "Long Double fraction: "
              << std::numeric_limits<long double>::digits << std::endl
              << std::endl;

    std::cout << std::setprecision(20);
    std::cout << std::sqrt(5.0F) << std::endl;
    std::cout << std::sqrt(5.0) << std::endl;
    std::cout << std::sqrt(5.0L) << std::endl << std::endl;

    return 0;
}
It can be compiled and run with the following command on Linux:
Code:
g++ ryzen_test.cpp -march=native -o ryzen_test && ./ryzen_test &> ryzen_test_results.txt
I shouldn't think it matters whether using Ryzen 1 or 2, but I'd appreciate it if you'd post which CPU was used.
Old 2019-05-29, 15:31   #2
Mysticial
 
Sep 2016

7×47 Posts

The x87 FPU instructions still exist on Ryzen. There's way too much legacy software that still uses it. So they can't get rid of it. Though it probably will get progressively slower as time goes on.

So the real question here is whether the compiler still supports generating these instructions.
Old 2019-05-29, 16:07   #3
lavalamp
 
Oct 2007
London, UK

2²×5²×13 Posts

Quote:
Originally Posted by Mysticial
There's way too much legacy software that still uses it. So they can't get rid of it.
On a purely selfish note, yay! (At least until 128 bit floats are natively supported.)

The compilers still definitely support those instructions, I've tested with Intel, GCC and Clang compilers.

I have wondered about the possibility of removing instructions from x86, but it honestly seems too difficult. Perhaps when mainstream computing moves to another architecture (RISC-V?) 80 bit floats will die.
Old 2019-05-29, 19:17   #4
TheJudger
 
"Oliver"
Mar 2005
Germany

1107₁₀ Posts

Quote:
Originally Posted by lavalamp
I have wondered about the possibility of removing instructions from x86, but it honestly seems too difficult.
AFAIK most (all except prefetch?) of AMD's 3DNow! instructions are no longer available on recent AMD CPUs. Not sure whether you call this x86 or not.

Oliver
Old 2019-05-29, 19:55   #5
ewmayer
2ω=0
 
Sep 2002
República de California

101·113 Posts

Quote:
Originally Posted by TheJudger
AFAIK most (all except prefetch?) of AMD's 3DNow! instructions are no longer available on recent AMD CPUs. Not sure whether you call this x86 or not.

Oliver
By 'legacy x86/x87' we typically refer to the common core instruction set which predates introduction of the various SIMDs, with regard to which Intel and AMD did a great job fragmenting and mutually incompatible-izing. A small bit of inline assembly or compiler-intrinsic for parsing CPUID is handy in terms of sorting out the various features which may or may not be available on a particular x86-based CPU.
Old 2019-05-29, 19:59   #6
retina
Undefined
 
"The unspeakable one"
Jun 2006
My evil lair

5628₁₀ Posts

I think the FPU code will live forever.

And I mean that the code will live and work, but not necessarily the hardware behind it. For now the hardware is still supporting it, so no problem. But in the future, if there is some good emulator code available, it might be decided to let the OS do it in software to make room for some other new killer hardware feature.
Old 2019-05-29, 21:10   #7
Mysticial
 
Sep 2016

7×47 Posts

Quote:
Originally Posted by retina
I think the FPU code will live forever.

And I mean that the code will live and work, but not necessarily the hardware behind it. For now the hardware is still supporting it, so no problem. But in the future, if there is some good emulator code available, it might be decided to let the OS do it in software to make room for some other new killer hardware feature.
Based on the latencies of the x87 and SIMD instructions on Skylake, I can deduce that the x87 FPU hardware is likely its own execution unit that's separate from the SIMD. This allows them to make an optimized and "canonical" SIMD lane that can be easily copy-pasted for wider SIMD. (we can already see this from the Skylake die shots)

So Intel is already in the process of gutting out the x87 hardware. Though whether it actually disappears at some point is harder to say, since the only reason to get rid of it would be that it becomes a burden on the rest of the chip.
  • The area required for the x87 FPU shouldn't be an issue. If there's enough dark silicon to implement all those specialized instructions, there should be enough to keep around the (now dedicated) x87 FPU. Furthermore, the x87 FPU can use the same 64-bit multiplier as the scalar integer unit.*
  • The logistics of maintaining the x87 FPU stack and registers in the execution engine could be great enough to force Intel or AMD to get rid of it at some point. If and when they do kill off x87, the MMX will probably go with it since it uses the same set of registers.

*Speaking of which, the integer multiply and the x87 fmul go into different execution ports. So it's possible to write a benchmark to determine whether "fmul" uses the same 64-bit multiplier as "mul/imul/mulx". Anyone want to volunteer to try this benchmark?
Old 2019-05-30, 09:38   #8
mackerel
 
Feb 2016
UK

2·191 Posts

x87 has been included in CPU hardware since... 486DX? Over 25 years ago... and optionally as co-pro long before that. Still I wonder how much code today depends on it that can't be done adequately with FP64. The only software I use that could fit that would be genefer. What would be the performance cost to transparently emulate it in either hardware or software (assuming OS has capability to do so)?

I also wonder why we've seemingly stuck at 64 bits for a long time. No 128-bit data size? I mean, excluding SIMD-like operations on multiple smaller sizes at once. More so in GPUs we see the ability to split FP64 units into smaller sizes, can't they go the other way? Combine two 64-bit units to perform as a single 128-bit unit? I assume I'm ignorant of some implementation cost making it undesirable for the few use cases it might be used for.

On Mysticial's last question, it's way beyond my ability but sounds like something that might be in Agner Fog's architecture guide.
Old 2019-05-30, 12:48   #9
retina
Undefined
 
"The unspeakable one"
Jun 2006
My evil lair

2²·3·7·67 Posts
Re: 128 bits

You'd need to convince Intel/AMD of the need for 128 bit values.

I don't imagine there are any hard problems with widening the arithmetic units. Aside from allocating some more transistors and wiring them up there isn't much to do to support it. The basics of making wider adders and multipliers are well known. But no company will do that if the use case is only a tiny fraction of applications.
Old 2019-05-30, 15:11   #10
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2³·23² Posts

Quote:
Originally Posted by mackerel
x87 has been included in CPU hardware since... 486DX? Over 25 years ago... and optionally as co-pro long before that.
Yes; the 8086 or 8088 and the 8087, designed 1977-1980; the original IBM PC, released in 1981, had a separate empty socket for the 8087. Through the 80386 and 486SX, it was a separate coprocessor. I still have a 386 and 387 from 1990. https://en.wikipedia.org/wiki/Intel_8087 Cyrix made a faster drop-in replacement for the 387, the Cyrix FasMath 83D87, which displaced mine (https://en.wikipedia.org/wiki/Cyrix), along with a TI 486SLC-66, a double-speed 486SX core on a tiny circuit board that interfaced with a 386's pinout, to boost the performance of a 386/33 system. https://arstechnica.com/civis/viewto...33353&start=40

Last fiddled with by kriesel on 2019-05-30 at 15:18
Old 2019-05-30, 15:50   #11
Mysticial
 
Sep 2016

7·47 Posts

Quote:
Originally Posted by mackerel
I also wonder why we've seemingly stuck at 64 bits for a long time. No 128-bit data size? I mean, excluding SIMD-like operations on multiple smaller sizes at once. More so in GPUs we see the ability to split FP64 units into smaller sizes, can't they go the other way? Combine two 64-bit units to perform as a single 128-bit unit? I assume I'm ignorant of some implementation cost making it undesirable for the few use cases it might be used for.
Because of all the DL/AI stuff, it's going in the opposite direction: FP16, BF16, and 8-bit stuff...

At one of the conferences I went to last year, they were talking about using 1-bit precision for DL.

Quote:
On Mysticial's last question, it's way beyond my ability but sounds like something that might be in Agner Fog's architecture guide.
Agner has the port information. But he doesn't normally test for EU-sharing across different ports.


Quote:
Originally Posted by retina
You'd need to convince Intel/AMD of the need for 128 bit values.

I don't imagine there are any hard problems with widening the arithmetic units. Aside from allocating some more transistors and wiring them up there isn't much to do to support it. The basics of making wider adders and multipliers are well known. But no company will do that if the use case is only a tiny fraction of applications.
To get something like a full-throughput "vmulpq" (quad-precision FP multiply), you'd need to double the amount of area and power consumption for that - and for half as many lanes as double-precision since the data width is now 128-bit.

Current Intel chips seem to be very natively designed around 64-bit data widths. No execution unit can cross a 64-bit boundary except for the shuffle unit.

Looking at the die-shots of Skylake, a single FMA lane is actually pretty big already. If they double it up in size, we're looking at maybe 20% of the entire core - including all the L1. And having all that silicon that's only for 128-bit FP is probably going to hurt yields as well.

Last fiddled with by Mysticial on 2019-05-30 at 15:50

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.