mersenneforum.org  

2019-10-02, 11:20   #23
retina
Undefined
 
"The unspeakable one"
Jun 2006
My evil lair
2³×727 Posts

Quote:
Originally Posted by ldesnogu
And you know for sure that Apple itself won't make a laptop with a proper OS and an ARM chip? No one knows that for sure. And that's exactly why I phrased my very first comment in this thread the way I did (though I certainly was ambiguous when I talked about 'chips' rather than 'core' or 'CPU').
This thread is about servers. Even if Apple do make a laptop, AND open the CPU specs for everyone, it wouldn't apply here. When I make my servers I don't make them from mobile Intel or AMD chips, I use server chips. And I need full documentation for everything, or else I can't make them.

2019-10-02, 11:40   #24
ldesnogu
Jan 2008
France
210₁₆ Posts

Well yes, but at least I did post some info about ARM servers. Conversation closed.

2019-10-02, 11:47   #25
ldesnogu
Jan 2008
France
2⁴·3·11 Posts

TheNextPlatform: Arm’s Chances In Servers May Hinge On Success In HPC

2019-10-15, 17:30   #26
ldesnogu
Jan 2008
France
2⁴×3×11 Posts

Another article from TheNextPlatform: Growing Up In An HPC World.

2019-11-07, 16:25   #27
ldesnogu
Jan 2008
France
2⁴×3×11 Posts

Slides about upcoming Fujitsu chip with SVE: http://www.ssken.gr.jp/MAINSITE/even...PCF_shinjo.pdf


It's in Japanese but with enough English to get a few things. Impressive beast.

2019-11-08, 21:26   #28
ewmayer
2ω=0
Sep 2002
República de California
9,791 Posts

Quote:
Originally Posted by ldesnogu
Slides about upcoming Fujitsu chip with SVE: http://www.ssken.gr.jp/MAINSITE/even...PCF_shinjo.pdf

It's in Japanese but with enough English to get a few things. Impressive beast.
If I read the English snips correctly, it uses 512-bit SIMD processed in 128-bit chunks by the read/write ports ... that would need a 512-bit ARMified version of my AVX-512 asm routines for Mlucas to exploit. Will any of these make their way in single-chip form into HPC workstations, or are they strictly for the supercomputer market?

At some point I'll surely need to do both 256-bit and 512-bit ARMv8 coding, but said SIMD widths would need to appear in volume in some consumer-market form (e.g. smartphones) to make the effort worthwhile.

2019-11-09, 09:32   #29
ldesnogu
Jan 2008
France
2⁴·3·11 Posts

Quote:
Originally Posted by ewmayer
If I read the English snips correctly, it uses 512-bit SIMD processed in 128-bit chunks by the read/write ports ...
I'm not sure where you got that 128-bit from. For the rest, yes, 2 x 512-bit wide FMA.

Quote:
that would need a 512-bit ARMified version of my AVX-512 asm routines for Mlucas to exploit.
Yes, you'd need a dedicated path for using SVE, it's not the same instruction set as NEON.

Quote:
Will any of these make their way in single-chip form into HPC workstations, or are they strictly for the supercomputer market?
I don't know for sure. But if it comes to low power devices it's likely it won't be very wide (both vector width, and number of units).

Quote:
At some point I'll surely need to do both 256-bit and 512-bit ARMv8 coding, but said SIMD widths would need to appear in volume in some consumer-market form (e.g. smartphones) to make the effort worthwhile.
I think you missed the point of SVE: you don't need to care (too much) about the hardware vector length. I think this document explains things nicely.

But that's perhaps me being naive (and underestimating your knowledge of SVE, sorry), and you still might care a lot due to FFT structure. At the very least, the instructions would be the same (contrary to AVX2 vs AVX-512).

I know you don't have a lot of free time, but if you ever want to start playing with SVE, you can pick a recent ARM cross-compiler and QEMU emulator (the SVE support in QEMU is validated).
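
For anyone who wants a feel for the "don't care (too much) about the vector length" idea before setting up a cross-compiler, here is a minimal sketch in plain Python rather than real SVE code. Everything in it is an assumption made for illustration: the function name `vla_fma`, the `lanes` parameter standing in for the hardware vector length in doubles, and the per-lane predicate that mimics the role SVE's WHILELT instruction plays in a loop. It only models the control flow and says nothing about performance.

```python
def vla_fma(a, x, y, lanes):
    """Return a[i]*x[i] + y[i] for all i, processed `lanes` elements at a time.

    `lanes` is a hypothetical stand-in for the hardware vector length
    (2 doubles for 128-bit, 4 for 256-bit, 8 for 512-bit vectors).
    """
    n = len(a)
    out = list(y)
    i = 0
    while i < n:
        # Predicate: lane j is active only while i + j < n, so the ragged
        # tail needs no separate scalar clean-up loop.
        active = [i + j < n for j in range(lanes)]
        for j, on in enumerate(active):
            if on:
                out[i + j] = a[i + j] * x[i + j] + y[i + j]
        i += lanes  # advance by however many lanes this "hardware" has
    return out


# 11 elements: not a multiple of 2, 4 or 8, so every width hits a partial vector.
a = [float(k) for k in range(11)]
x = [2.0] * 11
y = [1.0] * 11
results = {lanes: vla_fma(a, x, y, lanes) for lanes in (2, 4, 8)}
```

The loop body is identical for 2, 4 or 8 lanes and gives the same results each time, which is the property that lets the same SVE instructions run on hardware with different vector widths.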

2019-11-09, 20:11   #30
ewmayer
2ω=0
Sep 2002
República de California
263F₁₆ Posts

Quote:
Originally Posted by ldesnogu
I'm not sure where you got that 128-bit from. For the rest, yes, 2 x 512-bit wide FMA.
I mis-read slide #8, the one showing the read/write ports, as '128 bits/cycle', but in fact it reads '128 bytes/cycle'.

Quote:
I think you missed the point of SVE: you don't need to care (too much) about the hardware vector length. I think this document explains things nicely.

But that's perhaps me being naive (and underestimating your knowledge of SVE, sorry), and you still might care a lot due to FFT structure. At the very least, the instructions would be the same (contrary to AVX2 vs AVX-512).
Having data-width-independent instructions would be great, to be sure - but even so, there are 2 key areas where the width still matters:

1. Literal byte address offsets in asm instructions - this could surely be parameterized, say via a literal byte argument to the asm macros whose value is set at build time;

2. The FFT data are arranged in memory in a SIMD-width-dependent fashion, e.g. for 256-bit SIMD we use quartets of doubles, whereby 4 complex [re,im] double-pairs are stored as [0.re,1.re,2.re,3.re],[0.im,1.im,2.im,3.im]. In a typical FFT step we butterfly several such data segments from disjoint (wide-stride-separated) portions of the big data array, call them segments A,B,C,D,... . There are 2 points in each FFT-convolution step - one bracketing the dyadic-mul step between the forward and inverse FFT, and another bracketing the round-and-carry step - where we need to transpose such data, e.g. in our 256-bit-double-quartets example, we need to take

[A0.re,A1.re,A2.re,A3.re]
[B0.re,B1.re,B2.re,B3.re]
[C0.re,C1.re,C2.re,C3.re]
[D0.re,D1.re,D2.re,D3.re]

and transpose those (and similarly for the im-parts) to

[A0.re,B0.re,C0.re,D0.re]
[A1.re,B1.re,C1.re,D1.re]
[A2.re,B2.re,C2.re,D2.re]
[A3.re,B3.re,C3.re,D3.re]

Such transposes unavoidably involve data-width-dependent shuffle/permute instructions. In an SVE-style paradigm, the way to handle this would seem to be to break the transpose work out of the macros where it currently is combined with other non-transpose operations, thus minimizing the amount of asm code with data-width-dependent instructions.
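
The two width-dependent pieces described above can be sketched in a few lines. This is a plain-Python model, not Mlucas code: `vector_byte_offset`, `transpose_block` and the `lanes` parameter are hypothetical names, with `lanes` playing the role of a value fixed at build time (4 doubles for 256-bit SIMD, 8 for 512-bit). Real code would do the transpose with shuffle/permute instructions; the list comprehension here only shows which element ends up where.

```python
DOUBLE_BYTES = 8  # size of one IEEE double


def vector_byte_offset(k, lanes):
    """Byte offset of the k-th vector when each vector holds `lanes` doubles.

    Models point 1: the literal offset depends only on a width parameter
    that could be set once at build time.
    """
    return k * lanes * DOUBLE_BYTES


def transpose_block(rows):
    """Transpose a square lanes-by-lanes block (one list per SIMD register).

    Models point 2: row r of the input holds the elements of segment r,
    row c of the output holds element c of every segment.
    """
    lanes = len(rows)
    if any(len(r) != lanes for r in rows):
        raise ValueError("expected a square lanes-by-lanes block")
    return [[rows[r][c] for r in range(lanes)] for c in range(lanes)]


# The 256-bit example from the post: segments A..D, four real parts each.
block = [[f"{seg}{k}.re" for k in range(4)] for seg in "ABCD"]
transposed = transpose_block(block)
for row in transposed:
    print(row)
```

Only `transpose_block` and the offset helper know about `lanes`; everything else could in principle be written once, which is the separation the post argues for.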

Quote:
I know you don't have a lot of free time, but if you ever want to start playing with SVE, you can pick a recent ARM cross-compiler and QEMU emulator (the SVE support in QEMU is validated).
That will be worth looking into once I complete work on the coming-soon Mlucas v19 code release, the one which adds support for PRP-testing with Gerbicz check.

2019-11-19, 07:29   #31
ldesnogu
Jan 2008
France
2⁴×3×11 Posts

Thanks for the explanation!


I took a look at SVE support in gcc. Right now the intrinsics are not there, they're coming in version 10. I'm not sure if you need them.

2019-11-19, 07:53   #32
ldesnogu
Jan 2008
France
210₁₆ Posts

Fujitsu A64FX system (with SVE) tops Green 500 and ranks 159 on Top 500. It achieves 85% of the theoretical peak TFLOPS on LINPACK.

Last fiddled with by ldesnogu on 2019-11-19 at 07:54

2019-11-19, 19:42   #33
ewmayer
2ω=0
Sep 2002
República de California
23077₈ Posts

Quote:
Originally Posted by ldesnogu
I took a look at SVE support in gcc. Right now the intrinsics are not there, they're coming in version 10. I'm not sure if you need them.
I don't use intrinsics myself; I prefer to work 'close to the metal'. So the thing I need is inline-asm support from the compiler/assembler.

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.