mersenneforum.org

Knights Landing reservations (https://www.mersenneforum.org/showthread.php?t=20152)

GP2 2016-08-29 14:24

[QUOTE=ewmayer;440950]Bottom line: Around $5K as I surmised, including all the sweet Intel compiler&tuning tools. If we can get 10 people together and 1 to play physical host, that's $500 each, roughly what I paid last year for my little Intel 2-core Broadwell NUC. Who's interested on those terms?[/QUOTE]

How would the funds be collected? Perhaps a third-party site like GoFundMe would allow smaller contributions from a larger circle.

However, maybe it's jumping the gun. If a KNL machine was acquired then that would really put George on the spot to start tackling an enormous amount of work right away. I know that you would use the machine for Mlucas and others would have their other uses for it, but I think the goal of most would-be funders would be to further mprime/Prime95 development.

Maybe some other funding could be earmarked instead to finding an assembler expert on a RentACoder type site who could carry out some of the drudge work of converting MASM macros to NASM. Would that help at all? Just thinking out loud.

It sounds like that sort of preliminary groundwork would need to be done in any case before the feasibility of a KNL version of mprime/Prime95 could even be contemplated. And much of it (i.e., the existing code base) would not require a KNL machine to do it.

airsquirrels 2016-08-29 15:40

[QUOTE=GP2;440970]...I know that you would use the machine for Mlucas and others would have their other uses for it, but I think the goal of most would-be funders would be to further mprime/Prime95 development.

Maybe some other funding could be earmarked instead to finding an assembler expert on a RentACoder type site who could carry out some of the drudge work of converting MASM macros to NASM. Would that help at all?...[/QUOTE]

I am no expert on the nuanced differences between MASM and NASM, though I do write NASM from time to time. With that said, if it really is limited to the syntactical differences suggested by [url]http://left404.com/2011/01/04/converting-x86-assembly-from-masm-to-nasm-3/[/url] then perhaps I could find the time. Looks like there are about 30288 non-comment lines of assembly.

tServo 2016-08-29 20:30

[QUOTE=Prime95;440925]I dread doing this development:

2) AVX2 code took a year to develop. I've not been in a coding mood recently.
3) Let's face it KNL is a niche product, so the KNL optimized code would be little used.[/QUOTE]

If we do this, there should be no pressure on George to participate; implicit or explicit. He's right about it being a niche product. How many people are going to have these processors?
If I decide to participate, it would be out of curiosity and something to put on a resume, perhaps.

airsquirrels 2016-08-29 20:53

Is prime95 version controlled at all? Or is there just a static source download from George's latest work?

I think the goal would be to make sure Prime95 continues to be available 2-3 years out when AVX-512 in some fashion exists on consumer grade CPUs. If migrating from MASM->NASM is required to handle that development then it has to happen at some point. I would just be concerned about putting in the requisite work but ending up with an orphaned fork of Prime95.

chalsall 2016-08-29 21:20

[QUOTE=airsquirrels;441005]I would just be concerned about putting in the requisite work but ending up with an orphaned fork of Prime95.[/QUOTE]

You raise an important (but possibly uncomfortable) question: Is GIMPS reliant upon George?

Moving Prime95/mprime development into a community space would make a lot of sense. And it could be done today.

There is always the "secret sauce" code (read: security via obscurity) to take into consideration. There are many ways this could be managed without preventing a "fork" today.

Absolutely no disrespect intended towards George in this message. I always have an "If I'm hit by a bus" document ready for automatic release for all my clients (and being hit by a bus is a surprisingly likely event here in Bim.)

Madpoo 2016-08-29 21:28

[QUOTE=tServo;441003]If we do this, there should be no pressure on George to participate; implicit or explicit. He's right about it being a niche product. How many people are going to have these processors?
If I decide to participate, it would be out of curiosity and something to put on a resume, perhaps.[/QUOTE]

I may be confused on the particulars, but AVX-512 on Knights Landing (Xeon Phi x200) is the same as what will be on the Xeon Skylake processors (Xeon E5/E7 v5), correct?

I wasn't aware of any differences in that regard.

Of course the other big differences are the # of cores and the memory/caching architecture, but the thing about KNL was the x86 compatibility, so unless I'm missing something, I thought optimizations for AVX-512 on KNL would also apply to future AVX-512 implementations on Xeon and desktop CPUs.

The best I could figure (and this was a while ago), future AVX-512 will have new features. The only incompatible 512-bit stuff was the jump from Knights Corner to Knights Landing... so yeah, the older Xeon Phi 71xx stuff wouldn't be worth coding for since it's a dead-end.

ewmayer 2016-08-29 21:41

[QUOTE=GP2;440970]How would the funds be collected? Perhaps a third-party site like GoFundMe would allow smaller contributions from a larger circle.[/quote]

Don't see why check and/or paypal shouldn't suffice.

[quote]However, maybe it's jumping the gun. If a KNL machine was acquired then that would really put George on the spot to start tackling an enormous amount of work right away. I know that you would use the machine for Mlucas and others would have their other uses for it, but I think the goal of most would-be funders would be to further mprime/Prime95 development.[/QUOTE]

That was not my intention, nor do I think George would take it that way. He's an adult, and I'm sure perfectly capable of deciding how his time is best spent. My intention was to set up a system for developers-not-necessarily-named-George with an interest in cutting-edge x86 vector/manycore programming to get up to speed on AVX512 and the manythread paradigm embodied by KL - which I believe is also the direction of future desktop CPUs, ever more cores.

[QUOTE=Madpoo;441011]I may be confused on the particulars, but AVX-512 on Knights Landing (Xeon Phi x200) is the same as what will be on the Xeon Skylake processors (Xeon E5/E7 v5), correct?

I wasn't aware of any differences in that regard.

Of course the other big differences are the # of cores and the memory/caching architecture, but the thing about KNL was the x86 compatibility, so unless I'm missing something, I thought optimizations for AVX-512 on KNL would also apply to future AVX-512 implementations on Xeon and desktop CPUs.

The best I could figure (and this was a while ago), future AVX-512 will have new features.[/QUOTE]
The fact that AVX512, unlike previous Xeon-Phi "special" instruction sets, *is* a standard which will also carry over to future desktop-CPUs, is why I have waited 'til now to do any serious Xeon-Phi-oriented coding.

Also, based on my reading of the first-gen AVX512 instruction set, while there will likely be enhancements in future updates, I doubt they will be anywhere near the level of, say, the AVX-to-AVX2 transition, in which Intel added FMA and rectified their stupidity of not including full-vector-width integer support. AVX512 has no obvious 'holes' like those.

ATH 2016-08-29 22:04

So it seems AVX-512 will not be available in Skylake-E, only in Skylake Xeon and then in Cannonlake. Until Cannonlake it will remain a niche product, since I don't expect many people to buy Skylake Xeons.

[url]https://en.wikipedia.org/wiki/AVX-512[/url]
[url]http://wccftech.com/mainstream-intel-core-processors-support-avx-512-skylake-xeon/[/url]

[CODE]
AVX-512 Subset      F    CDI  ERI  PFI  VL   BW   DQ   IFMA  VBMI
Knights Landing     Yes  Yes  Yes  Yes  -    -    -    -     -
Skylake Xeon (SKX)  Yes  Yes  -    -    Yes  Yes  Yes  -     -
Cannonlake          Yes  Yes  -    -    Yes  Yes  Yes  Yes   Yes

AVX-512 F Foundation (F) – expands most 32-bit and 64-bit based AVX instructions with EVEX coding scheme to support 512-bit registers, operation masks, parameter broadcasting, and embedded rounding and exception control, supported by Knights Landing and Skylake Xeon
AVX-512 CDI Conflict Detection Instructions (CDI) – efficient conflict detection to allow more loops to be vectorized, supported by Knights Landing[1] and Skylake Xeon

AVX-512 ERI Exponential and Reciprocal Instructions (ERI) – exponential and reciprocal operations designed to help implement transcendental operations, supported by Knights Landing[1]
AVX-512 PFI Prefetch Instructions (PFI) – new prefetch capabilities, supported by Knights Landing[1]

AVX-512 VL Vector Length Extensions (VL) – extends most AVX-512 operations to also operate on XMM (128-bit) and YMM (256-bit) registers[2]
AVX-512 BW Byte and Word Instructions (BW) – extends AVX-512 to cover 8-bit and 16-bit integer operations[2]
AVX-512 DQ Doubleword and Quadword Instructions (DQ) – adds new 32-bit and 64-bit AVX-512 instructions[2]

AVX-512 IFMA Integer Fused Multiply Add (IFMA) – fused multiply add of integers using 52-bit precision.
AVX-512 VBMI Vector Byte Manipulation Instructions (VBMI) – adds vector byte permutation instructions which were not present in AVX-512BW.
[/CODE]

[QUOTE=Madpoo;441011]I may be confused on the particulars, but AVX-512 on Knights Landing (Xeon Phi x200) is the same as what will be on the Xeon Skylake processors (Xeon E5/E7 v5), correct?[/QUOTE]

No, see above. Knights Landing will have ERI and PFI, which neither Xeon nor Cannonlake will have.
Xeon will add VL, BW and DQ, which Knights Landing does not have, and Cannonlake will add IFMA and VBMI.

I do not pretend to know what all these subsets mean, but unfortunately there are clearly differences between them all. It would be simpler if they all had all the subsets.
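
To make the overlap concrete, here is a minimal Python sketch that treats each CPU family's subset list as a set and computes what is shared and what is exclusive. The support data is copied from the summary in this post; the names `SUPPORT`, `common_subsets` and `unique_subsets` are made up for illustration.

```python
# Subset support per CPU family, as summarized in this post.
SUPPORT = {
    "Knights Landing": {"F", "CDI", "ERI", "PFI"},
    "Skylake Xeon (SKX)": {"F", "CDI", "VL", "BW", "DQ"},
    "Cannonlake": {"F", "CDI", "VL", "BW", "DQ", "IFMA", "VBMI"},
}


def common_subsets(support):
    """Return the subsets implemented by every listed CPU family."""
    return set.intersection(*support.values())


def unique_subsets(support, name):
    """Return the subsets that only the family `name` implements."""
    others = set().union(*(s for n, s in support.items() if n != name))
    return support[name] - others


if __name__ == "__main__":
    print("common:", sorted(common_subsets(SUPPORT)))
    for name in SUPPORT:
        print(name, "only:", sorted(unique_subsets(SUPPORT, name)))
```

Code that restricts itself to the common set (F and CDI, going by this data) should in principle run on all three families.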

airsquirrels 2016-08-29 22:16

[QUOTE=airsquirrels;440978]I am no expert on the nuanced differences between MASM and NASM, though I do write NASM from time to time. With that said, if it really is limited to the syntactical differences suggested by [url]http://left404.com/2011/01/04/converting-x86-assembly-from-masm-to-nasm-3/[/url] then perhaps I could find the time. Looks like there are about 30288 non-comment lines of assembly.[/QUOTE]

I should not have tried to make such a quick estimate while at work. Make that more like 190441 non-comment, non-blank lines of assembly + macros......
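
For anyone who wants to reproduce a count like this, a minimal Python sketch follows. It assumes a line counts as code unless it is blank or starts with `;` after stripping whitespace; it does not handle MASM `COMMENT` blocks, and it is not the method actually used for the figures above. The function name and the sample text are made up for illustration.

```python
def count_code_lines(text):
    """Count lines that are neither blank nor comment-only (';' starts a comment)."""
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(";"):
            count += 1
    return count


# Hypothetical sample: one comment-only line, one blank line, two code lines.
SAMPLE = """\
; a comment-only line, not counted

mov  eax, 1        ; trailing comment, line still counted
add  eax, ebx
"""

if __name__ == "__main__":
    print(count_code_lines(SAMPLE))
```

Whether macro definitions and include files are counted changes the total considerably, so the counting rule should be stated alongside any figure.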

ewmayer 2016-08-30 01:58

[QUOTE=ATH;441015]Knights Landing will have ERI and PFI, which neither Xeon nor Cannonlake will have.
Xeon will add VL, BW and DQ, which Knights Landing does not have, and Cannonlake will add IFMA and VBMI.

I do not pretend to know what all these subsets mean, but unfortunately there are clearly differences between them all. It would be simpler if they all had all the subsets.[/QUOTE]

The foundation instructions include more or less everything we want for FFT-based LL-testing and TF, too. As to the 'extra' subsets supported by KL but not the later desktop CPUs, the enhanced prefetch stuff (PFI) for strided (scattered) data might be nice for some applications but I don't see it as a dealbreaker by any stretch. Similarly for the Exponential and Reciprocal (ERI) Instructions - the Foundation instructions include the 512-bit vector versions of 14-bit accurate approximate exp/recip instructions AVX users expect; the 'extended' instructions provide 28-bit accurate versions as well, which can save an iteration in the usual get-approximant/do-Newton-iteration-to-desired-precision application of such instructions. My code does make use of such instructions, but only in the context of some relatively infrequent data initializations.

I similarly see no major impact for the other subsets, the ones which later CPUs will have but which are missing in KL. When I first did a deep dive into the various 512-bit instruction subsets last year I recall being very pleased by the relative completeness of the foundation set for the high-performance numerics I am interested in.
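
The point about a 28-bit approximant saving a Newton iteration can be checked with a small Python sketch. It models the hardware estimate of 1/a as the exact value with a relative error of exactly 2^-bits (an assumption standing in for the worst case, not a model of any real instruction), uses exact rational arithmetic so rounding does not blur the count, and takes 53 bits as the double-precision target. The function name `newton_steps` is made up for illustration.

```python
from fractions import Fraction


def newton_steps(a, start_bits, target_bits=53):
    """Count Newton steps x <- x*(2 - a*x) needed to refine an estimate of 1/a
    from `start_bits` to at least `target_bits` bits of relative accuracy."""
    a = Fraction(a)
    exact = 1 / a
    # Assumed model: initial estimate has relative error exactly 2^-start_bits.
    x = exact * (1 + Fraction(1, 2**start_bits))
    tol = Fraction(1, 2**target_bits)
    steps = 0
    while abs(x - exact) / exact > tol:
        x = x * (2 - a * x)  # each step squares the relative error
        steps += 1
    return steps


if __name__ == "__main__":
    print("14-bit start:", newton_steps(3, 14), "steps")
    print("28-bit start:", newton_steps(3, 28), "steps")
```

Since each step squares the relative error, 14 bits goes to 28 and then 56, while 28 bits reaches 56 in a single step.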

airsquirrels 2016-08-30 02:17

[QUOTE=ewmayer;441031]The foundation instructions include more or less everything we want for FFT-based LL-testing and TF, too...[/QUOTE]

Well, in that case, has anyone looked at HJWASM as a viable option? It is MASM-compatible, with support for AVX512-F....


All times are UTC.