mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   Prime95 version 29.6/29.7/29.8 (https://www.mersenneforum.org/showthread.php?t=24094)

AG5BPilot 2019-08-20 18:29

[QUOTE=Evil Genius;524033]Are you really sure? Zen 2 has double the AVX-256 bit speed compared to Zen 1. All data paths were widened for this purpose also. I was kinda hoping for an AVX-256 bit implementation for Zen 2.[/QUOTE]

FMA3 *is* AVX-256.

For that matter, AVX is also AVX-256.

You're getting AVX-256 and AVX-512 confused. SSE is 128 bit registers. AVX is 256 bit registers. AVX-512 is 512 bit registers.
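
To put numbers on those widths, here is a small sketch that works out how many 64-bit doubles fit in one register of each extension. The names `REGISTER_BITS` and `doubles_per_register` are made up for this illustration and have nothing to do with Prime95 or gwnum internals.

```python
# Register widths as listed above: SSE 128, AVX 256, AVX-512 512 bits.
# (Packed doubles in the 128-bit xmm registers arrived with SSE2; the
# label "SSE" is kept here only to match the wording of the post.)
REGISTER_BITS = {"SSE": 128, "AVX": 256, "AVX-512": 512}

DOUBLE_BITS = 64  # size of one IEEE 754 double-precision value


def doubles_per_register(extension: str) -> int:
    """Return how many doubles fit in one vector register of the extension."""
    return REGISTER_BITS[extension] // DOUBLE_BITS


if __name__ == "__main__":
    for name in REGISTER_BITS:
        print(name, doubles_per_register(name))
```

So each step up in register width doubles the number of values a single instruction can touch.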

Mark Rose 2019-08-20 18:55

[QUOTE=Evil Genius;524033]Are you really sure? Zen 2 has double the AVX-256 bit speed compared to Zen 1. All data paths were widened for this purpose also. I was kinda hoping for an AVX-256 bit implementation for Zen 2.[/QUOTE]

Zen 2 doesn't downclock when doing AVX-256 either.

Mark Rose 2019-08-20 18:57

[QUOTE=AG5BPilot;524035]FMA3 *is* AVX-256.[/QUOTE]

Then how do Piledriver cores support FMA3 without supporting AVX-256?

AG5BPilot 2019-08-20 19:28

[QUOTE=Mark Rose;524041]Then how do Piledriver cores support FMA3 without supporting AVX-256?[/QUOTE]

There's no such thing as "AVX-256". There's AVX, AVX2 (which isn't important for Prime95/gwnum but comes along with FMA3, which is important), and AVX-512.

What you're calling AVX-256 is plain old original AVX, which has been supported by AMD for as long as Intel has supported it. AMD's implementation was crippled, however, so it hasn't been used here until Zen 2. With Zen 2, it's useful, finally -- but it's always been there.

Zen 2 supports FMA3. And it supports AVX. And, finally, they're as good as Intel's implementation. It doesn't support AVX-512, but that's a whole different discussion.

Edit: Please see the Wikipedia page for the Piledriver architecture: [url]https://en.wikipedia.org/wiki/Piledriver_(microarchitecture)[/url] . It clearly states that Piledriver supports AVX and FMA3.

ixfd64 2019-08-20 20:55

George, could you please have a look at this issue?

[url]https://mersenneforum.org/showpost.php?p=519786&postcount=296[/url]

On macOS, Prime95 doesn't let me set one core per worker unless I edit the configuration files. This happens even for non-100 million digit work types.

Evil Genius 2019-08-20 21:00

[QUOTE=AG5BPilot;524035]FMA3 *is* AVX-256.

For that matter, AVX is also AVX-256.

You're getting AVX-256 and AVX-512 confused. SSE is 128 bit registers. AVX is 256 bit registers. AVX-512 is 512 bit registers.[/QUOTE]


Sigh. No, I'm not confused. AVX also has a 128-bit compatibility mode, which is used on current Zen models. The older ones don't have 256-bit logic, so they execute each 256-bit instruction as two consecutive 128-bit operations. The newer Zen models can execute AVX-256 bit code without penalty.

Evil Genius 2019-08-20 21:08

I should elaborate to prevent confusion:

vxorpd %xmm0,%xmm0,%xmm0 -> AVX-128 bit
vxorpd %ymm0,%ymm0,%ymm0 -> AVX-256 bit

See the difference?


There are also two FMA3s: a 128-bit one and a 256-bit one. The '3' only refers to the number of operands:

vfmadd213pd %xmm2,%xmm1,%xmm1 -> FMA3-128 bit
vfmadd213pd %ymm2,%ymm1,%ymm1 -> FMA3-256 bit
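
Put differently, the mnemonic stays the same and the register names carry the width: xmm operands mean 128 bits, ymm operands mean 256 bits. A minimal sketch of that reading rule follows. `vector_width` and `WIDTH_BY_PREFIX` are hypothetical helper names for illustration only, and the regular expression only handles the simple AT&T-style operands shown in this thread.

```python
import re

# Register-name prefix -> vector width in bits (xmm = 128, ymm = 256, zmm = 512).
WIDTH_BY_PREFIX = {"xmm": 128, "ymm": 256, "zmm": 512}


def vector_width(instruction: str) -> int:
    """Return the widest vector register width named in an AT&T-syntax line."""
    prefixes = re.findall(r"%([xyz]mm)\d+", instruction)
    if not prefixes:
        raise ValueError("no vector register operand found")
    return max(WIDTH_BY_PREFIX[p] for p in prefixes)


if __name__ == "__main__":
    for line in (
        "vxorpd %xmm0,%xmm0,%xmm0",
        "vxorpd %ymm0,%ymm0,%ymm0",
        "vfmadd213pd %ymm2,%ymm1,%ymm1",
    ):
        print(f"{line} -> {vector_width(line)} bit")
```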

AG5BPilot 2019-08-20 21:18

[QUOTE=Evil Genius;524074]I should elaborate to prevent confusion:


vxorpd %xmm0,%xmm0,%xmm0 -> AVX-128 bit
vxorpd %ymm0,%ymm0,%ymm0 -> AVX-256 bit

See the difference?[/QUOTE]

The difference, yes, but not your point. AVX added the 256-bit ymm# registers.

The 128-bit xmm# registers are SSE registers (also usable by AVX instructions).

Are you trying to say that Piledriver lacked the sixteen 256-bit ymm# registers?

Prime95 2019-08-20 21:20

[QUOTE=ixfd64;524071]George, could you please have a look at this issue?

[url]https://mersenneforum.org/showpost.php?p=519786&postcount=296[/url]

On macOS, Prime95 doesn't let me set one core per worker unless I edit the configuration files. This happens even for non-100 million digit work types.[/QUOTE]

I cannot replicate. If I set NumCPUs=2 and CoresPerTest=1 the WorkerWindows dialog comes up properly greyed out. If I set NumCPUs=2 and CoresPerTest=2 the WorkerWindows dialog comes up and lets me edit the CPU counts.

What do I need to do differently?

ixfd64 2019-08-20 21:26

[QUOTE=Prime95;524077]I cannot replicate. If I set NumCPUs=2 and CoresPerTest=1 the WorkerWindows dialog comes up properly greyed out. If I set NumCPUs=2 and CoresPerTest=2 the WorkerWindows dialog comes up and lets me edit the CPU counts.

What do I need to do differently?[/QUOTE]

I don't have [c]NumCPUs[/c] set on this computer. I'm also not able to reproduce this on a Mac Pro. It seems this issue only affects certain computers.

Evil Genius 2019-08-20 21:44

[QUOTE=AG5BPilot;524076]The difference, yes, but not your point. AVX added the 256-bit ymm# registers.

The 128-bit xmm# registers are SSE registers (also usable by AVX instructions).

Are you trying to say that Piledriver lacked the sixteen 256-bit ymm# registers?[/QUOTE]


No, I'm saying that although the instruction set exposes 256-bit operations on the outside, on the inside many AVX-compatible processors implemented them as two consecutive 128-bit operations.
Zen 1(+) was no exception, but this changed with Zen 2.


Also of note is that if there's no native 256-bit implementation, the 128-bit implementation is faster.
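
A toy model of what "two consecutive 128-bit operations" means, as a sketch only: the function names and the operation counter are invented for illustration, and this says nothing about real latencies or throughput on any particular core. It just shows that the split version produces the same four results while issuing twice as many internal operations.

```python
def fma_native_256(a, b, c):
    """Model a native 256-bit unit: all four double lanes in one operation."""
    assert len(a) == len(b) == len(c) == 4
    # Note: x * y + z in Python rounds twice; a hardware FMA rounds once.
    result = [x * y + z for x, y, z in zip(a, b, c)]
    return result, 1  # one internal operation


def fma_split_128(a, b, c):
    """Model a 128-bit unit: the same work as two consecutive 2-lane halves."""
    assert len(a) == len(b) == len(c) == 4
    result, ops = [], 0
    for start in (0, 2):  # low half first, then high half
        end = start + 2
        result += [
            x * y + z
            for x, y, z in zip(a[start:end], b[start:end], c[start:end])
        ]
        ops += 1
    return result, ops


if __name__ == "__main__":
    a, b, c = [1.0, 2.0, 3.0, 4.0], [2.0] * 4, [0.5] * 4
    print(fma_native_256(a, b, c))
    print(fma_split_128(a, b, c))
```

In this model, a 256-bit instruction on a 128-bit-wide unit costs two internal operations for four lanes, which is the same per-lane cost as issuing 128-bit instructions directly.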


All times are UTC.