mersenneforum.org  

2019-08-20, 18:29   #342
AG5BPilot
Dec 2011
New York, U.S.A.
97 Posts

Quote:
Originally Posted by Evil Genius View Post
Are you really sure? Zen 2 has double the AVX-256 bit speed compared to Zen 1. All data paths were widened for this purpose also. I was kinda hoping for an AVX-256 bit implementation for Zen 2.
FMA3 *is* AVX-256.

For that matter, AVX is also AVX-256.

You're getting AVX-256 and AVX-512 confused. SSE uses 128-bit registers. AVX uses 256-bit registers. AVX-512 uses 512-bit registers.
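To put numbers on those widths, here is a minimal sketch of how many double-precision values fit in one register of each family. The dictionary and the `lanes` helper are names made up for this illustration; only the register widths come from the post.

```python
# Register widths as stated in the post; the names in parentheses are the
# x86 register families. REGISTER_BITS and lanes() are illustrative names.
REGISTER_BITS = {
    "SSE (xmm)": 128,
    "AVX (ymm)": 256,
    "AVX-512 (zmm)": 512,
}


def lanes(register_bits: int, element_bits: int = 64) -> int:
    """Number of elements of the given size that fit in one register."""
    return register_bits // element_bits


if __name__ == "__main__":
    for name, bits in REGISTER_BITS.items():
        print(f"{name}: {bits} bits = {lanes(bits)} doubles")
```

A 64-bit double is the default element size here because that is the data type the later assembly examples in this thread (the `pd` instructions) operate on.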
2019-08-20, 18:55   #343
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
2²×733 Posts

Quote:
Originally Posted by Evil Genius View Post
Are you really sure? Zen 2 has double the AVX-256 bit speed compared to Zen 1. All data paths were widened for this purpose also. I was kinda hoping for an AVX-256 bit implementation for Zen 2.
Zen 2 doesn't downclock when doing AVX-256 either.
2019-08-20, 18:57   #344
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
101101110100₂ Posts

Quote:
Originally Posted by AG5BPilot View Post
FMA3 *is* AVX-256.
Then how do Piledriver cores support FMA3 without supporting AVX-256?
2019-08-20, 19:28   #345
AG5BPilot
Dec 2011
New York, U.S.A.
97 Posts

Quote:
Originally Posted by Mark Rose View Post
Then how do Piledriver cores support FMA3 without supporting AVX-256?
There's no such thing as "AVX-256". There's AVX, AVX2 (which isn't important for Prime95/gwnum but comes along with FMA3, which is important), and AVX-512.

What you're calling AVX-256 is plain old original AVX, which has been supported by AMD for as long as Intel has supported it. AMD's implementation was crippled, however, so it hasn't been used here until Zen 2. With Zen 2, it's useful, finally -- but it's always been there.

Zen 2 supports FMA3. And it supports AVX. And, finally, they're as good as Intel's implementation. They don't support AVX-512, but that's a whole different discussion.

Edit: Please see the Wikipedia page for the Piledriver architecture: https://en.wikipedia.org/wiki/Piledr...oarchitecture) . It clearly states that Piledriver supports AVX and FMA3.
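For anyone who wants to check their own machine rather than Wikipedia, here is a rough sketch for Linux. It assumes the usual `/proc/cpuinfo` flag names (`avx`, `fma` for FMA3, `avx2`, `avx512f`); the function names and the sample flags line are made up for this illustration.

```python
# Flag names as they appear on the "flags" line of /proc/cpuinfo on Linux.
# Note that "fma" there means FMA3; FMA4 is reported separately as "fma4".
FEATURES = ("sse2", "avx", "fma", "avx2", "avx512f")


def detect(flags_line: str) -> dict:
    """Map each feature of interest to True/False for a given flags line."""
    present = set(flags_line.split())
    return {name: name in present for name in FEATURES}


def read_flags(path: str = "/proc/cpuinfo") -> str:
    """Return the text of the first 'flags' line, or '' if none is found."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return line.split(":", 1)[1]
    return ""


# Hypothetical, heavily abbreviated flags line for a Piledriver-like CPU:
# AVX and FMA3 are present, AVX2 and AVX-512 are not.
sample = "fpu sse sse2 avx fma fma4"
print(detect(sample))
```

On a real Linux box, `detect(read_flags())` gives the same report for the local CPU.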

Last fiddled with by AG5BPilot on 2019-08-20 at 19:31
2019-08-20, 20:55   #346
ixfd64
Bemusing Prompter
"Danny"
Dec 2002
California
956₁₆ Posts

George, could you please have a look at this issue?

https://mersenneforum.org/showpost.p...&postcount=296

On macOS, Prime95 doesn't let me set one core per worker unless I edit the configuration files. This happens even for non-100 million digit work types.

Last fiddled with by ixfd64 on 2019-08-20 at 20:57
2019-08-20, 21:00   #347
Evil Genius
Jul 2019
the Netherlands
2·11 Posts

Quote:
Originally Posted by AG5BPilot View Post
FMA3 *is* AVX-256.

For that matter, AVX is also AVX-256.

You're getting AVX-256 and AVX-512 confused. SSE uses 128-bit registers. AVX uses 256-bit registers. AVX-512 uses 512-bit registers.

Sigh. No, I'm not confused. AVX also has a 128-bit compatibility mode, which is what gets used on current Zen models. The older ones don't have 256-bit logic, so they execute a 256-bit operation as two consecutive 128-bit operations. The newer Zen models can execute 256-bit AVX code without penalty.
2019-08-20, 21:08   #348
Evil Genius
Jul 2019
the Netherlands
2·11 Posts

I should elaborate to prevent confusion:

vxorpd %xmm0,%xmm0,%xmm0 -> AVX-128 bit
vxorpd %ymm0,%ymm0,%ymm0 -> AVX-256 bit

See the difference?


There are also two FMA3s: a 128-bit one and a 256-bit one. The '3' only refers to the number of operands:

vfmadd213pd %xmm2,%xmm1,%xmm1 -> FMA3-128 bit
vfmadd213pd %ymm2,%ymm1,%ymm1 -> FMA3-256 bit
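As a rough illustration of what the FMA3 instruction above computes (standard mnemonic `vfmadd213pd`), here is a per-lane model. The Python function is only a sketch named after the instruction: it rounds twice where the hardware rounds once, and it assumes the usual reading of the 213 form, dest = src2 * dest + src3. With the operands shown above that works out to xmm1 = xmm1 * xmm1 + xmm2.

```python
def vfmadd213pd(dest, src2, src3):
    """Per-lane model of the 213 form: dest = src2 * dest + src3.

    Each argument is a list of floats, one entry per 64-bit lane:
    2 lanes for an xmm register, 4 lanes for a ymm register.
    Unlike the real instruction, this rounds after the multiply
    and again after the add.
    """
    if not (len(dest) == len(src2) == len(src3)):
        raise ValueError("all operands must have the same lane count")
    return [b * a + c for a, b, c in zip(dest, src2, src3)]


# 128-bit form: two doubles per register.
xmm1 = [2.0, 3.0]
xmm2 = [1.0, 1.0]
print(vfmadd213pd(xmm1, xmm1, xmm2))

# 256-bit form: same operation, four doubles per register.
ymm1 = [2.0, 3.0, 4.0, 5.0]
ymm2 = [1.0, 1.0, 1.0, 1.0]
print(vfmadd213pd(ymm1, ymm1, ymm2))
```

The only difference between the two calls is the lane count, which is the point of the xmm versus ymm comparison.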

Last fiddled with by Evil Genius on 2019-08-20 at 21:23
2019-08-20, 21:18   #349
AG5BPilot
Dec 2011
New York, U.S.A.
97 Posts

Quote:
Originally Posted by Evil Genius View Post
I should elaborate to prevent confusion:


vxorpd %xmm0,%xmm0,%xmm0 -> AVX-128 bit
vxorpd %ymm0,%ymm0,%ymm0 -> AVX-256 bit

See the difference?
The difference, yes, but not your point. AVX added the 256-bit ymm# registers.

The 128-bit xmm# registers are SSE registers (also usable by AVX instructions).

Are you trying to say that Piledriver lacked the sixteen 256-bit ymm# registers?
2019-08-20, 21:20   #350
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
2·53·71 Posts

Quote:
Originally Posted by ixfd64 View Post
George, could you please have a look at this issue?

https://mersenneforum.org/showpost.p...&postcount=296

On macOS, Prime95 doesn't let me set one core per worker unless I edit the configuration files. This happens even for non-100 million digit work types.
I cannot replicate. If I set NumCPUs=2 and CoresPerTest=1 the WorkerWindows dialog comes up properly greyed out. If I set NumCPUs=2 and CoresPerTest=2 the WorkerWindows dialog comes up and lets me edit the CPU counts.

What do I need to do differently?
2019-08-20, 21:26   #351
ixfd64
Bemusing Prompter
"Danny"
Dec 2002
California
2×5×239 Posts

Quote:
Originally Posted by Prime95 View Post
I cannot replicate. If I set NumCPUs=2 and CoresPerTest=1 the WorkerWindows dialog comes up properly greyed out. If I set NumCPUs=2 and CoresPerTest=2 the WorkerWindows dialog comes up and lets me edit the CPU counts.

What do I need to do differently?
I don't have NumCPUs set on this computer. I'm also not able to reproduce this on a Mac Pro. It seems this issue only affects certain computers.
2019-08-20, 21:44   #352
Evil Genius
Jul 2019
the Netherlands
22₁₀ Posts

Quote:
Originally Posted by AG5BPilot View Post
The difference, yes, but not your point. AVX added the 256-bit ymm# registers.

The 128-bit xmm# registers are SSE registers (also usable by AVX instructions).

Are you trying to say that Piledriver lacked the sixteen 256-bit ymm# registers?

No, I'm saying that although there's a 256-bit implementation on the outside, on the inside many AVX-compatible processors implemented 256-bit operations as two consecutive 128-bit operations.
Zen 1(+) was no exception, but this changed with Zen 2.


Also of note is that if there's no native 256-bit implementation, the 128-bit implementation is faster.
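A toy model of the splitting described above, for what it's worth. It only counts how many internal operations one instruction turns into for a given datapath width, and ignores everything else that affects real throughput (ports, scheduling, clocks). The function name is made up; the 128-bit and 256-bit datapath widths for Zen 1 and Zen 2 are taken from the posts in this thread.

```python
def internal_ops(operand_bits: int, datapath_bits: int) -> int:
    """How many datapath-wide operations one instruction is split into."""
    if operand_bits <= 0 or datapath_bits <= 0:
        raise ValueError("widths must be positive")
    # Ceiling division: a 256-bit operand on a 128-bit datapath needs 2 passes.
    return -(-operand_bits // datapath_bits)


# Datapath widths as described in this thread (illustrative labels).
for core, datapath in (("Zen 1(+)", 128), ("Zen 2", 256)):
    for operand in (128, 256):
        print(f"{core}: {operand}-bit op -> {internal_ops(operand, datapath)} internal op(s)")
```

In this model a 256-bit instruction costs two internal operations on the 128-bit datapath and one on the 256-bit datapath, while a 128-bit instruction costs one on either.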

Last fiddled with by Evil Genius on 2019-08-20 at 21:49

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.