mersenneforum.org  

Old 2016-03-30, 23:43   #12
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

16267₈ Posts

Prime95 uses SSE2 code for AMD chips. I don't believe you can force prime95 to use FMA FFTs because I'm not sure that AMD supports the Intel FMA3 syntax. I'd have to build a special executable with the AMD FMA4 syntax.

Last fiddled with by Prime95 on 2016-03-30 at 23:44
Old 2016-03-31, 00:02   #13
ewmayer
2ω=0
 
Sep 2002
República de California

2×7×829 Posts

Quote:
Originally Posted by Prime95 View Post
Prime95 uses SSE2 code for AMD chips. I don't believe you can force prime95 to use FMA FFTs because I'm not sure that AMD supports the Intel FMA3 syntax. I'd have to build a special executable with the AMD FMA4 syntax.
According to Wikipedia:
Quote:
AMD introduced FMA3 support in processors starting with Piledriver architecture for compatibility reasons.[2][3] The 2nd generation APU processors based on "Trinity" (32nm) supporting FMA3 instructions were launched May 15, 2012. The 2nd generation Bulldozer processors with Piledriver cores supporting FMA3 instructions were launched October 23, 2012.
So to flip my SSE2-mode query around, is there a way to force Prime95 to run in AVX (both sans and with FMA3) mode on AMD CPUs? It seems to me it is high time to revisit the SSE2-versus-AVX performance issue on their more recent CPUs ... what is the last AMD CPU on which you did comparative timings?
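Whether a given AMD chip advertises FMA3 at all can be checked before trying to force any mode. The following is a minimal Python sketch, assuming a Linux system where `/proc/cpuinfo` lists feature flags (there, `fma` denotes FMA3 and `fma4` denotes AMD's FMA4). The function names and the example flag string are hypothetical and used only for illustration; nothing here reflects how Prime95 itself detects CPU features.

```python
def simd_support(flags_line):
    """Map a whitespace-separated CPU flag string to the features discussed in this thread."""
    flags = set(flags_line.lower().split())
    return {
        "SSE2": "sse2" in flags,
        "AVX": "avx" in flags,
        "FMA3": "fma" in flags,   # Linux reports FMA3 simply as "fma"
        "FMA4": "fma4" in flags,  # AMD-only four-operand form
    }


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the flag string of the first CPU listed (Linux only)."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return line.split(":", 1)[1]
    return ""


# Hypothetical flag string for illustration, not copied from a real machine.
example_flags = "fpu sse sse2 avx fma fma4"
print(simd_support(example_flags))
```

On a real Linux machine, `simd_support(read_cpu_flags())` would report what the first listed core advertises; a chip showing both FMA3 and FMA4 would be consistent with the Wikipedia passage quoted above.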
Old 2016-03-31, 01:46   #14
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

1CB7₁₆ Posts

Try putting "CpuArchitecture=5" in the local.txt file.

The last time I investigated AMD performance was Bulldozer.
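For anyone scripting the suggested change, here is a small Python sketch that sets a `key=value` line in the text of a settings file. It assumes local.txt is a plain list of `key=value` lines; the helper name `set_option` and the sample file contents are hypothetical, and the thread does not say what the value 5 selects. It is probably safest to make the edit while the program is not running.

```python
def set_option(config_text, key, value):
    """Return config_text with `key=value` set, replacing an existing entry or appending one."""
    lines = config_text.splitlines()
    new_line = f"{key}={value}"
    for i, line in enumerate(lines):
        # Compare only the part before the first "=" so values are ignored.
        if line.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            break
    else:
        # No existing entry was found, so append a new one.
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Hypothetical existing contents of local.txt, for illustration only.
before = "SomeOtherOption=1\n"
print(set_option(before, "CpuArchitecture", "5"), end="")
```

Reading the file, passing its text through `set_option`, and writing the result back keeps every other line unchanged.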
Old 2016-03-31, 03:14   #15
kladner
 
"Kieren"
Jul 2011
In My Own Galaxy!

2²×2,539 Posts

Quote:
Originally Posted by ewmayer View Post
A note and a couple of questions re. your results summary:

o AMD's 8-int-core/4-fpu hybrid arch will likely act like an 8-core for integer-dominated work which is not too memory-intensive; for FPU-and-memory-heavy stuff like Prime95 it's effectively a 4-core.

o Does "FMA" here mean Intel-AVX2-style FMA3, or the AMD-only FMA4 which AMD introduced in late 2011 with their Bulldozer core?

o Has AMD fixed the issue noted by George in which their early AVX offerings performed worse running in AVX mode than they did in SSE2 mode? George, is there any way to force the program to run in SSE2 mode on a platform supporting both?
Indeed, it is quick enough (if P95 and 2 x mfaktc are not running.) However, compared to contemporaneous Intel chips, it is a power-hungry dog. (No offense intended to real, honest dogs.)
I can't answer in detail, but this chip is a Piledriver, the successor to Bulldozer. (Argh! These names!)
https://en.wikipedia.org/wiki/Piledr...rchitecture%29
Old 2016-03-31, 13:18   #16
Fred
 
"Ron"
Jan 2016
Fitchburg, MA

1100001₂ Posts

So far, testing v28.9 in comparison to v28.7, using 4096 benchmarks I'm noticing the following:

Crappy old i3 laptop - ~10% drop in performance when I go to 28.9
i5-4670K, DDR3-1600 : Just about dead even, no change in performance
i5-6500, Non-Z OC DDR4-2133 : ~4% increase in performance when I go to 28.9
(Interesting sidenote: with 28.7 on my newer systems, the benchmarks indicated I should run 4 workers for maximum throughput. With 28.9, I seem to get the best throughput with 4 cores on 1 worker.)

So it seems that faster processors or faster memory benefit from the changes, while (at least in my testing) older systems take a hit.
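Comparisons like these come down to total throughput (iterations per second summed over all workers) and the relative change between versions. A minimal Python sketch of that arithmetic follows; the function names are hypothetical and the timings are made-up placeholders, not figures from these benchmarks.

```python
def throughput(ms_per_iter):
    """Total iterations/second for workers running concurrently, given each worker's ms per iteration."""
    return sum(1000.0 / ms for ms in ms_per_iter)


def percent_change(old, new):
    """Relative change from old to new, in percent (positive means new is higher)."""
    return 100.0 * (new - old) / old


# Hypothetical timings, for illustration only.
four_workers = throughput([20.0, 20.0, 20.0, 20.0])  # 4 workers, one core each
one_worker = throughput([4.8])                        # 1 worker using 4 cores
print(f"4 workers: {four_workers:.1f} iter/s")
print(f"1 worker:  {one_worker:.1f} iter/s")
print(f"change:    {percent_change(four_workers, one_worker):+.1f}%")
```

Comparing summed throughput rather than per-worker time is what makes a "4 workers" setup and a "4 cores on 1 worker" setup directly comparable.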
Old 2016-03-31, 15:16   #17
Mark Rose
 
"/X\(‘-‘)/X\"
Jan 2013

29×101 Posts

The 10% drop in performance is interesting to me. I wonder what caused that.
Old 2016-03-31, 15:39   #18
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

1110010110111₂ Posts

Quote:
Originally Posted by Mark Rose View Post
The 10% drop in performance is interesting to me. I wonder what caused that.
There are some strange cache effects going on that I don't understand. The difference between 28.7 and 28.9 is I chose different sizes for pass 1 and pass 2 of the FFT. This changes the working set size for each pass.

I presume the crappy i3 was pre-FMA3. I optimized based on my 4-core Sandy Bridge; the crappy i3 must have much different cache and memory characteristics. At least Fred knows about the issue and can revert to 28.7 -- other users are not as fortunate.
Old 2016-03-31, 15:47   #19
Mark Rose
 
"/X\(‘-‘)/X\"
Jan 2013

5561₈ Posts

Quote:
Originally Posted by Prime95 View Post
There are some strange cache effects going on that I don't understand. The difference between 28.7 and 28.9 is I chose different sizes for pass 1 and pass 2 of the FFT. This changes the working set size for each pass.

I presume the crappy i3 was pre-FMA3. I optimized based on my 4-core Sandy Bridge; the crappy i3 must have much different cache and memory characteristics. At least Fred knows about the issue and can revert to 28.7 -- other users are not as fortunate.
The chip he mentioned was an i3-3120M. It has 2 cores, 1 MB of L2, and 3 MB of L3.
Old 2016-03-31, 18:01   #20
tului
 
Jan 2013

2²×17 Posts

Pentium G4400, two real cores, no HT, W10-64. It tells me that I've assigned more workers than cores despite that not being the case. I ignored it and will check performance.
Old 2016-03-31, 19:03   #21
science_man_88
 
"Forget I exist"
Jul 2009
Dumbassville

10000011000000₂ Posts

My processor is even older; I just don't have old results to compare it to: http://ark.intel.com/products/53428/...Cache-3_40-GHz. Based on things I did in CPU-Z at one point, it's on a Foxconn board with 8 GB of single-channel memory.
Old 2016-03-31, 22:22   #22
ewmayer
2ω=0
 
Sep 2002
República de California

10110101010110₂ Posts

Quote:
Originally Posted by kladner View Post
Indeed, it is quick enough (if P95 and 2 x mfaktc are not running.) However, compared to contemporaneous Intel chips, it is a power-hungry dog. (No offense intended to real, honest dogs.)
I can't answer in detail, but this chip is a Piledriver, the successor to Bulldozer. (Argh! These names!)
https://en.wikipedia.org/wiki/Piledr...rchitecture%29
So did you try the timings with AVX-usage enabled as George noted? I was asking on your behalf ... no AMD in my CPU collection.

All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.