mersenneforum.org  

Old 2019-09-20, 12:45   #12
mackerel
 
 
Feb 2016
UK

5·7·11 Posts

Quote:
Originally Posted by M344587487 View Post
My hypothesis is that decoupling will greatly help 3200 DDR4 but be a wash for 3600 DDR4. Increasing Fclk may also disproportionately help tests that stay mainly in cache so testing at least three FFTs per setup would be nice for comparison (fully in cache, wavefront which straddles cache and RAM, 100M which is mainly in RAM).
I have a feeling I might already have attempted this, but regardless I'll rerun with current versions. The scenario painted was for (relatively) slow RAM with faster IF. I do know from previous testing that one of my Zen 2 CPUs (I don't recall which one) didn't get IF much beyond 1800, so I'll only test the first 3 scenarios when I get around to it. This will likely be a weekend task.
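
As a rough illustration of the clock arithmetic behind the coupled vs. decoupled question: DDR memory transfers twice per clock, so the memory clock in MHz is half the rated MT/s, and running IF 1:1 means the fabric clock has to match it. The sketch below assumes a fabric ceiling of 1800 (taken from the remark above that IF didn't go much beyond 1800); the function names and the ceiling parameter are hypothetical, not anything from a vendor tool.

```python
def memclk_mhz(ddr_rate_mts: int) -> float:
    """DDR transfers twice per clock, so memory clock = rated MT/s / 2."""
    return ddr_rate_mts / 2


def fclk_plan(ddr_rate_mts: int, max_fclk_mhz: float = 1800.0) -> dict:
    """Report whether a 1:1 (coupled) fabric clock is reachable.

    max_fclk_mhz is an assumed per-chip ceiling, not a specification value.
    """
    memclk = memclk_mhz(ddr_rate_mts)
    return {
        "ddr_rate": ddr_rate_mts,
        "memclk_mhz": memclk,
        "coupled_1to1_possible": memclk <= max_fclk_mhz,
        # Positive headroom is how far a decoupled fabric could run above memclk.
        "fclk_headroom_mhz": max_fclk_mhz - memclk,
    }


if __name__ == "__main__":
    # Kit speeds mentioned in this thread.
    for rate in (2666, 3200, 3600, 4000):
        print(fclk_plan(rate))
```

Under that assumed ceiling, 3200 leaves some room to raise the fabric clock independently, 3600 leaves none, and 4000 cannot run 1:1 at all.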

If you have suggestions on which FFT sizes to use, that would help. Previously I've used 4096k and 5120k, but I have no idea how those relate to GIMPS tests. The actual stuff I run is usually <1024k, so this isn't a major interest area for me.
Old 2019-09-20, 16:06   #13
M344587487
 
 
"Composite as Heck"
Oct 2017

2·313 Posts

Based on nomead's data ( https://www.mersenneforum.org/showpo...&postcount=110 ), 2560K, 5120K and 7680K look like reasonable sample points. I don't know exactly, but 2560K is/was somewhere around DC range and 5120K somewhere around wavefront range, so that works too. I believe 100M starts at 18432K, if you want to test that.
Old 2019-09-22, 07:13   #14
M344587487
 
 
"Composite as Heck"
Oct 2017

2×313 Posts

I know it's cheeky to keep asking for more data, but if you have a watt meter, power draw from the wall would be another thing worth covering.
Old 2019-09-22, 12:27   #15
Mysticial
 
 
Sep 2016

7×47 Posts

Quote:
Originally Posted by mackerel View Post
Only just saw this thread.

Ram needed is FFT_size * 8 + "a bit" for other lookup data. I've found in practice just considering the FFT*8 sufficient as guide to performance.
I have no idea what prime95 does, but for an FFT that has N bytes of data, there will be N/4 bytes* of unique twiddle factor data. So an FFT implementation that caches all twiddle factors with no redundancy will have 25% overhead.

Space-time trade-offs are available, so it's possible to go below 25% at the cost of additional computation, uglier memory access patterns, and/or loss of precision. But I have no idea which (if any) of these prime95 uses.

*It would be much worse than N/4 if it weren't for two different symmetries on the complex plane that each save a factor of 2: reflection across the x axis (complex conjugates), and reflection across the 45 degree line.
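
The two symmetries in the footnote can be checked numerically. The sketch below assumes the twiddle factors are the n-th roots of unity w^k = e^(2*pi*i*k/n) with n divisible by 4 (the sign convention and the helper names are assumptions for illustration; it says nothing about how prime95 actually stores its tables). Reflection across the x axis maps w^k to w^(n-k), and reflection across the 45 degree line swaps the real and imaginary parts, which maps w^k to w^(n/4-k).

```python
import cmath
import math


def root(k: int, n: int) -> complex:
    """k-th power of the primitive n-th root of unity, e^(2*pi*i*k/n)."""
    return cmath.exp(2j * math.pi * k / n)


def close(a: complex, b: complex, tol: float = 1e-12) -> bool:
    """Compare two complex values up to floating-point rounding."""
    return abs(a - b) <= tol


def check_symmetries(n: int) -> bool:
    """Verify both reflections for every k in the first quadrant."""
    if n % 4 != 0:
        raise ValueError("n must be divisible by 4")
    for k in range(n // 4 + 1):
        w = root(k, n)
        # Reflection across the x axis: w^(n-k) is the complex conjugate of w^k.
        if not close(root(n - k, n), w.conjugate()):
            return False
        # Reflection across the 45 degree line: real and imaginary parts swap.
        if not close(root(n // 4 - k, n), complex(w.imag, w.real)):
            return False
    return True


if __name__ == "__main__":
    print(check_symmetries(1024))
```

Because each reflection lets one stored value stand in for two, a table only needs the entries that cannot be derived from another entry by a swap or a sign flip.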


-----


In addition to those, you also have the IBDWT weights. At first glance it seems that they are all unique - thus N bytes (100%) overhead. But I have no experience with those so I have no idea how well they compress and what space-time trade-offs exist.
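
Putting the estimates from this post together as plain arithmetic: data of N bytes, twiddle factors of about N/4, and IBDWT weights of up to N if they don't compress at all. The sketch below is only that arithmetic; the function name and the default fractions are illustrative assumptions, and the 100% weight figure is the "first glance" upper bound from the paragraph above, not a measurement of prime95.

```python
def footprint_bytes(
    data_bytes: int,
    twiddle_fraction: float = 0.25,  # N/4 unique twiddle bytes, per the estimate above
    weight_fraction: float = 1.0,    # pessimistic: every IBDWT weight stored uncompressed
) -> dict:
    """Estimate total memory touched by one transform under the stated fractions."""
    twiddles = data_bytes * twiddle_fraction
    weights = data_bytes * weight_fraction
    return {
        "data": data_bytes,
        "twiddles": twiddles,
        "weights": weights,
        "total": data_bytes + twiddles + weights,
    }


if __name__ == "__main__":
    # Example: a 5120K FFT at 8 bytes per element (the FFT_size * 8 rule quoted above).
    n_bytes = 5120 * 1024 * 8
    for name, value in footprint_bytes(n_bytes).items():
        print(f"{name:>8}: {value / 2**20:.1f} MiB")
```

With both defaults the total comes to 2.25 times the data size; any compression of the weights or trade-off on the twiddles would bring that down.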

Last fiddled with by Mysticial on 2019-09-22 at 12:28
Old 2019-09-22, 12:32   #16
mackerel
 
 
Feb 2016
UK

5×7×11 Posts

https://docs.google.com/spreadsheets...it?usp=sharing

Results at link above. My methodology is to run each test at least 3 times, and record the highest result.
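
The best-of-several methodology can be sketched as below. It assumes the recorded figure is a throughput where higher is better (as "highest result" suggests); the function name, the minimum-run check, and the sample numbers are hypothetical placeholders, not values from the spreadsheet.

```python
def best_of(runs: list[float], min_runs: int = 3) -> float:
    """Return the highest result, insisting on at least min_runs repetitions."""
    if len(runs) < min_runs:
        raise ValueError(f"need at least {min_runs} runs, got {len(runs)}")
    # Taking the maximum filters out runs disturbed by background activity.
    return max(runs)


if __name__ == "__main__":
    # Placeholder numbers purely for illustration.
    example_runs = [101.2, 103.7, 102.9]
    print(best_of(example_runs))
```

Taking the maximum rather than the mean reflects what the hardware can do when nothing else interferes, at the cost of hiding run-to-run variation.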

I ended up testing 3 kits on the 3600 only. 2666 single rank, 3200 dual rank, 4000 (+3600) single rank.

In short, faster IF could give a small increase in some situations, but not all.

I had forgotten that dual rank helps, and the 3200 dual rank kit was by far the fastest tested. If you don't need ram quantity, running 4x4GB might be the most practical way to get that. 8GB dual rank modules don't seem common any more, and 16GB modules are excessive unless the system is used for other things.

The 3600 vs 4000 ram difference might not be entirely attributable to decoupled clock domains on Zen 2, as I had seen similar drops in performance on Intel also. My guess is that the 4000 subtimings are slacker, which offsets the potential gain from bandwidth.
Old 2019-09-23, 13:33   #17
mackerel
 
 
Feb 2016
UK

5×7×11 Posts

Quote:
Originally Posted by Mysticial View Post
I have no idea what prime95 does, but for an FFT that has N bytes of data, there will be N/4 bytes* of unique twiddle factor data. So an FFT implementation that caches all twiddle factors with no redundancy will have 25% overhead.
Thinking more about that, I have to wonder if the "FFT_size*8" observation happens to work, but not necessarily for the reasons I'm thinking. For practical purposes my knowledge on how FFTs work is next to nothing. I'm approaching this from an outside observer perspective. The FFT is a black box. I can choose the FFT size, and set outside variables, and see what happens. The rough rule of thumb is if FFT*8 fits in LLC, performance = excellent (what I call practically ram unlimited), and if it is bigger, you start to see how good your ram is. It might not be because the data fits, but a more complex combination of data moving in and out that happens to work out about the same.
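
The rule of thumb above can be written down as a quick check. This is a sketch under assumptions: "K" is taken to mean 1024 elements, each element is 8 bytes, and the 32 MiB last-level cache figure is a placeholder to replace with your own CPU's value (on parts where the cache is split between core groups, the usable amount per group may be smaller). The function names are hypothetical.

```python
KIB = 1024


def working_set_bytes(fft_size_k: int) -> int:
    """FFT_size * 8: one 8-byte value per FFT element."""
    return fft_size_k * KIB * 8


def fits_in_llc(fft_size_k: int, llc_bytes: int) -> bool:
    """True if the rule of thumb predicts the test is not limited by ram."""
    return working_set_bytes(fft_size_k) <= llc_bytes


if __name__ == "__main__":
    llc = 32 * 1024 * 1024  # assumed cache size; substitute your own
    # FFT sizes mentioned in this thread.
    for size_k in (1024, 2560, 4096, 5120, 7680, 18432):
        mib = working_set_bytes(size_k) / 2**20
        verdict = "fits in LLC" if fits_in_llc(size_k, llc) else "spills to ram"
        print(f"{size_k:>6}K -> {mib:6.1f} MiB: {verdict}")
```

As the post notes, this only predicts where performance starts to depend on ram; it does not explain why, and the real data movement may be more complicated than a simple "does it fit" test.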

I had seen with higher core count systems that my predictions (based on lower core count testing) seemed a little pessimistic on how fast performance falls off, but I never had access to get sufficient data to try and figure it out. I suspect large (but still not big enough) caches help there.

Another incorrect assumption I had in the past was that ram bandwidth was king, latency didn't really matter. Well, now the question is what is the balance between them. My 4000 rated kit is generally faster running in its 3600 profile, presumably due to better timings. And also the recent testing reminded me how much having dual rank modules helps too (or similarly 2DPC).
Old 2019-10-04, 17:36   #18
M344587487
 
 
"Composite as Heck"
Oct 2017

2×313 Posts

Thanks for the benchmarks. It's good to know that running stock on sensible hardware isn't far from the ideal. I am surprised that the power figures don't compare favourably to these systems ( https://www.mersenneforum.org/showpo...9&postcount=19 ), but it is apples and oranges in that they are mobile parts. Underclocking, if done the right way, may even things up, and the server parts should naturally have better perf/watt. The only hope for a stock Ryzen desktop part competing in perf/watt is the upcoming 3900 (non-X), which has a 65W TDP but twice the chiplets of the 3600. That makes it particularly interesting within the Ryzen line, as it may yield the best perf/watt and perf/$.
Old 2019-10-04, 21:26   #19
lavalamp
 
 
Oct 2007
London, UK

2·653 Posts

The upcoming 3950X may have better binned parts due to being on the high end of the desktop line up. For most that would mean a better overclock, but if performance per watt is your goal, undervolting it would be a viable strategy.

Gigabyte just showed off a nice overclock with it running Prime95 stable at 4.3 GHz, and running Cinebench at 4.4 GHz. Of course they likely chose the best of the best chips they had available, so independent benchmarks upon release will have to tell the full story.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.