mersenneforum.org

M344587487 2017-10-01 13:40

Threa Dripper RAM sanity check
 
I'm looking to build a 1950X system in a few months, and need advice on the type of RAM to get. The system will be used for more than just Prime95/Mlucas, but these will be the only programs that might benefit from ECC. Here are some options if I were to build it today, bold is the one I think is best:

[b]4x4GB 3200 CL16 non-ECC ~£170[/b]
4x4GB 3733 CL17 non-ECC ~£190
4x4GB 2133 CL15 ECC ~£200

I've seen conflicting things about the sensitivity of Threadripper to RAM speed, and can't find any reliable benchmarks that obviously apply to the use case. I'm pretty sure the 3733 CL17 should be ruled out (because of the CL17 timings, and it may be a pain to get stable), but it's there in case I'm wrong. 3200 CL16 seems to be the standard, and 2133 seems very low even with better timings, so I'm leaning towards the 3200.

a) Is the difference between the 2133 and 3200 as massive as it looks for Prime95/Mlucas?
b) Does ECC even matter that much for the current wavefront? Assuming 4 workers, 1 per CCX. I don't plan on doing 100M work.
c) Am I missing something? It's been years since my last build.

Thanks for any advice.

mackerel 2017-10-01 21:57

I'm only assuming it is not much different than desktop Ryzen, in which case 3200 is about the sweet spot. Look up exactly what results others have had with Ryzen and whatever kit you're looking at, as it can be a bit fussy.

On Prime95 performance, RAM or not, Ryzen in general seems underwhelming, and even where you would expect it to be RAM-limited like Intel, it still performs worse.

Mark Rose 2017-10-01 22:38

Zen loves fast memory. Going with 3200 over 2133 will make a massive performance difference.

I'd get the 3733 memory. It's not much more. You can probably run it at 3200 CL15 if it won't run at 3733.

3733/3200 = 1.17
17/1.17 = 14.5

But I'd still clock it as high as stable, since that's what makes the difference for Zen.
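As a rough sanity check on the 2133 vs 3200 gap, theoretical peak bandwidth scales linearly with the transfer rate. The sketch below assumes a 64-bit (8-byte) DDR4 channel and four populated channels, as on a quad-channel Threadripper board; the function name is made up for illustration, and real Prime95 throughput will sit well below these peak figures.

```python
def peak_bandwidth_gbs(mt_per_s: int, channels: int = 4,
                       bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes/s) for DDR memory.

    mt_per_s is the marketing speed (e.g. 3200 for DDR4-3200), which is
    the number of transfers per second in millions.
    """
    return mt_per_s * 1e6 * bytes_per_transfer * channels / 1e9


# Compare the three kits from the first post against the 2133 baseline.
for speed in (2133, 3200, 3733):
    print(f"DDR4-{speed}: {peak_bandwidth_gbs(speed):.1f} GB/s peak, "
          f"{speed / 2133:.2f}x DDR4-2133")
```

On paper, then, 3200 offers about half again the bandwidth of 2133. Whether a bandwidth-bound FFT sees all of that depends on timings and on how memory is spread across the dies.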

M344587487 2017-10-03 10:29

[QUOTE=mackerel;468978]I'm only assuming it is not much different than desktop Ryzen, in which case 3200 is about the sweet spot. Look up exactly what results others have had with Ryzen and whatever kit you're looking at, as it can be a bit fussy.

On Prime95 performance, RAM or not, Ryzen in general seems underwhelming, and even where you would expect it to be RAM-limited like Intel, it still performs worse.[/QUOTE]

3200 does look standard for Zen; it's definitely the baseline to expect to work. I will do proper compatibility checks when it comes time to buy, as the specific parts I'm looking at now are probably not going to be the best then. Hopefully the mobo lineup matures a bit (and RAM doesn't go up too much).

Not to sound like an AMD fanboy, but I can't buy Intel if AMD has a competitive product. I understand Intel has the IPC, and wins in a few other ways, but AMD is competitive.

[QUOTE=Mark Rose;468979]Zen loves fast memory. Going with 3200 over 2133 will make a massive performance difference.

I'd get the 3733 memory. It's not much more. You can probably run it at 3200 CL15 if it won't run at 3733.

3733/3200 = 1.17
17/1.17 = 14.5

But I'd still clock it as high as stable, since that's what makes the difference for Zen.[/QUOTE]

I'm sold. I'll most likely do that: buy the high-speed kit with (probably) looser timings, try tighter timings at 3200, and see which one performs better if both work. The general recommendation for really high clocks seems to largely be because the Infinity Fabric is tied to RAM speed. That shouldn't matter for Prime95 with properly set affinities; it'll be interesting to find out.

Can we really do (17/1.17=14.5) < 15, even as an estimate? I thought the timings were more complicated than that.

Thank you both.

VictordeHolland 2017-10-03 13:09

[QUOTE=M344587487;469118]
Not to sound like an AMD fanboy, but I can't buy Intel if AMD has a competitive product. I understand Intel has the IPC, and wins in a few other ways, but AMD is competitive.
[/QUOTE]
AMD has a very competitive product lineup with Ryzen/Threadripper/EPYC for most workloads. And thank god, it has taken them 5+ years to become competitive again (Bulldozer was a bust).

There are workloads in which Intel has an advantage, the ones that use AVX instructions (Prime95 and Mlucas both use those). Intel processors can do more AVX FMA instructions per clock than AMD. Intel also has AVX-512 capability in their server lineup. At some point you also hit memory bandwidth limitations and adding cores doesn't improve performance by much.

M344587487 2017-10-03 15:48

Yeah, I get that; AVX-512 is a big deal for Prime95/Mlucas specifically. I meant that I can't go Intel when AMD is competitive in general. Bulldozer was a bust, so I went with a 2500K.

PS I don't know who edited the title, but I like it :P

edit: You've forced my hand <--- ;)

Mark Rose 2017-10-03 15:58

[QUOTE=M344587487;469118]I'm sold. I'll most likely do that: buy the high-speed kit with (probably) looser timings, try tighter timings at 3200, and see which one performs better if both work. The general recommendation for really high clocks seems to largely be because the Infinity Fabric is tied to RAM speed. That shouldn't matter for Prime95 with properly set affinities; it'll be interesting to find out.[/QUOTE]

Prime95 isn't NUMA aware, so the memory allocation between dies will be suboptimal, and the infinity fabric bandwidth and latency will matter.

[QUOTE]Can we really do (17/1.17=14.5) < 15, even as an estimate? I thought the timings were more complicated than that.[/QUOTE]

The CAS latency is set in clocks, but the physics of the amplification circuitry to get the data line ready are in nanoseconds.

So 17 clocks of 1866 MHz (DDR4 3733) is 1/1866000000 * 17 = 9.11 ns.

So how many clocks of 1600 MHz is 9.11 ns? 14.6.

Now there may be other latencies involved, so it's no guarantee.

[url]https://en.wikipedia.org/wiki/CAS_latency[/url]
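The conversion above can be sketched in a few lines. This is only an illustration of the arithmetic in this post: the function names are invented, it looks at CAS latency alone, and the other timings (tRCD, tRP, tRAS and so on) are ignored, so it is an estimate rather than a guarantee that a kit will run at the computed setting.

```python
def cas_latency_ns(data_rate_mt_s: float, cl: int) -> float:
    """Absolute CAS latency in nanoseconds.

    DDR memory transfers twice per clock, so the memory clock in MHz is
    half the data rate in MT/s (DDR4-3733 runs at about 1866 MHz).
    """
    clock_mhz = data_rate_mt_s / 2
    # cycles / MHz gives microseconds; multiply by 1000 for nanoseconds.
    return cl / clock_mhz * 1000


def equivalent_cl(from_rate: float, from_cl: int, to_rate: float) -> float:
    """CL at to_rate that gives the same absolute latency as from_cl at from_rate."""
    return from_cl * to_rate / from_rate


# The three kits from the first post: (data rate, CAS latency in clocks).
for rate, cl in ((3200, 16), (3733, 17), (2133, 15)):
    print(f"DDR4-{rate} CL{cl}: {cas_latency_ns(rate, cl):.2f} ns")

# The 3733 CL17 kit downclocked to 3200, keeping the same latency in ns.
print(f"Equivalent CL at 3200: {equivalent_cl(3733, 17, 3200):.1f}")
```

Since CL has to be a whole number of clocks, the result of roughly 14.6 rounds up to CL15 at 3200, which is where the "3200 CL15" suggestion comes from.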

M344587487 2017-10-03 16:38

[QUOTE=Mark Rose;469135]Prime95 isn't NUMA aware, so the memory allocation between dies will be suboptimal, and the infinity fabric bandwidth and latency will matter.[/QUOTE]

Prime95 isn't NUMA-aware, but does it need to be? When I say I want to set a worker per CCX, I mean run 4 instances of Prime95/Mlucas, with their affinities set to each CCX as required. It should then be down to the OS to allocate memory smartly. That's how I understood it to work, please let me know if this is wrong.

edit: I've found this and am researching elsewhere: [url]http://www.mersenneforum.org/showthread.php?t=22571[/url]

[QUOTE=Mark Rose;469135]The CAS latency is set in clocks, but the physics of the amplification circuitry to get the data line ready are in nanoseconds.

So 17 clocks of 1866 MHz (DDR4 3733) is 1/1866000000 * 17 = 9.11 ns.

So how many clocks of 1600 MHz is 9.11 ns? 14.6.

Now there may be other latencies involved, so it's no guarantee.

[url]https://en.wikipedia.org/wiki/CAS_latency[/url][/QUOTE]
It's the other memory timings that I thought may make calculations a pain, but I've never looked at them in detail as they seemed a little too "inside baseball" to worry about: [url]https://en.wikipedia.org/wiki/Memory_timings[/url]

ATH 2017-10-03 17:45

[QUOTE=M344587487;468956][b]4x4GB 3200 CL16 non-ECC ~£170[/b]
4x4GB 3733 CL17 non-ECC ~£190
4x4GB 2133 CL15 ECC ~£200
[/QUOTE]

Can you really use ECC RAM with these AMD processors? With Intel processors you need Xeon models to use ECC.

Mark Rose 2017-10-03 18:00

[QUOTE=ATH;469143]Can you really use ECC RAM with these AMD processors? With Intel processors you need Xeon models to use ECC.[/QUOTE]

Yep! ECC across the full Threadripper lineup.

For Ryzen, it depends on motherboard support.

All Zen CPUs support ECC.

M344587487 2017-10-03 18:18

[QUOTE=ATH;469143]Can you really use ECC RAM with these AMD processors? With Intel processors you need Xeon models to use ECC.[/QUOTE]

Yes, definitely with Threadripper processors on AMD's HEDT platform (TR4 socket), and EPYC (SP3 socket). You can also supposedly use ECC RAM with Ryzen processors (AM4 socket), but that's more hit and miss because it's up to the mobo to support it. Many don't, since AMD hasn't certified Ryzen for ECC; Ryzen is their desktop platform.

[url]http://www.hardwarecanucks.com/forum/hardware-canucks-reviews/75030-ecc-memory-amds-ryzen-deep-dive.html[/url]

Honestly not a fanboy, just a fan :P

edit: I need to work on my response speeds ;)

M344587487 2017-10-04 10:56

[QUOTE]Modern operating systems such as Microsoft® Windows® and Linux® use a “first touch” policy. What this means is that when an application requests memory, the virtual address is initially not mapped to any physical memory. When the application first accesses the memory (read or write), the OS allocates a physical memory region and maps the virtual address to the physical range. The OS typically allocates physical memory from the same NUMA node as the CPU that executed the thread which first accessed the virtual memory block.[/QUOTE]
I found this here: [url]https://developer.amd.com/wordpress/media/2012/10/NUMA_aware_heap_memory_manager_article_final.pdf[/url]

My understanding is that it's possible to softball NUMA-awareness by having the workers manage their own memory, [i]maybe[/i] they don't need to malloc but they do need to be the first ones to memset. By softball I mean it should work as long as memory is not constrained, ultimately you're relying on the underlying OS but it's probably good enough.

Assuming you're running a program that doesn't do this, i.e. the "first touch" is by the main thread, which may not be running where the worker is, it looks like you can bodge it by using taskset in Linux: [url]http://xmodulo.com/run-program-process-specific-cpu-cores-linux.html[/url]

You'd do this by running one instance of the program per worker, making sure that the program and its worker are pinned to the same node. A ballache to figure out, but doable. I have taskset on my WSL bash in Windows 10, so it may be a "cross platform" solution for command-line programs. I don't know if mprime or Mlucas need this extra taskset step if you're already doing 1 worker per instance; it's something I'll test in a few months, although I'm sure someone else already has somewhere.
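A minimal sketch of the "one pinned instance per CCX" idea, doing from Python what `taskset -c` does from the shell. It is Linux-only and makes a loud assumption: that consecutive logical CPU ids share a CCX, which is not guaranteed (SMT siblings are often numbered differently), so check the real topology before relying on it. The helper names and the per-worker directories mentioned below are hypothetical. Note that affinity only restricts where threads run; memory placement still depends on the OS first-touch policy quoted above.

```python
import os
import subprocess


def ccx_cpu_sets(n_cpus: int = 16, cpus_per_ccx: int = 4) -> list[set[int]]:
    """Split CPU ids 0..n_cpus-1 into consecutive groups, one per CCX.

    Assumption: consecutive ids belong to the same CCX. Verify against the
    actual topology (for example with lscpu) before trusting this.
    """
    return [set(range(start, start + cpus_per_ccx))
            for start in range(0, n_cpus, cpus_per_ccx)]


def launch_pinned(cmd: list[str], cpus: set[int], workdir: str) -> subprocess.Popen:
    """Start cmd restricted to the given CPUs (Linux only).

    The affinity is set in the child before exec, so every thread of the
    launched program, including whichever one first touches its buffers,
    runs on the chosen CPUs.
    """
    return subprocess.Popen(
        cmd,
        cwd=workdir,
        preexec_fn=lambda: os.sched_setaffinity(0, cpus),
    )


if __name__ == "__main__":
    # Only print the plan here; nothing is launched.
    for i, cpus in enumerate(ccx_cpu_sets()):
        print(f"worker {i}: CPUs {sorted(cpus)}")
```

Each instance would then be started with something like `launch_pinned(["./mprime"], cpus, f"worker{i}")`, with one working directory per instance so they do not share save files.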

mackerel 2017-10-04 12:40

[QUOTE=M344587487;469134]Yeah, I get that; AVX-512 is a big deal for Prime95/Mlucas specifically. I meant that I can't go Intel when AMD is competitive in general. Bulldozer was a bust, so I went with a 2500K.[/QUOTE]

You will of course decide on your own use cases, but I will say that my 1700, even overclocked, falls behind my old 6700K at non-turbo stock for both large and small FFT tasks.

Mark Rose 2017-10-04 17:31

I should add one more thing about memory: dual rank memory works better for Prime95. The trade-off is that it usually doesn't clock as high. If you can get dual rank memory at 3200, go for it. Usually dual-rank will be higher capacity sticks, so it may not be worth spending the money depending on your priorities.

kladner 2017-10-04 18:16

[QUOTE=Mark Rose;469212]I should add one more thing about memory: dual rank memory works better for Prime95. The trade-off is that it usually doesn't clock as high. If you can get dual rank memory at 3200, go for it. Usually dual-rank will be higher capacity sticks, so it may not be worth spending the money depending on your priorities.[/QUOTE]
The last RAM I bought is Kingston HyperX, 2666MHz, dual rank. I am running it at 3200MHz, with slightly higher voltage (1.22 vs 1.2) and with timings set 1 clock slower. I got 8 MB parts. The HyperX page at Kingston lists the rank on many of the DIMMs.

Mark Rose 2017-10-04 18:55

[QUOTE=kladner;469217]The last RAM I bought is Kingston HyperX, 2666MHz, dual rank. I am running it at 3200MHz, with slightly higher voltage (1.22 vs 1.2) and with timings set 1 clock slower. I got 8 MB parts. The HyperX page at Kingston lists the rank on many of the DIMMs.[/QUOTE]

8 MB? Are you running Windows 95? ;)

M344587487 2017-10-04 21:18

[QUOTE=Mark Rose;469212]I should add one more thing about memory: dual rank memory works better for Prime95. The trade-off is that it usually doesn't clock as high. If you can get dual rank memory at 3200, go for it. Usually dual-rank will be higher capacity sticks, so it may not be worth spending the money depending on your priorities.[/QUOTE]

Ok thanks, I'll consider it. When looking at normal recommendations people rarely talk about rank, I think the only thing I've seen is to avoid it for some games because reasons. You're right that it's not available for 4GB sticks AFAIK. 32GB would be nice but man, RAM prices have pretty much doubled over the past few years.

Mark Rose 2017-10-05 00:39

[QUOTE=M344587487;469231]Ok thanks, I'll consider it. When looking at normal recommendations people rarely talk about rank, I think the only thing I've seen is to avoid it for some games because reasons. You're right that it's not available for 4GB sticks AFAIK. 32GB would be nice but man, RAM prices have pretty much doubled over the past few years.[/QUOTE]

Dual rank can have higher latency in some cases and it usually doesn't clock as high, so yeah, not for pure gamers.

I know what you mean about RAM prices. When I built my cluster the other year, I bought five i5-6600 systems with 32 GB of DDR4-2133 each. The cost for 32GB? CA$120. The price to buy now? $330.

So I'm sitting on a $1050 RAM appreciation. It's nuts.

kladner 2017-10-05 03:52

[QUOTE=Mark Rose;469227]8 MB? Are you running Windows 95? ;)[/QUOTE]
Well, there are two of them. :smile: If I had known where prices were going I would have gotten the other two up front. :sad:

ATH 2017-10-05 03:56

[QUOTE=kladner;469253]Well, there are two of them. :smile: If I had known where prices were going I would have gotten the other two up front. :sad:[/QUOTE]

He was referring to your typo of 8 MB instead of 8 GB.

kladner 2017-10-05 04:03

[QUOTE=ATH;469254]He was referring to your typo of 8 MB instead of 8 GB.[/QUOTE]
Oblivion rules at this hour (2300 CDT), fresh home from work. Actually, we ran 8MB total with Win 3.1. By Win95, or maybe 98 it was 256MB, I think.
Drives weren't measured in GB then. Our first machine, 486DX-33 had a 120 MB HDD. The second had more than twice that in RAM.

ATH 2017-10-05 04:19

Actually, Windows 95 could be installed with 4 MB of RAM, but 8 MB was apparently recommended:

[url]http://discussions.virtualdr.com/showthread.php?9796-Windows-95-System-Requirements[/url]


(It is 6:20 am here and I have been at work since 10:50 pm)

Mark Rose 2017-10-05 04:48

[QUOTE=ATH;469260]Actually, Windows 95 could be installed with 4 MB of RAM, but 8 MB was apparently recommended:

[url]http://discussions.virtualdr.com/showthread.php?9796-Windows-95-System-Requirements[/url]


(It is 6:20 am here and I have been at work since 10:50 pm)[/QUOTE]

I once used Windows 95 with 4 MB of RAM. It was nearly unusable.

kladner 2017-10-05 04:50

[QUOTE=ATH;469260]Actually, Windows 95 could be installed with 4 MB of RAM, but 8 MB was apparently recommended:

[URL]http://discussions.virtualdr.com/showthread.php?9796-Windows-95-System-Requirements[/URL]


(It is 6:20 am here and I have been at work since 10:50 pm)[/QUOTE]
The Win3.1 machine listed with 4 MB, but I had done enough reading to insist on getting 8 MB.

moebius 2017-10-08 05:36

You can find a list of RAM kits and the type of chips they use on Reddit.

Because Samsung B-die is the preferred memory chip type for overclocking, and there are single-rank RAM kits with better timings on the market (for example, 8GB 3600 MHz CL15), maybe this is of interest.


[URL]https://www.reddit.com/r/Amd/comments/62vp2g/clearing_up_any_samsung_bdie_confusion_eg_on/[/URL]

M344587487 2017-10-10 08:43

Thanks, that's a handy table :)

kruoli 2017-10-16 09:22

When I built my system, I did some tinkering with RAM. I have a 1950X on an ASRock X399 Taichi.

4 DIMMs were not able to reach 3200 MHz. I then set them to 3066 MHz and it ran fine.
Now I'm using a full-blown 8 DIMMs. They are not stable at 2933 MHz (I tried hard!), but I was able to set them to 2800 MHz CL12-12-12-30 CR1, which runs really nicely. My DIMMs are G.Skill 3200 MHz CL14, 16 GB per DIMM.


All times are UTC.