mersenneforum.org  

Old 2019-06-30, 19:32   #45
ewmayer
2ω=0
 
Sep 2002
República de California

3×5³×31 Posts

Quote:
Originally Posted by mackerel View Post
What FFT size is leading edge work at? Could the 3950X with 64MB of cache hold that on socket? My concern is the inter-chiplet bandwidth should cores need cache data on other chiplet. This would have to be tested once available.
5120K, which needs a bit over 40MB just for the LL/PRP residue array, and a few MB more for auxiliary data tables in a compact-memory implementation of same.
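
As a rough check of the 40MB figure, here is a minimal sketch. It assumes (the post does not say so explicitly) that "5120K" means 5120 × 1024 elements and that each element is an 8-byte double; the function name is made up for illustration.

```python
# Assumptions, not stated in the post: K = 1024 elements,
# and each FFT element is stored as an 8-byte double.
BYTES_PER_ELEMENT = 8


def residue_array_bytes(fft_len_k: int) -> int:
    """Bytes needed to hold an FFT of fft_len_k * 1024 double-precision elements."""
    return fft_len_k * 1024 * BYTES_PER_ELEMENT


size = residue_array_bytes(5120)
print(f"{size} bytes = {size / 2**20:.1f} MiB = {size / 10**6:.1f} MB (decimal)")
```

Under those assumptions the array is exactly 40 MiB, or a bit under 42 decimal MB, before the few MB of auxiliary tables are added.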
Old 2019-06-30, 20:53   #46
maxzor
 
Apr 2017

20₁₀ Posts

Only a week to go, but AFAIU from this thread, Rob Hallock at AMD confirmed that you can set IF to 3000MHz while keeping a 1:1 DRAM:UMC ratio.
Old 2019-07-06, 16:12   #47
maxzor
 
Apr 2017

10100₂ Posts

https://fuse.wikichip.org/news/2458/...amd-zen-2-core
Old 2019-07-07, 14:22   #48
nomead
 
"Sam Laur"
Dec 2018
Turku, Finland

2×3×5×11 Posts

https://www.techspot.com/review/1869...x-ryzen-3700x/
One thing sticks out like a sore thumb. OUCH!

Quote:
AMD’s made a compromise as client workloads do very little writing, so rather than use this space to improve something that isn’t required, they’ve invested the silicon real estate in more beneficial ways to achieve tangible performance gains. Whereas the Core Complex Die to IO Die link for reading memory is 32 bytes wide, it’s only 16 bytes wide for writing, and this significantly reduces write performance which impacts the SiSoftware copy test.
Old 2019-07-07, 17:06   #49
maxzor
 
Apr 2017

2²·5 Posts

Quote:
Originally Posted by nomead View Post
https://www.techspot.com/review/1869...x-ryzen-3700x/
One thing sticks out like a sore thumb. OUCH!
Is it mentioned anywhere else?
How bad is it doctor?
Old 2019-07-07, 17:54   #50
M344587487
 
"Composite as Heck"
Oct 2017

3×263 Posts

Quote:
Originally Posted by maxzor View Post
Is it mentioned anywhere else?
How bad is it doctor?
https://youtu.be/oDVUdpcKZMA?t=1360


Sounds bad, but I don't know if it's a hard bandwidth bottleneck. Intuitively, data needs to be read more than written to account for a dataset larger than a CCX's 16MiB of L3 cache, but I'm no doctor. It could be a soft bandwidth bottleneck if writes are particularly bursty, or if asymmetric read/write speeds are unusual and haven't been properly accounted for in software. You'd think that as long as write bandwidth is not completely saturated it shouldn't matter much, if at all, since we shouldn't need to read any data that we're in the process of writing (if we needed it, why are we dumping it to memory?). Time will tell.
Old 2019-07-07, 18:11   #51
maxzor
 
Apr 2017

24₈ Posts

Quote:
Originally Posted by M344587487 View Post
Also the bottleneck looks to be per chiplet.
Old 2019-07-07, 19:09   #52
mackerel
 
Feb 2016
UK

2⁵×13 Posts

On the half-bandwidth writes, I saw the same stated in the overclockers.com review:
https://www.overclockers.com/amd-ryz...0x-cpu-review/

I hope I'm wrong on this, but this is sounding even more like anything touching ram is going to be limited. This would also apply if you got a two chiplet model and data has to pass between them.

For my personal prime number use cases, I think everything can fit inside the juicy L3 cache so I hope that ram becomes irrelevant. I'm still seeking to clarify just how fast IF is, and also if the 2:1 ratio mode would cripple ram performance even more because that data simply can't go anywhere fast enough.

I have a 3600 on order as the only one available today, expected Tuesday.
Old 2019-07-07, 22:27   #53
hansl
 
Apr 2019

5·41 Posts

Looks like the half write speed happens on 3700X, but does *not* apply to the 3900X.

https://www.overclock3d.net/reviews/..._x570_review/9

https://www.youtube.com/watch?v=45fQaCl9WlA#t=14m30s

Also, surprisingly, these guys clocked memory way higher than the "ideal" 3800 MHz, and I don't see any evidence of a switch to the 2:1 ratio.
Old 2019-07-07, 22:52   #54
mackerel
 
Feb 2016
UK

2⁵×13 Posts

To my understanding the half write speed is per chiplet, so on the two chiplet CPUs you can still fill up the ram write bandwidth when both are in use. It doesn't help each chiplet individually. If they want to talk to ram or each other, I suspect it will choke. If anyone gets one, I'd be interested to see what really happens.

As for the ratio thing, I think I get it now. It was not explained well in pre-launch materials. The 2:1 ratio applies between the ram and memory controller. The infinity fabric link goes asynchronous if ram is above 3600, and IF locks at 1800 but can be user adjusted further. What this implies is that ram bandwidth will stop scaling at some point, when the practical bandwidth converges with the maximum IF bandwidth. Writes will be limited already, so there are only potential benefits in reads. It will be interesting to see what a practical maximum IF clock is on these.
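
To put rough numbers on that convergence, here is a back-of-envelope sketch. It uses the 32-byte read and 16-byte write link widths from the review quoted earlier in the thread and the 1800 IF clock mentioned above. Everything else is an assumption for illustration: that the link moves its full width once per IF clock, that ram is dual-channel DDR4-3600 with 8 bytes per transfer per channel, and that these theoretical peaks are achievable. The function names are hypothetical.

```python
def link_gb_per_s(bytes_per_clock: int, fclk_mhz: float) -> float:
    """Peak one-direction chiplet link bandwidth in GB/s (10**9 bytes/s).

    Assumes one full-width transfer per IF clock cycle.
    """
    return bytes_per_clock * fclk_mhz / 1000.0


def dram_gb_per_s(mega_transfers: float, channels: int = 2,
                  bytes_per_transfer: int = 8) -> float:
    """Peak DRAM bandwidth in GB/s, assuming 64-bit channels."""
    return mega_transfers * bytes_per_transfer * channels / 1000.0


FCLK_MHZ = 1800                           # IF clock from the post
read_bw = link_gb_per_s(32, FCLK_MHZ)     # 32 bytes/clock read link
write_bw = link_gb_per_s(16, FCLK_MHZ)    # 16 bytes/clock write link
dram_bw = dram_gb_per_s(3600)             # assumed dual-channel DDR4-3600

for chiplets in (1, 2):
    # Each chiplet has its own link; ram is shared, so it caps the total.
    reads = min(chiplets * read_bw, dram_bw)
    writes = min(chiplets * write_bw, dram_bw)
    print(f"{chiplets} chiplet(s): read {reads:.1f} GB/s, write {writes:.1f} GB/s")
```

Under these assumptions one chiplet's read link matches the ram peak at 57.6 GB/s while its write link is half that, and two chiplets writing together can reach the ram peak, which is consistent with the per-chiplet reading of the limit.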

I'd love to be proved wrong, but at the moment I don't think I want to put these to any tasks that stress ram. The LLR tasks I do will fit in the 32MB L3, so I'm not concerned with ram speed. This reminds me of the Broadwell desktop CPUs, with their 128MB L4 cache. It meant that it basically didn't matter what ram you combined it with; the CPU could run practically unlimited. When I was short, I ran 1x4GB module per system!
Old 2019-07-08, 12:31   #55
mackerel
 
Feb 2016
UK

1A0₁₆ Posts

In this thread it was written:
Quote:
A 4M FFT at 8-bytes per FFT element is 32MB. As Ernst said, we must read and write that data twice. Thus 128MB of bandwidth is needed for each iteration. Prime95 uses about 5MB of read-only data each iteration. Grand total is 133MB bandwidth per iteration.
Would I be right in thinking that, if we simplify to reads = writes, then half write BW would increase total r+w transfer time by 50%?
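
A small sketch of that arithmetic, using the figures from the quote (4M FFT, 8 bytes per element, data read and written twice, 5MB of read-only data). The two timing models are assumptions for illustration, not measurements: "serialized" means reads and writes take turns, "overlapped" means the read and write links run at the same time. Bandwidth is in arbitrary units with full speed = 1.

```python
def iteration_traffic_mb(fft_len_m=4, bytes_per_element=8, passes=2, readonly_mb=5):
    """Return (reads_mb, writes_mb) per iteration from the quoted figures."""
    data_mb = fft_len_m * bytes_per_element      # 4M elements * 8 bytes = 32 MB
    reads_mb = passes * data_mb + readonly_mb    # data read twice, plus read-only tables
    writes_mb = passes * data_mb                 # data written twice
    return reads_mb, writes_mb


def transfer_time(reads_mb, writes_mb, read_bw, write_bw, overlapped=False):
    """Transfer time in arbitrary units under one of two simple models."""
    t_read = reads_mb / read_bw
    t_write = writes_mb / write_bw
    return max(t_read, t_write) if overlapped else t_read + t_write


reads, writes = iteration_traffic_mb()
print("total MB per iteration:", reads + writes)

# Simplification from the post: reads = writes, write bandwidth halved.
simple = transfer_time(64, 64, 1.0, 0.5) / transfer_time(64, 64, 1.0, 1.0)
print("reads = writes, serialized slowdown:", simple)   # 1.5, i.e. +50%

# Same comparison with the quoted 69 MB read / 64 MB write split.
for overlapped in (False, True):
    full = transfer_time(reads, writes, 1.0, 1.0, overlapped)
    half = transfer_time(reads, writes, 1.0, 0.5, overlapped)
    print("overlapped" if overlapped else "serialized", round(half / full, 2))
```

So the 50% figure holds if reads and writes are serialized. With the quoted split it comes out at about 1.48, and if the two directions genuinely overlap, the halved write link becomes the longer leg and the penalty is larger.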

On that 5MB of read only data, is that a fixed amount regardless of FFT size? Is this pre-computed look up data? What's the ram space occupied if so?

I was playing about with this info and throughput benchmark results last night in preparation for getting a 3600 to bench, so I have something to compare it with. The bandwidth used would include talking to caches, I assume, so the numbers I was getting were quite a bit higher than ram bandwidth. Only when plotted on a log-log chart did I see something possibly there, a drop after the L3 cache size on an Intel CPU, but this would take more looking at to understand, and also to repeat on a 2600.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.