|
|
#100 | |
|
Feb 2016
UK
111000000₂ Posts |
Quote:
I think we can safely say they are binning. Recent info from Silicon Lottery suggests that a 3800X on average gets you about 100 MHz more than a 3700X in similar conditions. On two-CCD models, the highest boosts are only needed at lower active core counts, so they can stretch the best CCDs a bit further. I guess the only significant counter-argument against different CCD binning would be that they may still need "better" CCDs when running all-core, to help keep within the power/current budget.
I haven't looked at differing performance between CCXs yet. I do note that Ryzen Master indicates, for a CCD, which core is the "fastest", which is the 2nd fastest on the same CCX, and also the 1st and 2nd fastest cores on the other CCX. Yesterday I started doing some experiments on a 3700X, running at 4+0 (one CCX, half the effective L3 cache, better core-to-core latency) and at 2+2 (all L3 available, higher latency when crossing CCXs). I haven't run many workloads yet: no significant difference (<1%) in Cinebench R15, R20 or Geekbench 4 multicore, while 3DMark11 Physics showed around an 8% advantage for 4+0. If this sounds like a random bunch of benchmarks, it's for a bit of fun elsewhere. |
|
|
|
|
|
|
#101 |
|
"Composite as Heck"
Oct 2017
2·5²·19 Posts |
In addition to the currently known variants, it's likely that the 3500, 3700 and 3900 are coming: https://www.tomshardware.com/news/ry...ies,40040.html
The 3500 is likely OEM-only. The 3700 is an interesting alternative to the 3600 at the right price (though it is unlikely to arrive at that price unless it comes much later). The 3900 likely uses lower-binned chiplets like the 3600, which means it should be easier to secure enough supply for it. By the time the 3900 comes to market the 3900X supply problems may have been solved, but who knows. |
|
|
|
|
|
#102 |
|
Feb 2016
UK
2⁶×7 Posts |
I'm doing the challenge at PrimeGrid at the moment with the two Zen 2 systems... or would be if one hadn't produced some bad results. It's the 3600 running stock, with 2666 RAM. It's running two tasks of 3 cores each, as I found that optimal for productivity.
I have a suspicion that the CPU might not be stable at higher temps, even before it throttles. Currently I have a Wraith Prism on it, which is the cooler bundled with the 3700X. Peak temps hit 90C... when I had a Noctua D9L on it before, it was only hitting 80C. So, a bad trade-off maybe: RGB+noise vs cooler temps. Guess I have to put the Noctua back on it some time. |
|
|
|
|
|
#103 |
|
"Sam Laur"
Dec 2018
Turku, Finland
317 Posts |
Okay, I just had to do some quick benchmarks of my own on the Ryzen 5 3600. No overclocking options were used in the BIOS, so I assume it's running at base clock (3.6 GHz) during these tests. It's a bit hard to tell, because the clock values I get from Linux are all over the place: one second it's 2.1 GHz, the next it's 4.1, and so on, while running benchmarks in mprime. But anyway, it was a quick qualitative test to see how the L3 cache and memory bandwidth cope with different situations, so maybe that doesn't matter so much.
So I ran all mprime FFT sizes from 2048K to 8192K, varied the number of cores used from 2 to 6, and always kept the number of workers at 1. As a baseline comparison, the lowest curve in the graph is the now-retired Ryzen 3 2200G, four cores, one worker. Speeds are normalized by multiplying FFT length (in K) by throughput (iters/sec), then dividing that value by the slowest such result, which happened to be the 8064K FFT on the 2200G.
It seems that four cores are enough to saturate the RAM bandwidth once the FFT size gets big enough (around 6M). Around the current first-test wavefront (5120K FFT) five cores seem to be enough; there's not much improvement from having six cores running. But in my opinion, it makes a lot more sense to run tests at the double-checking wavefront. The DC exponents I'm getting use a 2688K FFT and fit well inside the L3 cache.
It would be really interesting to see how fast the eight-core 3700X/3800X runs there. And, of course, it would be even more interesting to see the 3900X performance, if someone ever manages to get one... Double the L3 cache should mean that even first-test FFTs fit in the cache, and the speedup should be very noticeable. Note that even on the 3600 the L3 cache is divided in two, but this doesn't seem to matter that much, only the total amount. Bigger is better, folks.
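For anyone wanting to plot their own runs the same way, the normalization described above could be sketched roughly like this. The function name, the tuple layout and the sample numbers are all made up for illustration; they are not the measurements behind the graph.

```python
def normalize(results):
    """Normalize benchmark results as described in the post.

    results: list of (label, fft_len_k, iters_per_sec) tuples.
    Returns {(label, fft_len_k): relative_speed}, where 1.0 is the
    slowest configuration measured.
    """
    # "Work per second": FFT length (in K) times iterations per second.
    work = {(label, fft_k): fft_k * ips for label, fft_k, ips in results}
    slowest = min(work.values())
    return {key: value / slowest for key, value in work.items()}


# Hypothetical throughput numbers, for illustration only.
sample = [
    ("2200G 4c", 8064, 50.0),
    ("3600 4c", 8064, 100.0),
    ("3600 6c", 2688, 600.0),
]

for key, speed in normalize(sample).items():
    print(key, round(speed, 2))
```

Multiplying by FFT length first is what makes results at different FFT sizes roughly comparable on one graph, since a larger FFT does more work per iteration.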
|
|
|
|
|
|
#104 | |
|
Jan 2015
2·127 Posts |
Quote:
Ryzen is still dual channel RAM right? Just guessing, maybe we'll see linear acceleration up to 16 workers at least with Rome's 8 channels since it sort of has intrasocket UMA, depending on how long it takes to saturate those PCIe4.0 IF links. |
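As a back-of-envelope comparison of dual-channel versus 8-channel setups, theoretical peak bandwidth can be estimated as below. This is only a sketch: it assumes a standard 64-bit (8-byte) DDR4 channel, and the DDR4-3200 speed used for the 8-channel case is an assumption, not something stated in this thread. Real sustained bandwidth will be lower.

```python
def peak_bandwidth_gb_s(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak in GB/s: channels * bytes per transfer * MT/s."""
    return channels * bus_bytes * mega_transfers_per_s / 1000.0


# Dual channel at DDR4-3600 (the speed mentioned for the 3600 system).
print("2 channels, DDR4-3600:", peak_bandwidth_gb_s(2, 3600), "GB/s")

# Eight channels at an assumed DDR4-3200.
print("8 channels, DDR4-3200:", peak_bandwidth_gb_s(8, 3200), "GB/s")
```

Under these assumptions the 8-channel case has a bit over 3.5 times the peak bandwidth, so whether scaling stays linear up to 16 workers would still depend on how much of the working set lives in cache.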
|
|
|
|
|
|
#105 |
|
"Sam Laur"
Dec 2018
Turku, Finland
317 Posts |
|
|
|
|
|
|
#106 |
|
"/X\(‘-‘)/X\"
Jan 2013
https://pedan.tech/
6160₈ Posts |
What speed are you running the memory at?
|
|
|
|
|
|
#107 |
|
"Sam Laur"
Dec 2018
Turku, Finland
317 Posts |
2x 8GB, 3600 MHz CL18.
|
|
|
|
|
|
#108 | |
|
Feb 2016
UK
2⁶×7 Posts |
Quote:
As it stands, for smaller tasks than around here, I found one worker per CCX to give optimal throughput, but those do fit in their split of the cache. For bigger FFTs exceeding a CCX, the performance isn't bad using the whole CCD, but I think it would have been even higher had the L3 cache been unified. While running 3600 RAM is probably optimal, for those with slower RAM I wonder if there may be some benefit to running a higher IF clock than RAM clock. The tradeoff is increased RAM latency from breaking the synchronous nature, but you recover some of that write bandwidth. |
|
|
|
|
|
|
#109 | |
|
"Composite as Heck"
Oct 2017
2×5²×19 Posts |
Quote:
An interesting thing to note is that IF speed (FCLK) can be decoupled from RAM speed; it's not just a case of being tied to a multiple of RAM speed. There is a latency penalty (at least) to doing so, but if it means IF can be set to 1900 (instead of the typical 1600 to 1800 range) it may be a worthwhile speedup when not bound by RAM. (This video shows some tuning/metrics that may not be directly relevant but show the general concept: https://youtu.be/10pYf9wqFFY?t=535 ). |
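To make the clock relationship concrete, here is a minimal sketch. It relies only on the fact that DDR memory transfers twice per clock, so the memory clock is half the rated DDR speed; the function names are hypothetical and the DDR4-3200 example is an assumed configuration, not one reported in this thread.

```python
def memclk_mhz(ddr_rate):
    """DDR transfers twice per clock, so DDR4-3600 runs an 1800 MHz memory clock."""
    return ddr_rate / 2


def describe(ddr_rate, fclk_mhz):
    """Report whether a given FCLK is synchronous (1:1) with the memory clock."""
    mclk = memclk_mhz(ddr_rate)
    mode = "synchronous (1:1)" if fclk_mhz == mclk else "decoupled"
    return f"DDR4-{ddr_rate}: MEMCLK {mclk:.0f} MHz, FCLK {fclk_mhz} MHz, {mode}"


print(describe(3600, 1800))  # RAM and IF running in lockstep
print(describe(3200, 1900))  # slower RAM with IF pushed higher
```

The second case is the one with the latency penalty: the fabric is faster than the memory clock, so the two domains no longer run in lockstep.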
|
|
|
|
|
|
#110 |
|
"Sam Laur"
Dec 2018
Turku, Finland
475₈ Posts |
All right, one 3900X owner ran the mprime benchmarks for me, same methodology in plotting as before. The 3900X is thus over two times faster than the 3600 within a certain range of FFT sizes, from 5120K to 7680K. 2.45 ms/iter at the wavefront, 5120K... Certainly the effect of the larger L3 cache can be seen. He has 3000 MHz memory in the system, latency unknown. No system tuning was done, so I assume that FCLK is just 1500 MHz.
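A rough sanity check on the cache explanation, as a sketch only. It assumes the FFT data is one 8-byte double per element and ignores everything else the program keeps in memory (sin/cos tables, scratch space), as well as the fact that the L3 is split into 16 MiB slices per CCX. The L3 totals are the published 32 MiB for the 3600 and 64 MiB for the 3900X.

```python
BYTES_PER_ELEMENT = 8  # assumption: one double per FFT element

# Total L3 per CPU in MiB (published specs; split per CCX is ignored here).
L3_MIB = {"3600": 32, "3900X": 64}


def fft_footprint_mib(fft_len_k):
    """Approximate size of the FFT data alone, in MiB."""
    return fft_len_k * 1024 * BYTES_PER_ELEMENT / (1024 * 1024)


def fits(fft_len_k, cpu):
    """True if the FFT data alone is no larger than the CPU's total L3."""
    return fft_footprint_mib(fft_len_k) <= L3_MIB[cpu]


for fft_k in (2688, 5120, 7680):
    size = fft_footprint_mib(fft_k)
    verdict = {cpu: fits(fft_k, cpu) for cpu in L3_MIB}
    print(f"{fft_k}K FFT: about {size:.0f} MiB, fits in L3: {verdict}")
```

Under those assumptions 2688K comes to about 21 MiB, 5120K to 40 MiB and 7680K to 60 MiB, which lines up with the 5120K to 7680K range being where the 3900X pulls furthest ahead: too big for 32 MiB, still under 64 MiB.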
|
|
|
|