mersenneforum.org  

2020-02-15, 19:25   #12
AbsolutXTR
Feb 2020
3 Posts

Quote:
Originally Posted by Mark Rose
The L3 cache on the 3900X is split into four 16 MB chunks, one per CCX. It's a victim cache and not unified, so it's unlikely the FFT fits. The 3900X is basically two 3600s put together, but they share memory bandwidth, so I don't expect the 3900X to be much faster than a 3600 for that matter. The 3600 with 3600 MHz memory is probably a sweet combination.

Just went from a 3600 to a 3900X and I'm seeing barely any improvement: the 2 workers I have sped up only marginally, and adding more workers kills the performance of the first 2. It's a sample size of one set of workloads, but so far you are 100% correct. The 3600 seems like the sweet spot.

Once I have my 3600 in my other system, I'll try some more side-by-side comparisons with the 3900X. I should have both 3200 MHz and 3600 MHz RAM too, so I might even play around to see the speed difference there.
2020-02-15, 21:07   #13
AbsolutXTR
Feb 2020
3₁₀ Posts

Quote:
Originally Posted by Mark Rose
The L3 cache on the 3900X is split into four 16 MB chunks, one per CCX. It's a victim cache and not unified, so it's unlikely the FFT fits. The 3900X is basically two 3600s put together, but they share memory bandwidth, so I don't expect the 3900X to be much faster than a 3600 for that matter. The 3600 with 3600 MHz memory is probably a sweet combination.

@Mark Rose - I just upgraded to a 3900X from a 3600, and from what I can see, you're right. The performance gain is marginal at best for the workload I'm running.
(I'm running two LL tests on 100M+ digit numbers... for fun.)

I've got some time left on those before they complete, but I can run some more relevant benchmarks in a little while. I'm going to put the 3600 in a different system.
2020-02-15, 23:11   #14
M344587487
"Composite as Heck"
Oct 2017
2·5²·19 Posts

To take full advantage of the L3 cache, I think you need to run smaller FFTs on the 3900X, which frees up memory bandwidth and avoids stalls, and do the 100M tests on the 3600. There should be an FFT size where the 3900X has nearly twice the throughput of a 3600, but it may be much smaller than the current wavefront, maybe in DC territory.

If you're inclined, it might be fun to mix tests. Maybe running only wavefront tests hammers the memory bandwidth too much and running only DCs too little, so the sweet spot could be a bit of both. Maybe 4 tests are best (one per CCX), with n being wavefront tests and 4-n being DC tests.
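
The mixes suggested above are easy to enumerate before benchmarking them. This is only a rough sketch, assuming one worker per CCX on a four-CCX chip; `worker_mixes` and `best_mix` are hypothetical helper names rather than anything in Prime95, and the throughput figures have to come from your own runs.

```python
# Sketch only: assumes one worker per CCX on a 4-CCX CPU such as the 3900X.
# worker_mixes and best_mix are hypothetical helpers, not part of Prime95.
NUM_CCX = 4

def worker_mixes(num_ccx=NUM_CCX):
    """List every (wavefront, dc) split: n wavefront tests and num_ccx - n DC tests."""
    return [(n, num_ccx - n) for n in range(num_ccx + 1)]

def best_mix(measured):
    """Pick the mix with the highest measured throughput.

    measured maps (wavefront, dc) to a throughput number that you
    benchmarked yourself, in whatever unit you use consistently.
    """
    return max(measured, key=measured.get)

for wavefront, dc in worker_mixes():
    print(f"{wavefront} wavefront + {dc} DC")
```

That gives five configurations to benchmark, from all DC to all wavefront.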
2020-02-18, 06:35   #15
AbsolutXTR
Feb 2020
3₁₆ Posts

Quote:
Originally Posted by M344587487
To take full advantage of the L3 cache, I think you need to run smaller FFTs on the 3900X, which frees up memory bandwidth and avoids stalls, and do the 100M tests on the 3600. There should be an FFT size where the 3900X has nearly twice the throughput of a 3600, but it may be much smaller than the current wavefront, maybe in DC territory.

If you're inclined, it might be fun to mix tests. Maybe running only wavefront tests hammers the memory bandwidth too much and running only DCs too little, so the sweet spot could be a bit of both. Maybe 4 tests are best (one per CCX), with n being wavefront tests and 4-n being DC tests.

I have yet to run my 3600 side by side. (To be honest, I didn't write down its iter/s on the 100M-digit tests, but what the 3900X is doing seems very comparable to the 3600.)

I'm sure you're right about small FFTs, though. Running my two 100M-digit workers alongside anything else that has a large FFT really kills performance on the 3900X. However, I can run some PRP tests alongside just fine (FFT = 560K).
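
A back-of-the-envelope check of why the 560K FFT is so much friendlier to the cache. This is a rough sketch, assuming "560K" means 560 × 1024 elements stored as 8-byte doubles, and it ignores twiddle tables and any other working memory; the function names are made up for illustration.

```python
# Sketch only: assumes an FFT length of "560K" means 560 * 1024 elements,
# each an 8-byte double, and ignores twiddle tables and other working memory.
L3_PER_CCX_BYTES = 16 * 1024 * 1024  # 16 MB L3 slice per CCX, from the quoted post
BYTES_PER_DOUBLE = 8

def fft_bytes(fft_k):
    """Approximate size in bytes of one FFT data array of fft_k * 1024 doubles."""
    return fft_k * 1024 * BYTES_PER_DOUBLE

def fits_in_l3_slice(fft_k, l3_bytes=L3_PER_CCX_BYTES):
    """True if the FFT data array alone is no larger than one CCX's L3 slice."""
    return fft_bytes(fft_k) <= l3_bytes

size_mib = fft_bytes(560) / (1024 * 1024)
print(f"560K FFT: {size_mib:.3f} MiB, fits: {fits_in_l3_slice(560)}")
```

Under those assumptions a 560K FFT is about 4.4 MiB, well inside one 16 MB slice, while anything larger than 2048K would no longer fit.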

It seems to work best to keep the cores shared by a worker in the same CCX (max 3 cores per worker for me), at least in my very unscientific testing. For example, two 6-core 100M-digit workers perform basically the same as two 3-core 100M-digit workers, and the 3-core setup leaves room to do other small-FFT work alongside. Running two 4-core 100M-digit workers with one other 4-core worker alongside destroyed 100M-digit performance, though (I still need to verify that the third worker was actually running PRP).
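
To illustrate the 3-cores-per-worker limit, here is a small sketch that checks whether a worker's cores stay inside one CCX. It assumes the 3900X layout of 12 cores in 4 CCXs of 3 cores each, and that core IDs are numbered contiguously per CCX, which depends on the OS and may not hold on a given system; the helper names are hypothetical.

```python
# Sketch only: assumes 12 cores in 4 CCXs of 3 cores each (3900X layout) and
# that core IDs are numbered contiguously per CCX, which depends on the OS.
CORES_PER_CCX = 3

def ccxs_touched(worker_cores, cores_per_ccx=CORES_PER_CCX):
    """Return the sorted CCX indices that a worker's cores fall into."""
    return sorted({core // cores_per_ccx for core in worker_cores})

def stays_in_one_ccx(worker_cores, cores_per_ccx=CORES_PER_CCX):
    """True if every core assigned to the worker shares a single CCX."""
    return len(ccxs_touched(worker_cores, cores_per_ccx)) == 1

for cores in ([0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4, 5]):
    print(cores, "-> CCXs", ccxs_touched(cores))
```

With that numbering, a 3-core worker stays in one CCX, while 4-core and 6-core workers each straddle two.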
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
