Ryzen 7950X + Windows 11 + P-1 stage2 = lag
I'm very much enjoying my new Ryzen 7950X system, but I'm disappointed by the lag I get when running Prime95 P-1 stage 2. Everything runs nicely when Prime95 is doing stage 1, PRP, or anything else that isn't RAM-intensive, but when it switches to stage 2 the lag becomes really annoying. It shows up especially with things that should be near-instant (and are during stage 1), like double-clicking to open a file or pressing Win+E to open File Explorer: these take anywhere from 1 to 5 seconds while Prime95 is using RAM. And it's not even all my RAM: I allocate it 40 of 64GB during the daytime (50 at night).
Relevant hardware/software:
CPU: [URL="https://www.amd.com/en/products/cpu/amd-ryzen-9-7950x"]AMD Ryzen 9 7950X[/URL]
GPU: [URL="https://www.msi.com/Graphics-Card/GeForce-RTX-4090-GAMING-X-TRIO-24G/Specification"]GeForce RTX 4090[/URL]
Mobo: [URL="https://rog.asus.com/motherboards/rog-strix/rog-strix-x670e-e-gaming-wifi-model/"]Asus X670E-E Strix[/URL]
RAM: [URL="https://www.gskill.com/product/165/390/1665020865/F5-6000J3040G32GX2-TZ5NR"]G.Skill Trident Z5 Neo RGB DDR5-6000 64GB[/URL] (2x32GB) AMD EXPO
SSD: [URL="https://www.westerndigital.com/en-ca/products/internal-drives/wd-black-sn850x-nvme-ssd#WDS400T2X0E"]WD Black SN850X 4TB[/URL]
Windows 11 Pro
Prime95 v30.8b17

RAM usage is generally around 50-55GB (40 for Prime95 + 10-15 for whatever else I'm doing). 16GB swap file configured on the SSD. I've played with Core Isolation Memory Integrity with no noticeable difference.
Has anyone else noticed something like this? Any suggestions? |
If you disable swap, does the lag persist?
|
[QUOTE=Mark Rose;627148]If you disable swap, does the lag persist?[/QUOTE]It persists.
|
You could try testing with a lower RAM frequency.
6000 MT/s is usually already hard for the Zen 4 memory controller with 32 GB modules. Try 5600 MT/s with the same timings and see if that reduces the lag. If it does, at least you know where to look and can work on a better RAM tune. If it doesn't help, you may want to look into reducing how much RAM bandwidth p95 uses, not just how much RAM it uses. A 16-core Zen 4 can push so many instructions that RAM bandwidth gets fully saturated before capacity does. |
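To put the 6000 vs 5600 MT/s suggestion in numbers, here is a minimal sketch of the theoretical peak bandwidth of a dual-channel DDR5 platform. It assumes 2 channels with 64 data bits (8 bytes) per channel per transfer, which is how AM5 is usually described; real sustained bandwidth on Zen 4 is well below this ceiling, so treat the output as an upper bound only.

```python
# Theoretical peak memory bandwidth for a dual-channel platform such as AM5.
# Assumption: 2 channels, 8 bytes (64 data bits) per channel per transfer.
# Sustained bandwidth on real hardware is noticeably lower than this ceiling.

def peak_bandwidth_gbs(mt_per_s: int, channels: int = 2, bytes_per_transfer: int = 8) -> float:
    """Return the theoretical peak in GB/s (10**9 bytes per second)."""
    return mt_per_s * 1e6 * bytes_per_transfer * channels / 1e9

for speed in (6000, 5600):
    print(f"DDR5-{speed}: {peak_bandwidth_gbs(speed):.1f} GB/s")
```

Dropping from 6000 to 5600 MT/s lowers the theoretical ceiling by roughly 7%.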
What happens if you use fewer cores?
We use only two cores on our 5800X3D. (We are also running P-1 with version 30.8 build 17, on W11 Home.) :mike: |
[QUOTE=Xyzzy;627166]What happens if you use fewer cores?
We use only two cores on our 5800X3D.[/QUOTE]Since the 7950X is two chiplets, I run one 8-thread worker on threads 0..15 and a second on 16..31. I have experimented with running Prime95 on 7 threads per worker, leaving the first core of each chiplet free. It may have made a small difference, I'm not sure; nothing significant, though. I do notice something weird happening to the worker in stage 2 when I look at the Task Manager graph. Just by looking at the graph I can tell which one is in stage 2: it has these weird dropouts in the kernel-time portion, regularly lasting about 3 seconds every 15 seconds. The worker in stage 1 has a solid usage graph. |
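The "threads 0..15 / 16..31" split can be written as logical-processor bitmasks. This is a hypothetical helper, not anything Prime95 provides, and it assumes the 7950X is enumerated with 8 cores per chiplet (CCD), SMT siblings numbered adjacently (core n maps to logical CPUs 2n and 2n+1), and CCD0 holding cores 0-7. Check the actual topology on your machine before relying on it.

```python
# Hypothetical helper: logical-processor bitmask for one chiplet (CCD).
# Assumptions: 8 cores per CCD, 2 SMT threads per core, SMT siblings numbered
# adjacently (core n -> logical CPUs 2n and 2n+1), CCD0 = cores 0-7, CCD1 = cores 8-15.

CORES_PER_CCD = 8
THREADS_PER_CORE = 2

def ccd_mask(ccd: int, skip_first_cores: int = 0) -> int:
    """Bitmask of logical CPUs on `ccd`, optionally leaving its first cores free."""
    mask = 0
    first_core = ccd * CORES_PER_CCD
    for core in range(first_core + skip_first_cores, first_core + CORES_PER_CCD):
        for smt in range(THREADS_PER_CORE):
            mask |= 1 << (core * THREADS_PER_CORE + smt)
    return mask

for ccd in (0, 1):
    print(f"CCD{ccd}: all cores {ccd_mask(ccd):#010x}, first core free {ccd_mask(ccd, 1):#010x}")
```

Under these assumptions, the "7 threads per worker, first core of each chiplet free" layout covers logical CPUs 2..15 and 18..31.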
[QUOTE=James Heinrich;627172]Since 7950X is two chiplets I run one 8-thread worker on threads 0..15, second on 16..31
I have experimented with running Prime95 on 7 threads/worker, leaving the first core of each chiplet free. It may have made some small difference, I'm not sure, nothing significant though. I do notice looking at the Task Manager graph that something weird happens to the worker in stage 2. Just by looking at the graph I can tell which one is in stage 2 -- it has these weird dropouts in the kernel times portion, regularly for about 3 seconds every 15 seconds. The worker in stage1 has a solid usage graph.[/QUOTE] For telemetry and sensor readings I suggest HWiNFO64; it's the gold-standard tool for this. You can watch real-time telemetry or write logs. The behavior does seem weird. Are you running the latest BIOS, the latest AMD chipset driver, and all Windows updates? What are the temperatures? |
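HWiNFO64 can write its sensor readings to a CSV log. As a minimal sketch, assuming the log has a single header row and a column named something like `DRAM Read Bandwidth [Gbps]` (a hypothetical name here; check the header of your own log), the average of one sensor over a run could be computed like this. Rows that don't parse as numbers, such as repeated header lines, are skipped.

```python
import csv
import io

def column_average(csv_text: str, column: str) -> float:
    """Average the numeric values in `column`, skipping rows that don't parse."""
    values = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            values.append(float(row[column]))
        except (KeyError, TypeError, ValueError):
            continue  # e.g. missing column, blank cell, or a repeated header row
    if not values:
        raise ValueError(f"no numeric data in column {column!r}")
    return sum(values) / len(values)

# Made-up sample in the assumed layout, for illustration only.
sample = """Date,Time,DRAM Read Bandwidth [Gbps],DRAM Write Bandwidth [Gbps]
1.4.2023,12:00:00.000,38.1,27.0
1.4.2023,12:00:02.000,37.9,27.2
"""
print(column_average(sample, "DRAM Read Bandwidth [Gbps]"))
```

Logging a few minutes of stage 1 and a few minutes of stage 2 separately and comparing the averages would show how much extra memory traffic stage 2 generates.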
Not sure what you wanted to see from HWiNFO, so I took a general screenshot.
I have the CPU targeting 70°C in the BIOS (rather than the default 90°C) for noise reasons, and it sticks to that target pretty well (it can still draw more than its nominal 170W TDP even with only 14 of 16 cores running). Windows Update is up to date. There is a newer AMD chipset driver, v5.01.03.005, that I'll try out. edit: No change. I also unplugged an unused ancient 6TB HDD that I saw was giving SMART warnings; also no change. BIOS is v0805 from a few months ago when I got the system. There are a couple of [URL="https://rog.asus.com/ca-en/motherboards/rog-strix/rog-strix-x670e-e-gaming-wifi-model/helpdesk_bios/"]newer versions[/URL] I could try (not sure if I want to try the beta one):[quote]Version 0922 - 2023/02/24
1. Update AGESA version to ComboAM5PI 1.0.0.5 patch C
2. Improve better performance for AMD new CPUs
Version 0925 - 2023/03/15
1. Improve system stability
Version 1003 Beta Version - 2023/03/20
1. Update AGESA version to ComboAM5PI 1.0.0.6
2. Supports high density DDR5 module
3. TPM 2.0 security update[/quote] |
So what happens if you run just 1 core per chiplet?
:mike: Also, we run without HT enabled. What happens if you try that? |
[QUOTE=Xyzzy;627182]So what happens if you run just 1 core per chiplet?[/QUOTE]P-1 runs 8x slower than it could?
I updated the BIOS to v1003-beta, but there's no change in lagginess (and the same weird dips in the CPU graph). [strike]But I did lose my 70°C thermal limit BIOS option, so my fans are screaming at me. :down: I'll have to check again for that option; maybe they renamed/moved it.[/strike] edit: found it -- you need to set Precision Boost Overdrive mode to "Enhancement" (not Auto or Manual or any of the other half-dozen options) to be able to see it. |
[QUOTE=James Heinrich;627186]P-1 runs 8x slower than it could?[/QUOTE]
It probably won't run exactly 8× slower.
[CODE]
CORESPERCHIPLET = 1
TEST P-1
IF LAG == TRUE
    CORESPERCHIPLET--
ELSE
    CORESPERCHIPLET++
[/CODE] |
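The cores-per-chiplet pseudocode amounts to a simple upward search: start at 1 core per chiplet and keep adding cores until lag appears. A minimal runnable sketch, where `has_lag` is a hypothetical stand-in for "run P-1 stage 2 with this many cores and see whether the desktop lags" (in practice a manual test), and which assumes lag only gets worse as cores are added:

```python
from typing import Callable

def max_lag_free_cores(has_lag: Callable[[int], bool], max_cores: int = 8) -> int:
    """Largest cores-per-chiplet count (1..max_cores) that runs without lag.

    Walks upward from 1 core and stops at the first count that lags.
    Returns 0 if even a single core per chiplet lags.
    Assumes lag is monotonic: if n cores lag, n + 1 cores lag too.
    """
    best = 0
    for cores in range(1, max_cores + 1):
        if has_lag(cores):
            break
        best = cores
    return best

# Hypothetical example: suppose lag starts once more than 6 cores per chiplet are busy.
print(max_lag_free_cores(lambda cores: cores > 6))
```

With at most 8 cores per chiplet, a linear walk is only a handful of test runs, so there's little to gain from a binary search here.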
On a whim I just experimented with a few more things:
1) Disabling AVX512F ([c]CpuSupportsAVX512F=0[/c]) didn't make any noticeable difference.
2) Running 4x4 workers instead of 2x8 made a significant difference... the wrong way: it nearly locked up the machine, and it took me 30 seconds to be able to exit Prime95. |
Yes, running 2 workers on a two-chiplet AMD CPU is preferable.
In HWiNFO64, I was curious how much RAM bandwidth is used. The system can become laggy and unresponsive if the memory controller is overwhelmed and keeps getting bottlenecked. When I was running 2 PRP tests at the same time my PC was very sluggish, so now I run only 1 PRP + 1 DC on my 5900X. Unless you can limit the maximum RAM bandwidth used, leaving some of it for normal use, a large P-1 will consume all the resources. I am talking about bandwidth, not size. I attached a screenshot of my DDR4-3800 running PRP. Reading and writing at the same time is a heavy task for the RAM and memory controller. Edit: the 7950X is a very fast CPU with a lot of cores. It can easily overwhelm the memory controller, especially with 64 GB of RAM. |
I added an additional screenshot.
Now I have a 2x6 setup, using the 12 cores of my 5900X:
1st worker does DC at FFT 3360
2nd worker does PRP at FFT 6M
In this setup RAM bandwidth use is close to the maximum, but I have enough responsiveness not to be bothered while doing any other task: working on documents, browsing, Discord, movies, etc. If I want to do something heavier or play some games, I have to turn p95 off completely, though. My suggestion: try running only 1 P-1 worker, with the other worker on something lighter. |
I haven't run P-1 for the last several months; I run 2x PRP/DC workers instead. It still lags. It's more usable running only 6 (instead of 8) threads per worker, but the lag is still noticeable.
Firing up HWiNFO, I see these RAM read/write figures:
2x6-core LL: 27/18 Gbps
2x8 LL: 30/19
2x8 P-1 stage 2: 38/27 |
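Since all three readings come from the same tool, they can be compared against each other without worrying about the exact unit HWiNFO reports. A small sketch that totals read + write for each configuration quoted above and expresses it relative to 2x8 P-1 stage 2 (treating read and write traffic as simply additive is an assumption):

```python
# Read/write figures as reported above (same HWiNFO units for every row).
readings = {
    "2x6 LL": (27, 18),
    "2x8 LL": (30, 19),
    "2x8 P-1 stage 2": (38, 27),
}

reference = sum(readings["2x8 P-1 stage 2"])
for config, (read, write) in readings.items():
    total = read + write
    print(f"{config:16} total {total:3}  ({total / reference:.0%} of P-1 stage 2)")
```

By this measure, 2x6 LL moves roughly 30% less data than 2x8 P-1 stage 2, while going from 2x6 to 2x8 LL adds only about 9%.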
I see; I'm not sure what else I could suggest.
Running 2x6 is not the worst option, to be honest, especially if it is LL + PRP or P-1. I assume 38/27 is already the hard cap on RAM bandwidth for 2x8 P-1, so staying at ~30/20 may leave enough bandwidth over to keep your system responsive. You could take a deep dive into RAM overclocking and tuning to get more performance out of it, up to 5-10%, but learning the nuances of RAM overclocking takes time and dedication. Here is a YouTube video from a recognized overclocker introducing primary-timing RAM overclocking for AMD 7000: [url]https://youtu.be/dlYxmRcdLVw[/url] |
I don't really want to push my RAM much further; it's already "factory overclocked" with the EXPO profile and runs warm (66°C), and even if I could get another 10% bandwidth I don't think it would really fix the underlying issue. I'll just make do with fewer workers during the daytime.
|
Gotcha. 66°C is quite warm; I would look for a way to improve case airflow to bring it down a bit, either with a fan on top of the modules (picture added) or by modifying the heat spreaders. High temps can cause some instability, but it's not a big deal if error checking doesn't get triggered, and lowering them probably wouldn't help current performance.
I'm out of ideas for how to run it at full speed with good responsiveness. If you find anything out, it would be good to know. Good luck! |