Unstable Ryzen 7950X
Looks like Prime95 30.8.17 doesn't support AMD Ryzen 7950X on Windows 10 either. Is there a correct version of Prime95 I can download?
|
[QUOTE=tuckerkao;619857]Looks like Prime95 30.8.17 doesn't support AMD Ryzen 7950X on Windows 10 either. Is there a correct version of Prime95 I can download?[/QUOTE]
If you stated the errors you get with all parameters and symptoms you might expect an answer. You mentioned several problems : [QUOTE=tuckerkao;619807]I'm wondering when Prime95 30.9 will start to run smoothly with Ryzen i9 7950X on Windows 11. The 30.8.17 version kept getting the FFT errors during the PRP testing or even roundoffs during P-1. System virtual memory and page size already increased to 17 GBs ~ 25 GBs.[/QUOTE] This would point to an unstable configuration. Did you try to run Torture Tests ? With what results for which settings ? |
[QUOTE=tuckerkao;619807]Ryzen i9 7950X on Windows 11. The 30.8.17 version kept getting the FFT errors during the PRP testing or even roundoffs during P-1.[/QUOTE][QUOTE=tuckerkao;619857]Looks like Prime95 30.8.17 doesn't support AMD Ryzen 7950X on Windows 10 either.[/QUOTE]Prime95 v30.8b17 is running just fine on my 7950X on Windows 11...
|
2 Attachment(s)
[QUOTE=James Heinrich;619875]Prime95 v30.8b17 is running just fine on my 7950X on Windows 11...[/QUOTE]That said, I notice sometimes the threads are all nicely spaced onto presumably real cores, other times it all seems out of whack. Should I be concerned?
|
Do you see a performance difference in the two cases you have presented?
|
2 Attachment(s)
[QUOTE=S485122;619861]If you stated the errors you get with all parameters and symptoms you might expect an answer.[/QUOTE]
The exponent size may be the problem that causes the error. I'll just post a screenshot (#1) and see if George Woltman has a solution (CPU: AMD Ryzen 7950X, Windows 10). This error doesn't occur every time, but it will most likely occur after half an hour of continuous testing. The same problem also arises during the PRP test with the FFP, and that can sometimes cause a Windows blue screen of death. When I bring the P-1 file to my other computer, which works well (Intel CPU), the automatic FMA3 FFT selection seems to be much larger, but I don't know how to adjust that on individual machines (screenshot #2). |
1 Attachment(s)
[QUOTE=tuckerkao;619899]The exponent size may be the problem that cause the error.[/QUOTE]Works fine on mine. Same FFT size is selected. My selected bounds are higher, but it looks like you hardcoded bounds rather than let Prime95 select them?
Also, you seem to be running only 1 worker with 8 threads so only using half the CPU? And finally, again, you're working on unreserved exponents. Please reserve the exponents you work on. |
[QUOTE=kruoli;619896]Do you see a performance difference in the two cases you have presented?[/QUOTE]Of course, after posting that, it's been showing up very consistently in the "sane" configuration. I don't know how I got it scrambled like that before. :brian-e:
|
[QUOTE=James Heinrich;619909]Works fine on mine. Same FFT size is selected. My selected bounds are higher, but it looks like you hardcoded bounds rather than let Prime95 select them?
Also, you seem to be running only 1 worker with 8 threads so only using half the CPU? And finally, again, you're working on unreserved exponents. Please reserve the exponents you work on.[/QUOTE] I couldn't reserve the Pminus1= assignments for some reason; the worktodo.txt entry always changes to N/A afterwards, with an "Invalid assignment type" error message. I've noticed that Gerbicz errors occur on the Ryzen 7950X even though Prime95 checks for them every 10,000 iterations. Perhaps some adjustments are needed to the BIOS, power, or memory settings? |
[QUOTE=tuckerkao;619914]I couldn't reserve the Pminus1= code assignments for some reason, the worktodo.txt always change to N/A afterwards.[/QUOTE]If you look at the top Communications window in Prime95 it should give you a reason for the failure-to-register.
Any particular reason why you're hardcoding bounds with [c]Pminus1=[/c] rather than letting Prime95 decide with [c]Pfactor=1,2,168455141,-1,77,1.3[/c]? |
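For readers unfamiliar with the worktodo.txt entry quoted above, here is a minimal Python sketch that splits a [c]Pfactor=[/c] line into its fields. The field names ([c]k[/c], [c]b[/c], [c]n[/c], [c]c[/c] for the number k*b^n+c, then the trial-factoring depth in bits and the number of tests saved) reflect my understanding of the layout and are labels chosen for illustration; the class and function names are hypothetical and not part of Prime95.

```python
from dataclasses import dataclass


@dataclass
class PfactorEntry:
    """Fields of a Pfactor= line; names are illustrative assumptions."""
    k: int
    b: int
    n: int
    c: int
    how_far_factored: float  # trial-factoring depth in bits
    tests_saved: float       # primality tests saved if a factor is found


def parse_pfactor(line: str) -> PfactorEntry:
    """Parse 'Pfactor=[assignment_id,]k,b,n,c,bits,tests_saved'."""
    key, _, value = line.strip().partition("=")
    if key != "Pfactor":
        raise ValueError(f"not a Pfactor line: {line!r}")
    fields = value.split(",")
    if len(fields) == 7:
        # Drop the optional leading assignment ID (or "N/A").
        fields = fields[1:]
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    k, b, n, c = (int(x) for x in fields[:4])
    return PfactorEntry(k, b, n, c, float(fields[4]), float(fields[5]))


entry = parse_pfactor("Pfactor=1,2,168455141,-1,77,1.3")
print(entry)
```

Read this way, the line describes 1*2^168455141-1, trial factored to 2^77, with no bounds given, so Prime95 is free to choose B1 and B2 itself.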
I checked the memory allocation graph from Task Manager.
When Prime95 is running stage 1 of P-1 or a PRP test on my Ryzen 7950X machine, it only uses 0.3 GB of RAM for some reason, and I think that is why the errors keep occurring. How can I change this? Prime95 displayed [B]Out of Memory![/B] and stopped P-1 after stage 1. I noticed that stage 2 of P-1 doesn't have this problem if stage 1 was finished on another machine, because there's a menu in Prime95 where I can adjust the RAM usage.
[QUOTE=James Heinrich;619909]Also, you seem to be running only 1 worker with 8 threads so only using half the CPU?[/QUOTE] I tried to find out why Gerbicz errors keep occurring, and it turned out that lack of memory was the main problem. I'd like someone to tell me how to customize the memory allocation for PRP and stage 1 of P-1. Using all 16 cores caused Windows to crash with a bluescreen. |
[QUOTE=tuckerkao;619931]When Prime95 is running the stage 1 of P-1 or PRP test on my Ryzen 7950X machine, it only used 0.3 Gb of RAM for some reason.
Prime95 displayed [B]Out of Memory![/B] and stopped P-1 after stage 1[/QUOTE]Prime95 only uses lots of RAM during stage-2 of P-1 (and ECM). Right now running a PRP for example I see 184 MB used. This is normal. But your 0.3GB statement is suspicious as that's a default configuration. Under [c]Options... | Resource Limits...[/c] check that your Daytime and Nighttime RAM allocation is appropriate (somewhere around 75% of physical installed RAM).
[QUOTE=tuckerkao;619931]Using all 16 cores caused my Windows operating system to crash into the bluescreen. I tried to find why Gerbicz errors keep occurring[/QUOTE]You have something very wrong with your hardware. How hot is the CPU running?
[QUOTE=tuckerkao;619925]Pminus1 with the recommended bounds from GPU72 typically take less time to complete.[/QUOTE]Yes, doing things worse is usually faster. You can lower the bounds to near-zero and be done almost instantly, but that doesn't really benefit anyone. Please let Prime95 decide what bounds should be used based on the exponent status and your available RAM. The values shown on mersenne.ca are just a very rough estimate of typical values and should [b]never[/b] override what Prime95 selects by itself. |
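As a quick illustration of the "around 75% of physical installed RAM" rule of thumb above, a small Python sketch follows. The function name is hypothetical, and the 32 GB figure is only an example machine size, not a Prime95 default.

```python
def suggested_stage2_ram_gb(installed_gb: float, fraction: float = 0.75) -> float:
    """Return a rough Daytime/Nighttime RAM setting in GB.

    `fraction` defaults to the ~75% rule of thumb from the post above;
    it is a guideline, not a value taken from Prime95 itself.
    """
    if installed_gb <= 0:
        raise ValueError("installed_gb must be positive")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return installed_gb * fraction


# Example: a machine with 32 GB installed.
print(suggested_stage2_ram_gb(32))
```

The result is what would go into the Daytime and Nighttime fields of the Resource Limits dialog; the emergency memory setting is a separate value and should stay small.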
[QUOTE=James Heinrich;619935]You have something very wrong with your hardware. How hot is the CPU running?[/QUOTE]
I'm not that good at checking the BIOS data, so maybe you can tell me what to look up. I have an ASUS Prime X670-P motherboard. I've heard that the Ryzen 7950X can go as high as 95°C, which is what AMD advertises. The bluescreen only showed up on Windows 11; I never had this problem again after downgrading the operating system back to Windows 10. Increasing the emergency memory (GB/Worker) to 7 GB seems to help some. |
[QUOTE=tuckerkao;619937]I'm not that good checking the BIOS data, maybe you can tell me what to look up.[/QUOTE]You may already have Asus' [c]Armory Crate[/c] installed which includes monitoring stuff. Something like [url=https://www.alcpu.com/CoreTemp/]CoreTemp[/url] is much lighter to just show temperature and core usage.
[QUOTE=tuckerkao;619937]After increase the emergency memory GB/Worker to 7 GB, it seems to help some.[/QUOTE]What is your normal (not emergency) memory configuration set to? Perhaps show a screenshot of your Resource Limits dialog. |
[QUOTE=James Heinrich;619938]What is your normal (not emergency) memory configuration set to? Perhaps show a screenshot of your Resource Limits dialog.[/QUOTE]
Temp Disk Space Limit (GB / Worker): 48
ECM Stage 2 Memory (GB) for both daytime and nighttime: 24
After installing Core Temp 1.18, it showed 86~87°C for CPU #0 and a power usage of around 153 W. |
[QUOTE=tuckerkao;619939]86~87°C for CPU #0 and Power Usage of around 153 W.[/QUOTE]Your CPU would seem to be throttling itself to fit within the thermal limits. For comparison mine is running PRP on 16 cores at 5425MHz at 68°C @ 176W.
|
[QUOTE=James Heinrich;619935]Yes, doing things worse is usually faster. [/QUOTE]
+1 :goodposting: I may steal that and use it like a motto for a while... |
[QUOTE=James Heinrich;619940]Your CPU would seem to be throttling itself to fit within the thermal limits. For comparison mine is running PRP on 16 cores at 5425MHz at 68°C @ 176W.[/QUOTE]
I believe I have found a guide online on how to adjust the thermal limits in the BIOS, but I will let you check it first: [URL="https://edgeup.asus.com/2022/control-the-temps-of-your-amd-ryzen-7000-series-cpu-with-asus-exclusive-pbo-enhancement/"]https://edgeup.asus.com/2022/control-the-temps-of-your-amd-ryzen-7000-series-cpu-with-asus-exclusive-pbo-enhancement/[/URL]
My motherboard for the Ryzen 7950X CPU is: Asus Prime X670-P Wifi AM5 ATX w/ Wi-Fi 6, 2.5GbT Lan, (3)PCIe x16, (1)PCIe x1, (3)M.2, (6)Sata
After raising the emergency memory to 7.8 GB / Worker, the P-1 cycle could go for at least 3 hours, then the memory ran out again. Thus I had to restart Windows 10 after every completed P-1. I'm wondering why Prime95 wouldn't release the used memory like it does on my other computer. I have 2 T-Delta 6000 MHz DDR5 RAM modules, 16 GB each, 32 GB total. |
[QUOTE=tuckerkao;620012]After raising the emergency memory to 7.8 GB / Worker, the P-1 cycle could go for at least 3 hours, then the memory ran out again. Thus I have to restart Windows 10 after every P-1 completed. I'm wondering why Prime95 won't release the used memory space like my the other computer.[/QUOTE]You're doing something wrong. Prime95 should never need "emergency" memory to run things. That's why it's called "emergency". Set it back to 1GB.
If you want help, show screenshots of your Resource Limits dialog, and the worker window as it starts up P-1 and gets to however far it gets, and what the crash looks like when it "runs out of memory". |
2 Attachment(s)
[QUOTE=tuckerkao;619899]This error doesn't occur every time, but most likely will occur after half an hour of continuous test.[/QUOTE]
The crash screenshot was already posted in post #6 of this thread, which is on page 1 and quoted above. Once the "in write_gwnum, unexpected gwtogiant failure retcode -1" error message shows up, the P-1 progress can no longer be saved and will likely hit "Out of Memory" within several minutes. The 2 new screenshots show the Resource Limits windows. I'm trying to see whether using hyperthreading for P-1 will help. |
[QUOTE=tuckerkao;620017]The crash screen screenshot was already posted on post #6 of this thread[/QUOTE]The screenshots on post #6 don't say anything about being out of memory.
|
1 Attachment(s)
[QUOTE=James Heinrich;620018]The screenshots on post #6 don't say anything about being out of memory.[/QUOTE]
The "Out of Memory!" message was shown in the bottom window after Prime95 finished stage 1 of P-1; the worker then stopped without running stage 2. It could take me another several hours to reproduce that very specific screen. It only happened after "in write_gwnum, unexpected gwtogiant failure retcode -1" had shown up at the top.
[QUOTE=James Heinrich;619909]Also, you seem to be running only 1 worker with 8 threads so only using half the CPU?[/QUOTE] When using all 16 cores at the same time, the rounding errors show up very often. |
Stop all P-1 and PRP testing.
This system has hardware problems. Concentrate on passing a torture test. My first thoughts whenever hardware problems arise are to change your memory settings. You'll need to get comfortable with the BIOS. Reduce memory speed, then try a torture test. |
I reduced the RAM frequency from 6000 to 4800 in the BIOS; the 6000 setting seemed to be the factory overclock.
Prime95 passed the torture test with around 16 small tests. I'll run the P-1 with only 1 GB/worker of emergency memory and see whether the same problem arises. Progress: the P-1 has survived at least 20 minutes with all 16 cores in use and without any errors reported. I still cannot find the Extreme Tweaker menu in my Asus Prime X670-P BIOS to adjust the thermal limit of my CPU. |
The small FFT torture test does not test RAM as much as a blend torture test
|
[QUOTE=tuckerkao;620029]I reduced the frequency of the RAMs from 6000 to 4800[/QUOTE]If you simply set your RAM to 6000 without enabling the AMD EXPO profile (which also adjusts all sorts of other things like voltages) then it's unlikely to be stable at 6000. If your RAM does not have AMD EXPO profile, it may be tricky to get it working at that speed.
|
[QUOTE=James Heinrich;620046]The TF bitlevel is probably pretty close to what GPU72 recommends.[/QUOTE]
After the Nvidia Lovelace GPUs were released, most exponents from M108M to M120M have been trial factored up to at least 2[SUP]77[/SUP] by either Chalsall's GPU72 crews or TheJudger's group. Using that ratio, all M168M exponents should be brought up to at least 2[SUP]79[/SUP] before running the PRP tests, which will take less than 2 hours in total from 2[SUP]74[/SUP] to 2[SUP]79[/SUP] per exponent on my fastest GPU.
[QUOTE=James Heinrich;620047]If you simply set your RAM to 6000 without enabling the AMD EXPO profile (which also adjusts all sorts of other things like voltages) then it's unlikely to be stable at 6000. If your RAM does not have AMD EXPO profile, it may be tricky to get it working at that speed.[/QUOTE] I'm not sure whether it's necessary to re-run the P-1 tests that were done during the unstable period. The impacted exponents were M[M]168916721[/M], M[M]168455141[/M], M[M]168465413[/M], M[M]168465421[/M], M[M]168465601[/M], M[M]168173101[/M]. I typed the list down so I don't forget them later. |
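To put the 2[SUP]74[/SUP] to 2[SUP]79[/SUP] trial-factoring range mentioned above in perspective: any factor of a Mersenne number M(p) has the form q = 2kp+1, so the number of candidate k values in a bit level roughly doubles with each level. The Python sketch below counts raw candidates only. It ignores the sieving and the q = ±1 (mod 8) restriction that real TF programs apply, so treat it as a statement about proportions rather than run times; the function name is hypothetical.

```python
def tf_candidates(p: int, bit_lo: int, bit_hi: int) -> float:
    """Approximate count of k with 2**bit_lo <= 2*k*p + 1 < 2**bit_hi.

    Raw candidate count before any sieving, so only ratios between
    bit ranges are meaningful.
    """
    if bit_hi <= bit_lo:
        raise ValueError("bit_hi must exceed bit_lo")
    return (2**bit_hi - 2**bit_lo) / (2 * p)


p = 168455141  # one of the exponents discussed in this thread
last_level = tf_candidates(p, 78, 79)
whole_range = tf_candidates(p, 74, 79)
share = last_level / whole_range
print(f"share of the 74-79 work spent on the last bit level: {share:.3f}")
```

Under these assumptions the final level alone (2[SUP]78[/SUP] to 2[SUP]79[/SUP]) accounts for 16/31 of the candidates, slightly more than half of the whole range.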
[QUOTE=James Heinrich;620047]If you simply set your RAM to 6000 without enabling the AMD EXPO profile (which also adjusts all sorts of other things like voltages) then it's unlikely to be stable at 6000. If your RAM does not have AMD EXPO profile, it may be tricky to get it working at that speed.[/QUOTE]
EXPO is one-click RAM OC for GigaByte boards. DOCP is ASUS's. I recommend setting BIOS defaults and just turning on DOCP or EXPO depending on one's board's manufacturer. |
[QUOTE=paulunderwood;620049]EXPO is one-click RAM OC for GigaByte boards. DOCP is ASUS's[/QUOTE]EXPO (in this context) is AMD's version of Intel's XMP, brand new for AM5:
[url]https://www.amd.com/en/technologies/expo[/url] It's not specific to any motherboard manufacturer or RAM manufacturer, but both need to support it to work as intended. |
[QUOTE=James Heinrich;620050]EXPO (in this context) is AMD's version of Intel's XMP, brand new for AM5:
[url]https://www.amd.com/en/technologies/expo[/url] It's not specific to any motherboard manufacturer or RAM manufacturer, but both need to support it to work as intended.[/QUOTE] I stand corrected. I hit a page from 2016 about EOCP :redface: |
James posted some data in the [url=https://www.mersenneforum.org/showthread.php?t=28107]Zen4 7950X Benchmarks[/url] thread :
[QUOTE=James Heinrich;620024]After playing with my 7950X for a bit I came to two conclusions:
1) Getting workers aligned to chiplets is [b]vital[/b]. Running 16-thread PRP across 2 chiplets is actually 40% [b]slower[/b] than just running 8 threads on one chiplet.
2) You can save a fair amount of power and heat and not lose much performance. I actually dropped the thermal limit from 90°C to 70°C in the BIOS. My PRP iteration times are perhaps 1% slower, but CPU power consumption is down from 235W to 195W, temperature down from 92C to 72C, and (important to me) the fan noise is down from very-noticeable to barely-there. For me at least it's a hugely worthwhile tradeoff.[/QUOTE]
His 7950X drew 235 W at around 90 °C. tuckerkao's 7950X reaching more than 86 °C at only 153 W might indicate a cooling problem.
He blamed the errors showing up in Prime95 on Windows 11, some updates, then Windows 10, then Prime95, ... He changed all kinds of settings without fully understanding their meaning.
He should start by reverting his motherboard BIOS settings to the factory defaults without overclocking, perhaps just configuring the memory settings according to the CPU, memory and motherboard specifications (checking the respective manufacturers' documentation). Then he should check his hardware setup: CPU cooler, ventilation of the case, ... Then revert his Prime95 settings to the defaults, input his user and computer names, set the memory to use for P-1, P+1 and ECM, and optionally set the work type.
After that, start torture testing with "Small FFTs" to check the cooling of his hardware, then "Large FFTs" to check the memory, or just "Blend". A torture test should be run for quite a long time (hours), especially if there are suspicions of hardware problems. |
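The tradeoff in the quoted benchmark post can be expressed as energy per PRP iteration. Here is a minimal Python sketch using only the figures quoted above (235 W before, 195 W after, iterations "perhaps 1%" slower); the function name is hypothetical and the 1% figure is James's rough estimate, so the result is approximate.

```python
def energy_per_iteration_ratio(power_before_w: float,
                               power_after_w: float,
                               slowdown: float) -> float:
    """Ratio of energy per iteration after vs. before a power-limit change.

    Energy per iteration is power times iteration time, so a `slowdown`
    of 0.01 (1% longer iterations) scales the new energy by 1.01.
    """
    if power_before_w <= 0 or power_after_w <= 0:
        raise ValueError("power values must be positive")
    return (power_after_w * (1.0 + slowdown)) / power_before_w


ratio = energy_per_iteration_ratio(235.0, 195.0, 0.01)
print(f"energy per iteration after the change: {ratio:.1%} of before")
```

With those inputs each iteration costs roughly 84% of the energy it did before, which is why the lower thermal limit is described as a worthwhile tradeoff.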
Based on symptoms it seems not unlikely that overheating is contributing to his problems, and he knows he's running too hot (and should know how to fix it, because it's surely his settings that caused it to run hot in the first place - other than possibly laptops, CPUs don't normally reach that temperature with default settings). Most people including me have a max CPU temp around 70 C.
Re-doing the suspect P-1 is surely a good idea. As for the P-1 bounds, perhaps he'd understand better if he knew of the 30.8 changes that cause the greatest amount of the disparity - he's running 30.8, and presumably has enough memory to run the faster stage 2 on these exponents, and so really should be taking advantage of the higher B2 it allows. If stage 2 takes less than half as much time as stage 1 (ignoring GCD time), it's definitely suboptimal. But that is less important than reliability - good P-1 at acceptable bounds beats bad P-1 at good bounds. And to repeat what others have said, the torture test is there for a reason! Every time you make a significant hardware change or encounter apparent hardware problems it should be run again. |
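The stage-time rule of thumb in the post above can be written as a one-line check. This Python sketch assumes you have noted the stage 1 and stage 2 wall-clock times yourself (GCD time excluded); the function name and the example timings are hypothetical.

```python
def stage2_looks_underused(stage1_hours: float, stage2_hours: float) -> bool:
    """True when stage 2 took less than half as long as stage 1.

    Per the heuristic above, that suggests B2 is lower than it should be
    for the available memory. It says nothing about hardware reliability.
    """
    if stage1_hours <= 0 or stage2_hours < 0:
        raise ValueError("timings must be non-negative, stage 1 positive")
    return stage2_hours < 0.5 * stage1_hours


# Hypothetical timings for two P-1 runs.
print(stage2_looks_underused(3.0, 1.0))  # stage 2 under half of stage 1
print(stage2_looks_underused(3.0, 2.0))  # stage 2 at two thirds of stage 1
```

This only flags bounds that are probably too conservative; a run on unstable hardware still needs to be redone regardless of its bounds.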
I concur with others in that memory is the most likely point of trouble for the OP.
My 7950X is paired with an ASUS ProArt X670E-Creator WiFi motherboard. This board supports DDR5 memory with AMD EXPO profiles but not DDR5 memory with Intel XMP profiles. I test installed a pair of Trident Z5 RGB XMP DDR5 6000 memory modules with this motherboard and the XMP profile did not even appear within BIOS setup. This is when I realized that X670/E motherboards really need to be given DDR5 memory modules with AMD EXPO profile, if one wants memory speed & timings faster than the DDR5 4800 JEDEC standard.
I use a pair of G.Skill Trident Z5 Neo AMD EXPO 16GB DDR5 6000 modules with my 7950X and have not encountered any stability issues. While ASUS motherboards typically allow memory timings to be manually configured, this is always a risky approach with questionable odds of success.
Another thing to consider is how Microsoft Windows 11 OS can mess up support for CPUs with non-hybrid/legacy core architecture such as Zen 3 & Zen 4 Ryzen. The most recent OS scheduler efforts by Microsoft have been focused in optimally supporting hybrid core CPUs such as Intel Alder Lake & Raptor Lake. It appears that Microsoft has struggled to keep up with the latest gen Intel & AMD CPUs and the scheduler cannot support hybrid core CPUs without compromising non-hybrid core CPU performance and perhaps robustness also. The latest incident of Windows 11 22H2 causing performance issues with Zen 4 Ryzen CPUs, reported back in October, was by no means the first.
In addition to running the Prime95 torture test with large FFTs (stresses memory controller and RAM), I recommend running Memtest86+ to assess the integrity of the memory subsystem.
One last recommendation: update the motherboard to the latest released BIOS from the manufacturer. AMD AGESA code support for Zen 4 is apparently still evolving to improve system performance and DRAM compatibility. |
[QUOTE=Andrew Usher;620095]And to repeat what others have said, the torture test is there for a reason! Every time you make a significant hardware change or encounter apparent hardware problems it should be run again.[/QUOTE]
Amen!!! Most know this, but I got involved with GIMPS way back when, when I was actually deploying kit in the field. And some server rooms. George's amazing code has saved me ***sooo*** much time and grief (and serious coin!!!) over the years. Thanks again, George. Truly useful work. To many. :tu: |
[QUOTE=scan80269;620143]X670/E motherboards really need to be given DDR5 memory modules with AMD EXPO profile, if one wants memory speed & timings faster than the DDR5 4800 JEDEC standard.
One last recommendation: update the motherboard to the latest released BIOS from the manufacturer. AMD AGESA code support for Zen 4 is apparently still evolving to improve system performance and DRAM compatibility.[/QUOTE]I'm using 2x32GB [URL="https://www.gskill.com/product/165/390/1665020865/F5-6000J3040G32GX2-TZ5NR"]G.SKILL Trident Z5 Neo RGB[/URL], which supports EXPO. With the Asus X670E-E shipping BIOS version it booted fine at 4800, but as soon as I set the EXPO profile it refused to POST. After updating the BIOS to the latest version, it's now running perfectly happily at 6000. |