mersenneforum.org  

Old 2022-12-17, 00:29   #23
Prime95
P90 years forever!
 
 
Aug 2002
Yeehaw, FL

17·487 Posts
Default

Stop all P-1 and PRP testing.

This system has hardware problems. Concentrate on passing a torture test.

My first thoughts whenever hardware problems arise are to change your memory settings. You'll need to get comfortable with the BIOS. Reduce memory speed, then try a torture test.
Old 2022-12-17, 01:08   #24
tuckerkao
 
"Tucker Kao"
Jan 2020
Head Base M168202123

2⁴·5·11 Posts
Default

I reduced the RAM frequency from 6000 to 4800 in the BIOS; the 6000 setting seems to have been the factory overclock.

Prime95 passed the torture test, completing around 16 small tests. I'll run P-1 with only 1 GB/worker of emergency memory to see whether the same problem arises.


Progress: P-1 has now run for at least 20 minutes on all 16 cores without reporting any errors.


I still cannot find the Extreme Tweaker menu in my Asus Prime 670-p BIOS to adjust the thermal limit of my CPU.

Last fiddled with by tuckerkao on 2022-12-17 at 01:54
Old 2022-12-17, 01:43   #25
Prime95
P90 years forever!
 
 
Aug 2002
Yeehaw, FL

17·487 Posts
Default

The small FFT torture test does not test RAM as much as a blend torture test.
Old 2022-12-17, 04:53   #26
James Heinrich
 
 
"James Heinrich"
May 2004
ex-Northern Ontario

7×13×47 Posts
Default

Quote:
Originally Posted by tuckerkao View Post
I reduced the frequency of the RAMs from 6000 to 4800
If you simply set your RAM to 6000 without enabling the AMD EXPO profile (which also adjusts all sorts of other things like voltages) then it's unlikely to be stable at 6000. If your RAM does not have AMD EXPO profile, it may be tricky to get it working at that speed.
Old 2022-12-17, 05:03   #27
tuckerkao
 
"Tucker Kao"
Jan 2020
Head Base M168202123

2⁴×5×11 Posts
Default

Quote:
Originally Posted by James Heinrich View Post
The TF bitlevel is probably pretty close to what GPU72 recommends.
Since the Nvidia Lovelace GPUs were released, most exponents from M108M to M120M have been trial factored to at least 2⁷⁷ by either Chalsall's GPU72 crew or TheJudger's group.

Using that ratio, all M168M exponents should be brought up to at least 2⁷⁹ before running the PRP tests. On my fastest GPU, going from 2⁷⁴ to 2⁷⁹ takes less than 2 hours in total per exponent.
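
For a rough sense of how that effort is distributed: the number of candidate factors doubles with each bit level, so each level costs about twice the previous one. A minimal Python sketch of the relative cost, assuming exact doubling per level (the function name is only illustrative, and real timings depend on the GPU and software):

```python
def tf_relative_work(start_bits, end_bits):
    """Relative cost of each TF bit level between 2^start_bits and 2^end_bits.

    Assumes the work doubles with every bit level; the first level
    (start_bits -> start_bits + 1) is defined as 1 unit.
    """
    return {b + 1: 2 ** (b - start_bits) for b in range(start_bits, end_bits)}


work = tf_relative_work(74, 79)
total = sum(work.values())
for level, units in work.items():
    print(f"2^{level - 1} -> 2^{level}: {units:2d} units ({units / total:.0%} of total)")
print(f"total: {total} units")
```

Under that assumption the last level alone (2^78 to 2^79) accounts for a bit over half of the whole run.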


Quote:
Originally Posted by James Heinrich View Post
If you simply set your RAM to 6000 without enabling the AMD EXPO profile (which also adjusts all sorts of other things like voltages) then it's unlikely to be stable at 6000. If your RAM does not have AMD EXPO profile, it may be tricky to get it working at that speed.
I'm not sure whether I need to re-run the P-1 tests that ran during the unstable period. The impacted exponents were M168916721, M168455141, M168465413, M168465421, M168465601, M168173101. I've typed the list here so I don't forget them later.

Last fiddled with by tuckerkao on 2022-12-17 at 05:14
Old 2022-12-17, 05:07   #28
paulunderwood
 
 
Sep 2002
Database er0rr

4685₁₀ Posts
Default

Quote:
Originally Posted by James Heinrich View Post
If you simply set your RAM to 6000 without enabling the AMD EXPO profile (which also adjusts all sorts of other things like voltages) then it's unlikely to be stable at 6000. If your RAM does not have AMD EXPO profile, it may be tricky to get it working at that speed.
EXPO is one-click RAM OC for GigaByte boards. DOCP is ASUS's.

I recommend setting BIOS defaults and just turning on DOCP or EXPO, depending on the board's manufacturer.

Last fiddled with by paulunderwood on 2022-12-17 at 05:09
Old 2022-12-17, 05:18   #29
James Heinrich
 
 
"James Heinrich"
May 2004
ex-Northern Ontario

7×13×47 Posts
Default

Quote:
Originally Posted by paulunderwood View Post
EXPO is one-click RAM OC for GigaByte boards. DCOP is ASUS's
EXPO (in this context) is AMD's version of Intel's XMP, brand new for AM5:
https://www.amd.com/en/technologies/expo
It's not specific to any motherboard manufacturer or RAM manufacturer, but both need to support it to work as intended.
Old 2022-12-17, 05:24   #30
paulunderwood
 
 
Sep 2002
Database er0rr

124D₁₆ Posts
Default

Quote:
Originally Posted by James Heinrich View Post
EXPO (in this context) is AMD's version of Intel's XMP, brand new for AM5:
https://www.amd.com/en/technologies/expo
It's not specific to any motherboard manufacturer or RAM manufacturer, but both need to support it to work as intended.
I stand corrected. I hit a page from 2016 about EOCP.
Old 2022-12-17, 11:54   #31
S485122
 
 
"Jacob"
Sep 2006
Brussels, Belgium

2·977 Posts
Default

James posted some data in the Zen4 7950X Benchmarks thread:
Quote:
Originally Posted by James Heinrich View Post
After playing with my 7950X for a bit I came to two conclusions:

1) Getting workers aligned to chiplets is vital. Running 16-thread PRP across 2 chiplets is actually 40% slower than just running 8 threads on one chiplet.

2) You can save a fair amount of power and heat and not lose much performance. I actually dropped the thermal limit from 90°C to 70°C in the BIOS. My PRP iteration times are perhaps 1% slower, but CPU power consumption is down from 235W to 195W, temperature down from 92C to 72C, and (important to me) the fan noise is down from very-noticeable to barely-there. For me at least it's a hugely worthwhile tradeoff.
His 7950X used 235 W without going over 90 °C.
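
To put those numbers in perspective, here is a small Python sketch of the change in energy per iteration. It uses only the quoted figures (235 W down to 195 W, about 1% slower) and assumes that energy per iteration is simply average power times iteration time; the function name is illustrative.

```python
def energy_ratio(power_before_w, power_after_w, slowdown):
    """Energy per iteration after the change, relative to before.

    slowdown is the fractional increase in iteration time (0.01 = 1% slower).
    Assumes energy per iteration = average power * iteration time.
    """
    return (power_after_w * (1.0 + slowdown)) / power_before_w


ratio = energy_ratio(235, 195, 0.01)
print(f"power draw: {1 - 195 / 235:.0%} lower")
print(f"energy per iteration: {1 - ratio:.0%} lower")
```

On those figures the lower thermal limit saves roughly a sixth of the energy per iteration for about 1% more run time.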

Tuckerkao's 7950X reaching more than 86 °C at 153 W might indicate a cooling problem. He blamed the errors showing up in Prime95 on Windows 11, some updates, then Windows 10, then Prime95, ... He changed all kinds of settings without fully understanding their meaning.

He should start by reverting his motherboard BIOS settings to the factory defaults without overclocking, perhaps just configuring the memory settings according to the CPU, memory and motherboard specifications (checking the respective manufacturers' documentation).

Then he should check his hardware setup: CPU cooler, ventilation of the case, ...

Then revert his Prime95 settings to the defaults, input his user and computer names, set the memory to use for P-1/P+1 and ECM, and optionally set the work type.

After that, start torture testing with "Small FFTs" to check the cooling of his hardware, then "Large FFTs" to check the memory, or just "Blend". A torture test should be run for a long time (hours), especially if hardware problems are suspected.
Old 2022-12-17, 15:52   #32
Andrew Usher
 
Dec 2022

1FB₁₆ Posts
Default

Based on symptoms it seems not unlikely that overheating is contributing to his problems, and he knows he's running too hot (and should know how to fix it, because it's surely his settings that caused it to run hot in the first place - other than possibly laptops, CPUs don't normally reach that temperature with default settings). Most people including me have a max CPU temp around 70 C.

Re-doing the suspect P-1 is surely a good idea. As for the P-1 bounds, perhaps he'd understand better if he knew of the 30.8 changes that cause the greatest amount of the disparity - he's running 30.8, and presumably has enough memory to run the faster stage 2 on these exponents, and so really should be taking advantage of the higher B2 it allows. If stage 2 takes less than half as much time as stage 1 (ignoring GCD time), it's definitely suboptimal. But that is less important than reliability - good P-1 at acceptable bounds beats bad P-1 at good bounds.
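
The stage-time rule of thumb above can be written as a quick check. This is only a sketch of that heuristic: the function name, its inputs and the example timings are hypothetical, and the 0.5 threshold comes from the rule of thumb, not from Prime95 itself.

```python
def stage2_looks_short(stage1_seconds, stage2_seconds, threshold=0.5):
    """Return True if stage 2 took less than `threshold` times stage 1.

    GCD time should be excluded from both inputs, as in the rule of thumb.
    """
    if stage1_seconds <= 0:
        raise ValueError("stage1_seconds must be positive")
    return stage2_seconds / stage1_seconds < threshold


# Hypothetical timings, not measurements from this thread.
print(stage2_looks_short(3600, 1200))  # stage 2 is a third of stage 1
print(stage2_looks_short(3600, 3000))  # stage 2 is most of stage 1
```

A True result only suggests that a higher B2 may be worth trying; it says nothing about whether the run itself was reliable.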

And to repeat what others have said, the torture test is there for a reason! Every time you make a significant hardware change or encounter apparent hardware problems it should be run again.
Old 2022-12-18, 01:02   #33
scan80269
 
"Sam"
Jun 2019
California, USA

478 Posts
Default

I concur with others in that memory is the most likely point of trouble for the OP.

My 7950X is paired with an ASUS ProArt X670E-Creator WiFi motherboard. This board supports DDR5 memory with AMD EXPO profiles but not DDR5 memory with Intel XMP profiles. I test installed a pair of Trident Z5 RGB XMP DDR5 6000 memory modules with this motherboard and the XMP profile did not even appear within BIOS setup. This is when I realized that X670/E motherboards really need to be given DDR5 memory modules with AMD EXPO profile, if one wants memory speed & timings faster than the DDR5 4800 JEDEC standard. I use a pair of G.Skill Trident Z5 Neo AMD EXPO 16GB DDR5 6000 modules with my 7950X and have not encountered any stability issues.

While ASUS motherboards typically allow memory timings to be manually configured, this is always a risky approach with questionable odds of success.

Another thing to consider is how the Microsoft Windows 11 OS can mess up support for CPUs with non-hybrid/legacy core architecture such as Zen 3 & Zen 4 Ryzen. The most recent OS scheduler efforts by Microsoft have been focused on optimally supporting hybrid core CPUs such as Intel Alder Lake & Raptor Lake. It appears that Microsoft has struggled to keep up with the latest gen Intel & AMD CPUs, and the scheduler cannot support hybrid core CPUs without compromising non-hybrid core CPU performance and perhaps robustness as well. The latest incident of Windows 11 22H2 causing performance issues with Zen 4 Ryzen CPUs, reported back in October, was by no means the first.

In addition to running the Prime95 torture test with large FFTs (stresses memory controller and RAM), I recommend running Memtest86+ to assess the integrity of the memory subsystem.

One last recommendation: update the motherboard to the latest released BIOS from the manufacturer. AMD AGESA code support for Zen 4 is apparently still evolving to improve system performance and DRAM compatibility.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
