mersenneforum.org  

2020-06-22, 19:47   #1
joehesse

Jun 2020

2 Posts
Need help with failed prime95 test

Hello,

I ran ./prime, answered N at the first prompt, and accepted the defaults for the remaining prompts.
I have done this a number of times and I always get the same error. Since you have the
following comment in stress.txt, I want to see if the error is due to a software bug.

"HOWEVER, if you are failing the torture test in the SAME SPOT with the SAME ERROR MESSAGE
every time, then ask for help at http://mersenneforum.org - it is possible that a recent
change to the torture test code may have introduced a software bug."

My computer:
Fedora 32 Linux
No overclocking
Gigabyte X570 Aorus Elite Wifi motherboard
AMD Ryzen 9, 3900X, EVGA liquid cooling with 2 fans
32G memory - passed MemTest 86
1TB nvme ssd for OS Fedora Linux 32
1TB mechanical drive for /home and swap
Corsair Gold 650 watt power supply
Bare minimum video Card - MSI GT710 1GB D3 PCIE LP
Computer powered by a heavy duty UPS

I used lm_sensors to monitor the CPU temperature.
$ sensors | grep --after-context=10 k10temp
The output of the above looks like

k10temp-pci-00c3
Adapter: PCI adapter
Vcore: 1.31 V
Vsoc: 1.07 V
Tdie: +69.9°C
Tctl: +69.9°C
Tccd1: +70.2°C
Tccd2: +70.0°C
Icore: 89.00 A
Isoc: 11.50 A

I'm not sure which of the temperatures is the correct one to monitor, but all four
stayed close to 70 °C.
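
For anyone who wants to log these readings during a torture test rather than read them by eye, here is a minimal Python sketch that extracts the temperature lines from `sensors`-style text. The sample is the output quoted above; the function name `parse_temps` is made up for illustration, and how you capture live `sensors` output (for example with `subprocess`) is an assumption about your own setup.

```python
import re

# Output of `sensors | grep --after-context=10 k10temp` as quoted in this post.
SAMPLE = """\
k10temp-pci-00c3
Adapter: PCI adapter
Vcore: 1.31 V
Vsoc: 1.07 V
Tdie: +69.9°C
Tctl: +69.9°C
Tccd1: +70.2°C
Tccd2: +70.0°C
Icore: 89.00 A
Isoc: 11.50 A
"""

# Matches lines such as "Tdie: +69.9°C" and ignores voltage/current lines.
TEMP_LINE = re.compile(r"\s*(\w+):\s*\+?(-?\d+(?:\.\d+)?)\s*°C")


def parse_temps(text):
    """Return {label: degrees Celsius} for every temperature line in text."""
    temps = {}
    for line in text.splitlines():
        match = TEMP_LINE.match(line)
        if match:
            temps[match.group(1)] = float(match.group(2))
    return temps


temps = parse_temps(SAMPLE)
hottest = max(temps, key=temps.get)
print(hottest, temps[hottest])  # Tccd1 70.2
```

Tracking the highest of the four values is a conservative choice when it is unclear which sensor matters most.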

Here is the file results.txt where lines of the form "Self-test xxK passed!" have been omitted.
In this test and previous ones, the error was always a "FATAL ERROR: Rounding was ..."

I would appreciate any help you can give me.
Thank you,
Joe

=================================================================================
[Mon Jun 22 13:25:22 2020]
[Mon Jun 22 13:31:06 2020]
[Mon Jun 22 13:37:14 2020]
[Mon Jun 22 13:43:41 2020]
[Mon Jun 22 13:50:17 2020]
[Mon Jun 22 13:57:03 2020]
[Mon Jun 22 14:03:38 2020]
[Mon Jun 22 14:09:16 2020]
[Mon Jun 22 14:14:53 2020]
FATAL ERROR: Rounding was 0.4998519667, expected less than 0.4
Hardware failure detected, consult stress.txt file.
[Mon Jun 22 14:22:02 2020]
FATAL ERROR: Rounding was 0.5, expected less than 0.4
Hardware failure detected, consult stress.txt file.
[Mon Jun 22 14:27:58 2020]

2020-06-23, 10:09   #2
paulunderwood

Sep 2002
Database er0rr

11115₈ Posts

This is a tough problem. What version of mprime are you using?

2020-06-23, 11:36   #3
axn

Jun 2003

2³·683 Posts

Quote:
Originally Posted by joehesse
Here is the file results.txt where lines of the form "Self-test xxK passed!" have been omitted.
In this test and previous ones, the error was always a "FATAL ERROR: Rounding was ..."

You'll need to include those "passed" lines so that we can tell whether the error always happens at the same point in the test. Attach the file to the post rather than pasting it as inline text.

My first impression is that you have flaky hardware. People have run mprime on the Ryzen 3000 series without uncovering any software bugs; a bug is not impossible, but it is unlikely.
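
A rough way to check whether the failure lands at the same point each time is to pair every FATAL ERROR line in results.txt with the most recent "Self-test ...K passed!" line before it. The Python sketch below does that under loud assumptions: the timestamps and error lines are taken from the log in this thread, but the two "Self-test" lines and their sizes are HYPOTHETICAL placeholders, because the real ones were omitted from the post. The helper name `failures` is also made up for illustration.

```python
import re

# Timestamps and FATAL ERROR lines are from the log in this thread.
# The "Self-test ...K passed!" lines are HYPOTHETICAL placeholders.
SAMPLE = """\
[Mon Jun 22 14:09:16 2020]
Self-test 100K passed!
[Mon Jun 22 14:14:53 2020]
FATAL ERROR: Rounding was 0.4998519667, expected less than 0.4
Hardware failure detected, consult stress.txt file.
[Mon Jun 22 14:22:02 2020]
Self-test 200K passed!
FATAL ERROR: Rounding was 0.5, expected less than 0.4
Hardware failure detected, consult stress.txt file.
"""

STAMP = re.compile(r"^\[(.+)\]$")
PASSED = re.compile(r"Self-test (\d+)K passed!")
FATAL = re.compile(r"FATAL ERROR: Rounding was ([0-9.]+), expected less than")


def failures(text):
    """Return (timestamp, last passed size in K, rounding value) per error."""
    stamp = None
    last_passed = None
    found = []
    for raw in text.splitlines():
        line = raw.strip()
        stamp_match = STAMP.match(line)
        passed_match = PASSED.search(line)
        fatal_match = FATAL.search(line)
        if stamp_match:
            stamp = stamp_match.group(1)
        elif passed_match:
            last_passed = int(passed_match.group(1))
        elif fatal_match:
            found.append((stamp, last_passed, float(fatal_match.group(1))))
    return found


for stamp, size_k, rounding in failures(SAMPLE):
    print(stamp, size_k, rounding)
```

With several workers writing to the same file, the "last passed" line may belong to a different worker than the one that failed, so treat the pairing as a hint rather than proof.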

2020-06-23, 11:44   #4
moebius

Jul 2009
Germany

1011000010₂ Posts

Maybe you could try a different XMP profile with less aggressive timings and latency for your RAM?

2020-06-23, 12:51   #5
kruoli

"Oliver"
Sep 2017
Porta Westfalica, DE

3141₈ Posts

Have you tried the stress test feature of y-cruncher, too?

2020-06-23, 15:36   #6
joehesse

Jun 2020

2₁₆ Posts
XMP was the problem.

My BIOS has only one XMP profile, and it is either enabled or disabled. It was enabled when I ran the original prime95 test. I disabled it, and prime95 ran for 2.5 hours without an error. I will try again, and if prime95 runs for 10 hours or so without an error, I will assume the problem is solved.


Thank you all for your responses.

2020-06-24, 02:27   #7
moebius

Jul 2009
Germany

2C2₁₆ Posts

Quote:
Originally Posted by joehesse
My bios has only 1 xmp profile...

I have an MSI X470 mainboard with a 3700X.
In the BIOS you can also make the memory run with a specific XMP profile and, e.g., 3200 instead of 3000 MHz, which is prime95 stable for me.
Sometimes round-off errors first occur at, e.g., 50% of the LL test, but then several times in succession.

Last fiddled with by moebius on 2020-06-24 at 02:41
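
As background on the "Rounding was ..." messages in this thread: the test does its big multiplications in floating point and expects every result to land very close to a whole number, so the round-off error is the distance to the nearest integer and can never exceed 0.5. The Python sketch below illustrates only that idea. It is not prime95's code, the value lists are made-up examples, and the 0.4 limit is simply the figure from the error message quoted earlier.

```python
LIMIT = 0.4  # threshold taken from the error message quoted in this thread


def max_roundoff(values):
    """Largest distance between any value and its nearest integer."""
    return max(abs(v - round(v)) for v in values)


# Made-up example data: results that should all be (nearly) whole numbers.
healthy = [12.0000003, -7.9999998, 3.0]
corrupt = [12.0000003, 41.5001480333, 3.0]  # one value nowhere near an integer

for name, values in (("healthy", healthy), ("corrupt", corrupt)):
    error = max_roundoff(values)
    verdict = "ok" if error < LIMIT else "FAIL"
    print(name, verdict)
```

A reading at or near 0.5, like the two in the log above, means the result carried essentially no information about which integer was intended, which is why unstable memory settings are a common suspect.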

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
