mersenneforum.org  

Old 2022-09-23, 18:12   #2784
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3·2,287 Posts
GPU -> Host read errors

Other configurations that have also had repeated GPU -> Host read errors:
roa: Lenovo 0B98401PRO motherboard, dual Xeon E5-2697 v2, Radeon VII directly in a PCIe slot
asrock: i7-4790 in an Asrock H81 motherboard, Radeon VII on an extender
test: i3-4170 in an Asrock H81 motherboard; RX 5700XT had frequent errors
asr3: i7-4770 in an Asrock H81 motherboard; the same RX 5700XT GPU had frequent errors; Radeon VII and GTX1080 had none; RTX 2080 had none in brief testing; RX550 had recoverable single GPU -> Host read errors and rare doubles, but no run-terminating triple.
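
The single / double / triple distinction above can be pictured with a small sketch. This is not GpuOwl's code; it only assumes that a host read is verified by reading the buffer twice and comparing, and that a third consecutive mismatch ends the run. The names `read_checked`, `read_buffer` and `MAX_ATTEMPTS` are hypothetical.

```python
from typing import Callable, List

# Assumption: a third consecutive failed read is treated as fatal.
MAX_ATTEMPTS = 3


def read_checked(read_buffer: Callable[[], List[int]],
                 max_attempts: int = MAX_ATTEMPTS) -> List[int]:
    """Read a buffer twice per attempt; accept it only if both copies agree."""
    for attempt in range(1, max_attempts + 1):
        first = read_buffer()
        second = read_buffer()
        if first == second:
            return first
        # A mismatch here corresponds to a recoverable read error.
        print(f"GPU -> Host read mismatch on attempt {attempt}")
    # All attempts disagreed: the run-terminating case.
    raise RuntimeError(f"persistent read errors after {max_attempts} attempts")


if __name__ == "__main__":
    # Hypothetical flaky reader: the first pair of reads disagrees, the second agrees.
    reads = iter([[1, 2], [1, 3], [1, 2], [1, 2]])
    print(read_checked(lambda: next(reads)))
```

Under these assumptions a "single" error costs one extra pair of reads, and only three mismatches in a row stop the run.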

These errors seem more likely in the exponent range from DC to the first-test wavefront than at larger exponents, but have also occurred at 355M and 384M.

On the Radeon VIIs, I haven't ruled out that the GPU -> Host errors correlate with the GPUs containing Samsung HBM2 ram. That would not explain the RX 5700XT, which has GDDR6 ram.

Conversely, I have an RX550 (2GiB) on the asrock system above that hangs on resume of a 384M PRP, but runs a 64M PRP as DC normally. A Radeon VII on the same system is able to resume the 384M PRP ok.

The H81 board has a six-slot PCIe layout, of which five slots are usable if the motherboard VGA output is used. System asrock was originally set up as an mfaktc engine with NVIDIA GPUs, so system ram was more than adequate.
With multiple GPUs running P-1 in GpuOwl, as the system in the previous post did for a while, 16GiB of system ram was limiting.

Last fiddled with by kriesel on 2022-09-23 at 18:13
Old 2022-09-24, 14:38   #2785
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3·2,287 Posts

Quote:
Originally Posted by kriesel
On the Radeon VIIs, I haven't ruled out that the GPU -> Host errors correlate with the GPUs containing Samsung HBM2 ram. That would not explain the RX 5700XT, which has GDDR6 ram.
I think it's uncorrelated with Samsung HBM2, neither positively nor negatively.
The roa system's GPU has Samsung ram, but on the 5-GPU system it's device 1 that has Samsung ram, while the error occurs most often on devices 2 and 3. Asrock has errors in its Radeon VII folder logs from two months ago and has a Radeon VII with Hynix ram now, but GPUs were changed at some point, so that case is inconclusive.

Last fiddled with by kriesel on 2022-09-24 at 14:43
Old 2022-09-28, 16:08   #2786
Xyzzy
 
Aug 2002

2×3²×11×43 Posts

Quote:
Originally Posted by Xyzzy
Okay, let's try this a second time.

We think (?) we have done the following calculations properly.

The attached table shows the energy efficiency for this card. We are using the physical "quiet" BIOS switch on the card. We have modified the fan curve to run at 1% fan speed per °C. (So if the card is 46 °C the fans run at 46%.) The PRP work unit we tested is in the 114.6M range. "HS" = GPU hot spot. Our "$/kWh" is roughly 10¢.
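
For readers who want to redo this kind of calculation, here is a minimal sketch. It is not the poster's spreadsheet: the function names are hypothetical, the power draw and iteration time in the example are made-up placeholders (the real figures are in the attached image), and it assumes a PRP test takes roughly one iteration per bit of the exponent.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 1000 W * 3600 s


def fan_percent(temp_c: float) -> float:
    """Fan curve described in the post: 1% fan speed per °C, clamped to 0-100%."""
    return max(0.0, min(100.0, temp_c))


def energy_per_test_kwh(power_w: float, us_per_iter: float, exponent: int) -> float:
    """Energy for one PRP test, assuming about `exponent` iterations in total."""
    seconds = exponent * us_per_iter * 1e-6
    joules = power_w * seconds  # W = J/s, so W * s = J
    return joules / JOULES_PER_KWH


def cost_per_test(power_w: float, us_per_iter: float, exponent: int,
                  dollars_per_kwh: float = 0.10) -> float:
    """Electricity cost of one test at the post's rough rate of 10 cents per kWh."""
    return energy_per_test_kwh(power_w, us_per_iter, exponent) * dollars_per_kwh


if __name__ == "__main__":
    # Placeholder inputs, NOT measurements from the thread.
    power_w, us_per_iter, exponent = 150.0, 1000.0, 114_600_000
    print(f"fan at 46 °C: {fan_percent(46):.0f}%")
    print(f"energy per test: {energy_per_test_kwh(power_w, us_per_iter, exponent):.3f} kWh")
    print(f"cost per test: ${cost_per_test(power_w, us_per_iter, exponent):.2f}")
```

Comparing settings then comes down to comparing joules per iteration (power in watts times seconds per iteration), since the exponent and the electricity rate are the same for every row.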

Attached image: 6800XT.png (32.6 KB)
Old 2022-09-29, 06:43   #2787
LaurV
Romulan Interpreter
 
"name field"
Jun 2011
Thailand

2·3·7·241 Posts

What's the second column?
Old 2022-09-29, 07:20   #2788
S485122
 
"Jacob"
Sep 2006
Brussels, Belgium

72C₁₆ Posts

Quote:
Originally Posted by LaurV
What's the second column?
J/s: Joules per second, Watts in other words.
Old 2022-09-29, 07:36   #2789
LaurV
Romulan Interpreter
 
"name field"
Jun 2011
Thailand

23612₈ Posts

Quote:
Originally Posted by S485122
J/s: Joules per second, Watts in other words.
That's a "J", haha, thanks! Silly me... I don't want to tell what I was thinking about...
