mersenneforum.org  

2023-03-28, 16:14   #1
cardogab7341
Mar 2013
Dallas, TX
37 Posts

CPU load affecting GPU throughput

Running my new system, a Ryzen 7950X with an RTX 3070 Ti GPU, I noticed that when trial factoring on the GPU, throughput drops from about 3500 GHz-d/d to about 2900 GHz-d/d once I start Prime95 and the CPU is fully loaded.

Is this normal? Is there a way to mitigate it? What causes this?

Thanks

2023-03-28, 16:43   #2
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7,823 Posts

What does Task Manager in Windows or top in Linux say about the TF process's CPU usage?
Hopefully you are using GPU sieving, not CPU sieving. In that case the CPU usage of mfaktc should be quite low.
I see ~0% CPU use by mfaktc on an RTX 2080 Super: less than 12 logical-core-minutes accumulated in 4+ days on an old dual-socket Xeon E5520 system (2 x 4 cores with hyperthreading, 16 logical cores), which is <0.21% of one logical core. I'm guessing it would be even lower with a new Ryzen 7950X. That's with 1/20 the default checkpoint saving frequency in mfaktc.ini:
Code:
# CheckpointDelay is the time in seconds between two checkpoint writes.
# Allowed values are 0 <= CheckpointDelay <= 900.
#
# Minimum: CheckpointDelay=0
# Maximum: CheckpointDelay=900
#
# Default: CheckpointDelay=30

CheckpointDelay=600
Frequency of logging class completions and saving checkpoints would affect CPU usage observed.
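
The CPU-usage figure quoted above can be checked with a little arithmetic. The following is only a sketch using the numbers from this post (12 logical-core-minutes over 4 days, and CheckpointDelay raised from the default 30 to 600); the function name is hypothetical and not part of mfaktc.

```python
def core_share_percent(cpu_minutes: float, wall_days: float) -> float:
    """Percentage of one logical core used, averaged over the wall-clock period."""
    wall_minutes = wall_days * 24 * 60
    return 100.0 * cpu_minutes / wall_minutes


# 12 logical-core-minutes accumulated over 4 days (figures from the post).
share = core_share_percent(cpu_minutes=12, wall_days=4)
print(f"{share:.2f}% of one logical core")

# CheckpointDelay=600 compared with the default of 30 seconds.
default_delay_s, tuned_delay_s = 30, 600
ratio = tuned_delay_s // default_delay_s
print(f"checkpoints are written at 1/{ratio} of the default frequency")
```

Since the post says "less than 12" minutes and "4 days+", the result is an upper bound, which matches the "<0.21%" wording.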
You could try running a less-classes build, or higher bit levels, or both; each takes longer per class.

Tuning the TF app if you haven't yet could help. Using a version allowing GPUSieveSize up to 2047M helps performance. See also the mfaktc reference thread, especially posts on tuning. https://www.mersenneforum.org/showthread.php?t=23386

Another possibility is it's indirect thermal coupling. Is the case well ventilated, all fans functioning well?

Last fiddled with by kriesel on 2023-03-28 at 16:50

2023-03-28, 18:04   #3
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
https://pedan.tech/
2⁴×199 Posts

Quote:
Originally Posted by cardogab7341 View Post
Running my new system, a Ryzen 7950X with RTX 3070 TI GPU, I noticed that when trial factoring on the GPU, the throughput drops from about 3500 GHz-D/D to 2900 or so GHz-D/D when I start Prime95 on the CPU and it is fully loaded.

Is this normal? Is there a way to mitigate it? What causes this?

Thanks
The way mfaktc is written, there are parts that wait on the CPU while the GPU is idle. I'd leave at least a core free. You won't have enough memory bandwidth to fully saturate the CPU cores either (assuming DDR5 6000).

2023-03-28, 18:31   #4
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7,823 Posts

Quote:
Originally Posted by Mark Rose View Post
The way mfaktc is written, there are parts that wait on the CPU while the GPU is idle. I'd leave at least a core free.
Hyperthreading is typically sufficient to handle that reasonably well.
Another way to go is to run two mfaktc instances on the same GPU, from different folders. While one waits for the CPU the other can keep the GPU busy.
A brief test on a dual E5-2670 system, with prime95 running P-1 on all cores but not using hyperthreading, shows only a very slight impact on mmff 0.28 (which is similar to mfaktc and derived from it) on a GTX 1650. For MM127 TF, stopping all prime95 workers gave only a ~0.23% increase in GPU throughput, not far above measurement noise and normal fluctuation. Each output line represents the completion of one factor class.
Code:
[Mar 28 10:44] M127        [186-187]:  27.9% 1288/4620,268/960 |   n.a.  | 3107.8s | 24d21h | 1082.25G | 348.24M/s | 1050165 |   n.a.%  | kriesel@emu-gtx1650
[Mar 28 11:36] M127        [186-187]:  28.0% 1292/4620,269/960 |   n.a.  | 3108.6s | 24d20h | 1082.25G | 348.14M/s | 1050165 |   n.a.%  | kriesel@emu-gtx1650
(stop prime95 workers at 11:29)
[Mar 28 12:27] M127        [186-187]:  28.1% 1297/4620,270/960 |   n.a.  | 3101.3s | 24d18h | 1082.25G | 348.97M/s | 1050165 |   n.a.%  | kriesel@emu-gtx1650
[Mar 28 13:19] M127        [186-187]:  28.2% 1304/4620,271/960 |   n.a.  | 3100.6s | 24d17h | 1082.25G | 349.05M/s | 1050165 |   n.a.%  | kriesel@emu-gtx1650
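
As a rough check of the ~0.23% figure, the per-class times in the log above can be compared directly. This sketch assumes the first two lines count as "workers running" and the last two as "workers stopped" (the 11:36 class finished only minutes after the stop, so it is treated as a loaded run); the function name is hypothetical.

```python
from statistics import mean

# Seconds per class, copied from the mmff log lines above.
before_s = [3107.8, 3108.6]  # prime95 workers running (assumed grouping)
after_s = [3101.3, 3100.6]   # prime95 workers stopped


def throughput_gain_percent(before, after):
    """Throughput is inversely proportional to time per class."""
    return 100.0 * (mean(before) / mean(after) - 1.0)


gain = throughput_gain_percent(before_s, after_s)
print(f"GPU throughput gain after stopping prime95: {gain:.2f}%")
```

With only two samples on each side, the result says little about statistical significance, which fits the remark about measurement noise.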

Last fiddled with by kriesel on 2023-03-28 at 19:05

2023-03-28, 19:57   #5
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
https://pedan.tech/
2⁴×199 Posts

Quote:
Originally Posted by kriesel View Post
Hyperthreading is typically sufficient to handle that reasonably well.
Another way to go is to run two mfaktc instances on the same GPU, from different folders. While one waits for the CPU the other can keep the GPU busy.
A brief test on a dual e5-2670 system running prime95 on all cores but without using hyperthreading in prime95 P-1 shows only a very slight impact on mmff 0.28, which is similar to mfaktc and derived from it, on a GTX1650. At MM127 TF, stopping all prime95 workers provided only a ~0.23% increase in GPU throughput, not far above measurement noise & normal fluctuation. Each output line represents a completion of one factor class.
Code:
[Mar 28 10:44] M127        [186-187]:  27.9% 1288/4620,268/960 |   n.a.  | 3107.8s | 24d21h | 1082.25G | 348.24M/s | 1050165 |   n.a.%  | kriesel@emu-gtx1650
[Mar 28 11:36] M127        [186-187]:  28.0% 1292/4620,269/960 |   n.a.  | 3108.6s | 24d20h | 1082.25G | 348.14M/s | 1050165 |   n.a.%  | kriesel@emu-gtx1650
(stop prime95 workers at 11:29)
[Mar 28 12:27] M127        [186-187]:  28.1% 1297/4620,270/960 |   n.a.  | 3101.3s | 24d18h | 1082.25G | 348.97M/s | 1050165 |   n.a.%  | kriesel@emu-gtx1650
[Mar 28 13:19] M127        [186-187]:  28.2% 1304/4620,271/960 |   n.a.  | 3100.6s | 24d17h | 1082.25G | 349.05M/s | 1050165 |   n.a.%  | kriesel@emu-gtx1650
For what it's worth, a 3070 Ti is far more demanding than a 1650.

With my 1070s, the impact of running mprime with no hyperthreading is minimal. With my 3070s at GPUSieveSize=128, each instance was taking 10% of a core (including system time in the NVIDIA driver). With GPUSieveSize=2047, they still use about 1% each. On my system with the 3070s I have mprime running on a single core, leaving the other free.

2023-03-28, 20:48   #6
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7,823 Posts

Quote:
Originally Posted by Mark Rose View Post
For what it's worth, a 3070 Ti is far more demanding than a 1650.
Sure, and let's look at a few numbers:
3076 GHz-d/d for mfaktc on an RTX 3070 Ti per https://www.mersenne.ca/mfaktc.php; 806 GHz-d/d for mfaktc on a GTX 1650.
Further, allow 2:1 for mfaktc versus mmff, so
3076 * 2 / 806 = 7.63:1.
So scaling from 0.21% for the GTX 1650 in mmff gives 1.6% of a CPU core for an RTX 3070 Ti in mfaktc, still quite low CPU core utilization.
And that's pessimistically assuming a Ryzen 7950X core is only equal to an old Xeon E5-2670 core, which seems unlikely.
I don't know what your CPU model is, but the ~1% is comparable to the 1.6% estimated above.
The order of magnitude CPU overhead reduction with proper GPU app tuning is useful data. Thanks for that.
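
The scaling estimate in this post can be reproduced as a short calculation. This is a sketch only: it uses the throughput figures as quoted above and assumes, as the post implicitly does, that CPU overhead grows linearly with GPU throughput. Variable names are illustrative.

```python
# GHz-d/d figures as quoted in the post (attributed there to mersenne.ca).
rtx3070ti_ghzd = 3076
gtx1650_ghzd = 806

mfaktc_to_mmff = 2.0   # the post's 2:1 allowance for mfaktc versus mmff
measured_share = 0.21  # % of one CPU core, the starting figure used in the post

# Assumption: CPU overhead scales linearly with GPU throughput.
scale = rtx3070ti_ghzd * mfaktc_to_mmff / gtx1650_ghzd
estimate = measured_share * scale

print(f"scale factor: {scale:.2f}:1")
print(f"estimated CPU use: {estimate:.1f}% of one core")
```

The estimate lands near the ~1% per GPU reported with GPUSieveSize=2047 and well below the 10% seen with GPUSieveSize=128.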

Last fiddled with by kriesel on 2023-03-28 at 21:00

2023-03-29, 14:44   #7
cardogab7341
Mar 2013
Dallas, TX
37 Posts

Quote:
Originally Posted by kriesel View Post
Tuning the TF app if you haven't yet could help. Using a version allowing GPUSieveSize up to 2047M helps performance. See also the mfaktc reference thread, especially posts on tuning. https://www.mersenneforum.org/showthread.php?t=23386

Another possibility is it's indirect thermal coupling. Is the case well ventilated, all fans functioning well?
Thanks for the tips. Since I downloaded and installed the version allowing GPUSieveSize up to 2047M, everything has been working fine. TF on the 3070 Ti is running at 3700-3900 GHz-d/d, depending on the exponent, and the CPU load has no noticeable effect on these numbers.

2023-06-14, 20:41   #8
moebius
Jul 2009
Germany
1011000010₂ Posts

Quote:
Originally Posted by cardogab7341 View Post
Running my new system, a Ryzen 7950X with RTX 3070 TI GPU, I noticed that when trial factoring on the GPU, the throughput drops from about 3500 GHz-D/D to 2900 or so GHz-D/D when I start Prime95 on the CPU and it is fully loaded.

Is this normal? Is there a way to mitigate it? What causes this?

Thanks
mfaktc runs at a higher priority than other tasks and uses a few percent of the CPU capacity.
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.