mersenneforum.org  

2022-12-30, 05:25   #3576
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
https://pedan.tech/

2^4×199 Posts

I've been playing with my 8 x 3070 machine tonight. It's got a lowly 2 core/2 thread G4400 CPU and single channel DDR4-2133 memory.

Compared to my older 1070s, the 3070s in this system benefit greatly from changing GPUSieveSize=128 (from 64), especially when working on 332M exponents. I saw GHz-d/d go from ~2600 to ~3050, and volatile usage in nvidia-smi go from ~75% to ~85%. CPU usage also dropped from ~60% to ~40% (system usage 40%->25%). Once running a GPUSieveSize of 128, running two instances per card resulted in equal or less overall throughput depending on CPU saturation.

Recompiling and setting GPUSieveSize=2047 gives around 3500 GHz-d/d on a 332M number (80->81 is faster than 79->80, both using barrett87_mul32_gs) and 3700 GHz-d/d for a 128M assignment (76->77). Volatile GPU usage rose to 99% and CPU usage dropped to under 10%. Guess I didn't need to buy that quad-core i5-7400 for $40.

So it turns out low GPUSieveSize is a major bottleneck on newer cards. It didn't impact my 1070s at all.

So a 40% increase in performance for a couple hours of fiddling. Not bad.
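
For reference, the relative gains implied by the figures above can be worked out directly. This is only a sketch of the arithmetic: the helper name `pct_change` is made up for illustration, the inputs are the approximate GHz-d/d readings quoted in the post, and it assumes the ~2600 baseline was also measured on a 332M exponent.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

# Approximate GHz-d/d readings quoted above for one 3070 on a 332M exponent.
baseline = 2600.0   # GPUSieveSize=64
medium = 3050.0     # GPUSieveSize=128
large = 3500.0      # GPUSieveSize=2047 (needs a recompiled binary)

print(f"64 -> 128:  {pct_change(baseline, medium):+.1f}%")   # +17.3%
print(f"64 -> 2047: {pct_change(baseline, large):+.1f}%")    # +34.6%
```

On these rounded figures the 332M gain comes out nearer 35% than 40%. The ~3700 GHz-d/d reading is for a different (128M) assignment, so it is not directly comparable with the ~2600 baseline.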

2022-12-30, 07:44   #3577
rebirther
Sep 2011
Germany

111000010101_2 Posts

Quote:
Originally Posted by lalera View Post
hi,
here is mfaktc v0.21 linux cuda12
a version for windows i can not compile
i can not test it with a rtx4090 because i have only a gtx1050ti
The app doesn't work:

Code:
./mfaktc.exe: error while loading shared libraries: libcudart.so.12: cannot open shared object file: No such file or directory
The file was smaller than the others; you need to recompile it so that the lib is included in the local folder as well.

2022-12-30, 07:58   #3578
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2^4·3·163 Posts

Quote:
Originally Posted by Mark Rose View Post
Volatile GPU usage rose to 99% and CPU usage dropped to under 10%. Guess I didn't need to buy that quad-core i5-7400 for $40.

So it turns out low GPUSieveSize is a major bottleneck on newer cards. It didn't impact my 1070s at all.
GPU-Z can show 100% usage. Try running multiple instances on a 3070. On my 2080 Super, dual instances increased throughput by 1.5%, after all conventional tuning and maxing gpusievesize to 2047 had already been done. It also looked like there would be more to be gained on a single instance if part of mfaktc was rewritten further, to unsigned int32 for gpusievesize to allow up to 4095M. Large gpusievesize also helped a 1080 and 1080Ti. https://www.mersenneforum.org/showpo...99&postcount=8
It would be interesting to see how much high gpusievesize matters on 40xx. (Anyone?)

Last fiddled with by kriesel on 2022-12-30 at 08:00

2022-12-30, 10:02   #3579
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
https://pedan.tech/

2^4·199 Posts

Quote:
Originally Posted by kriesel View Post
GPU-Z can show 100% usage. Try running multiple instances on a 3070. On my 2080 Super, dual instances increased throughput by 1.5%, after all conventional tuning and maxing gpusievesize to 2047 had already been done. It also looked like there would be more to be gained on a single instance if part of mfaktc was rewritten further, to unsigned int32 for gpusievesize to allow up to 4095M. Large gpusievesize also helped a 1080 and 1080Ti. https://www.mersenneforum.org/showpo...99&postcount=8
GPU-Z seems to be Windows only.

I did try running two instances on a single 3070 with 2047 GPUSieveSize on a 332M 80->81 assignment: throughput dropped 13% from ~3500 to ~3050.

I took a look at increasing gpu_sieve_size beyond 2047. I'm not seeing much reason why it couldn't be increased beyond that, though it would probably require changing some types passed to the CUDA kernels. I haven't peeked at how it will affect CPU sieving.

2022-12-30, 10:08   #3580
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
https://pedan.tech/

C70_16 Posts

Quote:
Originally Posted by kriesel View Post
GPU-Z can show 100% usage.
And I just caught a 100%. I'm happy with 99% using a single instance.

Code:
$ nvidia-smi
Fri Dec 30 10:05:47 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.108.03   Driver Version: 510.108.03   CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 86%   75C    P2   235W / 270W |    424MiB /  8192MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:02:00.0 Off |                  N/A |
| 76%   67C    P2   234W / 270W |    424MiB /  8192MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:03:00.0 Off |                  N/A |
| 83%   72C    P2   241W / 270W |    424MiB /  8192MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  Off  | 00000000:04:00.0 Off |                  N/A |
| 72%   63C    P2   240W / 270W |    424MiB /  8192MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA GeForce ...  Off  | 00000000:08:00.0 Off |                  N/A |
| 81%   71C    P2   236W / 270W |    424MiB /  8192MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA GeForce ...  Off  | 00000000:0B:00.0 Off |                  N/A |
| 84%   73C    P2   243W / 270W |    424MiB /  8192MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA GeForce ...  Off  | 00000000:0C:00.0 Off |                  N/A |
| 86%   74C    P2   239W / 270W |    424MiB /  8192MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  NVIDIA GeForce ...  Off  | 00000000:0D:00.0 Off |                  N/A |
|  0%   49C    P8    26W / 270W |      3MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      8757      C   ./mfaktc.exe                      421MiB |
|    1   N/A  N/A      8244      C   ./mfaktc.exe                      421MiB |
|    2   N/A  N/A      9980      C   ./mfaktc.exe                      421MiB |
|    3   N/A  N/A      8753      C   ./mfaktc.exe                      421MiB |
|    4   N/A  N/A      8801      C   ./mfaktc.exe                      421MiB |
|    5   N/A  N/A      8808      C   ./mfaktc.exe                      421MiB |
|    6   N/A  N/A      8816      C   ./mfaktc.exe                      421MiB |
+-----------------------------------------------------------------------------+
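
For anyone who would rather log these readings than catch them by eye, nvidia-smi also has a machine-readable query mode (`--query-gpu=... --format=csv,noheader,nounits`). The sketch below is an illustration only: the helper names `parse_gpu_csv` and `read_gpus` are made up here, the sample text is hand-written to mirror GPUs 0 and 7 in the table above, and it assumes the driver reports a numeric power draw for every card.

```python
import csv
import io
import subprocess

QUERY = "index,utilization.gpu,power.draw"

def parse_gpu_csv(text: str) -> list[dict]:
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output."""
    rows = []
    for rec in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not rec:
            continue  # skip blank lines
        index, util, power = (field.strip() for field in rec)
        rows.append({"index": int(index),
                     "util_pct": float(util),
                     "power_w": float(power)})
    return rows

def read_gpus() -> list[dict]:
    """Run nvidia-smi once and return one dict per GPU (needs an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_gpu_csv(out)

if __name__ == "__main__":
    # Hand-written sample so the parser can be tried without a GPU present.
    sample = "0, 100, 235.00\n7, 0, 26.00\n"
    for gpu in parse_gpu_csv(sample):
        print(gpu)
```

Calling `read_gpus()` in a loop with a short sleep gives a simple utilization log for comparing GPUSieveSize settings.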

2022-12-30, 11:08   #3581
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
https://pedan.tech/

2^4·199 Posts

Playing around with power limits, I set them to 200 watts (down from 270, though they were using ~245) and lost only ~5% throughput. Each card behaves slightly differently.

I'll gladly save the 400 watts.

It's now 4 am. Time to sleep!
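
A rough check of the saving, as a sketch only. It assumes all eight cards were drawing about 245 W before the change and sit at the 200 W cap afterwards, which the post does not state exactly. The post also does not say how the limit was set; `nvidia-smi -pl` is one common way.

```python
# Rough estimate of the saving from the power cap described above.
cards = 8                # the machine is described as 8 x 3070
draw_before_w = 245.0    # approximate per-card draw at the 270 W default limit
draw_after_w = 200.0     # assumes each card runs at its new 200 W cap
throughput_kept = 0.95   # "lost only ~5%"

saved_w = cards * (draw_before_w - draw_after_w)
# Work done per watt after the cap, relative to before it.
efficiency_ratio = throughput_kept / (draw_after_w / draw_before_w)

print(f"estimated saving: {saved_w:.0f} W")           # 360 W
print(f"work per watt:    {efficiency_ratio:.2f}x")   # 1.16x
```

That is in the same ballpark as the ~400 W quoted. The nvidia-smi listing earlier in the thread shows only seven cards under load, which would give about 315 W instead.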

Last fiddled with by Mark Rose on 2022-12-30 at 11:08

2022-12-30, 11:27   #3582
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2^4×3×163 Posts

Quote:
Originally Posted by Mark Rose View Post
I did try running two instances on a single 3070 with 2047 GPUSieveSize on a 332M 80->81 assignment: throughput dropped 13% from ~3500 to ~3050.

I took a look at increasing gpu_sieve_size beyond 2047. I'm not seeing much reason why it couldn't be increased beyond that, though it would probably require changing some types passed to the CUDA kernels. I haven't peeked at how it will affect CPU sieving.
Performance drop is surprising.
Gpusievesize impacts an int32 variable. There is some usage in mfaktx of a negative value. So that would need to be split off to a different variable.
Or one could partially 64-bit-ize mfaktc.
If one wanted to play above exponent 2^32, a more complete 64-bit conversion would be needed, including rewriting the kernels.
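
The 2047 and 4095 ceilings mentioned in this exchange line up with 32-bit integer ranges if GPUSieveSize is read as a count of Mibit (2^20 bits) and the resulting bit count is held in a 32-bit integer. That reading is an assumption, not something checked against the mfaktc source; the sketch below only verifies the arithmetic, and `fits` is a made-up helper name.

```python
MIBIT = 1 << 20          # bits per unit of GPUSieveSize (assumed)
INT32_MAX = 2**31 - 1    # largest signed 32-bit value
UINT32_MAX = 2**32 - 1   # largest unsigned 32-bit value

def fits(size_m: int, limit: int) -> bool:
    """True if a sieve of `size_m` Mibit has a bit count no larger than `limit`."""
    return size_m * MIBIT <= limit

for size_m in (2047, 2048, 4095, 4096):
    print(size_m,
          "int32:", fits(size_m, INT32_MAX),
          "uint32:", fits(size_m, UINT32_MAX))
```

Under that assumption 2047 is the largest value that fits a signed int32 and 4095 the largest that fits an unsigned one, which matches the limits discussed above. Any code path that stores a negative value in the same variable would still need separate handling, as kriesel notes.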

2022-12-30, 17:07   #3583
lalera
Jul 2003

280_16 Posts

Quote:
Originally Posted by rebirther View Post
The app doesn't work:

Code:
./mfaktc.exe: error while loading shared libraries: libcudart.so.12: cannot open shared object file: No such file or directory
The file was smaller than the others; you need to recompile it so that the lib is included in the local folder as well.
hi,
i have ubuntu v22.04 and cuda toolkit 12 installed
i compiled mfaktc v 0.21 and let the selftests -st and -st2 run - works
compiling it where the libs are does not work (it changes nothing - i tried it with a live-cd)
i am sorry but i do not know how to do this (i am not a programmer)
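
The error quoted above comes from the dynamic loader, which could not find libcudart.so.12 on the machine where the binary was run; the binary itself may be fine. One way to check a target machine before starting mfaktc is to ask the loader for the library directly. The sketch below is an illustration and not part of mfaktc: `can_load` is a made-up helper name, and it uses only Python's standard `ctypes` module.

```python
import ctypes

def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can find and open `soname`."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False

if __name__ == "__main__":
    name = "libcudart.so.12"  # the library named in the error message
    if can_load(name):
        print(f"{name}: found")
    else:
        print(f"{name}: not found on the loader search path")
```

If the check fails, the usual options on Linux are to install the matching CUDA runtime, or to ship libcudart.so.12 next to the binary and start it with LD_LIBRARY_PATH pointing at that folder. Compiling in a different directory does not by itself change where the loader looks.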

2022-12-30, 17:16   #3584
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario

4277_10 Posts

Quote:
Originally Posted by kriesel View Post
It would be interesting to see how much high gpusievesize matters on 40xx. (Anyone?)
I haven't used mfaktc for years until this morning. I now have a 4090 to test on.
What would you like me to test?
Attachment: 4090-2047.png (321.6 KB)

2022-12-30, 19:46   #3585
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
https://pedan.tech/

2^4·199 Posts

Quote:
Originally Posted by James Heinrich View Post
I haven't used mfaktc for years until this morning. I now have a 4090 to test on.
What would you like me to test?
That thing is bananas!

2022-12-30, 20:09   #3586
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

17220_8 Posts

Quote:
Originally Posted by James Heinrich View Post
I haven't used mfaktc for years until this morning. I now have a 4090 to test on.
What would you like me to test?
If you cut GpuSieveSize to 1024, how much performance do you lose?
If at GpuSieveSize=2047, you run two instances (in separate folders, very similar work; similar exponent, same bit level), do you gain combined throughput, or lose?

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.