mersenneforum.org  

Old 2020-09-28, 11:46   #3334
Xyzzy
 
 
"Mike"
Aug 2002

2³·3·331 Posts

Quote:
Originally Posted by kriesel View Post
Maybe post 3189's attachment will help (also linked to at 3208).
Thanks!

Old 2020-09-28, 14:03   #3335
storm5510
Random Account
 
 
Aug 2009
U.S.A.

2×3×13×23 Posts

Quote:
Originally Posted by DrobinsonPE View Post
The version of mfaktc I am using is the linux compiled version I found at this post https://mersenneforum.org/showpost.p...postcount=3208

The mfaktc.ini file was already configured with the settings GPUSieveProcessSize=32, GPUSieveSize=2047, GPUSievePrimes=82486.
I checked the archives I have downloaded to date. None of them contain mfaktc.ini, only the binaries. I will do some more digging.

Edit: I found it in the Linux archive linked above. The settings were mostly the same as those quoted above. The best 0.1% time was 1.876 seconds, at 3349 GHz-d/day.
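
For readers who want to check the same three settings in their own mfaktc.ini, here is a minimal Python sketch. It assumes the file uses plain `Key=Value` lines with `#` comment lines; the sample text only repeats the values quoted above and is not a complete mfaktc.ini. The function name `parse_ini` is made up for this illustration.

```python
# Sketch: read "Key=Value" settings from mfaktc.ini-style text.
# SAMPLE_INI only repeats the three values quoted in this thread;
# a real mfaktc.ini contains many more settings.
SAMPLE_INI = """\
# GPU sieve settings quoted in the thread
GPUSieveProcessSize=32
GPUSieveSize=2047
GPUSievePrimes=82486
"""


def parse_ini(text):
    """Return a dict of key -> value, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


settings = parse_ini(SAMPLE_INI)
for key in ("GPUSieveProcessSize", "GPUSieveSize", "GPUSievePrimes"):
    print(f"{key} = {settings[key]}")
```

To use it on a real file, pass the file contents instead of `SAMPLE_INI`, for example `parse_ini(open("mfaktc.ini").read())`.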

Last fiddled with by storm5510 on 2020-09-28 at 14:20
Old 2020-10-01, 05:59   #3336
Neutron3529
 
 
Dec 2018
China

2³·5 Posts

Quote:
Originally Posted by James Heinrich View Post
If you have a completed TF run could you please submit the data so I can calculate the values for RTX 3080?
https://www.mersenne.ca/mfaktc.php#benchmark

I borrowed a machine with 4 RTX 3090s and found something quite strange:
GPU utilization cannot reach 100% even when I am running a single mfaktc instance on a single GPU.


I uploaded 2 results, and may try gpuowl later.
Old 2020-10-01, 07:16   #3337
Neutron3529
 
 
Dec 2018
China

2³·5 Posts

Quote:
Originally Posted by Neutron3529 View Post
I borrowed a machine with 4 RTX 3090s and found something quite strange:
GPU utilization cannot reach 100% even when I am running a single mfaktc instance on a single GPU.

I uploaded 2 results, and may try gpuowl later.
The results are uploaded, but I have not tried gpuowl because `gmpxx.h` was not found.


All the results generated by mfaktc are here:
results.7z
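
A missing `gmpxx.h` usually means the GMP C++ development headers are not installed on the build machine. The Python sketch below only checks whether the header is present. The search directories are an assumption (a real compiler may look elsewhere), and `find_header` is a name invented for this example.

```python
from pathlib import Path

# Assumed default search locations; a real compiler may search others.
DEFAULT_INCLUDE_DIRS = ("/usr/include", "/usr/local/include")


def find_header(name, include_dirs=DEFAULT_INCLUDE_DIRS):
    """Return the first existing path to `name` under include_dirs, else None."""
    for directory in include_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


found = find_header("gmpxx.h")
if found:
    print(f"gmpxx.h found at {found}")
else:
    # On Debian/Ubuntu the GMP development package (libgmp-dev) normally
    # ships this header; other distributions use different package names.
    print("gmpxx.h not found; the GMP C++ development headers are probably missing")
```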
Old 2020-10-01, 07:52   #3338
moebius
 
 
Jul 2009
Germany

547₁₀ Posts

Quote:
Originally Posted by Neutron3529 View Post
I uploaded 2 results, and may try gpuowl later.
Please run a short gpuowl benchmark with the exponent 77936867 so that we can directly compare the graphics cards' results. Thank you.
https://mersenneforum.org/showthread.php?p=558317#post558317

Last fiddled with by moebius on 2020-10-01 at 07:52
Old 2020-10-01, 13:28   #3339
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

4845₁₀ Posts

Quote:
Originally Posted by Neutron3529 View Post
I borrowed a machine with 4 RTX 3090s and found something quite strange:
GPU utilization cannot reach 100% even when I am running a single mfaktc instance on a single GPU.
I would expect that. Fast gpus typically need multiple instances as well as large gpusieveprimes and other tuning. My approach is to tune with a single instance first, then test performance versus the number of tuned instances. The effect seems to be stronger the faster the gpu is. A solid state disk or ramdisk might also help.
Old 2020-10-01, 13:51   #3340
James Heinrich
 
 
"James Heinrich"
May 2004
ex-Northern Ontario

3×23×47 Posts

Quote:
Originally Posted by Neutron3529 View Post
I borrowed a machine with 4 RTX 3090s and found something quite strange: GPU utilization cannot reach 100% even when I am running a single mfaktc instance on a single GPU.
This is normal on high-performance GPUs. A 1080 will get to about 95% and a 2080 to about 80% (apparently the 30x0 cards behave the same). The GPU is just too fast: the little bit that the CPU does can't keep up. In production use, running two instances of mfaktc should allow optimal throughput by splitting the CPU load across two cores.
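
Running two instances on one GPU can be sketched in Python as below. This is only an illustration under assumptions: each instance gets its own working directory (so work, results and checkpoint files do not collide), the binary name `./mfaktc.exe` is a placeholder, and `-d` is assumed to select the device number (check the program's `-h` output). The helper `plan_instances` is hypothetical and only builds the command lines; it does not start anything.

```python
from pathlib import Path


def plan_instances(base_dir, count, device=0, binary="./mfaktc.exe"):
    """Return (working_directory, argv) pairs, one per instance.

    Every instance targets the same GPU but runs from its own directory.
    """
    plans = []
    for i in range(count):
        workdir = Path(base_dir) / f"instance{i}"
        plans.append((workdir, [binary, "-d", str(device)]))
    return plans


for workdir, argv in plan_instances("mfaktc-runs", 2):
    print(workdir, " ".join(argv))
    # To actually start an instance, first copy the binary, mfaktc.ini and
    # a worktodo.txt into workdir, then: subprocess.Popen(argv, cwd=workdir)
```

Each process is scheduled by the operating system independently, which is what lets the CPU-side work spread across two cores.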
Old 2020-10-01, 14:12   #3341
storm5510
Random Account
 
 
Aug 2009
U.S.A.

2×3×13×23 Posts

Quote:
Originally Posted by James Heinrich View Post
This is normal on high-performance GPUs. A 1080 will get to about 95% and a 2080 to about 80% (apparently the 30x0 cards behave the same). The GPU is just too fast: the little bit that the CPU does can't keep up. In production use, running two instances of mfaktc should allow optimal throughput by splitting the CPU load across two cores.

There is a solution for this. I do not know about Linux, but on Windows it is possible to set the minimum CPU speed as a percentage of its capability. The default minimum is something like 5%, and at that setting the CPU does not respond much to a quick pulse of load. Set it to 85%, for example, and it will respond much faster even when starting from no load. I noticed that when I have Prime95 running, my GPU performance with mfaktc increased considerably.
Attachment: cpu.JPG (54.7 KB)
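
The setting described in this post is the "minimum processor state" of the Windows power plan. As a hedged sketch, the same change can be scripted with `powercfg`. The aliases `SCHEME_CURRENT`, `SUB_PROCESSOR` and `PROCTHROTTLEMIN` are assumed to be available on the target system (list them with `powercfg /aliases`), and the helper name is invented for this example. The code only builds the command lines.

```python
def min_processor_state_commands(percent):
    """Build powercfg command lines that set the AC minimum processor state."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return [
        ["powercfg", "/setacvalueindex", "SCHEME_CURRENT",
         "SUB_PROCESSOR", "PROCTHROTTLEMIN", str(percent)],
        # Re-apply the current scheme so the new value takes effect.
        ["powercfg", "/setactive", "SCHEME_CURRENT"],
    ]


for argv in min_processor_state_commands(85):
    print(" ".join(argv))
    # On Windows (possibly from an elevated prompt):
    # subprocess.run(argv, check=True)
```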
Old 2020-10-01, 17:02   #3342
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3×5×17×19 Posts

Quote:
Originally Posted by kriesel View Post
Fast gpus typically need multiple instances as well as large gpusieveprimes and other tuning. My approach is to tune with a single instance first, then test performance versus the number of tuned instances. The effect seems to be stronger the faster the gpu is. A solid state disk or ramdisk might also help.
Oops, I meant large GpuSieveSize there.
Needing multiple instances for full performance is typical for gpus faster than about a GTX1050Ti, even with prime95 fairly fully utilizing the cpu, cpu clock rates kept at the highest sustainable levels, and on-gpu sieving enabled in mfaktc. The faster the gpu, the more it matters. Two instances do a pretty good job on some gpu models; I use 3 instances to get the most throughput from RTX2080x. So it's no surprise the RTX3080 is underutilized with a single instance. Also, was the RTX3080 mfaktc test thoroughly tuned?

I think the lower than 100% utilization in mfaktc has to do with time for saving checkpoint files and generating console output, and activities that may be limited by pcie bandwidth. Running multiple instances lets gpu resources work on something in one instance while another instance is waiting for the cpu side of mfaktc and the OS to get things done occasionally and communication across pcie to occur.
For comparison, GTX1080Ti shows 98% utilization in gpuowl with one instance.
Mfakto shows much less effect of tuning than mfaktc for equivalent gpu speed. So maybe it has to do with CUDA call overhead.

For more, see detailed mfaktc tune analyses on GTX1080Ti and RTX2080 Super here. I saw 90% utilization with 256 gpusievesize on RTX2080Super, but 2047 gpusievesize and a good tune otherwise boosted it a lot.
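
The "tune one instance, then test performance versus number of instances" approach comes down to comparing total throughput at each instance count. A minimal Python sketch follows; the function name is hypothetical and the numbers are placeholders for illustration only, not measurements from this thread.

```python
def best_instance_count(per_instance_rates):
    """Pick the instance count with the highest combined throughput.

    per_instance_rates maps an instance count to the list of measured
    GHz-d/day values, one per running instance.
    Returns (count, total_rate).
    """
    totals = {n: sum(rates) for n, rates in per_instance_rates.items()}
    best = max(totals, key=totals.get)
    return best, totals[best]


# Placeholder numbers purely for illustration; they are NOT measurements
# from this thread. Replace them with your own readings.
example = {1: [1000.0], 2: [600.0, 600.0], 3: [410.0, 410.0, 400.0]}
count, total = best_instance_count(example)
print(f"best: {count} instance(s), {total:.0f} GHz-d/day total")
```

Per-instance speed drops as instances are added, so the comparison has to be on the sum, not on any single instance's rate.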

Last fiddled with by kriesel on 2020-10-01 at 17:29
Old 2020-10-04, 14:47   #3343
storm5510
Random Account
 
 
Aug 2009
U.S.A.

1794₁₀ Posts

I just began using Ubuntu 20.04 LTS. The archive mfaktc-0.21-linux64.cuda10.1-gpusievesize2047.tar.gz does not contain the libraries needed to run. Where can I find them?
Old 2020-10-04, 15:28   #3344
Viliam Furik
 
"Viliam Furík"
Jul 2018
Martin, Slovakia

7·47 Posts

Quote:
Originally Posted by storm5510 View Post
I just began using Ubuntu 20.04 LTS. The archive mfaktc-0.21-linux64.cuda10.1-gpusievesize2047.tar.gz does not contain the libraries needed to run. Where can I find them?
You can often find CUDA DLLs by googling them. This should work, and shouldn't have viruses (99% sure it doesn't, but do as you wish).

All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.