mersenneforum.org  

2016-03-30, 12:59   #1
bgbeuning
Dec 2014
2²×3²×7 Posts
mfaktc GPU usage

When I run mfaktc on Linux, the nvidia-smi command says the GPU usage is between 9% and 16%.
Is that normal or do I have something configured wrong?

The GPU is a Tesla M2070 with CC 2.0.

2016-03-30, 14:47   #2
kladner
"Kieren"
Jul 2011
In My Own Galaxy!
10011100001101₂ Posts

I am on Windows, but FWIW, mfaktc tops out at 98% GPU usage when Prime95 is running under full load. With P95 stopped, or running only one worker, mfaktc can reach 99%. Additional CPU usage by non-P95 items, like System, causes a larger drop. Granted, this is with an FX-8350 CPU. Intel may perform differently in this regard.

Sorry if this turns out to be irrelevant under Linux.

EDIT: This is with two GTX 580s, also CC 2.0.

Last fiddled with by kladner on 2016-03-30 at 14:49

2016-03-30, 17:10   #3
TheJudger
"Oliver"
Mar 2005
Germany
1107₁₀ Posts

Hi,

Quote:
Originally Posted by bgbeuning
Is that normal or do I have something configured wrong?

I don't know because you didn't provide any further details (mfaktc version, exponent and bitlevels, CPU or GPU sieving).

Oliver

2016-03-31, 00:19   #4
bgbeuning
Dec 2014
2²×3²×7 Posts

Quote:
Originally Posted by TheJudger
Hi,

I don't know because you didn't provide any further details (mfaktc version, exponent and bitlevels, CPU or GPU sieving).

Oliver

Sorry for the sparse details.

version 0.20
exponents around 75e6 (recently assigned by GPU72)
bit levels 73, 74, 75
I think this version does GPU sieving.


Here is the startup output from mfaktc:

Quote:
mfaktc v0.20 (64bit built)

Compiletime options
THREADS_PER_BLOCK 256
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 193154bits
SIEVE_SPLIT 250
MORE_CLASSES enabled

Runtime options
SievePrimes 25000
SievePrimesAdjust 1
SievePrimesMin 5000
SievePrimesMax 100000
NumStreams 3
CPUStreams 3
GridSize 3
GPU Sieving enabled
GPUSievePrimes 82486
GPUSieveSize 64Mi bits
GPUSieveProcessSize 16Ki bits
WorkFile worktodo.txt
Checkpoints enabled
CheckpointDelay 30s
Stages enabled
StopAfterFactor bitlevel
PrintMode full
V5UserID (none)
ComputerID (none)
AllowSleep no
TimeStampInResults no

CUDA version info
binary compiled for CUDA 4.20
CUDA runtime version 4.20
CUDA driver version 7.50

CUDA device info
name Tesla M2070
compute capability 2.0
maximum threads per block 1024
number of multiprocessors 14 (448 shader cores)
clock rate 1147MHz

Automatic parameters
threads per grid 917504

2016-03-31, 20:10   #5
TheJudger
"Oliver"
Mar 2005
Germany
3³·41 Posts

Hi,

Looks good to me. Is it really using a barrettXX_gs (GPU sieve) kernel?
Any chance you could upgrade to mfaktc 0.21 (compiled with a more recent CUDA toolkit)?

If this doesn't help, please provide the output of 'nvidia-smi' and the full output of mfaktc.

Oliver
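
For anyone collecting the same data, it can be easier to poll nvidia-smi in CSV form than to read the table by eye. The following is only a minimal Python sketch, not something from this thread: it assumes the installed nvidia-smi supports `--query-gpu` with the `index` and `utilization.gpu` fields, and the helper names `parse_utilization` and `sample_utilization` are made up for illustration. Boards that do not report utilization are mapped to `None`.

```python
import subprocess

# Assumed query: one CSV line per GPU, e.g. "0, 13"
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_utilization(csv_text):
    """Map GPU index -> utilization in percent (None if not reported)."""
    result = {}
    for line in csv_text.strip().splitlines():
        index, util = [field.strip() for field in line.split(",")]
        # Non-numeric values such as "[Not Supported]" become None.
        result[int(index)] = int(util) if util.isdigit() else None
    return result


def sample_utilization():
    """Run nvidia-smi once and return the parsed utilization per GPU."""
    completed = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_utilization(completed.stdout)
```

Calling `sample_utilization()` in a loop with a short sleep gives a utilization trace per GPU that can be posted alongside the mfaktc output.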

2016-04-01, 01:07   #6
bgbeuning
Dec 2014
374₈ Posts

I compiled mfaktc-0.21 with CUDA 7.5.
(I had to remove cc_11 support from the Makefile.)
Here is the mfaktc output with one process running, followed by the nvidia-smi output.
One more post after this...

Quote:
mfaktc v0.21 (64bit built)

Compiletime options
THREADS_PER_BLOCK 256
SIEVE_SIZE_LIMIT 32kiB
SIEVE_SIZE 193154bits
SIEVE_SPLIT 250
MORE_CLASSES enabled

Runtime options
SievePrimes 25000
SievePrimesAdjust 1
SievePrimesMin 5000
SievePrimesMax 100000
NumStreams 3
CPUStreams 3
GridSize 3
GPU Sieving enabled
GPUSievePrimes 82486
GPUSieveSize 64Mi bits
GPUSieveProcessSize 16Ki bits
Checkpoints enabled
CheckpointDelay 30s
WARNING: Cannot read WorkFileAddDelay from mfaktc.ini, set to 600s by default
WorkFileAddDelay 600s
Stages enabled
StopAfterFactor bitlevel
PrintMode full
V5UserID (none)
ComputerID (none)
AllowSleep no
TimeStampInResults no

CUDA version info
binary compiled for CUDA 7.50
CUDA runtime version 7.50
CUDA driver version 7.50

CUDA device info
name Tesla M2070
compute capability 2.0
max threads per block 1024
max shared memory per MP 49152 byte
number of multiprocessors 14
CUDA cores per MP 32
CUDA cores - total 448
clock rate (CUDA cores) 1147MHz
memory clock rate: 1566MHz
memory bus width: 384 bit

Automatic parameters
threads per grid 917504
GPUSievePrimes (adjusted) 82486
GPUsieve minimum exponent 1055144

running a simple selftest...
Selftest statistics
number of tests 107
successfull tests 107

selftest PASSED!

got assignment: exp=74772779 bit_min=74 bit_max=75 (51.17 GHz-days)
Starting trial factoring M74772779 from 2^74 to 2^75 (51.17 GHz-days)
k_min = 126312450758340
k_max = 252624901523033
Using GPU kernel "barrett76_mul32_gs"

found a valid checkpoint file!
last finished class was: 0
found 0 factor(s) already

Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Mar 31 21:02 |     1  0.2% | 16.616   4h25m |    277.15    82485    n.a.%
Mar 31 21:03 |     9  0.3% | 16.666   4h25m |    276.32    82485    n.a.%
Quote:
Thu Mar 31 21:03:25 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.79     Driver Version: 352.79         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M2070         Off  | 0000:0F:00.0     Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |     80MiB /  5375MiB |     13%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M2070         Off  | 0000:11:00.0     Off |                    0 |
| N/A   N/A    P8    N/A /  N/A |     10MiB /  5375MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M2070         Off  | 0000:12:00.0     Off |                    0 |
| N/A   N/A    P8    N/A /  N/A |     10MiB /  5375MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M2070         Off  | 0000:15:00.0     Off |                    0 |
| N/A   N/A    P8    N/A /  N/A |     10MiB /  5375MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla M2070         Off  | 0000:16:00.0     Off |                    0 |
| N/A   N/A    P8    N/A /  N/A |     10MiB /  5375MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla M2070         Off  | 0000:17:00.0     Off |                    0 |
| N/A   N/A    P8    N/A /  N/A |     10MiB /  5375MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla M2070         Off  | 0000:18:00.0     Off |                    0 |
| N/A   N/A    P8    N/A /  N/A |     10MiB /  5375MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     12285    C   /home/bgb/bin64/mfaktc.exe                      68MiB |
+-----------------------------------------------------------------------------+
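
As a side note on the k_min and k_max lines in the mfaktc output above: any factor of a Mersenne number M_p has the form q = 2kp + 1, so trial factoring from 2^74 to 2^75 amounts to scanning a range of k. The Python sketch below is not taken from the mfaktc source. It assumes that k_min is rounded down to a multiple of the class count (4620 when MORE_CLASSES is enabled), and the function name `k_range` is made up for illustration. Under that assumption it reproduces both values printed in the log.

```python
def k_range(exponent, bit_min, bit_max, num_classes=4620):
    """Return (k_min aligned to the class count, k_max) for factors q = 2*k*p + 1.

    Assumption: the lower bound is rounded down to a multiple of num_classes.
    """
    two_p = 2 * exponent
    k_min = (2 ** bit_min) // two_p      # smallest k with q near 2^bit_min
    k_max = (2 ** bit_max) // two_p      # largest k with q below 2^bit_max
    k_min_aligned = k_min - (k_min % num_classes)
    return k_min_aligned, k_max


k_min, k_max = k_range(74772779, 74, 75)
print(k_min)  # 126312450758340
print(k_max)  # 252624901523033
```

The candidate count is therefore on the order of 1.26e14 values of k before sieving, which is why the sieve settings matter so much for throughput.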

2016-04-01, 01:11   #7
bgbeuning
Dec 2014
2²·3²·7 Posts

Here is the nvidia-smi output with all 7 processes running.
By running just one, I wanted to show that the PCIe link is not the bottleneck.

Quote:
Thu Mar 31 21:09:32 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.79     Driver Version: 352.79         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M2070         Off  | 0000:0F:00.0     Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |     80MiB /  5375MiB |     14%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M2070         Off  | 0000:11:00.0     Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |     80MiB /  5375MiB |     16%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M2070         Off  | 0000:12:00.0     Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |     80MiB /  5375MiB |     16%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M2070         Off  | 0000:15:00.0     Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |     80MiB /  5375MiB |     22%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla M2070         Off  | 0000:16:00.0     Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |     80MiB /  5375MiB |     16%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla M2070         Off  | 0000:17:00.0     Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |     80MiB /  5375MiB |     23%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla M2070         Off  | 0000:18:00.0     Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |     80MiB /  5375MiB |     14%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     12285    C   /home/bgb/bin64/mfaktc.exe                      68MiB |
|    1     12310    C   /home/bgb/bin64/mfaktc.exe                      68MiB |
|    2     12311    C   /home/bgb/bin64/mfaktc.exe                      68MiB |
|    3     12312    C   /home/bgb/bin64/mfaktc.exe                      68MiB |
|    4     12313    C   /home/bgb/bin64/mfaktc.exe                      68MiB |
|    5     12314    C   /home/bgb/bin64/mfaktc.exe                      68MiB |
|    6     12315    C   /home/bgb/bin64/mfaktc.exe                      68MiB |
+-----------------------------------------------------------------------------+

2016-04-01, 07:56   #8
TheJudger
"Oliver"
Mar 2005
Germany
3³×41 Posts

Hi,

In theory the PCIe bandwidth shouldn't be an issue when running the GPU sieve.
The performance numbers (~277 GHz-days/day) look as expected for this GPU type. So I guess the GPU/mfaktc is running at full speed and the issue is with nvidia-smi.

Removing cc 1.1 from the Makefile is perfectly fine for CUDA 7.5. But keep in mind that CUDA 7.0 and 7.5 are broken for Maxwell/mfaktc...

Oliver
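
The "running at full speed" conclusion can be sanity-checked against the log in post #6 with a little arithmetic, independent of what nvidia-smi reports. The Python sketch below is only an illustration: the figures are copied from the mfaktc output, while the count of 960 classes actually tested (out of 4620 with MORE_CLASSES) is an assumption about mfaktc rather than something stated in this thread, and `format_hm` is a made-up helper.

```python
# Figures copied from the mfaktc log in post #6.
WORK_GHZ_DAYS = 51.17          # size of the 2^74 -> 2^75 assignment
RATE_GHZ_DAYS_PER_DAY = 277.15
SECONDS_PER_CLASS = 16.616

# Assumption: 960 of the 4620 classes survive and are actually tested.
CLASSES_TO_TEST = 960


def format_hm(seconds):
    """Format a duration in seconds the way the log prints ETA, e.g. '4h25m'."""
    total_minutes = int(seconds // 60)
    return f"{total_minutes // 60}h{total_minutes % 60:02d}m"


# Estimate 1: total work divided by the reported throughput.
eta_from_rate = WORK_GHZ_DAYS / RATE_GHZ_DAYS_PER_DAY * 86400

# Estimate 2: time per class multiplied by the number of classes.
eta_from_classes = SECONDS_PER_CLASS * CLASSES_TO_TEST

print(format_hm(eta_from_rate))     # 4h25m
print(format_hm(eta_from_classes))  # 4h25m
```

Both estimates agree with the 4h25m ETA in the log, so the reported throughput is internally consistent with the per-class timing, whatever the utilization figure says.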

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.