#67 | |
|
If I May
"Chris Halsall"
Sep 2002
Barbados
2×11²×47 Posts |
Quote:
With regard to GPU sieving: yes, I believe "on" has been the default since George and Oliver implemented it, since it is SO much faster! |
#68 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴·3·163 Posts |
#69 |
|
If I May
"Chris Halsall"
Sep 2002
Barbados
2×11²×47 Posts |
The 560 doesn't give any details except for fan speed, temp and memory usage (74%, 84C and 64MiB / 1985MiB). nVidia seem to have intentionally broken nvidia-smi for older cards...
For the two 1050s (both in the same machine): Code:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.59 Driver Version: 390.59 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1050 Off | 00000000:01:00.0 Off | N/A |
| 48% 83C P0 N/A / 65W | 67MiB / 2000MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 1050 Off | 00000000:03:00.0 Off | N/A |
| 63% 70C P0 N/A / 75W | 67MiB / 2000MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 16053 C ./mfaktc.exe 57MiB |
| 1 16095 C ./mfaktc.exe 57MiB |
+-----------------------------------------------------------------------------+
Last fiddled with by chalsall on 2019-01-07 at 16:50 Reason: s/The 580/The 560/; |
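For anyone who wants to log these figures rather than read them off the boxed table, nvidia-smi also has a CSV query mode. What follows is only a minimal sketch: the helper names (`parse_gpu_csv`, `query_gpus`) are made up for illustration, and it assumes nvidia-smi is on the PATH and supports `--query-gpu` with these fields. On older cards, fields the driver will not report may come back as a bracketed placeholder instead of a number; the parser maps anything non-numeric to `None`.

```python
import csv
import io
import subprocess

# Query fields requested from nvidia-smi, in output column order.
FIELDS = ["index", "name", "utilization.gpu", "temperature.gpu",
          "memory.used", "memory.total"]


def _num(text):
    """Return a float, or None for non-numeric placeholders."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse 'csv,noheader,nounits' output into one dict per GPU."""
    rows = []
    for rec in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(rec) != len(FIELDS):
            continue  # skip blank or unexpected lines
        row = dict(zip(FIELDS, (cell.strip() for cell in rec)))
        row["index"] = int(row["index"])
        for key in FIELDS[2:]:
            row[key] = _num(row[key])
        rows.append(row)
    return rows


def query_gpus():
    """Run nvidia-smi and return the parsed rows (hypothetical helper)."""
    cmd = ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
           "--format=csv,noheader,nounits"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)


# Sample text shaped like the two-1050 machine above; the numbers are
# copied from that table (utilization %, temperature C, memory in MiB).
SAMPLE = ("0, GeForce GTX 1050, 99, 83, 67, 2000\n"
          "1, GeForce GTX 1050, 99, 70, 67, 2000\n")

for gpu in parse_gpu_csv(SAMPLE):
    print(gpu["index"], gpu["name"], gpu["utilization.gpu"])
```

Keeping the parser separate from the subprocess call means it can be tried on pasted text from a machine without an NVIDIA driver.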
#70 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴·3·163 Posts |
Note that in chalsall's nvidia-smi output the gpu load is 99%, not 100%, even though nvidia-smi is capable of displaying 100%.
On a 3-disparate-gpu system, Win7 x64: Code:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 378.66 Driver Version: 378.66 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1070 WDDM | 0000:03:00.0 Off | N/A |
| 89% 85C P2 119W / 158W | 345MiB / 8192MiB | 98% Default |
+-------------------------------+----------------------+----------------------+
| 1 Quadro 2000 WDDM | 0000:1C:00.0 Off | N/A |
|100% 91C P0 N/A / N/A | 87MiB / 1024MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 105... WDDM | 0000:28:00.0 Off | N/A |
| 40% 67C P0 65W / 75W | 304MiB / 4096MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 5888 C ...uments\gtx1070-mfaktc\2\mfaktc-win-64.exe N/A |
| 0 11748 C ...ocuments\gtx1070-mfaktc\mfaktc-win-64.exe N/A |
| 1 9908 C ...ments\mfaktc-quadro2000\mfaktc-win-64.exe N/A |
| 2 9884 C ...DALucas2.06beta-CUDA6.5-Windows-WIN32.exe N/A |
+-----------------------------------------------------------------------------+
Code:
Mon Jan 07 10:57:38 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 378.66 Driver Version: 378.66 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1070 WDDM | 0000:03:00.0 Off | N/A |
| 89% 85C P2 113W / 158W | 226MiB / 8192MiB | 95% Default |
+-------------------------------+----------------------+----------------------+
| 1 Quadro 2000 WDDM | 0000:1C:00.0 Off | N/A |
|100% 92C P0 N/A / N/A | 87MiB / 1024MiB | 99% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 105... WDDM | 0000:28:00.0 Off | N/A |
| 41% 68C P0 65W / 75W | 304MiB / 4096MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 11748 C ...ocuments\gtx1070-mfaktc\mfaktc-win-64.exe N/A |
| 1 9908 C ...ments\mfaktc-quadro2000\mfaktc-win-64.exe N/A |
| 2 9884 C ...DALucas2.06beta-CUDA6.5-Windows-WIN32.exe N/A |
+-----------------------------------------------------------------------------+
Another hypothesis about the performance is that, in doing a class, the last batch of thread blocks may not fully occupy the gpu, and so temporarily underutilizes it for the duration of that batch's run. I remember reading a post somewhere about that. Running multiple instances may reduce the extent and impact of that brief underutilization. More classes would make that occurrence more frequent, fewer classes less frequent.
Last fiddled with by kriesel on 2019-01-07 at 17:18 |
#71 | |
|
"/X\(‘-‘)/X\"
Jan 2013
https://pedan.tech/
2⁴·199 Posts |
Quote:
1. The cost of electricity over two or three years?
2. The cost of providing a PCIe slot and power?
The 2080 Ti is probably the best value once those are taken into account. |
#72 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴·3·163 Posts |
Less classes, one instance, 92M, peregrine laptop, Win10 x64
Code:
Mon Jan 07 11:20:20 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 398.36 Driver Version: 398.36 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 105... WDDM | 00000000:01:00.0 Off | N/A |
| N/A 84C P0 N/A / N/A | 137MiB / 4096MiB | 94% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 49772 C ...0ti\mfaktc-win-64.LessClasses-CUDA8.exe N/A |
+-----------------------------------------------------------------------------+
Code:
Mon Jan 07 11:46:48 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 398.36 Driver Version: 398.36 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 105... WDDM | 00000000:01:00.0 Off | N/A |
| N/A 86C P0 N/A / N/A | 197MiB / 4096MiB | 98% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 49772 C ...0ti\mfaktc-win-64.LessClasses-CUDA8.exe N/A |
| 0 54744 C ...i\2\mfaktc-win-64.LessClasses-CUDA8.exe N/A |
+-----------------------------------------------------------------------------+
More-classes, one instance: 304 GHzD/day, 98% load.
One more-classes instance at 162.2 plus one less-classes instance at 147.6: 309.8 combined, 98% load.
Last fiddled with by kriesel on 2019-01-07 at 18:53 |
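As a worked check of the comparison in this post, the arithmetic can be written out using only the throughput figures quoted there (GHzD/day). The variable names are just for illustration.

```python
# Figures quoted in the post, in GHzD/day.
single = 304.0           # one more-classes instance running alone
pair = [162.2, 147.6]    # one more-classes + one less-classes instance together

combined = sum(pair)                 # total throughput of the two instances
gain = combined - single             # absolute gain over the single instance
gain_pct = 100.0 * gain / single     # relative gain in percent

print(f"combined {combined:.1f}, gain {gain:.1f} GHzD/day ({gain_pct:.1f}%)")
```

So on these numbers the second instance is worth about 5.8 GHzD/day, a bit under 2% of the single-instance rate.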
#73 | |
|
If I May
"Chris Halsall"
Sep 2002
Barbados
2×11²×47 Posts |
Quote:
Could it be an OS issue? You're running WinBlows (sorry, couldn't resist...) and I'm running a "headless" server-class Linux.
Frankly, it is not worth my time to squeeze ~1% more out of my kit if it takes ongoing human cycles.... |
#74 | ||
|
Sep 2003
2·5·7·37 Posts |
Quote:
So, from the programmer's guide: Quote:
#75 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴·3·163 Posts |
Quote:
Context switching is one type of overhead, but apparently not the whole picture.
Consider one task running. It will spend time in non-compute phases: loading data and CUDA code, and transferring results. It seems like something else could be using the compute cores then.
Consider TF with a number of thread blocks that is more than, but not an exact multiple of, the particular gpu's core count. How fully utilized is the gpu hardware during the last, "runt" subset for a TF class? Simple example: 10 parallel tasks of equal length on 8 processors; 8 run, then 2; average utilization is 10/16 (less if allowing for setup time in series). With 100 tasks: 12 sets of 8, then 4; utilization < 100/(13*8). And I think there's no guarantee the tasks take the same time. Some may wait for the slowest one to finish.
There was an online conversation related to this by R Gerbicz and The Judger, I think, recently. I may be misremembermangling some of the vaguely recalled details.
Edit: some good background starts around post 2995 and goes to 3020 in the mfaktc thread. https://www.mersenneforum.org/showth...12827&page=273 And I see you were an active participant in that, contributing content that was useful to me, thanks!
Last fiddled with by kriesel on 2019-01-07 at 19:59 |
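The "runt" arithmetic in this post can be sketched in a few lines. This is an idealized model, not a description of how any particular gpu schedules thread blocks: it assumes equal-length tasks that run in full waves, no setup time, and a last wave that may be only partly filled. The function name `wave_utilization` is made up for illustration.

```python
import math


def wave_utilization(tasks, processors):
    """Fraction of processor time slots doing useful work when equal-length
    tasks run in waves of `processors` and the last wave may be a runt."""
    if tasks <= 0 or processors <= 0:
        raise ValueError("tasks and processors must be positive")
    waves = math.ceil(tasks / processors)   # full waves plus any runt wave
    return tasks / (waves * processors)     # used slots / available slots


# The two examples from the post: 8 processors, 10 tasks and then 100 tasks.
for tasks in (10, 100):
    waves = math.ceil(tasks / 8)
    print(f"{tasks} tasks: {waves} waves, utilization {wave_utilization(tasks, 8):.3f}")
```

The 10-task case gives 10/16 and the 100-task case gives 100/(13*8), matching the figures in the post. Setup time and unequal task lengths would only push real utilization lower than this upper bound.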
#76 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴·3·163 Posts |
Quote:
Display duties of the gpus here are minimal, since nearly all systems are being run by remote desktop access, which puts a small load on the cpu, not the gpu. The peregrine laptop is further configured to do normal local display with the UHD630 igp, not the GTX 1050 Ti it contains. So, some bases covered.
It's very understandable to not sweat 1% of a 1050 (~ -2.1 GHzD/day TF difference); that is definitely a case of small potential gains. It's less understandable when the underutilization is 5 to 10% of a GTX 1070 or 1080 Ti, or 5% of an RTX 2080 as it was for another poster, which is more than half of the throughput of a GTX 1050.
I assume you're running the more-classes executable. That vs. less-classes (which I had been running for reduced console output volume) could account for a lot of difference.
Lots of points of difference: OS, gpu model, driver level, possibly app classes count, degree of "headlessness", exponent, TF level, ?...
Last fiddled with by kriesel on 2019-01-07 at 19:45 |
#77 | ||
|
If I May
"Chris Halsall"
Sep 2002
Barbados
2C6E₁₆ Posts |
Quote:
If you're not able / willing to "dual boot" between WinDoze and Linux, perhaps others are, in order to get some heuristics from a "bare metal" perspective.
All of my GPUs (admittedly, a small sample set of slower GPUs; even those I sometimes rent from Amazon, Google and M$) don't even have a display connected; they're just for "compute". And they always run Linux. And nvidia-smi always reports them at ~99% utilization.
Quote:
Last fiddled with by chalsall on 2019-01-07 at 20:14 Reason: s/by nividia-smi/by nvidia-smi/; |