Old 2019-01-07, 20:58   #78
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴×3×163 Posts

GTX 1050 Ti
Quote:
Originally Posted by kriesel View Post
Less classes, one instance, 92M, peregrine laptop, Win10 x64

Code:
Mon Jan 07 11:20:20 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 398.36                 Driver Version: 398.36                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105... WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   84C    P0    N/A /  N/A |    137MiB /  4096MiB |     94%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     49772      C   ...0ti\mfaktc-win-64.LessClasses-CUDA8.exe N/A      |
+-----------------------------------------------------------------------------+
vs.
Code:
Mon Jan 07 11:46:48 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 398.36                 Driver Version: 398.36                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105... WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   86C    P0    N/A /  N/A |    197MiB /  4096MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     49772      C   ...0ti\mfaktc-win-64.LessClasses-CUDA8.exe N/A      |
|    0     54744      C   ...i\2\mfaktc-win-64.LessClasses-CUDA8.exe N/A      |
+-----------------------------------------------------------------------------+
Less classes, one instance: 300 GHzD/day, vs. two instances: 154 × 2 = 308.

More classes, one instance: 304 GHzD/day, 98% load.

One more-classes instance at 162.2 plus one less-classes instance at 147.6: 309.8 combined, 98% load.
More classes, two instances: 98% load, combined throughput 311 GHzD/day.
Testing done a while ago on a GTX 480 showed % idle declining roughly in proportion to 1/(instance count).
Old 2019-01-07, 21:37   #79
nomead
"Sam Laur"
Dec 2018
Turku, Finland
317 Posts

Quote:
Originally Posted by kriesel View Post
Compare the GHzD/day figure for one instance vs. the sum of the two-instance values. I assume there would come a point of diminishing or negative returns, akin to the throughput drop-off on a time-shared CPU known as thrashing. Well before that, the personal overhead of setting up and managing an additional instance becomes not worth it to the operator.
Fine. Let's do that again, now with more (and differently measured) data perhaps. And a lot of handwaving on top.

RTX 2080 pinned at 1725 MHz (nvidia-smi -lgc 1725,1725), quite far from thermal throttling (steady state around 67 °C) and also a safe distance from the set power limit, so that the clock stays constant throughout the tests. If you let the GPU decide the clock rate by just setting a power target, it will apparently vary quite a bit, possibly making any benchmarks invalid. I kept "nvidia-smi dmon -s pucvmt" running the whole time to monitor the status and watch for any thermal or power violations (none occurred). Also, since it matters for the short-exponent runs, everything ran off an NVMe SSD, not a hard disk.

First, two simultaneous instances of "less classes" mfaktc, each running 100 short-run exponents. Yes, utilization hits 100%. Total real wall-clock time: 3:36.53 for one instance, 3:36.86 for the other. Then, after that, one instance again running 100 short-run exponents. Utilization drops to 90%, since more time is wasted between the short classes even with the "less classes" version. It took 1:48.32, which is pretty much spot on half of the time spent when running two instances (doubled, it is 3:36.64). If running two instances had gained anything, the single-instance time should have come out slightly more than half.

Then another test with a longer run, a single exponent per instance from the same range (92255861 and 92255893, 72 to 73 bits), and "more classes" mfaktc. Both instances show about 1348 GHz-d/day in mfaktc, with very little variance in run times from class to class. Utilization is again 100%. Timers say 11:07.36 and 11:07.39. Then the single instance (92256029, 72 to 73 bits). Now mfaktc says 2819 GHz-d/day most of the time, with some classes taking 5, even 10 ms more to run, so for whatever reason it is not a steady, sustained performance. Utilization 94%. And the timer says 5:21.51, which is now quite a bit less than half of the two-instance time (doubled: 10:43.02). PrimeNet gives 10.3680 GHz-days of credit for each of those exponents. Now, if I calculate the actual GHz-d/day rate from those three times, I get 1342, 1342, and 2786.
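In case anyone wants to check the arithmetic, that last conversion is just the per-exponent PrimeNet credit divided by the wall-clock time. Here is a minimal C sketch of it, with the credit and the three timings above hard-coded purely for illustration:
Code:
/* GHz-d/day from PrimeNet credit (10.3680 GHz-days per exponent, 72 to 73
   bits) and measured wall-clock time, using the three timings quoted above. */
#include <stdio.h>

int main(void) {
    const double credit_ghz_days = 10.3680;
    /* 11:07.36 and 11:07.39 (two instances), 5:21.51 (one instance), in seconds */
    const double wall_seconds[] = { 667.36, 667.39, 321.51 };

    for (int i = 0; i < 3; i++) {
        double rate = credit_ghz_days * 86400.0 / wall_seconds[i];
        printf("%.0f GHz-d/day\n", rate);   /* prints 1342, 1342, 2786 */
    }
    return 0;
}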

So, in these test cases at least, for short runs it doesn't matter, and for longer ones, one instance is definitely faster than two.
Old 2019-01-07, 22:19   #80
nomead
"Sam Laur"
Dec 2018
Turku, Finland
317₁₀ Posts

Another quick test. All other things are the same as before, but this time I timed the longer runs with "less classes" mfaktc to see if it made a difference either way. And yes, sort of.

Two instances: 92256053 and 92256067, 72 to 73 bits, timed at 11:09.04 and 11:09.05. Both thus give 1339 GHz-d/day, calculated from the real time spent. For this case, there's not much difference vs. "more classes" mfaktc.

And one instance: 92256187, 72 to 73 bits, timed at 5:27.04 (doubled 10:54.08). This gives only 2739 GHz-d/day, so it's slower than with more classes, but the throughput is still higher than with two instances.
Old 2019-01-09, 04:05   #81
LaurV
Romulan Interpreter
"name field"
Jun 2011
Thailand
41×251 Posts

Quote:
Originally Posted by nomead View Post
RTX2080 pinned at 1725 MHz ...<snip>... Now mfaktc says 2819 GHz-d/day most of the time, with some classes taking 5, even 10 ms more to run. So for whatever reason, it is not a steady sustained performance. Utilization 94%. ...<snip>... So, in these test cases at least, ...<snip>... one instance is definitely faster than two.
You may be able to squeeze some more juice from that lemon, at that clock, playing with the settings in the ini file. Somewhere at 98% and ~2920 geez (and a lot more heat).
Old 2019-01-09, 06:15   #82
nomead
"Sam Laur"
Dec 2018
Turku, Finland
317 Posts

Quote:
Originally Posted by LaurV View Post
You may be able to squeeze some more juice from that lemon, at that clock, playing with the settings in the ini file. Somewhere at 98% and ~2920 geez (and a lot more heat).
Yes, of course. I fiddled with the settings quickly right at the beginning, but that was just a quick poke to see what worked and what didn't, not an exhaustive search for the best combination of options. At that time, the only thing that seemed to have a big effect was GPUSieveSize=128 (up from the default 64), but it would be better to test the combinations of the various parameters more exhaustively, and to look at actual total run times rather than just what mfaktc reports while running, for better timing accuracy. And of course to watch the power draw and GPU utilization figures while doing it. This will take a while, but I'll be back.
Old 2019-01-09, 12:15   #83
nomead
"Sam Laur"
Dec 2018
Turku, Finland
317 Posts

Phew. That really took some time. Anyway, I tested everything on the same exponent and the same bit depth, to keep things constant in that regard. First I started filling a table with values at GPUSieveProcessSize=8, increasing the sieve size; 128 was the best there. Then I increased the process size step by step, and to be sure I didn't miss something unexpected, at every step I also re-tested the top three sieve sizes. No change there: every time 128 was the best size. Process size 16 ended up slightly better than 24, but the difference is really marginal. (I had been running at 32 for some reason up to this point.) The result: 2884 GHz-d/day at 176 watts. An increase in clock speed (or running against the default power limit of 215 W) would of course produce even more throughput, but also more heat and somewhat lower performance per watt.
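For anyone wanting to start from the same point, the relevant mfaktc.ini lines would look roughly like the excerpt below. The parameter names and values are the ones discussed in this thread; treat it as an illustrative sketch rather than a verbatim copy of my file.
Code:
# Illustrative mfaktc.ini excerpt based on the tuning described above.
# GPUSieveSize: best value within the stock limit (the default is 64).
GPUSieveSize=128
# GPUSieveProcessSize: 16 was marginally better than 24 (I had been running 32).
GPUSieveProcessSize=16
# GPUSievePrimes: the default; both raising and lowering it cost performance.
GPUSievePrimes=82486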

What bothered me, though, was that no combination of those settings could push GPU utilization any higher than 94%. So I had to try out some potentially risky things, but in the end everything went well. I edited GPU_SIEVE_SIZE_MAX in params.h to 256, then 512, then 1024, and recompiled the program. Yes, it's below the large warning "DO NOT EDIT DEFINES BELOW THIS LINE UNLESS YOU REALLY KNOW WHAT YOU DO!", but since the comment on that parameter says "We've only tested up to 128M bits. The GPU sieve code may be able to go higher." I thought, well, let's give it a try. After each recompilation I ran the long self-test and everything worked fine. Of course it uses more GPU memory now, but even at the largest size there's still plenty left, and memory bandwidth usage stays at 1%. Every increase in sieve size brought a corresponding increase in performance, but of course, the further I got, the smaller the difference between steps. Diminishing returns. Finally, at 1024, the GPU utilization was at 99% and the per-class timings as reported by mfaktc itself stayed stable (at 128 they vary a bit from row to row, for some reason). And 3085 GHz-d/day, with just a few more watts consumed than at sieve size 128.
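The change itself amounts to bumping a single define in params.h and rebuilding. A sketch of what that looks like is below; the stock value of 128 mentioned in the comment is my recollection rather than a checked copy of the source, and the quoted sentence is the comment referred to above.
Code:
/* params.h (mfaktc) -- sketch of the edit described above, not a verbatim diff.
   Original comment: "We've only tested up to 128M bits. The GPU sieve code
   may be able to go higher."  The stock value is 128, to the best of my memory. */
#define GPU_SIEVE_SIZE_MAX 1024   /* raised step by step: 256, 512, then 1024 */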

Is there some risk of missing factors or something else, if the sieve size is increased like that? I mean, the difference is 200 GHz-d/day just from that one setting. Or is it just a matter of further tests needed, but nobody has done it? (Could I do it?)

I also tested if NumStreams had any effect on performance, but no, not really. Any difference is practically indistinguishable from noise and measurement uncertainty. The final check was to see if changing GPUSievePrimes had any effect. Well, yes, mostly negative ones. Going higher increased power consumption and noticeably decreased performance. Going lower perhaps decreased power consumption a bit, but also the performance went down a bit. So the default value of 82486 is spot on.

I've attached a printout of the timings I gathered.
Attached file: mfaktc-rtx2080.pdf (4.8 KB)
Old 2019-01-09, 13:57   #84
Xyzzy
Aug 2002
2×3²×13×37 Posts

Which 2xxx card is equivalent to the old 1080 Ti?

Old 2019-01-09, 14:05   #85
nomead
"Sam Laur"
Dec 2018
Turku, Finland
317 Posts

Quote:
Originally Posted by Xyzzy View Post
Which 2xxx card is equivalent to the old 1080 Ti?

Depends on what you're doing. In gaming, the RTX 2080 is supposed to be about the same as the GTX 1080 Ti (but it costs more). In CUDALucas benchmarks, however, even the RTX 2080 Ti is only slightly faster than the 1080 Ti. But in factoring, the GTX 1080 Ti lags behind even the RTX 2070...
Old 2019-01-09, 14:09   #86
ET_
Banned
"Luigi"
Aug 2002
Team Italia
5·7·139 Posts

Quote:
Originally Posted by nomead View Post
Depends on what you're doing. In gaming, the RTX 2080 is supposed to be about the same as the GTX 1080 Ti (but it costs more). In CUDALucas benchmarks, however, even the RTX 2080 Ti is only slightly faster than the 1080 Ti. But in factoring, the GTX 1080 Ti lags behind even the RTX 2070...
The RTX 2060 will be my next present. It should offer GTX 1070 [Ti] trial-factoring speed for 350 dollars.
Old 2019-01-09, 16:31   #87
ET_
Banned
"Luigi"
Aug 2002
Team Italia
5·7·139 Posts

Quote:
Originally Posted by nomead View Post
RTX 2060 announced at "$349". Based on the released specs it could have a better "bang for the buck" than either the 2070 or the 2080. It has 1920 CUDA cores, which is 17% fewer than the RTX 2070 and 35% fewer than the RTX 2080. Clock speeds are almost the same as on the 2070: 1365 MHz base and 1680 MHz boost. The RTX 2080 is clocked higher, though, so there the performance differential will likely be more than 35%. TDP is 160 watts, with 6 GB of GDDR6 on a 192-bit bus for 336 GB/s of bandwidth.

30% less price than the RTX 2070 for, let's say, 20% less performance?

And 50% less price than the RTX 2080 for maybe 40-45% less performance.

Again, this is speculation based purely on published specifications, not on any actual LL or TF benchmarks.
Did anybody test the new RTX 2060 on an Ubuntu 18.04 LTS PC?
Old 2019-01-09, 16:52   #88
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴·3·163 Posts

Have RTX 20xx cards become reliable yet (i.e., a reduced, low probability of needing a return for repair within days, weeks, or months)? The early going seemed rather dismal, with some users reporting 2-of-3 or even 2-of-2 early failures, including replacements failing.
