#78
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
GTX 1050 Ti
Quote:
Testing done a while ago on a GTX480 gave % idle declining ~ proportional to 1/(instance count)

#79
"Sam Laur"
Dec 2018
Turku, Finland
RTX 2080 pinned at 1725 MHz (nvidia-smi -lgc 1725,1725), quite far from thermal throttling (steady state around 67 C) and also a safe distance from the set power limit, so the clock stays constant throughout the tests. If you let the GPU decide the clock rate by just setting a power target, it will apparently vary quite a bit, possibly making any benchmarks invalid. I ran "nvidia-smi dmon -s pucvmt" the whole time to monitor the status and see whether any thermal or power violations occurred while running (none did). Also, since it matters for the short exponent runs, everything ran off an NVMe SSD, not a hard disk.

Two simultaneous instances of mfaktc, running 100 short run exponents, with "less classes" mfaktc. Yes, utilization hits 100%. Total real wall-clock time consumed: 3:36.53 on one instance, 3:36.86 on the other.

Then after that, one instance again running 100 short run exponents. Utilization is down to 90%, since there's more time wasted between short classes even with the "less classes" version. It took 1:48.32, which is pretty much spot on half the time spent when running two instances (doubled, it is 3:36.64). If we had gained anything, the single-instance time should have been slightly longer.

Then another test with a longer run, single exponent per instance from the same range (92255861 and 92255893, 72 to 73 bits) and "more classes" mfaktc. Both instances show about 1348 GHz-d/day in mfaktc, with very little variance in run times from class to class. Utilization is again 100%. Timers say 11:07.36 and 11:07.39.

Then the single instance (92256029, 72 to 73 bits). Now mfaktc says 2819 GHz-d/day most of the time, with some classes taking 5, even 10 ms more to run. So for whatever reason it is not a steady, sustained performance. Utilization 94%. And the timer says 5:21.51, which is now quite a bit less than half of the two-instance time (doubled: 10:43.02).

Primenet gives 10.3680 GHz-days of credit for those exponents. Now, if I calculate the actual GHz-d/day rate from those three times, I get 1342, 1342, and 2786. So, in these test cases at least, for short runs it doesn't matter, and for longer ones, one instance is definitely faster than two.
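(Those GHz-d/day figures are just the PrimeNet credit divided by the wall-clock time expressed in days. A standalone sketch of that arithmetic, using only the credit and timings quoted above; nothing here comes from mfaktc itself:)

[CODE]
/* Quick sketch: GHz-d/day = PrimeNet credit / (wall-clock time in days). */
#include <stdio.h>

static double ghzd_per_day(double credit_ghz_days, double wall_seconds)
{
    return credit_ghz_days / (wall_seconds / 86400.0);
}

int main(void)
{
    const double credit = 10.3680;                          /* credit per exponent, as quoted */
    printf("%.0f\n", ghzd_per_day(credit, 11*60 + 7.36));   /* ~1342: two instances, each     */
    printf("%.0f\n", ghzd_per_day(credit, 5*60 + 21.51));   /* ~2786: one instance            */
    return 0;
}
[/CODE]

Two instances at ~1342 each make ~2684 combined, so the single instance at ~2786 really does come out ahead on these long runs.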

#80
"Sam Laur"
Dec 2018
Turku, Finland
Another quick test. All other things are the same as before, but this time I timed the longer runs with "less classes" mfaktc to see if it made a difference either way. And yes, sort of.
Two instances: 92256053 and 92256067, 72 to 73 bits, timed at 11:09.04 and 11:09.05. Both thus give 1339 GHz-d/day, calculated from the real time spent. For this case, there's not much difference vs. "more classes" mfaktc.

And one instance: 92256187, 72 to 73 bits, timed at 5:27.04 (doubled 10:54.08). This gives only 2739 GHz-d/day, so it's slower than with more classes, but the throughput is still faster than two instances.

#81
Romulan Interpreter
"name field"
Jun 2011
Thailand
Quote:

#82
"Sam Laur"
Dec 2018
Turku, Finland
Yes, of course. I quickly fiddled with the settings right at the beginning but that was just a quick poke to see what worked and what didn't, not an exhaustive search of the best combination of options. At that time, the only thing that seemed to have a big effect was GPUSieveSize=128 (from the default 64), but it would be better to test more exhaustively the combinations of various parameters, and look at actual total run times, not just at what mfaktc says while running, for better timing accuracy. And of course follow the power draw and GPU utilization figures while doing it. This will take a while, but I'll be back.

#83
"Sam Laur"
Dec 2018
Turku, Finland
Phew. That really took some time. Anyway, I tested everything on the same exponent and the same bit depth to keep things constant in that regard. First I started filling a table with values at GPUSieveProcessSize=8 and increased the sieve size; 128 was the best there. Then I increased the process size step by step, and to be sure I didn't miss something unexpected, at every step I also tested the top three sieve sizes. No change there: every time, 128 was the best size. Process size 16 ended up being slightly better than 24, but the difference is really marginal (I had been running on 32 for some reason thus far). The best combination gave 2884 GHz-d/day at 176 watts. An increase in clock speed (or running against the default power limit of 215 W) would of course produce even more throughput, but also more heat and somewhat less performance per watt.
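In mfaktc.ini terms, the combination that came out on top here would look roughly like this (just a sketch of the two lines this sweep settled on, not a full file):

[CODE]
# mfaktc.ini excerpt -- only the settings this sweep settled on, as a sketch
# Best of the sieve sizes tried here (the default is 64)
GPUSieveSize=128
# Marginally ahead of 24, and of the 32 I had been running on so far
GPUSieveProcessSize=16
[/CODE]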
What bothered me, though, was that no combination of those settings could produce any higher GPU utilization than 94%. So I had to try out some potentially risky things, but in the end everything went well. I edited GPU_SIEVE_SIZE_MAX in params.h to 256, then 512, then 1024, and recompiled the program. Yes, it's below the large warning "DO NOT EDIT DEFINES BELOW THIS LINE UNLESS YOU REALLY KNOW WHAT YOU DO!", but since the comment on that parameter says "We've only tested up to 128M bits. The GPU sieve code may be able to go higher.", I thought, well, let's give it a try. After each recompilation I ran the long self test and everything worked fine. Of course it uses more GPU memory now, but even at the largest size there's still plenty left, and memory bandwidth usage stays at 1%.

Every increase in sieve size brought a corresponding increase in performance, but of course the further I got, the smaller the difference between steps. Diminishing returns. Finally, at 1024, GPU utilization was at 99% and per-class timings as reported by mfaktc itself stayed stable (at 128 they vary a bit from row to row, for some reason). And 3085 GHz-d/day with just a few more watts consumed than at sieve size 128. Is there some risk of missing factors or something else if the sieve size is increased like that? I mean, the difference is 200 GHz-d/day just from that one setting. Or is it just a matter of further tests being needed, but nobody has done them? (Could I do it?)

I also tested whether NumStreams had any effect on performance, but no, not really. Any difference is practically indistinguishable from noise and measurement uncertainty. The final check was to see if changing GPUSievePrimes had any effect. Well, yes, mostly negative ones. Going higher increased power consumption and noticeably decreased performance. Going lower perhaps decreased power consumption a bit, but performance went down a bit too. So the default value of 82486 is spot on.

I've attached a printout of the timings I gathered.
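In source terms, the change is just this one define in params.h (a sketch of what I edited; the stock value is 128, matching the "tested up to 128M bits" comment, and 1024 is simply the largest value I pushed it to):

[CODE]
/* params.h -- the edit described above, below the big "DO NOT EDIT" warning.
   Stock value is 128; raising it is exactly the untested territory discussed here. */
#define GPU_SIEVE_SIZE_MAX 1024
[/CODE]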

#84
Aug 2002
Which 2xxx card is equivalent to the old 1080 Ti?

#85
"Sam Laur"
Dec 2018
Turku, Finland
Depends on what you're doing. In gaming, the RTX 2080 is supposed to be about the same as the GTX 1080 Ti (but costs more). In CUDALucas benchmarks, however, even the RTX 2080 Ti is only slightly faster than the 1080 Ti. But in factoring, the GTX 1080 Ti lags behind even the 2070...

#86
"Luigi"
Aug 2002
Team Italia
Quote:
It should offer GTX 1070 [Ti] speed for trial factoring for 350 dollars.

#87
"Luigi"
Aug 2002
Team Italia
Quote:

#88
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
Have RTX 20xx cards become reliable yet? (That is, a reduced, low probability of a return for repair within days, weeks, or months.) The initial going seemed sort of dismal, with some users indicating 2 of 3 or 2 of 2 early failures, including replacements failing.
Last fiddled with by kriesel on 2019-01-09 at 16:52