mersenneforum.org > GPU Computing
2019-01-07, 08:53   #56
nomead ("Sam Laur")

RTX 2060 is out

RTX 2060 announced at "$349". Based on the released specs, it could offer better "bang for the buck" than either the 2070 or the 2080. It has 1920 CUDA cores, 17% fewer than the RTX 2070 and 35% fewer than the RTX 2080. Clock speeds are almost the same as the 2070's: 1365 MHz base and 1680 MHz boost. The RTX 2080 is clocked higher, though, so its performance lead will likely be more than 35%. TDP is 160 W, with 6 GB of GDDR6 on a 192-bit bus for 336 GB/s of bandwidth.

About 30% lower price than the RTX 2070 for, let's say, 20% less performance?

And 50% lower price than the RTX 2080 for maybe 40-45% less performance.

Again, this is speculation based purely on published specifications, not on any actual LL or TF benchmarks.
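
Roughly, in Python, using only the published reference specs and (from memory) the non-FE launch list prices; the cores-times-clock "performance" number is just a naive stand-in, not an LL or TF benchmark:
Code:
# Back-of-the-envelope price/performance for the RTX 2060 vs 2070/2080.
# Specs are the published reference figures; prices are the non-FE launch
# list prices (assumed from memory). "perf" is simply cores * boost clock,
# which is only a crude proxy for real LL/TF throughput.

cards = {
    #            cores  boost_MHz  launch_USD
    "RTX 2060": (1920,  1680,      349),
    "RTX 2070": (2304,  1620,      499),
    "RTX 2080": (2944,  1710,      699),
}

c0, f0, p0 = cards["RTX 2060"]
perf0 = c0 * f0

for name, (cores, mhz, usd) in cards.items():
    perf = cores * mhz
    print(f"{name}: {perf / perf0:.2f}x naive perf, {usd / p0:.2f}x price, "
          f"{(perf / usd) / (perf0 / p0):.2f}x perf per dollar")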
2019-01-07, 10:06   #57
mackerel

Compared to the 2070, that's 25% less RAM bandwidth and 13-18% lower rated FLOPS, depending on whether you compare against the FE card or not. Also, the announced 2060 model is the FE, so cheaper models should appear later; they may also have lower rated FLOPS from lower clocks.

How sensitive are we to RAM bandwidth on GPUs? Not an area I've looked at in detail.
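
For what it's worth, the 13-18% range falls out of the usual rated-FLOPS formula. A small Python sketch; the reference and FE boost clocks are assumed from the published spec sheets:
Code:
# Rated FP32 FLOPS = 2 * CUDA cores * boost clock.
# Boost clocks assumed from the published spec sheets:
# RTX 2060 (FE) 1680 MHz, RTX 2070 reference 1620 MHz, RTX 2070 FE 1710 MHz.

def fp32_tflops(cores, boost_mhz):
    return 2 * cores * boost_mhz * 1e6 / 1e12

r2060     = fp32_tflops(1920, 1680)
r2070_ref = fp32_tflops(2304, 1620)
r2070_fe  = fp32_tflops(2304, 1710)

print(f"RTX 2060     : {r2060:.2f} TFLOPS")
print(f"vs 2070 (ref): {100 * (1 - r2060 / r2070_ref):.1f}% lower")  # ~13.6%
print(f"vs 2070 (FE) : {100 * (1 - r2060 / r2070_fe):.1f}% lower")   # ~18.1%
# Memory bandwidth: 336 GB/s vs 448 GB/s, i.e. 25% less.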
2019-01-07, 11:56   #58
nomead ("Sam Laur")

On RTX cards, DP performance is the limiting factor for anything FFT-intensive. For example, on my RTX 2080 running CUDALucas at currently relevant FFT sizes, memory bandwidth usage seems to be around 45-50% depending on the actual FFT, while GPU (SM) usage shows 100%. Volta (as in the Titan V) is the exception, since it has many more double-precision units per streaming multiprocessor, and it is limited by memory bandwidth instead.

Factoring, on the other hand, doesn't seem to use much memory bandwidth at all. Running mfaktc shows about 95% GPU usage and 1% memory bandwidth usage.
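
A rough roofline-style way to see the same split, in Python; the FP64 rates and bandwidths are approximate published peaks, and the flops-per-byte figure for an FFT iteration is only a ballpark assumption:
Code:
# Machine balance = peak FP64 rate / memory bandwidth: the number of DP flops
# a kernel must do per byte moved before it becomes compute-bound rather than
# bandwidth-bound. Peak figures below are approximate published numbers.

cards = {
    #             FP64 GFLOPS   GB/s
    "RTX 2080":  (    315,      448),   # Turing: FP64 is 1/32 of FP32
    "Titan V":   (   7000,      653),   # Volta:  FP64 is 1/2  of FP32
}

for name, (fp64, bw) in cards.items():
    print(f"{name}: compute-bound above ~{fp64 / bw:.1f} DP flop/byte")

# An FFT-based LL squaring spends a few DP flops per byte of data it touches
# (a ballpark assumption), well above the RTX 2080's ~0.7 flop/byte balance
# point (so it is DP-limited) but below the Titan V's ~10 flop/byte point
# (so the Titan V is bandwidth-limited instead).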
2019-01-07, 11:59   #59
SELROC

Quote:
Originally Posted by nomead View Post
On RTX cards, DP performance is the limiting factor for anything FFT-intensive. For example, on my RTX 2080 running CUDALucas at currently relevant FFT sizes, memory bandwidth usage seems to be around 45-50% depending on the actual FFT, while GPU (SM) usage shows 100%. Volta (as in the Titan V) is the exception, since it has many more double-precision units per streaming multiprocessor, and it is limited by memory bandwidth instead.

Factoring, on the other hand, doesn't seem to use much memory bandwidth at all. Running mfaktc shows about 95% GPU usage and 1% memory bandwidth usage.

Trial factoring with mfakto does not show any difference when varying the GPU memory clock.
2019-01-07, 14:00   #60
kriesel ("TF79LL86GIMPS96gpu17")

Quote:
Originally Posted by nomead View Post
Factoring, on the other hand, doesn't seem to use much memory bandwidth at all. Running mfaktc shows about 95% GPU usage and 1% memory bandwidth usage.
Is that with a single instance of mfaktc? It's long been known that better utilization can be reached with more than one instance running on the same gpu at the same time.
2019-01-07, 14:23   #61
nomead ("Sam Laur")

Quote:
Originally Posted by kriesel View Post
Is that with a single instance of mfaktc? It's long been known that better utilization can be reached with more than one instance running on the same gpu at the same time.
Thanks for the info! Yes, it's a single instance. I'll have to try that right away.
2019-01-07, 14:41   #62
nomead ("Sam Laur")

Hmmm. With two instances of mfaktc, SM utilization indeed goes to 100%. Memory bandwidth usage is about 3-4%. I also see a whole lot more PCIe traffic (from about 1 MB/s with one instance to about 15-20 MB/s, varying all the time, with two instances). But when I add together the GHz-d/day figures as measured by mfaktc, the total for the two instances is less than the figure for a single instance running alone. So while utilization goes up, does it (or should it) actually help throughput?
2019-01-07, 15:04   #63
chalsall ("Chris Halsall")

Quote:
Originally Posted by nomead View Post
So while utilization goes up, does it (or should it) actually help throughput?
My understanding is that running two mfaktc instances /used/ to increase net throughput when the sieving was still done on the CPU. This is no longer true (at least on the three (low-end) GPUs I run).
2019-01-07, 15:53   #64
kriesel ("TF79LL86GIMPS96gpu17")

Quote:
Originally Posted by chalsall View Post
My understanding is that running two mfaktc instances /used/ to increase net throughput when the sieving was still done on the CPU. This is no longer true (at least on the three (low-end) GPUs I run).
Having run mfaktc at various times on a deliberately purchased variety of GPUs, from the beastly slow, low-memory but seemingly immortal Quadro 2000 up to a GTX 1080 Ti, I find that the faster the GPU, the more a single instance fails to saturate the device. A GTX 1080 Ti runs around 90% GPU load with one instance; the idle remainder is more compute than an entire Quadro 2000. All of these are doing GPU sieving, as far as I know. The only appreciable CPU usage I see in mfaktc/mfakto is mfakto on an IGP (UHD 630, one instance on an i7-8750H, around 3% CPU, with GPU sieving on).

I speculate that for the discrete GPUs there's a brief wait while each class's completion is written to screen or log file, or while the checkpoint file is written. Note that all these instances run from folders on rotating HDs, not SSD or ramdisk, except for the UHD 630 and a GTX 1050 Ti, which share an SSD. Both nvidia-smi and GPU-Z show that GTX 1050 Ti at 95% GPU load with one instance and 0.1% CPU usage (on Win 10; it wouldn't show up in the Win 7 Task Manager with its 1.0% resolution). I speculate it's waiting briefly on CPU activity in the application. This may be testable by running a Less-Classes version instead. Its GPU sieving is on:
Code:
# Default: SieveOnGPU=1

SieveOnGPU=1


# GPUSievePrimes defines how far we sieve the factor candidates on the GPU.
# The first <GPUSievePrimes> primes are sieved.
#
# Minimum: GPUSievePrimes=0
# Maximum: GPUSievePrimes=1075000
#
# Default: GPUSievePrimes=82486

GPUSievePrimes=82486


# GPUSieveSize defines how big of a GPU sieve we use (in M bits).
#
# Minimum: GPUSieveSize=4
# Maximum: GPUSieveSize=128
#
# Default: GPUSieveSize=64

GPUSieveSize=64


# GPUSieveProcessSize defines how many bits of the sieve each TF block
# processes (in K bits). Larger values may lead to less wasted cycles by
# reducing the number of times all threads in a warp are not TFing a
# candidate.  However, more shared memory is used which may reduce occupancy.
# Smaller values should lead to a more responsive system (each kernel takes
# less time to execute). GPUSieveProcessSize must be a multiple of 8.
#
# Minimum: GPUSieveProcessSize=8
# Maximum: GPUSieveProcessSize=32
#
# Default: GPUSieveProcessSize=16

GPUSieveProcessSize=16
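
For a sense of scale, those defaults imply roughly the following (a small Python sketch, taking the Mbit/Kbit units in the comments above as binary mega/kilobits, which is an assumption):
Code:
# What the default GPU-sieve settings imply, reading the parameter
# descriptions above literally (Mbits and Kbits taken as binary units).

GPU_SIEVE_SIZE_MBITS = 64      # GPUSieveSize=64
GPU_SIEVE_PROCESS_KBITS = 16   # GPUSieveProcessSize=16

sieve_bits = GPU_SIEVE_SIZE_MBITS * 1024 * 1024
sieve_bytes = sieve_bits // 8
tf_blocks = sieve_bits // (GPU_SIEVE_PROCESS_KBITS * 1024)

print(f"sieve buffer: {sieve_bytes // 2**20} MiB")   # 8 MiB
print(f"TF blocks per sieve pass: {tf_blocks}")      # 4096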
The effect on total throughput, as measured by the sum of GHz-days/day over the same sample time window, is diminished if the GPU is thermally throttling. The effect is progressively lessened with more instances, per old testing on a GTX 480; see the attachment at

https://www.mersenneforum.org/showpo...76&postcount=4
nvidia-smi has resolution 1.0%.
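
(For logging those figures over time, a minimal Python sketch; it assumes a reasonably recent nvidia-smi with --query-gpu support and a single GPU, and the 1.0% resolution still applies.)
Code:
# Sample GPU and memory-controller utilization once per second via nvidia-smi.
# Assumes nvidia-smi is on the PATH and a single GPU (one output line).
import subprocess
import time

QUERY = ["nvidia-smi",
         "--query-gpu=utilization.gpu,utilization.memory",
         "--format=csv,noheader,nounits"]

for _ in range(10):                                   # ten one-second samples
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    gpu_util, mem_util = out.stdout.strip().split(", ")
    print(f"GPU load {gpu_util}%  memory controller {mem_util}%")
    time.sleep(1)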

Windows Task Manager shows the IGP load from mfakto fluctuating, with a maximum of 100.0% but periodically a variety of lower figures.

(Anticipating the possibility of Linux partisans' gratuitous Win-slam in 5, 4, 3, ... ;)

2019-01-07, 15:54   #65
kriesel ("TF79LL86GIMPS96gpu17")

Quote:
Originally Posted by nomead View Post
So while utilization goes up, does it (or should it) actually help throughput?
Compare the GHz-d/day figure for one instance against the sum of the values for two instances. I assume there would come a point of diminishing or negative returns, akin to the throughput dropoff on a timeshared CPU known as thrashing. Well before that, the personal overhead of setting up and managing an additional instance stops being worth it to the operator.

2019-01-07, 16:07   #66
kriesel ("TF79LL86GIMPS96gpu17")

Quote:
Originally Posted by chalsall View Post
My understanding is running two mfaktc instances /used/ to increase net throughput when the sieving was still done on the CPU. This is no longer true (at least on the three (low-end) GPUs I run).
Which GPU model numbers? (And that's on Linux, I expect.) And hasn't GPU sieving (on/enabled/1) been the default in the distributed ini files for quite a while now?
