#56
"Sam Laur"
Dec 2018
Turku, Finland
100111101₂ Posts
RTX 2060 announced at $349. Based on the released specs it could offer better bang for the buck than either the 2070 or the 2080. It has 1920 CUDA cores, 17% fewer than the RTX 2070 and 35% fewer than the RTX 2080. Clock speeds are almost the same as on the 2070: 1365 MHz base and 1680 MHz boost. The RTX 2080 is clocked higher, though, so there the performance gap will likely exceed 35%. TDP is 160 W, with 6 GB of GDDR6 on a 192-bit bus for 336 GB/s of bandwidth.

30% lower price than the RTX 2070 for, let's say, 20% less performance? And 50% lower price than the RTX 2080 for maybe 40-45% less performance. Again, this is speculation based purely on published specifications, not on running any actual LL or TF benchmarks.
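The back-of-the-envelope comparison above can be sketched numerically. A crude perf proxy of cores × boost clock stands in for real LL/TF benchmarks; the 2070/2080 prices and boost clocks below are assumed reference figures, not from the post:

```python
# Rough price/performance comparison from published specs only.
# Perf proxy = CUDA cores x boost clock, a crude stand-in for benchmarks.
# 2070/2080 prices and boost clocks are assumed reference values.
cards = {
    "RTX 2060": {"cores": 1920, "boost_mhz": 1680, "price": 349},
    "RTX 2070": {"cores": 2304, "boost_mhz": 1620, "price": 499},  # assumed
    "RTX 2080": {"cores": 2944, "boost_mhz": 1710, "price": 699},  # assumed
}

base = cards["RTX 2060"]
base_perf = base["cores"] * base["boost_mhz"]
for name, c in cards.items():
    rel_perf = c["cores"] * c["boost_mhz"] / base_perf
    rel_price = c["price"] / base["price"]
    print(f"{name}: {rel_perf:.2f}x perf proxy at {rel_price:.2f}x price")
```

On these assumptions the 2070 comes out around 1.16x the perf proxy at 1.43x the price, roughly in line with the speculation above.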
#57
Feb 2016
UK
7008 Posts
Compared to the 2070: 25% less RAM bandwidth, and 13 to 18% lower rated FLOPS depending on whether it's the FE card or not. Also, the announced 2060 model is the FE, so cheaper models will eventually follow; they may also have lower rated FLOPS from lower clocks.

How sensitive are we to RAM bandwidth on GPUs? Not an area I've looked at in detail.
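The 25% bandwidth gap follows directly from the bus widths: GDDR6 bandwidth is bus width times per-pin data rate. The 2070's 256-bit bus is assumed here (the post only gives the 2060's 192-bit figure):

```python
# GDDR6 memory bandwidth in GB/s = bus width (bits) / 8 * data rate (Gbps/pin).
def bandwidth_gb_s(bus_bits, gbps_per_pin):
    return bus_bits / 8 * gbps_per_pin

rtx2060 = bandwidth_gb_s(192, 14)  # 336 GB/s, from the post
rtx2070 = bandwidth_gb_s(256, 14)  # 448 GB/s, assumed 256-bit bus
print(f"2060/2070 bandwidth ratio: {rtx2060 / rtx2070:.2f}")  # 0.75
```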
#58
"Sam Laur"
Dec 2018
Turku, Finland
475₈ Posts
On RTX cards, when doing anything FFT-intensive, DP performance is the limiting factor. For example, on my RTX 2080, in CUDALucas at currently relevant FFT sizes, memory bandwidth usage is around 45-50% depending on the actual FFT, while GPU (SM) utilization shows 100%. Volta (as in the Titan V) is the exception, since it has many more double-precision units per streaming multiprocessor, and it is limited by memory bandwidth instead.

Factoring, on the other hand, doesn't seem to use much memory bandwidth at all. Running mfaktc shows about 95% GPU usage and 1% memory bandwidth usage.
#59
2604₁₆ Posts
Quote:
Trial factoring with mfakto does not show any difference by varying gpu memory clock....
#60
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴×3×163 Posts
Is that with a single instance of mfaktc? It's long been known that better utilization can be reached with more than one instance running on the same gpu at the same time.
#61
"Sam Laur"
Dec 2018
Turku, Finland
475₈ Posts
#62 |
|
"Sam Laur"
Dec 2018
Turku, Finland
13D16 Posts |
Hmmm. With two instances of mfaktc, SM utilization indeed goes to 100%. Memory bandwidth usage is about 3-4%. I also see a whole lot more PCIe traffic (from about 1 MB/s with one instance to a constantly varying 15-20 MB/s with two). But when I add together the GHz-d/day figures reported by mfaktc, the sum of the two instances is less than a single instance running alone. So while utilization goes up, does it (or should it) actually help throughput?
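The throughput question above is just a sum check. A minimal sketch with hypothetical GHz-d/day readings (the actual figures weren't posted):

```python
# Hypothetical mfaktc throughput readings in GHz-d/day; the real values
# were not given in the post, these are illustrative only.
single_instance = 900.0           # one instance, alone on the GPU
two_instances = [430.0, 420.0]    # each of two concurrent instances

combined = sum(two_instances)
print(f"combined: {combined:.0f} vs single: {single_instance:.0f} GHz-d/day")
# Higher SM utilization only pays off if combined > single_instance.
print("net gain" if combined > single_instance else "net loss")
```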
#63
If I May
"Chris Halsall"
Sep 2002
Barbados
2C6E₁₆ Posts
My understanding is that running two mfaktc instances /used/ to increase net throughput, back when the sieving was still done on the CPU. This is no longer true (at least on the three (low-end) GPUs I run).
#64
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
1E90₁₆ Posts
Quote:
Code:
# Default: SieveOnGPU=1
SieveOnGPU=1

# GPUSievePrimes defines how far we sieve the factor candidates on the GPU.
# The first <GPUSievePrimes> primes are sieved.
#
# Minimum: GPUSievePrimes=0
# Maximum: GPUSievePrimes=1075000
#
# Default: GPUSievePrimes=82486
GPUSievePrimes=82486

# GPUSieveSize defines how big of a GPU sieve we use (in M bits).
#
# Minimum: GPUSieveSize=4
# Maximum: GPUSieveSize=128
#
# Default: GPUSieveSize=64
GPUSieveSize=64

# GPUSieveProcessSize defines how many bits of the sieve each TF block
# processes (in K bits). Larger values may lead to less wasted cycles by
# reducing the number of times all threads in a warp are not TFing a
# candidate. However, more shared memory is used which may reduce occupancy.
# Smaller values should lead to a more responsive system (each kernel takes
# less time to execute). GPUSieveProcessSize must be a multiple of 8.
#
# Minimum: GPUSieveProcessSize=8
# Maximum: GPUSieveProcessSize=32
#
# Default: GPUSieveProcessSize=16
GPUSieveProcessSize=16

The effect is progressively lessened with more instances, per old testing on a GTX480. See the attachment at https://www.mersenneforum.org/showpo...76&postcount=4

nvidia-smi has a resolution of 1.0%. Windows Task Manager shows the igp load from mfakto fluctuating, with a maximum of 100.0% but periodically a variety of lower figures. (Anticipating the possibility of linux-partisans' gratuitous Win-slam in 5, 4, 3, ... ;)

Last fiddled with by kriesel on 2019-01-07 at 16:26
#65
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴×3×163 Posts
Compare the GHz-d/day figure for one instance vs. the sum of the two-instance values. I assume there would come a point of diminishing or negative returns, akin to the throughput dropoff on a timeshared CPU called thrashing. Well before that, the personal overhead of setting up and managing an additional instance becomes not worth it to the operator.

Last fiddled with by kriesel on 2019-01-07 at 15:58
#66
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2⁴·3·163 Posts
Which GPU model #s? (And that's on Linux, I expect.) And hasn't GPU sieving (on/enabled/1) been the default in the distributed ini files for quite a while now?

Last fiddled with by kriesel on 2019-01-07 at 16:22