#188

"Carl Darby"
Oct 2012
Spring Mountains, Nevada
3²·5·7 Posts
One Titan can do an LL iteration with a 4M FFT in about 2.75 ms. 250 Gb/s of communication between the devices would be just enough for two Titans to do 4M-FFT iterations in 2 ms. With more devices the situation gets worse, approaching 500 Gb/s for an infinite number of devices.

Stage 2 of P-1, on the other hand, would benefit very nicely.

Last fiddled with by owftheevil on 2013-09-26 at 10:30
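The scaling claim above can be reproduced with a small sketch. The model is my reconstruction, not stated in the post: per iteration each device must exchange a fraction (N-1)/N of a fixed data volume over the link within the fixed 2 ms target, calibrated so the infinite-device limit is the post's 500 figure; N = 2 then gives exactly the 250 figure.

```python
# Sketch of the link-bandwidth scaling described above (model assumed,
# not taken from the post). Units are whatever the post's "Gb/s" means.

ASYMPTOTIC_BW = 500.0  # bandwidth needed in the infinite-device limit

def required_link_bw(n_devices):
    """Link bandwidth needed so n_devices hit the 2 ms iteration target,
    assuming each device exchanges (N-1)/N of a fixed per-iteration volume."""
    return ASYMPTOTIC_BW * (n_devices - 1) / n_devices

print(required_link_bw(2))  # 250.0 -- matches the two-Titan figure
print(required_link_bw(8))  # 437.5 -- worsens toward 500 as N grows
```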
#189

Mar 2010
3·137 Posts

Say there's a hypothetical GTX Titan X2, which has two GK110 GPUs at lower clocks but the same 2688 shaders per GPU. Would it perform better than two GTX Titans, from a theoretical-throughput point of view?

Last fiddled with by Karl M Johnson on 2013-09-26 at 18:34  Reason: yes
#190

"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts
Probably not. As always, it depends mostly on the latency and speed of the "bridge", and I'm not sure whether internal SLI is any different.
#191

"Carl Darby"
Oct 2012
Spring Mountains, Nevada
3²·5·7 Posts
Memory bandwidth would still be the limiting factor. We are almost up to that limit now with a single processor. The normalization and pointwise multiplication kernels could be split without increasing memory transfers, but they are only about 15% of the iteration time.
Is the memory on those cards shared or partitioned between the two processors?

Last fiddled with by owftheevil on 2013-09-26 at 22:19
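A quick Amdahl's-law sketch shows why splitting only those kernels buys little. The 15% figure is from the post; the assumption of a perfect, transfer-free split is mine:

```python
def split_speedup(split_fraction, n_gpus):
    """Overall speedup when only split_fraction of the iteration time
    parallelizes perfectly across n_gpus (Amdahl's law)."""
    serial = 1.0 - split_fraction
    return 1.0 / (serial + split_fraction / n_gpus)

# Splitting only normalization + pointwise multiplication (~15% of time):
print(round(split_speedup(0.15, 2), 3))  # 1.081 -- under 9% faster
```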
#192

"Rob Gahan"
Aug 2013
Ireland
2²·3² Posts
#193

"Rob Gahan"
Aug 2013
Ireland
2²·3² Posts
I think I saw a performance review on videocardz showing that a dual-GPU card never outperforms two singles, i.e. a 7990 is roughly 15% slower than 2 × 7970s. SLI and CrossFire are to be avoided for GPU computation; each GPU should be addressed through its PCIe slot as a separate entity.
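A minimal sketch of that "one separate entity per GPU" pattern, using one worker process per device. Here `run_one_test` is a hypothetical stand-in for whatever per-card program you actually launch, and `CUDA_VISIBLE_DEVICES` is just one common device-selection mechanism (an OpenCL platform/device id would be another):

```python
import os
from multiprocessing import Process

def run_one_test(device_index, exponent):
    # Pin this worker to a single GPU so each card is driven independently,
    # with no SLI/CrossFire involved. (Hypothetical placeholder body.)
    os.environ["CUDA_VISIBLE_DEVICES"] = str(device_index)
    print(f"device {device_index}: running test of M{exponent}")
    # ... launch the actual per-GPU program here ...

if __name__ == "__main__":
    # One independent exponent per card.
    workers = [Process(target=run_one_test, args=(i, e))
               for i, e in enumerate((57885161, 43112609))]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```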
#194

"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts
#195

"Carl Darby"
Oct 2012
Spring Mountains, Nevada
3²·5·7 Posts
So it's looking like distributed LL tests, in any sense, are not feasible at this time.

Sorry kracker and msft. Here's your thread back. Any new developments with clLucas?
#196

Jul 2009
Tokyo
2·5·61 Posts
#197

Romulan Interpreter
Jun 2011
Thailand
9653₁₀ Posts
Anyhow, to come back on topic: there would be no advantage in spreading LL tests over multiple cards. The external communication is always slower than the internal computation, and LL tests are tricky to parallelize, except for the FFT used in each iteration; but for that, the data are already available internally (you need all of it, for error correction, etc.), so it would make no sense to move it around, wasting precious time. It will always take less time to do the calculation than to move the data, do the calculation, and bring the results back. If you have two GPUs, you will do much better running two LL tests, one exponent on each GPU, with SLI or without SLI. Always.
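LaurV's "moving the data costs more than the calculus" point can be put into rough numbers. The 2.75 ms compute figure is from post #188 above; the link speed is my illustrative assumption, not from the thread:

```python
# Rough compute-vs-transfer comparison for one LL iteration.
# A 4M-point FFT of doubles holds 4 * 2**20 * 8 bytes = 32 MiB.
RESIDUE_BYTES = 4 * 2**20 * 8   # 32 MiB working set
COMPUTE_MS = 2.75               # one Titan, 4M FFT (post #188)
LINK_GB_PER_S = 12.0            # assumed effective PCIe 3.0 x16 rate

transfer_ms = RESIDUE_BYTES / (LINK_GB_PER_S * 1e9) * 1e3
print(f"one-way transfer: {transfer_ms:.2f} ms vs compute: {COMPUTE_MS} ms")
```

Under this assumption, just moving the residue across the link once already costs about as much as the whole on-card iteration, which is the heart of the argument for one exponent per GPU.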
#198

"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts
Yes, I think one test on one GPU will always be best.

EDIT: On another note, in 4 h my 4th DC with clLucas will finish.

Last fiddled with by kracker on 2013-09-27 at 13:12