|
|
#320 | |
|
Serpentine Vermin Jar
Jul 2014
7×11×43 Posts |
Quote:
That might put its DP performance somewhere in the 2-3 TFLOPS range, although I saw some reports it was rated over 4. As for memory bandwidth, some reports are saying as much as 1 TB/s... I take all of that with a heavy dose of salt since the rumor mill is based on who knows what, but whatever the case, it does sound pretty intriguing. I know it's a sucky thing to look at a performance problem, shrug your shoulders, and throw more hardware at it, especially when the software side has apparent room for improvement. |
|
|
|
|
|
|
#321 | |
|
P90 years forever!
Aug 2002
Yeehaw, FL
2×53×71 Posts |
Quote:
Prime95 does some of the forward FFT, point-wise squaring, and inverse FFT while data is in memory, clLucas does not. At best, you are going to need 2 r/w for the forward FFT, 1 r/w for the squaring, 2 r/w for the inverse FFT, 1 r/w for the rounding and carry propagation. Prime95 uses a 256KB L2 cache which CUDA cards don't have, I assume AMD doesn't either. Consequently, I expect the 2 r/w in the forward and inverse FFT is optimistic -- probably 4 r/w is more realistic. You'd need to look at the clFFT code to know for sure. |
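The pass counts above can be turned into a rough bandwidth-bound estimate. This is only a sketch under stated assumptions: the function names, the 8-byte word size, and the idea that each r/w pass reads and writes every FFT word exactly once are illustrative choices, not anything taken from clLucas or clFFT.

```python
def passes_per_iteration(fft_rw: int = 2) -> int:
    """Read/write passes per LL iteration, using the counts from the post:
    forward FFT + point-wise squaring + inverse FFT + rounding/carry."""
    return fft_rw + 1 + fft_rw + 1


def min_iteration_time_ms(fft_len: int, bandwidth_gb_s: float,
                          fft_rw: int = 2, bytes_per_word: int = 8) -> float:
    """Lower bound on iteration time if memory traffic is the only cost.

    Assumption: each r/w pass reads and then writes every word once,
    so it moves 2 * fft_len * bytes_per_word bytes.
    """
    bytes_moved = passes_per_iteration(fft_rw) * 2 * fft_len * bytes_per_word
    return bytes_moved / (bandwidth_gb_s * 1e9) * 1e3


if __name__ == "__main__":
    n = 4 * 1024 * 1024   # hypothetical 4M-point FFT
    bw = 512.0            # hypothetical memory bandwidth in GB/s
    for rw in (2, 4):     # the "best case" and "more realistic" counts
        print(rw, passes_per_iteration(rw), min_iteration_time_ms(n, bw, rw))
```

Going from 2 to 4 r/w per FFT raises the total from 6 to 10 passes, so under this model the bandwidth-bound time grows by a factor of 10/6 regardless of the FFT length or card.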
|
|
|
|
|
|
#322 |
|
Jul 2003
So Cal
83A₁₆ Posts |
I'm not sure about the consumer cards, but the Tesla K20 has 1.25 MB of L2 cache, and the K20x and K40 have 1.5 MB of L2.
|
|
|
|
|
|
#323 | |
|
"David"
Jul 2015
Ohio
11·47 Posts |
Quote:
The AMD cards are a bit better equipped in terms of cache than we have been discussing, although we currently throw almost all of this away between kernel calls. There is a 2MB L2 cache shared between all compute units. There is also a small GDS shared between all compute units of 32KB pages, an L1 cache per CU, and 64KB of LDS for each 64 ALUs. Most importantly, each of the 64 CUs has a relatively huge number of vector registers: 256KiB worth per CU, plus 2KB of scalar registers. As one more bonus for a carefully tuned kernel, there are basic integer ops (add, compare/swap) and reordering ops built into the LDS, which can run fully independently of the VALUs. Last fiddled with by airsquirrels on 2016-01-30 at 14:01 |
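To put those figures side by side, here is a small tally of the on-chip storage listed in the post against the size of an FFT working set. It is a sketch only: it assumes one 64KB LDS per CU (the post says "for each 64 ALUs"), treats every KB/MB figure as a binary unit, and uses a hypothetical 4M-point double-precision FFT for comparison.

```python
KIB = 1024
MIB = 1024 * KIB

NUM_CUS = 64                 # compute units, from the post
VGPR_PER_CU = 256 * KIB      # vector registers per CU, from the post
SGPR_PER_CU = 2 * KIB        # scalar registers per CU, from the post
LDS_PER_CU = 64 * KIB        # assumption: one 64KB LDS per CU
L2_TOTAL = 2 * MIB           # shared L2, from the post


def on_chip_bytes() -> dict:
    """Total bytes of each on-chip storage type across the whole GPU."""
    return {
        "vector_registers": NUM_CUS * VGPR_PER_CU,
        "scalar_registers": NUM_CUS * SGPR_PER_CU,
        "lds": NUM_CUS * LDS_PER_CU,
        "l2": L2_TOTAL,
    }


def fft_working_set_bytes(fft_len: int, bytes_per_word: int = 8) -> int:
    """Bytes needed to hold one FFT-length array of doubles."""
    return fft_len * bytes_per_word


if __name__ == "__main__":
    totals = on_chip_bytes()
    for name, size in totals.items():
        print(f"{name}: {size / MIB:.3f} MiB")
    fft_bytes = fft_working_set_bytes(4 * 1024 * 1024)  # hypothetical 4M FFT
    print(f"on-chip total: {sum(totals.values()) / MIB:.3f} MiB")
    print(f"FFT data:      {fft_bytes / MIB:.3f} MiB")
```

Under these assumptions the vector registers alone hold far more than the L2, yet the whole on-chip total is still smaller than the hypothetical FFT array, which is consistent with the point that data is lost between kernel calls.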
|
|
|
|
|
|
#324 | |
|
Einyen
Dec 2003
Denmark
110001010111₂ Posts |
Quote:
|
|
|
|
|
|
|
#325 |
|
"David"
Jul 2015
Ohio
11×47 Posts |
Correct, realized performance is closer to 200 GFLOPS due to inefficiencies in the FFT implementation. My statements were meant to point out that clFFT is not achieving the same level of performance as cuFFT despite having similar HW. clFFT is much younger and has more room for optimization.
|
|
|
|
|
|
#326 |
|
If I May
"Chris Halsall"
Sep 2002
Barbados
10011000000010₂ Posts |
That probably wouldn't (optimally) Make Sense (TM).
There will always be a need for additional GPU TF'ing ahead of the LL'ing wavefronts. But at some point (read: probably in about eight months or so) we'll be far enough ahead in the TF'ing that most GPU efforts should go to DC'ing and/or LL'ing. Of course, this all depends on the optimal economic cross-over points for LL'ing vs. TF'ing. These points have already changed several times as the GPU codes were optimized for different GPUs and different worktypes. |
|
|
|
|
|
#327 |
|
Mar 2014
Germany
2³·3·5 Posts |
3735 assignments, or about 65 THzd, are left for completion, all of which were assigned approx. 20 days ago to ANONYMOUS. This will be finished at the next drop-off of his completed assignments, next Friday or maybe the week after. Then all the DCTF is done!
What do you think he will do after he has finished? Help in the 100M digit range or on the LLTF front? Or continue with what he did before? I also have a theory about what that guy was doing before he started DCTF: before that time there was an Anonymous user who submitted a very large amount of TF work in the 875M range (here) and stopped at about the time this anonymous user started DCTF. That's why I think he was there for a long time before, doing very high range work that nobody else cares about right now. |
|
|
|
|
|
#328 |
|
"David"
Jul 2015
Ohio
11·47 Posts |
77 candidates left!
Looks like tomorrow, if Anonymous posts results like he/she has on the weekends in the past. We may finally get the answer to whether Anonymous will help with LLTF or move on to some other work. |
|
|
|
|
|
#329 |
|
Romulan Interpreter
Jun 2011
Thailand
2×5×31² Posts |
Or Chris can bring in the 62M to 74, there are just 2000 of them, then this DCTF is RIP forever... (well, until the next better hardware allows us a few bits more...)
|
|
|
|
|
|
#330 | |
|
If I May
"Chris Halsall"
Sep 2002
Barbados
10011000000010₂ Posts |
Quote:
But even now, for LL Cat 4 there are approximately 240 assigned a day, with only 75 being completed a day. |
|
|
|
|