#529 | |
|
Sep 2006
The Netherlands
3⁶ Posts |
Quote:
In GPGPU you can scale linearly. Most games do not scale linearly: a high clock frequency still counts for a lot in most games, with bandwidth to the RAM as the overwhelming sweet spot. That is why they raised the clock frequency of the latest GPU line and why the increased bandwidth matters so much. If you look carefully, you'll see that trial factoring hardly needs bandwidth to the RAM, and that a higher clock by itself is not interesting either. Only the clock multiplied by the number of cores matters. |
|
|
|
|
|
|
#530 | |
|
"Mr. Meeseeks"
Jan 2012
California, USA
2168₁₀ Posts |
Quote:
|
|
|
|
|
|
|
#531 | |
|
Sep 2006
The Netherlands
3⁶ Posts |
Quote:
The 5000 series, on the other hand, is totally different. To start with, the 5000 series cannot prefetch from RAM. In the 5000 series 5 PEs form 1 compute core, whereas in the 6000 and 7000 series 4 PEs form 1 compute core. So the 6000 and 7000 series are the same in that respect, and performance does scale linearly from the 6000 to the 7000 series.
Last fiddled with by diep on 2012-11-26 at 18:53 |
|
|
|
|
|
|
#532 | |
|
"Mr. Meeseeks"
Jan 2012
California, USA
2³×271 Posts |
Quote:
|
|
|
|
|
|
|
#533 | |
|
Sep 2006
The Netherlands
729₁₀ Posts |
Quote:
The 6970 was 40 nm. Note that some 28 nm factories are in fact 32 nm; they just call them 28 nm sometimes. I hope you'll excuse me for leaving out the technical reason how they manage to do that :) So the AMD processors produced at 32 nm nowadays come, in some cases, from pretty much the same factory as the 28 nm one producing the GPUs. Nvidia also uses the TSMC 28/32 nm factories. Intel nowadays produces at 22 nm. So realize well that Intel is a process generation ahead, and their Xeon Phi is still not faster than what Nvidia and AMD produce on an older process technology... in case you wondered how strong the Xeon Phi is objectively. |
|
|
|
|
|
|
#534 |
|
Sep 2006
The Netherlands
3⁶ Posts |
Note that I saw someone claim there is a trick to do things faster on the AMD GPUs for TF.
I'll test it out soon. The trick didn't work on the CPUs I tried it on, but those do not have an FMA (fused multiply-add), so I'll have a shot at the 6970 here. I paid a lot of cash for that 6970 when it had just been released. By the time the driver finally worked, some months later, the card was already 50 euro cheaper in the shops here. If that trick somehow doesn't work I'll get a big hammer and... |
|
|
|
|
|
#535 | |
|
"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts |
Quote:
|
|
|
|
|
|
|
#536 | |
|
"Jerry"
Nov 2011
Vancouver, WA
1,123 Posts |
Quote:
The reason I asked the question in the first place is this: Is anyone running an AMD card listed above the 580 and what is your actual GHz-days/day? The chart is nice, but is it reality? I owned a 590 for a while and there was no way to actually get 452 GHz-days/day out of that thing with any CPU I own (I still don't know if it was the PCI-e 2.0 bus or the CPU that was limiting overall throughput, but I suspect it was the PCI-e 2.0 bus on the 590). Either way, the 7990 (or anything listed above the 580) looks nice, but unless someone is getting real-world results that justify replacing 580s, it doesn't make sense to change. Last fiddled with by flashjh on 2012-11-26 at 19:21 Reason: Can't spell |
|
|
|
|
|
|
#537 |
|
"Mr. Meeseeks"
Jan 2012
California, USA
2³×271 Posts |
I believe dbaugh owns a 7970.
|
|
|
|
|
|
#538 |
|
Sep 2006
The Netherlands
2D9₁₆ Posts |
No worries.
Do some math with me.
40^2 = 1600 nm^2
28^2 = 784 nm^2
So the improvement potential from the 6970 to the 7970 was a factor of 1600 / 784 = 2.04.
The card I got is 1536 cores * 0.88 GHz = 1351.68 GHz-cores.
The 7970 is 2048 cores at a default factory clock of 925 MHz: 2048 * 0.925 = 1894.4 GHz-cores.
That is an increase of a factor of 1.4.
Now look at the RAM bandwidth. They claim 264 GB/s for the 7970 versus 176 GB/s for the 6970. Note that in my own tests I never managed to get more than 140 GB/s out of the 6970, yet realize that those GPUs from AMD cannot do GPGPU without serving as a video card to the screen as well.
That bandwidth increase is 264 / 176 = 1.5 |
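For anyone who wants to redo the arithmetic above, here is a minimal Python sketch. It uses only the figures quoted in the post; the function and variable names are made up for illustration, and "cores times clock" is the rough throughput proxy argued for in this thread, not a measured benchmark.

```python
def core_ghz(cores: int, clock_ghz: float) -> float:
    """Rough throughput proxy from the post: number of cores times clock (GHz)."""
    return cores * clock_ghz


# Figures as quoted in the post (names are illustrative only).
hd6970 = core_ghz(1536, 0.880)   # 6970: 1536 cores at 880 MHz
hd7970 = core_ghz(2048, 0.925)   # 7970: 2048 cores at 925 MHz

area_ratio = 40**2 / 28**2       # ideal density gain going from 40 nm to 28 nm
compute_ratio = hd7970 / hd6970  # actual gain in cores * clock
bandwidth_ratio = 264 / 176      # claimed RAM bandwidth in GB/s, 7970 vs 6970

print(f"process shrink potential: {area_ratio:.2f}x")
print(f"cores * clock:            {compute_ratio:.2f}x")
print(f"claimed RAM bandwidth:    {bandwidth_ratio:.2f}x")
```

The point of putting the three ratios side by side is that the cores-times-clock gain stays well below what the shrink alone would have allowed.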
|
|
|
|
|
#539 | |
|
Sep 2006
The Netherlands
3⁶ Posts |
Quote:
Note there is a trick posted on a website which, if it works, would speed up TF for the smaller kernels quite a bit on AMD video cards. It would take 3 cycles to multiply 23x23 == 46 bits using an FMA trick, whereas it currently takes 8 cycles to do 32x32 == 64 bits on AMD, versus 2 cycles for the same thing on Nvidia. That is why Nvidia is beating AMD (add to this some 20% for Nvidia having carry-add instructions, which OpenCL lacks as well). The claim is that it works, yet I need to see proof of that first. I'm going to try it later this week, as it's so cold here in this office now that winter is setting in that I have to run some GPUs to keep me warm :) If it works, the OpenCL kernels will of course look even more like hacked chaos, as working with 23 bits is very big fun.
P.S. When I say cycles I mean the aggregated number of cycles, so total cycles = number of involved PEs times number of cycles.
Last fiddled with by diep on 2012-11-26 at 19:38 |
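The post does not say what the FMA trick actually is, so the following Python sketch is only a guess at the general idea: the usual way a fused multiply-add recovers an exact product is to take the rounded single-precision product as the high part and get the rounding error back with fma(a, b, -hi) as the low part. Python doubles are used here to emulate both single-precision operations; `to_f32` and `mul23_via_fma` are hypothetical names, and nothing in it models cycle counts on any GPU.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mul23_via_fma(a: int, b: int) -> int:
    """Exact 46-bit product of two 23-bit integers from two float32 results.

    Assumption: this emulates hi = a * b and lo = fma(a, b, -hi) in single
    precision. The double product is exact (46 bits fit in 53), so rounding
    it to float32 matches what the hardware operations would return.
    """
    assert 0 <= a < 2**23 and 0 <= b < 2**23
    fa, fb = float(a), float(b)   # 23-bit integers are exact in float32
    hi = to_f32(fa * fb)          # rounded product: the top 24 bits
    lo = to_f32(fa * fb - hi)     # rounding error, which an FMA returns exactly
    return int(hi) + int(lo)


x = 2**23 - 1
print(mul23_via_fma(x, x) == x * x)
```

Without the fused operation the low part is lost to rounding after the multiply, which would fit with the trick not working on CPUs that lack an FMA.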
|
|
|
|