mersenneforum.org  

Old 2012-11-26, 18:45   #529
diep
Sep 2006
The Netherlands
3^6 Posts

Quote:
Originally Posted by kracker View Post
I didn't understand that.

But a 6970 and a 7970 are *not* almost the same if you look at those benchmarks.
You are mixing up games, which the GPUs are designed for, with GPGPU.

In GPGPU you can scale linearly. Most games do not scale linearly: a high clock frequency still matters a great deal in most games, with bandwidth to the RAM as the overwhelming sweet spot.

That's why they increased the clock frequency of the latest GPU line, and why the increased bandwidth matters that much.

If you look carefully, you'll see that trial factoring does not need much bandwidth to the RAM, and that a higher clock by itself is not interesting either. Only the clock multiplied by the number of cores matters.
Old 2012-11-26, 18:46   #530
kracker
"Mr. Meeseeks"
Jan 2012
California, USA
2^3·271 Posts

Quote:
Originally Posted by diep View Post
You are mixing up games, which the GPUs are designed for, with GPGPU.

In GPGPU you can scale linearly. Most games do not scale linearly: a high clock frequency still matters a great deal in most games, with bandwidth to the RAM as the overwhelming sweet spot.

That's why they increased the clock frequency of the latest GPU line, and why the increased bandwidth matters that much.

If you look carefully, you'll see that trial factoring does not need much bandwidth to the RAM, and that a higher clock by itself is not interesting either. Only the clock multiplied by the number of cores matters.
Do you even know which benchmarks I am talking about? Look above, heh.
Old 2012-11-26, 18:51   #531
diep
Sep 2006
The Netherlands
3^6 Posts

Quote:
Originally Posted by kracker View Post
Do you even know which benchmarks I am talking about? Look above, heh.
FYI, the 7970 is a shrink of the 6970, if you know what that means.

The 5000 series, on the other hand, is totally different. To start with, the 5000 series cannot prefetch from RAM.
In the 5000 series 5 PEs form 1 compute core; in the 6000 and 7000 series 4 PEs form 1 compute core.

So the 6000 and 7000 series are the same in that respect, and it does scale linearly from the 6000 to the 7000 series.

Last fiddled with by diep on 2012-11-26 at 18:53
Old 2012-11-26, 18:56   #532
kracker
"Mr. Meeseeks"
Jan 2012
California, USA
2^3×271 Posts

Quote:
Originally Posted by diep View Post
FYI, the 7970 is a shrink of the 6970, if you know what that means.

The 5000 series, on the other hand, is totally different. To start with, the 5000 series cannot prefetch from RAM.
In the 5000 series 5 PEs form 1 compute core; in the 6000 and 7000 series 4 PEs form 1 compute core.

So the 6000 and 7000 series are the same in that respect, and it does scale linearly from the 6000 to the 7000 series.
Yeah, from 32 nm to 22 nm, right? But anyway, I've never had a major problem with either Nvidia or AMD; the only time I got a BSOD was when I OC'ed too far...
Old 2012-11-26, 19:02   #533
diep
Sep 2006
The Netherlands
1011011001₂ Posts

Quote:
Originally Posted by kracker View Post
Yeah, from 32 nm to 22 nm, right? But anyway, I've never had a major problem with either Nvidia or AMD; the only time I got a BSOD was when I OC'ed too far...
The 7970 is 28 nm.

The 6970 was 40 nm.

Note that some 28 nm factories are in fact 32 nm; they just call them 28 nm sometimes. I hope you'll excuse me for skipping the technical details of how they manage that :)

So the AMD processors produced at 32 nm nowadays come, in some cases, from pretty much the same factory as the 28 nm one producing the GPUs.

Nvidia also uses TSMC 28/32 nm factories.

Intel nowadays produces at 22 nm.

So realize that Intel is a process generation ahead, and yet their Xeon Phi isn't faster than what Nvidia and AMD produce on an older process technology...

In case you wondered how strong Xeon Phi is, objectively....
Old 2012-11-26, 19:07   #534
diep
Sep 2006
The Netherlands
3^6 Posts

Note that I saw someone claim there is a trick to do things faster on the AMD GPUs for TF.

I'll test it out soon.

The trick didn't work on the CPUs I tried it on, but those do not have an FMA (fused multiply-add), so I'll have a shot with the 6970 here.

I bought that 6970 for a lot of cash when it had just been released. By the time the driver finally worked, some months later, the card was already 50 euro cheaper in the shops here.

If that trick somehow doesn't work I'll get a big hammer and...
Old 2012-11-26, 19:08   #535
kracker
"Mr. Meeseeks"
Jan 2012
California, USA
2^3·271 Posts

Quote:
Originally Posted by diep View Post
The 7970 is 28 nm.

The 6970 was 40 nm.

Note that some 28 nm factories are in fact 32 nm; they just call them 28 nm sometimes. I hope you'll excuse me for skipping the technical details of how they manage that :)

So the AMD processors produced at 32 nm nowadays come, in some cases, from pretty much the same factory as the 28 nm one producing the GPUs.

Nvidia also uses TSMC 28/32 nm factories.

Intel nowadays produces at 22 nm.

So realize that Intel is a process generation ahead, and yet their Xeon Phi isn't faster than what Nvidia and AMD produce on an older process technology...

In case you wondered how strong Xeon Phi is, objectively....
Ah yes, sorry, I confused it with Sandy Bridge to IB; yeah, it was 40 nm.
Old 2012-11-26, 19:15   #536
flashjh
"Jerry"
Nov 2011
Vancouver, WA
1,123 Posts

Quote:
Originally Posted by kracker View Post
Really? Have you even SEEN benchmarks, or do you know what they are?

@flashjh This will help, although most of the Nvidia cards are not mfaktc 0.19, I think; it is just to see how many GHz-days/day you can get from them.
The information from James' site is exactly why I'm asking the question. I run 580s, and though I need to update my benchmark information, each 580 actually produces 400+ GHz-days/day when paired with an i7 2700 or 3770 (they would do more if I turned off P-1 on those machines too). The chart leads me to believe that a 7990 can do 605 GHz-days/day. If that information is based on one instance, then a 7990 should be able to do more.

The reason I asked the question in the first place is this: Is anyone running an AMD card listed above the 580 and what is your actual GHz-days/day? The chart is nice, but is it reality?

I owned a 590 for a while and there was no way to actually get 452 GHz-days/day out of that thing with any CPU I own (I still don't know if it was the PCI-e 2.0 bus or the CPU that was limiting overall throughput, but I suspect it was the PCI-e 2.0 bus on the 590). Either way, the 7990 (or anything listed above the 580) looks nice, but unless someone is getting real-world results that justify replacing 580s, it doesn't make sense to change.
Attached thumbnail: Capture.JPG (75.2 KB)

Last fiddled with by flashjh on 2012-11-26 at 19:21 Reason: Can't spell
Old 2012-11-26, 19:19   #537
kracker
"Mr. Meeseeks"
Jan 2012
California, USA
2^3×271 Posts

I believe dbaugh owns a 7970.
Old 2012-11-26, 19:29   #538
diep
Sep 2006
The Netherlands
3^6 Posts

Quote:
Originally Posted by kracker View Post
Ah yes, sorry, I confused it with Sandy Bridge to IB; yeah, it was 40 nm.
No worries.

Do some math with me.

40^2 = 1600 nm^2
28^2 = 784 nm^2

So the improvement potential from the 6970 to the 7970 was a factor of 1600 / 784 = 2.04.

The card I have is 1536 cores * 0.88 GHz = 1351.68 GHz-cores.
The 7970 is 2048 cores at the default factory clock of 925 MHz:

2048 * 0.925 = 1894.4 GHz-cores

That is an increase of a factor of 1894.4 / 1351.68 = 1.4.

Now look at the RAM bandwidth.

They claim 264 GB/s for the 7970 versus 176 GB/s for the 6970.

Note that in my own tests I never managed to get more than 140 GB/s out of the 6970; yet realize that those AMD GPUs cannot do GPGPU without also serving as a video card for the screen.

That bandwidth increase is 264 / 176 = 1.5.
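
For anyone who wants to redo the arithmetic above, here is a small Python sketch. The figures are the ones quoted in this post; the variable names and the `ratio` helper are made up for illustration, and "GHz-cores" is just cores times clock, not an official metric.

```python
# Figures quoted in the post: process node, core count, clock, claimed bandwidth.
NODE_6970_NM, NODE_7970_NM = 40, 28
CORES_6970, CLOCK_6970_GHZ = 1536, 0.880
CORES_7970, CLOCK_7970_GHZ = 2048, 0.925
BW_6970_GBS, BW_7970_GBS = 176, 264


def ratio(new, old):
    """Return how many times larger `new` is than `old`."""
    return new / old


# Area per feature scales with the square of the node size.
shrink_potential = ratio(NODE_6970_NM ** 2, NODE_7970_NM ** 2)

# Raw compute estimate: number of cores times clock frequency.
ghz_cores_6970 = CORES_6970 * CLOCK_6970_GHZ
ghz_cores_7970 = CORES_7970 * CLOCK_7970_GHZ
compute_gain = ratio(ghz_cores_7970, ghz_cores_6970)

# Claimed memory bandwidth, 7970 versus 6970.
bandwidth_gain = ratio(BW_7970_GBS, BW_6970_GBS)

print(f"shrink potential: {shrink_potential:.2f}")
print(f"compute gain:     {compute_gain:.2f}")
print(f"bandwidth gain:   {bandwidth_gain:.2f}")
```

The point of lining the three ratios up is that the cores-times-clock gain stays well below what the shrink alone would allow, while the bandwidth gain is about the same size as the compute gain.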
Old 2012-11-26, 19:32   #539
diep
Sep 2006
The Netherlands
3^6 Posts

Quote:
Originally Posted by flashjh View Post
The information from James' site is exactly why I'm asking the question. I run 580s, and though I need to update my benchmark information, each 580 actually produces 400+ GHz-days/day when paired with an i7 2700 or 3770 (they would do more if I turned off P-1 on those machines too). The chart leads me to believe that a 7990 can do 605 GHz-days/day. If that information is based on one instance, then a 7990 should be able to do more.

The reason I asked the question in the first place is this: Is anyone running an AMD card listed above the 580 and what is your actual GHz-days/day? The chart is nice, but is it reality?

I owned a 590 for a while and there was no way to actually get 452 GHz-days/day out of that thing with any CPU I own (I still don't know if it was the PCI-e 2.0 bus or the CPU that was limiting overall throughput, but I suspect it was the PCI-e 2.0 bus on the 590). Either way, the 7990 (or anything listed above the 580) looks nice, but unless someone is getting real-world results that justify replacing 580s, it doesn't make sense to change.
Your 580 will totally destroy any AMD video card with 1 GPU inside.
Note there is a trick posted on a website which, if it works, would speed up TF quite a bit for the smaller kernels on AMD video cards.

It would take 3 cycles to multiply 23x23 == 46 bits
using an FMA trick.

This compares with currently 8 cycles to do 32x32 == 64 bits on AMD, versus 2 cycles for the same thing on Nvidia.
So that's why Nvidia is owning AMD (add to this some 20% for Nvidia having add-with-carry instructions, which OpenCL lacks as well).

The claim is that it works, yet I need to see proof of that first.

I'm going to do that later this week, as it's so cold here in this office now that winter is setting in that I have to run some GPUs to keep me warm :)

If it works, the OpenCL kernels will of course look even more like hacked chaos, as working with 23 bits is very big fun.

P.S. When I say cycles I mean the aggregated number of cycles, so total cycles = number of involved PEs times number of cycles.
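
The post does not say which FMA trick is meant, so what follows is only a guess at the general idea, not the method from that website. A single-precision multiply gives the rounded high part of a 23x23-bit product, and one fused multiply-add then recovers the exact rounding error, so the two floats together hold all 46 bits. Python floats are doubles, so the sketch emulates single precision with `struct`; `to_f32` and `mul23_split` are names invented here.

```python
import struct


def to_f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mul23_split(a, b):
    """Split a * b into (hi, lo) with hi + lo == a * b, for 23-bit a and b.

    hi is what a single-precision multiply a * b would return.
    lo is what a single-precision fma(a, b, -hi) would return.
    """
    assert 0 <= a < (1 << 23) and 0 <= b < (1 << 23)
    exact = float(a) * float(b)  # below 2^46, so exact in a double
    hi = to_f32(exact)           # emulates the rounded float32 product
    lo = exact - hi              # emulates fma(a, b, -hi), computed exactly
    assert to_f32(lo) == lo      # the residual itself fits in a float32
    return int(hi), int(lo)


if __name__ == "__main__":
    a = b = (1 << 23) - 1
    hi, lo = mul23_split(a, b)
    print(hi, lo, hi + lo == a * b)
```

On real hardware this would be one multiply plus one FMA per product. Whether that actually lands at the 3 cycles claimed on the AMD cards is exactly what still has to be tested.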

Last fiddled with by diep on 2012-11-26 at 19:38

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.