mersenneforum.org  

Great Internet Mersenne Prime Search > Hardware > GPU Computing
2012-06-14, 15:43   #485
Bdot
Nov 2010
Germany

3·199 Posts

Quote:
Originally Posted by kracker View Post
Yeah, I believe mine has 600 stream processors on it
It's 400 (@600 MHz) in the HD 6550D / A8 3850, so mfakto guessed it right

Last fiddled with by Bdot on 2012-06-14 at 15:45
2012-06-19, 19:40   #486
kracker
"Mr. Meeseeks"
Jan 2012
California, USA

23×271 Posts

Okay, I just got my 7770 today. For some reason (maybe I'm doing it wrong), performance is only a little better than on my integrated 6550D... (and yes, GPU usage was ~85%)

Code:
mfakto 0.11-Win (64bit build)


Runtime options
  Inifile                   mfakto.ini
  SievePrimesMin            10000
  SievePrimesMax            200000
  SievePrimes               25000
  SievePrimesAdjust         1
  NumStreams                5
  GridSize                  4
  WorkFile                  worktodo.txt
  ResultsFile               results.txt
  Checkpoints               enabled
  CheckpointDelay           300s
  Stages                    enabled
  StopAfterFactor           class
  PrintMode                 full
  V5UserID                  none
  ComputerID                none
  AllowSleep                yes
  TimeStampInResults        no
  VectorSize                4
  PreferKernel              mfakto_cl_barrett79
  SieveOnGPU                no
  SmallExp                  no
Compiletime options
  SIEVE_SIZE_LIMIT          64kiB
  SIEVE_SIZE                482885bits
  SIEVE_SPLIT               250
  MORE_CLASSES              enabled
Select device - Get device info - Compiling kernels ..........

OpenCL device info
  name                      Capeverde (Advanced Micro Devices, Inc.)
  device (driver) version   OpenCL 1.2 AMD-APP (937.2) (CAL 1.4.1734 (VM))
  maximum threads per block 256
  maximum threads per grid  16777216
  number of multiprocessors 10 (800 compute elements (estimate for ATI GPUs))
  clock rate                1000MHz

Automatic parameters
  threads per grid          2097152

running a simple selftest ...
########## testcase 1/17 (#2) ##########
########## testcase 2/17 (#25) ##########
########## testcase 3/17 (#39) ##########
########## testcase 4/17 (#57) ##########
########## testcase 5/17 (#70) ##########
########## testcase 6/17 (#72) ##########
########## testcase 7/17 (#73) ##########
########## testcase 8/17 (#82) ##########
########## testcase 9/17 (#88) ##########
########## testcase 10/17 (#106) ##########
########## testcase 11/17 (#355) ##########
########## testcase 12/17 (#358) ##########
########## testcase 13/17 (#666) ##########
########## testcase 14/17 (#1547) ##########
########## testcase 15/17 (#1552) ##########
########## testcase 16/17 (#1556) ##########
########## testcase 17/17 (#1557) ##########
Selftest statistics                          
  number of tests           44
  successful tests          44

selftest PASSED!

got assignment: exp=61275631 bit_min=70 bit_max=71
Starting trial factoring M61275631 from 2^70 to 2^71 (3.90GHz-days)
  k_min = 9633451350600 - k_max = 19266902705863
Using GPU kernel "barrett15_75"

found a valid checkpoint file!
  last finished class was: 989
  found 0 factors already

   done |    ETA |     GHz |time/class|    #FCs | avg. rate | SieveP. |CPU idle
 21.67% |  1h54m |   38.43 |   9.139s | 448.79M |  49.11M/s |   25000 |  53.57%
 21.77% |  1h52m |   39.14 |   8.973s | 444.60M |  49.55M/s |   28125 |  51.99%

mfakto will exit once the current class is finished.
press ^C again to exit immediately
 21.88% |  1h53m |   38.52 |   9.118s | 440.40M |  48.30M/s |   31640 |  51.31%
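For readers following the log: mfakto searches for factors of the form q = 2kp + 1, so the printed k range follows from the bit limits, with k running from about 2^70/(2p) to 2^71/(2p). The Python sketch below reproduces both values for p = 61275631. It assumes (this is not stated in the thread) that MORE_CLASSES means 4620 = 4·3·5·7·11 classes and that k_min is rounded down to a class boundary. It also counts the classes that remain once classes whose candidates cannot be ±1 (mod 8), or are divisible by 3, 5, 7 or 11, are discarded. The names `k_range`, `valid_classes` and `NUM_CLASSES` are illustrative, not mfakto's own.

```python
# Sketch: reproduce the k range mfakto printed for M61275631 (2^70 to 2^71).
# Assumptions: 4620 classes with MORE_CLASSES, and k_min rounded down to a
# multiple of the class count. Not taken from mfakto's source code.

NUM_CLASSES = 4 * 3 * 5 * 7 * 11  # 4620 (assumed value for MORE_CLASSES)


def k_range(p: int, bit_min: int, bit_max: int) -> tuple[int, int]:
    """Return (k_min, k_max) so that q = 2*k*p + 1 spans 2^bit_min .. 2^bit_max."""
    k_min = (1 << bit_min) // (2 * p)
    k_min -= k_min % NUM_CLASSES  # start on a class boundary
    k_max = (1 << bit_max) // (2 * p)
    return k_min, k_max


def valid_classes(p: int) -> list[int]:
    """Classes (k mod NUM_CLASSES) whose candidates can still divide M(p)."""
    classes = []
    for c in range(NUM_CLASSES):
        # q mod 8, 3, 5, 7, 11 depends only on k mod NUM_CLASSES, so k = c suffices
        q = 2 * c * p + 1
        if q % 8 not in (1, 7):  # factors of Mersenne numbers are +-1 mod 8
            continue
        if any(q % r == 0 for r in (3, 5, 7, 11)):
            continue
        classes.append(c)
    return classes


if __name__ == "__main__":
    p = 61275631
    print(k_range(p, 70, 71))     # (9633451350600, 19266902705863)
    print(len(valid_classes(p)))  # 960
```

Both k values match the log line `k_min = 9633451350600 - k_max = 19266902705863`, which supports the class-boundary assumption for this exponent.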
Attached Thumbnails: screen.gif

Last fiddled with by kracker on 2012-06-19 at 19:40
2012-06-19, 21:34   #487
Bdot
Nov 2010
Germany

255₁₆ Posts

Quote:
Originally Posted by kracker View Post
Okay, I just got my 7770 today. For some reason (maybe I'm doing it wrong), performance is only a little better than on my integrated 6550D... (and yes, GPU usage was ~85%)
It's probably not you doing anything wrong, it's mfakto. Could you please send me ~ one minute of 'mfakto-pi -st' output?

I still have to teach mfakto what is good on GCN and what is bad. The kernel it chose is extremely bad. I can probably send you a GCN-optimized version tomorrow, which should double these performance figures.
2012-06-19, 22:49   #488
kracker
"Mr. Meeseeks"
Jan 2012
California, USA

23·271 Posts

Quote:
Originally Posted by Bdot View Post
It's probably not you doing anything wrong, it's mfakto. Could you please send me ~ one minute of 'mfakto-pi -st' output?

I still have to teach mfakto what is good on GCN and what is bad. The kernel it chose is extremely bad. I can probably send you a GCN-optimized version tomorrow, which should double these performance figures.
Will do.

P.S.: Oh crap, forgot to self-test.
2012-06-20, 16:29   #489
Bdot
Nov 2010
Germany

3×199 Posts
GCN aka HD7xxx performance

Thanks, kracker, for your tests!

It turns out that the GCN-based cards have fewer registers at their disposal. Even at VectorSize=4, the very register-intensive 15-bit kernel suffers from heavy register spilling into slow scratchpad memory, crippling performance.

On the other hand, GCN can schedule threads more flexibly, so large vector sizes are no longer needed for high performance; even the more register-efficient kernels run fastest with VectorSize=2.

Therefore, to maximize performance with mfakto 0.11 on HD77xx-79xx, set the following in mfakto.ini:

Stages=1
PreferKernel=mfakto_cl_71
VectorSize=2

The next mfakto version will do this automatically ...
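The three settings above can also be applied with a small script. The helper below is hypothetical and not part of mfakto; it assumes mfakto.ini holds one `Key=Value` setting per line and that lines starting with `#` are comments. It overwrites the listed keys if present and appends any that are missing, leaving every other line untouched.

```python
# Hypothetical helper, not part of mfakto: rewrite the text of an mfakto.ini,
# assuming one "Key=Value" setting per line and "#" at line start for comments.

GCN_SETTINGS = {"Stages": "1", "PreferKernel": "mfakto_cl_71", "VectorSize": "2"}


def apply_settings(ini_text: str, settings: dict[str, str]) -> str:
    """Overwrite existing keys from `settings` and append any that are missing."""
    seen = set()
    out = []
    for line in ini_text.splitlines():
        is_setting = "=" in line and not line.lstrip().startswith("#")
        key = line.split("=", 1)[0].strip()
        if is_setting and key in settings:
            out.append(f"{key}={settings[key]}")  # replace the old value
            seen.add(key)
        else:
            out.append(line)  # keep comments, blank lines and other keys
    out.extend(f"{k}={v}" for k, v in settings.items() if k not in seen)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "# sample\nVectorSize=4\nStages=0\n"
    print(apply_settings(sample, GCN_SETTINGS), end="")
```

Reading the file, calling `apply_settings`, and writing the result back is left to the caller, so the function can be checked on plain strings first.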

Based on the 7770 test results, the 7970 should easily surpass 400M/s. dbaugh, do you still (or again) have an operational 7970?

The 7770 is now even 7% faster than the 5770, even though it has fewer GFLOPS (1280 vs. 1360), which indicates improved hardware efficiency.
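The GFLOPS figures come from the usual peak formula: stream processors × clock × 2, counting a multiply-add as two FLOPs. The shader counts and clocks below are the cards' reference specs (640 stream processors at 1000 MHz for the HD 7770, i.e. 10 GCN compute units of 64; 800 at 850 MHz for the HD 5770), not values from the thread. Note that the 7770 log above prints mfakto's estimate of 800 compute elements, which does not apply to GCN. The 6550D line uses the 400 @ 600 MHz quoted at the start of the thread. The last step turns "7% faster with fewer GFLOPS" into a throughput-per-GFLOP ratio.

```python
# Peak single-precision throughput, counting a multiply-add as two FLOPs
# per stream processor per clock cycle.
def peak_gflops(stream_processors: int, clock_mhz: int) -> float:
    return stream_processors * clock_mhz * 2 / 1000


hd7770 = peak_gflops(640, 1000)  # 1280.0 (10 GCN compute units x 64)
hd5770 = peak_gflops(800, 850)   # 1360.0
hd6550d = peak_gflops(400, 600)  # 480.0

# The 7770 delivers 7% more mfakto throughput from fewer peak GFLOPS:
work_per_gflop_gain = 1.07 * hd5770 / hd7770
print(f"{work_per_gflop_gain:.3f}")  # 1.137, roughly 14% more work per GFLOP
```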

Last fiddled with by Bdot on 2012-06-20 at 16:39 Reason: comparison
2012-06-20, 19:00   #490
kracker
"Mr. Meeseeks"
Jan 2012
California, USA

23·271 Posts
Thanks

Thanks, Bdot, for everything

Performance is much better: from ~50M/s to ~135M/s on my Radeon HD 7770.
I'll send some benchmarks (the right way) over to you-know-who
2012-06-21, 03:53   #491
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·29·83 Posts
Thanks from me too

I'm about to steal the file locking code.
I must say I never would have built such an elaborate system; I can see that I still have much to learn.

(PS, compare lines 21 and 25 )

Last fiddled with by Dubslow on 2012-06-21 at 03:56 Reason: ytop
2012-06-21, 07:50   #492
Bdot
Nov 2010
Germany

3×199 Posts

Quote:
Originally Posted by kracker View Post
Thanks, Bdot, for everything

Performance is much better: from ~50M/s to ~135M/s on my Radeon HD 7770.
I'll send some benchmarks (the right way) over to you-know-who
exactly the right thing to do

Quote:
Originally Posted by Dubslow View Post
I'm about to steal the file locking code.
I must say I never would have built such an elaborate system; I can see that I still have much to learn.
go ahead (steal and learn )

Quote:
Originally Posted by Dubslow View Post
(PS, compare lines 21 and 25 )
That's just to be sure it really is included

Publishing the source code starts to show added value ... (fixed).

Last fiddled with by Bdot on 2012-06-21 at 07:52 Reason: missed smiley in the last line!
2012-06-22, 07:57   #493
dbaugh
Aug 2005

1110110₂ Posts
7970 update

I replaced the card with an identical one, except that this one works great! I have been running it 24/7 with three instances to get GPU usage to 99%, with the clocks maxed out at 1125/1575 MHz (core/memory). Apart from the system pulling a steady 500 watts, all is good. I'll run the benchmark soon. Is there optimized code for the 7970?
2012-06-22, 13:52   #494
kracker
"Mr. Meeseeks"
Jan 2012
California, USA

23·271 Posts

Quote:
Originally Posted by Bdot View Post
Therefore, to maximize performance with mfakto 0.11 on HD77xx-79xx, set the following in mfakto.ini:

Stages=1
PreferKernel=mfakto_cl_71
VectorSize=2

The next mfakto version will do this automatically ...
There, if you haven't already
2012-06-22, 16:33   #495
Bdot
Nov 2010
Germany

3×199 Posts

Quote:
Originally Posted by dbaugh View Post
I replaced the card with an identical one, except that this one works great! I have been running it 24/7 with three instances to get GPU usage to 99%, with the clocks maxed out at 1125/1575 MHz (core/memory). Apart from the system pulling a steady 500 watts, all is good. I'll run the benchmark soon. Is there optimized code for the 7970?
Quote:
Originally Posted by kracker View Post
There, if you haven't already
Yes, these settings should do. If possible, I'd like to see an mfakto-pi run of your card with VectorSize=2; this should show the highest mfakto rate I've seen so far.

Regarding the (over)clocks you mention: on my slower cards, I've noticed that increasing the memory clock makes no measurable difference apart from more heat, whereas the core clock has a direct, linear impact. Lowering the memory clock to the lowest possible setting allowed me to set the core clock higher.
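Combining the linear core-clock observation with the ~135M/s reported for the HD 7770 gives a rough sanity check on the 400M/s expectation for the 7970. This is a back-of-the-envelope sketch, not a benchmark. It assumes the rate scales linearly with stream processors × core clock (the thread only reports the clock part), ignores memory clock as described above, and uses the HD 7970 reference spec of 2048 stream processors at 925 MHz, which does not come from the thread. The 1125 MHz value is dbaugh's overclock, and the estimate is for the whole card, regardless of how many mfakto instances share it.

```python
# Rough projection only: assumes mfakto's rate scales linearly with
# (stream processors x core clock) and ignores the memory clock entirely.
def project_rate(base_rate: float, base_sp: int, base_mhz: int,
                 target_sp: int, target_mhz: int) -> float:
    return base_rate * (target_sp * target_mhz) / (base_sp * base_mhz)


# Baseline: ~135M/s reported for the HD 7770 (640 SPs at 1000 MHz).
stock_7970 = project_rate(135.0, 640, 1000, 2048, 925)  # about 399.6 M/s
oc_7970 = project_rate(135.0, 640, 1000, 2048, 1125)    # about 486.0 M/s
print(f"{stock_7970:.1f} {oc_7970:.1f}")
```

A measured mfakto-pi run, as requested above, remains the real test. Register pressure and kernel choice can easily break the linear-scaling assumption.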

All times are UTC.




Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.