#485 |
|
Nov 2010
Germany
1125₈ Posts |
It's 400 (@600 MHz) in the HD 6550D / A8 3850, so mfakto guessed it right
Last fiddled with by Bdot on 2012-06-14 at 15:45 |
|
|
|
|
|
#486 |
|
"Mr. Meeseeks"
Jan 2012
California, USA
2168₁₀ Posts |
Okay, I just got my 7770 today. For some reason (maybe I'm doing it wrong), performance is only a little better than on my integrated 6550D (and yes, GPU usage was ~85%).
Code:
    mfakto 0.11-Win (64bit build)

    Runtime options
      Inifile                   mfakto.ini
      SievePrimesMin            10000
      SievePrimesMax            200000
      SievePrimes               25000
      SievePrimesAdjust         1
      NumStreams                5
      GridSize                  4
      WorkFile                  worktodo.txt
      ResultsFile               results.txt
      Checkpoints               enabled
      CheckpointDelay           300s
      Stages                    enabled
      StopAfterFactor           class
      PrintMode                 full
      V5UserID                  none
      ComputerID                none
      AllowSleep                yes
      TimeStampInResults        no
      VectorSize                4
      PreferKernel              mfakto_cl_barrett79
      SieveOnGPU                no
      SmallExp                  no

    Compiletime options
      SIEVE_SIZE_LIMIT          64kiB
      SIEVE_SIZE                482885bits
      SIEVE_SPLIT               250
      MORE_CLASSES              enabled

    Select device - Get device info - Compiling kernels ..........

    OpenCL device info
      name                      Capeverde (Advanced Micro Devices, Inc.)
      device (driver) version   OpenCL 1.2 AMD-APP (937.2) (CAL 1.4.1734 (VM))
      maximum threads per block 256
      maximum threads per grid  16777216
      number of multiprocessors 10 (800 compute elements (estimate for ATI GPUs))
      clock rate                1000MHz

    Automatic parameters
      threads per grid          2097152

    running a simple selftest ...
    ########## testcase 1/17 (#2) ##########
    ########## testcase 2/17 (#25) ##########
    ########## testcase 3/17 (#39) ##########
    ########## testcase 4/17 (#57) ##########
    ########## testcase 5/17 (#70) ##########
    ########## testcase 6/17 (#72) ##########
    ########## testcase 7/17 (#73) ##########
    ########## testcase 8/17 (#82) ##########
    ########## testcase 9/17 (#88) ##########
    ########## testcase 10/17 (#106) ##########
    ########## testcase 11/17 (#355) ##########
    ########## testcase 12/17 (#358) ##########
    ########## testcase 13/17 (#666) ##########
    ########## testcase 14/17 (#1547) ##########
    ########## testcase 15/17 (#1552) ##########
    ########## testcase 16/17 (#1556) ##########
    ########## testcase 17/17 (#1557) ##########

    Selftest statistics
      number of tests           44
      successful tests          44
    selftest PASSED!

    got assignment: exp=61275631 bit_min=70 bit_max=71
    Starting trial factoring M61275631 from 2^70 to 2^71 (3.90GHz-days)
     k_min = 9633451350600 - k_max = 19266902705863
    Using GPU kernel "barrett15_75"
    found a valid checkpoint file!
      last finished class was: 989
      found 0 factors already

      done |   ETA |   GHz |time/class|    #FCs | avg. rate | SieveP. |CPU idle
    21.67% | 1h54m | 38.43 |   9.139s | 448.79M |  49.11M/s |   25000 | 53.57%
    21.77% | 1h52m | 39.14 |   8.973s | 444.60M |  49.55M/s |   28125 | 51.99%
    mfakto will exit once the current class is finished.
      press ^C again to exit immediately
    21.88% | 1h53m | 38.52 |   9.118s | 440.40M |  48.30M/s |   31640 | 51.31%

Last fiddled with by kracker on 2012-06-19 at 19:40 |
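For anyone reading the status lines in the log above, here is a minimal sketch of how the `avg. rate` and `ETA` columns appear to relate to the other columns. It assumes that `avg. rate` is simply `#FCs` divided by `time/class`, and that 960 classes are tested per assignment when MORE_CLASSES is enabled. Both are assumptions that happen to match the three printed rows, not something confirmed from the mfakto source, and the function names are made up for illustration.

```python
# Assumption: with MORE_CLASSES enabled, 960 classes are tested per assignment.
CLASSES_TO_TEST = 960


def avg_rate_m_per_s(fcs_millions, time_per_class_s):
    """Factor candidates per second (in millions) for one class."""
    return fcs_millions / time_per_class_s


def eta_seconds(fraction_done, time_per_class_s, classes=CLASSES_TO_TEST):
    """Remaining time if every remaining class takes time_per_class_s."""
    return (1.0 - fraction_done) * classes * time_per_class_s


def format_eta(seconds):
    """Format seconds as e.g. '1h54m', truncating to whole minutes."""
    minutes = int(seconds // 60)
    return f"{minutes // 60}h{minutes % 60:02d}m"


# (fraction done, time/class in s, #FCs in millions), copied from the log above
rows = [
    (0.2167, 9.139, 448.79),
    (0.2177, 8.973, 444.60),
    (0.2188, 9.118, 440.40),
]

for done, t_class, fcs in rows:
    rate = avg_rate_m_per_s(fcs, t_class)
    eta = format_eta(eta_seconds(done, t_class))
    print(f"{done:.2%} | {eta} | {rate:.2f}M/s")
```

Under these assumptions the script reproduces the logged rate and ETA values, which suggests the slow throughput is a per-class kernel cost and not a reporting artifact.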
|
|
|
|
|
#487 | |
|
Nov 2010
Germany
3·199 Posts |
Quote:
I still have to teach mfakto what is good on GCN and what is bad. The kernel it chose is extremely bad. I can probably send you a GCN-optimized version tomorrow, which should double these performance figures. |
|
|
|
|
|
|
#488 | |
|
"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts |
Quote:
P.S.: Oh crap, I forgot to self-test.
|
|
|
|
|
|
|
#489 |
|
Nov 2010
Germany
3·199 Posts |
Thanks, kracker, for your tests!
It turns out that the GCN-based cards have fewer registers at their disposal. Already at VectorSize=4, the very register-intensive 15-bit kernel spills registers badly into slow scratchpad memory, crippling performance. On the other hand, GCN can schedule threads more flexibly, so big vector sizes are no longer needed for high performance: even the more register-efficient kernels run fastest with VectorSize=2.
Therefore, to maximize performance with mfakto 0.11 on HD77xx-79xx, set the following in mfakto.ini:
Stages=1
PreferKernel=mfakto_cl_71
VectorSize=2
The next mfakto version will do this automatically ...
Projecting the test results of the 7770 to the 7970, 400M/s should easily be surpassed. dbaugh, do you still/again have an operational 7970?
The 7770 is now even 7% faster than the 5770, even though it has fewer GFLOPS (1280 vs. 1360), which indicates improved hardware efficiency.
Last fiddled with by Bdot on 2012-06-20 at 16:39 Reason: comparison |
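If you would rather script the change than edit the file by hand, the following is a minimal sketch that rewrites the three settings quoted above. It assumes mfakto.ini consists of plain `Key=Value` lines with `#` comments; the helper name `apply_settings` is hypothetical and not part of mfakto.

```python
import re

# The three values recommended in the post for mfakto 0.11 on HD77xx-79xx.
GCN_SETTINGS = {
    "Stages": "1",
    "PreferKernel": "mfakto_cl_71",
    "VectorSize": "2",
}


def apply_settings(ini_text, settings):
    """Return ini_text with each key in `settings` set to its new value.

    Assumption: the file uses plain `Key=Value` lines; commented-out
    lines (starting with '#') are left untouched.
    """
    seen = set()
    out = []
    for line in ini_text.splitlines():
        match = re.match(r"\s*([A-Za-z0-9_]+)\s*=", line)
        if match and match.group(1) in settings:
            key = match.group(1)
            out.append(f"{key}={settings[key]}")
            seen.add(key)
        else:
            out.append(line)
    # Append keys that were not present in the file at all.
    for key, value in settings.items():
        if key not in seen:
            out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "# my settings\nStages=0\n#VectorSize=8\nVectorSize=4\nGridSize=4\n"
    print(apply_settings(sample, GCN_SETTINGS), end="")
```

To use it on a real file, read mfakto.ini into a string, pass it through `apply_settings`, and write the result back after keeping a backup copy.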
|
|
|
|
|
#490 |
|
"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts |
Thanks, Bdot, for everything!
Performance is much better: from ~50M/s to ~135M/s on my Radeon HD 7770. I'll send some benchmarks (done the right way) over to you-know-who.
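As a rough cross-check of the numbers in this exchange, here is a small sketch that computes peak single-precision GFLOPS and naively projects the ~135M/s measured on the 7770 to a 7970. The shader counts and stock clocks are reference specifications, not figures from this thread, and the assumption that throughput scales linearly with shaders × core clock is only a first-order guess.

```python
# (stream processors, stock core clock in MHz); reference specs, not from the thread
CARDS = {
    "HD 5770": (800, 850),
    "HD 7770": (640, 1000),
    "HD 7970": (2048, 925),
}


def peak_gflops(shaders, clock_mhz):
    """Peak single-precision GFLOPS, counting one multiply-add as 2 FLOPs."""
    return 2 * shaders * clock_mhz / 1000


def project_rate(measured_rate, measured_card, target_card, target_clock_mhz=None):
    """Scale a measured rate linearly by peak GFLOPS (a simplifying assumption)."""
    src_shaders, src_clock = CARDS[measured_card]
    dst_shaders, dst_clock = CARDS[target_card]
    if target_clock_mhz is not None:
        dst_clock = target_clock_mhz  # e.g. an overclocked core
    scale = peak_gflops(dst_shaders, dst_clock) / peak_gflops(src_shaders, src_clock)
    return measured_rate * scale


for name, (shaders, clock) in CARDS.items():
    print(f"{name}: {peak_gflops(shaders, clock):.0f} GFLOPS")

stock = project_rate(135.0, "HD 7770", "HD 7970")
overclocked = project_rate(135.0, "HD 7770", "HD 7970", target_clock_mhz=1125)
print(f"7970 projection at stock clock: {stock:.0f}M/s")
print(f"7970 projection at 1125 MHz:    {overclocked:.0f}M/s")
```

The GFLOPS figures for the 7770 and 5770 come out at the 1280 and 1360 mentioned earlier in the thread, and the naive projection lands near the 400M/s mark; real results will depend on kernel choice and how well the sieve keeps the GPU fed.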
|
|
|
|
|
|
#491 |
|
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3×29×83 Posts |
I'm about to steal the file locking code.
I must say I never would have built such an elaborate system; I can see that I still have much to learn. (PS: compare lines 21 and 25.)
Last fiddled with by Dubslow on 2012-06-21 at 03:56 Reason: ytop |
|
|
|
|
|
#492 | ||
|
Nov 2010
Germany
1001010101₂ Posts |
Quote:
Quote:
That's just to be sure it really is included. Publishing the source code starts to show added value ... (fixed).
Last fiddled with by Bdot on 2012-06-21 at 07:52 Reason: missed smiley in the last line! |
||
|
|
|
|
|
#493 |
|
Aug 2005
2×59 Posts |
I replaced the card with an identical one, except that this one works great! I have been running it 24/7 with three instances to get the GPU load to 99% and with the clocks maxed out at 1125/1575. Except for the system pulling a steady 500 watts, all is good. I'll run the benchmark soon. Is there optimized code for the 7970?
|
|
|
|
|
|
#494 |
|
"Mr. Meeseeks"
Jan 2012
California, USA
2³×271 Posts |
|
|
|
|
|
|
#495 | |
|
Nov 2010
Germany
3·199 Posts |
Quote:
Regarding the (over)clocks you mention: I've noticed on my slower cards that, apart from more heat, increasing the memory clock makes no measurable difference. The core clock has a direct and linear impact. I could set the core clock to a higher value after lowering the memory clock to the lowest possible setting. |
|
|
|
|