[QUOTE=kracker;301838]Yeah, I believe mine has 600 stream processors on it
[/QUOTE] It's 400 (@600 MHz) in the HD 6550D / A8 3850, so mfakto guessed it right :smile: |
Okay, I just got my 7770 today. For some reason (maybe I'm doing it wrong), performance is only a little better than on my integrated 6550D... (and yes, GPU usage was ~85%)
[code]
mfakto 0.11-Win (64bit build)

Runtime options
  Inifile                   mfakto.ini
  SievePrimesMin            10000
  SievePrimesMax            200000
  SievePrimes               25000
  SievePrimesAdjust         1
  NumStreams                5
  GridSize                  4
  WorkFile                  worktodo.txt
  ResultsFile               results.txt
  Checkpoints               enabled
  CheckpointDelay           300s
  Stages                    enabled
  StopAfterFactor           class
  PrintMode                 full
  V5UserID                  none
  ComputerID                none
  AllowSleep                yes
  TimeStampInResults        no
  VectorSize                4
  PreferKernel              mfakto_cl_barrett79
  SieveOnGPU                no
  SmallExp                  no

Compiletime options
  SIEVE_SIZE_LIMIT          64kiB
  SIEVE_SIZE                482885bits
  SIEVE_SPLIT               250
  MORE_CLASSES              enabled

Select device - Get device info - Compiling kernels ..........

OpenCL device info
  name                      Capeverde (Advanced Micro Devices, Inc.)
  device (driver) version   OpenCL 1.2 AMD-APP (937.2) (CAL 1.4.1734 (VM))
  maximum threads per block 256
  maximum threads per grid  16777216
  number of multiprocessors 10 (800 compute elements (estimate for ATI GPUs))
  clock rate                1000MHz

Automatic parameters
  threads per grid          2097152

running a simple selftest ...
########## testcase 1/17 (#2) ##########
########## testcase 2/17 (#25) ##########
########## testcase 3/17 (#39) ##########
########## testcase 4/17 (#57) ##########
########## testcase 5/17 (#70) ##########
########## testcase 6/17 (#72) ##########
########## testcase 7/17 (#73) ##########
########## testcase 8/17 (#82) ##########
########## testcase 9/17 (#88) ##########
########## testcase 10/17 (#106) ##########
########## testcase 11/17 (#355) ##########
########## testcase 12/17 (#358) ##########
########## testcase 13/17 (#666) ##########
########## testcase 14/17 (#1547) ##########
########## testcase 15/17 (#1552) ##########
########## testcase 16/17 (#1556) ##########
########## testcase 17/17 (#1557) ##########

Selftest statistics
  number of tests           44
  successful tests          44

selftest PASSED!

got assignment: exp=61275631 bit_min=70 bit_max=71
Starting trial factoring M61275631 from 2^70 to 2^71 (3.90GHz-days)
 k_min = 9633451350600 - k_max = 19266902705863
Using GPU kernel "barrett15_75"

found a valid checkpoint file!
  last finished class was: 989
  found 0 factors already

    done |    ETA |   GHz |time/class|    #FCs | avg. rate | SieveP. |CPU idle
  21.67% |  1h54m | 38.43 |   9.139s | 448.79M |  49.11M/s |   25000 |  53.57%
  21.77% |  1h52m | 39.14 |   8.973s | 444.60M |  49.55M/s |   28125 |  51.99%
mfakto will exit once the current class is finished.
press ^C again to exit immediately
  21.88% |  1h53m | 38.52 |   9.118s | 440.40M |  48.30M/s |   31640 |  51.31%
[/code] |
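For anyone reading the progress table in that log: the "avg. rate" column appears to be simply "#FCs" divided by "time/class". That reading is an assumption, not something taken from the mfakto source, but the three rows above are consistent with it. A small Python sketch to check (the function name `avg_rate` is made up for illustration):

```python
# Rows copied from the log above:
# (#FCs in millions, time per class in seconds, reported avg. rate in M/s)
rows = [
    (448.79, 9.139, 49.11),
    (444.60, 8.973, 49.55),
    (440.40, 9.118, 48.30),
]

def avg_rate(fcs_millions, seconds):
    """Factor candidates tested per second, in millions (assumed definition)."""
    return fcs_millions / seconds

for fcs, secs, reported in rows:
    computed = avg_rate(fcs, secs)
    print(f"computed {computed:.2f} M/s, log reports {reported:.2f} M/s")
```

So when comparing cards or settings, the "avg. rate" figure alone is enough; it already folds in the class time.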
[QUOTE=kracker;302701]Okay, I just got my 7770 today, for some reason (maybe I'm doing it wrong) performance is only a little better compared to my integrated 6550D... (and yes gpu usage was 85%~)
[/QUOTE] It's probably not you doing anything wrong, it's mfakto. Could you please send me about one minute of 'mfakto-pi -st' output? I still have to teach mfakto what is good on GCN and what is bad. The kernel it chose is extremely bad. I can probably send you a GCN-optimized version tomorrow, which should double these performance figures. |
[QUOTE=Bdot;302705]It's probably not you doing anything wrong, it's mfakto. Could you please send me ~ one minute of 'mfakto-pi -st' output?
I still have to teach mfakto what is good on GCN and what is bad. The kernel it chose is extremely bad. I can probably send you a GCN-optimized version tomorrow, which should have doubled these performance figures.[/QUOTE] Will do. P.S.: Oh Crap. forgot to self-test :ouch2: |
GCN aka HD7xxx performance
Thanks, kracker, for your tests!
It turns out that the GCN-based cards have fewer registers at their disposal. Already at VectorSize=4, the very register-intensive 15-bit kernel spills registers badly into slow scratchpad memory, crippling performance. On the other hand, GCN can schedule threads more flexibly, so that big vector sizes are no longer needed for high performance - even the more register-efficient kernels run fastest with VectorSize=2.
Therefore, to maximize performance with mfakto 0.11 on HD77xx-79xx, set the following in mfakto.ini:
Stages=1
PreferKernel=mfakto_cl_71
VectorSize=2
The next mfakto version will do this automatically ...
Projecting the test results of the 7770 to the 7970, 400M/s should easily be surpassed - dbaugh, do you still/again have an operational 7970?
The 7770 is now even 7% faster than the 5770, even though it has fewer GFLOPS (1280 vs. 1360) - which indicates improved hardware efficiency. |
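For reference, the two GFLOPS figures quoted above can be reproduced from the usual peak formula (stream processors x 2 FLOPs per clock x core clock). The shader counts and the 5770 clock below are reference-card specs assumed for this sketch, not values from the thread; only the 7770's 1000 MHz appears in kracker's log. `peak_gflops` is a hypothetical helper name.

```python
def peak_gflops(stream_processors, core_mhz):
    """Theoretical single-precision peak: 2 FLOPs (one multiply-add) per SP per clock."""
    return stream_processors * 2 * core_mhz / 1000

# Assumed reference specs: HD 7770 = 640 SPs @ 1000 MHz, HD 5770 = 800 SPs @ 850 MHz
hd7770 = peak_gflops(640, 1000)
hd5770 = peak_gflops(800, 850)

# The post reports the 7770 as 7% faster in mfakto despite the lower peak,
# so throughput per peak GFLOPS improves by roughly this factor:
per_gflops_gain = 1.07 * hd5770 / hd7770

print(hd7770, hd5770, round(per_gflops_gain, 3))
```

Under these assumptions the 7770 gets roughly 14% more mfakto throughput per theoretical GFLOPS than the 5770, which is one way to put a number on the "improved hardware efficiency" remark.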
Thanks
Thanks Bdot, for everything :smile:
Performance is much better, from ~50M to ~135M for my Radeon HD 7770 I'll send some benchmarks (the right way) over to you-know-who :lol: |
Thanks from me too
I'm about to steal the file locking code :smile:
I must say I never would have done such an elaborate system, I can see that I still have much to learn :smile: (PS, compare lines 21 and 25 :razz:) |
[QUOTE=kracker;302804]Thanks Bdot, for everything :smile:
Performance is much better, from ~50M to ~135M for my Radeon HD 7770 I'll send some benchmarks (the right way) over to you-know-who :lol:[/QUOTE]
exactly the right thing to do :cool:
[QUOTE=Dubslow;302835]I'm about to steal the file locking code :smile: I must say I never would have done such an elaborate system, I can see that I still have much to learn :smile: [/QUOTE]
go ahead (steal and learn :grin:)
[QUOTE=Dubslow;302835](PS, compare lines 21 and 25 :razz:)[/QUOTE]
That's just to be sure it really is included :wink: Publishing the source code starts to show added value ... (fixed). :big grin: |
7970 update
I replaced the card with an identical card with the exception that this one works great! I have been running it 24/7 with three instances to get the GPU% to 99 and with the clocks maxed out at 1125/1575. Except for the system pulling a steady 500 watts, all is good. I'll run the benchmark soon. Is there optimized code for the 7970?
|
[QUOTE=Bdot;302775]
Therefore, to maximize performance with mfakto 0.11 on HD77xx-79xx, set the following in mfakto.ini:
Stages=1
PreferKernel=mfakto_cl_71
VectorSize=2
The next mfakto version will do this automatically ... [/QUOTE]
There, if you haven't already :smile: |
[QUOTE=dbaugh;302969]I replaced the card with an identical card with the exception that this one works great! I have been running it 24/7 with three instances to get the GPU% to 99 and with the clocks maxed out at 1125/1575. Except for the system pulling a steady 500 watts, all is good. I'll run the benchmark soon. Is there optimized code for the 7970?[/QUOTE]
[QUOTE=kracker;302999]There, if you haven't already :smile:[/QUOTE] Yes, these settings should do. If possible, I'd like to see an mfakto-pi run of your card with VectorSize=2 - this should show the highest mfakto rate I've seen so far.
Regarding the (over)clocks you mention: I've noticed on my slower cards that, apart from more heat, increasing the memory clock makes no measurable difference. Core clock has a direct and linear impact. I could set the core clock to a higher value when lowering the memory clock to the lowest possible setting. |
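A rough back-of-the-envelope projection for the 7970, as a sketch only. It assumes the mfakto rate scales linearly with both shader count and core clock (the clock part follows the observation above; the shader-count part is an assumption), and it starts from kracker's ~135M/s on the 7770 at 1000 MHz. The shader counts (640 and 2048) and the 925 MHz stock clock are reference-card specs assumed here; 1125 MHz is dbaugh's reported core clock. `projected_rate` is a hypothetical helper.

```python
def projected_rate(base_rate, base_sps, base_mhz, target_sps, target_mhz):
    """Scale a measured rate linearly by shader count and core clock (assumption)."""
    return base_rate * (target_sps * target_mhz) / (base_sps * base_mhz)

BASE_RATE = 135.0                  # M/s, kracker's HD 7770 with the GCN settings
BASE_SPS, BASE_MHZ = 640, 1000     # assumed 7770 reference specs

stock = projected_rate(BASE_RATE, BASE_SPS, BASE_MHZ, 2048, 925)
overclocked = projected_rate(BASE_RATE, BASE_SPS, BASE_MHZ, 2048, 1125)

print(f"7970 @ 925 MHz:  ~{stock:.0f} M/s")
print(f"7970 @ 1125 MHz: ~{overclocked:.0f} M/s")
```

Under these assumptions the stock projection lands right around the 400M/s mark mentioned earlier, and the overclocked card should clear it comfortably. Real numbers will depend on CPU sieving keeping up, so treat this as an upper-bound estimate.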