mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfakto: an OpenCL program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=15646)

Bdot 2012-06-14 15:43

[QUOTE=kracker;301838]Yeah, I believe mine has 600 stream processors on it
[/QUOTE]

It's 400 (@600 MHz) in the HD 6550D / A8 3850, so mfakto guessed it right :smile:

kracker 2012-06-19 19:40

Okay, I just got my 7770 today. For some reason (maybe I'm doing it wrong), performance is only a little better than on my integrated 6550D... (and yes, GPU usage was ~85%)

[code]
mfakto 0.11-Win (64bit build)


Runtime options
Inifile mfakto.ini
SievePrimesMin 10000
SievePrimesMax 200000
SievePrimes 25000
SievePrimesAdjust 1
NumStreams 5
GridSize 4
WorkFile worktodo.txt
ResultsFile results.txt
Checkpoints enabled
CheckpointDelay 300s
Stages enabled
StopAfterFactor class
PrintMode full
V5UserID none
ComputerID none
AllowSleep yes
TimeStampInResults no
VectorSize 4
PreferKernel mfakto_cl_barrett79
SieveOnGPU no
SmallExp no
Compiletime options
SIEVE_SIZE_LIMIT 64kiB
SIEVE_SIZE 482885bits
SIEVE_SPLIT 250
MORE_CLASSES enabled
Select device - Get device info - Compiling kernels ..........

OpenCL device info
name Capeverde (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 1.2 AMD-APP (937.2) (CAL 1.4.1734 (VM))
maximum threads per block 256
maximum threads per grid 16777216
number of multiprocessors 10 (800 compute elements (estimate for ATI GPUs))
clock rate 1000MHz

Automatic parameters
threads per grid 2097152

running a simple selftest ...
########## testcase 1/17 (#2) ##########
########## testcase 2/17 (#25) ##########
########## testcase 3/17 (#39) ##########
########## testcase 4/17 (#57) ##########
########## testcase 5/17 (#70) ##########
########## testcase 6/17 (#72) ##########
########## testcase 7/17 (#73) ##########
########## testcase 8/17 (#82) ##########
########## testcase 9/17 (#88) ##########
########## testcase 10/17 (#106) ##########
########## testcase 11/17 (#355) ##########
########## testcase 12/17 (#358) ##########
########## testcase 13/17 (#666) ##########
########## testcase 14/17 (#1547) ##########
########## testcase 15/17 (#1552) ##########
########## testcase 16/17 (#1556) ##########
########## testcase 17/17 (#1557) ##########
Selftest statistics
number of tests 44
successful tests 44

selftest PASSED!

got assignment: exp=61275631 bit_min=70 bit_max=71
Starting trial factoring M61275631 from 2^70 to 2^71 (3.90GHz-days)
k_min = 9633451350600 - k_max = 19266902705863
Using GPU kernel "barrett15_75"

found a valid checkpoint file!
last finished class was: 989
found 0 factors already

done | ETA | GHz |time/class| #FCs | avg. rate | SieveP. |CPU idle
21.67% | 1h54m | 38.43 | 9.139s | 448.79M | 49.11M/s | 25000 | 53.57%
21.77% | 1h52m | 39.14 | 8.973s | 444.60M | 49.55M/s | 28125 | 51.99%

mfakto will exit once the current class is finished.
press ^C again to exit immediately
21.88% | 1h53m | 38.52 | 9.118s | 440.40M | 48.30M/s | 31640 | 51.31%
[/code]
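
The numbers in this log can be cross-checked with a little arithmetic. The sketch below is not taken from the mfakto source: the helper names `k_range` and `avg_rate_m_per_s` are made up for illustration, and it assumes that factor candidates have the form q = 2*k*p + 1, that MORE_CLASSES means 4620 classes, and that k_min is rounded down to a multiple of the class count. Under those assumptions it should reproduce the k_min/k_max and "avg. rate" values printed above.

```python
# Illustrative cross-check of the log values; not mfakto's actual code.
NUM_CLASSES = 4620  # assumption: class count when MORE_CLASSES is enabled


def k_range(exponent, bit_min, bit_max, num_classes=NUM_CLASSES):
    """Return (k_min, k_max) for candidates q = 2*k*p + 1 between 2^bit_min and 2^bit_max."""
    # q ~ 2*k*p, so k ~ 2^(bits - 1) / p
    k_min = (1 << (bit_min - 1)) // exponent
    k_min -= k_min % num_classes  # assumption: align k_min to a class boundary
    k_max = (1 << (bit_max - 1)) // exponent
    return k_min, k_max


def avg_rate_m_per_s(fcs_millions, seconds_per_class):
    """Factor candidates per second (in millions) for one class."""
    return fcs_millions / seconds_per_class


k_min, k_max = k_range(61275631, 70, 71)
print("k_min =", k_min, "- k_max =", k_max)

# (#FCs in millions, time/class in seconds) from the three status lines
for fcs, secs in [(448.79, 9.139), (444.60, 8.973), (440.40, 9.118)]:
    print(f"{avg_rate_m_per_s(fcs, secs):.2f}M/s")
```

The "avg. rate" column is simply #FCs divided by time/class, so raising SievePrimes lowers #FCs per class without necessarily making the GPU any faster.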

Bdot 2012-06-19 21:34

[QUOTE=kracker;302701]Okay, I just got my 7770 today. For some reason (maybe I'm doing it wrong), performance is only a little better than on my integrated 6550D... (and yes, GPU usage was ~85%)
[/QUOTE]

It's probably not you doing anything wrong, it's mfakto. Could you please send me ~ one minute of 'mfakto-pi -st' output?

I still have to teach mfakto what is good on GCN and what is bad. The kernel it chose is extremely bad. I can probably send you a GCN-optimized version tomorrow, which should roughly double these performance figures.

kracker 2012-06-19 22:49

[QUOTE=Bdot;302705]It's probably not you doing anything wrong, it's mfakto. Could you please send me ~ one minute of 'mfakto-pi -st' output?

I still have to teach mfakto what is good on GCN and what is bad. The kernel it chose is extremely bad. I can probably send you a GCN-optimized version tomorrow, which should roughly double these performance figures.[/QUOTE]

Will do.

P.S.: Oh crap, I forgot to self-test :ouch2:

Bdot 2012-06-20 16:29

GCN aka HD7xxx performance
 
Thanks, kracker for your tests!

It turns out that the GCN-based cards have fewer registers at their disposal. Already at VectorSize=4, the very register-intensive 15-bit kernel spills registers heavily into slow scratchpad memory, crippling performance.

On the other hand, GCN can schedule threads more flexibly, so that big vector sizes are no longer needed for high performance - even the more register-efficient kernels run fastest with VectorSize=2.

Therefore, to maximize performance with mfakto 0.11 on HD77xx-79xx, set the following in mfakto.ini:

Stages=1
PreferKernel=mfakto_cl_71
VectorSize=2

The next mfakto version will do this automatically ...
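
For anyone who prefers to script the change, here is a minimal sketch that applies the three settings to the text of an ini file. It assumes mfakto.ini consists of plain `Key=Value` lines with `#` comments; the function name `apply_settings` and the sample content are hypothetical and only serve as illustration.

```python
# Minimal sketch: set or append Key=Value lines in ini-style text.
# Assumption: plain "Key=Value" lines, "#" starts a comment.
GCN_SETTINGS = {"Stages": "1", "PreferKernel": "mfakto_cl_71", "VectorSize": "2"}


def apply_settings(ini_text, settings):
    """Return ini_text with every key in settings set to the given value."""
    remaining = dict(settings)
    out = []
    for line in ini_text.splitlines():
        stripped = line.strip()
        is_assignment = "=" in stripped and not stripped.startswith("#")
        key = stripped.split("=", 1)[0].strip() if is_assignment else None
        if key in remaining:
            # Overwrite an existing setting in place.
            out.append(f"{key}={remaining.pop(key)}")
        else:
            # Keep comments, blank lines and unrelated settings untouched.
            out.append(line)
    # Append settings that were not present in the file.
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"


sample = "# sample\nStages=0\nVectorSize=4\nSievePrimes=25000\n"
print(apply_settings(sample, GCN_SETTINGS), end="")
```

To use it on a real file, read mfakto.ini into a string, pass it through `apply_settings`, and write the result back (after making a backup).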

Projecting the test results of the 7770 to 7970, 400M/s should easily be surpassed - dbaugh, do you still/again have an operational 7970?

The 7770 is now even 7% faster than the 5770, even though it has fewer GFLOPS (1280 vs. 1360), which indicates improved hardware efficiency.
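
The GFLOPS figures and the 7970 projection can be reproduced with a rough back-of-the-envelope sketch. The stream processor counts and clocks below are reference-card specifications rather than values from this thread (7770: 640 @ 1000 MHz, 5770: 800 @ 850 MHz, 7970: 2048 @ 925 MHz), the ~135M/s rate for the tuned 7770 is the figure reported later in the thread, and the assumption that the mfakto rate scales linearly with stream processors times core clock is only a guess. Note that 10 GCN compute units hold 640 stream processors, not the 800 that mfakto 0.11 estimates in the log above.

```python
# Back-of-the-envelope sketch; card specs are reference values (assumptions).
def peak_gflops(stream_processors, core_mhz):
    """Peak single-precision GFLOPS: 2 FLOPs (one multiply-add) per SP per clock."""
    return stream_processors * 2 * core_mhz / 1000


def project_rate(rate, from_sp, from_mhz, to_sp, to_mhz):
    """Scale a measured rate linearly with stream processors x core clock (assumption)."""
    return rate * (to_sp * to_mhz) / (from_sp * from_mhz)


hd7770 = peak_gflops(640, 1000)
hd5770 = peak_gflops(800, 850)
print(f"HD 7770: {hd7770:.0f} GFLOPS, HD 5770: {hd5770:.0f} GFLOPS")

# ~135M/s on the 7770, projected onto a reference-clocked 7970
projected = project_rate(135, 640, 1000, 2048, 925)
print(f"Projected HD 7970 rate: ~{projected:.0f}M/s")
```

With these inputs the projection lands right around the 400M/s mark, so an overclocked 7970 should clear it if the scaling assumption holds.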

kracker 2012-06-20 19:00

Thanks
 
Thanks Bdot, for everything :smile:

Performance is much better, from ~50M/s to ~135M/s for my Radeon HD 7770
I'll send some benchmarks (the right way) over to you-know-who :lol:

Dubslow 2012-06-21 03:53

Thanks from me too
 
I'm about to steal the file locking code :smile:
I must say I never would have built such an elaborate system; I can see that I still have much to learn :smile:

(PS, compare lines 21 and 25 :razz:)

Bdot 2012-06-21 07:50

[QUOTE=kracker;302804]Thanks Bdot, for everything :smile:

Performance is much better, from ~50M/s to ~135M/s for my Radeon HD 7770
I'll send some benchmarks (the right way) over to you-know-who :lol:[/QUOTE]
Exactly the right thing to do :cool:

[QUOTE=Dubslow;302835]I'm about to steal the file locking code :smile:
I must say I never would have built such an elaborate system; I can see that I still have much to learn :smile:
[/QUOTE]
Go ahead (steal and learn :grin:)

[QUOTE=Dubslow;302835](PS, compare lines 21 and 25 :razz:)[/QUOTE]
That's just to be sure it really is included :wink:

Publishing the source code starts to show added value ... (fixed). :big grin:

dbaugh 2012-06-22 07:57

7970 update
 
I replaced the card with an identical one, except that this one works great! I have been running it 24/7 with three instances to get GPU usage to 99%, with the clocks maxed out at 1125/1575. Except for the system pulling a steady 500 watts, all is good. I'll run the benchmark soon. Is there optimized code for the 7970?

kracker 2012-06-22 13:52

[QUOTE=Bdot;302775]
Therefore, to maximize performance with mfakto 0.11 on HD77xx-79xx, set the following in mfakto.ini:

Stages=1
PreferKernel=mfakto_cl_71
VectorSize=2

The next mfakto version will do this automatically ...
[/QUOTE]

There, if you haven't already :smile:

Bdot 2012-06-22 16:33

[QUOTE=dbaugh;302969]I replaced the card with an identical one, except that this one works great! I have been running it 24/7 with three instances to get GPU usage to 99%, with the clocks maxed out at 1125/1575. Except for the system pulling a steady 500 watts, all is good. I'll run the benchmark soon. Is there optimized code for the 7970?[/QUOTE]

[QUOTE=kracker;302999]There, if you haven't already :smile:[/QUOTE]

Yes, these settings should do. If possible, I'd like to see an mfakto-pi run of your card with VectorSize=2 - this should show the highest mfakto rate I've seen so far.

Regarding the (over)clocks you mention: I've noticed on my slower cards that, apart from more heat, increasing the memory clock makes no measurable difference. The core clock has a direct, linear impact. I could even set the core clock higher after lowering the memory clock to the lowest possible setting.


All times are UTC.