mersenneforum.org

mfakto: an OpenCL program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=15646)

KyleAskine 2013-05-16 00:30

I am going to start it tonight, and will let everyone know at around 5am tomorrow morning (EDT).

KyleAskine 2013-05-16 09:18

Cayman (6950 with BIOS changed to 6970):

Selftest statistics
number of tests 318416
successful tests 318416

selftest PASSED!

Bdot 2013-05-16 09:31

[QUOTE=KyleAskine;340651]Cayman (6950 with BIOS changed to 6970):

Selftest statistics
number of tests 318416
successful tests 318416

selftest PASSED![/QUOTE]

Very nice :cool:, thanks!

I will do some "paperwork" and packaging for releasing v0.13. Until then, 0.13pre5 should be stable enough for "productive work".

KyleAskine 2013-05-16 10:23

Also, just so you know, it passed on my 6570 as well (but it took 15 hours(!!))!


Selftest statistics
number of tests 318416
successful tests 318416

selftest PASSED!

kracker 2013-05-16 15:06

Passed on my 7770.

Selftest statistics
number of tests 287358
successful tests 287358

selftest PASSED!

EDIT: Strange, did I miss some tests?

Bdot 2013-05-16 18:41

[QUOTE=kracker;340685]Passed on my 7770.

Selftest statistics
number of tests 287358
successful tests 287358

selftest PASSED!

EDIT: Strange, did I miss some tests?[/QUOTE]
Thanks for your results as well. No, you did not miss anything. You tested with GPU sieving, Kyle tested with CPU sieving. There are three less relevant kernels that I did not enable for GPU sieving. One is used for TF from 2^58 to 2^60, which is the only range where my Montgomery implementation shines. The others are older kernels that are too slow to be selected in any real TF. I should probably remove them.

BTW, I successfully tested both GPU and CPU sieving for both HD5770 and HD7850.

On the 5770, I found GPUSievePrimes of 60k-70k to be the optimum. The 7850 seems to peak at ~110k. This also shows that the GPU sieve on VLIW5 is not very efficient yet (only 40% occupation on average). But this is something for the next mfakto version(s).
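Why there is an optimum at all can be seen in a toy model. The sketch below is not mfakto code; it is a minimal Python illustration that assumes GPUSievePrimes roughly means "sieve the candidates with the first N small primes". The function names (small_primes, surviving_candidates, find_factors) are made up for this example. Factor candidates of M_p = 2^p - 1 have the form q = 2kp + 1 with q = ±1 (mod 8); the sieve discards those divisible by a small prime, and only the survivors get the expensive test 2^p mod q == 1.

```python
def small_primes(n):
    """Return the first n primes (plain trial division, fine for a demo)."""
    primes = []
    c = 2
    while len(primes) < n:
        if all(c % q for q in primes if q * q <= c):
            primes.append(c)
        c += 1
    return primes


def surviving_candidates(p, k_max, sieve_primes):
    """Candidates q = 2*k*p + 1 (k = 1..k_max) that pass the cheap filters."""
    survivors = []
    for k in range(1, k_max + 1):
        q = 2 * k * p + 1
        # Any factor of 2^p - 1 (p an odd prime) is +1 or -1 mod 8.
        if q % 8 not in (1, 7):
            continue
        # Sieve step: drop q if a small prime divides it (q itself may be small).
        if any(q % s == 0 and q != s for s in sieve_primes):
            continue
        survivors.append(q)
    return survivors


def find_factors(p, k_max, n_sieve_primes):
    """Run the expensive modular test only on sieve survivors."""
    sieve = small_primes(n_sieve_primes)
    return [q for q in surviving_candidates(p, k_max, sieve)
            if pow(2, p, q) == 1]


# M11 = 2047 = 23 * 89
print(find_factors(11, 4, 10))
```

More sieve primes leave fewer candidates for the TF kernels but cost more sieve work, so each GPU ends up with its own best value.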

kracker 2013-05-16 18:47

[QUOTE=Bdot;340703]Thanks for your results as well. No, you did not miss anything. You tested with GPU sieving, Kyle tested with CPU sieving. There are three less relevant kernels that I did not enable for GPU sieving. One is used for TF from 2^58 to 2^60, which is the only range where my Montgomery implementation shines. The others are older kernels that are too slow to be selected in any real TF. I should probably remove them.

BTW, I successfully tested both GPU and CPU sieving for both HD5770 and HD7850.

On the 5770, I found GPUSievePrimes of 60k-70k to be the optimum. The 7850 seems to peak at ~110k. This also shows that the GPU sieve on VLIW5 is not very efficient yet (only 40% occupation on average). But this is something for the next mfakto version(s).[/QUOTE]

On my APU, ~60k seems to be best. I still have to tinker with my 7770... Are you planning more improvements, or is it benchmark time? :smile::razz:

Bdot 2013-05-16 19:07

[QUOTE=kracker;340704]On my APU, ~60k seems to be best. I still have to tinker with my 7770... Are you planning more improvements, or is it benchmark time? :smile::razz:[/QUOTE]
I found a small, cheap improvement for VLIW5 (e.g. your APU), but this is the last change I will make to the TF part. I'm now checking a few of the other features and have already fixed the wait time display for CPU sieving.

You can start tweaking your 7770 and send the benchmark to James :grin:, the last build is just around the corner and will look the same for GCN.

kracker 2013-05-16 20:15

[QUOTE=Bdot;340706]I found a small, cheap improvement for VLIW5 (e.g. your APU), but this is the last change I will make to the TF part. I'm now checking a few of the other features and have already fixed the wait time display for CPU sieving.

You can start tweaking your 7770 and send the benchmark to James :grin:, the last build is just around the corner and will look the same for GCN.[/QUOTE]

Looks like 10k is best for my 7770. Do I need to set it to 82485 for the benchmark, though? James's benchmark form says: "(set GPUSievePrimes=82485)"

Bdot 2013-05-16 21:04

[QUOTE=kracker;340714]Looks like 10k is best for my 7770. Do I need to set it to 82485 for the benchmark, though? James's benchmark form says: "(set GPUSievePrimes=82485)"[/QUOTE]

10k or 110k? If it's really 10k, then something is wrong ... Which TF test are you running, what are GPUSieveSize and GPUSieveProcessSize set to, and what's the reported GHz-days/day? I think you should be getting around 140-145 GHz-days/day for M62... 2^70 to 2^73, or 125-130 for M62... 2^73 to 2^82. Yes, GCN has a 10% performance step right there.

And for the reporting to James ... I don't see why you should not use the fastest possible value. Would be crippling devices that have their optimal point somewhere else ... Maybe James can comment on that.
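To put a number on that step, here is a quick back-of-the-envelope check in Python. It only uses the two throughput ranges quoted above and takes their midpoints, which is an assumption made for illustration; percent_drop is a made-up helper name.

```python
def percent_drop(before, after):
    """Relative loss in throughput, in percent."""
    return (before - after) / before * 100.0


below_73 = (140 + 145) / 2   # GHz-days/day, 2^70 to 2^73 (midpoint)
above_73 = (125 + 130) / 2   # GHz-days/day, 2^73 to 2^82 (midpoint)

print(round(percent_drop(below_73, above_73), 1))  # 10.5
```

So the quoted ranges are consistent with a step of roughly 10%.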

kracker 2013-05-16 21:11

[QUOTE=Bdot;340721]10k or 110k? If it's really 10k, then something is wrong ... Which TF test are you running, what are GPUSieveSize and GPUSieveProcessSize set to, and what's the reported GHz-days/day? I think you should be getting around 140-145 GHz-days/day for M62... 2^70 to 2^73, or 125-130 for M62... 2^73 to 2^82. Yes, GCN has a 10% performance step right there.[/QUOTE]

100k. Sorry.:cry:

And yes, at ~M62, I'm getting 148 GHz-days/day.

EDIT: When I go to the 332M range, 73 to 74 bits, it drops to ~120 GHz-days/day...


All times are UTC.