mfakto: an OpenCL program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=15646)

Bdot 2014-09-09 08:42

[QUOTE=Bdot;382095]
I need to see if the same kernel is selected as before ... Maybe I did something wrong with the kernel precedence for APUs ... I'll check your logs and come back to that separately.
[/QUOTE]
The correct kernel was selected, and everything else also seems to be OK. More detailed testing has shown that a performance improvement for GCN has adverse effects on VLIW5 :rant:

Thanks again, Jayder, for pointing out this issue to me. I'm not yet sure how to address this, though ...

This is in line with another observation regarding the use of double precision: on my HD7950 it improves performance by 5%, while on the HD7850 performance drops by 7%. It looks like a lot of device-dependent #ifdefs will need to go into the kernel files, which I have tried to avoid so far (the IntelHD bugs were the first to require them). I may also need to create a separate device class for the high-end GCNs because of their faster DP performance.

Thank you all for your offers to test ... with these additional changes coming, I think it makes no sense to send out a test version right now.

I'll come back to you ...

VictordeHolland 2014-09-09 13:23

[QUOTE=Bdot;382330]Finally I managed to create a 74-bit kernel that helps straightening out the performance of mfakto when the factor sizes increase (it moves out the big drop one more bit). My HD7950@1100MHz now runs 100M candidates:

bits : GHz-days/day
67-68: 448
68-69: 476
69-70: 459
70-71: 416
71-72: 417
72-73: 418
[COLOR=DarkGreen]73-74: 408[/COLOR] <== the new one, was 361 before
74-82: 361

Attempts of achieving this using a new 5x16-bit kernel or an improved montgomery kernel yielded slow results. The solution is a "4x15-bit + 1x16-bit" kernel ...[/QUOTE]
Great news, well done!

AK76 2014-10-07 13:52

[QUOTE=Bdot;382330]My HD7950@1100MHz now runs 100M candidate.[/QUOTE]

Does 100M mean 2^100M or 100M digits?

snme2pm1 2014-10-08 06:14

[QUOTE=AK76;384579]Does 100M mean 2^100M or 100M digits?[/QUOTE]

The production rate figures quoted for that HD7950 1100MHz are not too far removed from rates for my HD7950 1000MHz working on exponents in the 118 million space, (i.e. 2 power p minus 1).
I've been watching this space anticipating a new release for 74 bit exploration, but can probably only use that out of hours, since I've never been able to adequately duck lack of responsiveness issues.

Bdot 2014-10-08 07:42

[QUOTE=AK76;384579]Does 100M mean 2^100M or 100M digits?[/QUOTE]
I used a 100M exponent for that test.
[QUOTE=snme2pm1;384659]The production rate figures quoted for that HD7950 1100MHz are not too far removed from rates for my HD7950 1000MHz working on exponents in the 118 million space, (i.e. 2 power p minus 1).
I've been watching this space anticipating a new release for 74 bit exploration, but can probably only use that out of hours, since I've never been able to adequately duck lack of responsiveness issues.[/QUOTE]
There are a few parameters you can try to tweak for better responsiveness: low but non-zero FlushInterval (3, 2, 1), lower GPUSieveSize and lower GPUSieveProcessSize should each help. I'd try tweaking them in this order for best responsiveness-gain per performance-loss ratio.

The new release will have to wait a bit longer as I currently have very little time to work on it (and there are still a few things to do).

AK76 2014-10-09 17:42

On my ATI R9 290 I use FlushInterval=0. 3, 2, 1 work much worse than "0".

GPUSieveSize=5 or 6

GPUSieveProcessSize=16

GPUSievePrimes= between 30000 and 80000

For example: today I ran a 70M exponent at bit level 71-72: rate 2500M/s, 430 GHz-days/day.

Soon I will test 100M candidates at different bit levels, to compare my GPU's performance with Bdot's 7950.

VictordeHolland 2014-10-09 19:01

[QUOTE=AK76;384792]On my ATI R9 290 I use FlushInterval=0. 3, 2, 1 work much worse than "0".
[/QUOTE]
Less responsive, or less throughput (or both)?

AK76 2014-10-09 20:27

Less throughput.

Bdot 2014-10-10 22:27

Ah, thanks for that clarification!

Yes, of course, the best throughput is achieved when the GPU is not shared with anything, especially not 3D games or screen updates in general :smile:.

My suggestion was meant towards a more responsive system at the cost of as little as possible throughput.

Regarding your performance measurements: throughput should scale linearly with the GPU clock speed (or shader clock). Memory clock has very little influence.
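
As a rough illustration of the linear-scaling point, here is a minimal Python sketch that rescales the HD7950@1100MHz figures quoted earlier in this thread to a 1000 MHz core clock. It assumes throughput is strictly proportional to the core clock, which is an idealisation; the function name `scale_throughput` is made up for this example and is not part of mfakto.

```python
# Minimal sketch: rescale measured throughput to another core clock.
# Assumption: throughput is strictly proportional to the GPU core clock
# and memory clock is ignored entirely.

def scale_throughput(ghz_days_per_day, measured_mhz, target_mhz):
    """Return the throughput expected at target_mhz under linear scaling."""
    return ghz_days_per_day * target_mhz / measured_mhz


# Figures quoted in this thread for an HD7950 at 1100 MHz (GHz-days/day).
hd7950_at_1100 = {
    "67-68": 448, "68-69": 476, "69-70": 459, "70-71": 416,
    "71-72": 417, "72-73": 418, "73-74": 408, "74-82": 361,
}

# Print the estimated rate per bit level for the same card at 1000 MHz.
for bits, rate in hd7950_at_1100.items():
    print(f"{bits}: {scale_throughput(rate, 1100, 1000):.0f}")
```

Rates measured on two cards of the same type at different clocks are only comparable after this kind of normalisation, and even then the exponent and bit level should match.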

snme2pm1 2014-10-11 08:10

[QUOTE=Bdot;384939]My suggestion was meant towards a more responsive system at the cost of as little as possible throughput.[/QUOTE]

I've read that over several times, but can't convince myself that it cannot be misunderstood entirely.
I recognise that the writer is not necessarily a native English speaker.
I suspect that some minor rewording might have conveyed the intended meaning better.
One such change would be to suggest the intention of a more responsive system at the cost of as little as possible throughput [B]reduction[/B].

Bdot 2014-10-11 12:31

Oh .. I see :blush:. Thank you for trying to extract what I really meant (I wish all forum members did that consistently). I probably wanted to say something like "the responsiveness improvement should cost you as little performance as possible" ...

And you're totally right about English not being my first language. It was actually the third I started to learn.

Thinking about this all again, it would probably be much easier for me to aim for as little throughput as possible than what I was trying over the past few years :gah:


All times are UTC.