[QUOTE=Bdot;382095]
I need to see if the same kernel is selected as before ... Maybe I did something wrong with the kernel precedence for APUs ... I'll check your logs and come back to that separately.[/QUOTE] The correct kernel was selected, and everything else also seems to be OK. More detailed testing has shown that a performance improvement for GCN has adverse effects on VLIW5 :rant: Thanks again, Jayder, for pointing out this issue to me. I'm not yet sure how to address this, though ...

It is in line with another observation regarding the use of double precision: on my HD7950 it improves performance by 5%, while on the HD7850 performance drops by 7%. It looks like a lot of device-dependent #ifdefs will need to go into the kernel files, which I had tried to avoid so far (the IntelHD bugs were the first to require them). I may also need to create a separate device class for the high-end GCNs because of their faster DP performance.

Thank you all for your offers to test ... with these additional changes coming, I think it makes no sense to send out a test version right now. I'll come back to you ...
[QUOTE=Bdot;382330]Finally I managed to create a 74-bit kernel that helps straighten out the performance of mfakto when the factor sizes increase (it moves the big drop out by one more bit). My HD7950@1100MHz now runs 100M candidates:

bits : GHz-days/day
67-68: 448
68-69: 476
69-70: 459
70-71: 416
71-72: 417
72-73: 418
[COLOR=DarkGreen]73-74: 408[/COLOR] <== the new one, was 361 before
74-82: 361

Attempts to achieve this using a new 5x16-bit kernel or an improved Montgomery kernel yielded slow results. The solution is a "4x15-bit + 1x16-bit" kernel ...[/QUOTE] Great news, well done!
[QUOTE=Bdot;382330]My HD7950@1100MHz now runs 100M candidates.[/QUOTE]
Does 100M mean 2^100M or 100M digits?
[QUOTE=AK76;384579]Does 100M mean 2^100M or 100M digits?[/QUOTE]
The production rate figures quoted for that HD7950 at 1100MHz are not far removed from the rates for my HD7950 at 1000MHz working on exponents in the 118 million range (i.e. 2^p - 1). I've been watching this space anticipating a new release for 74-bit exploration, but can probably only use it out of hours, since I've never been able to adequately work around the lack-of-responsiveness issues.
[QUOTE=AK76;384579]Does 100M mean 2^100M or 100M digits?[/QUOTE]
I used a 100M exponent for that test.

[QUOTE=snme2pm1;384659]The production rate figures quoted for that HD7950 at 1100MHz are not far removed from the rates for my HD7950 at 1000MHz working on exponents in the 118 million range (i.e. 2^p - 1). I've been watching this space anticipating a new release for 74-bit exploration, but can probably only use it out of hours, since I've never been able to adequately work around the lack-of-responsiveness issues.[/QUOTE] There are a few parameters you can tweak for better responsiveness: a low but non-zero FlushInterval (3, 2, 1), a lower GPUSieveSize, and a lower GPUSieveProcessSize should each help. I'd try tweaking them in this order for the best ratio of responsiveness gained to throughput lost. The new release will have to wait a bit longer, as I currently have very little time to work on it (and there are still a few things to do).
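A minimal sketch of how those settings might look in mfakto.ini. The parameter names are the real ones discussed in this thread; the specific values shown are illustrative, taken from the suggestions in the surrounding posts rather than from any official defaults:

```ini
# Responsiveness tuning (values from this thread, not universal defaults).
# Try lowering the settings in this order for the best ratio of
# responsiveness gained to throughput lost.

# Low but non-zero: flushes the GPU queue more often so the desktop
# stays responsive; 0 gives the best raw throughput.
FlushInterval=2

# Smaller sieve blocks mean shorter kernel runs between screen updates.
GPUSieveSize=6

# Lower values can also help responsiveness.
GPUSieveProcessSize=16
```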
On my ATI R9 290 I use FlushInterval=0; 3, 2 and 1 all work much worse than 0. My other settings:

GPUSieveSize=5 or 6
GPUSieveProcessSize=16
GPUSievePrimes= between 30000 and 80000

For example, today I ran a 70M exponent at bit level 71-72 at a rate of 2500M/s, i.e. 430 GHz-d/day. Soon I will test 100M candidates at different bit levels, to compare my GPU's performance with Bdot's 7950.
[QUOTE=AK76;384792]On my ATI R9 290 I use FlushInterval=0; 3, 2 and 1 all work much worse than 0.[/QUOTE] Less responsive or less throughput (or both)?
Less throughput.
|
Ah, thanks for that clarification!
Yes, of course, the best throughput is achieved when the GPU is not shared with anything else, especially not 3D games or screen updates in general :smile:. My suggestion was meant towards a more responsive system at the cost of as little as possible throughput. Regarding your performance measurements: throughput should scale linearly with the GPU clock speed (or shader clock); the memory clock has very little influence.
[QUOTE=Bdot;384939]My suggestion was meant towards a more responsive system at the cost of as little as possible throughput.[/QUOTE]
I've read that over several times, but can't convince myself that it cannot be misunderstood. I recognise that English is not necessarily the writer's first language, and I suspect that various minor inflections might have conveyed the intended meaning better. One such change would be to suggest an intention of a more responsive system at the cost of as little as possible throughput [B]reduction[/B].
Oh .. I see :blush:. Thank you for trying to extract what I really meant (I wish all forum members did that consistently). I probably wanted to say something like "the responsiveness improvement should cost you as little performance as possible" ...
And you're totally right about English not being my first language. It was actually the third I started to learn. Thinking about this all again, it would probably be much easier for me to aim for as little throughput as possible than what I was trying over the past few years :gah: |