[QUOTE=axn;408581]I guess it doesn't hurt to ask. Are you using GPU Sieve or CPU Sieve?[/QUOTE]
I am using the GPU sieve. It is possible this is an issue with the AMD Catalyst 15.7 beta driver. When I get a moment, I will test with a Windows image for comparison. |
The only thing that I could imagine here is that the delay between sending results for a block from the GPU to the CPU and receiving the order for the next block increases a lot when the PCIe speed drops. As you are running very short assignments, this may have a big impact.
If that is the case, the GPU load should be rather low, scaled down along with the GHzDays/day. Could you please run two separate tests: [LIST=1][*]Use longer tests, like 72 to 73 bits. This should reduce the effect (note that a different, slower kernel will be used, so you may not reach 1000 GHzDays/day even on the fast PCIe).[*]Run multiple instances of mfakto for the same device (e.g. from different directories using different worktodo.txt files). Maybe in total you can achieve 1000 GHzDays/day this way even at lower PCIe speeds?[/LIST]BTW, the number of streams only plays a role when using the CPU sieve. When mfakto starts with GPU sieving, it displays the settings that it takes into account in this mode. I would also suggest playing around with FlushInterval: if the card is that fast, the queue may run empty. Setting higher values (or zero to disable chunking) should help. |
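For the second test, a minimal sketch of how the existing assignments could be dealt out to several instance directories, each with its own worktodo.txt. This is only a helper script and not part of mfakto; the function name and the directory names are made up for illustration, and it assumes one assignment per non-empty line.

```python
from pathlib import Path
from typing import Dict, List


def split_worktodo(source: Path, instance_dirs: List[Path]) -> Dict[Path, int]:
    """Deal the non-empty lines of `source` round-robin into one
    worktodo.txt per instance directory and return the line count per directory.

    Hypothetical helper: the directory layout is an assumption, not an mfakto feature.
    """
    assignments = [ln.strip() for ln in source.read_text().splitlines() if ln.strip()]
    counts: Dict[Path, int] = {}
    for i, directory in enumerate(instance_dirs):
        directory.mkdir(parents=True, exist_ok=True)
        # Every len(instance_dirs)-th line, starting at offset i, goes to instance i.
        share = assignments[i::len(instance_dirs)]
        (directory / "worktodo.txt").write_text("".join(a + "\n" for a in share))
        counts[directory] = len(share)
    return counts
```

Calling `split_worktodo(Path("worktodo.txt"), [Path("inst0"), Path("inst1")])` would leave two directories from which separate mfakto instances can be started by hand. The original worktodo.txt is not modified, so it should be moved away afterwards to avoid running the same assignments twice.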
Is Fury X supported by the latest mfakto (0.15pre5)? Or in linux with (0.14) ?
|
[QUOTE=UBR47K;409720]Is Fury X supported by the latest mfakto (0.15pre5)? Or in linux with (0.14) ?[/QUOTE]
0.15 does not support anything yet - it's not ready to do real tasks. 0.14 can run on Fury X, though it will say that it does not know the chip. It will select GCN optimization which fits well (as far as I can tell - I have not yet had a chance to play around with that card). |
[QUOTE=Bdot;409734]0.15 does not support anything yet - it's not ready to do real tasks.
0.14 can run on Fury X, though it will say that it does not know the chip. It will select GCN optimization which fits well (as far as I can tell - I have not yet had a chance to play around with that card).[/QUOTE] I can offer to run tests on it if needed. |
After more than 1.5 years on AMD Catalyst 13.12 I finally updated my drivers to 15.7.1.
Win7 64bit HD7950 -st:
[code]
Selftest statistics
  number of tests           3092
  successful tests          3092

selftest PASSED!
[/code]
Always a relief to see it working after an update. |
[QUOTE=UBR47K;409735]I can offer to run tests on it if needed.[/QUOTE]
That would indeed be very helpful. If you have windows, please run the perftestmfakto.cmd from [url]http://mersenneforum.org/mfakto/mfakto-0.15pre5/[/url] If you have Linux, I'd need to prepare the binary for it first ... |
[QUOTE=VictordeHolland;409825]Always a relief to see it working after an update.[/QUOTE]
:smile: yes, we've seen surprises. I run a Windows box on 12.8 (the last version where I can use assembly language for the GPU), another on some 13.x and a Linux box currently on latest and greatest ... |
[QUOTE=Bdot;410047]That would indeed be very helpful. If you have windows, please run the perftestmfakto.cmd from [url]http://mersenneforum.org/mfakto/mfakto-0.15pre5/[/url]
If you have Linux, I'd need to prepare the binary for it first ...[/QUOTE] I am using Linux here |
I've been using a specific stable commit build of 0.15 on Fury X cards on Linux with great success. I ran side by side with 0.14 and the 0.15pre5 build and both pass both normal and extended self tests and hit pretty close to the expected found factor percentages. 0.15 is definitely a bit faster though :)
I'm still having trouble on systems with fewer PCIe lanes but I have been too busy with work to investigate further yet. |
My adventure into GPU assembly programming is over before it really started. My 7950 died (the VRMs did, to be specific). As a replacement I ordered an R9 380, also as an incentive to do some model-refresh in mfakto. The new card, however, is not recognized by the ancient driver, so I moved to 15.7.1 as well - no bad surprises so far.
However, I cannot see the int32 improvements that some owners of an R9 285 (which should be the same "Tonga" chip) have reported. For me, the usual GCN selection works well. To be sure about that, I created a version that performance-tests each kernel for each TF job to find the fastest one for the exponent and bit level. I think I will keep this test as an option, and persist and re-use the results. That should allow mfakto to adapt to any upcoming development of APUs, GPUs, and whatever, across vendors. BTW, the selftest failure of the latest code is caused by an incomplete merge from mfaktc: I merged the test case for very small exponents, but the code to handle them correctly is not yet in. |
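The per-job kernel test with persisted results could look roughly like the sketch below. It is not the mfakto implementation (mfakto is C/OpenCL), just an illustration of the idea in Python: the kernel names, the JSON cache file and the `pick_fastest_kernel` function are all assumptions made up for this example.

```python
import json
import time
from pathlib import Path
from typing import Callable, Dict


def pick_fastest_kernel(
    kernels: Dict[str, Callable[[int, int], None]],
    exponent: int,
    bit_level: int,
    cache_file: Path,
    timer: Callable[[], float] = time.perf_counter,
) -> str:
    """Return the name of the fastest kernel for (exponent, bit_level).

    A previously stored result is re-used; otherwise every candidate is
    timed once and the winner is written back to the cache file.
    """
    key = f"{exponent}:{bit_level}"
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    if cache.get(key) in kernels:
        return cache[key]  # re-use the persisted result, no benchmarking

    timings: Dict[str, float] = {}
    for name, run in kernels.items():
        start = timer()
        run(exponent, bit_level)  # short benchmark run of this candidate
        timings[name] = timer() - start

    best = min(timings, key=timings.get)
    cache[key] = best
    cache_file.write_text(json.dumps(cache, indent=2))
    return best
```

Keying the cache on exponent and bit level mirrors the description above. A real version would probably also key on the device and driver version, since a driver update can change which kernel wins.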