#1321
"David"
Jul 2015
Ohio
11·47 Posts
#1322
Nov 2010
Germany
3×199 Posts
The only thing I can imagine here is that the delay between the GPU sending the results for a block to the CPU and receiving the order for the next block increases a lot when the PCIe speed drops. As you are running very short assignments, this may have a big impact.
If that is the case, the GPU load should be rather low, scaled down in line with the GHzDays/day. Could you please run two separate tests:
I would also suggest playing around with FlushInterval: if the card is that fast, the queue may run empty. Setting higher values (or zero to disable chunking) should help.
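To make the hypothesis above concrete, here is a minimal sketch of the reasoning, not anything taken from mfakto itself. It assumes the simplest possible model: the GPU sits idle during every GPU-to-CPU round trip and nothing overlaps. The function names and the millisecond figures are made up for illustration.

```python
def gpu_utilization(compute_ms: float, round_trip_ms: float) -> float:
    """Fraction of wall time spent computing when every block waits for
    one CPU round trip before the next block can start (no overlap)."""
    if compute_ms <= 0 or round_trip_ms < 0:
        raise ValueError("compute_ms must be > 0 and round_trip_ms >= 0")
    return compute_ms / (compute_ms + round_trip_ms)


def relative_throughput(compute_ms: float, fast_rt_ms: float, slow_rt_ms: float) -> float:
    """Throughput on the slow link divided by throughput on the fast link."""
    return gpu_utilization(compute_ms, slow_rt_ms) / gpu_utilization(compute_ms, fast_rt_ms)


if __name__ == "__main__":
    # Hypothetical numbers: 2 ms of kernel work per block,
    # 0.5 ms round trip on a fast link, 2 ms on a slow one.
    print(gpu_utilization(2.0, 0.5))           # load with the fast link
    print(gpu_utilization(2.0, 2.0))           # load with the slow link
    print(relative_throughput(2.0, 0.5, 2.0))  # expected drop in GHzDays/day
```

Under this model the reported GPU load and the GHzDays/day fall by the same factor, which is what the two tests would be checking. Short assignments make it worse because `compute_ms` per block is small relative to the round trip.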
#1323
Aug 2015
2²×17 Posts
Is the Fury X supported by the latest mfakto (0.15pre5)? Or on Linux with 0.14?
#1324
Nov 2010
Germany
3×199 Posts
Quote:
0.14 can run on Fury X, though it will say that it does not know the chip. It will select GCN optimization, which fits well (as far as I can tell - I have not yet had a chance to play around with that card).
#1325
Aug 2015
2²·17 Posts
Quote:
#1326
"Victor de Hollander"
Aug 2011
the Netherlands
498₁₆ Posts
After more than 1.5 years on AMD Catalyst 13.12, I finally updated my drivers to 15.7.1.
Win7 64-bit, HD7950, -st:
Code:
Selftest statistics
  number of tests           3092
  successful tests          3092
selftest PASSED!
#1327
Nov 2010
Germany
1125₈ Posts
That would indeed be very helpful. If you have Windows, please run the perftestmfakto.cmd from http://mersenneforum.org/mfakto/mfakto-0.15pre5/
If you have Linux, I'd need to prepare the binary for it first ...
#1328
Nov 2010
Germany
1001010101₂ Posts
#1329
Aug 2015
2²×17 Posts
Quote:
#1330
"David"
Jul 2015
Ohio
205₁₆ Posts
I've been using a specific stable commit build of 0.15 on Fury X cards on Linux with great success. I ran it side by side with 0.14 and the 0.15pre5 build; both pass the normal and extended self tests and hit pretty close to the expected found-factor percentages. 0.15 is definitely a bit faster though :)
I'm still having trouble on systems with fewer PCIe lanes, but I have been too busy with work to investigate further yet.
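For readers wondering what "expected found-factor percentages" means in practice: the usual GIMPS rule of thumb is that the chance of a Mersenne number having a factor between 2^b and 2^(b+1) is roughly 1/b. The sketch below applies that heuristic; treating bit levels as independent is a simplifying assumption, and the function name is hypothetical rather than anything from mfakto.

```python
def expected_factor_probability(bit_from: int, bit_to: int) -> float:
    """Heuristic chance of finding at least one factor when trial
    factoring from 2**bit_from up to 2**bit_to, using ~1/b per bit level."""
    if bit_from < 2 or bit_to <= bit_from:
        raise ValueError("need 2 <= bit_from < bit_to")
    p_no_factor = 1.0
    for b in range(bit_from, bit_to):
        # Assumption: each bit level is an independent trial.
        p_no_factor *= 1.0 - 1.0 / b
    return 1.0 - p_no_factor


if __name__ == "__main__":
    # One bit level, 70 -> 71, is about 1/70.
    print(f"{expected_factor_probability(70, 71):.4%}")
    # Two bit levels, 70 -> 72.
    print(f"{expected_factor_probability(70, 72):.4%}")
```

Comparing the observed factor count over a few hundred assignments against this figure is a cheap sanity check that a build is not silently missing factors.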
#1331
Nov 2010
Germany
3·199 Posts
My adventure into GPU assembly programming is over before it really started. My 7950 died (the VRMs did, to be specific). As a replacement I ordered an R9 380, also as an incentive to do some model refresh in mfakto. The new card, however, is not recognized by the ancient driver, so I moved to 15.7.1 as well - no bad surprises so far.
However, I cannot see the int32 improvements that some owners of an R9 285 (which should be the same "Tonga" chip) have reported. For me, the usual GCN selection works well. To be sure about that, I created a version that performance-tests each kernel for each TF job to find the fastest one for the exponent and bit level. I think I will keep this test as an option, persisting and re-using the results. That should make it possible to adapt to any upcoming development of APUs, GPUs, and whatever else, across vendors. BTW, the selftest failure of the latest code is caused by an incomplete merge from mfaktc. I merged the test case for very small exponents, but the code to handle them correctly is not yet in.
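The per-job kernel test with persisted results could look roughly like the sketch below. This is only an illustration of the idea in Python, not mfakto's code: the kernel names, the JSON cache file and both function names are hypothetical, and each "kernel" is just a callable standing in for a real OpenCL launch.

```python
import json
import time
from pathlib import Path
from typing import Callable, Dict


def pick_fastest(kernels: Dict[str, Callable[[], None]], repeats: int = 3) -> str:
    """Run every candidate a few times and return the name of the one
    with the best (minimum) wall-clock time."""
    if not kernels:
        raise ValueError("no kernels to test")
    best_name, best_time = "", float("inf")
    for name, run in kernels.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run()
            timings.append(time.perf_counter() - start)
        if min(timings) < best_time:
            best_name, best_time = name, min(timings)
    return best_name


def choose_kernel(exponent: int, bit_level: int,
                  kernels: Dict[str, Callable[[], None]],
                  cache_path: Path) -> str:
    """Re-use a persisted choice for (exponent, bit_level) if there is one,
    otherwise benchmark the candidates and store the winner."""
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    key = f"{exponent}:{bit_level}"
    if cache.get(key) in kernels:
        return cache[key]
    winner = pick_fastest(kernels)
    cache[key] = winner
    cache_path.write_text(json.dumps(cache, indent=2))
    return winner
```

Keying the cache on the exact exponent is the most literal reading of "for each TF job"; a real implementation would more likely bucket by exponent size and bit level so that results carry over between similar assignments.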