#1310 |
|
"Gleb Mazovetskiy"
Jul 2015
London, UK
22 Posts |
Fixing int sign change warnings had no effect. Since GPU sieve works, I set GWDEBUG in gpusieve.cl and got lots of output like this:
Code:
sloppy_mod_p: p doesn't match pinv!! p = 713509, pinv = 6018 (should be 6019)
Code:
ERROR: selftest failed for M1031831 (cl_barrett15_69_gs)
  no factor found
Selftest statistics
  number of tests 30
  successful tests 29
  no factor found 1
selftest FAILED!
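As a quick sanity check on the numbers in that debug line, here is a minimal Python sketch. It assumes pinv is meant to be the integer reciprocal floor(2^32 / p), which is consistent with the "should be 6019" in the output; the actual definition in gpusieve.cl may differ. The helper names `expected_pinv` and `sloppy_mod` are hypothetical and not taken from mfakto.

```python
def expected_pinv(p: int, bits: int = 32) -> int:
    # Assumed definition (not taken from mfakto): floor(2**bits / p).
    return (1 << bits) // p


def sloppy_mod(x: int, p: int, pinv: int, bits: int = 32) -> int:
    # Estimate the quotient with a multiply and a shift, then subtract.
    # The result is congruent to x mod p but may still be >= p.
    q_est = (x * pinv) >> bits
    return x - q_est * p


p, reported = 713509, 6018
print(expected_pinv(p))              # 6019
print(reported == expected_pinv(p))  # False

x = 4_000_000_000
print(sloppy_mod(x, p, expected_pinv(p)) % p == x % p)  # True
```

Under this assumed definition the sloppy remainder stays below 2p for any 32-bit x. A pinv that is one too small loosens that bound, which may or may not be what trips the selftest here.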
#1311 |
|
Jul 2015
116 Posts |
When I try to run mfakto on an R9 Fury X, it tells me:
Code:
Select device - Get device info - Loading binary kernel file mfakto_Kernels.elf
Compiling kernels.

WARNING: Unknown GPU name, assuming GCN. Please post the device name "Fiji (Advanced Micro Devices, Inc.)" to http://www.mersenneforum.org/showthread.php?t=15646 to have it added to mfakto. Set GPUType in mfakto1.ini to select a GPU type yourself to avoid this warning.
OpenCL device info
  name Fiji (Advanced Micro Devices, Inc.)
  device (driver) version OpenCL 2.0 AMD-APP (1800.8) (1800.8 (VM))
  maximum threads per block 256
  maximum threads per grid 16777216
  number of multiprocessors 64 (4096 compute elements)
  clock rate 1050MHz
Automatic parameters
  threads per grid 256
  optimizing kernels for GCN
Started a simple selftest ...
######### testcase 4/17 (M50863909[69-70]) #########
mfakto will exit once the current test is finished.

Is there any setting I can change to make it work, or does it require an updated version of mfakto?
#1312 | |
|
Nov 2010
Germany
3·199 Posts |
Quote:
I need to find some time to fix the new features ... |
#1313 | |
|
Nov 2010
Germany
1125₈ Posts |
Quote:
Could you please run
Code:
mfakto -i mfakto1.ini -st2
And maybe give the perftestmfakto.cmd from http://mersenneforum.org/mfakto/mfak...o-0.15pre5.zip a chance - if it does not stop right away, the output might be helpful.
#1314 | |
|
Nov 2010
Germany
3·199 Posts |
Quote:
Code:
Resulting speed for M66362159:
bit_min - bit_max GHz-days/day kernelname
60 - 69 209.504 cl_barrett15_69_gs
69 - 70 196.994 cl_barrett15_71_gs
70 - 74 185.303 cl_barrett15_74_gs
74 - 77 162.719 cl_barrett32_77_gs
77 - 88 147.887 cl_barrett32_88_gs
88 - 92 117.277 cl_barrett32_92_gs
#1315 |
|
"David"
Jul 2015
Ohio
11·47 Posts |
I am seeing something interesting and unexpected with mfakto and multiple GPUs. On one much older host system, the PCIe links are negotiating down to 5GT/s x16 for the primary card and 5GT/s x4 for the secondary. What is surprising is that this has a pretty massive effect on mfakto's performance. This is strange because mfakto should not need much bandwidth to the cards at all and, to my understanding of the code, should only be making a (trivial) call for each class.
These are Fury X cards, which normally get around 1000 GHz-days/day at 8GT/s x16 PCIe speeds. I see a few different tiers of speed that directly correlate with the negotiated PCIe speed, all factoring with cl_barrett15_73_gs_2.
PCIe Speed - GHz-days/day
8GT/s x16 - 1000
5GT/s x16 - 890
5GT/s x8 - 690
5GT/s x4 - 480
2.5GT/s x4 - 190
Increasing the number of streams did not seem to have any appreciable effect.
Last fiddled with by airsquirrels on 2015-08-22 at 14:03
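To put the tiers above next to the nominal link bandwidth, here is a rough Python sketch. It uses the standard PCIe line encodings (8b/10b at 2.5 and 5 GT/s, 128b/130b at 8 GT/s) and ignores packet overhead, so the GB/s figures are theoretical per-direction ceilings, not measurements. The function name `link_gbytes_per_s` is made up for this illustration, and the throughput numbers are simply the ones reported in this post.

```python
# Fraction of transferred bits that carry payload, keyed by per-lane rate in GT/s.
ENCODING = {2.5: 8 / 10, 5.0: 8 / 10, 8.0: 128 / 130}


def link_gbytes_per_s(gt_per_s: float, lanes: int) -> float:
    # Theoretical one-direction bandwidth in GB/s, before protocol overhead.
    return gt_per_s * ENCODING[gt_per_s] * lanes / 8


# (GT/s, lanes, reported GHz-days/day) as listed in the post.
reported = [(8.0, 16, 1000), (5.0, 16, 890), (5.0, 8, 690), (5.0, 4, 480), (2.5, 4, 190)]

for rate, lanes, ghzd in reported:
    bw = link_gbytes_per_s(rate, lanes)
    print(f"{rate}GT/s x{lanes}: {bw:5.2f} GB/s nominal, {ghzd} GHz-days/day")
```

The nominal bandwidth falls by roughly 16x from the fastest to the slowest link while the reported throughput falls by about 5x, so whatever the bottleneck is, it does not look like a simple proportional bandwidth limit. That is only a reading of the numbers, not a diagnosis.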
#1316 |
|
Romulan Interpreter
Jun 2011
Thailand
3²·29·37 Posts |
That is strange! I always have a mixture of x8 and x16 cards and don't see any difference in output for my GTX 580s...
Are you sure you call mfaktX with the appropriate "-d gpu_number"?
Last fiddled with by LaurV on 2015-08-22 at 17:36
#1317 |
|
"David"
Jul 2015
Ohio
1000000101₂ Posts |
Yes, I can verify that the correct card is initialized, and the performance matches the PCIe lane speed even if the other card is idle. With my slower cards this is not noticeable until 5GT/s x4 or even 2.5GT/s x4, but the Fury X sees the penalty much earlier. I might try Windows instead of Debian to see whether the Catalyst driver version matters.
I have switched the two cards and the performance definitely follows the slot. This is true in two different systems as well, so that rules out the motherboard/chipset. I will say both of these systems have AMD CPUs, while my Intel systems are all much faster overall with plenty of PCIe lanes. I did notice that we actually do multiple kernel schedulings per class as it works through the K range, but adjusting the sieve size parameters changed performance an order of magnitude less than the PCIe slot did.
#1318 |
|
If I May
"Chris Halsall"
Sep 2002
Barbados
9,767 Posts |
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' but 'That's funny...'" -- Isaac Asimov
#1319 | |
|
Jun 2003
11732₈ Posts |
Quote:
#1320 |
|
Romulan Interpreter
Jun 2011
Thailand
3²×29×37 Posts |