Fixing int sign change warnings had no effect. Since GPU sieve works, I set GWDEBUG in gpusieve.cl and got lots of output like this:
[CODE]sloppy_mod_p: p doesn't match pinv!! p = 713509, pinv = 6018 (should be 6019)[/CODE]
When running without any arguments (a simple selftest) I get:
[CODE]
ERROR: selftest failed for M1031831 (cl_barrett15_69_gs)
  no factor found

Selftest statistics
  number of tests           30
  successful tests          29
  no factor found           1

selftest FAILED!
[/CODE] |
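The mismatch in the GWDEBUG output above can be checked by hand. This is only a minimal sketch: it assumes that pinv is meant to be floor(2^32 / p), which agrees with the "should be 6019" in the log, but the actual definition in gpusieve.cl may differ. The helper name `expected_pinv` is hypothetical and not part of mfakto.

```python
def expected_pinv(p: int, bits: int = 32) -> int:
    """Assumed definition: floor(2**bits / p). Hypothetical helper, not mfakto code."""
    return (1 << bits) // p


p = 713509          # prime from the debug line
reported = 6018     # pinv value the kernel was using
expected = expected_pinv(p)

print(f"p = {p}, reported pinv = {reported}, expected pinv = {expected}")
if reported != expected:
    print("mismatch: the reported value is off by", expected - reported)
```

Under that assumption the reported value is one too small, which is the same discrepancy the debug line complains about.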
Fiji GPU
When I try to run mfakto on an R9 Fury X it tells me:
[CODE]
Select device - Get device info - Loading binary kernel file mfakto_Kernels.elf
Compiling kernels.
WARNING: Unknown GPU name, assuming GCN. Please post the device name "Fiji (Advanced Micro Devices, Inc.)" to [URL]http://www.mersenneforum.org/showthread.php?t=15646[/URL] to have it added to mfakto. Set GPUType in mfakto1.ini to select a GPU type yourself to avoid this warning.

OpenCL device info
  name                      Fiji (Advanced Micro Devices, Inc.)
  device (driver) version   OpenCL 2.0 AMD-APP (1800.8) (1800.8 (VM))
  maximum threads per block 256
  maximum threads per grid  16777216
  number of multiprocessors 64 (4096 compute elements)
  clock rate                1050MHz

Automatic parameters
  threads per grid          256
  optimizing kernels for    GCN

Started a simple selftest ...
######### testcase 4/17 (M50863909[69-70]) #########
mfakto will exit once the current test is finished.
[/CODE]
and then hangs in testcase 4/17. Is there any setting I can change to make it work, or does it require an updated version of mfakto? |
[QUOTE=glebm;406481]The latest source from GitHub, compiled with Visual Studio 2013 and APP SDK 3.0 Beta and running on Tonga (R9 380), fails both selftests (0.14 works).
[/QUOTE]
Version 0.14 is currently the latest stable one. Many of the features for 0.15 are not finished yet; the version from GitHub cannot be used right now (and it's not because of the unsigned warnings). I need to find some time to fix the new features ... |
[QUOTE=maxa;407377]When I try to run mfakto on an R9 Fury X it tells me:
[CODE]
Select device - Get device info - Loading binary kernel file mfakto_Kernels.elf
Compiling kernels.
WARNING: Unknown GPU name, assuming GCN. Please post the device name "Fiji (Advanced Micro Devices, Inc.)" to [URL]http://www.mersenneforum.org/showthread.php?t=15646[/URL] to have it added to mfakto. Set GPUType in mfakto1.ini to select a GPU type yourself to avoid this warning.

OpenCL device info
  name                      Fiji (Advanced Micro Devices, Inc.)
  device (driver) version   OpenCL 2.0 AMD-APP (1800.8) (1800.8 (VM))
  maximum threads per block 256
  maximum threads per grid  16777216
  number of multiprocessors 64 (4096 compute elements)
  clock rate                1050MHz

Automatic parameters
  threads per grid          256
  optimizing kernels for    GCN

Started a simple selftest ...
######### testcase 4/17 (M50863909[69-70]) #########
mfakto will exit once the current test is finished.
[/CODE]
and then hangs in testcase 4/17. Is there any setting I can change to make it work, or does it require an updated version of mfakto?[/QUOTE]
I'll add Fiji to the list of known GCN chips. It would be good to have some performance tests so I know how best to use the chip, but I guess the other issue needs to be fixed first. Could you please run
[code]mfakto -i mfakto1.ini -st2[/code]
This should show exactly at which kernel it hangs. And maybe give the perftestmfakto.cmd from [URL]http://mersenneforum.org/mfakto/mfakto-0.15pre5/mfakto-0.15pre5.zip[/URL] a try: if it does not stop right away, the output might be helpful. |
[QUOTE=Ethan (EO);403703]I've pulled the card now, but did manage to run the 0.15pre5 benchmarking script first; results attached.[/QUOTE]
Thank you for the HD6950 Cayman results.
[code]Resulting speed for M66362159:
bit_min - bit_max  GHz-days/day  kernelname
   60   -   69       209.504     cl_barrett15_69_gs
   69   -   70       196.994     cl_barrett15_71_gs
   70   -   74       185.303     cl_barrett15_74_gs
   74   -   77       162.719     cl_barrett32_77_gs
   77   -   88       147.887     cl_barrett32_88_gs
   88   -   92       117.277     cl_barrett32_92_gs
[/code]
They confirm that the burnt electricity is probably not worth it: there are far faster and more efficient cards these days. The 205 GHz-days/day listed on [url]http://www.mersenne.ca/mfaktc.php[/url] for the 6950 is probably a bit optimistic and can only be achieved when factoring up to 69 bits. |
I am seeing something interesting and unexpected with mfakto and multiple GPUs. On one much older host system, the PCIe links are dropping to 5GT/s x16 for the primary card and 5GT/s x4 for the secondary. What is surprising is that this has a pretty massive effect on mfakto's performance. This is strange because mfakto should not need much bandwidth to the cards at all, and to my understanding of the code it should only be making a (trivial) call for each class.
These are Fury X cards, which normally get around 1000 GHz-days/day at 8GT/s x16 PCIe speeds. I see a few different tiers of speed that correlate directly with the negotiated PCIe speed, all factoring using cl_barret15_73_gs_2.

PCIe speed - GHz-days/day
8GT/s x16 - 1000
5GT/s x16 - 890
5GT/s x8 - 690
5GT/s x4 - 480
2.5GT/s x4 - 190

Increasing the number of streams did not seem to have any appreciable effect. |
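To put the tiers above next to what each link can carry, here is a rough sketch that computes the theoretical one-direction bandwidth per negotiated link and prints it beside the reported throughput. It assumes the standard line codes (8b/10b at 2.5 and 5 GT/s, 128b/130b at 8 GT/s) and ignores packet and protocol overhead, so real transfer rates will be lower. The function name `link_bandwidth_gb_s` is made up for this illustration.

```python
# Line-code efficiency per signalling rate (GT/s per lane):
# 8b/10b for 2.5 and 5 GT/s, 128b/130b for 8 GT/s.
ENCODING = {2.5: 8 / 10, 5.0: 8 / 10, 8.0: 128 / 130}


def link_bandwidth_gb_s(gt_per_s: float, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s, ignoring protocol overhead."""
    return gt_per_s * ENCODING[gt_per_s] * lanes / 8


# (GT/s, lanes, reported GHz-days/day) taken from the post above
observed = [
    (8.0, 16, 1000),
    (5.0, 16, 890),
    (5.0, 8, 690),
    (5.0, 4, 480),
    (2.5, 4, 190),
]

for gt, lanes, ghzd in observed:
    bw = link_bandwidth_gb_s(gt, lanes)
    print(f"{gt:>4} GT/s x{lanes:<2}  {bw:6.2f} GB/s  {ghzd:5d} GHz-days/day")
```

The throughput falls more slowly than the raw link bandwidth does, so the numbers alone do not show whether the limit is bandwidth or per-transfer latency.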
That is strange! I always have a mixture of x8 and x16 cards and don't see any difference in output for my gtx580s...
Are you sure you call mfaktX with appropriate "-d gpu_number"? |
Yes, I can verify the correct card is initialized and the performance matches the PCIe lane speed even if the other card is idle. With my slower cards this is not noticeable until 5GT/4x or even 2.5GT/4x but the Fury X sees the penalty much earlier. I might try Windows instead of Debian to see about the Catalyst driver version.
I have switched the two cards and the performance definitely follows the slot, this is true in two different systems as well so that rules out the motherboard/chipset. I will say both of these systems are AMD CPUs, but my Intel systems are all much faster systems overall with plenty of PCIe lanes. I did notice that we do actually do multiple kernel schedulings per class as it works through the K range, but adjusting the sieve size parameters changed performance an order of magnitude less than the PCIe slot. |
[QUOTE=airsquirrels;408552]I have switched the two cards and the performance definitely follows the slot, this is true in two different systems as well so that rules out the motherboard/chipset.[/QUOTE]
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' but 'That's funny...'" -- Isaac Asimov |
[QUOTE=airsquirrels;408552]Yes, I can verify the correct card is initialized and the performance matches the PCIe lane speed even if the other card is idle. With my slower cards this is not noticeable until 5GT/4x or even 2.5GT/4x but the Fury X sees the penalty much earlier. I might try Windows instead of Debian to see about the Catalyst driver version.
I have switched the two cards and the performance definitely follows the slot, this is true in two different systems as well so that rules out the motherboard/chipset. I will say both of these systems are AMD CPUs, but my Intel systems are all much faster systems overall with plenty of PCIe lanes. I did notice that we do actually do multiple kernel schedulings per class as it works through the K range, but adjusting the sieve size parameters changed performance an order of magnitude less than the PCIe slot.[/QUOTE] I guess it doesn't hurt to ask. Are you using GPU Sieve or CPU Sieve? |
[QUOTE=axn;408581]I guess it doesn't hurt to ask. Are you using GPU Sieve or CPU Sieve?[/QUOTE]
Bingo, good question! (as usual axn to the point!). |