mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfakto: an OpenCL program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=15646)

legendarymudkip 2014-07-05 09:40

I have an i5-4670k and seeing that it worked on the HD4600, I decided to give it a go. However, whenever I open the executable from [url]http://mersenneforum.org/mfakto/mfakto-0.14/[/url] I always get:
ERROR: init_CL<3, 0> failed
and I am unsure as to what the problem is. Do I need to build it on my PC? If so, how do I do so?

Prime95 2014-07-05 19:09

[QUOTE=legendarymudkip;377439]I have an i5-4670k and seeing that it worked on the HD4600, I decided to give it a go. [/QUOTE]

Did you download the latest Intel driver? I got mine starting here: [url]https://software.intel.com/en-us/vcsource/tools/opencl-sdk[/url]

You should not need the SDK unless you want to compile and link an executable. The mfakto you downloaded won't work (it doesn't have the fixes I've made), but it should recognize your HD 4600 engine.

legendarymudkip 2014-07-05 19:49

I installed the latest non-beta driver for it today, but every time I try running either executable (32 or 64 bit) it always just quickly flashes up with
ERROR: init_CL<3, 0> failed
and then closes immediately. I don't know what the problem is, but I hope the error message is helpful :/

Bdot 2014-07-06 10:49

[QUOTE=kracker;377392]Well, should I make a new Makefile then? kbhit.cpp is set to build there, and sadly I think there is no easy way to differentiate between platforms in Makefile, a reason for ./configure :smile:[/QUOTE]
I will add #ifdefs to kbhit.cpp/.h so that we can build it on win as well.
[QUOTE=Prime95;377395]
In barrett24.cl replace the 4620 in mul_24_48 with (4620 % (exp72.d1 + 1000000))

in common.cl calc_FC32, replace the 4620u in mul_hi with (4620 % (exponent + 1000000))


I sure hope you're not including these workarounds into the AMD code and they can be turned off when Intel fixes their compiler/drivers. BTW, here is the link to my bug report: [URL]https://software.intel.com/en-us/forums/topic/517787[/URL][/QUOTE]
In common.cl I used (4620 % (exponent + 1)) so that we don't have an overflow if exponent gets close to 2[SUP]32[/SUP]. As mfakto has a minimum of 1000000 for exponent, I hope this is OK too.
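A minimal sketch of the overflow concern, not taken from the mfakto source. It assumes that exponent is a 32-bit unsigned value and that the purpose of the workaround is to keep the value 4620 while making the expression depend on a runtime value. The names MASK32, add_u32 and workaround_value are hypothetical, and the 32-bit wraparound is modelled with a bit mask.

```python
MASK32 = 0xFFFFFFFF  # assumption: exponent is a 32-bit unsigned integer


def add_u32(a, b):
    """Add two values with 32-bit unsigned wraparound."""
    return (a + b) & MASK32


def workaround_value(exponent, offset):
    """Model 4620 % (exponent + offset) in 32-bit unsigned arithmetic.

    Returns None when the wrapped sum is zero, because a modulo by zero
    has no defined result.
    """
    divisor = add_u32(exponent, offset)
    if divisor == 0:
        return None
    return 4620 % divisor


if __name__ == "__main__":
    # Compare the two offsets discussed in the thread: 1000000 and 1.
    for exponent in (1_000_000, 450_000_000, 2**32 - 1_000_000, 2**32 - 2):
        print(exponent,
              workaround_value(exponent, 1_000_000),
              workaround_value(exponent, 1))
```

Under these assumptions the offset 1 returns 4620 for every exponent from 1000000 up to 2[SUP]32[/SUP] - 2, while the offset 1000000 wraps around for the top million exponents and then yields a different value or a zero divisor.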
[QUOTE=Prime95;377398]4 files coming -- for vector_sizes 1,2,4,8[/QUOTE]
Very interesting. It seems HD4600 has plenty of registers, or is using them very efficiently. All vector sizes have almost the same speed, VectorSize=4 being fastest. Regarding the kernels, there is no big surprise. 32-bit kernels are way more efficient than 15-bit, 32_76 being best. Its raw speed of ~26M FCs per second can yield ~24 GHz-days/day with very high CPU sieving (to 200,000). GPU sieving should be able to achieve ~20 GHz-days/day.

[QUOTE=Prime95;377404]--perftest crashes, output attached. MSVC is useless in finding the cause (call_stack is of no value).[/QUOTE]
Looks like the driver does not like re-initializing the application. I'll try to build separate perftest modes for CPU and GPU sieving.

[QUOTE=Prime95;377416]Bdot, have you looked at the Intel OpenCL optimization guide? I'm not enough of an OpenCL / mfakto expert to make much use of it -- maybe you can help. I ran three TF assignments in the 450M area and was getting a paltry 16GHz-days/day. I don't have a feel for what should be theoretically possible.

Guide is here: [URL]https://software.intel.com/en-us/iocl_2014.b1_opg?language=it[/URL]

I'm especially interested in optimizations that minimize or eliminate the impact on memory bandwidth. The current mfakto does slow down running LL tests.[/QUOTE]

I browsed through that online guide, but it is all very high-level. More details are available only for OpenCL for AVX, i.e. when you target the CPU instead of the built-in GPU.

The TF kernels require almost no bandwidth; almost all of it comes from the sieving, which stresses the global memory sieve. Local memory, which is used for counting and extracting the sieve bits, is probably implemented in the L2 cache, so using a lot of it will cause more cache misses in LL tests.

My guess is that a smaller SieveSize will reduce the amount of global memory used, a lower GPUSieveProcessSize will reduce the amount of L2 cache being accessed, and a lower SievePrimes will cause fewer accesses to global memory. The downside is that the TF kernels will have more work, but they use only registers ...

Memory bandwidth use could be eliminated only if we changed the sieving to primarily use registers. On most platforms register space is pretty scarce, though. As you probably also want to avoid L2 cache accesses, local memory would need to be avoided too ...

[QUOTE=legendarymudkip;377465]I installed the latest non-beta driver for it today, but every time I try running either executable (32 or 64 bit) it always just quickly flashes up with
ERROR: init_CL<3, 0> failed
and then closes immediately. I don't know what the problem is, but I hope the error message is helpful :/[/QUOTE]

Please open a command prompt, cd to the mfakto directory and run mfakto there; then you will see the complete output. It would be interesting to see which error message is reported before the 'init_CL failed'.

Another test would be to run 'clinfo' which should come with the driver. It should report two devices (CPU and GPU) available for use with OpenCL.

legendarymudkip 2014-07-06 12:47

Select device - Error: No platform found
ERROR: init_CL<3, 0> failed

Ran from command prompt and got this.

potonono 2014-07-06 16:21

1 Attachment(s)
I was getting a similar error. On one of the earlier posts, it was mentioned that changing the GPU type specifically to NVIDIA in the INI file would skip one of the checks that intel's GPU didn't like. Attached is my output for clinfo and mfakto-x64.

kracker 2014-07-06 19:37

[QUOTE=potonono;377507]I was getting a similar error. On one of the earlier posts, it was mentioned that changing the GPU type specifically to NVIDIA in the INI file would skip one of the checks that intel's GPU didn't like. Attached is my output for clinfo and mfakto-x64.[/QUOTE]

You'll need to either get or compile the latest git, btw.

Bdot 2014-07-06 22:29

[QUOTE=legendarymudkip;377500]Select device - Error: No platform found
ERROR: init_CL<3, 0> failed

Ran from command prompt and got this.[/QUOTE]

"No platform found" means that no usable OpenCL driver is available.

[QUOTE=potonono;377507]I was getting a similar error. On one of the earlier posts, it was mentioned that changing the GPU type specifically to NVIDIA in the INI file would skip one of the checks that intel's GPU didn't like. Attached is my output for clinfo and mfakto-x64.[/QUOTE]

You seem to have solved the OpenCL driver issue. But mfakto 0.14 is not ready for Intel HD Graphics.

[QUOTE=kracker;377518]You'll need to either get or compile the latest git, btw.[/QUOTE]
I will most likely provide a test-build in the next few days ...

kracker 2014-07-06 22:32

[QUOTE=Bdot;377543]
I will most likely provide a test-build in the next few days ...[/QUOTE]

:smile:
Good!
Now, if I can figure out why the heck my 4600 doesn't appear in clinfo while my other one does... Maybe because of my two AMD GPUs installed? Dunno..

tului 2014-07-07 01:03

I just finished 77 bit numbers on 2 260X cards. Is there any benefit to doing that? It pulls around 200GHz/day. If I need to stay on the "front" line please advise me where to pull numbers from.

I run these machines 24/7. I'd use the CPU too but it's crunching a 100M digit number due in February. I really want to help as much as possible (while pumping my numbers); any idea what to do? I can build mfakto from source so if you have any tests you want me to run, feel free to shoot them my way.

Mark Rose 2014-07-07 09:29

[QUOTE=tului;377547]I just finished 77 bit numbers on 2 260X cards. Is there any benefit to doing that? It pulls around 200GHz/day. If I need to stay on the "front" line please advise me where to pull numbers from.

I run these machines 24/7. I'd use the CPU too but it's crunching a 100M digit number due in February. I really want to help as much as possible (while pumping my numbers); any idea what to do? I can build mfakto from source so if you have any tests you want me to run, feel free to shoot them my way.[/QUOTE]

To stay on the "front line" the easiest way is to use GPU72 and pick "let GPU72 decide" when requesting assignments. It's currently more helpful to factor only up to 74 bits and do more assignments.


All times are UTC.