mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfakto: an OpenCL program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=15646)

Bdot 2012-12-09 12:47

[QUOTE=Rodrigo;321016]
OK, here's the attached clinfo file.
[/QUOTE]
Hmm, the clinfo shows the AMD-OpenCL runtime supporting the 7770 and the CPU, and the Intel-OpenCL runtime supporting only the CPU.

I had hoped that Intel's runtime would register both the CPU and the embedded GPU. I guess a little more reading on Intel's OpenCL sites is required ... or maybe you really do need to install Intel's DevKit?

Rodrigo 2012-12-09 15:51

That's what I thought, too.

I did some looking around the Intel site and didn't find anything definite on whether you need to install the SDK in order simply to [B]use[/B] OpenCL (as opposed to developing software on it).

One thing I found that was possibly disturbing was in the [URL="http://software.intel.com/en-us/articles/opencl-release-notes/"]release notes[/URL]:

[QUOTE]



To overcome shared context (the OpenCL* context which includes both CPU and GPU devices) limitations: [LIST][*]Do not trigger for both devices (specify NULL as device_list parameter)[*]Avoid using images with CL_MEM_USE_HOST_PTR flag.[*][B]Avoid using on systems with discrete graphics.[/B][/LIST][/QUOTE][emphasis added]

Rodrigo

kjaget 2012-12-10 16:48

[QUOTE=aketilander;320969]Well I think it depends on what kind of work you are doing.

According to [URL]http://www.mersenne.ca[/URL]

[URL="http://www.mersenne.ca/mfaktc.php"]TF: 452.4 GHz-days/day[/URL]
[URL="http://www.mersenne.ca/cudalucas.php"]LL: 43.2-55.8 GHz-days/day[/URL]

If you use other programs like mmff, which does the sieving on the GPU, I have no idea what you would expect.

To me it seems that you have overclocked your GPU quite a lot, so it's a good idea to keep an eye on the temperature.[/QUOTE]

I believe the GPU numbers on the web site are from running a single instance and extrapolating to 100% GPU utilization. That ignores the benefit you gain from running multiple instances of mfakt[co] and allowing them to do more than the bare minimum amount of sieving. Letting the CPUs take on more of the sieving this way speeds up how quickly the GPU can run through factor candidates.

This may or may not explain the performance difference, or at least part of it.

Bdot 2012-12-10 20:20

[QUOTE=Rodrigo;321092]That's what I thought, too.

I did some looking around the Intel site and didn't find anything definite on whether you need to install the SDK in order simply to [B]use[/B] OpenCL (as opposed to developing software on it).

One thing I found that was possibly disturbing was in the [URL="http://software.intel.com/en-us/articles/opencl-release-notes/"]release notes[/URL]:

[emphasis added]

Rodrigo[/QUOTE]

Have a look [URL="http://stackoverflow.com/questions/11999889/cpu-as-host-and-intel-hd-4000-as-device-1-and-discrete-gpu-as-device-2-in-opencl"]here[/URL].
It seems that once you have installed the new Intel SDK [U]and[/U] the new Intel drivers, the Intel HD4000 should be registered with OpenCL (that is what the ICD stuff means).

The Intel embedded GPU is still OpenCL 1.1 and does not support double precision, but that is OK for mfakto. Intel's HD4000 brings 16 compute cores @ up to 1.15GHz - I'm really curious whether this can add a noticeable contribution to your primenet success ...

Bdot

Rodrigo 2012-12-10 23:17

Very good, thanks!

Reading the post from Stackoverflow, I see this:

[QUOTE]The Intel ICD will enumerate both the host CPU and integrated GPU as OpenCL capable devices. You will then need to use the discrete GPU vendor's SDK and ICD to identify and enumerate that as an OpenCL device. [/QUOTE]

Can I assume that I don't need to do the second part (with respect to an SDK and ICD for the HD 7770) because the discrete GPU is already recognized on my computer?

Rodrigo

kracker 2012-12-10 23:44

[QUOTE=Bdot;321213]The Intel embedded GPU is still OpenCL 1.1 and does not support double precision, but that is OK for mfakto. Intel's HD4000 brings 16 compute cores @ up to 1.15GHz - I'm really curious whether this can add a noticeable contribution to your primenet success ...[/QUOTE]

I'm curious too about how it compares with AMD's integrated (APU) iGPU.

Bdot 2012-12-11 08:43

[QUOTE=Rodrigo;321222]
Can I assume that I don't need to do the second part (with respect to an SDK and ICD for the HD 7770) because the discrete GPU is already recognized on my computer?

Rodrigo[/QUOTE]
mfakto will do the second part for you :smile:
mfakto is built with the "discrete GPU vendor's SDK". The only missing part is the registration (aka ICD) for the embedded GPU.

I assume the Intel SDK will add the eGPU as another device to the Intel platform. Most likely it will take the CPU's -d21 and shift the CPU to -d22. If it is added as a separate platform, you'd need -d31 for it. Again, if it is not working well, a clinfo output will clarify.

And reading a little more, it appears the CPU alone can produce enough heat to reach the whole chip's thermal limits. This means that any additional heat from the eGPU may reduce the CPU's clock in order to stay within the specification. So when trying it out, better monitor all clocks ...

There's another thing that I'd be interested in: you now have the choice of two different implementations for running mfakto on the CPU (-d 12 and -d21). One will use AMD's compiler to build the kernels, the other will use Intel's. Which one is faster? (Not that it really matters - you don't usually let mfakto run on the CPU - but it's interesting anyway.)
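
For readers following the -d numbers, here is a minimal Python sketch of how they appear to be put together. It assumes the first digit is the OpenCL platform and the second digit is the device on that platform, which is inferred only from the examples in this thread (-d12, -d21, -d22, -d31) and not taken from mfakto's documentation. The function name decode_device_option is made up for illustration.

```python
def decode_device_option(value):
    """Split a two-digit -d value such as "21" into (platform, device).

    Assumption (inferred from the thread, not from mfakto's docs):
    the first digit is the platform number and the second digit is the
    device number on that platform.
    """
    digits = value.strip()
    if len(digits) != 2 or not digits.isdecimal():
        raise ValueError("expected exactly two digits, got %r" % value)
    return int(digits[0]), int(digits[1])


# The values mentioned in the posts above.
for option in ("12", "21", "22", "31"):
    platform, device = decode_device_option(option)
    print("-d%s -> platform %d, device %d" % (option, platform, device))
```

Under that reading, -d12 is the second device on the first (AMD) platform and -d21 is the first device on the second (Intel) platform, which matches the two CPU entries in the clinfo output.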

Rodrigo 2012-12-24 06:33

OK, I finally got the time to research and install the Intel SDK for OpenCL Applications, to see if I can run mfakto on both GPUs.

No dice. mfakto is still giving me the same error messages as reported upthread, and GPU-Z is still not putting check marks in the OpenCL or DirectCompute boxes for the HD 4000. I'm not sure what else needs to be done, beyond "installing" the SDK.

FWIW, during installation of the Intel SDK there was a warning that it would not be integrated with Visual Studio (since I don't have that). Or was it Visual C++? I can't remember, and it all sounds alike to me (sorry!)...

If I have to start hunting for those sorts of things to get this done, it may simply not be worth the effort. I don't have the time or, more importantly, the expertise to range that far and wide!

Rodrigo

P.S. Also FWIW, I'm attaching the new clinfo (renamed).

Ralf Recker 2012-12-24 09:03

[QUOTE=Rodrigo;322484]I'm not sure what else needs to be done, beyond "installing" the SDK.[/QUOTE]

[url]http://software.intel.com/en-us/forums/topic/277886[/url]

[QUOTE]Currently Processor Graphics OCL device in unavailable in the "headless" configuration (without a monitor plugged in).[/QUOTE]

Rodrigo 2012-12-26 05:15

[QUOTE=Ralf Recker;322492][URL]http://software.intel.com/en-us/forums/topic/277886[/URL][/QUOTE]
Hmm, that might be a problem. This is a brand-new HP system and the iGPU video-out ports are covered with a bracket that reads, "Do Not Remove." In spite of the oddball screws, I wouldn't have an objection to removing the brackets, except for the possibility that doing that might void the warranty. :rolleyes: I will have to look into that before proceeding.

Thanks for the warning.

Rodrigo

Rodrigo 2012-12-26 05:34

Another question, about using mfakto on the GPU along with Prime95 on the CPU:

Is there any way to tell these two programs to use specific cores of the i7 3770? The reason is that yesterday I was using two CPU cores to do LLs while a third core was supporting mfakto, and everything was running smoothly. The time/class for mfakto was at 2.xxx seconds and one exponent was taking about 45 minutes to finish.

Then, I discovered a manually reserved LL that I had forgotten about for 174 days, so I decided to add it to Prime95 in a third worker window. But now, the time/class for mfakto is over [B]4.xxx[/B] seconds. (And yet, according to GPU-Z, the GPU load is at 1%. :unsure:) The per-iteration times for the original two Prime95 worker windows have gone up from 0.020 and 0.019 to 0.025 and 0.026, respectively. Evidently, mfakto and Prime95 are stepping on each other.
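
To put numbers on that, here is a small Python sketch of the arithmetic. It uses only the per-iteration times quoted above; the helper name slowdown_percent is made up for illustration.

```python
def slowdown_percent(before, after):
    """Return how much longer `after` is than `before`, in percent."""
    return (after - before) / before * 100.0


# Per-iteration times in seconds for the two original Prime95 workers,
# before and after the third worker was added.
workers = [(0.020, 0.025), (0.019, 0.026)]

for before, after in workers:
    print("%.3f s -> %.3f s: about %.0f%% slower"
          % (before, after, slowdown_percent(before, after)))
```

That works out to roughly 25% and 37% longer per iteration for the two workers, while the mfakto time/class has about doubled.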

A further complication: when I selected CPUs 2 and 4 for Prime95, Task Manager (Windows 7) showed 8 available threads (or whatever the right designation is) in the quad-core system, and it was the second and the fourth of these eight that were busy. So I am not sure whether Prime95 was actually using the second and fourth [B]cores,[/B] or merely the second halves of the first two cores. (Have I put my question clearly enough?) Now, with the third worker operating and mfakto doing its thing, I've ended up with [B]five[/B] of the eight threads running at or near 100%. I would have guessed three for the Prime95 workers (one each) and one for mfakto (3 + 1 = 4, not 5).

FWIW, that third "emergency" LL is set to "Smart Assignment" CPU selection. When I had it set to CPU 3, the per-iteration times on the other two shot up to 0.035 and 0.040. I ended up using Smart Assignment, with ThreadsPerTest set at 2.

Bottom line: I would like to learn how to tell Prime95 to use (say) the first three physical cores (only), and mfakto to use the last core (only). And this while using just one thread, not two, per Prime95 worker.
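
To make the goal concrete, here is a rough Python sketch that works out the affinity bitmasks such a split would need. It assumes the eight logical CPUs are numbered core by core, so that logical CPUs 0 and 1 belong to the first physical core, 2 and 3 to the second, and so on. That numbering is an assumption and should be checked on the actual machine. The function name affinity_mask is made up for illustration.

```python
def affinity_mask(physical_cores, threads_per_core=2):
    """Bitmask selecting the first logical CPU of each given physical core.

    Assumption: logical CPUs are numbered core by core, so physical core k
    (counted from 0) owns logical CPUs k * threads_per_core and upward.
    """
    mask = 0
    for core in physical_cores:
        mask |= 1 << (core * threads_per_core)
    return mask


# One thread on each of the first three physical cores for Prime95,
# and one thread on the last physical core for mfakto.
prime95_mask = affinity_mask([0, 1, 2])
mfakto_mask = affinity_mask([3])

print("Prime95 mask: 0x%X" % prime95_mask)
print("mfakto mask:  0x%X" % mfakto_mask)
```

Under the assumed numbering, the second and fourth of the eight entries in Task Manager would be the second threads of the first two cores rather than the second and fourth cores. On Windows, a hexadecimal mask like these could in principle be passed to the command prompt's start /affinity option when launching a program; Prime95 has its own affinity settings, which this sketch does not model.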

Suggestions are very welcome...

Rodrigo

