mersenneforum.org  

Old 2012-12-09, 12:47   #595
Bdot
 
 
Nov 2010
Germany

3·199 Posts

Quote:
Originally Posted by Rodrigo
OK, here's the attached clinfo file.

Hmm, the clinfo shows the AMD-OpenCL runtime supporting the 7770 and the CPU, and the Intel-OpenCL runtime supporting only the CPU.

I had hoped that Intel's runtime would register both the CPU and the embedded GPU. I guess a little more reading on Intel's OpenCL sites is required ... or perhaps you really do need to install Intel's DevKit?
Old 2012-12-09, 15:51   #596
Rodrigo
 
 
Jun 2010
Pennsylvania

2·467 Posts

That's what I thought, too.

I did some looking around the Intel site and didn't find anything definite on whether you need to install the SDK in order simply to use OpenCL (as opposed to developing software on it).

One thing I found that was possibly disturbing was in the release notes:

Quote:
To overcome shared context (the OpenCL* context which includes both CPU and GPU devices) limitations:
  • Do not trigger for both devices (specify NULL as device_list parameter)
  • Avoid using images with CL_MEM_USE_HOST_PTR flag.
  • Avoid using on systems with discrete graphics.
[emphasis added]

Rodrigo
Old 2012-12-10, 16:48   #597
kjaget
 
 
Jun 2005

3×43 Posts

Quote:
Originally Posted by aketilander
Well I think it depends on what kind of work you are doing.

According to http://www.mersenne.ca

TF: 452.4 GHz-days/day
LL: 43.2-55.8 GHz-days/day

If you use other programs like mmff, which does the sieving on the GPU, I have no idea what you would expect.

To me it seems that you have OCed your GPU very much, so it's a good idea to take notice of the temperature.

I believe the GPU numbers on the web site are from running a single instance and extrapolating to 100% GPU utilization. That ignores the benefit you gain from running multiple instances of mfakt[co] and allowing them to do more than the bare minimum amount of sieving. Having the CPUs offload some of the work this way will speed up how quickly the GPU can run through factors.

This may or may not explain the performance difference, or at least part of it.
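
To make the extrapolation mentioned above concrete, here is a minimal sketch. The function name and the sample figures are invented for illustration; they are not measurements from mersenne.ca, and the site may well compute its numbers differently.

```python
def extrapolate_to_full_load(measured_ghz_days_per_day, gpu_utilization):
    """Scale a single-instance throughput figure up to 100% GPU utilization.

    gpu_utilization is a fraction in (0, 1]. Both inputs are hypothetical.
    """
    if not 0 < gpu_utilization <= 1:
        raise ValueError("gpu_utilization must be in (0, 1]")
    return measured_ghz_days_per_day / gpu_utilization


# Hypothetical example: one instance delivers 100 GHz-days/day at 80% GPU load.
estimate = extrapolate_to_full_load(100.0, 0.80)
print(f"extrapolated: {estimate:.1f} GHz-days/day")
```

A linear scale-up like this says nothing about how much extra sieving the CPU side could contribute, which is the benefit of running several instances described above.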
Old 2012-12-10, 20:20   #598
Bdot
 
 
Nov 2010
Germany

255₁₆ Posts

Quote:
Originally Posted by Rodrigo
That's what I thought, too.

I did some looking around the Intel site and didn't find anything definite on whether you need to install the SDK in order simply to use OpenCL (as opposed to developing software on it).

One thing I found that was possibly disturbing was in the release notes:

[emphasis added]

Rodrigo

Have a look here.
It seems that once you have installed the new Intel SDK and the new Intel drivers, the Intel HD 4000 should be registered with OpenCL (that is what the ICD stuff means).

The Intel embedded GPU is still OpenCL 1.1 and does not allow double precision, but that is OK with mfakto. The HD 4000 brings 16 compute cores @ up to 1.15GHz - I'm really curious whether this can add a noticeable contribution to your primenet success ...

Bdot
Old 2012-12-10, 23:17   #599
Rodrigo
 
 
Jun 2010
Pennsylvania

934₁₀ Posts

Very good, thanks!

Reading the post from Stackoverflow, I see this:

Quote:
The Intel ICD will enumerate both the host CPU and integrated GPU as OpenCL capable devices. You will then need to use the discrete GPU vendor's SDK and ICD to identify and enumerate that as an OpenCL device.

Can I assume that I don't need to do the second part (with respect to an SDK and ICD for the HD 7770) because the discrete GPU is already recognized on my computer?

Rodrigo

Last fiddled with by Rodrigo on 2012-12-10 at 23:17
Old 2012-12-10, 23:44   #600
kracker
 
 
"Mr. Meeseeks"
Jan 2012
California, USA

23×271 Posts

Quote:
Originally Posted by Bdot
The Intel embedded GPU is still OpenCL 1.1 and does not allow double precision, but that is OK with mfakto. The HD 4000 brings 16 compute cores @ up to 1.15GHz - I'm really curious whether this can add a noticeable contribution to your primenet success ...

I'm curious too about how much it differs from AMD's integrated (APU) iGPU.
Old 2012-12-11, 08:43   #601
Bdot
 
 
Nov 2010
Germany

3·199 Posts

Quote:
Originally Posted by Rodrigo
Can I assume that I don't need to do the second part (with respect to an SDK and ICD for the HD 7770) because the discrete GPU is already recognized on my computer?

Rodrigo

mfakto will do the second part for you: it is built with the "discrete GPU vendor's SDK". The only missing part is the registration (aka ICD) for the embedded GPU.

I assume the Intel SDK will add the eGPU as another device to the Intel platform. Most likely it will take the CPU's -d21 and shift the CPU to -d22. If it is added as a separate platform, you'd need -d31 for it. Again, if it is not working well, a clinfo output will clarify.
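
The numbering guessed at here can be sketched as follows. This is only an illustration, assuming the two digits after -d are the 1-based platform number followed by the 1-based device number within that platform (which is how the examples in this thread read). The function name, the platform labels and both "after" layouts are hypothetical; the real order is whatever clinfo reports.

```python
def mfakto_device_flags(platforms):
    """Map each device to a "-d<xy>" string: x = platform, y = device (1-based).

    Only meaningful while both numbers stay below 10.
    """
    flags = {}
    for x, (platform_name, devices) in enumerate(platforms, start=1):
        for y, device in enumerate(devices, start=1):
            flags[f"{platform_name}: {device}"] = f"-d{x}{y}"
    return flags


# Layout as described from the first clinfo output in this thread.
before = [("AMD", ["HD 7770", "CPU"]), ("Intel", ["CPU"])]

# Guess 1 (hypothetical): the eGPU is listed ahead of the CPU on the Intel platform.
same_platform = [("AMD", ["HD 7770", "CPU"]), ("Intel", ["HD 4000", "CPU"])]

# Guess 2 (hypothetical): the eGPU shows up as a third platform of its own.
separate_platform = [("AMD", ["HD 7770", "CPU"]), ("Intel", ["CPU"]),
                     ("Intel GPU", ["HD 4000"])]

for label, layout in [("before", before),
                      ("same platform", same_platform),
                      ("separate platform", separate_platform)]:
    print(label, mfakto_device_flags(layout))
```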

And reading a little more, it appears the CPU alone can produce enough heat to reach the whole chip's thermal limit. This means that any additional heat from the eGPU may reduce the CPU's clock in order to stay within the specification. So when trying it out, better monitor all clocks ...

There's another thing that I'd be interested in: You now have the choice of two different implementations for running mfakto on the CPU (-d 12 and -d21). One will use AMD's compiler to build the kernels, the other will use Intel's. Which one is faster? (Not that it really matters - you don't usually let mfakto run on the CPU - but it's interesting anyway.)

Last fiddled with by Bdot on 2012-12-11 at 09:08
Old 2012-12-24, 06:33   #602
Rodrigo
 
 
Jun 2010
Pennsylvania

1110100110₂ Posts

OK, I finally got the time to research and install the Intel SDK for OpenCL Applications, to see if I can run mfakto on both GPUs.

No dice. mfakto is still giving me the same error messages as reported upthread, and GPU-Z is still not putting check marks in the OpenCL or DirectCompute boxes for the HD 4000. I'm not sure what else needs to be done, beyond "installing" the SDK.

FWIW, during installation of the Intel SDK there was a warning that it would not be integrated with Visual Studio (since I don't have that). Or was it Visual C++? I can't remember, and it all sounds alike to me (sorry!)...

If I have to start hunting for those sorts of things to get this done, it may simply not be worth the effort. I don't have the time or, more importantly, the expertise to range that far and wide!

Rodrigo

P.S. Also FWIW, I'm attaching the new clinfo (renamed).
Attached Files
File Type: txt clinfonew.txt (9.5 KB, 179 views)

Last fiddled with by Rodrigo on 2012-12-24 at 06:44 Reason: added attachment
Old 2012-12-24, 09:03   #603
Ralf Recker
 
 
Oct 2010

191₁₀ Posts

Quote:
Originally Posted by Rodrigo
I'm not sure what else needs to be done, beyond "installing" the SDK.

http://software.intel.com/en-us/forums/topic/277886

Quote:
Currently Processor Graphics OCL device is unavailable in the "headless" configuration (without a monitor plugged in).
Old 2012-12-26, 05:15   #604
Rodrigo
 
 
Jun 2010
Pennsylvania

2×467 Posts

Quote:
Originally Posted by Ralf Recker

Hmm, that might be a problem. This is a brand-new HP system and the iGPU video-out ports are covered with a bracket that reads, "Do Not Remove." In spite of the oddball screws, I wouldn't have an objection to removing the brackets, except for the possibility that doing that might void the warranty. I will have to look into that before proceeding.

Thanks for the warning.

Rodrigo

Last fiddled with by Rodrigo on 2012-12-26 at 05:34 Reason: typo
Old 2012-12-26, 05:34   #605
Rodrigo
 
 
Jun 2010
Pennsylvania

1646₈ Posts

Another question, about using mfakto on the GPU along with Prime95 on the CPU:

Is there any way to tell these two programs to use specific cores of the i7 3770? The reason is that yesterday I was using two CPU cores to do LLs while a third core was supporting mfakto, and everything was running smoothly. The time/class for mfakto was at 2.xxx seconds and one exponent was taking about 45 minutes to finish.

Then I discovered a manually reserved LL that I had forgotten about for 174 days, so I decided to add it to Prime95 in a third worker window. But now, the time/class for mfakto is over 4.xxx seconds. (And yet, according to GPU-Z, the GPU load is at 1%.) The per-iteration times for the original two Prime95 worker windows have gone up from 0.020 and 0.019 to 0.025 and 0.026, respectively. Evidently, mfakto and Prime95 are stepping on each other.

A further complication: when I selected CPUs 2 and 4 for Prime95, Task Manager (Windows 7) showed 8 available threads (or whatever the right designation is for that) in the quad-core system, and it was the second and the fourth of these eight that were busy. So I am not sure whether Prime95 was actually using the second and fourth cores, or merely the second halves of the first two cores. (Have I put my question clearly enough?) Now, with the third worker operating and mfakto doing its thing, I've ended up with five of the eight threads running at or near 100%. I would have guessed four: one for each of the three Prime95 workers, plus one for mfakto (3 + 1 = 4, not 5).

FWIW, that third "emergency" LL is set to "Smart Assignment" CPU selection. When I had it selected to CPU 3, the per-iteration times on the other two shot up to 0.035 and 0.040. I ended up doing Smart Assignment, with ThreadsPerTest set at 2.

Bottom line: I would like to learn how to tell Prime95 to use (say) the first three physical cores (only), and mfakto to use the last core (only). And this while using just one thread, not two, per Prime95 worker.
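
The core-versus-thread question can be sketched as a bit-mask calculation. This is a minimal illustration, not a tested recipe: it assumes the two hyperthreads of each physical core are numbered as adjacent logical CPUs (0 and 1 on core 0, 2 and 3 on core 1, and so on), which is common on Windows but not guaranteed. All function names are made up for the example.

```python
def logical_to_core(logical_cpu, threads_per_core=2):
    """Physical core (0-based) that a logical CPU belongs to, under the
    assumption that sibling hyperthreads are numbered adjacently."""
    return logical_cpu // threads_per_core


def affinity_mask(cores, threads_per_core=2, siblings=False):
    """Bit mask selecting the first logical CPU of each given physical core,
    or every logical CPU of those cores when siblings=True."""
    mask = 0
    for core in cores:
        first = core * threads_per_core
        count = threads_per_core if siblings else 1
        for logical in range(first, first + count):
            mask |= 1 << logical
    return mask


# The "second and fourth" of eight graphs are logical CPUs 1 and 3 (0-based).
print([logical_to_core(n) for n in (1, 3)])

# One thread on each of the first three physical cores for Prime95,
# and the whole last physical core for mfakto.
prime95_mask = affinity_mask([0, 1, 2])
mfakto_mask = affinity_mask([3], siblings=True)
print(f"{prime95_mask:#x}", f"{mfakto_mask:#x}")
```

Under that numbering assumption, the second and fourth graphs would both belong to the first two physical cores. On Windows 7 a hexadecimal mask like these can be handed to cmd's `start /affinity`, which restricts the whole process that it launches. It cannot pin individual Prime95 workers, so Prime95's own affinity settings would still be the place to look for that.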

Suggestions are very welcome...

Rodrigo

Last fiddled with by Rodrigo on 2012-12-26 at 05:42 Reason: additional info

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.