Used this mfakto source: [url]https://github.com/Bdot42/mfakto[/url].
Is it possible to run mfakto on an Intel HD 530 GPU? That is the most important question. So far I have had no success; even the CPU version was a little challenging to compile (I used Visual Studio 2017 [community version] on win10 home edition for this, but I also have Ubuntu, and this slow GPU is the only one in this desktop PC). The GPU really does work with OpenCL: I successfully ran this non-trivial code on it: [url]https://gist.github.com/ddemidov/2925717[/url] .
Compiling gave 2 errors. One was at mfakto.cpp line 855 (you can simply delete the whole line, since it is only a cerr); the other was similar. I also replaced line 738 of mfakto.cpp with: strcat(program_options, " -I src ");// -O3 so the optimization flag is not used, and a folder is included as well.
I also noticed that the taskbar showed the build as a 32-bit build (why??), and as you can see it reaches 6.5 GHz-days/day (the usage was high, close to 100%). Do you see anything suspicious?
[CODE]
mfakto 0.15pre6-Win (64bit build)
Runtime options
  Inifile mfakto.ini
  Verbosity 3
  SieveOnGPU yes
  MoreClasses yes
  GPUSievePrimes 81157
  GPUSieveProcessSize 24Ki bits
  GPUSieveSize 96Mi bits
  FlushInterval 0
  WorkFile worktodo.txt
  ResultsFile results.txt
  Checkpoints enabled
  CheckpointDelay 300s
  Stages enabled
  StopAfterFactor class
  PrintMode compact
  V5UserID none
  ComputerID none
  ProgressHeader "Date Time | class Pct | time ETA | GHz-d/day Sieve Wait"
  ProgressFormat "%d %T | %C %p%% | %t %e | %g %s %W%%"
  TimeStampInResults yes
  VectorSize 2
  GPUType AUTO
  SmallExp no
  UseBinfile mfakto_Kernels.elf
Compiletime options
Select device - GPU not found, fallback to CPU.
Get device info:
Device 1/1: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (Intel(R) Corporation), device version: OpenCL 2.1 (Build 10), driver version: 7.0.0.2567
Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_dx9_media_sharing cl_intel_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_gl_sharing cl_khr_fp64 cl_khr_image2d_from_buffer
Global memory:17048297472, Global memory cache: 262144, local memory: 32768, workgroup size: 8192, Work dimensions: 3[8192, 8192, 8192, 0, 0] , Max clock speed:3400, compute units:8
OpenCL device info
  name Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (Intel(R) Corporation)
  device (driver) version OpenCL 2.1 (Build 10) (7.0.0.2567)
  maximum threads per block 8192
  maximum threads per grid 0
  number of multiprocessors 8 (8 compute elements)
  clock rate 3400MHz
Automatic parameters
  threads per grid 0
  optimizing kernels for CPU
Loading binary kernel file mfakto_Kernels.elf
Compiling kernels (build options: "-I. -DVECTOR_SIZE=2 -DCPU -I src -DMORE_CLASSES -DCL_GPU_SIEVE").
BUILD OUTPUT
Device build started
Device build done
Reload Program Binary Object.
END OF BUILD OUTPUT
Error 0 (Success): clBuildProgram
GPUSievePrimes (adjusted) 81206
GPUsieve minimum exponent 1037054
Started a simple selftest ...
Selftest statistics
  number of tests 30
  successful tests 30
selftest PASSED!
got assignment: exp=3321932839 bit_min=75 bit_max=76 (2.30 GHz-days)
Starting trial factoring M3321932839 from 2^75 to 2^76 (2.30GHz-days)
k_min = 5686287725580 - k_max = 11372575453491
Using GPU kernel "cl_barrett32_77_gs_2"
Found a valid checkpoint file.
  last finished class was: 92
  found 0 factors already
RES (32): <32 x 0> <30 x 0 at the end>
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
RES (32): <32 x 0> 2.1% | 31.804 8h18m | 6.52 81206 0.00% <30 x 0 at the end>
RES (32): <32 x 0> 2.2% | 32.015 8h21m | 6.48 81206 0.00% <30 x 0 at the end>
RES (32): <32 x 0> 2.3% | 31.526 8h12m | 6.58 81206 0.00% <30 x 0 at the end>
RES (32): <32 x 0> 2.4% | 32.096 8h21m | 6.46 81206 0.00% <30 x 0 at the end>
RES (32): <32 x 0> 2.5% | 31.868 8h17m | 6.51 81206 0.00% <30 x 0 at the end>
RES (32): <32 x 0> 2.6% | 31.999 8h18m | 6.48 81206 0.00% <30 x 0 at the end>
RES (32): <32 x 0> 2.7% | 31.979 8h17m | 6.48 81206 0.00% <30 x 0 at the end>
RES (32): <32 x 0> 2.8% | 32.132 8h19m | 6.45 81206 0.00% <30 x 0 at the end>
RES (32): <32 x 0> 2.9% | 31.598 8h10m | 6.56 81206 0.00% <30 x 0 at the end>
RES (32): <32 x 0> 3.0% | 31.559 8h09m | 6.57 81206 0.00% <30 x 0 at the end>
RES (32): <32 x 0> 3.1% | 32.066 8h17m | 6.47 81206 0.00% <30 x 0 at the end>
RES (32): <32 x 0> 3.2% | 31.709 8h10m | 6.54 81206 0.00% <30 x 0 at the end>
RES (32): <32 x 0> 3.3% | 31.563 8h08m | 6.57 81206 0.00% <30 x 0 at the end>
RES (32): <32 x 0> 3.4% | 31.966 8h13m | 6.49 81206 0.00% <30 x 0 at the end>
RES (32): <32 x 0> 3.5% | 31.824 8h11m | 6.51 81206 0.00% <30 x 0 at the end>
RES (32): <32 x 0> 3.6% | 31.746 8h09m | 6.53 81206 0.00% <30 x 0 at the end>
RES (32): <32 x 0> 3.8% | 31.826 8h10m | 6.51 81206 0.00% <30 x 0 at the end>
RES (32): <32 x 0> 3.9% | 31.730 8h08m | 6.53 81206 0.00% <30 x 0 at the end>
RES (32): <32 x 0> 4.0% | 31.648 8h06m | 6.55 81206 0.00% <30 x 0 at the end>
RES (32): <32 x 0> 4.1% | 31.709 8h06m | 6.54 81206 0.00% <30 x 0 at the end>
RES (32): <32 x 0> 4.2% | 31.641 8h05m | 6.55 81206 0.00% <30 x 0 at the end>
[/CODE] |
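As a rough sanity check on the log in the first post, the GHz-d/day column can be approximately reproduced from the assignment size and the time per class. This is only a sketch under one assumption that the log itself does not state: that with MoreClasses enabled, 960 of the 4620 classes are actually tested. The function names are made up for illustration and are not part of mfakto.

```python
SECONDS_PER_DAY = 86400

def ghz_days_per_day(assignment_ghz_days, seconds_per_class, classes_tested=960):
    """Estimate throughput from the per-class time in the progress line.

    classes_tested=960 is an assumption (MoreClasses: 960 of 4620 classes tested).
    """
    total_days = classes_tested * seconds_per_class / SECONDS_PER_DAY
    return assignment_ghz_days / total_days

def remaining_hours(percent_done, seconds_per_class, classes_tested=960):
    """Estimate the remaining run time in hours at a given completion percentage."""
    remaining_classes = classes_tested * (1 - percent_done / 100)
    return remaining_classes * seconds_per_class / 3600

# Values from the first progress line: 2.30 GHz-days, 31.804 s per class, 2.1% done
rate = ghz_days_per_day(2.30, 31.804)
eta = remaining_hours(2.1, 31.804)
print(f"{rate:.2f} GHz-d/day, about {eta:.1f} h remaining")
```

Under that assumption the estimate comes out close to the logged 6.52 GHz-d/day and 8h18m, which suggests the displayed speed is internally consistent for a CPU run rather than a display glitch.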
[QUOTE=R. Gerbicz;504628]Used this mfakto source: [URL]https://github.com/Bdot42/mfakto[/URL].
Is it possible to run mfakto on Intel HD 530 gpu ? The most important issue. So far I had no success, even for cpu version it was a little challenging to compile (used Visual Studio 2017 [community version] with win10 home edition for this, but I have also Ubuntu; and only this slow GPU on this desktop pc). The GPU is really working with OpenCL, I have successful run on this non-trivial code: [URL]https://gist.github.com/ddemidov/2925717[/URL] . [/QUOTE]
I don't have access to an HD530. The precompiled executables available at James Heinrich's mirror site have worked for me, on Windows, on RX480, RX550, HD620, and UHD630.
[URL]https://download.mersenne.ca/[/URL] (get mfakto 0.15pre6)
The IGP and cpu are different devices as far as OpenCL is concerned and should be selectable by different device numbers on the command line, e.g. mfakto -d 11 >>mfakto.txt
You might want to run opencl-z.exe on Windows to check that it lists the IGP; if it does not, the required Intel OpenCL driver may be missing, or something similar.
[URL]https://sourceforge.net/projects/opencl-z/[/URL]
When I see "gpu not found, falling back to cpu" it's generally an issue with device identification or the OpenCL driver for the gpu. |
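When mfakto's output is redirected to a file as in the command above, the fallback can be spotted mechanically. The sketch below is an illustration only: the marker text and the "Device N/M:" layout are copied from the log posted earlier in this thread and may differ in other mfakto versions, and the function name is hypothetical.

```python
import re

# Marker text as it appears in the log earlier in this thread (may vary by version)
FALLBACK_MARKER = "GPU not found, fallback to CPU"

def summarize_device(log_text):
    """Return (fell_back_to_cpu, device_name) parsed from mfakto startup output."""
    fell_back = FALLBACK_MARKER in log_text
    # The device name sits between "Device N/M:" and ", device version"
    match = re.search(r"Device \d+/\d+: (.+?), device version", log_text)
    name = match.group(1).strip() if match else None
    return fell_back, name

sample = (
    "Select device - GPU not found, fallback to CPU.\n"
    "Get device info: Device 1/1: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz "
    "(Intel(R) Corporation), device version: OpenCL 2.1 (Build 10), "
    "driver version: 7.0.0.2567\n"
)
fell_back, name = summarize_device(sample)
print(fell_back, name)
```

If the reported name is the CPU rather than the IGP, the next things to check are the device number passed with -d and the OpenCL driver, as described above.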
[QUOTE=kriesel;504640]
When I see "gpu not found, falling back to cpu" it's generally an issue with device identification or the opencl driver for the gpu.[/QUOTE]
Thanks, that was the problem (an old driver). I now get roughly 19 GHz-days/day on the GPU. |
@kriesel and @SELROC, I checked the system up and down, inside and out. There was nothing obvious going on that would account for the stuttering and choppiness: CPU and RAM usage were normal (0-2% CPU when Prime95 not running), disk (SSD) is healthy, all hardware tests came back fine.
So I reverted to an earlier GPU driver for the 7770 (12.104.0.0). With MFAKTO running, now I can move the mouse around and open program windows with less delay or hesitation. The hesitation is not completely gone, but at least I can track the mouse pointer more accurately. I think there's something about the newest (or all newer?) AMD drivers for this card that led to the problem. However, the situation is not back to where it had been previously, where there was no perceptible delay even with MFAKTO running. With the older driver, TF throughput is somewhat higher (134-135 GHz-days/day, vs. 130-132 with the newer driver). |
[QUOTE=Rodrigo;504710]@kriesel and @SELROC, I checked the system up and down, inside and out. There was nothing obvious going on that would account for the stuttering and choppiness: CPU and RAM usage were normal (0-2% CPU when Prime95 not running), disk (SSD) is healthy, all hardware tests came back fine.
So I reverted to an earlier GPU driver for the 7770 (12.104.0.0). With MFAKTO running, now I can move the mouse around and open program windows with less delay or hesitation. The hesitation is not completely gone, but at least I can track the mouse pointer more accurately. I think there's something about the newest (or all newer?) AMD drivers for this card that led to the problem. However, the situation is not back to where it had been previously, where there was no perceptible delay even with MFAKTO running. With the older driver, TF throughput is somewhat higher (134-135 GHz-days/day, vs. 130-132 with the newer driver).[/QUOTE]
Glad to read you got it sorted out.
In gpuowl, I've seen later releases require a driver upgrade, and I've seen a previous version lose performance (by up to 5%) on the new driver. |
[QUOTE=kriesel;504726]Glad to read you got it sorted out.
In gpuowl, I've seen later releases require a driver upgrade, and diminished performance with the new driver on a previous version (by up to 5% reduction)[/QUOTE]
You use the same GPU for both computation and rendering... you could do another test: move the mouse quickly, open programs, and read the ms/it or ms/sq from gpuowl or mfakto. If they interfere with each other, you should see the ms/it change a little. |
[QUOTE=SELROC;504744]You use the same GPU for both computations and rendering...you could do another test...move the mouse quickly open programs, and read out the ms/it or ms/sq from gpuowl or mfakto, if they interfere with each other you should see it the ms/it changing a few.[/QUOTE]
The 5% reduction was seen in V1.9 gpuowl, after upgrading the driver to allow running V2.0. The measurements were made with as little user activity as practical. It was also likely done via a remote desktop session.
[URL]https://www.mersenneforum.org/showpost.php?p=484694&postcount=370[/URL]
A later driver update also slowed it down, but only by a fraction of a percent.
Two experiments were made today to check to what extent user activity via remote desktop affects gpu computing throughput, with some fairly maximal user activity.
a) Click and continuously drag a gpuowl console box about the screen for a few minutes, while two gpus run gpuowl and mfakto. Note the start time of dragging by the gpuowl timestamp of screen output. Check ms/iter in gpuowl and GHz-d/day numbers in mfakto for that time period. Check also prime95 iteration times in its 4 workers.
Result: gpu throughput change, if any, is within the normal fluctuation (7.65-7.69 ms/it in gpuowl 3.8; 91.49 to 91.63 GHz-d/day in mfakto). The prime95 worker one iteration time of ~37.2 ms/iter increased to 38.5 (~3.5% higher) during the 8:06-8:10 local time period of high user remote console activity; the other prime95 workers' timings remained within the 1% typical fluctuation.
b) Halt gpuowl and mfakto; verify gpu utilization has gone to zero in the gpu-z sessions for both gpus. Resume the gpuowl box drag, while watching the gpu-z sessions for both gpus for 30 seconds (several gpu-z update periods).
Result: gpu-z indicated utilization remains zero on both gpus. |
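The percentages in experiment a) can be reproduced with a few lines of arithmetic. This is just an illustrative sketch using the figures quoted in the post; the helper names are made up.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100

def percent_spread(low, high):
    """Width of an observed range relative to its low end, in percent."""
    return (high - low) / low * 100

prime95 = percent_change(37.2, 38.5)    # worker one, ms/iter
gpuowl = percent_spread(7.65, 7.69)     # ms/it
mfakto = percent_spread(91.49, 91.63)   # GHz-d/day

print(f"prime95 worker one: {prime95:+.1f}%")
print(f"gpuowl spread: {gpuowl:.2f}%")
print(f"mfakto spread: {mfakto:.2f}%")
```

Both gpu spreads come out well under 1%, while the prime95 worker one slowdown is about 3.5%, matching the figures stated in the results.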
[QUOTE=kriesel;504827]The 5% reduction was seen in V1.9 gpuowl, after upgrading the driver to allow running V2.0. The measurements were made with as little user activity as practical. It was also likely done via a remote desktop session. [URL]https://www.mersenneforum.org/showpost.php?p=484694&postcount=370[/URL]
A later driver update was also slowing, but by a fraction of a percent. Two experiments were made today to check to what extent user activity via remote desktop affects gpu computing throughput, with some fairly maximal user activity. a) click and continuously drag about the screen, a gpuowl console box, for a few minutes, while two gpus run gpuowl and mfakto. Note start time of dragging, by gpuowl timestamp of screen output. Check ms/iter in gpuowl, ghzd/day numbers in mfakto for that time period. Check also prime95 iteration times in its 4 workers. Result: gpu throughput change if any is within the normal fluctuation (7.65-7.69 ms/it in gpuowl 3.8; 91.49 to 91.63 ghzD/d in mfakto). Prime95 worker one iteration time ~37.2 ms/iter increased to 38.5 (~3.5% higher) during 8:06-8:10 local time high user remote console activity period; other prime95 workers timings remain within the 1% typical fluctuation. b) halt gpuowl and mfakto, verify gpu utilization has gone to zero in gpu-z sessions for both gpus. Resume gpuowl box drag, while watching gpu-z sessions for both gpus for 30 seconds (several gpu-z update periods). Result: gpu-z indicated utilization remains zero on both gpus.[/QUOTE]
Maybe moving the mouse and clicking is not enough; try running Quake 3 while you run mfakto or gpuowl :-) |
Hm, I have been reading the code and still don't know where you sieve by p>11. It is confirmed to be in SegSieve in gpusieve.cl, but as far as I can see we don't even call that in the runs. For example, replacing line 1296 by big_bit_array32[j * threadsPerBlock + get_local_id(0)]=0; (this would eliminate all k values), or placing Visual Studio breakpoints on the first line of SegSieve, or even using
[CODE]
#define TRACE_SIEVE_KERNEL 5
// If above tracing is on, only the thread with the ID below will trace
#define TRACE_SIEVE_TID 2
[/CODE]
changes nothing: we still pass the selftest, and we see no break and no additional debug info. Furthermore, modifying the default GPUSievePrimes=81157 gives different times, suggesting that we really are sieving, but where?
One more thing, which I've also seen elsewhere on this forum: at run time it displays that the automatic parameter for threads per grid is 0:
[CODE]
OpenCL device info
  name Intel(R) HD Graphics 530 (Intel(R) Corporation)
  device (driver) version OpenCL 2.1 NEO (25.20.100.6471)
  maximum threads per block 256
  maximum threads per grid 16777216
  number of multiprocessors 24 (24 compute elements)
  clock rate 1150MHz
Automatic parameters
  threads per grid 0
  optimizing kernels for INTEL
[/CODE]
I still have some ideas (not that many) to improve the current code, but without understanding the basics of the code it is somewhat hard. |
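For context on why only p>11 is left for the sieve kernel: the small primes are presumably handled by splitting the k range into classes before any kernel runs. The sketch below illustrates that idea in general terms and is not taken from gpusieve.cl or any other mfakto source file. It relies on two known facts (factors of M_p have the form 2kp+1 and are congruent to 1 or 7 mod 8) and on one assumption: that MoreClasses means 4620 = 4*3*5*7*11 classes of k mod 4620. The function and parameter names are hypothetical.

```python
def surviving_classes(exponent, num_classes=4620, small_primes=(3, 5, 7, 11)):
    """List the classes c = k mod num_classes that can contain a factor 2*k*p+1.

    num_classes=4620 is an assumption about what MoreClasses selects.
    """
    classes = []
    for c in range(num_classes):
        q = 2 * c * exponent + 1      # candidate for the class representative k = c
        if q % 8 not in (1, 7):       # factors of Mersenne numbers are 1 or 7 mod 8
            continue
        if any(q % s == 0 for s in small_primes):
            continue                  # every k in this class gives a multiple of s
        classes.append(c)
    return classes

classes = surviving_classes(3321932839)  # exponent from the log in the first post
print(len(classes), "of 4620 classes remain")
print("class 92 survives:", 92 in classes)
```

Because the residues of 2kp+1 modulo 8, 3, 5, 7 and 11 depend only on k mod 4620, whole classes can be skipped on the host, and a segmented sieve would then only need primes above 11. Whether this matches how mfakto actually dispatches SegSieve is exactly the open question in the post above.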
[QUOTE=SELROC;504830]May be moving the mouse and clicking is not enough, try to run Quake 3 while you run mfakto or gpuowl :-)[/QUOTE]
"Moving the mouse and click" is not what I did. Click on the gpuowl console window and drag the whole thing around the screen in an elliptical pattern, with full display (not just outline box) while dragging; hundreds of thousands of pixels being updated at frame rate would have been the impact if doing it on the local display & gpu. I regularly get 5-15% gpu utilization on one gpu when doing that very same thing on the local display. It's one of the more drastic demands an interactive user might put on a Mersenne compute system's display as a display activity. It's the demanding opposite of how to treat a system running a benchmark after changing drivers to compare their performance. The experiment showed that the display workload shifts from the gpu on the host system, to its cpu running the rdp server service, and to the remote desktop client's gpu, when using remote desktop. Hence, normal low impact user activity to account for the difference in driver performance as you suggested some posts back seems to be ruled out. |
[QUOTE=kriesel;504843]"Moving the mouse and click" is not what I did. Click on the gpuowl console window and drag the whole thing around the screen in an elliptical pattern, with full display (not just outline box) while dragging; hundreds of thousands of pixels being updated at frame rate would have been the impact if doing it on the local display & gpu. I regularly get 5-15% gpu utilization on one gpu when doing that very same thing on the local display. It's one of the more drastic demands an interactive user might put on a Mersenne compute system's display as a display activity. It's the demanding opposite of how to treat a system running a benchmark after changing drivers to compare their performance.
The experiment showed that the display workload shifts from the gpu on the host system, to its cpu running the rdp server service, and to the remote desktop client's gpu, when using remote desktop. Hence, normal low impact user activity to account for the difference in driver performance as you suggested some posts back seems to be ruled out.[/QUOTE]
You are still reading old messages, but you didn't run Quake 3 :-)
For what it's worth, I don't trust Windows as a secure and stable operating system. Sorry, it is just personal experience. |