[QUOTE=tServo;499015]kriesel,
I'm going to install Msys2 on an old machine that has win10 in the next day or 2 ( or 3 ) as time permits to see if I can duplicate your problem. I have already installed AMD SDK 3.0 so next is actual Msys2 itself. Maybe between the 2 of us we can either figure this out or decide that the latest Msys2 is hopelessly messed up. If I get the same results as you, I may try an older Msys2 and/or older GCC.[/QUOTE]
Wow, thanks for jumping in.
Meanwhile I can still run existing executables including gpuOwL V3.8 and have many other projects queued. Second attachment of [URL]https://www.mersenneforum.org/showpost.php?p=488535&postcount=2[/URL] has been updated with more gpuOwL V3.9 timings, showing a varying but consistent speed advantage for V3.5-3.8 over V3.9.
I saw a post earlier by SELROC about having difficulty compiling clLucas. (Presumably that's on native linux, based on his prior posts.) I think it's a generic problem, that static code and the evolving compilation tools drift into incompatibility over time. It could also be operator error here.
Please keep good track of how you build the build environment. If/when you get a successful gpuowl build, please also document the compile/link steps thoroughly. |
[QUOTE=kriesel;499017]Wow, thanks for jumping in.
Meanwhile I can still run existing executables including gpuOwL V3.8 and have many other projects queued. Second attachment of [URL]https://www.mersenneforum.org/showpost.php?p=488535&postcount=2[/URL] has been updated with more gpuOwL V3.9 timings, showing a varying but consistent speed advantage for V3.5-3.8 over V3.9. I saw a post earlier by SELROC about having difficulty compiling clLucas. (Presumably that's on native linux, based on his prior posts.) I think it's a generic problem, that static code and the evolving compilation tools drift into incompatibility over time. It could also be operator error here. Please keep good track of how you build the build environment. If/when you get a successful build, please also document the compile/link steps thoroughly.[/QUOTE] Sorry about the lack of detail, I was trying on Debian 9 with the latest GCC v.8 |
kriesel,
I shall definitely keep track of the steps I used to install. I just compiled hello.c on msys2/mingw64 to hello.exe, moved it to a windows dir, and it executed, so tomorrow I will try some more significant stuff. It's late now and I'll just make mistakes.
However, two small things I noticed:
(1) In some of the install documentation for Msys2, particularly on Github, they screw up a directory. Quite often they will specify c::\dev\msys2 when they should say c:\msys2.\dev. This error is so blatant that it's easy to catch, tho.
(2) In kracker's instructions in post #356, step #3, he says to copy directory: \Program Files (x86)\AMD APP SDK\3.0\include to an msys2 directory. This AMDSDK directory contains only 2 subdirectories and no files of its own, so no files will be copied unless you use xcopy, which will preserve the directory structure if you use /S ---or--- you cd into each directory and copy its contents individually into the named target without worrying about directory structure. I did both, just in case. I just looked at the opencl.h file and it looks like the proper directory structure is required, so use xcopy /S.
You've probably already noticed this stuff, tho. I'll see what tomorrow brings. |
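For anyone doing point (2) from the MSYS2 bash prompt rather than from cmd, a recursive copy should do the same job as xcopy /S. This is only a sketch: copy_sdk_headers is a made-up helper name, and the destination is whatever post #356 names (it isn't repeated in this thread, so it stays a placeholder in the usage line below).

```shell
# Hypothetical helper (not part of Msys2 or the AMD SDK):
# copy a header tree recursively so that subdirectories are preserved.
copy_sdk_headers() {
    src=$1
    dst=$2
    # Refuse to run if the source directory is not there.
    [ -d "$src" ] || { echo "source directory not found: $src" >&2; return 1; }
    mkdir -p "$dst" || return 1
    # "$src/." means "the contents of src", subdirectories included.
    cp -r "$src/." "$dst/"
}
```

Usage would look something like: copy_sdk_headers "/c/Program Files (x86)/AMD APP SDK/3.0/include" "<target directory from post #356>". The quotes matter because of the spaces in the Windows path.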
[QUOTE=tServo;499074]kriesel,
i shall definitely keep track of the steps I used to install. I just compiled hello.c on msys2/mingw64 to hello.exe, moved it to a windows dir, and it executed so tomorrow I will try some more significant stuff. It's late now and I'll just make mistakes. However two small things I noticed: (1) In some of the install documentation for Msys2, particularly on Github, they screw up a directory. Quite often they will specify c::\dev\msys2 when they should say c:\msys2.\dev. This error is so blatant that it's easy to catch, tho. (2) In kracker's instructions in post #356, step #3, he says to copy directory: \Program Files (x86)\AMD APP SDK\3.0\include to an msys2 directory. This AMDSDK directory only has 2 directories so no files will be copied unless you use xcopy, which will preserve the directory structure if you use /S ---or--- you cd into each directory and copy its contents individually without worrying about directory structure into the named target. I did both, just in case. I just looked at the opencl.h file and it looks like the proper directory structure is required, so use xcopy /S You've probably already noticed this stuff, tho. I'll see what tomorrow brings.[/QUOTE]
====== correction =============
In #1 above, the msys directory is msys64, NOT msys2. Sorry. |
kriesel,
Bad news. Mine acts the same way. I've only tested 1 src version. I will test earlier ones next. I will keep working on it. |
[QUOTE=preda;498831]Ok, I was just trying to establish if there is anything in 4.x code (new relative to 3.8), that is causing the problems that you see. If there is, it's my burden to fix that.
If OTOH the problems are caused by something related e.g. to msys2, then I don't need to fix that. Checking out 3.8 to attempt build in your current environment should be relatively easy. If that doesn't work, then it's on msys2 or environment.[/QUOTE]
[url]https://github.com/msoos/amdmiscompile[/url] |
[QUOTE=tServo;499144]kriesel,
Bad news. Mine acts the same way. I've only tested 1 src version. I will test earlier ones next. I will keep working on it.[/QUOTE]
That's sort of a good news / bad news outcome.
Good news: I'm not proven crazy or proven incompetent in this way.
Bad news: There's apparently a real reproducible problem.
Going in the forward update direction:
[CODE]Msys2 updates via pacman
Oct 30:
first round, bash and tty
second round, Packages (28)
brotli-1.0.7-1 curl-7.61.1-1 file-5.35-1 gdbm-1.18.1-1 git-2.19.1-1 gnupg-2.2.10-1 libcurl-7.61.1-1 libgcrypt-1.8.4-1 libgdbm-1.18.1-1 libgpgme-1.12.0-1 liblz4-1.8.3-1 libnghttp2-1.34.0-1 libp11-kit-0.23.14-1 libpcre2_8-10.32-1 libunrar-5.6.8-1 libunrar-devel-5.6.8-1 mingw-w64-x86_64-crt-git-7.0.0.5255.83bdce54-1 mingw-w64-x86_64-headers-git-7.0.0.5255.83bdce54-1 mingw-w64-x86_64-libwinpthread-git-7.0.0.5253.37101e0b-1 mingw-w64-x86_64-winpthreads-git-7.0.0.5253.37101e0b-1 openssh-7.9p1-1 p11-kit-0.23.14-1 perl-Error-0.17027-1 perl-IO-Socket-SSL-2.060-1 perl-YAML-Syck-1.31-1 perl-libwww-6.36-1 unrar-5.6.8-1 vim-8.1.0500-1
(up to date after two rounds)
gpuOwl V3.8 build still had issues after this
Oct 31:
round one bison-3.2-1
(up to date after one round)[/CODE] |
I've got it FIXED !!!
One only needs to add the -static flag to the end of the link command in the makefile to correct the problems.
I first got a zip snapshot I saved from September to work. I googled that __imp_ keyword in the error message and found 2 references that mentioned static libraries. I then remembered kracker's instructions in post #356 ( thanks, kracker ! ) that used a static link. I did the link manually as per those instructions and POOF! success.
However, another problem has arisen. THREADS ! This is one thing that simply does not translate from Linux to Windoze. The most recent versions ( October ? ) use them. I have found a wrapper in a msys library and will try that.
---Marv |
BTW,
I didn't get any errors during compile or link, but got an assertion when it started running. I am now investigating library winpthreads to see if that fixes the thread problem. |
gpuowl 4.7-5b01b65
[QUOTE=tServo;499176]One only needs to add the -static keyword to the end of the makefile to correct the problems.
I first got a zip snapshot I saved from September to work. I googled that __imp_ keyword in the error message and found 2 references that it mentioned static libraries. I then remember kracker's instructions in post #356 ( thanks, kracker ! ) that used a static link. I did the link manually as per those instructions and POOF! success. However, another problem has arisen. THREADS ! This is one thing that simply does not translate from Linux to Windoze. The most recent versions ( October ? ) use them. I have found a wrapper in a msys library and will try that. ---Marv[/QUOTE] Thanks! I had tried -static recently, but maybe something else was broken at the time in my compile environment. [CODE]ken@condorella MINGW64 ~/gpuowl-compile/v3.8 $ make openowl-notf g++ -DREV=\"91c52fa\" -std=c++14 OpenGpu.cpp NoTF.cpp clwrap.cpp common.cpp gpuowl.cpp -o openowl -lOpenCL -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -static [/CODE]Modified make of gpuowl v3.8-91c52fa succeeded, with pacman updated to current today. (Exe untested.) Continuing on: [CODE] ken@condorella MINGW64 ~/gpuowl-compile/v3.8 $ cd ../v4.7 ken@condorella MINGW64 ~/gpuowl-compile/v4.7 $ make openowl g++ -std=c++17 -O2 -DREV=\"5b01b65\" -Wall Worktodo.cpp Result.cpp common.cpp gpuowl.cpp Gpu.cpp clwrap.cpp Task.cpp checkpoint.cpp timeutil.cpp Args.cpp GCD.cpp Primes.cpp Stats.cpp state.cpp Signal.cpp -o openowl -lOpenCL -lgmp -pthread -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L. 
-static Gpu.cpp: In function 'PRPState loadPRP(Gpu*, u32, u32, u32)': Gpu.cpp:564:7: warning: unknown conversion type character 'l' in format [-Wformat=] log("%s loaded: %d/%d, B1 %u, blockSize %d, %016llx (expected %016llx)\n", ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Gpu.cpp:564:7: warning: unknown conversion type character 'l' in format [-Wformat=] Gpu.cpp:564:7: warning: too many arguments for format [-Wformat-extra-args] Gpu.cpp: In member function 'PRPResult Gpu::isPrimePRP(u32, const Args&, u32, u32)': Gpu.cpp:710:11: warning: unknown conversion type character 'l' in format [-Wformat=] log("%s %8d / %d, %016llx (base %016llx)\n", isPrime ? "PP" : "CC", kEnd, E, finalRes64, baseRes64); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Gpu.cpp:710:11: warning: unknown conversion type character 'l' in format [-Wformat=] Gpu.cpp:710:11: warning: too many arguments for format [-Wformat-extra-args] In file included from Task.cpp:4: OpenTF.h: In member function 'virtual std::__cxx11::string OpenTF::findFactor(u32, u32, u32, u32, u32, u64*, u64*, bool)': OpenTF.h:209:9: warning: unknown conversion type character 'l' in format [-Wformat=] log("TF %u %u-%u, K %llu - %llu, %dx%d + 1x%d groups, start from class #%u\n", ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ OpenTF.h:209:9: warning: unknown conversion type character 'l' in format [-Wformat=] OpenTF.h:209:9: warning: format '%d' expects argument of type 'int', but argument 5 has type 'u64' {aka 'long long unsigned int'} [-Wformat=] OpenTF.h:209:9: warning: format '%d' expects argument of type 'int', but argument 6 has type 'u64' {aka 'long long unsigned int'} [-Wformat=] OpenTF.h:209:9: warning: too many arguments for format [-Wformat-extra-args] OpenTF.h:243:11: warning: unknown conversion type character 'l' in format [-Wformat=] log("TF %u %d-%d %.2f%%, class %4d (%4d), %.3fs (%.0f GHz), ETA %dd %02d:%02d, FCs %llu (%.4f%%)\n", 
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OpenTF.h:243:11: warning: format '%f' expects argument of type 'double', but argument 13 has type 'u64' {aka 'long long unsigned int'} [-Wformat=]
OpenTF.h:243:11: warning: too many arguments for format [-Wformat-extra-args]
[/CODE]GpuOwL built with modified make, with warnings.
Tip for testing or running multiple versions in a common test directory: gpuowl.cl and shared.h are specific to the gpuowl version, and the right versions must have those names for the gpuowl executable to run. Rename gpuowl executables to include version and commit number. Rename the gpuowl.cl and shared.h versions to identify their version. Copy the ones you need active to gpuowl.cl and shared.h. This can be automated with a batch file. This allows testing different versions rapidly in turn with the same set of worktodo and exponents in progress, as long as they are compatible, which as I recall, V1.9 or lower to 3.9 are (not sure about 4.x).
I copied a bare minimum of work in progress and a worktodo file into the compile directory and gave the resulting v4.7 build executable a shot, without taking any time to set things up differently for P-1. The attachment shows the run result. Assertion failed errors occur if the shared.h or gpuowl.cl versions don't match. These did match. Nothing relevant in README.md. Nothing relevant to P-1 in the program's -h output.
[CODE]C:\msys64\home\ken\gpuowl-compile\v4.7>openowl-v4.7-5b01b65-w64 -h
2018-10-31 13:47:52 gpuowl 4.7-5b01b65
Command line options:
-user <name> : specify the user name.
-cpu <name> : specify the hardware name.
-time : display kernel profiling information.
-fft <size> : specify FFT size, such as: 5000K, 4M, +2, -1.
-block <value> : PRP GEC block size. Default 400. Smaller block is slower but detects errors sooner.
-carry long|short : force carry type. Short carry may be faster, but requires high bits/word.
-list fft : display a list of available FFT configurations. -tf <bit-offset> : enable auto trial factoring before PRP. Pass 0 to bit-offset for default TF depth. -device <N> : select a specific device: 0 : Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics 1 : gfx804-8x1203-@3:0.0 Radeon 550 Series 2 : Intel(R) Xeon(R) CPU E5645 @ 2.40GHz-12x2394-@0:0.0[/CODE]Try again after reading P-1 related posts. Meanwhile Preda can consider himself contacted. Further experimentation gives results that are consistent with v4.7 not handling well continuation from a v3.8 PRP run, but being capable of starting an exponent with P-1 stage 1 on Windows. Thanks again tServo. I am now relatively sure I had forgotten to do the amd app sdk 3.0 subdirectory copies at some point. [CODE]C:\msys64\home\ken\gpuowl-compile\v4.7>openowl-v4.7-5b01b65-w64 2018-10-31 14:21:39 gpuowl 4.7-5b01b65 2018-10-31 14:21:39 FFT 4608K: Width 512 (64x8), Height 512 (64x8), Middle 9; 17.77 bits/word 2018-10-31 14:21:39 Note: using short carry kernels 2018-10-31 14:21:40 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics 2018-10-31 14:21:43 OpenCL compilation in 3416 ms, with "-DEXP=83872127u -DWIDTH=512u -DSMALL_HEIGHT=512u -DMIDDLE=9u -I. 
-cl-fast-relaxed-math -cl-std=CL2.0 " 2018-10-31 14:21:44 PRP M(83872127), FFT 4608K, 17.77 bits/word, B1 2000000, B2 40000000 2018-10-31 14:21:44 B1 mismatch: using B1=0 from from savefile 2018-10-31 14:21:47 OK loaded: 59221500/83872127, B1 0, blockSize 500, fea499db2f2de654 (expected fea499db2f2de654) 2018-10-31 14:21:47 B1 mismatch 2000000 0 2018-10-31 14:21:53 Exiting because "B1 mismatch" 2018-10-31 14:21:53 Bye C:\msys64\home\ken\gpuowl-compile\v4.7>openowl-v4.7-5b01b65-w64 2018-10-31 14:25:33 gpuowl 4.7-5b01b65 2018-10-31 14:25:33 FFT 4608K: Width 512 (64x8), Height 512 (64x8), Middle 9; 17.77 bits/word 2018-10-31 14:25:33 Note: using short carry kernels 2018-10-31 14:25:35 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics 2018-10-31 14:25:38 OpenCL compilation in 3400 ms, with "-DEXP=83872127u -DWIDTH=512u -DSMALL_HEIGHT=512u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0 " 2018-10-31 14:25:39 PRP M(83872127), FFT 4608K, 17.77 bits/word, B1 0, B2 0 2018-10-31 14:25:42 OK loaded: 59221500/83872127, B1 0, blockSize 500, fea499db2f2de654 (expected fea499db2f2de654) 2018-10-31 14:25:42 Selected 0 P-1 trial points Assertion failed! Program: C:\msys64\home\ken\gpuowl-compile\v4.7\openowl-v4.7-5b01b65-W64.exe File: Signal.cpp, Line 14 Expression: oldHandler This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information. C:\msys64\home\ken\gpuowl-compile\v4.7>openowl-v4.7-5b01b65-w64 2018-10-31 14:26:28 gpuowl 4.7-5b01b65 2018-10-31 14:26:28 FFT 9216K: Width 1024 (256x4), Height 512 (64x8), Middle 9; 16.11 bits/word 2018-10-31 14:26:28 Note: using short carry kernels 2018-10-31 14:26:29 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics 2018-10-31 14:26:32 OpenCL compilation in 2964 ms, with "-DEXP=152000249u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=9u -I. 
-cl-fast-relaxed-math -cl-std=CL2.0 " 2018-10-31 14:26:34 PRP M(152000249), FFT 9216K, 16.11 bits/word, B1 2000000, B2 40000000 2018-10-31 14:27:53 10000/2885604 [ 0.35%], 7.89 ms/it [7.87, 7.90]; ETA 0d 06:18; 1d8b41680a978ace 2018-10-31 14:29:13 20000/2885604 [ 0.69%], 7.97 ms/it [7.88, 8.74]; ETA 0d 06:21; 554d1f5529375793 2018-10-31 14:30:31 30000/2885604 [ 1.04%], 7.88 ms/it [7.86, 7.89]; ETA 0d 06:15; f0ba0c0e5dfb2726 2018-10-31 14:31:51 40000/2885604 [ 1.39%], 7.96 ms/it [7.87, 8.74]; ETA 0d 06:17; b9fcfd476e29c72d 2018-10-31 14:33:10 50000/2885604 [ 1.73%], 7.87 ms/it [7.86, 7.88]; ETA 0d 06:12; 11dd99c3a3b06551 2018-10-31 14:34:29 60000/2885604 [ 2.08%], 7.96 ms/it [7.87, 8.74]; ETA 0d 06:15; 25a1cfd3c8019093 [/CODE]But no save and resume during P-1 stage 1. Stop before 6+ hours, and it restarts from zero. Stage one run time estimate seems to compare favorably to curve fits for CUDAPm1 V0.20 on an NVIDIA GTX1060, a gpu comparable in TF speed to the RX480. |
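The version-switching tip in the post above (keep versioned copies of gpuowl.cl and shared.h, then copy the pair you need to the active names) could be scripted along these lines from the MSYS2 shell instead of a batch file. Treat it as a sketch only: the naming scheme gpuowl-<tag>.cl / shared-<tag>.h and the function name activate_gpuowl_version are assumptions made up for illustration, not something gpuowl defines.

```shell
# Hypothetical naming scheme: gpuowl-<tag>.cl and shared-<tag>.h live in the
# test directory next to the renamed executables, e.g. tag "v4.7-5b01b65".
activate_gpuowl_version() {
    tag=$1
    dir=${2:-.}
    # Check both files first, so a missing one cannot leave a mismatched pair active.
    for f in "gpuowl-$tag.cl" "shared-$tag.h"; do
        [ -f "$dir/$f" ] || { echo "missing: $dir/$f" >&2; return 1; }
    done
    # Copy the versioned pair to the names the executable looks for.
    cp "$dir/gpuowl-$tag.cl" "$dir/gpuowl.cl" &&
    cp "$dir/shared-$tag.h" "$dir/shared.h"
}
```

With files named that way, running activate_gpuowl_version v4.7-5b01b65 followed by the matching renamed executable would switch versions in one step. Checking for both files before copying is deliberate, since a mismatched gpuowl.cl / shared.h pair is what triggers the assertion failures mentioned above.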
gpuowl 4.7-5b01b65
1 Attachment(s)
Windows executable in 7z file, built on Windows 7 x64. See preceding post [URL]https://www.mersenneforum.org/showpost.php?p=499186&postcount=791[/URL]
which indicates the meager extent of my testing of it so far. |