mersenneforum.org

gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

kriesel 2018-10-31 20:07

[QUOTE=tServo;499176]One only needs to add the -static flag to the end of the g++ command in the makefile to correct the problems.
I first got a zip snapshot I had saved from September to work.
I googled the __imp_ prefix from the error message and found 2 references that
mentioned static libraries.
I then remembered kracker's instructions in post #356 ( thanks, kracker ! ) that used a static link.
I did the link manually as per those instructions and POOF! success.

However, another problem has arisen.

THREADS !

This is one thing that simply does not translate from Linux to Windoze.
The most recent versions ( October ? ) use them.
I have found a wrapper in a msys library and will try that.


---Marv[/QUOTE]
There's also Victor's post from way back. [URL]https://www.mersenneforum.org/showpost.php?p=457343&postcount=26[/URL] The clstd 1.2 comment is interesting. That's the version NVIDIA's OpenCL identifies as.

kriesel 2018-10-31 20:17

gpuowl v4.6-bb691cb build failed with one error
 
[CODE]ken@condorella MINGW64 ~/gpuowl-compile/v4.6
$ make openowl
g++ -std=c++17 -O2 -DREV="bb691cb" -Wall Worktodo.cpp Result.cpp common.cpp gpuowl.cpp Gpu.cpp clwrap.cpp Task.cpp checkpoint.cpp timeutil.cpp Args.cpp GCD.cpp Primes.cpp Stats.cpp state.cpp -o openowl -lOpenCL -lgmp -pthread -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L. -static
Gpu.cpp:19:28: error: static assertion failed: size long
static_assert(sizeof(long) == 8, "size long");
~~~~~~~~~~~~~^~~~
Gpu.cpp: In function 'PRPState loadPRP(Gpu*, u32, u32, u32)':
Gpu.cpp:569:7: warning: unknown conversion type character 'l' in format [-Wformat=]
log("%s loaded: %d/%d, B1 %u, blockSize %d, %016llx (expected %016llx)\n",
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Gpu.cpp:569:7: warning: unknown conversion type character 'l' in format [-Wformat=]
Gpu.cpp:569:7: warning: too many arguments for format [-Wformat-extra-args]
Gpu.cpp: In member function 'bool Gpu::isPrimePRP(u32, const Args&, u32, u64*, u64*, std::__cxx11::string*)':
Gpu.cpp:721:11: warning: unknown conversion type character 'l' in format [-Wformat=]
log("%s %8d / %d, %016llx (base %016llx)\n", isPrime ? "PP" : "CC", kEnd, E, res64, baseRes64);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Gpu.cpp:721:11: warning: unknown conversion type character 'l' in format [-Wformat=]
Gpu.cpp:721:11: warning: too many arguments for format [-Wformat-extra-args]
In file included from Task.cpp:4:
OpenTF.h: In member function 'virtual std::__cxx11::string OpenTF::findFactor(u32, u32, u32, u32, u32, u64*, u64*, bool)':
OpenTF.h:209:9: warning: unknown conversion type character 'l' in format [-Wformat=]
log("TF %u %u-%u, K %llu - %llu, %dx%d + 1x%d groups, start from class #%u\n",
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OpenTF.h:209:9: warning: unknown conversion type character 'l' in format [-Wformat=]
OpenTF.h:209:9: warning: format '%d' expects argument of type 'int', but argument 5 has type 'u64' {aka 'long long unsigned int'} [-Wformat=]
OpenTF.h:209:9: warning: format '%d' expects argument of type 'int', but argument 6 has type 'u64' {aka 'long long unsigned int'} [-Wformat=]
OpenTF.h:209:9: warning: too many arguments for format [-Wformat-extra-args]
OpenTF.h:243:11: warning: unknown conversion type character 'l' in format [-Wformat=]
log("TF %u %d-%d %.2f%%, class %4d (%4d), %.3fs (%.0f GHz), ETA %dd %02d:%02d, FCs %llu (%.4f%%)\n",
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OpenTF.h:243:11: warning: format '%f' expects argument of type 'double', but argument 13 has type 'u64' {aka 'long long unsigned int'} [-Wformat=]
OpenTF.h:243:11: warning: too many arguments for format [-Wformat-extra-args]
make: *** [Makefile:10: openowl] Error 1

ken@condorella MINGW64 ~/gpuowl-compile/v4.6
$
[/CODE]
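
A note on the static assertion in the log above: MinGW-w64 targets the LLP64 data model, where long is 32 bits even in a 64-bit build, so sizeof(long) == 8 holds on Linux but not on Windows. The warnings already show u64 as 'long long unsigned int', so relying on fixed-width types instead of long is one way around it. A minimal sketch, assuming nothing about gpuowl's actual code; lowMask is a hypothetical helper used only for illustration:

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Fixed-width aliases: the same size under LP64 (Linux) and LLP64 (64-bit Windows).
using u32 = std::uint32_t;
using u64 = std::uint64_t;

static_assert(sizeof(u32) * CHAR_BIT == 32, "u32 must be 32 bits");
static_assert(sizeof(u64) * CHAR_BIT == 64, "u64 must be 64 bits");

// Hypothetical helper: a mask with the low 'bits' bits set.
// Written with u64 rather than long, so a shift by 40 is valid on every target;
// with a 32-bit long, (1L << 40) would be undefined behavior.
u64 lowMask(u32 bits) {
  assert(bits <= 64);
  return bits == 64 ? ~u64(0) : (u64(1) << bits) - 1;
}
```

With this, lowMask(40) gives 0xFFFFFFFFFF on both Linux and Windows builds.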

kriesel 2018-10-31 20:23

gpuowl v4.3-537c681 build attempt
 
[CODE]ken@condorella MINGW64 ~/gpuowl-compile/v4.3
$ make openowl
g++ -std=c++17 -O2 -DREV=\"537c681\" -Wall Worktodo.cpp Result.cpp common.cpp gpuowl.cpp Gpu.cpp clwrap.cpp Task.cpp checkpoint.cpp timeutil.cpp Kset.cpp Args.cpp GCD.cpp -o openowl -lOpenCL -lgmp -pthread -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L. -static
Gpu.cpp:16:28: error: static assertion failed: size long
static_assert(sizeof(long) == 8, "size long");
~~~~~~~~~~~~~^~~~
Gpu.cpp: In function 'PRPState loadPRP(Gpu*, u32, u32, u32)':
Gpu.cpp:109:7: warning: unknown conversion type character 'l' in format [-Wformat=]
log("%s loaded: %d/%d, B1 %u, blockSize %d, %016llx (expected %016llx)\n",
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Gpu.cpp:109:7: warning: unknown conversion type character 'l' in format [-Wformat=]
Gpu.cpp:109:7: warning: too many arguments for format [-Wformat-extra-args]
Gpu.cpp: In member function 'bool Gpu::isPrimePRP(u32, const Args&, u32*, u64*, u64*, std::__cxx11::string*)':
Gpu.cpp:217:11: warning: unknown conversion type character 'l' in format [-Wformat=]
log("%s %8d / %d, %016llx (base %016llx)\n", isPrime ? "PP" : "CC", kEnd, E, res64, baseRes64);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Gpu.cpp:217:11: warning: unknown conversion type character 'l' in format [-Wformat=]
Gpu.cpp:217:11: warning: too many arguments for format [-Wformat-extra-args]
In file included from Task.cpp:4:
OpenTF.h: In member function 'virtual std::__cxx11::string OpenTF::findFactor(u32, u32, u32, u32, u32, u64*, u64*, bool)':
OpenTF.h:209:9: warning: unknown conversion type character 'l' in format [-Wformat=]
log("TF %u %u-%u, K %llu - %llu, %dx%d + 1x%d groups, start from class #%u\n",
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OpenTF.h:209:9: warning: unknown conversion type character 'l' in format [-Wformat=]
OpenTF.h:209:9: warning: format '%d' expects argument of type 'int', but argument 5 has type 'u64' {aka 'long long unsigned int'} [-Wformat=]
OpenTF.h:209:9: warning: format '%d' expects argument of type 'int', but argument 6 has type 'u64' {aka 'long long unsigned int'} [-Wformat=]
OpenTF.h:209:9: warning: too many arguments for format [-Wformat-extra-args]
OpenTF.h:243:11: warning: unknown conversion type character 'l' in format [-Wformat=]
log("TF %u %d-%d %.2f%%, class %4d (%4d), %.3fs (%.0f GHz), ETA %dd %02d:%02d, FCs %llu (%.4f%%)\n",
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
OpenTF.h:243:11: warning: format '%f' expects argument of type 'double', but argument 13 has type 'u64' {aka 'long long unsigned int'} [-Wformat=]
OpenTF.h:243:11: warning: too many arguments for format [-Wformat-extra-args]
make: *** [Makefile:10: openowl] Error 1
[/CODE]
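
The "unknown conversion type character 'l'" warnings in both logs appear to come from GCC checking the format strings against the Microsoft C runtime's printf rules, which is the default under MinGW. One portable option is the PRIx64 family of macros from <cinttypes>, which expand to whatever length modifier the target runtime expects. A minimal sketch; hex64 is a hypothetical helper, not a gpuowl function:

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

using u64 = std::uint64_t;

// Hypothetical helper: formats a 64-bit residue as 16 lowercase hex digits.
// PRIx64 supplies the correct length modifier for the platform's printf family.
std::string hex64(u64 value) {
  char buf[17];  // 16 hex digits plus the terminating NUL
  int written = std::snprintf(buf, sizeof(buf), "%016" PRIx64, value);
  assert(written == 16);
  (void) written;  // silences the unused-variable warning when NDEBUG is defined
  return std::string(buf);
}
```

Another route, which I believe mingw-w64 supports, is defining __USE_MINGW_ANSI_STDIO so that the existing %llx and %llu formats are handled by MinGW's own printf implementation.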

preda 2018-10-31 23:02

Thanks Marv and Ken for investigating the Windows build!

Ken, would you like to retry with a fresh check-out? (version 5.0)

Some known problems are hopefully fixed, in particular:
- "stage 0" of PRP-1 now supports proper save/stop/restart
- in "stage 1" of PRP-1, the GCD bits are now saved/restored, which allows speedy shut-down without GCD loss.

Plus many small changes. The log format changed a bit (again, I know). Maybe some new errors were introduced as well..

tServo 2018-10-31 23:10

I thought I had the thread problem solved by adding -lpthread to the makefile right after -lgmp.
It completed the make but pooped out with an assertion, file: signal.cpp, line 14
Expression:oldHandler

I will do more testing tomorrow.

preda 2018-10-31 23:13

[QUOTE=preda;499198]
Ken, would you like to retry with a fresh check-out? (version 5.0)

Some known problems are hopefully fixed, in particular:
- "stage 0" of PRP-1 now supports proper save/stop/restart
- in "stage 1" of PRP-1, the GCD bits are now saved/restored, which allows speedy shut-down without GCD loss.

Plus many small changes. The log format changed a bit (again, I know). Maybe some new errors were introduced as well..[/QUOTE]

In 5.0, TF was removed from the master branch (saved to a TF branch).

The multi-threading that was introduced is needed for one thing only:
running the GCD (on the CPU) in background, while continuing the GPU work as normal.

The GCD for a 332M exponent takes in my case about 4min (using 1 CPU core), and now the GCD will be done every 1M iterations.

In 5.0 the savefile version was also bumped (to accommodate the GCD & "stage0"), and 5.0 dropped support for some older savefile versions (but it should still load a relatively recent 4.x savefile).
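
The background-GCD arrangement described above can be sketched with std::async. This is only an illustration of the pattern, not gpuowl's code: slowGcd stands in for a multi-million-bit GMP GCD, the loop stands in for GPU iterations, and all names here are hypothetical. With GCC it typically needs -pthread at link time.

```cpp
#include <cassert>
#include <cstdint>
#include <future>
#include <numeric>

using u64 = std::uint64_t;

// Stand-in for the slow CPU-side GCD (the real one would run on huge numbers).
u64 slowGcd(u64 a, u64 b) { return std::gcd(a, b); }

struct Outcome {
  u64 factor;          // result of the background GCD
  u64 iterationsDone;  // work completed while the GCD was running
};

// Starts the GCD on its own thread, keeps iterating, then collects the result.
Outcome runWithBackgroundGcd(u64 x, u64 n, u64 iterations) {
  assert(n != 0);
  std::future<u64> pending = std::async(std::launch::async, slowGcd, x, n);
  u64 done = 0;
  for (u64 i = 0; i < iterations; ++i) {
    ++done;  // placeholder for one GPU iteration
  }
  return {pending.get(), done};  // get() blocks only if the GCD is still running
}
```

The main loop never waits on the GCD until it actually needs the answer, which is the point of moving it off the main thread.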

preda 2018-10-31 23:21

[QUOTE=tServo;499201]I thought I had the thread problem solved by adding -lpthread to the makefile right after -lgmp.
It completed the make but pooped out with an assertion, file: signal.cpp, line 14
Expression:oldHandler

I will do more testing tomorrow.[/QUOTE]

Thanks! That may actually be a bug on my side. I just attempted a simple fix in a commit (dropping those asserts).

kriesel 2018-10-31 23:53

[QUOTE=preda;499198]Thanks Marv and Ken for investigating the Windows build!

Ken, would you like to retry with a fresh check-out? (version 5.0)

Some known problems are hopefully fixed, in particular:
- "stage 0" of PRP-1 now supports proper save/stop/restart
- in "stage 1" of PRP-1, the GCD bits are now saved/restored, which allows speedy shut-down without GCD loss.

Plus many small changes. The log format changed a bit (again, I know). Maybe some new errors were introduced as well..[/QUOTE]
a) You're welcome. Doing it for my own reasons, but it seems right to share, as you and others do. The rapid development of gpuowl has been outstanding. Thank you for that.
b) I'll queue up a try at V5.0 after a few other things. I have m152000249 P-1 first stage running now.
c) Please use stage 1 for B1 and stage 2 for B2 terminology for gpuowl P-1, for consistency with mprime/prime95 and CUDAPm1. (CUDAPm1 has a placeholder comment in the source code, for "stage 3", expanding the B1 bounds. Presumably extending stage 2 would be "stage 4". Something to keep in mind for your to-do list for gpuOwL.)
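
For readers less familiar with the stage 1 / B1 terminology, here is a toy version of textbook P-1 stage 1 on a small Mersenne number. It is a sketch under loud assumptions: it is not gpuowl's PRP-1 code, the base 3 and all function names are chosen for illustration, and it is limited to p < 32 so that plain 64-bit arithmetic suffices.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>

using u64 = std::uint64_t;

// a * b mod n; valid while n < 2^32, so the product fits in 64 bits.
u64 mulMod(u64 a, u64 b, u64 n) { return (a % n) * (b % n) % n; }

// Square-and-multiply modular exponentiation.
u64 powMod(u64 base, u64 exp, u64 n) {
  u64 result = 1 % n;
  base %= n;
  while (exp > 0) {
    if (exp & 1) result = mulMod(result, base, n);
    base = mulMod(base, base, n);
    exp >>= 1;
  }
  return result;
}

// Trial-division primality test, adequate for small B1.
bool isPrime(u64 q) {
  if (q < 2) return false;
  for (u64 d = 2; d * d <= q; ++d) {
    if (q % d == 0) return false;
  }
  return true;
}

// Textbook P-1 stage 1 on M_p = 2^p - 1 with bound B1 (illustrative only).
// Returns gcd(3^E - 1, M_p), where E = 2p * product of maximal prime powers <= B1.
u64 pm1Stage1(unsigned p, u64 b1) {
  assert(p >= 2 && p < 32);
  const u64 n = (u64(1) << p) - 1;
  u64 x = powMod(3, 2 * u64(p), n);  // every factor of M_p is 1 mod 2p
  for (u64 q = 2; q <= b1; ++q) {
    if (!isPrime(q)) continue;
    u64 qPower = q;
    while (qPower * q <= b1) qPower *= q;  // largest power of q not exceeding B1
    x = powMod(x, qPower, n);
  }
  return std::gcd((x + n - 1) % n, n);
}
```

For example, pm1Stage1(29, 10) returns 486737 = 233 × 2089: both 232 and 2088 divide 2·29·2520, while the third factor 1103 is missed because 1102 contains the prime 19 > B1.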

tServo 2018-11-01 02:20

[QUOTE=tServo;499201]I thought I had the thread problem solved by adding -lpthread to the makefile right after -lgmp.
It completed the make but pooped out with an assertion, file: signal.cpp, line 14
Expression:oldHandler

I will do more testing tomorrow.[/QUOTE]

kriesel,

Reading the old thread from Victor you referenced, I'm thinking the correct thread library name might be "libwinpthread".

kriesel 2018-11-01 02:28

1 Attachment(s)
[QUOTE=tServo;499176]One only needs to add the -static flag to the end of the g++ command in the makefile to correct the problems.
I first got a zip snapshot I had saved from September to work.
I googled the __imp_ prefix from the error message and found 2 references that
mentioned static libraries.
I then remembered kracker's instructions in post #356 ( thanks, kracker ! ) that used a static link.
I did the link manually as per those instructions and POOF! success.

However, another problem has arisen.

THREADS !

This is one thing that simply does not translate from Linux to Windoze.
The most recent versions ( October ? ) use them.
I have found a wrapper in a msys library and will try that.


---Marv[/QUOTE]
tServo, is termination at the P-1 gcd what you're talking about, here in v4.7? (looks like there was a save file written just before that, so the 6+ hours may not have been wasted)

Preda, while the status lines subtly indicate it's a P-1 stage 1 line (because of the even number of total iterations, which is much less than the exponent), I feel a clear, explicit indication would be useful: it would be easier for the user to read and would require less modal behavior in a log processor.

kriesel 2018-11-01 04:07

gpuowl wishlist
 
As always, documentation.
Which versions' save files can be continued with which versions?

Some radix-3 transforms, and maybe 7 if it helps speed.
6M and 12M in particular.
It's a particularly long jump between 20M and 36M, so adding 24M or 32M or both would be good.
Similarly, between 40M and 72M, add 48M or 64M or both.

Nonzero offset, pseudorandom at start time.

A result output for stage one of P-1. There currently is none (at least if both B1 and B2 were specified).

Closer following of spelling and grammar. beginnig -> beginning
1 mul but 2 or more muls (justify with a space for the singular to preserve alignment)

Investigate or explain how a mul time in V5.0 can be negative or positive.
[CODE]2018-10-31 22:51:10 condorella-rx480 48500017 1420000/1442145 [98.46%], 2.33 ms/it; ETA 0d 00:01; 5d31725e7ab0f7c1
2018-10-31 22:51:33 condorella-rx480 48500017 1430000/1442145 [99.16%], 2.33 ms/it; ETA 0d 00:00; 1e3bafdadb7a8d7d
2018-10-31 22:51:57 condorella-rx480 48500017 1440000/1442145 [99.85%], 2.33 ms/it; ETA 0d 00:00; d7cedafbb65603a1
2018-10-31 22:52:02 condorella-rx480 48500017.owl loaded: k 0, B1 1000000, block 400, res64 94d02e53c6bcdd1e, stage 1, baseBits 0
2018-10-31 22:52:06 condorella-rx480 48500017 B1=1000000 B2=15200000 (effective B2=15200000) selected 663110 P-1 points in 3.00s
2018-10-31 22:52:09 condorella-rx480 48500017 OK 800/48500400 [ 0.00%], 2.34 ms/it; 1 muls, -16.00 ms/mul; ETA 1d 07:31; 2e2a6aee231a86c8 (check 1.06s)
2018-10-31 22:52:30 condorella-rx480 48500017 10000/48500400 [ 0.02%], 2.33 ms/it; ETA 1d 07:27; 1c4b661aacc34be7
2018-10-31 22:52:50 condorella-rx480 48500017 GCD no factor (40.90s)
2018-10-31 22:52:54 condorella-rx480 48500017 20000/48500400 [ 0.04%], 2.32 ms/it; ETA 1d 07:17; b2cd122af52aded0
2018-10-31 22:53:17 condorella-rx480 48500017 30000/48500400 [ 0.06%], 2.32 ms/it; ETA 1d 07:17; 376d4c51cc537285
2018-10-31 22:53:40 condorella-rx480 48500017 40000/48500400 [ 0.08%], 2.32 ms/it; ETA 1d 07:13; 17928204924bff95
2018-10-31 22:54:03 condorella-rx480 48500017 50000/48500400 [ 0.10%], 2.32 ms/it; ETA 1d 07:14; a9cda67699ad5f44
2018-10-31 22:54:27 condorella-rx480 48500017 60000/48500400 [ 0.12%], 2.32 ms/it; ETA 1d 07:15; 64c25149478de6bd
2018-10-31 22:54:50 condorella-rx480 48500017 70000/48500400 [ 0.14%], 2.32 ms/it; ETA 1d 07:12; 79b0cf2f782c58d5
2018-10-31 22:55:13 condorella-rx480 48500017 80000/48500400 [ 0.16%], 2.32 ms/it; ETA 1d 07:16; 5d0a8b8a0c69b85a
2018-10-31 22:55:36 condorella-rx480 48500017 90000/48500400 [ 0.19%], 2.32 ms/it; ETA 1d 07:13; 0dde16c8b9bb3d15
2018-10-31 22:55:59 condorella-rx480 48500017 100000/48500400 [ 0.21%], 2.32 ms/it; ETA 1d 07:15; f9e65eeeeae55979
2018-10-31 22:56:23 condorella-rx480 48500017 110000/48500400 [ 0.23%], 2.32 ms/it; ETA 1d 07:11; acf02571d7509071
2018-10-31 22:56:46 condorella-rx480 48500017 120000/48500400 [ 0.25%], 2.32 ms/it; 1 muls, -0.17 ms/mul; ETA 1d 07:11; 18a5b3dab25d8c66
2018-10-31 22:57:09 condorella-rx480 48500017 130000/48500400 [ 0.27%], 2.32 ms/it; 4 muls, -1.87 ms/mul; ETA 1d 07:12; 5c5e040b3934a11a
2018-10-31 22:57:10 condorella-rx480 Stopping, please wait..
2018-10-31 22:57:11 condorella-rx480 48500017 OK 130400/48500400 [ 0.27%], 2.36 ms/it; ETA 1d 07:43; 796171e13eae512b (check 1.06s)
2018-10-31 22:57:11 condorella-rx480 Exiting because "stop requested"
2018-10-31 22:57:11 condorella-rx480 Bye
Terminate batch job (Y/N)? n
C:\msys64\home\ken\gpuowl-compile\v5.0>g50

C:\msys64\home\ken\gpuowl-compile\v5.0>openowl.exe -user kriesel -cpu condorella-rx480 -device 0
2018-10-31 22:57:26 gpuowl 5.0-f604bb1
2018-10-31 22:57:26 condorella-rx480 -user kriesel -cpu condorella-rx480 -device 0
2018-10-31 22:57:26 condorella-rx480 48500017 FFT 2560K: Width 64x8, Height 64x8, Middle 5; 18.50 bits/word
2018-10-31 22:57:26 condorella-rx480 using short carry kernels
2018-10-31 22:57:27 condorella-rx480 Ellesmere-36x1266-@28:0.0 Radeon (TM) RX 480 Graphics
2018-10-31 22:57:31 condorella-rx480 OpenCL compilation in 3353 ms, with "-DEXP=48500017u -DWIDTH=512u -DSMALL_HEIGHT=512u -DMIDDLE=5u -I. -cl-fast-relaxed-math -cl-std=CL2.0 "
2018-10-31 22:57:31 condorella-rx480 48500017.owl loaded: k 130400, B1 1000000, block 400, res64 796171e13eae512b, stage 1, baseBits 0
2018-10-31 22:57:35 condorella-rx480 48500017 B1=1000000 B2=15200000 (effective B2=15200000) selected 663110 P-1 points in 2.97s
2018-10-31 22:57:38 condorella-rx480 48500017 OK 131200/48500400 [ 0.27%], 2.32 ms/it; ETA 1d 07:11; 86b7fc83e20178d2 (check 1.05s)
2018-10-31 22:57:59 condorella-rx480 48500017 140000/48500400 [ 0.29%], 2.34 ms/it; ETA 1d 07:26; abca893c953ae32a
2018-10-31 22:58:20 condorella-rx480 48500017 GCD no factor (41.85s)
2018-10-31 22:58:22 condorella-rx480 48500017 150000/48500400 [ 0.31%], 2.32 ms/it; 1 muls, -2.92 ms/mul; ETA 1d 07:13; e7ddf9caae37cbda
2018-10-31 22:58:47 condorella-rx480 48500017 OK 160000/48500400 [ 0.33%], 2.33 ms/it; 9 muls, 1.25 ms/mul; ETA 1d 07:14; 6a68f8317d271ba6 (check 1.07s)
2018-10-31 22:59:10 condorella-rx480 48500017 170000/48500400 [ 0.35%], 2.33 ms/it; 8 muls, -0.06 ms/mul; ETA 1d 07:20; c8ad36cf76eeb97c
2018-10-31 22:59:33 condorella-rx480 48500017 180000/48500400 [ 0.37%], 2.32 ms/it; 7 muls, 1.64 ms/mul; ETA 1d 07:11; 022cc2c45f350c10
2018-10-31 22:59:56 condorella-rx480 48500017 190000/48500400 [ 0.39%], 2.32 ms/it; 6 muls, 1.55 ms/mul; ETA 1d 07:11; dc6ce88b7bb055b2
2018-10-31 23:00:20 condorella-rx480 48500017 200000/48500400 [ 0.41%], 2.33 ms/it; 17 muls, 1.41 ms/mul; ETA 1d 07:12; 4370c2033347ada0
2018-10-31 23:00:43 condorella-rx480 48500017 210000/48500400 [ 0.43%], 2.32 ms/it; 17 muls, 5.16 ms/mul; ETA 1d 07:04; d22df0cbc1f8f931
2018-10-31 23:01:06 condorella-rx480 48500017 220000/48500400 [ 0.45%], 2.33 ms/it; 16 muls, 1.68 ms/mul; ETA 1d 07:12; a29346366a7594f7
2018-10-31 23:01:29 condorella-rx480 48500017 230000/48500400 [ 0.47%], 2.32 ms/it; 18 muls, 1.36 ms/mul; ETA 1d 07:10; 2b65f1963400a74a
2018-10-31 23:01:53 condorella-rx480 48500017 240000/48500400 [ 0.49%], 2.32 ms/it; 30 muls, 2.01 ms/mul; ETA 1d 07:10; 14dc048e310dbbf0
2018-10-31 23:02:16 condorella-rx480 48500017 250000/48500400 [ 0.52%], 2.32 ms/it; 29 muls, 2.43 ms/mul; ETA 1d 07:07; 398ab8acffd4febd[/CODE]
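
The FFT line in the log above ("FFT 2560K: Width 64x8, Height 64x8, Middle 5; 18.50 bits/word") can be checked by hand against the compile flags (-DWIDTH=512u -DSMALL_HEIGHT=512u -DMIDDLE=5u). The sketch below assumes the word count is WIDTH × SMALL_HEIGHT × MIDDLE × 2; that formula is inferred from the numbers in the log rather than taken from the gpuowl source, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

using u32 = std::uint32_t;
using u64 = std::uint64_t;

// ASSUMPTION (inferred from the log, not from gpuowl's source):
// number of FFT words = width * smallHeight * middle * 2.
u64 fftWords(u32 width, u32 smallHeight, u32 middle) {
  assert(width > 0 && smallHeight > 0 && middle > 0);
  return u64(width) * smallHeight * middle * 2;
}

// Average number of exponent bits stored per FFT word.
double bitsPerWord(u32 exponent, u64 words) {
  assert(words > 0);
  return double(exponent) / double(words);
}
```

fftWords(512, 512, 5) is 2621440, i.e. 2560K, and bitsPerWord(48500017, 2621440) is about 18.50, matching the log. Under the same assumption, the wishlist's point about gaps between sizes comes down to which middle values are available.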


All times are UTC.