mersenneforum.org

gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

Smokingenius 2017-12-08 00:36

[QUOTE=kracker;471663]Latest binaries for Windows from git.[/QUOTE]

Thanks so much for the binary; I really appreciate it. I pulled out an old AMD box to start looking for some primes and can't wait to get this rolling on my AMD cards. :)

Smokingenius 2017-12-08 00:49

Compilation Error
 
Running into an error when using the binary that was posted a few posts up.

"C:\Prime 95\gpuOwL>gpuowl.exe
gpuOwL v1.9- GPU Mersenne primality checker
Turks, 6x 800MHz
OpenCL compilation error -11 (args -I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=82891957u -DWIDTH=2048u -DHEIGHT=2048u -DLOG_NWORDS=23u -DFP_DP=1 )
"C:\Users\LOCKHA~1\AppData\Local\Temp\OCLCCE7.tmp.cl", line 1: catastrophic error:
cannot open source file "gpuowl.cl"
#include "gpuowl.cl"
^

1 catastrophic error detected in the compilation of "C:\Users\LOCKHA~1\AppData\Local\Temp\OCLCCE7.tmp.cl".
Compilation terminated.

Internal error: clc compiler invocation failed.

OpenCL compilation error -11 (args -I. -cl-fast-relaxed-math -DEXP=82891957u -DWIDTH=2048u -DHEIGHT=2048u -DLOG_NWORDS=23u -DFP_DP=1 )
"C:\Users\LOCKHA~1\AppData\Local\Temp\OCLCD28.tmp.cl", line 1: catastrophic error:
cannot open source file "gpuowl.cl"
#include "gpuowl.cl"
^

1 catastrophic error detected in the compilation of "C:\Users\LOCKHA~1\AppData\Local\Temp\OCLCD28.tmp.cl".
Compilation terminated.

Internal error: clc compiler invocation failed.

Bye"

If anyone has any ideas, I would appreciate the help. In the meantime I am going to try to compile directly from the GitHub repo, since I have MinGW and OpenCL.

Regards.

preda 2017-12-08 02:13

[QUOTE=Smokingenius;473413]Running into an error when using the binary that was posted a few posts up.

"C:\Prime 95\gpuOwL>gpuowl.exe
gpuOwL v1.9- GPU Mersenne primality checker
Turks, 6x 800MHz
OpenCL compilation error -11 (args -I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=82891957u -DWIDTH=2048u -DHEIGHT=2048u -DLOG_NWORDS=23u -DFP_DP=1 )
"C:\Users\LOCKHA~1\AppData\Local\Temp\OCLCCE7.tmp.cl", line 1: catastrophic error:
cannot open source file "gpuowl.cl"
#include "gpuowl.cl"
^
[/QUOTE]

You must have the "gpuowl.cl" file in the same folder as the executable. Was that it?
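For anyone else who hits this: OpenCL error -11 is CL_BUILD_PROGRAM_FAILURE, and the log above shows the cause. gpuOwL builds a one-line stub (the OCL*.tmp.cl file) that does #include "gpuowl.cl" with -I. as the include path, so the kernel source has to sit in the folder gpuowl is run from, which in practice is the executable's folder. Below is a minimal preflight sketch in Python; the helper name check_kernel_source is made up (not part of gpuOwL), and the assumption that -I. resolves against the launch directory is mine.

```python
from pathlib import Path

KERNEL_FILE = "gpuowl.cl"  # the file the generated stub #includes (see the log)


def check_kernel_source(folder="."):
    """Hypothetical helper, not part of gpuOwL: return the path of gpuowl.cl
    inside `folder`, or raise with a hint about where the file belongs.

    The default "." mirrors the -I. include path in the log, assumed here to
    be resolved against the directory gpuowl.exe is started from.
    """
    path = Path(folder) / KERNEL_FILE
    if not path.is_file():
        raise FileNotFoundError(
            f"{KERNEL_FILE} not found in {Path(folder).resolve()}; copy it "
            "from the GitHub source next to gpuowl.exe and run gpuowl from there"
        )
    return path
```

For example, calling check_kernel_source(r"C:\Prime 95\gpuOwL") before launching would have flagged the missing file without any driver reinstalls.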

Smokingenius 2017-12-08 20:12

[QUOTE=preda;473420]You must have the "gpuowl.cl" file in the same folder as the executable. Was that it?[/QUOTE]

Yep, that was it! I feel silly now. It wasn't in the binary zip provided a few posts back, so I didn't copy it over from the GitHub source when I unzipped everything. I should have realized. I even went to the trouble of reinstalling all my OpenCL drivers and files lol. :)

Thanks for the help! Got these couple of old AMD girls running! Not the performance of your cards or my GTXs with CUDALucas, but they are running and that is all that matters to me :)

preda 2017-12-08 21:56

[QUOTE=Smokingenius;473488]Yep, that was it! I feel silly now. It wasn't in the binary zip provided a few posts back, so I didn't copy it over from the GitHub source when I unzipped everything. I should have realized. I even went to the trouble of reinstalling all my OpenCL drivers and files lol. :)

Thanks for the help! Got these couple of old AMD girls running! Not the performance of your cards or my GTXs with CUDALucas, but they are running and that is all that matters to me :)[/QUOTE]

Wow, I'm happy it's working.

I'd be curious to know how fast it is (time per iteration), what hardware you have, and which OS and driver you use.

In my experience it works best with ROCm 1.6, which requires Ubuntu 16.04. AMDGPU-PRO should be OK as well, although slower than ROCm.

You can run with "-verbosity 1" to get some information about the range and distribution of iteration time (i.e. uniform iteration time or not).

PS: it seems these "TURKS" GPUs are quite old; they're pre-GCN. Well, it's impressive that it works then! (Such hardware isn't supported by ROCm, so no need to worry about that.)

GP2 2017-12-09 02:21

Maybe the title of this thread should be changed. Does gpuOwL still include LL testing functionality?

preda 2017-12-09 04:36

[QUOTE=GP2;473510]Maybe the title of this thread should be changed. Does gpuOwL still include LL testing functionality?[/QUOTE]

Yes, I'm not against changing the title, though I don't think it's a big problem if it stays unchanged. It's true that GpuOwl does not do LL anymore; OTOH, LL could be seen as a broad synonym for "definitive primality testing for Mersenne numbers", which is similar to what GpuOwl does (PRP).

I think the real goal of GpuOwl is to provide efficient primality testing for Mersenne numbers. It just so happened that "efficient" recently came to mean PRP instead of LL. It was my mistake to be over-specific in the thread title; I should have said "Mersenne primality testing" instead of LL.
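To make the LL-versus-PRP point concrete: both tests run about p iterations, each a single modular squaring of a p-bit number, so switching from LL to PRP costs essentially nothing per iteration. The sketch below uses plain Python big integers and does not mirror gpuOwL's code, which does the squarings with FFT multiplication on the GPU. The PRP variant assumed here is the base-3 Fermat check 3^(2^p) ≡ 9 (mod M_p), the form commonly used for GIMPS PRP; base 2 would be useless because 2^p ≡ 1 (mod M_p), so every M_p passes it. The function names are chosen for illustration, and the error checking that makes PRP attractive (the Gerbicz check) is not shown.

```python
def lucas_lehmer(p):
    """Lucas-Lehmer test: for an odd prime p, M_p = 2**p - 1 is prime
    iff s_(p-2) == 0, where s_0 = 4 and s_(k+1) = s_k**2 - 2 (mod M_p)."""
    if p == 2:
        return True  # M_2 = 3 is prime; the recurrence assumes odd p
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m  # one modular squaring per iteration
    return s == 0


def prp3(p):
    """Base-3 Fermat probable-prime test of M_p = 2**p - 1.

    If M_p is prime, 3**(M_p - 1) == 1, so 3**(M_p + 1) = 3**(2**p) == 9 (mod M_p).
    Computing 3**(2**p) takes exactly p modular squarings.
    """
    if p == 2:
        return True  # M_2 = 3 shares a factor with the base; known prime
    m = (1 << p) - 1
    x = 3
    for _ in range(p):
        x = x * x % m  # one modular squaring per iteration, like LL
    return x == 9 % m  # 9 % m handles the tiny case m = 7


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13):
        print(p, lucas_lehmer(p), prp3(p))
```

A PRP "pass" only says M_p is probably prime; a new PRP-positive result would still be confirmed separately.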

Smokingenius 2017-12-09 18:58

[QUOTE=preda;473494]Wow, I'm happy it's working.

I'd be curious to know how fast it is (time per iteration), what hardware you have, and which OS and driver you use.

In my experience it works best with ROCm 1.6, which requires Ubuntu 16.04. AMDGPU-PRO should be OK as well, although slower than ROCm.

You can run with "-verbosity 1" to get some information about the range and distribution of iteration time (i.e. uniform iteration time or not).

PS: it seems these "TURKS" GPUs are quite old; they're pre-GCN. Well, it's impressive that it works then! (Such hardware isn't supported by ROCm, so no need to worry about that.)[/QUOTE]

Well, I am still working on it. On the initial run of the exe with -h it did find the 3 graphics devices: the Turks, the Devastator, and the integrated graphics of the A8 the system was built with. The two cards started up but never got anywhere, and the A8 was erroring on compilation, although it got farther. Since then I have reinstalled the AMD drivers and OpenCL 2.0. The cards now fail with 7 errors and I get a "Bye" message. For the A8 I get a bunch of errors, but it compiles and then appears to start running. I set the verbosity to 1 so I can tell you what speed I am getting with the A8 integrated graphics. I am also running a test with and without Prime95 running to see if there is any effect.

preda 2017-12-10 02:13

[QUOTE=Smokingenius;473564]Well, I am still working on it. On the initial run of the exe with -h it did find the 3 graphics devices: the Turks, the Devastator, and the integrated graphics of the A8 the system was built with. The two cards started up but never got anywhere, and the A8 was erroring on compilation, although it got farther. Since then I have reinstalled the AMD drivers and OpenCL 2.0. The cards now fail with 7 errors and I get a "Bye" message. For the A8 I get a bunch of errors, but it compiles and then appears to start running. I set the verbosity to 1 so I can tell you what speed I am getting with the A8 integrated graphics. I am also running a test with and without Prime95 running to see if there is any effect.[/QUOTE]

Try with "-legacy" as well; it will produce different performance behavior and may work better on old cards.

Feel free to post error messages here, I'm curious to look and see if there's anything I could fix.

"-verbosity 2" should allow you to see if it's progressing at all, or stuck for some reason.

M344587487 2017-12-15 09:26

Can anyone with a 1080 and 1080ti post their 4096K iteration timings please? There are Vega and Fury X timings dotted about the thread, it would be interesting to compare.

edit: Am I being dumb for wanting to bench Nvidia with an OpenCL program? Thinking about it, the GTX cards are conspicuous by their absence in this thread.

kriesel 2017-12-15 17:13

[QUOTE=M344587487;474076]Can anyone with a 1080 and 1080ti post their 4096K iteration timings please? There are Vega and Fury X timings dotted about the thread, it would be interesting to compare.

edit: Am I being dumb for wanting to bench Nvidia with an OpenCL program? Thinking about it, the GTX cards are conspicuous by their absence in this thread.[/QUOTE]

Not what you asked for in multiple respects, but might be of interest, and it's the closest I have:
FFT benchmark results for the latest available CUDALucas program on a GTX 1070, by CUDA level and 32-bit or 64-bit build. Best time is 4.45 msec per Lucas-Lehmer iteration. Note this card typically runs thermally limited to about 70% of TDP, so alone or with better cooling it would do better. No OpenCL involved. Driver version N378.66. In this test CUDA 6.5 32-bit was fastest; CUDA 4.0 was slower by far.

GeForce GTX 1070, CUDALucas 2.06beta FFT benchmark files, driver N378.66. Every file reports the row "4096 75846319 <msec/iter>"; only the timing differs (no Win32 files for CUDA 7.0 to 8.0):

CUDA   x64 msec/iter   Win32 msec/iter
4.0    5.5530          5.5271
4.1    4.5110          4.4940
4.2    4.5163          4.4935
5.0    4.5090          4.4975
5.5    4.5381          4.4892
6.0    4.4602          4.5142
6.5    4.4638          4.4501
7.0    4.4721          -
7.5    4.4957          -
8.0    4.4804          -
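The summary in the paragraph above (best about 4.45 msec/iter, CUDA 6.5 32-bit fastest, CUDA 4.0 far behind) can be checked straight from these timings. A small sketch; the dictionary and function names are just labels chosen here:

```python
# msec per iteration at 4096K FFT, GTX 1070, CUDALucas 2.06beta, driver N378.66,
# copied from the benchmark results above (keys are CUDA versions).
X64 = {"4.0": 5.5530, "4.1": 4.5110, "4.2": 4.5163, "5.0": 4.5090, "5.5": 4.5381,
       "6.0": 4.4602, "6.5": 4.4638, "7.0": 4.4721, "7.5": 4.4957, "8.0": 4.4804}
WIN32 = {"4.0": 5.5271, "4.1": 4.4940, "4.2": 4.4935, "5.0": 4.4975,
         "5.5": 4.4892, "6.0": 4.5142, "6.5": 4.4501}


def fastest(timings):
    """Return (cuda_version, msec_per_iter) for the lowest time per iteration."""
    version = min(timings, key=timings.get)
    return version, timings[version]


for build, timings in (("x64", X64), ("Win32", WIN32)):
    version, best = fastest(timings)
    penalty = 100 * (timings["4.0"] / best - 1)  # how much slower CUDA 4.0 is, in %
    print(f"{build}: fastest is CUDA {version} at {best:.4f} msec/iter; "
          f"CUDA 4.0 is {penalty:.0f}% slower")
```

It picks CUDA 6.0 for x64 and CUDA 6.5 for Win32, with CUDA 4.0 roughly 24 to 25% slower in both builds; everything from CUDA 4.1 to 8.0 lands within about 2% of the best.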

Scaling by NVIDIA's TFLOPS specifications, and by sqrt(TDP) for the throttled 1070, I'd guess something like the following for CUDALucas timings, assuming TFLOPS, not memory bandwidth, is the limiting factor:
Model               TFLOPS   msec/iter
GTX1070 @70% TDP    5.4      4.45
GTX1070 @100% TDP   6.5      3.7
GTX1080             9        2.7
GTX1080Ti           11.5     2.1
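The estimate rests on two proportionalities, both the poster's assumptions: the throttled card's effective throughput is the rated 6.5 TFLOPS times sqrt(0.70), and time per iteration is inversely proportional to TFLOPS (compute-bound, not memory-bound). This sketch just redoes that arithmetic; the function names are illustrative.

```python
import math


def throttled_tflops(rated_tflops, tdp_fraction):
    """Assumption from the post: usable throughput ~ sqrt(fraction of TDP)."""
    return rated_tflops * math.sqrt(tdp_fraction)


def scaled_msec(base_msec, base_tflops, target_tflops):
    """Assumption from the post: compute-bound, so time ~ 1 / TFLOPS."""
    return base_msec * base_tflops / target_tflops


# Measured: 4.45 msec/iter on a GTX 1070 held to ~70% of TDP (rated 6.5 TFLOPS).
BASE_MSEC = 4.45
BASE_TFLOPS = throttled_tflops(6.5, 0.70)  # about 5.44, the "5.4" in the table

for model, tflops in (("GTX1070 @100% TDP", 6.5), ("GTX1080", 9.0), ("GTX1080Ti", 11.5)):
    print(f"{model}: {scaled_msec(BASE_MSEC, BASE_TFLOPS, tflops):.1f} msec/iter")
```

This reproduces the 3.7, 2.7 and 2.1 msec/iter guesses. If memory bandwidth rather than TFLOPS turns out to be the bottleneck, the second assumption breaks and these estimates would likely be optimistic.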

I experimented very briefly with getting OpenCL-based applications (clLucas, mfakto, GpuOwl) going on my disparate hardware collection (mostly Intel and NVIDIA on Windows) and ran into enough obstacles to deter me in the short to medium term. On one modern laptop with an OpenCL-capable Intel HD 620, an early GpuOwl version cost Prime95 substantially more throughput than GpuOwl itself delivered, perhaps due to contention for shared memory bandwidth.

If there is an inexpensive, modest-to-decent-performance OpenCL-capable GPU card that can run within the ~60 W power limit and the bandwidth limit of a PCIe x1-to-x16 extender/adapter, please identify it. I think a couple of my Quadro 2000s have failed due to old age and steady use. The Quadro 2000s are getting scarce and more expensive, and I'd like to diversify into OpenCL/AMD a bit.


All times are UTC.
