mersenneforum.org  

Old 2017-12-08, 00:36   #232
Smokingenius
 
Nov 2015

2×5 Posts
Default

Quote:
Originally Posted by kracker View Post
Latest binaries for Windows from git.
Thanks so much for the binary; I really appreciate it. I pulled out an old AMD box to get started looking for some primes and can't wait to get this rolling on my AMD cards. :)
Old 2017-12-08, 00:49   #233
Smokingenius
 
Nov 2015

12₈ Posts
Default Compilation Error

Running into an error when using the binary that was posted a few posts up.

"C:\Prime 95\gpuOwL>gpuowl.exe
gpuOwL v1.9- GPU Mersenne primality checker
Turks, 6x 800MHz
OpenCL compilation error -11 (args -I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=82891957u -DWIDTH=2048u -DHEIGHT=2048u -DLOG_NWORDS=23u -DFP_DP=1 )
"C:\Users\LOCKHA~1\AppData\Local\Temp\OCLCCE7.tmp.cl", line 1: catastrophic error:
cannot open source file "gpuowl.cl"
#include "gpuowl.cl"
^

1 catastrophic error detected in the compilation of "C:\Users\LOCKHA~1\AppData\Local\Temp\OCLCCE7.tmp.cl".
Compilation terminated.

Internal error: clc compiler invocation failed.

OpenCL compilation error -11 (args -I. -cl-fast-relaxed-math -DEXP=82891957u -DWIDTH=2048u -DHEIGHT=2048u -DLOG_NWORDS=23u -DFP_DP=1 )
"C:\Users\LOCKHA~1\AppData\Local\Temp\OCLCD28.tmp.cl", line 1: catastrophic error:
cannot open source file "gpuowl.cl"
#include "gpuowl.cl"
^

1 catastrophic error detected in the compilation of "C:\Users\LOCKHA~1\AppData\Local\Temp\OCLCD28.tmp.cl".
Compilation terminated.

Internal error: clc compiler invocation failed.


Bye"

If anyone has any ideas, I would appreciate the help. In the meantime I am going to see if I can compile directly from the GitHub repo, since I have MinGW and OpenCL.

Regards.
Old 2017-12-08, 02:13   #234
preda
 
"Mihai Preda"
Apr 2015

3×457 Posts
Default

Quote:
Originally Posted by Smokingenius View Post
Running into an error when using the binary that was posted a few posts up.

"C:\Prime 95\gpuOwL>gpuowl.exe
gpuOwL v1.9- GPU Mersenne primality checker
Turks, 6x 800MHz
OpenCL compilation error -11 (args -I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=82891957u -DWIDTH=2048u -DHEIGHT=2048u -DLOG_NWORDS=23u -DFP_DP=1 )
"C:\Users\LOCKHA~1\AppData\Local\Temp\OCLCCE7.tmp.cl", line 1: catastrophic error:
cannot open source file "gpuowl.cl"
#include "gpuowl.cl"
^
You must have the "gpuowl.cl" file in the same folder as the executable. Was that it?
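
For anyone hitting the same error: the log shows the kernel being built with "-I." and a one-line source of #include "gpuowl.cl", so the OpenCL compiler looks for gpuowl.cl in the directory the program is started from. Below is a minimal pre-flight sketch in Python, not part of gpuOwL; the function name and the default file list are assumptions made for illustration.

```python
from pathlib import Path


def missing_kernel_files(exe_dir, required=("gpuowl.cl",)):
    """Return the names in `required` that are not regular files in exe_dir.

    `required` defaults to the single file named in the error message;
    it is an assumption that no other files are needed.
    """
    folder = Path(exe_dir)
    return [name for name in required if not (folder / name).is_file()]


if __name__ == "__main__":
    # "." stands for the folder gpuowl.exe is run from (hypothetical layout).
    missing = missing_kernel_files(".")
    if missing:
        print("Missing next to the executable:", ", ".join(missing))
    else:
        print("Kernel source found in the current folder.")
```

Run it from the folder that holds the executable before starting a test; an empty result means the include should resolve.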
Old 2017-12-08, 20:12   #235
Smokingenius
 
Nov 2015

2×5 Posts
Default

Quote:
Originally Posted by preda View Post
You must have the "gpuowl.cl" file in the same folder as the executable. Was that it?
Yep, that was it! I feel silly now. It wasn't in the binary zip that was provided a few posts back, so I didn't include it from the GitHub source when I unzipped it. I should have realized. I even went through the trouble of reinstalling all my OpenCL drivers and files lol. :)

Thanks for the help! Got this couple of old AMD girls running! Not the performance of your cards or my GTXs with CUDALucas, but they are running and that is all that matters to me :)
Old 2017-12-08, 21:56   #236
preda
 
"Mihai Preda"
Apr 2015

3·457 Posts
Default

Quote:
Originally Posted by Smokingenius View Post
Yep, that was it! I feel silly now. It wasn't in the binary zip that was provided a few posts back, so I didn't include it from the GitHub source when I unzipped it. I should have realized. I even went through the trouble of reinstalling all my OpenCL drivers and files lol. :)

Thanks for the help! Got this couple of old AMD girls running! Not the performance of your cards or my GTXs with CUDALucas, but they are running and that is all that matters to me :)
Wow, I'm happy it's working.

I'd be curious to know how fast it is (time per iteration), what is your hardware, and what OS and driver.

In my experience it works best with ROCm 1.6, which requires Ubuntu 16.04. AMDGPU-pro should be OK as well, although slower than ROCm.

You can run with "-verbosity 1" to get some information about the range and distribution of iteration time (i.e. uniform iteration time or not).
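
To judge whether iteration time is uniform, a few summary statistics over the reported per-iteration times are enough. The sketch below is in Python and assumes you copy the timings out by hand; it does not parse gpuOwL's output, whose exact format is not shown in this thread, and the sample numbers are placeholders rather than measurements.

```python
import statistics


def summarize_iteration_times(times_ms):
    """Summarize per-iteration timings given in milliseconds."""
    mean = statistics.fmean(times_ms)
    spread = statistics.pstdev(times_ms)  # population standard deviation
    return {
        "min": min(times_ms),
        "max": max(times_ms),
        "mean": mean,
        "stdev": spread,
        # Coefficient of variation: close to 0 means uniform iteration time.
        "cv": spread / mean,
    }


if __name__ == "__main__":
    # Placeholder values for illustration only; not timings from any card.
    sample = [10.0, 10.0, 10.0, 14.0]
    print(summarize_iteration_times(sample))
```

A large gap between min and max, or a high coefficient of variation, points to non-uniform iteration time, for example when another program competes for the GPU.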

PS: it seems these "TURKS" GPUs are quite old, they're pre-GCN. Well, it's impressive it works then! (and such hardware isn't supported by ROCm, so no need to worry about that).

Last fiddled with by preda on 2017-12-08 at 22:00
Old 2017-12-09, 02:21   #237
GP2
 
Sep 2003

5×11×47 Posts
Default

Maybe the title of this thread should be changed. Does gpuOwL still include LL testing functionality?
Old 2017-12-09, 04:36   #238
preda
 
preda's Avatar
 
"Mihai Preda"
Apr 2015

3·457 Posts
Default

Quote:
Originally Posted by GP2 View Post
Maybe the title of this thread should be changed. Does gpuOwL still include LL testing functionality?
Yes, I'm not against changing the title. I also don't think it's a big problem if it stays unchanged. It's true that GpuOwl does not do LL anymore; OTOH, LL could be seen as a broad synonym for "definitive primality testing for Mersenne numbers", which is similar to what GpuOwl does (PRP).

I think the real goal of GpuOwl is to provide efficient primality testing for Mersenne numbers. It just so happened that "efficient" recently became PRP instead of LL. It was my mistake to be over-specific in the thread title; I should have said "Mersenne primality testing" instead of LL.
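
For readers unfamiliar with the two test types, here is a small Python sketch on tiny exponents. It shows the textbook Lucas-Lehmer test next to a Fermat probable-prime test; it is not GpuOwl's implementation, and the choice of base 3 here is an assumption made for illustration. LL gives a definitive answer for odd prime p, while a PRP test can in principle be passed by a composite.

```python
def lucas_lehmer(p):
    """Lucas-Lehmer: for an odd prime p, 2**p - 1 is prime iff this is True."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def fermat_prp(p, base=3):
    """Fermat probable-prime test of 2**p - 1 to the given base."""
    m = (1 << p) - 1
    # A prime m always satisfies base**(m-1) == 1 (mod m) when base < m.
    return pow(base, m - 1, m) == 1


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13):
        print(p, lucas_lehmer(p), fermat_prp(p))
```

Both loops are dominated by one modular squaring per bit of the exponent, which is why the two tests cost about the same per iteration.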

Last fiddled with by preda on 2017-12-09 at 04:37
Old 2017-12-09, 18:58   #239
Smokingenius
 
Nov 2015

2·5 Posts
Default

Quote:
Originally Posted by preda View Post
Wow, I'm happy it's working.

I'd be curious to know how fast it is (time per iteration), what is your hardware, and what OS and driver.

In my experience it works best with ROCm 1.6, which requires Ubuntu 16.04. AMDGPU-pro should be OK as well, although slower than ROCm.

You can run with "-verbosity 1" to get some information about the range and distribution of iteration time (i.e. uniform iteration time or not).

PS: it seems these "TURKS" GPUs are quite old, they're pre-GCN. Well, it's impressive it works then! (and such hardware isn't supported by ROCm, so no need to worry about that).
Well, I am still working on it. On the initial run of the exe with -h it did find the 3 graphics cards. I have the Turks, the Devastator, and then the A8 the system was built with, with integrated graphics. The two cards started up but never got anywhere, and the A8 was erroring on compilation, but it was getting farther. I have since re-installed the AMD drivers and OpenCL 2.0. The cards now stop with 7 errors and I get a "Bye" message. For the A8 I get a bunch of errors, but it compiles and then appears to start running. I set the verbosity to 1 so you can see what speed I am getting with the A8 integrated graphics. I am also running a test with and without Prime95 running to see if there is any effect.
Old 2017-12-10, 02:13   #240
preda
 
"Mihai Preda"
Apr 2015

3·457 Posts
Default

Quote:
Originally Posted by Smokingenius View Post
Well, I am still working on it. On the initial run of the exe with -h it did find the 3 graphics cards. I have the Turks, the Devastator, and then the A8 the system was built with, with integrated graphics. The two cards started up but never got anywhere, and the A8 was erroring on compilation, but it was getting farther. I have since re-installed the AMD drivers and OpenCL 2.0. The cards now stop with 7 errors and I get a "Bye" message. For the A8 I get a bunch of errors, but it compiles and then appears to start running. I set the verbosity to 1 so you can see what speed I am getting with the A8 integrated graphics. I am also running a test with and without Prime95 running to see if there is any effect.
Try with "-legacy" as well; it will produce different performance behavior and may work better on old cards.

Feel free to post error messages here, I'm curious to look and see if there's anything I could fix.

"-verbosity 2" should allow you to see if it's progressing at all, or stuck for some reason.
Old 2017-12-15, 09:26   #241
M344587487
 
"Composite as Heck"
Oct 2017

3·5²·11 Posts
Default

Can anyone with a 1080 and 1080ti post their 4096K iteration timings please? There are Vega and Fury X timings dotted about the thread, it would be interesting to compare.

edit: Am I being dumb for wanting to bench NVIDIA with an OpenCL program? Thinking about it, the GTX cards are conspicuous by their absence in this thread.

Last fiddled with by M344587487 on 2017-12-15 at 10:09
Old 2017-12-15, 17:13   #242
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1010100101101₂ Posts
Default

Quote:
Originally Posted by M344587487 View Post
Can anyone with a 1080 and 1080ti post their 4096K iteration timings please? There are Vega and Fury X timings dotted about the thread, it would be interesting to compare.

edit: Am I being dumb for wanting to bench NVIDIA with an OpenCL program? Thinking about it, the GTX cards are conspicuous by their absence in this thread.
Not what you asked for in multiple respects, but it might be of interest, and it's the closest I have:
FFT benchmark results for the latest available CUDALucas program on a GTX1070, versus CUDA level and 32-bit or 64-bit build. The best time is 4.45 msec/iteration of Lucas-Lehmer. Note this is from a card that typically runs thermally limited to 70% of TDP, so alone or with better cooling it would do better. No OpenCL involved. Driver version N378.66. In this test CUDA 6.5 32-bit was fastest; CUDA 4.0 was slower by far.

---------- GEFORCE GTX 1070 FFT 2.06BETA 4.0 X64 N378.66.TXT
4096 75846319 5.5530

---------- GEFORCE GTX 1070 FFT 2.06BETA 4.1 X64 N378.66.TXT
4096 75846319 4.5110

---------- GEFORCE GTX 1070 FFT 2.06BETA 4.2 X64 N378.66.TXT
4096 75846319 4.5163

---------- GEFORCE GTX 1070 FFT 2.06BETA 5.0 X64 N378.66.TXT
4096 75846319 4.5090

---------- GEFORCE GTX 1070 FFT 2.06BETA 5.5 X64 N378.66.TXT
4096 75846319 4.5381

---------- GEFORCE GTX 1070 FFT 2.06BETA 6.0 X64 N378.66.TXT
4096 75846319 4.4602

---------- GEFORCE GTX 1070 FFT 2.06BETA 6.5 X64 N378.66.TXT
4096 75846319 4.4638

---------- GEFORCE GTX 1070 FFT 2.06BETA 7.0 X64 N378.66.TXT
4096 75846319 4.4721

---------- GEFORCE GTX 1070 FFT 2.06BETA 7.5 X64 N378.66.TXT
4096 75846319 4.4957

---------- GEFORCE GTX 1070 FFT 2.06BETA 8.0 X64 N378.66.TXT
4096 75846319 4.4804

---------- GEFORCE GTX 1070 FFT 2.06BETA 4.0 WIN32 N378.66.TXT
4096 75846319 5.5271

---------- GEFORCE GTX 1070 FFT 2.06BETA 4.1 WIN32 N378.66.TXT
4096 75846319 4.4940

---------- GEFORCE GTX 1070 FFT 2.06BETA 4.2 WIN32 N378.66.TXT
4096 75846319 4.4935

---------- GEFORCE GTX 1070 FFT 2.06BETA 5.0 WIN32 N378.66.TXT
4096 75846319 4.4975

---------- GEFORCE GTX 1070 FFT 2.06BETA 5.5 WIN32 N378.66.TXT
4096 75846319 4.4892

---------- GEFORCE GTX 1070 FFT 2.06BETA 6.0 WIN32 N378.66.TXT
4096 75846319 4.5142

---------- GEFORCE GTX 1070 FFT 2.06BETA 6.5 WIN32 N378.66.TXT
4096 75846319 4.4501

Scaling from NVIDIA's specifications, and by sqrt(TDP fraction) for the power-limited 1070, I'd guess something like the following for CUDALucas timings, assuming TFLOPS, not memory bandwidth, is the limiting factor:
Model, TFLOPS, msec/iter
GTX1070 70%TDP, 5.4, 4.45
GTX1070 100%TDP, 6.5, 3.7
GTX1080, 9, 2.7
GTX1080Ti, 11.5, 2.1
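
The arithmetic behind that table can be sketched in a few lines of Python. It rests on two guesses: throughput scales with the square root of the power fraction, and time per iteration is inversely proportional to TFLOPS. The function names are made up for this sketch; the input numbers are the ones in the table.

```python
import math

MEASURED_MS = 4.45   # GTX1070 at about 70% TDP, best benchmark time
TFLOPS_1070 = 6.5    # GTX1070 at 100% TDP, from the table


def effective_tflops(rated_tflops, tdp_fraction):
    """Assumption: throughput scales with sqrt(power fraction)."""
    return rated_tflops * math.sqrt(tdp_fraction)


def estimate_ms(tflops, ref_ms, ref_tflops):
    """Assumption: time per iteration is inversely proportional to TFLOPS."""
    return ref_ms * ref_tflops / tflops


if __name__ == "__main__":
    ref = effective_tflops(TFLOPS_1070, 0.70)
    print(f"GTX1070 70%TDP: {ref:.1f} TFLOPS, {MEASURED_MS} msec/iter")
    for model, tflops in [("GTX1070 100%TDP", 6.5),
                          ("GTX1080", 9.0),
                          ("GTX1080Ti", 11.5)]:
        ms = estimate_ms(tflops, MEASURED_MS, ref)
        print(f"{model}: {tflops} TFLOPS, {ms:.1f} msec/iter")
```

If memory bandwidth rather than TFLOPS turns out to be the limit, these estimates would be optimistic for the faster cards.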

I experimented very briefly with getting OpenCL-based applications (clLucas, mfakto, GpuOwl) going on my disparate hardware collection (mostly Intel and NVIDIA on Windows) and ran into enough obstacles to serve as a short- to medium-term deterrent. One modern laptop with an OpenCL-capable Intel HD620 showed a Prime95 performance hit that was substantially larger than its GpuOwl throughput, for an early GpuOwl version, perhaps due to shared memory bandwidth.

If there is an inexpensive OpenCL-capable GPU card with modest to decent performance that can run within the ~60W power limit and the bandwidth limit of a 1x PCIe extender/adapter to PCIe x16, please identify it. I think a couple of my Quadro 2000s have failed due to old age and steady use. The Quadro 2000s are getting scarce and more expensive, and I'd like to diversify into OpenCL/AMD a bit.

Last fiddled with by kriesel on 2017-12-15 at 17:26
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.