[QUOTE=kracker;471663]Latest binaries for Windows from git.[/QUOTE]
Thanks so much for the binary; I really appreciate it. I pulled out an old AMD box to start looking for some primes and can't wait to get this rolling on my AMD cards. :) |
Compilation Error
Running into an error when using the binary that was posted a few posts up.
"C:\Prime 95\gpuOwL>gpuowl.exe
gpuOwL v1.9- GPU Mersenne primality checker
Turks, 6x 800MHz
OpenCL compilation error -11 (args -I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=82891957u -DWIDTH=2048u -DHEIGHT=2048u -DLOG_NWORDS=23u -DFP_DP=1 )
"C:\Users\LOCKHA~1\AppData\Local\Temp\OCLCCE7.tmp.cl", line 1: catastrophic error: cannot open source file "gpuowl.cl"
#include "gpuowl.cl"
^
1 catastrophic error detected in the compilation of "C:\Users\LOCKHA~1\AppData\Local\Temp\OCLCCE7.tmp.cl".
Compilation terminated.
Internal error: clc compiler invocation failed.
OpenCL compilation error -11 (args -I. -cl-fast-relaxed-math -DEXP=82891957u -DWIDTH=2048u -DHEIGHT=2048u -DLOG_NWORDS=23u -DFP_DP=1 )
"C:\Users\LOCKHA~1\AppData\Local\Temp\OCLCD28.tmp.cl", line 1: catastrophic error: cannot open source file "gpuowl.cl"
#include "gpuowl.cl"
^
1 catastrophic error detected in the compilation of "C:\Users\LOCKHA~1\AppData\Local\Temp\OCLCD28.tmp.cl".
Compilation terminated.
Internal error: clc compiler invocation failed.
Bye"

If anyone has any ideas I would appreciate the help. In the meantime I am going to try to compile directly from the GitHub repo, since I have MinGW and OpenCL. Regards. |
[QUOTE=Smokingenius;473413]Running into an error when using the binary that was posted a few posts up.
"C:\Prime 95\gpuOwL>gpuowl.exe
gpuOwL v1.9- GPU Mersenne primality checker
Turks, 6x 800MHz
OpenCL compilation error -11 (args -I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=82891957u -DWIDTH=2048u -DHEIGHT=2048u -DLOG_NWORDS=23u -DFP_DP=1 )
"C:\Users\LOCKHA~1\AppData\Local\Temp\OCLCCE7.tmp.cl", line 1: catastrophic error: cannot open source file "gpuowl.cl"
#include "gpuowl.cl"
^ [/QUOTE]
You must have the "gpuowl.cl" file in the same folder as the executable. Was that it? |
[QUOTE=preda;473420]You must have the "gpuowl.cl" file in the same folder as the executable. Was that it?[/QUOTE]
Yep, that was it! I feel silly now. It wasn't in the binary zip that was provided a few posts back, so I didn't include it from the GitHub source when I unzipped it. I should have realized. I even went through the trouble of reinstalling all my OpenCL drivers and files lol. :) Thanks for the help! Got these couple old AMD girls running! Not the performance of your cards or my GTXs with CudaLucas, but they are running and that is all that matters to me :) |
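For anyone hitting the same "cannot open source file" error, here is a minimal Python sketch of a pre-flight check for the fix described above. It only tests whether gpuowl.cl sits next to the executable; the helper name missing_support_files is hypothetical and not part of gpuOwL. The log above shows the compiler being passed -I., which suggests the include is resolved relative to the working directory, so this check assumes you launch gpuowl.exe from its own folder.

```python
from pathlib import Path


def missing_support_files(exe_path, required=("gpuowl.cl",)):
    """Return the names in `required` that are absent from the executable's folder.

    `required` defaults to the one file named in the error log; extend it if
    your build needs more (an assumption, not something gpuOwL documents here).
    """
    folder = Path(exe_path).resolve().parent
    return [name for name in required if not (folder / name).is_file()]


if __name__ == "__main__":
    # Checks the current directory, i.e. assumes gpuowl.exe is started from here.
    missing = missing_support_files("gpuowl.exe")
    if missing:
        print("Copy these next to the executable:", ", ".join(missing))
    else:
        print("All support files found.")
```

Run it from the folder that holds gpuowl.exe before starting a test; an empty result means the kernel source is where the compiler will look for it.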
[QUOTE=Smokingenius;473488]Yep, that was it! I feel silly now. It wasn't in the binary zip that was provided a few posts back, so I didn't include it from the GitHub source when I unzipped it. I should have realized. I even went through the trouble of reinstalling all my OpenCL drivers and files lol. :)
Thanks for the help! Got these couple old AMD girls running! Not the performance of your cards or my GTXs with CudaLucas, but they are running and that is all that matters to me :)[/QUOTE]
Wow, I'm happy it's working.

I'd be curious to know how fast it is (time per iteration), what your hardware is, and what OS and driver you use. In my experience it works best with ROCm 1.6, which requires Ubuntu 16.04. AMDGPU-pro should be OK as well, although slower than ROCm.

You can run with "-verbosity 1" to get some information about the range and distribution of iteration time (i.e. uniform iteration time or not).

PS: it seems these "TURKS" GPUs are quite old; they're pre-GCN. Well, it's impressive it works then! (And such hardware isn't supported by ROCm, so no need to worry about that.) |
Maybe the title of this thread should be changed. Does gpuOwL still include LL testing functionality?
|
[QUOTE=GP2;473510]Maybe the title of this thread should be changed. Does gpuOwL still include LL testing functionality?[/QUOTE]
Yes, I'm not against changing the title. I also don't think it's a big problem if it stays unchanged. It's true that GpuOwl does not do LL anymore. OTOH, LL could be seen as a broad synonym for "definitive primality testing for Mersenne numbers", which is similar to what GpuOwl does (PRP). I think the real goal of GpuOwl is to provide efficient primality testing for Mersenne numbers. It just so happened that "efficient" recently became PRP instead of LL. It was my mistake to be over-specific in the thread title; I should have said "Mersenne primality testing" instead of LL. |
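To make the LL versus PRP distinction concrete, here is a small Python sketch of both tests on tiny exponents. It is illustrative only: it uses plain big-integer arithmetic rather than the FFT multiplication a GPU program relies on, and the choice of base 3 for the PRP test is an assumption made for this example, not something stated in this thread.

```python
def mersenne(p):
    """Return the Mersenne number 2**p - 1."""
    return (1 << p) - 1


def lucas_lehmer(p):
    """Lucas-Lehmer test: for an odd prime p, True exactly when 2**p - 1 is prime."""
    m = mersenne(p)
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def fermat_prp(p, base=3):
    """Fermat probable-prime test on 2**p - 1 (base 3 is an assumed choice).

    A False result proves the number composite; a True result only says
    "probably prime", which is why PRP is not a proof in the way LL is.
    """
    m = mersenne(p)
    return pow(base, m - 1, m) == 1


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31):
        print(p, lucas_lehmer(p), fermat_prp(p))
```

Both tests cost about p modular squarings of a p-bit number, so neither is inherently cheaper per iteration; the two functions agree on every exponent in the loop above.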
[QUOTE=preda;473494]Wow, I'm happy it's working.
I'd be curious to know how fast it is (time per iteration), what your hardware is, and what OS and driver you use. In my experience it works best with ROCm 1.6, which requires Ubuntu 16.04. AMDGPU-pro should be OK as well, although slower than ROCm. You can run with "-verbosity 1" to get some information about the range and distribution of iteration time (i.e. uniform iteration time or not). PS: it seems these "TURKS" GPUs are quite old; they're pre-GCN. Well, it's impressive it works then! (And such hardware isn't supported by ROCm, so no need to worry about that.)[/QUOTE]
Well, I am still working on it. On the initial run of the exe with -h it did find the 3 graphics cards. I have the Turks, the Devastator, and then the A8 the system was built with, with integrated graphics. The two cards started up but never got anywhere, and the A8 was erroring on compilation, but it was getting farther.

I have since re-installed the AMD drivers and OpenCL 2.0. The cards now fail with 7 errors and I get a "Bye" message. For the A8 I get a bunch of errors, but it compiles and then appears to start running. I set the verbosity to 1 so you can see what speed I am getting with the A8 integrated graphics. I am also running a test with Prime95 running and not running to see if there is any effect. |
[QUOTE=Smokingenius;473564]Well, I am still working on it. On the initial run of the exe with -h it did find the 3 graphics cards. I have the Turks, the Devastator, and then the A8 the system was built with, with integrated graphics. The two cards started up but never got anywhere, and the A8 was erroring on compilation, but it was getting farther. I have since re-installed the AMD drivers and OpenCL 2.0. The cards now fail with 7 errors and I get a "Bye" message. For the A8 I get a bunch of errors, but it compiles and then appears to start running. I set the verbosity to 1 so you can see what speed I am getting with the A8 integrated graphics. I am also running a test with Prime95 running and not running to see if there is any effect.[/QUOTE]
Try with "-legacy" as well; it will produce different performance behavior and may work better on old cards. Feel free to post error messages here; I'm curious to look and see if there's anything I could fix. "-verbosity 2" should allow you to see whether it's progressing at all or stuck for some reason. |
Can anyone with a 1080 and 1080ti post their 4096K iteration timings please? There are Vega and Fury X timings dotted about the thread, it would be interesting to compare.
edit: Am I being dumb for wanting to bench NVIDIA with an OpenCL program? Thinking about it, the GTX cards are conspicuous by their absence in this thread. |
[QUOTE=M344587487;474076]Can anyone with a 1080 and 1080ti post their 4096K iteration timings please? There are Vega and Fury X timings dotted about the thread, it would be interesting to compare.
edit: Am I being dumb for wanting to bench NVIDIA with an OpenCL program? Thinking about it, the GTX cards are conspicuous by their absence in this thread.[/QUOTE]
Not what you asked for in multiple respects, but it might be of interest, and it's the closest I have: FFT benchmark results for the latest available CUDALucas program on a GTX1070, versus CUDA level and 32-bit or 64-bit. Best time is 4.45 msec/iteration of Lucas-Lehmer. Note this is from a card that typically runs thermally limited to 70% of TDP, so alone or with better cooling it would do better. No OpenCL involved. Driver version N378.66. In this test CUDA 6.5 32-bit was fastest, and CUDA 4.0 slower by far.

---------- GEFORCE GTX 1070 FFT 2.06BETA 4.0 X64 N378.66.TXT
4096 75846319 5.5530
---------- GEFORCE GTX 1070 FFT 2.06BETA 4.1 X64 N378.66.TXT
4096 75846319 4.5110
---------- GEFORCE GTX 1070 FFT 2.06BETA 4.2 X64 N378.66.TXT
4096 75846319 4.5163
---------- GEFORCE GTX 1070 FFT 2.06BETA 5.0 X64 N378.66.TXT
4096 75846319 4.5090
---------- GEFORCE GTX 1070 FFT 2.06BETA 5.5 X64 N378.66.TXT
4096 75846319 4.5381
---------- GEFORCE GTX 1070 FFT 2.06BETA 6.0 X64 N378.66.TXT
4096 75846319 4.4602
---------- GEFORCE GTX 1070 FFT 2.06BETA 6.5 X64 N378.66.TXT
4096 75846319 4.4638
---------- GEFORCE GTX 1070 FFT 2.06BETA 7.0 X64 N378.66.TXT
4096 75846319 4.4721
---------- GEFORCE GTX 1070 FFT 2.06BETA 7.5 X64 N378.66.TXT
4096 75846319 4.4957
---------- GEFORCE GTX 1070 FFT 2.06BETA 8.0 X64 N378.66.TXT
4096 75846319 4.4804
---------- GEFORCE GTX 1070 FFT 2.06BETA 4.0 WIN32 N378.66.TXT
4096 75846319 5.5271
---------- GEFORCE GTX 1070 FFT 2.06BETA 4.1 WIN32 N378.66.TXT
4096 75846319 4.4940
---------- GEFORCE GTX 1070 FFT 2.06BETA 4.2 WIN32 N378.66.TXT
4096 75846319 4.4935
---------- GEFORCE GTX 1070 FFT 2.06BETA 5.0 WIN32 N378.66.TXT
4096 75846319 4.4975
---------- GEFORCE GTX 1070 FFT 2.06BETA 5.5 WIN32 N378.66.TXT
4096 75846319 4.4892
---------- GEFORCE GTX 1070 FFT 2.06BETA 6.0 WIN32 N378.66.TXT
4096 75846319 4.5142
---------- GEFORCE GTX 1070 FFT 2.06BETA 6.5 WIN32 N378.66.TXT
4096 75846319 4.4501

Scaling by NVIDIA specifications, and by sqrt(TDP) for the 1070, I'd guess something like the following for CUDALucas timings, assuming TFLOPS, not memory bandwidth, is the limiting factor:

Model, TFlops, msec/iter
GTX1070 70%TDP, 5.4, 4.45
GTX1070 100%TDP, 6.5, 3.7
GTX1080, 9, 2.7
GTX1080Ti, 11.5, 2.1

I experimented very briefly with getting OpenCL-based applications (clLucas, mfakto, GpuOwl) going on my disparate hardware collection (mostly Intel and NVIDIA on Windows) and ran into enough obstacles to serve as a short- to medium-term deterrent. One modern laptop with an OpenCL-capable Intel HD620 showed a Prime95 performance hit that was substantially larger than its GpuOwl throughput, for an early GpuOwl version, perhaps due to shared memory bandwidth.

If there is an inexpensive OpenCL-capable GPU card with modest to decent performance that can run within the ~60W power limit and the bandwidth limits of a 1x PCIe extender/adapter to PCIe x16, please identify it. I think a couple of my Quadro 2000s have failed due to old age and steady use. The Quadro 2000s are getting scarce and more expensive, and I'd like to diversify into OpenCL/AMD a bit. |
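The guessed timings above can be reproduced with a few lines of Python. This is only a sketch of the scaling rule stated in the post (time inversely proportional to TFLOPS, with the throttled card's TFLOPS taken as the nominal figure times sqrt(0.70)); the function name estimate_msec is made up for illustration, and the TFLOPS figures are the ones quoted in the post, not independently checked.

```python
import math

# Measured baseline from the post: GTX 1070 running at about 70% of TDP.
BASE_MSEC = 4.45
# Assumption from the post: throughput scales with sqrt(TDP fraction).
BASE_TFLOPS = 6.5 * math.sqrt(0.70)  # about 5.4


def estimate_msec(tflops, base_tflops=BASE_TFLOPS, base_msec=BASE_MSEC):
    """Estimate msec/iteration assuming time is inversely proportional to TFLOPS."""
    return base_msec * base_tflops / tflops


if __name__ == "__main__":
    for model, tflops in (("GTX1070 100%TDP", 6.5),
                          ("GTX1080", 9.0),
                          ("GTX1080Ti", 11.5)):
        print(f"{model}, {tflops}, {estimate_msec(tflops):.1f}")
```

Rounded to one decimal place, these come out to the 3.7, 2.7 and 2.1 msec/iter listed above. If memory bandwidth rather than TFLOPS is the real limit, the estimates will be off.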