#232

Nov 2015
2×5 Posts

#233

Nov 2015
128 Posts
Running into an error when using the binary that was posted a few posts up:

C:\Prime 95\gpuOwL>gpuowl.exe
gpuOwL v1.9- GPU Mersenne primality checker
Turks, 6x 800MHz
OpenCL compilation error -11 (args -I. -cl-fast-relaxed-math -cl-std=CL2.0 -DEXP=82891957u -DWIDTH=2048u -DHEIGHT=2048u -DLOG_NWORDS=23u -DFP_DP=1 )
"C:\Users\LOCKHA~1\AppData\Local\Temp\OCLCCE7.tmp.cl", line 1: catastrophic error: cannot open source file "gpuowl.cl"
  #include "gpuowl.cl"
  ^
1 catastrophic error detected in the compilation of "C:\Users\LOCKHA~1\AppData\Local\Temp\OCLCCE7.tmp.cl".
Compilation terminated.
Internal error: clc compiler invocation failed.
OpenCL compilation error -11 (args -I. -cl-fast-relaxed-math -DEXP=82891957u -DWIDTH=2048u -DHEIGHT=2048u -DLOG_NWORDS=23u -DFP_DP=1 )
"C:\Users\LOCKHA~1\AppData\Local\Temp\OCLCD28.tmp.cl", line 1: catastrophic error: cannot open source file "gpuowl.cl"
  #include "gpuowl.cl"
  ^
1 catastrophic error detected in the compilation of "C:\Users\LOCKHA~1\AppData\Local\Temp\OCLCD28.tmp.cl".
Compilation terminated.
Internal error: clc compiler invocation failed.
Bye

If anyone has any ideas, I would appreciate the help. In the meantime I am going to try to compile directly from the GitHub repo, since I have MinGW and OpenCL.

Regards.
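Reading the log: the OpenCL compiler wraps the kernel in a temporary file (`OCLCCE7.tmp.cl`) whose line 1 is `#include "gpuowl.cl"`, and the only include path passed is `-I.`, which suggests `gpuowl.cl` was not found in whatever directory `.` resolved to. A rough pre-flight check is sketched below; the `missing_includes` helper is hypothetical and not part of gpuOwL. It scans source text for quoted includes and reports any not present under the given `-I` directories.

```python
import os
import re

# Matches quoted includes such as:  #include "gpuowl.cl"
# (angle-bracket includes like <stdio.h> are deliberately ignored)
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.MULTILINE)


def missing_includes(source: str, include_dirs: list[str]) -> list[str]:
    """Return quoted #include targets that exist in none of include_dirs.

    Hypothetical helper. Simplification: a real preprocessor also looks
    next to the including file first; only the -I directories are checked
    here. Relative directories such as "." resolve against os.getcwd().
    """
    missing = []
    for name in INCLUDE_RE.findall(source):
        if not any(os.path.isfile(os.path.join(d, name)) for d in include_dirs):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Line 1 of the temporary wrapper from the log, built with "-I."
    wrapper = '#include "gpuowl.cl"\n'
    absent = missing_includes(wrapper, ["."])
    if absent:
        print("not found via -I.:", ", ".join(absent), "| cwd:", os.getcwd())
    else:
        print("gpuowl.cl is visible via -I.")
```

Run from the directory `gpuowl.exe` is launched from, this only shows whether `gpuowl.cl` sits where a plain `-I.` lookup would expect it; it does not reproduce the AMD compiler's actual search rules.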
#234

"Mihai Preda"
Apr 2015
3·457 Posts
Quote:

#235

Nov 2015
1010 Posts
Quote:
Thanks for the help! Got these couple of old AMD girls running! Not the performance of your cards or my GTXs with CUDALucas, but they are running and that is all that matters to me :)
#236

"Mihai Preda"
Apr 2015
3×457 Posts
Quote:
I'd be curious to know how fast it is (time per iteration), what your hardware is, and what OS and driver you use. In my experience it works best with ROCm 1.6, which requires Ubuntu 16.04. AMDGPU-PRO should be OK as well, although slower than ROCm. You can run with "-verbosity 1" to get some information about the range and distribution of iteration time (i.e. whether iteration time is uniform or not). PS: it seems these "TURKS" GPUs are quite old; they're pre-GCN. Well, it's impressive it works then! (And such hardware isn't supported by ROCm, so no need to worry about that.) Last fiddled with by preda on 2017-12-08 at 22:00
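The post asks for time per iteration and mentions "-verbosity 1" for judging whether iteration time is uniform. Purely as an illustration (this is not gpuOwL's output format, and the 5% threshold is an arbitrary assumption), a list of per-iteration times in milliseconds could be summarized like this:

```python
import statistics


def summarize_iteration_times(times_ms: list[float]) -> dict[str, float]:
    """Summarize per-iteration wall times in milliseconds.

    Illustrative only; not gpuOwL's own -verbosity report.
    """
    mean = statistics.fmean(times_ms)
    stdev = statistics.pstdev(times_ms)
    return {
        "min": min(times_ms),
        "max": max(times_ms),
        "mean": mean,
        # Coefficient of variation: spread relative to the mean (0 = perfectly uniform).
        "cv": stdev / mean,
    }


def looks_uniform(times_ms: list[float], max_cv: float = 0.05) -> bool:
    """Hypothetical rule of thumb: call the timings uniform if CV <= max_cv."""
    return summarize_iteration_times(times_ms)["cv"] <= max_cv


if __name__ == "__main__":
    # Invented sample timings, for illustration only.
    steady = [4.46, 4.45, 4.47, 4.46, 4.45]
    bursty = [4.4, 4.4, 9.8, 4.4, 4.4]
    for label, sample in (("steady", steady), ("bursty", bursty)):
        print(label, summarize_iteration_times(sample), looks_uniform(sample))
```

A high coefficient of variation with a low minimum would point at occasional stalls rather than uniformly slow iterations.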
#237

Sep 2003
50318 Posts
Maybe the title of this thread should be changed. Does gpuOwL still include LL testing functionality?
#238

"Mihai Preda"
Apr 2015
3×457 Posts
Quote:
I think the real goal of GpuOwl is to provide efficient primality testing for Mersenne numbers. It just so happened that "efficient" recently came to mean PRP instead of LL. It was my mistake to be over-specific in the thread title; I should have said "Mersenne primality testing" instead of LL. Last fiddled with by preda on 2017-12-09 at 04:37
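For context on the LL-versus-PRP point: both tests on M_p = 2**p - 1 come down to roughly p modular squarings, which is why time per iteration is the figure quoted throughout this thread. Below is a minimal sketch of the textbook forms, using Python's big integers instead of the FFT multiplication a GPU program such as gpuOwL needs. The function names are illustrative, and base 3 for the PRP side is an assumption here.

```python
def lucas_lehmer(p: int) -> bool:
    """Lucas-Lehmer test: for an odd prime p, M_p = 2**p - 1 is prime iff s_(p-2) == 0."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def fermat_prp3(p: int) -> bool:
    """Base-3 Fermat probable-prime test on M_p = 2**p - 1 (p an odd prime).

    3**(M_p - 1) == 1 (mod M_p) is equivalent to 3**(2**p) == 9 (mod M_p),
    because 2**p = (M_p - 1) + 2 and 3 is invertible mod M_p for odd p.
    That form is just p successive squarings of 3.
    """
    m = (1 << p) - 1
    x = 3
    for _ in range(p):
        x = x * x % m
    return x == 9 % m


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31):
        print(p, lucas_lehmer(p), fermat_prp3(p))
```

The LL result is conclusive either way, while a PRP "probably prime" result would still need confirming; a PRP "composite" result is already definitive.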
#239

Nov 2015
128 Posts
Quote:

#240

"Mihai Preda"
Apr 2015
3×457 Posts
Quote:
Feel free to post error messages here; I'm curious to look and see if there's anything I could fix. "-verbosity 2" should let you see whether it's progressing at all, or stuck for some reason.
#241

"Composite as Heck"
Oct 2017
33916 Posts
Can anyone with a 1080 or 1080 Ti post their 4096K iteration timings, please? There are Vega and Fury X timings dotted about the thread; it would be interesting to compare.
edit: Am I being dumb for wanting to bench NVIDIA with an OpenCL program? Thinking about it, the GTX cards are conspicuous by their absence in this thread. Last fiddled with by M344587487 on 2017-12-15 at 10:09
#242

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,419 Posts
Quote:
FFT benchmark results for the latest available CUDALucas program on a GTX 1070, versus CUDA level and 32-bit or 64-bit build. The best time is 4.45 msec/iteration of Lucas-Lehmer. Note this is from a card that typically runs thermally limited to 70% of TDP, so alone or with better cooling it would do better. No OpenCL involved. Driver version N378.66. In this test CUDA 6.5 32-bit was fastest; CUDA 4.0 was slower by far.

All runs: GeForce GTX 1070, CUDALucas 2.06beta, driver N378.66, FFT length 4096K, exponent 75846319.

| CUDA | Build | msec/iter |
|---|---|---|
| 4.0 | x64 | 5.5530 |
| 4.1 | x64 | 4.5110 |
| 4.2 | x64 | 4.5163 |
| 5.0 | x64 | 4.5090 |
| 5.5 | x64 | 4.5381 |
| 6.0 | x64 | 4.4602 |
| 6.5 | x64 | 4.4638 |
| 7.0 | x64 | 4.4721 |
| 7.5 | x64 | 4.4957 |
| 8.0 | x64 | 4.4804 |
| 4.0 | Win32 | 5.5271 |
| 4.1 | Win32 | 4.4940 |
| 4.2 | Win32 | 4.4935 |
| 5.0 | Win32 | 4.4975 |
| 5.5 | Win32 | 4.4892 |
| 6.0 | Win32 | 4.5142 |
| 6.5 | Win32 | 4.4501 |

Scaling by NVIDIA specifications, and by sqrt(TDP) for the 1070, I'd guess something like the following for CUDALucas timings, assuming TFLOPS, not memory bandwidth, is the limit:

| Model | TFLOPS | msec/iter |
|---|---|---|
| GTX 1070 at 70% TDP | 5.4 | 4.45 |
| GTX 1070 at 100% TDP | 6.5 | 3.7 |
| GTX 1080 | 9 | 2.7 |
| GTX 1080 Ti | 11.5 | 2.1 |

I experimented very briefly with getting OpenCL-based applications (clLucas, mfakto, GpuOwl) going on my disparate hardware collection (mostly Intel and NVIDIA on Windows) and ran into enough obstacles to serve as a short- to medium-term deterrent. One modern laptop with an OpenCL-capable Intel HD 620 showed a Prime95 performance hit substantially larger than its GpuOwl throughput (with an early GpuOwl version), perhaps due to shared memory bandwidth. If there is an inexpensive OpenCL-capable GPU card with modest to decent performance that can run within a ~60 W power limit and the bandwidth limits of a PCIe x1-to-x16 extender/adapter, please identify it. I think a couple of my Quadro 2000s have failed due to old age and steady use. The Quadro 2000s are getting scarce and more expensive, and I'd like to diversify into OpenCL/AMD a bit. Last fiddled with by kriesel on 2017-12-15 at 17:26
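The estimated timings follow from two assumptions stated in the post: time per iteration is inversely proportional to TFLOPS, and the 70%-TDP card delivers roughly sqrt(0.7) of its 6.5 TFLOPS spec (about 5.4). The small sketch below reproduces that arithmetic and adds the length of a full LL test at exponent 75846319, which takes p - 2 iterations. The function names are illustrative, and the results are only as good as those scaling assumptions.

```python
import math

# Figures from the post: 4.45 ms/iter measured on a GTX 1070 held to ~70% TDP
# (4096K FFT), and 6.5 TFLOPS for the 1070 at full TDP.
MEASURED_MS = 4.45
GTX1070_TFLOPS = 6.5
TDP_FRACTION = 0.70


def derated_tflops(spec_tflops: float, tdp_fraction: float) -> float:
    """Assumption from the post: throughput scales with sqrt(TDP fraction)."""
    return spec_tflops * math.sqrt(tdp_fraction)


def estimate_ms_per_iter(tflops: float, ref_ms: float, ref_tflops: float) -> float:
    """Assume compute-bound scaling: time per iteration ~ 1 / TFLOPS."""
    return ref_ms * ref_tflops / tflops


def ll_test_days(p: int, ms_per_iter: float) -> float:
    """A Lucas-Lehmer test of M_p runs p - 2 squaring iterations."""
    return (p - 2) * ms_per_iter / 1000 / 86400


if __name__ == "__main__":
    ref_tflops = derated_tflops(GTX1070_TFLOPS, TDP_FRACTION)  # roughly 5.4
    cards = [("GTX 1070 at 100% TDP", 6.5), ("GTX 1080", 9.0), ("GTX 1080 Ti", 11.5)]
    for model, tflops in cards:
        ms = estimate_ms_per_iter(tflops, MEASURED_MS, ref_tflops)
        print(f"{model}: ~{ms:.1f} ms/iter, ~{ll_test_days(75846319, ms):.1f} days per LL test")
```

If memory bandwidth rather than arithmetic turns out to be the limit, the 1080 and 1080 Ti estimates would be optimistic.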