#12
"Mihai Preda"
Apr 2015
2²×19² Posts
Something is definitely off with the performance you're seeing. I'm using AMDGPU-Pro as the OpenCL compiler (both the most recent, 17.10, and the previous 16.xx produced similar performance). Which OpenCL driver do you use? Is the GPU in a good PCIe slot (no transfer bottleneck)?
I would dump the compiled kernels as .isa and look for differences there (you can pass "-save-temps" in clwrap.h, where it appears in a comment, and send me the .isa). Alternatively, I'll add a command-line option to enable that.
#13
Jan 2013
1000100₂ Posts
Sam Harris would approve of the name. They're perfectly good after all.
#14
"Mr. Meeseeks"
Jan 2012
California, USA
2⁷×17 Posts
Been tinkering with it on Windows; got this after compiling:
Code:
gpuOwL v0.1 GPU Lucas-Lehmer primality checker
LL of 76000021 at iteration 0
FFT 1024*2048 (4M words, 18.12 bits per word)
log An invalid option was specified.
error -11
Assertion failed!
Program: C:\Users\Back\Desktop\gpuowl-0832c6d\gpuowl-0832c6d.exe
File: clwrap.h, Line 66
Expression: false
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
EDIT: Pulled and recompiled from the latest commit:
Code:
OpenCL Compilation error -11, log: An invalid option was specified.
Last fiddled with by kracker on 2017-04-22 at 01:24
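The FFT line in the log above can be checked by hand. A minimal sketch, assuming each of the 1024×2048 complex FFT points carries two real words (which matches the "4M words" figure that gpuOwL prints); the function name `bitsPerWord` is hypothetical and not part of gpuOwL:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not gpuOwL code): average number of exponent bits
// stored per FFT word when a width x height complex FFT is used.
// Assumption: every complex point holds two real words.
double bitsPerWord(uint64_t exponent, uint64_t width, uint64_t height) {
    assert(width > 0 && height > 0);
    const uint64_t words = 2 * width * height;  // 2 * 1024 * 2048 = 4194304 ("4M words")
    return static_cast<double>(exponent) / static_cast<double>(words);
}
```

For example, `bitsPerWord(76000021, 1024, 2048)` evaluates to about 18.1198, which the log rounds to 18.12.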
#15
"Victor de Hollander"
Aug 2011
the Netherlands
49B₁₆ Posts
Quote:
error -11 Assertion failed! clwrap.h, Line 66
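For reference, the Khronos `CL/cl.h` header defines error -11 as `CL_BUILD_PROGRAM_FAILURE`: the OpenCL C compiler ran but rejected the program, and the reason ("An invalid option was specified") is found in the build log, which is retrieved with `clGetProgramBuildInfo(..., CL_PROGRAM_BUILD_LOG, ...)`. The header also defines a separate `CL_INVALID_BUILD_OPTIONS` (-43), although the AMD driver in these reports returned -11. A small lookup like the hypothetical one below (covering only a few codes, with the values hard-coded so it compiles without OpenCL headers) can make such assertion failures easier to read:

```cpp
#include <string>

// Hypothetical helper: map a few OpenCL status codes to their names.
// Values are copied from the Khronos CL/cl.h header so this compiles
// without OpenCL installed; extend the table as needed.
std::string clErrorName(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -3:  return "CL_COMPILER_NOT_AVAILABLE";
        case -11: return "CL_BUILD_PROGRAM_FAILURE";
        case -43: return "CL_INVALID_BUILD_OPTIONS";
        case -45: return "CL_INVALID_PROGRAM_EXECUTABLE";
        default:  return "unknown OpenCL error " + std::to_string(code);
    }
}
```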
#16
"Mihai Preda"
Apr 2015
2644₈ Posts
Kracker, Victor: thanks for trying to compile it! Since I only compiled with one CL implementation myself, I was not aware of these problems.
Now, would you try again with a freshly checked-out version? I simplified the CL options. If it still doesn't pass the CL compiler, you can try deleting -cl-fast-relaxed-math -cl-std=CL2.0 from clwrap.h line 89 (but hopefully you won't have to do this).
Concerning the AID in worktodo.txt, try a single 0, like this:
Test=0,24036583,75,1
Otherwise, 32 hex digits (e.g. 0 repeated 32 times) should pass in any case. But Test=N/A would not pass, because N/A is not a valid hexadecimal string.
- Mihai
#17
"Mihai Preda"
Apr 2015
5A4₁₆ Posts
@airsquirrels: could you run a freshly checked-out version with the command-line arguments "-cl -save-temps", e.g.:
./gpuowl -cl -save-temps
This passes "-save-temps" to the CL compiler, which should produce a dump of the GCN ISA. If that works, you should see a file like "_temp_1_Fiji.isa". Could you send me that .isa file? I'd like to see whether the performance degradation comes from poorly generated ISA code (e.g. too many VGPRs used).
Thanks, Mihai
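Per the description above, a "-cl X" pair on the gpuOwL command line forwards X to the OpenCL compiler. One way to collect such options is sketched below; the function name and the handling of repeated `-cl` flags are assumptions for illustration, not gpuOwL's implementation. The resulting string would then be passed as the `options` argument of `clBuildProgram`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: gather the value following every "-cl" argument into a
// single space-separated string of OpenCL compiler options.
std::string collectClOptions(const std::vector<std::string>& args) {
    std::string options;
    for (std::size_t i = 0; i + 1 < args.size(); ++i) {
        if (args[i] == "-cl") {
            if (!options.empty()) options += ' ';
            options += args[i + 1];
            ++i;  // the value has been consumed; skip it
        }
    }
    return options;
}
```

For example, `collectClOptions({"./gpuowl", "-cl", "-save-temps"})` returns `"-save-temps"`; in `main`, the vector could be built as `std::vector<std::string>(argv, argv + argc)`.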
#18
"Victor de Hollander"
Aug 2011
the Netherlands
3²×131 Posts
Quote:
I tried both the default "-cl-fast-relaxed-math" and a "-cl-opt-disable" change on line 89 of clwrap.h, but still got some errors. I forgot to mention that my card is an HD7950, which only supports OpenCL 1.2 (https://en.wikipedia.org/wiki/Radeon_HD_7000_Series). At least gpuOwL detects it as a Tahiti OpenCL 1.2 AMD-APP (2079.5) device :). Attached is the _temp_0_Tahiti.cl that was created (I had to rename it to .txt, or else I couldn't upload it to the forum).
Code:
C:\msys64\home\gpuowl>gpuowl
gpuOwL v0.1 GPU Lucas-Lehmer primality checker
Tahiti - OpenCL 1.2 AMD-APP (2079.5)
OpenCL compilation error -11, log:
"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 15: error: attributes may not appear here
    double2 _O mul(double2 u, double a, double b) { return (double2) { u.x * a - u.y * b, u.x * b + u.y * a}; }
"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 16: error: attributes may not appear here
    double2 _O mul(double2 u, double2 v) { return mul(u, v.x, v.y); }
"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 59: error: attributes may not appear here
    void _O shuffle(local double *lds, double2 *u, uint n, uint f) {
"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 83: error: variable with automatic storage duration cannot be stored in the named address space
    local double lds[1024];
"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 86: error: identifier "lds" is undefined
    shuffle(lds, u, 4, 64);
"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 102: error: variable with automatic storage duration cannot be stored in the named address space
    local double lds[2048];
"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 105: error: identifier "lds" is undefined
    shuffle(lds, u, 8, 32);
"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 365: error: variable with automatic storage duration cannot be stored in the named address space
    local double lds[4096];
"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 372: error: identifier "lds" is undefined
    lds[l * 64 + (c + l) % 64] = ((double *)(u + i))[b];
"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 378: error: identifier "lds" is undefined
    ((double *)(u + i))[b] = lds[l * 64 + (c + l) % 64];
10 errors detected in the compilation of "C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl".
Frontend phase failed compilation.
Code:
clinfo
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (2079.5)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices

Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: AMD Radeon HD 7900 Series
Device Topology: PCI[ B#1, D#0, F#0 ]
Max compute units: 28
Max work items dimensions: 3
  Max work items[0]: 256
  Max work items[1]: 256
  Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 900Mhz
Address bits: 32
Max memory allocation: 2214174021
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 2048
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
  Denorms: No
  Quiet NaNs: Yes
  Round to nearest even: Yes
  Round to zero: Yes
  Round to +ve and infinity: Yes
  IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 3221225472
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Max pipe arguments: 0
Max pipe active reservations: 0
Max pipe packet size: 0
Max global variable size: 0
Max global variable preferred total size: 0
Max read/write image args: 0
Max on device events: 0
Queue on device max size: 0
Max on device queues: 0
Queue on device preferred size: 0
SVM capabilities:
  Coarse grain buffer: No
  Fine grain buffer: No
  Fine grain system: No
  Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
  Execute OpenCL kernels: Yes
  Execute native function: No
Queue on Host properties:
  Out-of-Order: No
  Profiling : Yes
Queue on Device properties:
  Out-of-Order: No
  Profiling : No
Platform ID: 000007FED5DF5188
Name: Tahiti
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 1.2
Driver version: 2079.5 (VM)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (2079.5)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event

Device Type: CL_DEVICE_TYPE_CPU
Vendor ID: 1002h
Board name:
Max compute units: 4
Max work items dimensions: 3
  Max work items[0]: 1024
  Max work items[1]: 1024
  Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 8
Preferred vector width double: 4
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 8
Native vector width double: 4
Max clock frequency: 3300Mhz
Address bits: 64
Max memory allocation: 4289850368
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 64
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
  Denorms: Yes
  Quiet NaNs: Yes
  Round to nearest even: Yes
  Round to zero: Yes
  Round to +ve and infinity: Yes
  IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 32768
Global memory size: 17159401472
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Max pipe arguments: 16
Max pipe active reservations: 16
Max pipe packet size: 4289850368
Max global variable size: 1879048192
Max global variable preferred total size: 1879048192
Max read/write image args: 64
Max on device events: 0
Queue on device max size: 0
Max on device queues: 0
Queue on device preferred size: 0
SVM capabilities:
  Coarse grain buffer: No
  Fine grain buffer: No
  Fine grain system: No
  Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 1
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 310
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
  Execute OpenCL kernels: Yes
  Execute native function: Yes
Queue on Host properties:
  Out-of-Order: No
  Profiling : Yes
Queue on Device properties:
  Out-of-Order: No
  Profiling : No
Platform ID: 000007FED5DF5188
Name: Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz
Vendor: GenuineIntel
Device OpenCL C version: OpenCL C 1.2
Driver version: 2079.5 (sse2,avx)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (2079.5)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_spir cl_khr_gl_event
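The clinfo output above shows why a `-cl-std=CL2.0` option is likely to be rejected on this card: the AMD-APP platform reports OpenCL 2.0, but the Tahiti device itself reports "OpenCL C 1.2". One option is to choose build flags from the device's `CL_DEVICE_OPENCL_C_VERSION` string (queried with `clGetDeviceInfo`), whose format the OpenCL specification defines as `OpenCL C <major>.<minor> <vendor-specific information>`. A minimal sketch, where the base flag and the function name are assumptions rather than what gpuOwL actually does; it addresses only the option problem, not the kernel-source errors in the log:

```cpp
#include <cstdio>
#include <string>

// Hypothetical sketch: pick OpenCL build options from the device's
// CL_DEVICE_OPENCL_C_VERSION string, e.g. "OpenCL C 1.2 ".
// -cl-std=CL2.0 is only requested when the device reports OpenCL C 2.0 or later.
std::string buildOptionsFor(const std::string& openclCVersion) {
    std::string options = "-cl-fast-relaxed-math";
    int major = 0, minor = 0;
    if (std::sscanf(openclCVersion.c_str(), "OpenCL C %d.%d", &major, &minor) == 2 &&
        major >= 2) {
        options += " -cl-std=CL2.0";
    }
    return options;
}
```

Keying on the device version rather than the platform version matters here, since the two differ on this system.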
#19
"Mihai Preda"
Apr 2015
5A4₁₆ Posts
OK, I tried to fix these CL compilation errors as well (please retry).
On the other hand, there appears to be a bug that I need to investigate. I would not recommend doing any serious LL testing with gpuOwL right now; I need to validate it a bit more first.
#20
"Mr. Meeseeks"
Jan 2012
California, USA
100010000000₂ Posts
Recompiled; upon launch I got:
Code:
OpenCL compilation error -11, log: An invalid option was specified.
#21
"Mr. Meeseeks"
Jan 2012
California, USA
2176₁₀ Posts
Very impressive! It's actually slightly faster on my low-end HD7770 (I'll try it on my OpenCL 2.0-capable R9 285 when I have time), with better error numbers too. The residues also seem to match clLucas.
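For readers following along: the residue being compared is the final value of the Lucas-Lehmer sequence s(0) = 4, s(i+1) = s(i)^2 - 2 mod M_p, taken after p - 2 steps; M_p = 2^p - 1 is prime exactly when that value is 0, and GIMPS programs typically report its low 64 bits. gpuOwL performs the squaring with large FFTs on the GPU; the CPU sketch below only handles tiny exponents (odd prime p up to 62) with plain 64-bit arithmetic, and its function names are hypothetical. It shows what is being compared, not how gpuOwL computes it.

```cpp
#include <cassert>
#include <cstdint>

// (a * b) mod m by binary shift-and-add, avoiding 128-bit integers.
// Requires m < 2^63 so that the sum of two reduced values never overflows.
static uint64_t mulMod(uint64_t a, uint64_t b, uint64_t m) {
    uint64_t result = 0;
    a %= m;
    while (b > 0) {
        if (b & 1) result = (result + a) % m;
        a = (a + a) % m;
        b >>= 1;
    }
    return result;
}

// Lucas-Lehmer residue for M_p = 2^p - 1, for an odd prime p in [3, 62].
// M_p is prime exactly when the returned residue is 0.
uint64_t lucasLehmerResidue(unsigned p) {
    assert(p >= 3 && p <= 62);
    const uint64_t m = (uint64_t{1} << p) - 1;
    uint64_t s = 4;
    for (unsigned i = 0; i < p - 2; ++i) {
        s = (mulMod(s, s, m) + m - 2) % m;  // s = s^2 - 2 (mod M_p), kept non-negative
    }
    return s;
}
```

For instance, `lucasLehmerResidue(13)` is 0 because M_13 = 8191 is prime, while `lucasLehmerResidue(11)` is nonzero because M_11 = 2047 = 23 × 89.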
#22
"Kieren"
Jul 2011
In My Own Galaxy!
2×3×1,693 Posts
If I may say so, as a spectator and a non-coder, it amazes me to watch this birth process. The cooperation and involvement of several parties is impressive. Seeing this play out is one of the big payoffs of hanging out on this forum.
Last fiddled with by kladner on 2017-04-23 at 08:10
Thread | Thread Starter | Forum | Replies | Last Post |
mfakto: an OpenCL program for Mersenne prefactoring | Bdot | GPU Computing | 1719 | 2023-01-16 15:51 |
GPUOWL AMD Windows OpenCL issues | xx005fs | GpuOwl | 0 | 2019-07-26 21:37 |
Testing an expression for primality | 1260 | Software | 17 | 2015-08-28 01:35 |
Testing Mersenne cofactors for primality? | CRGreathouse | Computer Science & Computational Number Theory | 18 | 2013-06-08 19:12 |
Primality-testing program with multiple types of moduli (PFGW-related) | Unregistered | Information & Answers | 4 | 2006-10-04 22:38 |