#12
"Mihai Preda"
Apr 2015
3·457 Posts
Something is definitely off with the performance you see. I'm using AMDGPU-Pro as the OpenCL compiler (both the most recent version, 17.10, and the previous 16.xx produced similar performance). Which OpenCL driver do you use? Is the GPU in a good PCIe slot (no transfer bottleneck)?
I would dump the compiled .isa kernels and look there for differences (you can pass "-save-temps" in clwrap.h, it's there in a comment, and send me the .isa). Or I'll add an option to enable that as an argument.

#13
Jan 2013
2²·17 Posts
Sam Harris would approve of the name. They're perfectly good after all.

#14
"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts
Been tinkering around with it on Windows.. got this after compiling:
Code:
gpuOwL v0.1 GPU Lucas-Lehmer primality checker
LL of 76000021 at iteration 0
FFT 1024*2048 (4M words, 18.12 bits per word)
log An invalid option was specified.
error -11
Assertion failed!
Program: C:\Users\Back\Desktop\gpuowl-0832c6d\gpuowl-0832c6d.exe
File: clwrap.h, Line 66
Expression: false
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
EDIT: Pulled and recompiled from the latest commit..
Code:
OpenCL Compilation error -11, log: An invalid option was specified.
Last fiddled with by kracker on 2017-04-22 at 01:24
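For reference, the "18.12 bits per word" figure in the log above is simply the exponent divided by the number of FFT words. A minimal Python sketch, assuming "4M words" means 2 × 1024 × 2048 = 4194304 words (the function name is made up for illustration and is not taken from gpuOwL):

```python
def bits_per_word(exponent: int, words: int) -> float:
    """Average number of bits of the exponent carried by each FFT word."""
    return exponent / words


# Assumption: "FFT 1024*2048 (4M words)" means 2 * 1024 * 2048 words.
WORDS = 2 * 1024 * 2048

# Prints "18.12 bits per word" for the exponent shown in the log.
print(f"{bits_per_word(76000021, WORDS):.2f} bits per word")
```

The same division gives a quick sanity check for any other exponent and FFT size pair.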

#15
"Victor de Hollander"
Aug 2011
the Netherlands
2³·3·7² Posts
Quote:
error -11 Assertion failed! clwrap.h, Line 66

#16
"Mihai Preda"
Apr 2015
55B₁₆ Posts
Kracker, Victor: thanks for trying the compile! As I have only compiled with one CL implementation myself, I was not aware of these problems.
Now, would you try again with a freshly checked-out version? I simplified the CL options. If it still doesn't pass the CL compiler, you can try deleting -cl-fast-relaxed-math -cl-std=CL2.0 from clwrap.h line 89 (but hopefully you won't have to do this).
Concerning the AID in worktodo.txt, try a single 0, like this:
Test=0,24036583,75,1
Otherwise, what should pass in any case is 32 hex digits, like 00000.. (0 repeated 32 times). But Test=N/A would not pass, because N/A does not consist of valid hexadecimal digits.
- Mihai
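The AID rule described above can be sketched as a small check. This is only an illustration in Python under the assumptions stated in the post (an AID is either a single 0 or exactly 32 hexadecimal digits); the function names are hypothetical, it is not gpuOwL's actual parser, and the fields after the exponent are passed through untouched because the post does not describe them.

```python
import re

# Assumption taken from the post: a valid AID is a single "0"
# or exactly 32 hexadecimal digits.
_AID_RE = re.compile(r"0|[0-9A-Fa-f]{32}")


def is_valid_aid(aid: str) -> bool:
    """Return True if the AID field has one of the two accepted forms."""
    return _AID_RE.fullmatch(aid) is not None


def parse_test_line(line: str):
    """Split a 'Test=AID,exponent,...' line into (aid, exponent, rest)."""
    key, _, rest = line.strip().partition("=")
    if key != "Test":
        raise ValueError(f"not a Test line: {line!r}")
    fields = rest.split(",")
    if len(fields) < 2:
        raise ValueError(f"expected at least AID and exponent: {line!r}")
    aid, exponent = fields[0], fields[1]
    if not is_valid_aid(aid):
        raise ValueError(f"invalid AID {aid!r}: use 0 or 32 hex digits")
    # Remaining fields are returned as raw strings, uninterpreted.
    return aid, int(exponent), fields[2:]


print(parse_test_line("Test=0,24036583,75,1"))
```

With this check, a line such as Test=N/A,24036583,75,1 raises a ValueError, while the single-0 form and a 32-digit hexadecimal AID are accepted.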

#17
"Mihai Preda"
Apr 2015
3·457 Posts
@airsquirrels: could you run a freshly checked-out version with the command-line args "-cl -save-temps", e.g.:
./gpuowl -cl -save-temps
This passes "-save-temps" to the CL compiler, which should produce a dump of the GCN ISA. If that works, you should see a file like "_temp_1_Fiji.isa". Could you send that .isa file to me, so I can see whether the reason for the performance degradation is poor generated ISA code (e.g. too many VGPRs used)?
Thanks, Mihai

#18
"Victor de Hollander"
Aug 2011
the Netherlands
2230₈ Posts
Quote:
I tried with the default "-cl-fast-relaxed-math" and with a "-cl-opt-disable" change in line 89 of clwrap.h, but still got some errors.
I forgot to mention that my card is an HD7950, which only supports OpenCL 1.2: https://en.wikipedia.org/wiki/Radeon_HD_7000_Series
At least gpuOwL detects that it is a Tahiti OpenCL 1.2 AMD-APP 2079.5 device :).
Attached is the _temp_0_Tahiti.cl that was created. (I had to rename it to .txt, or else I couldn't upload it to the forum.)
Code:
C:\msys64\home\gpuowl>gpuowl
gpuOwL v0.1 GPU Lucas-Lehmer primality checker
Tahiti - OpenCL 1.2 AMD-APP (2079.5)
OpenCL compilation error -11, log:
"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 15: error: attributes may not appear here
double2 _O mul(double2 u, double a, double b) { return (double2) { u.x * a - u.y * b, u.x * b + u.y * a}; }
^
"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 16: error: attributes may not appear here
double2 _O mul(double2 u, double2 v) { return mul(u, v.x, v.y); }
^
"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 59: error: attributes may not appear here
void _O shuffle(local double *lds, double2 *u, uint n, uint f) {
^
"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 83: error: variable with automatic storage duration cannot be stored in the named address space
local double lds[1024];
^
"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 86: error: identifier "lds" is undefined
shuffle(lds, u, 4, 64);
^
"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 102: error: variable with automatic storage duration cannot be stored in the named address space
local double lds[2048];
^
"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 105: error: identifier "lds" is undefined
shuffle(lds, u, 8, 32);
^
"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 365: error: variable with automatic storage duration cannot be stored in the named address space
local double lds[4096];
^
"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 372: error: identifier "lds" is undefined
lds[l * 64 + (c + l) % 64] = ((double *)(u + i))[b];
^
"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 378: error: identifier "lds" is undefined
((double *)(u + i))[b] = lds[l * 64 + (c + l) % 64];
^
10 errors detected in the compilation of "C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl".
Frontend phase failed compilation.
Code:
clinfo
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (2079.5)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: AMD Radeon HD 7900 Series
Device Topology: PCI[ B#1, D#0, F#0 ]
Max compute units: 28
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 900Mhz
Address bits: 32
Max memory allocation: 2214174021
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 2048
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 3221225472
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Max pipe arguments: 0
Max pipe active reservations: 0
Max pipe packet size: 0
Max global variable size: 0
Max global variable preferred total size: 0
Max read/write image args: 0
Max on device events: 0
Queue on device max size: 0
Max on device queues: 0
Queue on device preferred size: 0
SVM capabilities:
Coarse grain buffer: No
Fine grain buffer: No
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: No
Profiling : No
Platform ID: 000007FED5DF5188
Name: Tahiti
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 1.2
Driver version: 2079.5 (VM)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (2079.5)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event
Device Type: CL_DEVICE_TYPE_CPU
Vendor ID: 1002h
Board name:
Max compute units: 4
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 8
Preferred vector width double: 4
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 8
Native vector width double: 4
Max clock frequency: 3300Mhz
Address bits: 64
Max memory allocation: 4289850368
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 64
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 32768
Global memory size: 17159401472
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Max pipe arguments: 16
Max pipe active reservations: 16
Max pipe packet size: 4289850368
Max global variable size: 1879048192
Max global variable preferred total size: 1879048192
Max read/write image args: 64
Max on device events: 0
Queue on device max size: 0
Max on device queues: 0
Queue on device preferred size: 0
SVM capabilities:
Coarse grain buffer: No
Fine grain buffer: No
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 1
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 310
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: No
Profiling : No
Platform ID: 000007FED5DF5188
Name: Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz
Vendor: GenuineIntel
Device OpenCL C version: OpenCL C 1.2
Driver version: 2079.5 (sse2,avx)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (2079.5)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_spir cl_khr_gl_event
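Note that in the clinfo output above the platform reports OpenCL 2.0 while the Tahiti device reports "OpenCL C 1.2", so a hard-coded -cl-std=CL2.0 cannot be expected to work on this card. One possible way to avoid that, sketched here in Python purely as an illustration (this is not how gpuOwL or clwrap.h builds its options, and the function names are hypothetical), is to derive the build options from the device's "Device OpenCL C version" string:

```python
import re


def parse_cl_c_version(version: str) -> tuple:
    """Extract (major, minor) from a string such as 'OpenCL C 1.2'."""
    match = re.match(r"OpenCL C (\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unexpected OpenCL C version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def build_options(opencl_c_version: str, save_temps: bool = False) -> str:
    """Assemble a build-option string suited to the device's OpenCL C version."""
    options = ["-cl-fast-relaxed-math"]
    # Only request the 2.0 language when the device itself supports it.
    if parse_cl_c_version(opencl_c_version) >= (2, 0):
        options.append("-cl-std=CL2.0")
    if save_temps:
        # -save-temps is the AMD compiler option mentioned earlier in the thread.
        options.append("-save-temps")
    return " ".join(options)


print(build_options("OpenCL C 1.2"))
print(build_options("OpenCL C 2.0", save_temps=True))
```

The key design choice is to look at the device's version rather than the platform's, since the two differ on this system.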

#19
"Mihai Preda"
Apr 2015
10101011011₂ Posts
OK, I tried to fix these CL compilation errors as well (please retry).
On the other hand, I need to investigate a bug that appears to be present. I would not recommend doing any serious LL testing with gpuOwL right now; I need to validate it a bit more first.

#20
"Mr. Meeseeks"
Jan 2012
California, USA
2³×271 Posts
Recompiled.. upon launch I got:
Code:
OpenCL compilation error -11, log: An invalid option was specified.
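For readers decoding the number in that message: in the standard OpenCL headers, -11 is CL_BUILD_PROGRAM_FAILURE, while a rejected option string on its own would normally be CL_INVALID_BUILD_OPTIONS (-43). Here the option complaint apparently arrives through the build log instead. A small lookup sketch in Python, listing only a handful of codes from cl.h (the helper name is made up for illustration):

```python
# A few standard OpenCL status codes as defined in cl.h (not exhaustive).
CL_ERROR_NAMES = {
    0: "CL_SUCCESS",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -11: "CL_BUILD_PROGRAM_FAILURE",
    -43: "CL_INVALID_BUILD_OPTIONS",
}


def cl_error_name(code: int) -> str:
    """Map a numeric OpenCL status code to its symbolic name, if listed."""
    return CL_ERROR_NAMES.get(code, f"unknown OpenCL error {code}")


print(cl_error_name(-11))  # CL_BUILD_PROGRAM_FAILURE
```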

#21
"Mr. Meeseeks"
Jan 2012
California, USA
2³×271 Posts
Very impressive! It actually is slightly faster on my low-end HD7770 (I will try it on my R9 285, which has OCL 2.0 capability, when I have time), and the error numbers are better too. The residues also seem to match clLucas.

#22
"Kieren"
Jul 2011
In My Own Galaxy!
2·3·1,693 Posts
If I may say so, as a spectator and a non-coder, it amazes me to watch this birth process. The cooperation and involvement of several parties is impressive. Seeing this play out is one of the big pay-offs of hanging out on this forum.
Last fiddled with by kladner on 2017-04-23 at 08:10