mersenneforum.org  

Old 2017-04-20, 01:08   #12
preda
"Mihai Preda"
Apr 2015

3·347 Posts

The performance you're seeing is definitely unexpected. I'm using AMDGPU-Pro as the OpenCL compiler (both the most recent version, 17.10, and the previous 16.xx produced similar performance). Which OpenCL driver do you use? Is the GPU in a good PCIe slot (no transfer bottleneck)?

I would dump the compiled .isa kernels and look there for diffs (you can pass "-save-temps" in clwrap.h, it's there in a comment, and send me the .isa). Or I'll add an option to enable that as an argument.
Old 2017-04-20, 03:22   #13
tului
Jan 2013

2×3×11 Posts

Sam Harris would approve of the name. They're perfectly good after all.
Old 2017-04-22, 01:08   #14
kracker
ἀβουλία
"Mr. Meeseeks"
Jan 2012
California, USA

2×13×83 Posts

Been tinkering around with it on Windows... got this after compiling:

Code:
gpuOwL v0.1 GPU Lucas-Lehmer primality checker
LL of 76000021 at iteration 0
FFT 1024*2048 (4M words, 18.12 bits per word)
log An invalid option was specified.

error -11
Assertion failed!
Program: C:\Users\Back\Desktop\gpuowl-0832c6d\gpuowl-0832c6d.exe
File: clwrap.h, Line 66

Expression: false

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

Also, it might be better for the program to accept any assignment ID. I tried N/A and also some random gibberish, and it accepted neither. It probably needs a minimum number of characters.
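As an aside on the log above: the "18.12 bits per word" figure is consistent with dividing the exponent by the number of FFT words. The sketch below reproduces that arithmetic. It assumes that "1024*2048" counts complex points, so each point holds two real words (which matches the "4M words" in the log); the function names are hypothetical and are not taken from gpuOwL.

```cpp
#include <cassert>
#include <cstdint>

// Number of real words for an FFT reported as "width*height".
// Assumption: each of the width*height points is complex, i.e. two words.
std::uint64_t fftWords(std::uint64_t width, std::uint64_t height) {
    return 2 * width * height;
}

// Average number of exponent bits carried by each FFT word.
double bitsPerWord(std::uint64_t exponent, std::uint64_t width, std::uint64_t height) {
    const std::uint64_t words = fftWords(width, height);
    assert(words > 0);  // guards against division by zero
    return static_cast<double>(exponent) / static_cast<double>(words);
}
```

For the exponent 76000021 and the 1024*2048 FFT this gives 76000021 / 4194304, which rounds to the 18.12 shown in the log.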

EDIT: Pulled and recompiled from the latest commit..

Code:
OpenCL Compilation error -11, log:
An invalid option was specified.
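For readers decoding the number: in the OpenCL headers, -11 is CL_BUILD_PROGRAM_FAILURE, so the real cause has to be read from the build log (here, "An invalid option was specified."). Note that the driver in this thread reports the generic build failure rather than CL_INVALID_BUILD_OPTIONS (-43). A minimal lookup sketch follows; the function name is hypothetical, and only a handful of codes are listed.

```cpp
#include <cassert>
#include <string>

// A few OpenCL status codes, with the values defined in CL/cl.h.
// This is deliberately a small subset, not a complete table.
std::string clErrorName(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -3:  return "CL_COMPILER_NOT_AVAILABLE";
        case -11: return "CL_BUILD_PROGRAM_FAILURE";
        case -43: return "CL_INVALID_BUILD_OPTIONS";
        default:  return "unknown OpenCL error " + std::to_string(code);
    }
}
```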

Last fiddled with by kracker on 2017-04-22 at 01:24
Old 2017-04-22, 09:10   #15
VictordeHolland
"Victor de Hollander"
Aug 2011
the Netherlands

2³×3×7² Posts

Quote:
Originally Posted by kracker View Post
Been tinkering around with it on windows.. got this after compiling:

Code:
gpuOwL v0.1 GPU Lucas-Lehmer primality checker
LL of 76000021 at iteration 0
FFT 1024*2048 (4M words, 18.12 bits per word)
log An invalid option was specified.

error -11
Assertion failed!
Program: C:\Users\Back\Desktop\gpuowl-0832c6d\gpuowl-0832c6d.exe
File: clwrap.h, Line 66

Expression: false

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
Also, it might be better for the program to accept any assignment ID, I tried N/A, and also some random gibberish and it didn't accept it. Probably needed a minimum number of characters for it to accept it.

EDIT: Pulled and recompiled from the latest commit..

Code:
OpenCL Compilation error -11, log:
An invalid option was specified.

Same here: it compiles without errors on mingw64, but I also get:
error -11
Assertion failed!
clwrap.h, Line 66
Attached thumbnail: error.png (55.6 KB)

Old 2017-04-22, 11:37   #16
preda
"Mihai Preda"
Apr 2015

3·347 Posts

Kracker, Victor: thanks for trying the compile! As I only compiled with one CL implementation myself, I was not aware of these problems.

Could you try again with a freshly checked-out version? I simplified the CL options.

If it still doesn't pass the CL compiler, you can try deleting -cl-fast-relaxed-math -cl-std=CL2.0 from clwrap.h line 89 (but hopefully you won't have to do this).

Concerning the AID in worktodo.txt, try a single 0, like this:
Test=0,24036583,75,1

Otherwise, what should pass in any case is 32 hex digits, e.g. the digit 0 repeated 32 times.
But Test=N/A would not pass, because N/A is not made of valid hexadecimal digits.

- Mihai
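
The AID rule described in this post can be sketched as a small check. This models only what the post says (a single 0, or exactly 32 hex digits); the function names are hypothetical and gpuOwL's actual parser may behave differently.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Accepts the two forms mentioned in the post: a single "0",
// or exactly 32 hexadecimal digits. Assumption: nothing else is valid.
bool isPlausibleAid(const std::string& aid) {
    if (aid == "0") return true;
    if (aid.size() != 32) return false;
    for (unsigned char c : aid) {
        if (!std::isxdigit(c)) return false;
    }
    return true;
}

// Extracts the AID field from a worktodo.txt line of the form "Test=AID,...".
// Returns an empty string when the line has no "Test=" prefix or no comma.
std::string aidFromWorktodoLine(const std::string& line) {
    const std::string prefix = "Test=";
    if (line.compare(0, prefix.size(), prefix) != 0) return "";
    const auto comma = line.find(',', prefix.size());
    if (comma == std::string::npos) return "";
    return line.substr(prefix.size(), comma - prefix.size());
}
```

With these helpers, the line Test=0,24036583,75,1 yields the AID "0", which passes, while an AID of N/A is rejected because it is neither "0" nor 32 hex digits.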
Old 2017-04-22, 11:44   #17
preda
"Mihai Preda"
Apr 2015

10000010001₂ Posts

@airsquirrels: could you run a freshly checked-out version with the command-line args "-cl -save-temps", e.g.: ./gpuowl -cl -save-temps

This passes "-save-temps" to the CL compiler, which should produce a dump of the GCN ISA. If that works, you should see a file like "_temp_1_Fiji.isa". Could you send me that .isa file, so I can see whether the perf degradation is caused by poorly generated ISA code (e.g. too many VGPRs used)?

thanks,
Mihai
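
One way such a pass-through could work is sketched below: the argument following "-cl" is taken verbatim and appended to the options string handed to the CL compiler. This is an illustration under that assumption only; the helper names are hypothetical and this is not gpuOwL's actual argument handling.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the argument that follows "-cl", or "" when "-cl" is absent
// or is the last argument.
std::string extraClOptions(const std::vector<std::string>& args) {
    for (std::size_t i = 0; i + 1 < args.size(); ++i) {
        if (args[i] == "-cl") return args[i + 1];
    }
    return "";
}

// Appends the extra options to a base option string, separated by one space.
std::string buildOptions(const std::string& base, const std::string& extra) {
    if (extra.empty()) return base;
    if (base.empty()) return extra;
    return base + " " + extra;
}
```

The combined string would then be what gets passed as the options argument of clBuildProgram.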
Old 2017-04-22, 13:10   #18
VictordeHolland
"Victor de Hollander"
Aug 2011
the Netherlands

2³·3·7² Posts

Quote:
Originally Posted by preda View Post
Kracker, Victor: thanks for trying the compile! As I only compiled with one CL implementation myself, I was not aware of these problems.

Now, would you try again with a fresh checked-out version, I simplified the CL options.

If it still doesn't pass the CL compiler, you can try deleting -cl-fast-relaxed-math -cl-std=CL2.0 from clwrap.h line 89 (but hopefully you won't have to do this).

Concerning the AID in worktodo.txt, try a single 0, like this:
Test=0,24036583,75,1

Otherwise, what should pass in any case is 32 hex-digits, like 00000.. repeated 32 times.
But Test=N/A would not pass because N/A is not valid hexadecimal digits.

- Mihai
Hi,

I tried both the default "-cl-fast-relaxed-math" and changing it to "-cl-opt-disable" in line 89 of clwrap.h, but still got some errors.

I forgot to mention that my card is an HD7950, which only supports OpenCL 1.2:
https://en.wikipedia.org/wiki/Radeon_HD_7000_Series

At least gpuOwL detects that it is a Tahiti OpenCL 1.2 AMD-APP 2079.5 device :).

I attached the _temp_0_Tahiti.cl that was created. (I had to rename it to .txt, or else I couldn't upload it to the forum.)

Code:
C:\msys64\home\gpuowl>gpuowl
gpuOwL v0.1 GPU Lucas-Lehmer primality checker
Tahiti - OpenCL 1.2 AMD-APP (2079.5)
OpenCL compilation error -11, log:
"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 15: error: attributes
          may not appear here
  double2 _O mul(double2 u, double a, double b) { return (double2) { u.x * a - u.y * b, u.x * b + u.y * a}; }
          ^

"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 16: error: attributes
          may not appear here
  double2 _O mul(double2 u, double2 v) { return mul(u, v.x, v.y); }
          ^

"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 59: error: attributes
          may not appear here
  void _O shuffle(local double *lds, double2 *u, uint n, uint f) {
       ^

"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 83: error: variable
          with automatic storage duration cannot be stored in the named
          address space
    local double lds[1024];
                 ^

"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 86: error: identifier
          "lds" is undefined
    shuffle(lds,   u, 4, 64);
            ^

"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 102: error: variable
          with automatic storage duration cannot be stored in the named
          address space
    local double lds[2048];
                 ^

"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 105: error: identifier
          "lds" is undefined
    shuffle(lds,   u, 8, 32);
            ^

"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 365: error: variable
          with automatic storage duration cannot be stored in the named
          address space
    local double lds[4096];
                 ^

"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 372: error: identifier
          "lds" is undefined
        lds[l * 64 + (c + l) % 64] = ((double *)(u + i))[b];
        ^

"C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl", line 378: error: identifier
          "lds" is undefined
        ((double *)(u + i))[b] = lds[l * 64 + (c + l) % 64];
                                 ^

10 errors detected in the compilation of "C:\Users\Victor\AppData\Local\Temp\OCL6952T1.cl".
Frontend phase failed compilation.

Here is what I get from the clinfo command:
Code:
clinfo
Number of platforms:                             1
  Platform Profile:                              FULL_PROFILE
  Platform Version:                              OpenCL 2.0 AMD-APP (2079.5)
  Platform Name:                                 AMD Accelerated Parallel Processing
  Platform Vendor:                               Advanced Micro Devices, Inc.
  Platform Extensions:                           cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices


  Platform Name:                                 AMD Accelerated Parallel Processing
Number of devices:                               2
  Device Type:                                   CL_DEVICE_TYPE_GPU
  Vendor ID:                                     1002h
  Board name:                                    AMD Radeon HD 7900 Series
  Device Topology:                               PCI[ B#1, D#0, F#0 ]
  Max compute units:                             28
  Max work items dimensions:                     3
    Max work items[0]:                           256
    Max work items[1]:                           256
    Max work items[2]:                           256
  Max work group size:                           256
  Preferred vector width char:                   4
  Preferred vector width short:                  2
  Preferred vector width int:                    1
  Preferred vector width long:                   1
  Preferred vector width float:                  1
  Preferred vector width double:                 1
  Native vector width char:                      4
  Native vector width short:                     2
  Native vector width int:                       1
  Native vector width long:                      1
  Native vector width float:                     1
  Native vector width double:                    1
  Max clock frequency:                           900Mhz
  Address bits:                                  32
  Max memory allocation:                         2214174021
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          8
  Max image 2D width:                            16384
  Max image 2D height:                           16384
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    16
  Max size of kernel argument:                   1024
  Alignment (bits) of base address:              2048
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     No
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    16384
  Global memory size:                            3221225472
  Constant buffer size:                          65536
  Max number of constant args:                   8
  Local memory type:                             Scratchpad
  Local memory size:                             32768
  Max pipe arguments:                            0
  Max pipe active reservations:                  0
  Max pipe packet size:                          0
  Max global variable size:                      0
  Max global variable preferred total size:      0
  Max read/write image args:                     0
  Max on device events:                          0
  Queue on device max size:                      0
  Max on device queues:                          0
  Queue on device preferred size:                0
  SVM capabilities:
    Coarse grain buffer:                         No
    Fine grain buffer:                           No
    Fine grain system:                           No
    Atomics:                                     No
  Preferred platform atomic alignment:           0
  Preferred global atomic alignment:             0
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     64
  Error correction support:                      0
  Unified memory for Host and Device:            0
  Profiling timer resolution:                    1
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:
    Execute OpenCL kernels:                      Yes
    Execute native function:                     No
  Queue on Host properties:
    Out-of-Order:                                No
    Profiling :                                  Yes
  Queue on Device properties:
    Out-of-Order:                                No
    Profiling :                                  No
  Platform ID:                                   000007FED5DF5188
  Name:                                          Tahiti
  Vendor:                                        Advanced Micro Devices, Inc.
  Device OpenCL C version:                       OpenCL C 1.2
  Driver version:                                2079.5 (VM)
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 1.2 AMD-APP (2079.5)
  Extensions:                                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event


  Device Type:                                   CL_DEVICE_TYPE_CPU
  Vendor ID:                                     1002h
  Board name:
  Max compute units:                             4
  Max work items dimensions:                     3
    Max work items[0]:                           1024
    Max work items[1]:                           1024
    Max work items[2]:                           1024
  Max work group size:                           1024
  Preferred vector width char:                   16
  Preferred vector width short:                  8
  Preferred vector width int:                    4
  Preferred vector width long:                   2
  Preferred vector width float:                  8
  Preferred vector width double:                 4
  Native vector width char:                      16
  Native vector width short:                     8
  Native vector width int:                       4
  Native vector width long:                      2
  Native vector width float:                     8
  Native vector width double:                    4
  Max clock frequency:                           3300Mhz
  Address bits:                                  64
  Max memory allocation:                         4289850368
  Image support:                                 Yes
  Max number of images read arguments:           128
  Max number of images write arguments:          64
  Max image 2D width:                            8192
  Max image 2D height:                           8192
  Max image 3D width:                            2048
  Max image 3D height:                           2048
  Max image 3D depth:                            2048
  Max samplers within kernel:                    16
  Max size of kernel argument:                   4096
  Alignment (bits) of base address:              1024
  Minimum alignment (bytes) for any datatype:    128
  Single precision floating point capability
    Denorms:                                     Yes
    Quiet NaNs:                                  Yes
    Round to nearest even:                       Yes
    Round to zero:                               Yes
    Round to +ve and infinity:                   Yes
    IEEE754-2008 fused multiply-add:             Yes
  Cache type:                                    Read/Write
  Cache line size:                               64
  Cache size:                                    32768
  Global memory size:                            17159401472
  Constant buffer size:                          65536
  Max number of constant args:                   8
  Local memory type:                             Global
  Local memory size:                             32768
  Max pipe arguments:                            16
  Max pipe active reservations:                  16
  Max pipe packet size:                          4289850368
  Max global variable size:                      1879048192
  Max global variable preferred total size:      1879048192
  Max read/write image args:                     64
  Max on device events:                          0
  Queue on device max size:                      0
  Max on device queues:                          0
  Queue on device preferred size:                0
  SVM capabilities:
    Coarse grain buffer:                         No
    Fine grain buffer:                           No
    Fine grain system:                           No
    Atomics:                                     No
  Preferred platform atomic alignment:           0
  Preferred global atomic alignment:             0
  Preferred local atomic alignment:              0
  Kernel Preferred work group size multiple:     1
  Error correction support:                      0
  Unified memory for Host and Device:            1
  Profiling timer resolution:                    310
  Device endianess:                              Little
  Available:                                     Yes
  Compiler available:                            Yes
  Execution capabilities:
    Execute OpenCL kernels:                      Yes
    Execute native function:                     Yes
  Queue on Host properties:
    Out-of-Order:                                No
    Profiling :                                  Yes
  Queue on Device properties:
    Out-of-Order:                                No
    Profiling :                                  No
  Platform ID:                                   000007FED5DF5188
  Name:                                          Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz
  Vendor:                                        GenuineIntel
  Device OpenCL C version:                       OpenCL C 1.2
  Driver version:                                2079.5 (sse2,avx)
  Profile:                                       FULL_PROFILE
  Version:                                       OpenCL 1.2 AMD-APP (2079.5)
  Extensions:                                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_d3d10_sharing cl_khr_spir cl_khr_gl_event
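
A side note on the numbers in the two listings above: the largest local array in the error log is "local double lds[4096]", and clinfo reports a local memory size of 32768 bytes for the Tahiti device. The sketch below checks that arithmetic, assuming 8-byte doubles; the helper names are hypothetical. It only shows that the array exactly fills the reported local memory, and says nothing about the cause of the compile errors.

```cpp
#include <cassert>
#include <cstddef>

static_assert(sizeof(double) == 8, "this sketch assumes 8-byte doubles");

// Bytes of local memory needed by an array of `count` doubles.
constexpr std::size_t ldsBytes(std::size_t count) {
    return count * sizeof(double);
}

// True when an array of `count` doubles fits in `localMemSize` bytes.
constexpr bool fitsInLocalMemory(std::size_t count, std::size_t localMemSize) {
    return ldsBytes(count) <= localMemSize;
}
```

For lds[4096] this gives 4096 × 8 = 32768 bytes, exactly the reported limit, so there is no headroom for any further local allocation in that kernel.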
Attached file: _temp_0_Tahiti.txt (12.0 KB)

Old 2017-04-23, 00:17   #19
preda
"Mihai Preda"
Apr 2015

411₁₆ Posts

OK, I tried to fix these CL compilation errors as well (please retry).

On the other hand, I need to investigate a bug that appears to be present. I would not recommend doing any serious LL testing with gpuOwL right now; I need to validate it a bit more first.
Old 2017-04-23, 01:07   #20
kracker
ἀβουλία
"Mr. Meeseeks"
Jan 2012
California, USA

2×13×83 Posts

Recompiled.. upon launch I got:
Code:
OpenCL compilation error -11, log:
An invalid option was specified.

I removed -cl-std=CL2.0 from clwrap.h and it appears to be working... will see how it goes.
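
Instead of editing clwrap.h by hand, the flag could in principle be chosen from what the device reports. The clinfo listing earlier in the thread shows a platform version of OpenCL 2.0 but a "Device OpenCL C version" of "OpenCL C 1.2" for Tahiti, so the check has to look at the device string. The following is a minimal sketch under that assumption; the function names are hypothetical, and in a real program the version string would come from clGetDeviceInfo with CL_DEVICE_OPENCL_C_VERSION.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Parses a device OpenCL C version string such as "OpenCL C 1.2".
// Returns false when the string does not have the expected form.
bool parseOpenClCVersion(const std::string& text, int& major, int& minor) {
    return std::sscanf(text.c_str(), "OpenCL C %d.%d", &major, &minor) == 2;
}

// Hypothetical helper: only request -cl-std=CL2.0 when the device compiler
// reports OpenCL C 2.0 or newer. The base option mirrors the one in the thread.
std::string compilerOptions(const std::string& deviceCVersion) {
    std::string options = "-cl-fast-relaxed-math";
    int major = 0, minor = 0;
    if (parseOpenClCVersion(deviceCVersion, major, minor) && major >= 2) {
        options += " -cl-std=CL2.0";
    }
    return options;
}
```

An unparsable version string falls back to the conservative option set without -cl-std.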
Old 2017-04-23, 01:55   #21
kracker
ἀβουλία
"Mr. Meeseeks"
Jan 2012
California, USA

2×13×83 Posts

Very impressive! It is actually slightly faster on my low-end HD7770 (I will try it on my R9 285, which has OCL 2.0 capability, when I have time), and the error numbers are better too. The residues also seem to match clLucas.
Attached thumbnail: Untitled.png (20.8 KB)

Old 2017-04-23, 08:06   #22
kladner
"Kieren"
Jul 2011
In My Own Galaxy!

33·5·73 Posts

If I may say so, as a spectator and a non-coder, it amazes me to watch this birth process. The cooperation and involvement of several parties is impressive. Seeing this play out is one of the big pay-offs of hanging out on this forum.

Last fiddled with by kladner on 2017-04-23 at 08:10

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.