mersenneforum.org  

Great Internet Mersenne Prime Search > Hardware > GPU Computing > GpuOwl

2020-11-04, 13:48   #166
aheeffer
Aug 2020
37₁₀ Posts

Different computer and a Vega 64 instead of a Radeon VII, same problem using v7.2. Raising the FFT size to 6.5M avoids the problem, but it then runs at 2730 µs/iter instead of 2020 µs/iter.


Code:
2020-11-04 07:12:53 Rig02-RadeonVega64-02 109004201 OpenCL args "-DEXP=109004201u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DAMDGPU=1 -DCARRY64=1 -DCARRYM64=1 -DWEIGHT_STEP_MINUS_1=0x9.8841a10b5e2bp-4 -DIWEIGHT_STEP_MINUS_1=-0xb.f26a11911bbp-5  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-11-04 07:12:53 Rig02-RadeonVega64-02 109004201 ASM compilation failed, retrying compilation using NO_ASM
2020-11-04 07:12:55 Rig02-RadeonVega64-02 109004201 OpenCL compilation in 2.25 s
2020-11-04 07:12:56 Rig02-RadeonVega64-02 109004201 maxAlloc: 6.5 GB
2020-11-04 07:12:56 Rig02-RadeonVega64-02 109004201 P1(5.5M) 7935851 bits
2020-11-04 07:12:56 Rig02-RadeonVega64-02 109004201 PRP starting from beginning
2020-11-04 07:12:56 Rig02-RadeonVega64-02 109004201 Acquired memory lock 'c:\gpuowl\pool\memlock-1'
2020-11-04 07:12:56 Rig02-RadeonVega64-02 109004201 P1(5.5M) using 258 buffers
2020-11-04 07:12:58 Rig02-RadeonVega64-02 109004201 [0] 36500ec1 != fffffffb
2020-11-04 07:12:58 Rig02-RadeonVega64-02 109004201 [1] 8cf00cca != 00000019
2020-11-04 07:12:58 Rig02-RadeonVega64-02 109004201 [2] 7aff4181 != ffffff83
2020-11-04 07:12:58 Rig02-RadeonVega64-02 109004201 [3] 3003737c != 00000271
2020-11-04 07:12:58 Rig02-RadeonVega64-02 109004201 [4] 7003a3e7 != fffff3cb
2020-11-04 07:12:58 Rig02-RadeonVega64-02 109004201 [5] 7bae3f3f != 00003d09
2020-11-04 07:12:59 Rig02-RadeonVega64-02 109004201 [6] f648cd5e != fffeced3
2020-11-04 07:12:59 Rig02-RadeonVega64-02 109004201 [7] 290e228d != 0005f5e1
2020-11-04 07:12:59 Rig02-RadeonVega64-02 109004201 [8] 89e2769b != ffe2329b
2020-11-04 07:12:59 Rig02-RadeonVega64-02 109004201 [9] 628fd07c != 009502f9
2020-11-04 07:12:59 Rig02-RadeonVega64-02 109004201 [10] 4518a126 != fd16f123
2020-11-04 07:12:59 Rig02-RadeonVega64-02 109004201 [11] 8ea0fa4a != 0e8d4a51
2020-11-04 07:12:59 Rig02-RadeonVega64-02 109004201 [12] a9f40f61 != b73d8c6b
2020-11-04 07:12:59 Rig02-RadeonVega64-02 109004201 [13] 7fd856fb != 6bcc41e9
2020-11-04 07:12:59 Rig02-RadeonVega64-02 109004201 [14] 78e2a243 != e502b673
2020-11-04 07:12:59 Rig02-RadeonVega64-02 109004201 [15] 14740e82 != 86f26fc1
2020-11-04 07:12:59 Rig02-RadeonVega64-02 109004201 [16] 9d46583b != 5d43d13b
2020-11-04 07:12:59 Rig02-RadeonVega64-02 109004201 [17] 7daf7a00 != 2dace9d9
2020-11-04 07:12:59 Rig02-RadeonVega64-02 109004201 [18] 9a50f044 != 1b9f6ec3
2020-11-04 07:12:59 Rig02-RadeonVega64-02 109004201 [19] 8b0dcfc9 != 75e2d631
2020-11-04 07:12:59 Rig02-RadeonVega64-02 109004201 fold() does not roundtrip
2020-11-04 07:12:59 Rig02-RadeonVega64-02 109004201 P1(5.5M) releasing 258 buffers
2020-11-04 07:12:59 Rig02-RadeonVega64-02 109004201 Released memory lock 'c:\gpuowl\pool\memlock-1'
2020-11-04 07:12:59 Rig02-RadeonVega64-02 Exiting because "fold roundtrip"
2020-11-04 07:12:59 Rig02-RadeonVega64-02 Bye
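
For readers who want to check which FFT size a log like the one above corresponds to, here is a minimal Python sketch that reads the -D defines from the OpenCL args line. It assumes the FFT length in words is 2 × WIDTH × SMALL_HEIGHT × MIDDLE, which is consistent with the "FFT: 6M 1K:12:256 (17.30 bpw)" line in a later post in this thread; the function names are made up for illustration and are not part of GpuOwl.

```python
import re


def parse_defines(args: str) -> dict:
    """Collect -DNAME=VALUE pairs from an OpenCL argument string."""
    return dict(re.findall(r"-D(\w+)=(\S+)", args))


def fft_words(defines: dict) -> int:
    """FFT length in words.

    Assumption (not confirmed in this thread):
    length = 2 * WIDTH * SMALL_HEIGHT * MIDDLE.
    """
    width, height, middle = (
        int(defines[name].rstrip("u"))
        for name in ("WIDTH", "SMALL_HEIGHT", "MIDDLE")
    )
    return 2 * width * height * middle


# Shortened copy of the args line from the log above.
args = "-DEXP=109004201u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DAMDGPU=1"

defines = parse_defines(args)
words = fft_words(defines)
exponent = int(defines["EXP"].rstrip("u"))
bpw = exponent / words  # bits per FFT word

print(f"FFT {words / 2**20:g}M, {bpw:.2f} bits per word")
```

Under the same assumption, a larger FFT lowers the bits-per-word figure, which would be the trade-off behind the slower but working 6.5M run.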

2020-11-04, 20:28   #167
preda
"Mihai Preda"
Apr 2015
2×11×61 Posts

Quote:
Originally Posted by aheeffer View Post
Different computer and a Vega 64 instead of a Radeon VII, same problem using v7.2. Raising the FFT size to 6.5M avoids the problem, but it then runs at 2730 µs/iter instead of 2020 µs/iter.
You did not include the version you're running. I see in the OpenCL args that it uses -cl-unsafe-math-optimizations. This was dropped a few versions back (now you need to run with -unsafeMath to get it). I propose you try a more recent version, just to rule that out as a factor.

2020-11-04, 21:19   #168
Viliam Furik
"Viliam Furík"
Jul 2018
Martin, Slovakia
3×127 Posts

I am almost sure he mentioned it... v7.2

It is in the text above the gpuOwl output.

2020-11-04, 22:03   #169
preda
"Mihai Preda"
Apr 2015
53E₁₆ Posts

Quote:
Originally Posted by aheeffer View Post
Different computer and a Vega 64 instead of a Radeon VII, same problem using v7.2. Raising the FFT size to 6.5M avoids the problem, but it then runs at 2730 µs/iter instead of 2020 µs/iter.
So are you running with -unsafeMath? Why? Please try without it. Don't use -unsafeMath unless you have a good reason for it.

Last fiddled with by preda on 2020-11-04 at 22:22 Reason: update quote

2020-11-04, 22:11   #170
preda
"Mihai Preda"
Apr 2015
2·11·61 Posts

Quote:
Originally Posted by Viliam Furik View Post
I am almost sure he mentioned it... v7.2

It is in the text above the gpuOwl output.
The version is the string that GpuOwl prints on every run, including with "-h". It lets you track down exactly which changes are included and which are not.

GpuOwl VERSION v7.2-13-g266aed4

The "v7.2" is shorthand that gives only a very approximate indication of which features are present; it is not very useful for bug reproduction.
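
To illustrate why the full string is more informative, here is a small Python sketch that splits a version such as v7.2-13-g266aed4 into its parts. It assumes the string follows the usual `git describe` layout (tag, number of commits since the tag, then "g" plus an abbreviated commit hash); the `GitDescribe` type and `parse_version` helper are hypothetical names, not part of GpuOwl.

```python
import re
from typing import NamedTuple, Optional


class GitDescribe(NamedTuple):
    tag: str                   # e.g. "v7.2"
    commits_since_tag: int     # e.g. 13
    short_hash: Optional[str]  # e.g. "266aed4", or None for a bare tag


# Assumed layout: <tag>[-<commits>-g<hash>], as produced by `git describe`.
_PATTERN = re.compile(
    r"^(?P<tag>v\d+(?:\.\d+)*)(?:-(?P<n>\d+)-g(?P<hash>[0-9a-f]+))?$"
)


def parse_version(version: str) -> GitDescribe:
    """Split a git-describe style version string into its components."""
    m = _PATTERN.match(version.strip())
    if m is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return GitDescribe(m["tag"], int(m["n"] or 0), m["hash"])


full = parse_version("v7.2-13-g266aed4")
short = parse_version("v7.2")
print(full)
print(short)
```

The bare "v7.2" parses to a tag with no commit hash, so it cannot distinguish the tagged release from any of the commits made after it.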

2020-11-06, 09:27   #171
aheeffer
Aug 2020
37 Posts

Quote:
Originally Posted by preda View Post
You did not include the version you're running. I see in the OpenCL args that it uses -cl-unsafe-math-optimizations. This was dropped a few versions back (now you need to run with -unsafeMath to get it). I propose you try a more recent version, just to rule that out as a factor.
I updated gpuowl-win to 7.2.13 and it is running fine with the default options now at the expected iteration times. Thanks.

2020-11-12, 17:37   #172
mrh
"mrh"
Oct 2018
Temecula, ca
2×3×11 Posts
ROCm version?

What version of ROCm is recommended for 7.x? I'm using 2.10.0 and am not able to compile, so I'm guessing it is time to upgrade?

2020-11-12, 21:54   #173
preda
"Mihai Preda"
Apr 2015
2·11·61 Posts

Quote:
Originally Posted by mrh View Post
What version of ROCm is recommended for 7.x? I'm using 2.10.0 and am not able to compile, so I'm guessing it is time to upgrade?
I think ROCm 3.3 is good. Any version after 3.5 should also work fine, but may be slower than 3.3. You can experiment: it's now easier to install multiple versions of ROCm OpenCL side by side and choose which one is used with LD_LIBRARY_PATH.
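
As a rough sketch of the LD_LIBRARY_PATH approach, the Python snippet below builds an environment that puts one OpenCL library directory first on the search path. The directory shown and the "./gpuowl" binary name are assumptions for illustration only; use whatever paths your ROCm installs actually provide.

```python
import os
from typing import Mapping, Optional


def env_with_opencl(lib_dir: str, base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with lib_dir first in LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    old = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + old if old else "")
    return env


# Hypothetical install location; adjust to where your ROCm 3.3 OpenCL libraries live.
ROCM_33_OPENCL = "/opt/rocm-3.3.0/opencl/lib/x86_64"

env = env_with_opencl(ROCM_33_OPENCL, {"LD_LIBRARY_PATH": "/usr/lib"})
print(env["LD_LIBRARY_PATH"])

# To launch with the selected runtime (binary name assumed):
# import subprocess
# subprocess.run(["./gpuowl"], env=env_with_opencl(ROCM_33_OPENCL), check=True)
```

Because the chosen directory is prepended rather than substituted, any existing entries stay available as a fallback.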

2020-11-13, 00:39   #174
mrh
"mrh"
Oct 2018
Temecula, ca
1000010₂ Posts

Quote:
Originally Posted by preda View Post
I think ROCm 3.3 is good. Any version after 3.5 should also work fine, but may be slower than 3.3. You can experiment: it's now easier to install multiple versions of ROCm OpenCL side by side and choose which one is used with LD_LIBRARY_PATH.
Thanks! I had to make two simple changes to get it to compile; I must have a different g++.

Code:
diff --git a/Gpu.cpp b/Gpu.cpp
index 9e5f09a..3e6739e 100644
--- a/Gpu.cpp
+++ b/Gpu.cpp
@@ -24,6 +24,7 @@
 #include <numeric>
 #include <bitset>
 #include <limits>
+#include <iomanip>
 
 #ifndef M_PIl
 #define M_PIl 3.141592653589793238462643383279502884L
diff --git a/Pm1Plan.cpp b/Pm1Plan.cpp
index fa84b43..afdf461 100644
--- a/Pm1Plan.cpp
+++ b/Pm1Plan.cpp
@@ -41,7 +41,7 @@ u32 reduce(u32 B1, u32 pos) {
   return pos;
 }
 
-constexpr u32 firstMissingFactor(u32 D) {
+u32 firstMissingFactor(u32 D) {
   switch (D) {
   case 210:
   case 420:

2020-11-22, 01:36   #175
frmky
Jul 2003
So Cal
4024₈ Posts

Using nVidia OpenCL, I also need to add

#include <iomanip>

to Gpu.cpp to get it to compile.

2020-11-22, 08:44   #176
Viliam Furik
"Viliam Furík"
Jul 2018
Martin, Slovakia
3·127 Posts
Big slowdown

I downloaded version v7.2-13-g266aed4, and when I run a 108M test, it runs at 1250 us/it. The same test runs at 920 us/it with v6.11-380-g79ea0cc.

I have also noticed that it doesn't say anything about the duration of the GEC, but I hope it is still doing it.

While writing this, I noticed that it looks like it's doing P-1 at a different FFT size. Is that possible?

Code:
2020-11-22 09:39:53 GpuOwl VERSION v7.2-13-g266aed4
2020-11-22 09:39:53 config: -device 1
2020-11-22 09:39:53 config: -proof 8
2020-11-22 09:39:53 config: -nospin
2020-11-22 09:39:53 device 1, unique id ''
2020-11-22 09:39:53 gfx906-1 108850051 FFT: 6M 1K:12:256 (17.30 bpw)
2020-11-22 09:39:53 gfx906-1 108850051 OpenCL args "-DEXP=108850051u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0.62309825525553619 -DIWEIGHT_STEP_MINUS_1=-0.3838943534305243 -DIWEIGHTS={0,-0.3838943534305243,-0.24082766453041662,-0.064539274795706897,-0.42365736505765839,-0.28982409650658664,-0.12491323160025802,-0.46085410075068395,-0.33565833429543745,-0.18139069701609609,-0.49565018609731404,-0.37853446361658188,-0.23422314777169634,-0.056401114659886273,-0.41864339864529271,-0.28364583046985026,}  -cl-std=CL2.0 -cl-finite-math-only "
2020-11-22 09:39:54 gfx906-1 108850051 ASM compilation failed, retrying compilation using NO_ASM
2020-11-22 09:39:58 gfx906-1 108850051 OpenCL compilation in 4.79 s
2020-11-22 09:39:58 gfx906-1 108850051 maxAlloc: 0.0 GB
2020-11-22 09:39:58 gfx906-1 108850051 You should use -maxAlloc if your GPU has more than 4GB memory. See help '-h'
2020-11-22 09:39:58 gfx906-1 108850051 P1(5.5M) 7935851 bits
2020-11-22 09:39:58 gfx906-1 108850051 PRP starting from beginning
2020-11-22 09:39:59 gfx906-1 108850051 Acquired memory lock 'memlock-1'
2020-11-22 09:39:59 gfx906-1 108850051 P1(5.5M) using 112 buffers
2020-11-22 09:40:02 gfx906-1 108850051 OK         0 on-load: blockSize 400, 0000000000000003
2020-11-22 09:40:02 gfx906-1 108850051 validating proof residues for power 8
2020-11-22 09:40:02 gfx906-1 108850051 Proof using power 8
2020-11-22 09:40:15 gfx906-1 108850051        10000   0.01% a834a715c12eb82f 1248 us/it
2020-11-22 09:40:27 gfx906-1 108850051        20000   0.02% 399d28f60cdc9b8e 1251 us/it
2020-11-22 09:40:35 gfx906-1 108850051 Stopping, please wait..
2020-11-22 09:40:36 gfx906-1 108850051 OK     26000   0.02% 64247ab7a49860c3 1251 us/it + check 0.51s + save 0.75s; ETA 1d 13:48 | P1(5.5M) 0.3% ETA 02:45 5b28693878aaf77c
2020-11-22 09:40:36 gfx906-1 108850051 P1(5.5M) releasing 112 buffers
2020-11-22 09:40:36 gfx906-1 108850051 Released memory lock 'memlock-1'
2020-11-22 09:40:36 gfx906-1 Exiting because "stop requested"
2020-11-22 09:40:36 gfx906-1 Bye
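
To put the slowdown in numbers, the following Python sketch extracts the "us/it" figures from two of the log lines above and compares them with the 920 us/it quoted for v6.11-380-g79ea0cc. The full-run estimate assumes roughly one iteration per bit of the exponent, which is an approximation; the helper name is invented for this example.

```python
import re

# Two progress lines copied from the log above.
LOG = """\
2020-11-22 09:40:15 gfx906-1 108850051        10000   0.01% a834a715c12eb82f 1248 us/it
2020-11-22 09:40:27 gfx906-1 108850051        20000   0.02% 399d28f60cdc9b8e 1251 us/it
"""


def iteration_times(log: str) -> list:
    """Return every per-iteration time (in microseconds) found in the log."""
    return [int(t) for t in re.findall(r"(\d+) us/it", log)]


times = iteration_times(LOG)
mean_us = sum(times) / len(times)

baseline_us = 920     # v6.11-380-g79ea0cc figure quoted in the post
exponent = 108850051  # assumption: about one iteration per exponent bit

slowdown = mean_us / baseline_us
full_run_hours = exponent * mean_us / 1e6 / 3600

print(f"mean {mean_us:.1f} us/it, {slowdown:.2f}x the v6.11 time")
print(f"estimated full run: {full_run_hours:.1f} hours")
```

The estimate lands close to the "ETA 1d 13:48" that GpuOwl itself prints in the log.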

All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.