mersenneforum.org  

Old 2020-11-01, 19:25   #155
frmky
 
Jul 2003
So Cal

2²·11·47 Posts

Can we upgrade in the middle of a PRP run with 7.1?
Old 2020-11-01, 20:41   #156
preda
 
"Mihai Preda"
Apr 2015

17·79 Posts

Quote:
Originally Posted by frmky View Post
Can we upgrade in the middle of a PRP run with 7.1?
Yes.
Old 2020-11-02, 00:50   #157
frmky
 
Jul 2003
So Cal

814₁₆ Posts

Just tried to update to 7.2 and got the following error:
Code:
Gpu.cpp:37:15: error: static_assert expression is not an integral constant expression
static_assert(sinl(M_PI) != sin(M_PI));
              ^~~~~~~~~~~~~~~~~~~~~~~
Old 2020-11-03, 03:06   #158
Ethan (EO)
 
"Ethan O'Connor"
Oct 2002
GIMPS since Jan 1996

2²·23 Posts

I think commit 0ce8b7cf728cb800e95f896ec8974608f882a248 (Proof verify from temporary proof file) broke proof file handling on Windows.

I'm not certain, but I believe file rename operations can't cross directory boundaries on Windows, so

Code:
Gpu.cpp, line 1449:

    fs::rename(tmpFile, proofFile);
is failing. Going back one commit to before the proof file verification changes makes everything work.


The change below gets past the rename:
Code:
--- a/Gpu.cpp
+++ b/Gpu.cpp
@@ -1448,7 +1448,8 @@ fs::path Gpu::saveProof(const Args& args, const ProofSet& proofSet) {
     if (ok) {
       error_code noThrow;
       fs::remove(proofFile, noThrow);
-      fs::rename(tmpFile, proofFile);
+      fs::copy(tmpFile, proofFile);
+      fs::remove(tmpFile, noThrow);
       log("Proof '%s' generated\n", proofFile.string().c_str());
       return proofFile;
     }
but fails with

Code:
2020-11-02 19:01:17 Highland20201080ti 1060223 Proof 'proof\1060223-8.proof' generated
2020-11-02 19:01:17 Highland20201080ti 1060223 Released memory lock 'memlock-0'
2020-11-02 19:01:17 Highland20201080ti 1060223 Proof file 'proof\1060223-8.proof' has invalid header
2020-11-02 19:01:17 Highland20201080ti Exiting because "Invalid proof header"
2020-11-02 19:01:17 Highland20201080ti Bye
But I'm out of time to track this down further for now.

Last fiddled with by Ethan (EO) on 2020-11-03 at 03:08
Old 2020-11-03, 07:48   #159
preda
 
"Mihai Preda"
Apr 2015

17·79 Posts

Quote:
Originally Posted by Ethan (EO) View Post
but fails with

Code:
2020-11-02 19:01:17 Highland20201080ti 1060223 Proof 'proof\1060223-8.proof' generated
2020-11-02 19:01:17 Highland20201080ti 1060223 Released memory lock 'memlock-0'
2020-11-02 19:01:17 Highland20201080ti 1060223 Proof file 'proof\1060223-8.proof' has invalid header
2020-11-02 19:01:17 Highland20201080ti Exiting because "Invalid proof header"
2020-11-02 19:01:17 Highland20201080ti Bye
I think I found the reason: the temp proof file was being closed on a background thread, which prevented the move. Hopefully fixed now.
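
For readers following along, here is a minimal sketch of the pattern that explanation points at: close the temporary file on the writing thread before moving it, and fall back to copy-and-delete if the rename fails. The helper names moveFile and writeThenMove are hypothetical and are not taken from GpuOwl; this is an illustration under those assumptions, not the actual fix.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not GpuOwl code): move a file, falling back to
// copy + remove when a plain rename is refused by the OS.
bool moveFile(const fs::path& from, const fs::path& to) {
  std::error_code ec;
  fs::rename(from, to, ec);
  if (!ec) { return true; }

  // Fallback: copy the bytes, then delete the source.
  fs::copy_file(from, to, fs::copy_options::overwrite_existing, ec);
  if (ec) { return false; }
  fs::remove(from, ec);
  return !ec;
}

// Hypothetical helper: write data to a temporary file and move it into place.
// The stream lives in its own scope so it is closed before the move is attempted.
bool writeThenMove(const fs::path& tmp, const fs::path& dest, const std::string& data) {
  {
    std::ofstream out(tmp, std::ios::binary);
    if (!out) { return false; }
    out << data;
    if (!out) { return false; }
  }  // 'out' is destroyed here, which closes the file handle.
  return moveFile(tmp, dest);
}
```

The inner scope is the point of the sketch: an open handle on the temporary file can make a move fail on Windows, so the handle is released deterministically before the rename.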

Last fiddled with by preda on 2020-11-03 at 10:28
Old 2020-11-03, 07:53   #160
preda
 
"Mihai Preda"
Apr 2015

17·79 Posts

Quote:
Originally Posted by frmky View Post
Just tried to update to 7.2 and got the following error
Code:
Gpu.cpp:37:15: error: static_assert expression is not an integral constant expression
static_assert(sinl(M_PI) != sin(M_PI));
              ^~~~~~~~~~~~~~~~~~~~~~~
Just comment out that line. I committed this change too, but you can also apply it locally (prepend "//").
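
The wording of the error looks like Clang's. The likely cause is that the compiler does not treat sinl() and sin() as usable in a constant expression, so the static_assert cannot be evaluated at compile time. If the comparison is still wanted, one option is to do it at run time instead. A minimal sketch, with a hypothetical function name that is not part of GpuOwl:

```cpp
#include <cmath>

// Hypothetical helper (not GpuOwl code): reports at run time whether the
// long double sine of pi differs from the double sine of pi. This is the
// same comparison the static_assert tried to make at compile time.
bool longDoubleSinDiffers() {
  const double pi = 3.14159265358979323846;  // local constant, avoids relying on M_PI
  const long double viaLongDouble = std::sin(static_cast<long double>(pi));
  const long double viaDouble = std::sin(pi);
  return viaLongDouble != viaDouble;
}
```

The result depends on the platform: where long double has the same format as double, the two values are expected to be equal and the function returns false.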
Old 2020-11-03, 09:41   #161
aheeffer
 
Aug 2020

37 Posts
v.7.2.2 Windows

Repeatedly when restarting. What to do now?

Code:
2020-11-03 10:35:04 GpuOwl VERSION v7.2-2-ga135d8d
2020-11-03 10:35:04 Note: not found 'c:\gpuowl\pool\config.txt'
2020-11-03 10:35:04 config: -user al -cpu Rig02-RadeonVII-01i2 -d 0 -pool c:\gpuowl\pool -maxAlloc 7500 -proof 9
2020-11-03 10:35:04 device 0, unique id ''
2020-11-03 10:35:04 Rig02-RadeonVII-01i2 109003607 FFT: 6M 1K:12:256 (17.33 bpw)
2020-11-03 10:35:05 Rig02-RadeonVII-01i2 109003607 OpenCL args "-DEXP=109003607u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DAMDGPU=1 -DCARRY64=1 -DCARRYM64=1 -DWEIGHT_STEP_MINUS_1=0x9.88af22f9109d8p-4 -DIWEIGHT_STEP_MINUS_1=-0xb.f2c012041673p-5  -cl-std=CL2.0 -cl-finite-math-only "
2020-11-03 10:35:05 Rig02-RadeonVII-01i2 109003607 ASM compilation failed, retrying compilation using NO_ASM
2020-11-03 10:35:07 Rig02-RadeonVII-01i2 109003607 OpenCL compilation in 2.63 s
2020-11-03 10:35:08 Rig02-RadeonVII-01i2 109003607 maxAlloc: 7.3 GB
2020-11-03 10:35:08 Rig02-RadeonVII-01i2 109003607 P1(5.5M) 7935851 bits
2020-11-03 10:35:08 Rig02-RadeonVII-01i2 109003607 PRP starting from beginning
2020-11-03 10:35:08 Rig02-RadeonVII-01i2 109003607 Acquired memory lock 'c:\gpuowl\pool\memlock-0'
2020-11-03 10:35:08 Rig02-RadeonVII-01i2 109003607 P1(5.5M) using 296 buffers
2020-11-03 10:35:10 Rig02-RadeonVII-01i2 109003607 [0] febc0320 != fffffffb
2020-11-03 10:35:10 Rig02-RadeonVII-01i2 109003607 [1] ebdffc29 != 00000019
2020-11-03 10:35:10 Rig02-RadeonVII-01i2 109003607 [2] 1d004502 != ffffff83
2020-11-03 10:35:10 Rig02-RadeonVII-01i2 109003607 [3] 8000c772 != 00000271
2020-11-03 10:35:10 Rig02-RadeonVII-01i2 109003607 [4] cfeb1bc7 != fffff3cb
2020-11-03 10:35:10 Rig02-RadeonVII-01i2 109003607 [5] fffcfd30 != 00003d09
2020-11-03 10:35:10 Rig02-RadeonVII-01i2 109003607 [6] 00bbcec7 != fffeced3
2020-11-03 10:35:10 Rig02-RadeonVII-01i2 109003607 [7] f31df59e != 0005f5e1
2020-11-03 10:35:10 Rig02-RadeonVII-01i2 109003607 [8] 1ea21f6a != ffe2329b
2020-11-03 10:35:10 Rig02-RadeonVII-01i2 109003607 [9] a694d47a != 009502f9
2020-11-03 10:35:10 Rig02-RadeonVII-01i2 109003607 [10] c51a8525 != fd16f123
2020-11-03 10:35:10 Rig02-RadeonVII-01i2 109003607 [11] 8e884a5b != 0e8d4a51
2020-11-03 10:35:10 Rig02-RadeonVII-01i2 109003607 [12] b5700c45 != b73d8c6b
2020-11-03 10:35:10 Rig02-RadeonVII-01i2 109003607 [13] 70b04163 != 6bcc41e9
2020-11-03 10:35:10 Rig02-RadeonVII-01i2 109003607 [14] e1a2a8c3 != e502b673
2020-11-03 10:35:10 Rig02-RadeonVII-01i2 109003607 [15] f0f29ec1 != 86f26fc1
2020-11-03 10:35:10 Rig02-RadeonVII-01i2 109003607 [16] 3143d339 != 5d43d13b
2020-11-03 10:35:10 Rig02-RadeonVII-01i2 109003607 [17] 2db9e9d4 != 2dace9d9
2020-11-03 10:35:10 Rig02-RadeonVII-01i2 109003607 [18] 9bcdeed5 != 1b9f6ec3
2020-11-03 10:35:10 Rig02-RadeonVII-01i2 109003607 [19] 76d3d797 != 75e2d631
2020-11-03 10:35:10 Rig02-RadeonVII-01i2 109003607 fold() does not roundtrip
2020-11-03 10:35:10 Rig02-RadeonVII-01i2 109003607 P1(5.5M) releasing 296 buffers
2020-11-03 10:35:11 Rig02-RadeonVII-01i2 109003607 Released memory lock 'c:\gpuowl\pool\memlock-0'
2020-11-03 10:35:11 Rig02-RadeonVII-01i2 Exiting because "fold roundtrip"
2020-11-03 10:35:11 Rig02-RadeonVII-01i2 Bye
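
A side observation on the log above: the right-hand (expected) column matches successive powers of -5 reduced modulo 2^32 (0xfffffffb is -5, 0x00000019 is 25, 0xffffff83 is -125, and so on). Whether that is how GpuOwl actually builds its test vector is an assumption; the thread does not say. A short sketch that reproduces the column, using a hypothetical function name:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not GpuOwl code): returns (-5)^1, (-5)^2, ..., (-5)^n,
// each reduced modulo 2^32 by unsigned 32-bit wraparound.
std::vector<std::uint32_t> powersOfMinusFive(std::size_t n) {
  std::vector<std::uint32_t> out;
  out.reserve(n);
  const std::uint32_t minusFive = 0xfffffffbu;  // -5 as an unsigned 32-bit word
  std::uint32_t value = 1;
  for (std::size_t i = 0; i < n; ++i) {
    value *= minusFive;  // unsigned overflow wraps modulo 2^32
    out.push_back(value);
  }
  return out;
}
```

Calling powersOfMinusFive(20) gives 20 words that can be compared against the 20 expected values in the log; only the left-hand (actual) column disagrees with them.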
Old 2020-11-03, 10:34   #162
preda
 
"Mihai Preda"
Apr 2015

17×79 Posts

Quote:
Originally Posted by aheeffer View Post
Repeatedly when restarting. What to do now?

Code:
2020-11-03 10:35:04 GpuOwl VERSION v7.2-2-ga135d8d
2020-11-03 10:35:08 Rig02-RadeonVII-01i2 109003607 P1(5.5M) 7935851 bits
2020-11-03 10:35:08 Rig02-RadeonVII-01i2 109003607 PRP starting from beginning
2020-11-03 10:35:10 Rig02-RadeonVII-01i2 109003607 fold() does not roundtrip
Try increasing the FFT size, changing the FFT variant, or otherwise fiddling with FFT precision settings. Let us know how it goes.

EDIT: I tried your exponent, with a similar setup (see below) -- the same number of buffers, same -maxAlloc, same FFT size, and it works fine for me. So it's not a problem of FFT size.

Now I see: you're not using ROCm, so you're probably on Windows? Adrenalin?

Anyway, I also tried my compilation with NO_ASM (-use NO_ASM), to reproduce more closely the source your setup sees, and it still works for me. So at this point, IMO, this indicates reduced precision in some FFT operations. Whether this is due to a bug in the OpenCL compiler (i.e. the GPU driver) or not (i.e. the driver is within its rights to generate the lower-precision variant), the outcome is the same.

Increasing the FFT size may help, but would have a cost too. Running without P-1 may work too (but a pity if you wanted P-1). Using ROCm on Linux is another option.

Code:
2020-11-03 21:44:24 GpuOwl VERSION v7.2-8-g8906157
2020-11-03 21:44:24 config: -cpu XFX -uid 780c28c172da5ebb
2020-11-03 21:44:24 config: -prp 109003607 -B1 5500000 -maxAlloc 7.33G -use CARRY64,NO_ASM 
2020-11-03 21:44:24 device 1, unique id '780c28c172da5ebb'
2020-11-03 21:44:24 XFX 109003607 FFT: 6M 1K:12:256 (17.33 bpw)
2020-11-03 21:44:25 XFX 109003607 OpenCL args "-DEXP=109003607u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=12u -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0x1.3115e45f2213bp-1 -DIWEIGHT_STEP_MINUS_1=-0x1.7e58024082ce6p-2 -DCARRY64=1 -DNO_ASM=1  -cl-std=CL2.0 -cl-finite-math-only "
2020-11-03 21:44:27 XFX 109003607 OpenCL compilation in 2.38 s
2020-11-03 21:44:28 XFX 109003607 maxAlloc: 7.3 GB
2020-11-03 21:44:28 XFX 109003607 P1(5.5M) 7935851 bits
2020-11-03 21:44:28 XFX 109003607 PRP starting from beginning
2020-11-03 21:44:28 XFX 109003607 Acquired memory lock 'memlock-1'
2020-11-03 21:44:28 XFX 109003607 P1(5.5M) using 296 buffers
2020-11-03 21:44:38 XFX 109003607 OK         0 on-load: blockSize 400, 0000000000000003

Last fiddled with by preda on 2020-11-03 at 10:50
Old 2020-11-03, 10:58   #163
preda
 
"Mihai Preda"
Apr 2015

17×79 Posts

Quote:
Originally Posted by aheeffer View Post
Repeatedly when restarting. What to do now?
One more question: did this use to work (on an earlier 7.x version)? Is the failure something recent, indicating that one of my recent changes broke something that was working before?
Old 2020-11-03, 17:32   #164
Ethan (EO)
 
"Ethan O'Connor"
Oct 2002
GIMPS since Jan 1996

134₈ Posts

Yep - all good now.

Quote:
Originally Posted by preda View Post
I think I found the reason: the temp proof file was being closed on a background thread, which prevented the move. Hopefully fixed now.
Old 2020-11-03, 20:31   #165
aheeffer
 
Aug 2020

37 Posts

Quote:
Originally Posted by preda View Post
Try increasing the FFT size, changing the FFT variant, or otherwise fiddling with FFT precision settings. Let us know how it goes.
Restarting with -fft 6.5M works.

Yes, I am using Windows (see header message).

I did not have this problem with v.7.1.1 on other exponents at the same FFT size.
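
The "(17.33 bpw)" in the logs above is consistent with dividing the exponent by the number of FFT words, so moving from 6M to 6.5M lowers the bits stored per word from about 17.33 to about 15.99, which leaves more headroom for rounding error. A small sketch of that arithmetic, assuming the word count is WIDTH × MIDDLE × SMALL_HEIGHT × 2 (an assumption that happens to match the 6M figure for 1K:12:256); the function names are illustrative and not from GpuOwl:

```cpp
#include <cstdint>

// Number of FFT words for a WIDTH:MIDDLE:SMALL_HEIGHT layout.
// Assumption: two real words per complex element, hence the factor 2.
constexpr std::uint64_t fftWords(std::uint64_t width, std::uint64_t middle,
                                 std::uint64_t smallHeight) {
  return width * middle * smallHeight * 2;
}

// Average number of bits of the exponent-sized residue held in each word.
constexpr double bitsPerWord(std::uint64_t exponent, std::uint64_t words) {
  return static_cast<double>(exponent) / static_cast<double>(words);
}
```

For the exponent in this thread, bitsPerWord(109003607, fftWords(1024, 12, 256)) is roughly 17.33, and bitsPerWord(109003607, 6815744) for 6.5M words is roughly 15.99. The larger FFT costs more time per iteration, which is the trade-off mentioned earlier in the thread.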
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.