mersenneforum.org  

Old 2020-02-15, 19:27   #1860
ewmayer
2ω=0
 
 
Sep 2002
República de California

5·2,351 Posts
Default

Quote:
Originally Posted by Prime95 View Post
@Ernst: The GCN timings doc is here -- https://github.com/CLRX/CLRX-mirror/wiki/GcnTimings

ROE error checking would be slower but it would be a useful debugging option to sanity check FFT length selections.
Thanks, George - you'll need to let me know if I'm reading that correctly: I see a V_RNDNE_F64 instruction with latency DPFACTOR*4 = 8 cycles on Radeon. All other things being equal, that equals the 4+4 cycle latency needed for the DNINT(x) = (x + c) - c "hand-rolled round" alternative. In practice, other operations (e.g. computing DWT weights) can be interleaved with the round to help hide the latency.
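
For readers unfamiliar with the trick, here is a minimal sketch of the (x + c) - c round in Python rather than OpenCL. The constant c = 1.5·2^52 is an assumption (the post does not give a value); it is a common choice for IEEE doubles and only works for |x| < 2^51 under the default round-to-nearest-even mode.

```python
# Assumed constant, not taken from the thread: 1.5 * 2**52.
# Adding it moves any |x| < 2**51 into [2**52, 2**53), where adjacent
# doubles are exactly 1 apart, so the addition itself rounds x to an integer.
C = 1.5 * 2.0**52

def dnint(x: float) -> float:
    """Round to nearest integer (ties to even) via one add and one subtract."""
    return (x + C) - C

for x in (2.4, 2.5, 3.5, -0.6, 123456.499):
    print(x, dnint(x))
```

The two dependent floating-point operations are what give the "4+4 cycle" latency mentioned above, since the subtract cannot start until the add has finished.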

If Mihai could add ROE checking to just the carry step used in the current p-1 stage, that would be great - even if a GEC-enhanced p-1 is coming down the pike, it's always useful to have multiple checks, to catch both FFT-length-related errors and "other" ones - on flaky hardware, in my case my aging Haswell quad, I've found sudden emission of fatal ROEs, nonreproducible on interval-retry, to be a reliable indicator of upcoming system-needs-rebootness.
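
As an illustration of what such a check measures, the sketch below computes the round-off error (ROE) as the largest distance between a floating-point convolution output and its nearest integer. The function names and the 0.4375 limit are assumptions chosen for the example; they are not taken from gpuowl.

```python
def max_roundoff_error(outputs):
    """Largest distance from any convolution output to its nearest integer."""
    return max(abs(v - round(v)) for v in outputs)

def roe_ok(outputs, limit=0.4375):
    # 'limit' is a hypothetical threshold: errors near 0.5 mean the rounded
    # integer can no longer be trusted.
    return max_roundoff_error(outputs) < limit

healthy = [3.0001, -7.02, 12.0]
suspect = [3.0001, -7.02, 12.49]
print(roe_ok(healthy), roe_ok(suspect))
```

A check like this costs one extra subtract, absolute value and max per output word in the carry step, which is where the slowdown mentioned in the quote would come from.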
Old 2020-02-16, 07:28   #1861
Prime95
P90 years forever!
 
 
Aug 2002
Yeehaw, FL

41·199 Posts
Default

Quote:
Originally Posted by ewmayer View Post
All other things being equal, that equals the 4+4 cycle latency needed for the DNINT(x) = (x + c) - c "hand-rolled round" alternative.
Yes, and you cannot get OpenCL to generate a V_RNDNE_F64 instruction unless you resort to __asm syntax.
Old 2020-02-17, 02:07   #1862
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×29×127 Posts
Default small detail

Code:
2020-02-16 20:01:42 asrock/radeonvii 4444091 FFT 224K: Width 8x8, Height 64x4, Middle 7; 19.37 bits/word
2020-02-16 20:01:42 asrock/radeonvii OpenCL args "-DEXP=4444091u -DWIDTH=64u -DSMALL_HEIGHT=256u -DMIDDLE=7u -DWEIGHT_STEP=0x
c.571b3d76085f8p-3 -DIWEIGHT_STEP=0xa.5f5fa9671576p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b151
38p-4 -DAMDGPU=1 -DCHEBYSHEV_METHOD_FMA=1 -DCHEBYSHEV_MIDDLEMUL2=1 -DMERGED_MIDDLE=1 -DMORE_ACCURATE=1 -DNO_ASM=1 -DT2_SHUFFL
E_HEIGHT=1 -DT2_SHUFFLE_MIDDLE=1 -DWORKINGIN1A=1 -DWORKINGOUT1A=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-02-16 20:01:43 asrock/radeonvii OpenCL compilation error -11 (args -DEXP=4444091u -DWIDTH=64u -DSMALL_HEIGHT=256u -DMIDD
LE=7u -DWEIGHT_STEP=0xc.571b3d76085f8p-3 -DIWEIGHT_STEP=0xa.5f5fa9671576p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_B
IGSTEP=0xa.5fed6a9b15138p-4 -DAMDGPU=1 -DCHEBYSHEV_METHOD_FMA=1 -DCHEBYSHEV_MIDDLEMUL2=1 -DMERGED_MIDDLE=1 -DMORE_ACCURATE=1
-DNO_ASM=1 -DT2_SHUFFLE_HEIGHT=1 -DT2_SHUFFLE_MIDDLE=1 -DWORKINGIN1A=1 -DWORKINGOUT1A=1  -I. -cl-fast-relaxed-math -cl-std=CL
2.0 -DNO_ASM=1)
2020-02-16 20:01:43 asrock/radeonvii C:\Users\User\AppData\Local\Temp\\OCL8000T3.cl:1942:2: error: WORKINGOUT1 not compatible
 with this FFT size
#error WORKINGOUT1 not compatible with this FFT size
 ^
1 error generated.

error: Clang front-end compilation failed!
Frontend phase failed compilation.
 Error: Compiling CL to IR
Seems like that error message should refer to WORKINGOUT1A.
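
For reference, the FFT size and bits/word figures in these logs can be reproduced from the -DWIDTH, -DSMALL_HEIGHT and -DMIDDLE values. The sketch below is an inference from the three log lines posted in this thread (224K, 8192K and 1280K), not from the gpuowl source; in particular the factor of 2 is assumed because it makes all three cases match.

```python
def fft_words(width, small_height, middle):
    # Factor of 2 inferred from the posted logs, not from gpuowl's code.
    return 2 * width * small_height * middle

def bits_per_word(exponent, words):
    return exponent / words

# (exponent, WIDTH, SMALL_HEIGHT, MIDDLE) as they appear in the OpenCL args
cases = [
    (4444091, 64, 256, 7),
    (150000713, 2048, 2048, 1),
    (24000577, 64, 1024, 10),
]
for exponent, width, height, middle in cases:
    n = fft_words(width, height, middle)
    print(f"{exponent}: FFT {n // 1024}K, {bits_per_word(exponent, n):.2f} bits/word")
```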


Is there a mapping for which fft lengths are supported by the various -use options?

Last fiddled with by kriesel on 2020-02-17 at 02:09
Old 2020-02-17, 15:07   #1863
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

7366₁₀ Posts
Default

Quote:
Originally Posted by kriesel View Post
P4 0/7611 MiB
Stay tuned for K80, haven't got one in a while.
K80 0/11441 MiB
Old 2020-02-17, 17:10   #1864
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·29·127 Posts
Default gpuowl v6.11-134 P-1 memory allocation error

On Win7 x64 with an asrock motherboard, during the asrock Radeon VII's shakedown cruise on known-factor P-1 test candidates, I hit an error that stopped the show; it sat crashed until I found it several hours later.
Code:
C:\Users\User\Documents\gpuowl-v6.11-134\radeonvii>gpuowl-win
2020-02-17 01:28:32 gpuowl v6.11-134-g1e0ce1d
2020-02-17 01:28:32 config: -device 1 -user kriesel -cpu asrock/radeonvii -yield -maxAlloc 16000 -use NO_ASM
2020-02-17 01:28:32 config:
2020-02-17 01:28:32 config: :not compatible with 224K fft: ,WORKINGOUT1A
2020-02-17 01:28:32 config: :best for 4608K: ,MERGED_MIDDLE,WORKINGIN1A,WORKINGOUT1A,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,CHEB
YSHEV_METHOD_FMA,CHEBYSHEV_MIDDLEMUL2,MORE_ACCURATE
2020-02-17 01:28:32 device 1, unique id ''
2020-02-17 01:28:32 asrock/radeonvii 150000713 FFT 8192K: Width 256x8, Height 256x8; 17.88 bits/word
2020-02-17 01:28:32 asrock/radeonvii using long carry kernels
2020-02-17 01:28:36 asrock/radeonvii OpenCL args "-DEXP=150000713u -DWIDTH=2048u -DSMALL_HEIGHT=2048u -DMIDDLE=1u -DWEIGHT_ST
EP=0x8.af5a78e9513b8p-3 -DIWEIGHT_STEP=0xe.bcf3fa7f78dc8p-4 -DWEIGHT_BIGSTEP=0xe.ac0c6e7dd2438p-3 -DIWEIGHT_BIGSTEP=0x8.b95c1
e3ea8bd8p-4 -DAMDGPU=1 -DNO_ASM=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-02-17 01:28:48 asrock/radeonvii OpenCL compilation in 12.10 s
2020-02-17 01:28:49 asrock/radeonvii 150000713 P1 B1=30030, B2=2400000; 43305 bits; starting at 0
2020-02-17 01:29:07 asrock/radeonvii 150000713 P1    10000  23.09%; 1804 us/it; ETA 0d 00:01; 2b8e087d5548b0be
2020-02-17 01:29:25 asrock/radeonvii 150000713 P1    20000  46.18%; 1799 us/it; ETA 0d 00:01; 34138e15eb0339f5
2020-02-17 01:29:43 asrock/radeonvii 150000713 P1    30000  69.28%; 1799 us/it; ETA 0d 00:00; 24160ea83dc597ec
2020-02-17 01:30:01 asrock/radeonvii 150000713 P1    40000  92.37%; 1801 us/it; ETA 0d 00:00; 59711be3f706d044
2020-02-17 01:30:08 asrock/radeonvii saved
2020-02-17 01:30:08 asrock/radeonvii 150000713 P1    43305 100.00%; 2084 us/it; ETA 0d 00:00; 986ed60328d8a1dc
2020-02-17 01:30:08 asrock/radeonvii P-1 (B1=30030, B2=2400000, D=30030): primes 173054, expanded 217731, doubles 36995 (left
 102423), singles 99064, total 136059 (79%)
2020-02-17 01:30:08 asrock/radeonvii 150000713 P2 using blocks [1 - 80] to cover 136059 primes
GNU MP: Cannot reallocate memory (old_size=8 new_size=18750104)

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
Then there's a Windows popup containing
Code:
Problem signature:
  Problem Event Name:    APPCRASH
  Application Name:    gpuowl-win.exe
  Application Version:    0.0.0.0
  Application Timestamp:    00000000
  Fault Module Name:    gpuowl-win.exe
  Fault Module Version:    0.0.0.0
  Fault Module Timestamp:    00000000
  Exception Code:    40000015
  Exception Offset:    000000000003eff1
  OS Version:    6.1.7601.2.1.0.256.48
  Locale ID:    1033
  Additional Information 1:    8095
  Additional Information 2:    8095155e3f9bfb3a0fc1b40b27c9d8c8
  Additional Information 3:    ab9e
  Additional Information 4:    ab9eae2761104200d519e0bef6c90ec9

Read our privacy statement online:
  http://go.microsoft.com/fwlink/?linkid=104288&clcid=0x0409

If the online privacy statement is not available, please read our privacy statement offline:
  C:\Windows\system32\en-US\erofflps.txt
When that's closed, one more line of console output appears from gpuowl:
Code:
2020-02-17 11:07:12 asrock/radeonvii 150000713 P2 using 231 buffers of 64.0 MB each
It's repeatable. Will try with the latest commit later.
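
As an aside on the stage 2 summary line, its counts are arithmetically consistent with each "double" covering two primes in one step and each "single" covering one. That reading is an inference from the numbers in the two P-1 summary lines in this thread, not something confirmed by the gpuowl source. A short sketch:

```python
def stage2_cost(primes, doubles):
    """Return (singles, total, total/primes), assuming a double covers two primes."""
    singles = primes - 2 * doubles
    total = doubles + singles
    return singles, total, total / primes

# (primes, doubles) from the v6.11-134 and v6.11-147 logs in this thread
for primes, doubles in ((173054, 36995), (576492, 105850)):
    singles, total, ratio = stage2_cost(primes, doubles)
    print(f"primes {primes}: singles {singles}, total {total} ({ratio:.0%})")
```

Under that assumption the percentage in parentheses is the stage 2 work relative to handling every prime on its own, so a lower figure means better pairing.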
Old 2020-02-17, 17:51   #1865
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·29·127 Posts
Default Radeon VII 48M FFT tune

Code:
Asrock Radeon VII on Win7 X64, Asrock 6-pcie motherboard on open air miner frame
gpuowl V6.11-134-g1e0ce1d
tune 852348659 PRP, 48M fft
stock settings, no OC or undervolt, etc.

NO_ASM 10309
NO_ASM 10302

UNROLL_ALL 10300
UNROLL_NONE 10096
UNROLL_WIDTH 10097
UNROLL_HEIGHT 10095 *
UNROLL_MIDDLEMUL1 10154
UNROLL_MIDDLEMUL2 10158

WORKINGIN 38066
WORKINGIN 38063
WORKINGIN1 10457
WORKINGIN1A 10038 *
WORKINGIN2 11576
WORKINGIN3 10361
WORKINGIN4 10784
WORKINGIN5 10292

WORKINGOUT 25480
WORKINGOUT0 11645
WORKINGOUT1 10202 
WORKINGOUT1A 10116 *
WORKINGOUT2 16205
WORKINGOUT3 10207
WORKINGOUT4 10952
WORKINGOUT5 10750

mistakenly used workingout1 a while...
NO_ASM 
NO_ASM 10291
...,UNROLL_WIDTH,UNROLL_HEIGHT 10098 *
...,UNROLL_WIDTH,UNROLL_MIDDLEMUL1 10159
...,UNROLL_HEIGHT,UNROLL_MIDDLEMUL1 10157
...,UNROLL_WIDTH,UNROLL_HEIGHT,UNROLL_MIDDLEMUL1 10157

NO_ASM,MERGED_MIDDLE,WORKINGIN1A,WORKINGOUT1 9961
...,T2_SHUFFLE_WIDTH 9906
...,T2_SHUFFLE_MIDDLE 9879
...,T2_SHUFFLE_HEIGHT 9645
...,T2_SHUFFLE_REVERSELINE 9965
...,T2_SHUFFLE 9485 *

NO_ASM 10311
NO_ASM 10296
...,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE 9594
...,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_WIDTH 9626
...,T2_SHUFFLE_WIDTH,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_MIDDLE 9878
...,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE 9518
...,T2_SHUFFLE_WIDTH,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_HEIGHT,SHUFFLE_REVERSELINE 9484

correct to workingout1a
NO_ASM 10296
NO_ASM 10299
...,CARRY32 9289 *
...,CARRY64 9443

NO_ASM 10301
NO_ASM 10294
...,FANCY_MIDDLEMUL1 error no middlemul1 for the 48M fft
...,MORE_SQUARES_MIDDLEMUL1 9274 *
...,CHEBYSHEV_METHOD EE
...,CHEBYSHEV_METHOD_FMA EE
...,ORIGINAL_METHOD 9290
...,ORIGINAL_TWEAKED 9288

NO_ASM 10310
NO_ASM 10304
...,ORIG_MIDDLEMUL2 9277 *
...,CHEBYSHEV_MIDDLEMUL2 EE

NO_ASM 10288
NO_ASM 10286
...,ORIG_SLOWTRIG 9438
...,NEW_SLOWTRIG 9278
...,MORE_ACCURATE 9278
...,LESS_ACCURATE 9230 *

NO_ASM,MERGED_MIDDLE,WORKINGIN1A,WORKINGOUT1A,UNROLL_HEIGHT,T2_SHUFFLE,CARRY32,MORE_SQUARES_MIDDLEMUL1,ORIG_MIDDLEMUL2,LESS_ACCURATE

gain from tuning
 10305/9230 = ~1.1165
without -time, it's a bit faster, ~9161us/it
This yields an estimated run time of 3 months (90.3 days to be precise). The upper limit of mersenne.org would take ~4.2 months.
The same gpu in the same conditions but with a different tune for 4.5M fft has produced a matching PRP DC in its first attempt.
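
The gain and run-time figures above can be checked with a few lines of arithmetic. This sketch assumes a PRP test takes roughly one iteration per bit of the exponent, and uses 10305 us as a representative NO_ASM-only baseline, as in the post.

```python
baseline_us = 10305    # representative NO_ASM-only timing from the table above
tuned_us = 9230        # best tuned timing (with -time)
untimed_us = 9161      # reported timing without -time
exponent = 852348659   # assumed iteration count: about one per exponent bit

gain = baseline_us / tuned_us
days = exponent * untimed_us * 1e-6 / 86400
print(f"gain {gain:.4f}, estimated run time {days:.2f} days")
```

This gives a gain of about 1.1165 and a run time between 90.3 and 90.4 days, in line with the figures quoted.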

Last fiddled with by kriesel on 2020-02-17 at 18:05
Old 2020-02-17, 18:24   #1866
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·29·127 Posts
Default gpuowl-win v6.11-147-g3b8b00e build

Just built, only -h run so far.
Attached Files
File Type: txt build-log.txt (6.2 KB, 179 views)
File Type: zip gpuowl-win-v6.11-147-g3b8b00e.zip (642.7 KB, 175 views)
Old 2020-02-18, 00:18   #1867
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1110011000110₂ Posts
Default memory allocation error in v6.11-147

Code:
C:\Users\User\Documents\gpuowl-v6.11-147\radeonvii>gpuowl-win
2020-02-17 18:14:27 gpuowl v6.11-147-g3b8b00e
2020-02-17 18:14:27 config: -device 1 -user kriesel -cpu asrock/radeonvii -yield -maxAlloc 155000 -use NO_ASM,MERGED_MIDDLE,UNRO
LL_HEIGHT,T2_SHUFFLE,CARRY32,MORE_SQUARES_MIDDLEMUL1,ORIG_MIDDLEMUL2,LESS_ACCURATE
2020-02-17 18:14:27 config:

2020-02-17 18:14:28 device 1, unique id ''
2020-02-17 18:14:28 asrock/radeonvii 24000577 FFT 1280K: Width 8x8, Height 256x4, Middle 10; 18.31 bits/word
2020-02-17 18:14:29 asrock/radeonvii OpenCL args "-DEXP=24000577u -DWIDTH=64u -DSMALL_HEIGHT=1024u -DMIDDLE=10u -DWEIGHT_STEP=0x
c.e5beac96a0b88p-3 -DIWEIGHT_STEP=0x9.eca8ba4660afp-4 -DWEIGHT_BIGSTEP=0xe.ac0c6e7dd2438p-3 -DIWEIGHT_BIGSTEP=0x8.b95c1e3ea8bd8p
-4 -DPM1=1 -DAMDGPU=1 -DCARRY32=1 -DLESS_ACCURATE=1 -DMERGED_MIDDLE=1 -DMORE_SQUARES_MIDDLEMUL1=1 -DNO_ASM=1 -DORIG_MIDDLEMUL2=1
 -DT2_SHUFFLE=1 -DUNROLL_HEIGHT=1  -cl-fast-relaxed-math -cl-std=CL2.0"
2020-02-17 18:14:41 asrock/radeonvii OpenCL compilation in 11.44 s
2020-02-17 18:14:41 asrock/radeonvii 24000577 P1 B1=300000, B2=9000000; 432351 bits; starting at 432350
2020-02-17 18:14:41 asrock/radeonvii 24000577 P1   432351 100.00%; 84768 us/it; ETA 0d 00:00; 55a8d888497469ec
2020-02-17 18:14:41 asrock/radeonvii P-1 (B1=300000, B2=9000000, D=30030): primes 576492, expanded 615799, doubles 105850 (left
373895), singles 364792, total 470642 (82%)
2020-02-17 18:14:41 asrock/radeonvii 24000577 P2 using blocks [10 - 300] to cover 470642 primes
GNU MP: Cannot allocate memory (size=164912)

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
2020-02-17 18:14:42 asrock/radeonvii 24000577 P2 using 1440 buffers of 10.0 MB each
It was able to complete after -use was reduced to merely NO_ASM. But it's still missing the 15M test factor.
Code:
 {"exponent":"15000031", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.11-147-g3b8b00e"}, "timestamp":"2020-02-17 23:16:22 UTC", "user":"kriesel", "computer":"asrock/radeonvii", "fft-length":786432, "B1":180000, "B2":3780000}
But the 81M test exponent also fails with the cannot allocate memory error (size=2637840)
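
For anyone collecting such result lines, here is a minimal sketch of reading one with Python's standard json module. It only relies on fields visible in the line above; treating any status other than "NF" as a found factor is an assumption.

```python
import json

line = ('{"exponent":"15000031", "worktype":"PM1", "status":"NF", '
        '"program":{"name":"gpuowl", "version":"v6.11-147-g3b8b00e"}, '
        '"timestamp":"2020-02-17 23:16:22 UTC", "user":"kriesel", '
        '"computer":"asrock/radeonvii", "fft-length":786432, '
        '"B1":180000, "B2":3780000}')

result = json.loads(line)
# Assumption: "NF" means no factor found; anything else is treated as a hit.
factor_found = result["status"] != "NF"
print(result["exponent"], f'{result["fft-length"] // 1024}K',
      result["B1"], result["B2"], "factor found" if factor_found else "no factor")
```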

Last fiddled with by kriesel on 2020-02-18 at 00:47
Old 2020-02-18, 11:32   #1868
preda
 
 
"Mihai Preda"
Apr 2015

2²×19² Posts
Default

Quote:
Originally Posted by kriesel View Post
2020-02-17 18:14:27 config: -device 1 -user kriesel -cpu asrock/radeonvii -yield -maxAlloc 155000 -use NO_ASM,MERGED_MIDDLE,UNRO
LL_HEIGHT,T2_SHUFFLE,CARRY32,MORE_SQUARES_MIDDLEMUL1,ORIG_MIDDLEMUL2,LESS_ACCURATE
You realize you're using an unrealistically large maxAlloc; I don't know if this is what's causing the mem alloc error.

I fixed the auto FFT size for P-1.
Old 2020-02-18, 14:50   #1869
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×29×127 Posts
Default

Quote:
Originally Posted by preda View Post
You realize you're using an unrealistically large maxAlloc; I don't know if this is what's causing the mem alloc error.

I fixed the auto FFT size for P-1.
I ran into trouble at 16000. 155 was supposed to be a reduction. And 1440 x 10 should fit within 15500 or 16000.
Old 2020-02-18, 17:20   #1870
axn
 
 
Jun 2003

1010100111100₂ Posts
Default

Quote:
Originally Posted by kriesel View Post
I ran into trouble at 16000. 155 was supposed to be a reduction. And 1440 x 10 should fit within 15500 or 16000.
You did notice the extra 0, right?
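
To make the point concrete, the sketch below compares the stage 2 buffer request from the log (1440 buffers of 10.0 MB) with the three -maxAlloc values mentioned in this exchange. It assumes -maxAlloc is expressed in MB, which is how the posts above treat it; the helper name is made up for the example.

```python
def fits(buffers, buffer_mb, max_alloc_mb):
    """True if the total buffer request stays within the -maxAlloc limit."""
    return buffers * buffer_mb <= max_alloc_mb

needed_mb = 1440 * 10.0   # from "P2 using 1440 buffers of 10.0 MB each"
for limit in (15500, 16000, 155000):
    print(limit, fits(1440, 10.0, limit), f"{limit / needed_mb:.1f}x the request")
```

The request fits all three limits, but 155000 is more than ten times the 14400 MB actually requested, so as typed it does not act as a reduction from 16000.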

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
