mersenneforum.org

gpuowL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

ewmayer 2020-02-15 19:27

[QUOTE=Prime95;537629]@Ernst: The GCN timings doc is here -- [url]https://github.com/CLRX/CLRX-mirror/wiki/GcnTimings[/url]

ROE error checking would be slower, but it would be a useful debugging option to sanity check FFT length selections.[/QUOTE]

Thanks, George - you'll need to let me know if I'm reading that correctly: I see a V_RNDNE_F64 instruction with latency DPFACTOR*4 = 8 cycles on Radeon. All other things being equal, that equals the 4+4 cycle latency needed for the DNINT(x) = (x + c) - c "hand-rolled round" alternative. In practice, other operations (e.g. computing DWT weights) can be interleaved with the round to help hide the latency.
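
To make the "hand-rolled round" concrete, here is a minimal Python sketch of the (x + c) - c trick on IEEE-754 doubles. The constant c = 1.5 * 2^52 is the usual choice for this idiom and is an assumption here, since the post does not give a value. The name `dnint` simply mirrors the notation above. The sketch assumes the default round-to-nearest-even mode and |x| < 2^51.

```python
# Hand-rolled round-to-nearest for IEEE-754 doubles.
# Assumption: C = 1.5 * 2**52 (the post does not state which constant is used).
# In [2**52, 2**53) the spacing between doubles is exactly 1, so the addition
# x + C is itself rounded to an integer; subtracting C recovers round(x).
C = 1.5 * 2.0**52


def dnint(x: float) -> float:
    """Round x to the nearest integer (ties to even); valid for |x| < 2**51."""
    return (x + C) - C


if __name__ == "__main__":
    for v in (0.49, 2.5, 3.5, -1.3, 123456.75):
        print(v, "->", dnint(v))
```

The subtraction depends on the result of the addition, so the two operations cannot overlap with each other, which is where the 4+4 cycle figure comes from.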

If Mihai could add ROE checking to just the carry step used in the current p-1 stage, that would be great. Even if a GEC-enhanced p-1 is coming down the pike, it's always useful to have multiple checks, to catch both FFT-length-related errors and "other" ones. On flaky hardware (in my case, my aging Haswell quad), I've found that a sudden emission of fatal ROEs, nonreproducible on interval-retry, is a reliable indicator of upcoming system-needs-rebootness.
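
For readers unfamiliar with ROE checking, the idea can be sketched in a few lines of Python. This is an illustration only, not code from gpuowl or Mlucas: the function name and the `fatal_threshold` value of 0.4 are hypothetical.

```python
# Sketch of a roundoff-error (ROE) check as it might sit in a carry step.
# Assumptions: the function name and `fatal_threshold` are hypothetical,
# not taken from gpuowl or Mlucas.
def round_with_roe(values, fatal_threshold=0.4):
    """Round each convolution output to an integer and report the worst error.

    Returns (rounded_values, max_error, ok). After the inverse FFT every
    output should be close to an integer; a fractional part approaching 0.5
    means the rounded result can no longer be trusted.
    """
    rounded = [round(v) for v in values]
    max_error = max((abs(v - r) for v, r in zip(values, rounded)), default=0.0)
    return rounded, max_error, max_error < fatal_threshold


if __name__ == "__main__":
    print(round_with_roe([3.001, -7.02, 12.25]))
```

The extra cost is one subtraction, one absolute value and one running maximum per word, which is why such a check slows the carry step.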

Prime95 2020-02-16 07:28

[QUOTE=ewmayer;537652]All other things being equal, that equals the 4+4 cycle latency needed for the DNINT(x) = (x + c) - c "hand-rolled round" alternative.[/QUOTE]

Yes, and you cannot get OpenCL to generate a V_RNDNE_F64 instruction unless you resort to __asm syntax.

kriesel 2020-02-17 02:07

small detail
 
[CODE]2020-02-16 20:01:42 asrock/radeonvii 4444091 FFT 224K: Width 8x8, Height 64x4, Middle 7; 19.37 bits/word
2020-02-16 20:01:42 asrock/radeonvii OpenCL args "-DEXP=4444091u -DWIDTH=64u -DSMALL_HEIGHT=256u -DMIDDLE=7u -DWEIGHT_STEP=0x
c.571b3d76085f8p-3 -DIWEIGHT_STEP=0xa.5f5fa9671576p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b151
38p-4 -DAMDGPU=1 -DCHEBYSHEV_METHOD_FMA=1 -DCHEBYSHEV_MIDDLEMUL2=1 -DMERGED_MIDDLE=1 -DMORE_ACCURATE=1 -DNO_ASM=1 -DT2_SHUFFL
E_HEIGHT=1 -DT2_SHUFFLE_MIDDLE=1 -DWORKINGIN1A=1 -DWORKINGOUT1A=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-02-16 20:01:43 asrock/radeonvii OpenCL compilation error -11 (args -DEXP=4444091u -DWIDTH=64u -DSMALL_HEIGHT=256u -DMIDD
LE=7u -DWEIGHT_STEP=0xc.571b3d76085f8p-3 -DIWEIGHT_STEP=0xa.5f5fa9671576p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_B
IGSTEP=0xa.5fed6a9b15138p-4 -DAMDGPU=1 -DCHEBYSHEV_METHOD_FMA=1 -DCHEBYSHEV_MIDDLEMUL2=1 -DMERGED_MIDDLE=1 -DMORE_ACCURATE=1
-DNO_ASM=1 -DT2_SHUFFLE_HEIGHT=1 -DT2_SHUFFLE_MIDDLE=1 -DWORKINGIN1A=1 -DWORKINGOUT1A=1 -I. -cl-fast-relaxed-math -cl-std=CL
2.0 -DNO_ASM=1)
2020-02-16 20:01:43 asrock/radeonvii C:\Users\User\AppData\Local\Temp\\OCL8000T3.cl:1942:2: error: WORKINGOUT1 not compatible
with this FFT size
#error [B]WORKINGOUT1[/B] not compatible with this FFT size
^
1 error generated.

error: Clang front-end compilation failed!
Frontend phase failed compilation.
Error: Compiling CL to IR[/CODE]Seems like that error message should refer to WORKINGOUT1[B]A.[/B]
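
As a cross-check on the log above, the reported FFT length and bits/word can be recomputed from the -D arguments. The factor of 2 in the sketch is an inference from the logs posted in this thread (it reproduces the 224K, 8192K and 1280K figures), not something taken from the gpuowl source, and the function names are hypothetical.

```python
def fft_words(width, small_height, middle):
    """FFT length in words.

    Assumption inferred from the posted logs:
    words = 2 * WIDTH * SMALL_HEIGHT * MIDDLE.
    """
    return 2 * width * small_height * middle


def bits_per_word(exponent, words):
    """Average number of bits of the exponent carried by each FFT word."""
    return exponent / words


if __name__ == "__main__":
    # (EXP, WIDTH, SMALL_HEIGHT, MIDDLE) as they appear in the OpenCL args.
    runs = [
        (4444091, 64, 256, 7),
        (150000713, 2048, 2048, 1),
        (24000577, 64, 1024, 10),
    ]
    for exp, w, h, m in runs:
        n = fft_words(w, h, m)
        print(f"{exp}: FFT {n // 1024}K, {bits_per_word(exp, n):.2f} bits/word")
```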


Is there a mapping of which FFT lengths each of the -use options supports?

kriesel 2020-02-17 15:07

[QUOTE=kriesel;537643]P4 [B]0/7611 MiB[/B]
Stay tuned for K80, haven't got one in a while.[/QUOTE]
[B]K80 0/11441 MiB[/B]

kriesel 2020-02-17 17:10

gpuowl v6.11-134 P-1 memory allocation error
 
Win7 x64, ASRock motherboard, ASRock Radeon VII shakedown cruise on known-factor P-1 test candidates. It hit an error that stopped the show, and I only found it crashed several hours later.[CODE]C:\Users\User\Documents\gpuowl-v6.11-134\radeonvii>gpuowl-win
2020-02-17 01:28:32 gpuowl v6.11-134-g1e0ce1d
2020-02-17 01:28:32 config: -device 1 -user kriesel -cpu asrock/radeonvii -yield -maxAlloc 16000 -use NO_ASM
2020-02-17 01:28:32 config:
2020-02-17 01:28:32 config: :not compatible with 224K fft: ,WORKINGOUT1A
2020-02-17 01:28:32 config: :best for 4608K: ,MERGED_MIDDLE,WORKINGIN1A,WORKINGOUT1A,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,CHEB
YSHEV_METHOD_FMA,CHEBYSHEV_MIDDLEMUL2,MORE_ACCURATE
2020-02-17 01:28:32 device 1, unique id ''
2020-02-17 01:28:32 asrock/radeonvii 150000713 FFT 8192K: Width 256x8, Height 256x8; 17.88 bits/word
2020-02-17 01:28:32 asrock/radeonvii using long carry kernels
2020-02-17 01:28:36 asrock/radeonvii OpenCL args "-DEXP=150000713u -DWIDTH=2048u -DSMALL_HEIGHT=2048u -DMIDDLE=1u -DWEIGHT_ST
EP=0x8.af5a78e9513b8p-3 -DIWEIGHT_STEP=0xe.bcf3fa7f78dc8p-4 -DWEIGHT_BIGSTEP=0xe.ac0c6e7dd2438p-3 -DIWEIGHT_BIGSTEP=0x8.b95c1
e3ea8bd8p-4 -DAMDGPU=1 -DNO_ASM=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-02-17 01:28:48 asrock/radeonvii OpenCL compilation in 12.10 s
2020-02-17 01:28:49 asrock/radeonvii 150000713 P1 B1=30030, B2=2400000; 43305 bits; starting at 0
2020-02-17 01:29:07 asrock/radeonvii 150000713 P1 10000 23.09%; 1804 us/it; ETA 0d 00:01; 2b8e087d5548b0be
2020-02-17 01:29:25 asrock/radeonvii 150000713 P1 20000 46.18%; 1799 us/it; ETA 0d 00:01; 34138e15eb0339f5
2020-02-17 01:29:43 asrock/radeonvii 150000713 P1 30000 69.28%; 1799 us/it; ETA 0d 00:00; 24160ea83dc597ec
2020-02-17 01:30:01 asrock/radeonvii 150000713 P1 40000 92.37%; 1801 us/it; ETA 0d 00:00; 59711be3f706d044
2020-02-17 01:30:08 asrock/radeonvii saved
2020-02-17 01:30:08 asrock/radeonvii 150000713 P1 43305 100.00%; 2084 us/it; ETA 0d 00:00; 986ed60328d8a1dc
2020-02-17 01:30:08 asrock/radeonvii P-1 (B1=30030, B2=2400000, D=30030): primes 173054, expanded 217731, doubles 36995 (left
102423), singles 99064, total 136059 (79%)
2020-02-17 01:30:08 asrock/radeonvii 150000713 P2 using blocks [1 - 80] to cover 136059 primes
GNU MP: Cannot reallocate memory (old_size=8 new_size=18750104)

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
[/CODE]Then there's a Windows popup containing[CODE]Problem signature:
Problem Event Name: APPCRASH
Application Name: gpuowl-win.exe
Application Version: 0.0.0.0
Application Timestamp: 00000000
Fault Module Name: gpuowl-win.exe
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 00000000
Exception Code: 40000015
Exception Offset: 000000000003eff1
OS Version: 6.1.7601.2.1.0.256.48
Locale ID: 1033
Additional Information 1: 8095
Additional Information 2: 8095155e3f9bfb3a0fc1b40b27c9d8c8
Additional Information 3: ab9e
Additional Information 4: ab9eae2761104200d519e0bef6c90ec9

Read our privacy statement online:
http://go.microsoft.com/fwlink/?linkid=104288&clcid=0x0409

If the online privacy statement is not available, please read our privacy statement offline:
C:\Windows\system32\en-US\erofflps.txt
[/CODE]When that's closed, one more line of console output appears from gpuowl:[CODE]2020-02-17 11:07:12 asrock/radeonvii 150000713 P2 using 231 buffers of 64.0 MB each[/CODE]
It's repeatable. Will try with the latest commit later.
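
The buffer figures in that last log line can be checked by hand. The sketch assumes each FFT word is stored as an 8-byte double, which matches the 64.0 MB reported for the 8192K FFT, and that -maxAlloc is given in MB. The helper names are hypothetical.

```python
MIB = 1024 * 1024


def buffer_mib(fft_words, bytes_per_word=8):
    """Size of one buffer in MiB, assuming one 8-byte double per FFT word."""
    return fft_words * bytes_per_word / MIB


def total_mib(n_buffers, fft_words):
    """Total size of n_buffers such buffers in MiB."""
    return n_buffers * buffer_mib(fft_words)


if __name__ == "__main__":
    words = 8192 * 1024  # the 8192K FFT from the log
    print("one buffer :", buffer_mib(words), "MiB")
    print("231 buffers:", total_mib(231, words), "MiB")
    print("within -maxAlloc 16000:", total_mib(231, words) <= 16000)
```

Under these assumptions the stage 2 buffers stay below the -maxAlloc 16000 limit. The failing allocation is reported by GNU MP, which may mean the problem is not in the GPU buffer sizing itself.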

kriesel 2020-02-17 17:51

Radeon VII 48M FFT tune
 
[CODE]Asrock Radeon VII on Win7 X64, Asrock 6-pcie motherboard on open air miner frame
gpuowl V6.11-134-g1e0ce1d
tune 852348659 PRP, 48M fft
stock settings, no OC or undervolt, etc.

NO_ASM 10309
NO_ASM 10302

UNROLL_ALL 10300
UNROLL_NONE 10096
UNROLL_WIDTH 10097
UNROLL_HEIGHT 10095 *
UNROLL_MIDDLEMUL1 10154
UNROLL_MIDDLEMUL2 10158

WORKINGIN 38066
WORKINGIN 38063
WORKINGIN1 10457
WORKINGIN1A 10038 *
WORKINGIN2 11576
WORKINGIN3 10361
WORKINGIN4 10784
WORKINGIN5 10292

WORKINGOUT 25480
WORKINGOUT0 11645
WORKINGOUT1 10202
WORKINGOUT1A 10116 *
WORKINGOUT2 16205
WORKINGOUT3 10207
WORKINGOUT4 10952
WORKINGOUT5 10750

mistakenly used workingout1 a while...
NO_ASM
NO_ASM 10291
...,UNROLL_WIDTH,UNROLL_HEIGHT 10098 *
...,UNROLL_WIDTH,UNROLL_MIDDLEMUL1 10159
...,UNROLL_HEIGHT,UNROLL_MIDDLEMUL1 10157
...,UNROLL_WIDTH,UNROLL_HEIGHT,UNROLL_MIDDLEMUL1 10157

NO_ASM,MERGED_MIDDLE,WORKINGIN1A,WORKINGOUT1 9961
...,T2_SHUFFLE_WIDTH 9906
...,T2_SHUFFLE_MIDDLE 9879
...,T2_SHUFFLE_HEIGHT 9645
...,T2_SHUFFLE_REVERSELINE 9965
...,T2_SHUFFLE 9485 *

NO_ASM 10311
NO_ASM 10296
...,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE 9594
...,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_WIDTH 9626
...,T2_SHUFFLE_WIDTH,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_MIDDLE 9878
...,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE 9518
...,T2_SHUFFLE_WIDTH,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_HEIGHT,SHUFFLE_REVERSELINE 9484

correct to workingout1a
NO_ASM 10296
NO_ASM 10299
...,CARRY32 9289 *
...,CARRY64 9443

NO_ASM 10301
NO_ASM 10294
...,FANCY_MIDDLEMUL1 error no middlemul1 for the 48M fft
...,MORE_SQUARES_MIDDLEMUL1 9274 *
...,CHEBYSHEV_METHOD EE
...,CHEBYSHEV_METHOD_FMA EE
...,ORIGINAL_METHOD 9290
...,ORIGINAL_TWEAKED 9288

NO_ASM 10310
NO_ASM 10304
...,ORIG_MIDDLEMUL2 9277 *
...,CHEBYSHEV_MIDDLEMUL2 EE

NO_ASM 10288
NO_ASM 10286
...,ORIG_SLOWTRIG 9438
...,NEW_SLOWTRIG 9278
...,MORE_ACCURATE 9278
...,LESS_ACCURATE 9230 *

NO_ASM,MERGED_MIDDLE,WORKINGIN1A,WORKINGOUT1A,UNROLL_HEIGHT,T2_SHUFFLE,CARRY32,MORE_SQUARES_MIDDLEMUL1,ORIG_MIDDLEMUL2,LESS_ACCURATE

gain from tuning
10305/9230 = ~1.1165
without -time, it's a bit faster, ~9161us/it[/CODE]This yields an estimated run time of about 3 months (90.3 days to be precise). An exponent at the upper limit of mersenne.org would take ~4.2 months.
The same GPU under the same conditions, but with a different tune for the 4.5M FFT, produced a matching PRP DC on its first attempt.
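
The run-time estimate and the tuning gain follow from simple arithmetic on the numbers in the table. A minimal sketch, assuming a PRP test needs roughly one squaring per bit of the exponent; the function names are hypothetical.

```python
def prp_days(exponent, us_per_iteration):
    """Estimated PRP run time in days.

    Assumption: about `exponent` iterations are needed in total.
    """
    seconds = exponent * us_per_iteration * 1e-6
    return seconds / 86400


def speedup(before_us, after_us):
    """Ratio of per-iteration times before and after tuning."""
    return before_us / after_us


if __name__ == "__main__":
    print(f"{prp_days(852348659, 9161):.2f} days")
    print(f"{speedup(10305, 9230):.4f}x from tuning")
```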

kriesel 2020-02-17 18:24

gpuowl-win v6.11-147-g3b8b00e build
 
Just built, only -h run so far.

kriesel 2020-02-18 00:18

memory allocation error in v6.11-147
 
[CODE]C:\Users\User\Documents\gpuowl-v6.11-147\radeonvii>gpuowl-win
2020-02-17 18:14:27 gpuowl v6.11-147-g3b8b00e
2020-02-17 18:14:27 config: -device 1 -user kriesel -cpu asrock/radeonvii -yield -maxAlloc 155000 -use NO_ASM,MERGED_MIDDLE,UNRO
LL_HEIGHT,T2_SHUFFLE,CARRY32,MORE_SQUARES_MIDDLEMUL1,ORIG_MIDDLEMUL2,LESS_ACCURATE
2020-02-17 18:14:27 config:

2020-02-17 18:14:28 device 1, unique id ''
2020-02-17 18:14:28 asrock/radeonvii 24000577 FFT 1280K: Width 8x8, Height 256x4, Middle 10; 18.31 bits/word
2020-02-17 18:14:29 asrock/radeonvii OpenCL args "-DEXP=24000577u -DWIDTH=64u -DSMALL_HEIGHT=1024u -DMIDDLE=10u -DWEIGHT_STEP=0x
c.e5beac96a0b88p-3 -DIWEIGHT_STEP=0x9.eca8ba4660afp-4 -DWEIGHT_BIGSTEP=0xe.ac0c6e7dd2438p-3 -DIWEIGHT_BIGSTEP=0x8.b95c1e3ea8bd8p
-4 -DPM1=1 -DAMDGPU=1 -DCARRY32=1 -DLESS_ACCURATE=1 -DMERGED_MIDDLE=1 -DMORE_SQUARES_MIDDLEMUL1=1 -DNO_ASM=1 -DORIG_MIDDLEMUL2=1
-DT2_SHUFFLE=1 -DUNROLL_HEIGHT=1 -cl-fast-relaxed-math -cl-std=CL2.0"
2020-02-17 18:14:41 asrock/radeonvii OpenCL compilation in 11.44 s
2020-02-17 18:14:41 asrock/radeonvii 24000577 P1 B1=300000, B2=9000000; 432351 bits; starting at 432350
2020-02-17 18:14:41 asrock/radeonvii 24000577 P1 432351 100.00%; 84768 us/it; ETA 0d 00:00; 55a8d888497469ec
2020-02-17 18:14:41 asrock/radeonvii P-1 (B1=300000, B2=9000000, D=30030): primes 576492, expanded 615799, doubles 105850 (left
373895), singles 364792, total 470642 (82%)
2020-02-17 18:14:41 asrock/radeonvii 24000577 P2 using blocks [10 - 300] to cover 470642 primes
GNU MP: Cannot allocate memory (size=164912)

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
2020-02-17 18:14:42 asrock/radeonvii 24000577 P2 using 1440 buffers of 10.0 MB each[/CODE]It was able to complete after -use was reduced to merely NO_ASM. But it's still missing the 15M test factor.[CODE]
{"exponent":"15000031", "worktype":"PM1", "status":"[B]NF[/B]", "program":{"name":"gpuowl", "version":"v6.11-147-g3b8b00e"}, "timestamp":"2020-02-17 23:16:22 UTC", "user":"kriesel", "computer":"asrock/radeonvii", "fft-length":786432, "B1":180000, "B2":3780000}[/CODE]The 81M test exponent, however, also fails with the "Cannot allocate memory" error (size=2637840).
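
For context on why a known-factor exponent makes a good test: P-1 stage 1 must find any factor q of 2^p - 1 for which q - 1 divides 2p times the product of all prime powers up to B1. The sketch below is a toy stage 1 in Python, run on M67, whose factor 193707721 satisfies q - 1 = 2^3 * 3^3 * 5 * 67 * 2677. It is not gpuowl's implementation: stage 2 is omitted, the base 3 is a conventional choice assumed here, and the function names are hypothetical.

```python
from math import gcd


def primes_up_to(n):
    """Sieve of Eratosthenes; returns all primes <= n (n >= 2)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def pm1_stage1(p, b1):
    """Toy P-1 stage 1 on M_p = 2**p - 1; returns gcd(3**E - 1, M_p).

    E = 2 * p * (product of the largest power of each prime <= b1).
    The factor 2p is included because every factor q of M_p has q = 1 mod 2p.
    """
    m = (1 << p) - 1
    x = pow(3, 2 * p, m)
    for q in primes_up_to(b1):
        qk = q
        while qk * q <= b1:  # largest power of q not exceeding b1
            qk *= q
        x = pow(x, qk, m)
    return gcd(x - 1, m)


if __name__ == "__main__":
    g = pm1_stage1(67, 3000)
    print("gcd:", g, "divisible by 193707721:", g % 193707721 == 0)
```

With B1 = 3000 every prime power dividing q - 1 is covered, so the gcd is guaranteed to be a multiple of 193707721. An "NF" result on an exponent whose known factor is within the chosen bounds therefore points to a problem in the run rather than bad luck.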

preda 2020-02-18 11:32

[QUOTE=kriesel;537797]2020-02-17 18:14:27 config: -device 1 -user kriesel -cpu asrock/radeonvii -yield -maxAlloc 155000 -use NO_ASM,MERGED_MIDDLE,UNRO
LL_HEIGHT,T2_SHUFFLE,CARRY32,MORE_SQUARES_MIDDLEMUL1,ORIG_MIDDLEMUL2,LESS_ACCURATE
[/QUOTE]

You realize you're using an unrealistically large maxAlloc; I don't know if this is what's causing the mem alloc error.

I fixed the auto FFT size for P-1.

kriesel 2020-02-18 14:50

[QUOTE=preda;537830]You realize you're using an unrealistically large maxAlloc; I don't know if this is what's causing the mem alloc error.

I fixed the auto FFT size for P-1.[/QUOTE]
I ran into trouble at 16000. 155 was supposed to be a reduction. And 1440 x 10 should fit within 15500 or 16000.

axn 2020-02-18 17:20

[QUOTE=kriesel;537841]I ran into trouble at 16000. 155 was supposed to be a reduction. And 1440 x 10 should fit within 15500 or 16000.[/QUOTE]

You did notice the extra 0, right?
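
The exchange above comes down to simple arithmetic. A minimal sketch, assuming -maxAlloc is in MB and taking the Radeon VII's 16 GB of memory as the card capacity; the function and field names are hypothetical.

```python
RADEON_VII_MIB = 16 * 1024  # a Radeon VII carries 16 GB of HBM2


def check_max_alloc(max_alloc_mib, n_buffers, buffer_mib, card_mib=RADEON_VII_MIB):
    """Compare the stage 2 buffer demand with the limit and with the card."""
    needed = n_buffers * buffer_mib
    return {
        "needed_mib": needed,
        "fits_in_limit": needed <= max_alloc_mib,
        "limit_exceeds_card": max_alloc_mib > card_mib,
    }


if __name__ == "__main__":
    # Values from the posted config line and P2 log line.
    for limit in (155000, 15500, 16000):
        print(limit, check_max_alloc(limit, 1440, 10))
```

The 1440 buffers of 10 MB need 14400 MB, which fits under either 15500 or 16000, but the posted config says 155000, a limit far beyond what the card physically has.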


All times are UTC.
