[QUOTE=Prime95;537629]@Ernst: The GCN timings doc is here -- [url]https://github.com/CLRX/CLRX-mirror/wiki/GcnTimings[/url]
ROE error checking would be slower but it would be a useful debugging option to sanity check FFT length selections.[/QUOTE] Thanks, George - you'll need to let me know if I'm reading that correctly: I see a V_RNDNE_F64 instruction with latency DPFACTOR*4 = 8 cycles on Radeon. All other things being equal, that equals the 4+4 cycle latency needed for the DNINT(x) = (x + c) - c "hand-rolled round" alternative. In practice, other operations (e.g. computing DWT weights) can be interleaved with the round to help hide the latency. If Mihai could add ROE checking to just the carry step used in the current p-1 stage, that would be great - even if a GEC-enhanced p-1 is coming down the pike, it's always useful to have multiple checks, to catch both FFT-length-related errors and "other" ones. On flaky hardware (in my case my aging Haswell quad), I've found the sudden emission of fatal ROEs, nonreproducible on interval-retry, to be a reliable indicator of upcoming system-needs-rebootness. |
[QUOTE=ewmayer;537652]All other things being equal, that equals the 4+4 cycle latency needed for the DNINT(x) = (x + c) - c "hand-rolled round" alternative.[/QUOTE]
Yes, and you cannot get OpenCL to generate a V_RNDNE_F64 instruction unless you resort to __asm syntax. |
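The (x + c) - c "hand-rolled round" discussed above can be sketched as follows. This is only a minimal Python illustration, not code from gpuowl or Mlucas: the function names and the choice c = 1.5 * 2^52 are assumptions made for the example. It relies on IEEE-754 doubles in the default round-to-nearest-even mode and is only valid for |x| < 2^51.

```python
# Illustrative sketch only; the names and the constant are assumptions,
# not taken from gpuowl or Mlucas.

# 1.5 * 2**52 = 6755399441055744.0. Doubles in [2**52, 2**53) are spaced
# exactly 1.0 apart, so adding this constant forces the hardware to round
# the sum to an integer (ties go to the even neighbour).
RND_CONST = 1.5 * 2.0**52


def dnint(x):
    """Round x to the nearest integer via (x + c) - c.

    Valid only for |x| < 2**51, so that x + c stays inside [2**52, 2**53).
    """
    return (x + RND_CONST) - RND_CONST


def round_with_roe(x):
    """Return (rounded value, roundoff error |x - rounded|)."""
    r = dnint(x)
    return r, abs(x - r)


if __name__ == "__main__":
    for value in (2.5, 3.5, -1.2, 7.25):
        print(value, round_with_roe(value))
```

An ROE check in a carry step would compare the second return value against some threshold below 0.5; the exact threshold is program-specific and is not stated in this thread.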
small detail
[CODE]2020-02-16 20:01:42 asrock/radeonvii 4444091 FFT 224K: Width 8x8, Height 64x4, Middle 7; 19.37 bits/word
2020-02-16 20:01:42 asrock/radeonvii OpenCL args "-DEXP=4444091u -DWIDTH=64u -DSMALL_HEIGHT=256u -DMIDDLE=7u -DWEIGHT_STEP=0xc.571b3d76085f8p-3 -DIWEIGHT_STEP=0xa.5f5fa9671576p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DAMDGPU=1 -DCHEBYSHEV_METHOD_FMA=1 -DCHEBYSHEV_MIDDLEMUL2=1 -DMERGED_MIDDLE=1 -DMORE_ACCURATE=1 -DNO_ASM=1 -DT2_SHUFFLE_HEIGHT=1 -DT2_SHUFFLE_MIDDLE=1 -DWORKINGIN1A=1 -DWORKINGOUT1A=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-02-16 20:01:43 asrock/radeonvii OpenCL compilation error -11 (args -DEXP=4444091u -DWIDTH=64u -DSMALL_HEIGHT=256u -DMIDDLE=7u -DWEIGHT_STEP=0xc.571b3d76085f8p-3 -DIWEIGHT_STEP=0xa.5f5fa9671576p-4 -DWEIGHT_BIGSTEP=0xc.5672a115506d8p-3 -DIWEIGHT_BIGSTEP=0xa.5fed6a9b15138p-4 -DAMDGPU=1 -DCHEBYSHEV_METHOD_FMA=1 -DCHEBYSHEV_MIDDLEMUL2=1 -DMERGED_MIDDLE=1 -DMORE_ACCURATE=1 -DNO_ASM=1 -DT2_SHUFFLE_HEIGHT=1 -DT2_SHUFFLE_MIDDLE=1 -DWORKINGIN1A=1 -DWORKINGOUT1A=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0 -DNO_ASM=1)
2020-02-16 20:01:43 asrock/radeonvii C:\Users\User\AppData\Local\Temp\\OCL8000T3.cl:1942:2: error: WORKINGOUT1 not compatible with this FFT size
#error [B]WORKINGOUT1[/B] not compatible with this FFT size
 ^
1 error generated.
error: Clang front-end compilation failed!
Frontend phase failed compilation.
Error: Compiling CL to IR[/CODE]
Seems like that error message should refer to WORKINGOUT1[B]A[/B]. Is there a mapping of which FFT lengths are supported by the various -use options? |
[QUOTE=kriesel;537643]P4 [B]0/7611 MiB[/B]
Stay tuned for K80, haven't got one in a while.[/QUOTE] [B]K80 0/11441 MiB[/B] |
gpuowl v6.11-134 P-1 memory allocation error
On Win7 x64 with an Asrock motherboard, an Asrock Radeon VII shakedown cruise on known-factor P-1 test candidates hit an error that stopped the show; I found it crashed several hours later.[CODE]C:\Users\User\Documents\gpuowl-v6.11-134\radeonvii>gpuowl-win
2020-02-17 01:28:32 gpuowl v6.11-134-g1e0ce1d
2020-02-17 01:28:32 config: -device 1 -user kriesel -cpu asrock/radeonvii -yield -maxAlloc 16000 -use NO_ASM
2020-02-17 01:28:32 config:
2020-02-17 01:28:32 config: :not compatible with 224K fft: ,WORKINGOUT1A
2020-02-17 01:28:32 config: :best for 4608K: ,MERGED_MIDDLE,WORKINGIN1A,WORKINGOUT1A,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE,CHEBYSHEV_METHOD_FMA,CHEBYSHEV_MIDDLEMUL2,MORE_ACCURATE
2020-02-17 01:28:32 device 1, unique id ''
2020-02-17 01:28:32 asrock/radeonvii 150000713 FFT 8192K: Width 256x8, Height 256x8; 17.88 bits/word
2020-02-17 01:28:32 asrock/radeonvii using long carry kernels
2020-02-17 01:28:36 asrock/radeonvii OpenCL args "-DEXP=150000713u -DWIDTH=2048u -DSMALL_HEIGHT=2048u -DMIDDLE=1u -DWEIGHT_STEP=0x8.af5a78e9513b8p-3 -DIWEIGHT_STEP=0xe.bcf3fa7f78dc8p-4 -DWEIGHT_BIGSTEP=0xe.ac0c6e7dd2438p-3 -DIWEIGHT_BIGSTEP=0x8.b95c1e3ea8bd8p-4 -DAMDGPU=1 -DNO_ASM=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-02-17 01:28:48 asrock/radeonvii OpenCL compilation in 12.10 s
2020-02-17 01:28:49 asrock/radeonvii 150000713 P1 B1=30030, B2=2400000; 43305 bits; starting at 0
2020-02-17 01:29:07 asrock/radeonvii 150000713 P1 10000 23.09%; 1804 us/it; ETA 0d 00:01; 2b8e087d5548b0be
2020-02-17 01:29:25 asrock/radeonvii 150000713 P1 20000 46.18%; 1799 us/it; ETA 0d 00:01; 34138e15eb0339f5
2020-02-17 01:29:43 asrock/radeonvii 150000713 P1 30000 69.28%; 1799 us/it; ETA 0d 00:00; 24160ea83dc597ec
2020-02-17 01:30:01 asrock/radeonvii 150000713 P1 40000 92.37%; 1801 us/it; ETA 0d 00:00; 59711be3f706d044
2020-02-17 01:30:08 asrock/radeonvii saved
2020-02-17 01:30:08 asrock/radeonvii 150000713 P1 43305 100.00%; 2084 us/it; ETA 0d 00:00; 986ed60328d8a1dc
2020-02-17 01:30:08 asrock/radeonvii P-1 (B1=30030, B2=2400000, D=30030): primes 173054, expanded 217731, doubles 36995 (left 102423), singles 99064, total 136059 (79%)
2020-02-17 01:30:08 asrock/radeonvii 150000713 P2 using blocks [1 - 80] to cover 136059 primes
GNU MP: Cannot reallocate memory (old_size=8 new_size=18750104)
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
[/CODE]Then there's a Windows popup containing[CODE]Problem signature:
  Problem Event Name: APPCRASH
  Application Name: gpuowl-win.exe
  Application Version: 0.0.0.0
  Application Timestamp: 00000000
  Fault Module Name: gpuowl-win.exe
  Fault Module Version: 0.0.0.0
  Fault Module Timestamp: 00000000
  Exception Code: 40000015
  Exception Offset: 000000000003eff1
  OS Version: 6.1.7601.2.1.0.256.48
  Locale ID: 1033
  Additional Information 1: 8095
  Additional Information 2: 8095155e3f9bfb3a0fc1b40b27c9d8c8
  Additional Information 3: ab9e
  Additional Information 4: ab9eae2761104200d519e0bef6c90ec9
Read our privacy statement online:
  http://go.microsoft.com/fwlink/?linkid=104288&clcid=0x0409
If the online privacy statement is not available, please read our privacy statement offline:
  C:\Windows\system32\en-US\erofflps.txt
[/CODE]When that's closed, one more line of console output appears from gpuowl:[CODE]2020-02-17 11:07:12 asrock/radeonvii 150000713 P2 using 231 buffers of 64.0 MB each[/CODE] It's repeatable. Will try with the latest commit later. |
Radeon VII 48M FFT tune
[CODE]Asrock Radeon VII on Win7 X64, Asrock 6-pcie motherboard on open air miner frame
gpuowl V6.11-134-g1e0ce1d tune 852348659 PRP, 48M fft
stock settings, no OC or undervolt, etc.
NO_ASM 10309
NO_ASM 10302
UNROLL_ALL 10300
UNROLL_NONE 10096
UNROLL_WIDTH 10097
UNROLL_HEIGHT 10095 *
UNROLL_MIDDLEMUL1 10154
UNROLL_MIDDLEMUL2 10158
WORKINGIN 38066
WORKINGIN 38063
WORKINGIN1 10457
WORKINGIN1A 10038 *
WORKINGIN2 11576
WORKINGIN3 10361
WORKINGIN4 10784
WORKINGIN5 10292
WORKINGOUT 25480
WORKINGOUT0 11645
WORKINGOUT1 10202
WORKINGOUT1A 10116 *
WORKINGOUT2 16205
WORKINGOUT3 10207
WORKINGOUT4 10952
WORKINGOUT5 10750
mistakenly used workingout1 a while...
NO_ASM
NO_ASM 10291
...,UNROLL_WIDTH,UNROLL_HEIGHT 10098 *
...,UNROLL_WIDTH,UNROLL_MIDDLEMUL1 10159
...,UNROLL_HEIGHT,UNROLL_MIDDLEMUL1 10157
...,UNROLL_WIDTH,UNROLL_HEIGHT,UNROLL_MIDDLEMUL1 10157
NO_ASM,MERGED_MIDDLE,WORKINGIN1A,WORKINGOUT1 9961
...,T2_SHUFFLE_WIDTH 9906
...,T2_SHUFFLE_MIDDLE 9879
...,T2_SHUFFLE_HEIGHT 9645
...,T2_SHUFFLE_REVERSELINE 9965
...,T2_SHUFFLE 9485 *
NO_ASM 10311
NO_ASM 10296
...,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_MIDDLE 9594
...,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_WIDTH 9626
...,T2_SHUFFLE_WIDTH,T2_SHUFFLE_REVERSELINE,T2_SHUFFLE_MIDDLE 9878
...,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_HEIGHT,T2_SHUFFLE_REVERSELINE 9518
...,T2_SHUFFLE_WIDTH,T2_SHUFFLE_MIDDLE,T2_SHUFFLE_HEIGHT,SHUFFLE_REVERSELINE 9484
correct to workingout1a
NO_ASM 10296
NO_ASM 10299
...,CARRY32 9289 *
...,CARRY64 9443
NO_ASM 10301
NO_ASM 10294
...,FANCY_MIDDLEMUL1 error no middlemul1 for the 48M fft
...,MORE_SQUARES_MIDDLEMUL1 9274 *
...,CHEBYSHEV_METHOD EE
...,CHEBYSHEV_METHOD_FMA EE
...,ORIGINAL_METHOD 9290
...,ORIGINAL_TWEAKED 9288
NO_ASM 10310
NO_ASM 10304
...,ORIG_MIDDLEMUL2 9277 *
...,CHEBYSHEV_MIDDLEMUL2 EE
NO_ASM 10288
NO_ASM 10286
...,ORIG_SLOWTRIG 9438
...,NEW_SLOWTRIG 9278
...,MORE_ACCURATE 9278
...,LESS_ACCURATE 9230 *
NO_ASM,MERGED_MIDDLE,WORKINGIN1A,WORKINGOUT1A,UNROLL_HEIGHT,T2_SHUFFLE,CARRY32,MORE_SQUARES_MIDDLEMUL1,ORIG_MIDDLEMUL2,LESS_ACCURATE
gain from tuning 10305/9230 = ~1.1165
without -time, it's a bit faster, ~9161us/it[/CODE]This yields an estimated run time of about 3 months (90.3 days, to be precise). An exponent at the upper limit of mersenne.org would take ~4.2 months. The same GPU under the same conditions, but with a different tune for the 4.5M FFT, produced a matching PRP DC on its first attempt. |
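The run-time estimate and the tuning gain quoted above can be reproduced with a few lines of arithmetic. This is a rough sketch, not anything gpuowl computes itself: it assumes a PRP test needs about one iteration per bit of the exponent (so the iteration count is taken to be the exponent itself), and the function names are made up for the example.

```python
# Back-of-the-envelope sketch; function names are hypothetical.
# Assumption: a PRP test of exponent p takes roughly p iterations.

SECONDS_PER_DAY = 86_400


def estimated_days(iterations, us_per_iteration):
    """Convert an iteration count and a per-iteration time (in microseconds) to days."""
    return iterations * us_per_iteration / 1e6 / SECONDS_PER_DAY


def tuning_gain(baseline_us, tuned_us):
    """Ratio of the untuned to the tuned per-iteration time."""
    return baseline_us / tuned_us


if __name__ == "__main__":
    days = estimated_days(852_348_659, 9161)
    print(f"estimated run time: {days:.1f} days")
    print(f"gain from tuning: {tuning_gain(10305, 9230):.4f}")
```

With the figures from the tune (852348659 iterations at ~9161 us/it), this comes out at a little over 90 days, in line with the estimate in the post.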
gpuowl-win v6.11-147-g3b8b00e build
2 Attachment(s)
Just built, only -h run so far.
|
memory allocation error in v6.11-147
[CODE]C:\Users\User\Documents\gpuowl-v6.11-147\radeonvii>gpuowl-win
2020-02-17 18:14:27 gpuowl v6.11-147-g3b8b00e
2020-02-17 18:14:27 config: -device 1 -user kriesel -cpu asrock/radeonvii -yield -maxAlloc 155000 -use NO_ASM,MERGED_MIDDLE,UNROLL_HEIGHT,T2_SHUFFLE,CARRY32,MORE_SQUARES_MIDDLEMUL1,ORIG_MIDDLEMUL2,LESS_ACCURATE
2020-02-17 18:14:27 config:
2020-02-17 18:14:28 device 1, unique id ''
2020-02-17 18:14:28 asrock/radeonvii 24000577 FFT 1280K: Width 8x8, Height 256x4, Middle 10; 18.31 bits/word
2020-02-17 18:14:29 asrock/radeonvii OpenCL args "-DEXP=24000577u -DWIDTH=64u -DSMALL_HEIGHT=1024u -DMIDDLE=10u -DWEIGHT_STEP=0xc.e5beac96a0b88p-3 -DIWEIGHT_STEP=0x9.eca8ba4660afp-4 -DWEIGHT_BIGSTEP=0xe.ac0c6e7dd2438p-3 -DIWEIGHT_BIGSTEP=0x8.b95c1e3ea8bd8p-4 -DPM1=1 -DAMDGPU=1 -DCARRY32=1 -DLESS_ACCURATE=1 -DMERGED_MIDDLE=1 -DMORE_SQUARES_MIDDLEMUL1=1 -DNO_ASM=1 -DORIG_MIDDLEMUL2=1 -DT2_SHUFFLE=1 -DUNROLL_HEIGHT=1 -cl-fast-relaxed-math -cl-std=CL2.0"
2020-02-17 18:14:41 asrock/radeonvii OpenCL compilation in 11.44 s
2020-02-17 18:14:41 asrock/radeonvii 24000577 P1 B1=300000, B2=9000000; 432351 bits; starting at 432350
2020-02-17 18:14:41 asrock/radeonvii 24000577 P1 432351 100.00%; 84768 us/it; ETA 0d 00:00; 55a8d888497469ec
2020-02-17 18:14:41 asrock/radeonvii P-1 (B1=300000, B2=9000000, D=30030): primes 576492, expanded 615799, doubles 105850 (left 373895), singles 364792, total 470642 (82%)
2020-02-17 18:14:41 asrock/radeonvii 24000577 P2 using blocks [10 - 300] to cover 470642 primes
GNU MP: Cannot allocate memory (size=164912)
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
2020-02-17 18:14:42 asrock/radeonvii 24000577 P2 using 1440 buffers of 10.0 MB each[/CODE]It was able to complete after -use was reduced to merely NO_ASM. 
But it's still missing the 15M test factor.[CODE] {"exponent":"15000031", "worktype":"PM1", "status":"[B]NF[/B]", "program":{"name":"gpuowl", "version":"v6.11-147-g3b8b00e"}, "timestamp":"2020-02-17 23:16:22 UTC", "user":"kriesel", "computer":"asrock/radeonvii", "fft-length":786432, "B1":180000, "B2":3780000}[/CODE]And the 81M test exponent also fails, with the "Cannot allocate memory" error (size=2637840). |
[QUOTE=kriesel;537797]2020-02-17 18:14:27 config: -device 1 -user kriesel -cpu asrock/radeonvii -yield -maxAlloc 155000 -use NO_ASM,MERGED_MIDDLE,UNROLL_HEIGHT,T2_SHUFFLE,CARRY32,MORE_SQUARES_MIDDLEMUL1,ORIG_MIDDLEMUL2,LESS_ACCURATE [/QUOTE] You realize you're using an unrealistically large maxAlloc; I don't know if this is what's causing the mem alloc error. I fixed the auto FFT size for P-1. |
[QUOTE=preda;537830]You realize you're using an unrealistically large maxAlloc; I don't know if this is what's causing the mem alloc error.
I fixed the auto FFT size for P-1.[/QUOTE] I ran into trouble at 16000. 155 was supposed to be a reduction. And 1440 x 10 should fit within 15500 or 16000. |
[QUOTE=kriesel;537841]I ran into trouble at 16000. 155 was supposed to be a reduction. And 1440 x 10 should fit within 15500 or 16000.[/QUOTE]
You did notice the extra 0, right? |
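The buffer arithmetic in this exchange can be checked with a short sketch. It assumes, as the discussion implies but does not state outright, that -maxAlloc is given in MB and is compared against the total size of the P2 buffers; the helper names are hypothetical and this is not how gpuowl itself sizes its buffers.

```python
# Sanity-check sketch; helper names are hypothetical.
# Assumption: -maxAlloc is a limit in MB on the total P2 buffer allocation.


def total_buffer_mb(buffer_count, buffer_size_mb):
    """Total memory taken by equally sized buffers, in MB."""
    return buffer_count * buffer_size_mb


def fits(buffer_count, buffer_size_mb, max_alloc_mb):
    """True if the buffers fit within the given limit."""
    return total_buffer_mb(buffer_count, buffer_size_mb) <= max_alloc_mb


if __name__ == "__main__":
    # Buffer counts and sizes reported in the two logs in this thread.
    print(total_buffer_mb(1440, 10.0), fits(1440, 10.0, 15500))
    print(total_buffer_mb(231, 64.0), fits(231, 64.0, 16000))
    # The value actually passed was 155000, ten times the intended 15500.
    print(155000 / 15500)
```

Both reported buffer sets (1440 x 10 MB and 231 x 64 MB) come in under 15500 MB, which is consistent with the point that the intended limit would have been adequate and that 155000 carries an extra zero.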