[QUOTE=kriesel;561028]It's repeatable in V7.0-35, on both error looping and normally running worktodo lines.
[/QUOTE] Thanks, I'll need to look into why STATS fails for large exponents. |
Proof validation
For those who can, it may be a good idea to use -proof 9, which enables validation of the proof. The validation costs about 0.2%, which is small (on the order of 2-3 minutes on R7), and it makes sure that the proof is good before beaming it up to the server.
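As a rough sanity check of those two figures, here is a minimal sketch. It assumes the 0.2% is measured against the total test time, which the post does not state explicitly; the function name is made up for illustration.

```python
def implied_test_hours(validation_minutes: float,
                       overhead_fraction: float = 0.002) -> float:
    """Total test time in hours, ASSUMING validation is `overhead_fraction` of it."""
    return validation_minutes / overhead_fraction / 60.0


# The post quotes 2-3 minutes of validation on a Radeon VII.
for minutes in (2.0, 3.0):
    hours = implied_test_hours(minutes)
    print(f"{minutes:.0f} min of validation -> about {hours:.1f} h of testing")
```

Under that assumption, 2-3 minutes corresponds to a test lasting somewhere under or around a day.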
|
FYI
[QUOTE=preda;561057]Thanks, I'll need to look into why STATS fails for large exponents.[/QUOTE]Some quick -use STATS tries on the results of existing runs, supplemented by a rough binary search with new worktodo lines:
123M PRP, v7.0-40, ok
150M PRP, V6.11-364, ok
177.8M LL, V6.11-364, ok
181M LL, V6.11-380, ok
190M LL, V6.11-364, ok
320M PRP, V7.0-40, ok
480M PRP, V7.1-1, ok
554M PRP, V7.1-1, ok
558M PRP/P-1, V7.1-1, ok
560M PRP/P-1, V7.1-1, ok
561M PRP/P-1, V7.1-1, out of resources error
562.6M PRP/P-1, V7.1-1, out of resources error
600M PRP/P-1, V7.1-1, out of resources error
642M PRP, V6.11-364, out of resources error
764M PRP, V6.11-364, out of resources error
(previously reported below:)
843M PRP/P-1, V7.0-35, out of resources error
957M PRP/P-1, V7.0-35, out of resources error |
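The rough binary search used to narrow the ok/fail transition can be sketched as follows. This is only an illustration: `find_failure_threshold` and `simulated_run_fails` are hypothetical names, and the simulated predicate stands in for actually launching gpuowl with a new worktodo line. Its cutoff is a placeholder chosen to lie between the 560M (ok) and 561M (failed) results, not a measured value.

```python
from typing import Callable, Tuple


def find_failure_threshold(lo_ok: int, hi_fail: int,
                           fails: Callable[[int], bool],
                           resolution: int = 10_000) -> Tuple[int, int]:
    """Narrow the bracket [lo_ok, hi_fail] until it is at most `resolution` wide.

    Assumes `fails` is monotonic: False up to some exponent, True above it.
    """
    if fails(lo_ok) or not fails(hi_fail):
        raise ValueError("bounds must bracket the ok/fail transition")
    while hi_fail - lo_ok > resolution:
        mid = (lo_ok + hi_fail) // 2
        if fails(mid):
            hi_fail = mid   # transition is at or below mid
        else:
            lo_ok = mid     # transition is above mid
    return lo_ok, hi_fail


def simulated_run_fails(exponent: int) -> bool:
    """Placeholder for a real -use STATS run; the cutoff is hypothetical."""
    return exponent > 560_640_000


lo, hi = find_failure_threshold(123_000_000, 957_000_000, simulated_run_fails)
print(f"last ok <= {lo}, first failure <= {hi}")
```

Each probe costs one program launch, so bracketing between 123M and 957M down to a 10K window takes about 17 probes.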
Gpuowl FFT limits from the help.txt files
v7.1-1:
FFT 30M [ 47.19M - 560.64M] 1K:15:1K 4K:15:256
FFT 32M [ 50.33M - 599.62M] 4K:8:512 4K:4:1K
V6.11-364:
FFT 30M [ 47.19M - 560.64M] 1K:15:1K 4K:15:256
FFT 32M [ 50.33M - 599.62M] 4K:8:512 4K:4:1K
560.63M B1=3200000,B2=140000000;PRP=0,1,2,560630051,-1,83,2 STATS ok at 30M fft, maxalloc 14G, Radeon VII
560.65M B1=3200000,B2=140000000;PRP=0,1,2,560650067,-1,83,2 STATS at 32M fail with OUT_OF_RESOURCES error
560.63M forced to 32M with -fft 32M in config.txt: fail with OUT_OF_RESOURCES error
-use STATS OUT_OF_RESOURCES fatal error appears to relate to fft size 32M or larger |
... and that behavior transition coincides with the change to 4K head, since I was testing with default fft selection.
[CODE]FFT 30M [ 47.19M - 560.64M] 1K:15:1K 4K:15:256
FFT 32M [ 50.33M - 599.62M] 4K:8:512 4K:4:1K
FFT 36M [ 56.62M - 671.04M] 4K:9:512
FFT 40M [ 62.91M - 743.74M] 4K:10:512 4K:5:1K
FFT 44M [ 69.21M - 816.39M] 4K:11:512
FFT 48M [ 75.50M - 889.11M] 4K:12:512 4K:6:1K
FFT 52M [ 81.79M - 961.97M] 4K:13:512
FFT 56M [ 88.08M - 1033.20M] 4K:14:512 4K:7:1K
FFT 60M [ 94.37M - 1103.74M] 4K:15:512
FFT 64M [100.66M - 1177.31M] 4K:8:1K
FFT 72M [113.25M - 1321.02M] 4K:9:1K
FFT 80M [125.83M - 1464.31M] 4K:10:1K
FFT 88M [138.41M - 1607.03M] 4K:11:1K
FFT 96M [150.99M - 1751.79M] 4K:12:1K
FFT 104M [163.58M - 1893.52M] 4K:13:1K
FFT 112M [176.16M - 2035.14M] 4K:14:1K
FFT 120M [188.74M - 2172.36M] 4K:15:1K[/CODE]Looking at the fft descriptors tested for [URL]https://mersenneforum.org/showpost.php?p=561090&postcount=146[/URL]
123M 1K:13:256
150M 1K:8:512
177M, 181M, 190M 1K:10:512
320M 1K:9:1K
480M 1K:13:1K
554M, 556M, 560M 1K:15:1K
1K head; 8, 9, 10, 13, 15 middle; 256, 512, or 1K tail combinations tried ok;
561M, 562.6M 4K:8:512
600M, 642M 4K:9:512
764M 4K:11:512
843M 4K:12:512
957M 4K:13:512
4K head; 8, 9, 11, 12 or 13 middle; 512 tail combinations tried failed.
The common factor seems to be related to the 4K head. Some middles fall in both lists, as does the 512 tail. That would mean all fft lengths from 32M to 120M could have the -use STATS OUT_OF_RESOURCES issue.
(edit) But it also means some lower than 32M could have it.
And this is confirmed by running a single test on the same exponent that was ok with the default fft, with 4K:15:256:[CODE]2020-10-26 12:16:31 gpuowl v7.1-1-g0f73d04
2020-10-26 12:16:31 config: -user kriesel -cpu asr2/radeonvii0 -d 1 -maxAlloc 14G -proof 9 -log 100000 -use NO_ASM,STATS -fft 4K:15:256
2020-10-26 12:16:31 device 1, unique id ''
2020-10-26 12:16:31 asr2/radeonvii0 560630051 FFT: 30M 4K:15:256 (17.82 bpw)
2020-10-26 12:16:36 asr2/radeonvii0 560630051 OpenCL args "-DEXP=560630051u -DWIDTH=4096u -DSMALL_HEIGHT=256u -DMIDDLE=15u -DAMDGPU=1 -DCARRY64=1 -DCARRYM64=1 -DMM_CHAIN=3u -DMM2_CHAIN=3u -DMAX_ACCURACY=1 -DULTRA_TRIG=1 -DWEIGHT_STEP_MINUS_1=0x8.681b5a84b24dp-6 -DIWEIGHT_STEP_MINUS_1=-0xe.dc7abc0d7b388p-7 -DNO_ASM=1 -DSTATS=1 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-10-26 12:16:41 asr2/radeonvii0 560630051 OpenCL compilation in 4.72 s
2020-10-26 12:16:43 asr2/radeonvii0 560630051 maxAlloc: 14.0 GB
2020-10-26 12:16:43 asr2/radeonvii0 560630051 P1(3.2M) 4617012 bits
2020-10-26 12:16:44 asr2/radeonvii0 560630051 Acquired memory lock 'memlock-1'
2020-10-26 12:16:44 asr2/radeonvii0 560630051 P1(3.2M) using 100 buffers
2020-10-26 12:16:46 asr2/radeonvii0 560630051 P1(3.2M) releasing 100 buffers
2020-10-26 12:16:46 asr2/radeonvii0 560630051 Released memory lock 'memlock-1'
2020-10-26 12:16:47 asr2/radeonvii0 Exception gpu_error: OUT_OF_RESOURCES carryFused at clwrap.cpp:325 run
2020-10-26 12:16:47 asr2/radeonvii0 Bye[/CODE](end edit)
It appears from extrapolation of FFT lengths' exponent limits that a modified gpuowl to attack P-1 or testing of F33 would require at least 480M length (perhaps as 4K:15:4K), and that would benefit from -use STATS checks being available both in development and end use. |
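For anyone wanting to tabulate descriptors the same way, here is a small sketch that turns a head:middle:tail descriptor into an FFT length and a bits-per-word figure. The relation length = 2 x head x middle x tail is inferred from the log lines in this thread (for example 4K:15:256 reported as 30M at 17.82 bpw), not taken from gpuowl's source, and all function names are made up. The F33 line takes 2**33 as the exponent, since F33 = 2^(2^33) + 1.

```python
SUFFIX = {"K": 1024, "M": 1024 * 1024}


def parse_size(text: str) -> int:
    """Turn '4K', '1K' or '512' into an integer."""
    if text[-1] in SUFFIX:
        return int(text[:-1]) * SUFFIX[text[-1]]
    return int(text)


def fft_words(descriptor: str) -> int:
    """FFT length in words for a 'head:middle:tail' descriptor.

    The factor of 2 is an inference from the logged sizes, not a documented rule.
    """
    head, middle, tail = (parse_size(part) for part in descriptor.split(":"))
    return 2 * head * middle * tail


def bits_per_word(exponent: int, descriptor: str) -> float:
    return exponent / fft_words(descriptor)


def has_4k_head(descriptor: str) -> bool:
    """True for the descriptors that showed the -use STATS failure in this thread."""
    return parse_size(descriptor.split(":")[0]) == 4096


cases = [
    (560630051, "1K:15:1K"),   # default selection, STATS ok
    (560630051, "4K:15:256"),  # forced, STATS failed
    (2 ** 33, "4K:15:4K"),     # the F33 guess from the post
]
for exponent, descriptor in cases:
    size_m = fft_words(descriptor) // 2 ** 20
    bpw = bits_per_word(exponent, descriptor)
    print(f"{exponent} {descriptor}: {size_m}M, {bpw:.2f} bpw, "
          f"4K head: {has_4k_head(descriptor)}")
```

The 4K:15:4K guess comes out at 480M, matching the length given in the post, at roughly 17 bits per word.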
Assertion failed on large P-1
Same asr2 system as in some previous reports, gpuowl-win 7.1-1
[CODE]2020-10-28 10:15:50 asr2/radeonvii0 937156667 P1 Jacobi OK @ 12451200 d8514e45158c1e7d
2020-10-28 10:16:00 asr2/radeonvii0 937156667 OK 12500800 1.33% 9f520cdb7c997859 16645 us/it + check 6.66s + save 2.89s; ETA 178d 03:15
2020-10-28 10:16:00 asr2/radeonvii0 937156667 P2(8630000,258.9M) D=210, nBuf=22
Assertion failed: nBuf >= minBufsFor(D), file Pm1Plan.cpp, line 154[/CODE]The issue was repeatable on application restart, which is not surprising. It looks like it occurred at the onset of P2. Resolved, at least for now, by changing -maxAlloc from 14G to 15G. Larger exponents are likely to run into trouble.
The "Assertion failed" line is not present in gpuowl.log; it was captured from the console window.
If practical, it would be useful to have minBufs as a user option, or to switch to the next worktodo line when a roadblock is hit, or both. |
v7.1-11
How does one actually attempt to use the 2xSP?
There's nothing in the help output about it, or readme. (It seems a bit early to expect it to be automatically selecting depending on gpu model...) |
[QUOTE=kriesel;561343]Same asr2 system as in some previous reports, gpuowl-win 7.1-1
[CODE]2020-10-28 10:15:50 asr2/radeonvii0 937156667 P1 Jacobi OK @ 12451200 d8514e45158c1e7d
2020-10-28 10:16:00 asr2/radeonvii0 937156667 OK 12500800 1.33% 9f520cdb7c997859 16645 us/it + check 6.66s + save 2.89s; ETA 178d 03:15
2020-10-28 10:16:00 asr2/radeonvii0 937156667 P2(8630000,258.9M) D=210, nBuf=22
Assertion failed: nBuf >= minBufsFor(D), file Pm1Plan.cpp, line 154[/CODE]The issue was repeatable on application restart, which is not surprising. It looks like it occurred at the onset of P2. Resolved, at least for now, by changing -maxAlloc from 14G to 15G. Larger exponents are likely to run into trouble.
The "Assertion failed" line is not present in gpuowl.log; it was captured from the console window.
If practical, it would be useful to have minBufs as a user option, or to switch to the next worktodo line when a roadblock is hit, or both.[/QUOTE]
P2 needs at least 24 buffers. As the exponent grows, the buffer size grows, and this minimum may not be met, depending on the -maxAlloc allowed. I do not plan to fix this ATM; let's simply say that if not enough GPU memory is available, then huge exponents can't run P2. |
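The interaction between -maxAlloc, buffer size and the 24-buffer minimum can be illustrated with a toy calculation. Only the minimum of 24 comes from the reply above. The per-buffer size and the reserved amount below are hypothetical values, picked solely so that the toy model gives nBuf=22 at 14G as in the log; this is not gpuowl's actual memory accounting, and the function names are made up.

```python
MIN_P2_BUFFERS = 24  # minimum stated in the reply above


def p2_buffers(max_alloc_mib: int, buffer_mib: int, reserved_mib: int) -> int:
    """Whole buffers that fit after setting aside `reserved_mib` (toy model)."""
    return max(0, (max_alloc_mib - reserved_mib) // buffer_mib)


def p2_can_run(max_alloc_mib: int, buffer_mib: int, reserved_mib: int) -> bool:
    return p2_buffers(max_alloc_mib, buffer_mib, reserved_mib) >= MIN_P2_BUFFERS


# HYPOTHETICAL inputs, chosen only to mirror nBuf=22 at -maxAlloc 14G.
BUFFER_MIB = 416
RESERVED_MIB = 5000

for max_alloc_gib in (14, 15):
    n = p2_buffers(max_alloc_gib * 1024, BUFFER_MIB, RESERVED_MIB)
    verdict = "ok" if n >= MIN_P2_BUFFERS else "too few for P2"
    print(f"-maxAlloc {max_alloc_gib}G: {n} buffers ({verdict})")
```

With these made-up inputs the model also clears the minimum at 15G, which is at least consistent with the workaround reported in the quoted post.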
[QUOTE=kriesel;561352]How does one actually attempt to use the 2xSP?
There's nothing in the help output about it, or readme. (It seems a bit early to expect it to be automatically selecting depending on gpu model...)[/QUOTE] No, 2xSP is an experiment, can't be used for anything yet. Still a long way to go. (I was just measuring the precision that can be achieved *if* it was implemented) |
2 things:
V7.1-11 P-1 hiccup, so says Bye. (Sure would be nice if it would progress to the next worktodo entry instead of quitting.) [CODE]2020-10-30 13:18:44 GpuOwl VERSION v7.1-11-g97cfbd2
2020-10-30 13:18:44 config: -user kriesel -cpu asr2/radeonvii2 -d 2 -maxAlloc 15G -proof 9 -log 100000 -use NO_ASM
2020-10-30 13:18:44 device 2, unique id ''
2020-10-30 13:18:44 asr2/radeonvii2 153021377 FFT: 8M 1K:8:512 (18.24 bpw)
2020-10-30 13:18:46 asr2/radeonvii2 153021377 OpenCL args "-DEXP=153021377u -DWIDTH=1024u -DSMALL_HEIGHT=512u -DMIDDLE=8u -DAMDGPU=1 -DCARRY64=1 -DCARRYM64=1 -DMM_CHAIN=1u -DMM2_CHAIN=1u -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0xb.10feac5431868p-4 -DIWEIGHT_STEP_MINUS_1=-0xd.156361ac01fe8p-5 -DNO_ASM=1 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-10-30 13:18:53 asr2/radeonvii2 153021377 OpenCL compilation in 7.71 s
2020-10-30 13:18:54 asr2/radeonvii2 153021377 maxAlloc: 15.0 GB
2020-10-30 13:18:54 asr2/radeonvii2 153021377 P1(1M) 1442134 bits
2020-10-30 13:18:54 asr2/radeonvii2 153021377 PRP starting from beginning
2020-10-30 13:18:54 asr2/radeonvii2 153021377 Acquired memory lock 'memlock-2'
2020-10-30 13:18:54 asr2/radeonvii2 153021377 P1(1M) using 460 buffers
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [0] 2d87ce26 != fffffffb
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [1] 7581b6da != 00000019
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [2] efca9779 != ffffff83
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [3] 03fe031c != 00000271
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [4] 21d014f0 != fffff3cb
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [5] 2100996a != 00003d09
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [6] 18280ed1 != fffeced3
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [7] fdd2f6a2 != 0005f5e1
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [8] 563a16a6 != ffe2329b
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [9] 1c97ee6a != 009502f9
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [10] 44fb8d20 != fd16f123
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [11] 0de14ac0 != 0e8d4a51
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [12] fd818931 != b73d8c6b
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [13] 058c6909 != 6bcc41e9
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [14] c4dfa66e != e502b673
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [15] 8739ef6f != 86f26fc1
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [16] 3f8bbf6e != 5d43d13b
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [17] 19ad23d9 != 2dace9d9
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [18] 6b9d9ac6 != 1b9f6ec3
2020-10-30 13:18:58 asr2/radeonvii2 153021377 [19] 729bd6ae != 75e2d631
2020-10-30 13:18:58 asr2/radeonvii2 153021377 fold() does not roundtrip
2020-10-30 13:18:58 asr2/radeonvii2 153021377 P1(1M) releasing 460 buffers
2020-10-30 13:18:59 asr2/radeonvii2 153021377 Released memory lock 'memlock-2'
2020-10-30 13:18:59 asr2/radeonvii2 Exiting because "fold roundtrip"
2020-10-30 13:18:59 asr2/radeonvii2 Bye[/CODE]A 623M assignment had no such issue at 623466917 FFT: 36M 4K:9:512 (16.52 bpw). Maybe 8M fft on 153M P-1&PRP is a bit optimistic for a default, at 18.24 bits/word? It launches ok if forced to 9M, 16.21 bpw.
It's confirmed by quick test that the -use STATS issue for 4K fft head extends down to the minimum length that offers it, 6M [CODE]2020-10-30 12:52:38 gpuowl v6.11-364-g36f4e2a
2020-10-30 12:52:38 config: -user kriesel -cpu asr2/radeonvii -d 1 -maxAlloc 15000 -use NO_ASM,STATS -fft 4K:3:256
2020-10-30 12:52:38 device 1, unique id ''
2020-10-30 12:52:38 asr2/radeonvii 100759339 FFT: 6M 4K:3:256 (16.02 bpw)
2020-10-30 12:52:38 asr2/radeonvii Expected maximum carry32: 12AD0000
2020-10-30 12:52:39 asr2/radeonvii OpenCL args "-DEXP=100759339u -DWIDTH=4096u -DSMALL_HEIGHT=256u -DMIDDLE=3u -DPM1=1 -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0xf.a9c658667f95p-4 -DIWEIGHT_STEP_MINUS_1=-0xf.d46dc4b3339dp-5 -DNO_ASM=1 -DSTATS=1 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-10-30 12:52:44 asr2/radeonvii OpenCL compilation in 4.50 s
2020-10-30 12:52:45 asr2/radeonvii 100759339 P1 B1=1000000, B2=30000000; 1442134 bits; starting at 1001
2020-10-30 12:52:45 asr2/radeonvii Exception gpu_error: OUT_OF_RESOURCES carryFused at clwrap.cpp:325 run
2020-10-30 12:52:45 asr2/radeonvii Bye
2020-10-30 12:53:07 gpuowl v6.11-364-g36f4e2a
2020-10-30 12:53:07 config: -user kriesel -cpu asr2/radeonvii -d 1 -maxAlloc 15000 -use NO_ASM -fft 4K:3:256
2020-10-30 12:53:07 device 1, unique id ''
2020-10-30 12:53:07 asr2/radeonvii 100759339 FFT: 6M 4K:3:256 (16.02 bpw)
2020-10-30 12:53:07 asr2/radeonvii Expected maximum carry32: 12AD0000
2020-10-30 12:53:09 asr2/radeonvii OpenCL args "-DEXP=100759339u -DWIDTH=4096u -DSMALL_HEIGHT=256u -DMIDDLE=3u -DPM1=1 -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0xf.a9c658667f95p-4 -DIWEIGHT_STEP_MINUS_1=-0xf.d46dc4b3339dp-5 -DNO_ASM=1 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-10-30 12:53:13 asr2/radeonvii OpenCL compilation in 4.26 s
2020-10-30 12:53:14 asr2/radeonvii 100759339 P1 B1=1000000, B2=30000000; 1442134 bits; starting at 1001
2020-10-30 12:53:24 asr2/radeonvii 100759339 P1 10000 0.69%; 1161 us/it; ETA 0d 00:28; 849bb19b9a9f4ce3
2020-10-30 12:53:36 asr2/radeonvii 100759339 P1 20000 1.39%; 1158 us/it; ETA 0d 00:27; 556f71d2a8cf201c[/CODE] |
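Comparing runs like the two above by eye gets tedious, so here is a minimal sketch of pulling the relevant fields out of a log excerpt. The regular expressions are written against the line shapes visible in this thread only. They are an assumption about the log format, not a documented interface, and `summarize` is a made-up helper name.

```python
import re

# Patterns modelled on the log lines quoted in this thread (assumed format).
FFT_RE = re.compile(r"FFT: (\S+) (\S+) \(([\d.]+) bpw\)")
USE_RE = re.compile(r"-use (\S+)")  # '-user' does not match: no space after '-use'
ERR_RE = re.compile(r"Exception gpu_error: (\w+) (\w+)")


def summarize(log_text: str) -> dict:
    """Collect the FFT descriptor, the -use flags and any gpu_error from one run."""
    summary = {"fft": None, "use": [], "error": None}
    for line in log_text.splitlines():
        fft = FFT_RE.search(line)
        if fft:
            summary["fft"] = fft.group(2)
        if "config:" in line:
            use = USE_RE.search(line)
            if use:
                summary["use"] = use.group(1).split(",")
        err = ERR_RE.search(line)
        if err:
            summary["error"] = (err.group(1), err.group(2))
    return summary


SAMPLE = """\
2020-10-30 12:52:38 config: -user kriesel -cpu asr2/radeonvii -d 1 -maxAlloc 15000 -use NO_ASM,STATS -fft 4K:3:256
2020-10-30 12:52:38 asr2/radeonvii 100759339 FFT: 6M 4K:3:256 (16.02 bpw)
2020-10-30 12:52:45 asr2/radeonvii Exception gpu_error: OUT_OF_RESOURCES carryFused at clwrap.cpp:325 run
"""

print(summarize(SAMPLE))
```

Running it over each excerpt makes it easy to line up the descriptor, the STATS flag and the error in one table.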
v7.2
Please upgrade to v7.2 which fixes a proof generation bug.
|