[QUOTE=kriesel;516704]...
Not sure why 1398269 reliably fails and 3021377 correctly runs to completion. Haven't tried P-1. ...[/QUOTE]
I had a similar problem with a similar exponent where it failed on its preferred FFT of 72K and 80K but worked on 128K. Below is an example of an exponent that works on its preferred FFT of 64K and 128K but throws "error on load" for 72K and 80K.
[code]2019-05-19 14:07:39 Note: no config.txt file found
2019-05-19 14:07:39 config: -prp 1275001
2019-05-19 14:07:39 1275001 FFT 64K: Width 8x8, Height 64x8; 19.45 bits/word
2019-05-19 14:07:39 using short carry kernels
2019-05-19 14:07:41 OpenCL compilation in 2079 ms, with "-DEXP=1275001u -DWIDTH=64u -DSMALL_HEIGHT=512u -DMIDDLE=1u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-19 14:07:41 1275001.owl not found, starting from the beginning.
2019-05-19 14:07:42 1275001 OK 2000 0.16%; 0.12 ms/sq; ETA 0d 00:03; d19a9c6b08d199b6 (check 0.13s)
2019-05-19 14:07:44 1275001 20000 1.57%; 0.13 ms/sq; ETA 0d 00:03; 65e3704fff61d046
2019-05-19 14:07:45 Stopping, please wait..
2019-05-19 14:07:46 1275001 OK 31000 2.43%; 0.12 ms/sq; ETA 0d 00:03; 19d3b2da2559da70 (check 0.15s)
2019-05-19 14:07:46 Exiting because "stop requested"
2019-05-19 14:07:46 Bye[/code]
[code]2019-05-19 14:07:07 Note: no config.txt file found
2019-05-19 14:07:07 config: -prp 1275001 -fft 72K
2019-05-19 14:07:07 1275001 FFT 72K: Width 8x8, Height 8x8, Middle 9; 17.29 bits/word
2019-05-19 14:07:07 using short carry kernels
2019-05-19 14:07:10 OpenCL compilation in 1984 ms, with "-DEXP=1275001u -DWIDTH=64u -DSMALL_HEIGHT=64u -DMIDDLE=9u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-19 14:07:10 1275001.owl not found, starting from the beginning.
2019-05-19 14:07:10 1275001 EE loaded: 0, blockSize 1000, 0000000000000000 (expected 0000000000000003x)
2019-05-19 14:07:10 Exiting because "error on load"
2019-05-19 14:07:10 Bye[/code]
[code]2019-05-19 14:08:02 Note: no config.txt file found
2019-05-19 14:08:02 config: -prp 1275001 -fft 80K
2019-05-19 14:08:02 1275001 FFT 80K: Width 8x8, Height 8x8, Middle 10; 15.56 bits/word
2019-05-19 14:08:02 using short carry kernels
2019-05-19 14:08:04 OpenCL compilation in 1985 ms, with "-DEXP=1275001u -DWIDTH=64u -DSMALL_HEIGHT=64u -DMIDDLE=10u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-19 14:08:04 1275001.owl not found, starting from the beginning.
2019-05-19 14:08:05 1275001 EE loaded: 0, blockSize 1000, 0000000000000000 (expected 0000000000000003x)
2019-05-19 14:08:05 Exiting because "error on load"
2019-05-19 14:08:05 Bye[/code]
[code]2019-05-19 14:08:15 Note: no config.txt file found
2019-05-19 14:08:15 config: -prp 1275001 -fft 128K
2019-05-19 14:08:15 1275001 FFT 128K: Width 256x4, Height 8x8; 9.73 bits/word
2019-05-19 14:08:15 using long carry kernels
2019-05-19 14:08:17 OpenCL compilation in 1920 ms, with "-DEXP=1275001u -DWIDTH=1024u -DSMALL_HEIGHT=64u -DMIDDLE=1u -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-19 14:08:17 1275001.owl not found, starting from the beginning.
2019-05-19 14:08:18 1275001 OK 2000 0.16%; 0.15 ms/sq; ETA 0d 00:03; d19a9c6b08d199b6 (check 0.16s)
2019-05-19 14:08:20 1275001 20000 1.57%; 0.15 ms/sq; ETA 0d 00:03; 65e3704fff61d046
2019-05-19 14:08:23 1275001 40000 3.14%; 0.15 ms/sq; ETA 0d 00:03; ddca1e3b88d59ea2
2019-05-19 14:08:24 Stopping, please wait..
2019-05-19 14:08:24 1275001 OK 44000 3.45%; 0.15 ms/sq; ETA 0d 00:03; 50e59fd6714c3a09 (check 0.16s)
2019-05-19 14:08:24 Exiting because "stop requested"
2019-05-19 14:08:24 Bye[/code]
When you do P-1 instead of PRP it erroneously does stage 1 with zeroed residues and fails an assert only at the start of stage 2:
[code]2019-05-19 14:23:05 1275001 710000 98.40%; 0.15 ms/sq; ETA 0d 00:00; 0000000000000000
2019-05-19 14:23:06 1275001 720000 99.79%; 0.15 ms/sq; ETA 0d 00:00; 0000000000000000
2019-05-19 14:25:19 Round 0 of 1: init 1.88 s; 0.17 ms/mul; 764090 muls
2019-05-19 14:25:19 1275001 P-1 stage1 GCD: no factor
gpuowl: GmpUtil.cpp:25: std::__cxx11::string GCD(u32, const std::vector<unsigned int>&, u32): Assertion `mpz_cmp_ui(b, 0)' failed.
Aborted (core dumped)[/code]
Some sort of bounds issue? I've encountered it a few times when trying to make a benchmark script that benches PRP at every FFT with an exponent at 90% of what gpuowl says is the maximum for that FFT and it fails in the same way for 48K, 72K, 80K, 768K, 1152K and 1280K. |
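A quick way to sanity-check the FFT shapes in the logs above: the FFT sizes and bits/word figures gpuowl prints are consistent with an FFT length in words of 2 * WIDTH * SMALL_HEIGHT * MIDDLE, using the -D values from each OpenCL compile line. That relation is inferred from these four logs, not taken from gpuowl's source, so treat the sketch below as an assumption-based check only.

```python
# Sketch: recompute the "FFT ...K" and "bits/word" figures printed in the logs
# from the -DWIDTH, -DSMALL_HEIGHT and -DMIDDLE compile flags.
# ASSUMPTION: FFT length in words = 2 * WIDTH * SMALL_HEIGHT * MIDDLE. This is
# inferred from the logs above, not read from gpuowl's source.

def fft_words(width: int, small_height: int, middle: int) -> int:
    """Number of FFT words (assumed 2 words per complex element)."""
    return 2 * width * small_height * middle


def bits_per_word(exponent: int, words: int) -> float:
    """Average number of exponent bits each FFT word has to carry."""
    return exponent / words


# (WIDTH, SMALL_HEIGHT, MIDDLE) copied from the -D flags in each log above.
CONFIGS = {
    "64K": (64, 512, 1),
    "72K": (64, 64, 9),
    "80K": (64, 64, 10),
    "128K": (1024, 64, 1),
}

if __name__ == "__main__":
    p = 1275001
    for label, (w, h, m) in CONFIGS.items():
        n = fft_words(w, h, m)
        print(f"{label}: {n // 1024}K words, {bits_per_word(p, n):.2f} bits/word")
```

Run as a script, it prints 19.45, 17.29, 15.56 and 9.73 bits/word, matching the four logs. The two shapes that fail to load (72K and 80K) are the only ones with MIDDLE greater than 1.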
[QUOTE=M344587487;517138]I had a similar problem with a similar exponent where it failed on its preferred FFT of 72K and 80K but worked on 128K. [...] Some sort of bounds issue? I've encountered it a few times when trying to make a benchmark script that benches PRP at every FFT with an exponent at 90% of what gpuowl says is the maximum for that FFT and it fails in the same way for 48K, 72K, 80K, 768K, 1152K and 1280K.[/QUOTE] Sometimes I get the same all-zeroes residue, but I don't know whether the issue is the same. Gpuowl should reload the last checkpoint after a failed check. |
[QUOTE=M344587487;517138] Some sort of bounds issue? I've encountered it a few times when trying to make a benchmark script that benches PRP at every FFT with an exponent at 90% of what gpuowl says is the maximum for that FFT and it fails in the same way for 48K, 72K, 80K, 768K, 1152K and 1280K.[/QUOTE]Thanks for the testing. What gpu was that on?
Gpuowl blithely accepting and continuing on all-0 res64 values in P-1 is a missed opportunity for error detection. Printing that it completed stage 1 when the interim res64s are all zeros is unfortunate. Zero and one are known error conditions in P-1 (in CUDAPm1, for example). And the Gerbicz check is not applicable to P-1 computations, so adding the zero check back in for P-1 computations would be useful in this otherwise unchecked case. Per Preda, there was a zero check present in the PRP code a while ago. [URL]https://www.mersenneforum.org/showpost.php?p=466658&postcount=189[/URL] |
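A minimal sketch of the zero res64 check proposed above, in Python for illustration only. The function names are hypothetical (not gpuowl's API); res64 is taken as the low 64 bits of the residue, and the rule "0 is bad for PRP and P-1, 1 is also bad for P-1" follows the post rather than any specific program's code.

```python
# Hypothetical sketch of a res64 sanity check run at each console output.
# Names are illustrative; this is not gpuowl's actual code or API.

MASK64 = (1 << 64) - 1


def res64(residue: int) -> int:
    """Low 64 bits of the full residue, as printed in the progress lines."""
    return residue & MASK64


def res64_is_suspicious(r64: int, is_pm1: bool) -> bool:
    """Flag known-bad interim values: 0 for PRP and P-1, and also 1 for P-1."""
    if r64 == 0:
        return True
    return is_pm1 and r64 == 1


def res64_hex_is_zero(text: str) -> bool:
    """Same zero test on the 16-hex-digit string already built for the log line."""
    return text == "0000000000000000"
```

Because the res64 is already computed for the console line, the extra cost is just the integer or string comparison; on a hit, a program could reload the last checkpoint instead of continuing.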
[QUOTE=kriesel;517143]Thanks for the testing. What gpu was that on?
Gpuowl blithely accepting and continuing on all-0 res64 values is a missed opportunity for error detection. Zero and one are known error conditions in P-1 (CUDAPm1 for example). And the Gerbicz check is not applicable to P-1 computations, so adding that zero check back in for P-1 computations would be useful. Per Preda, there was a zero check present in the PRP code a while ago. [URL]https://www.mersenneforum.org/showpost.php?p=466658&postcount=189[/URL][/QUOTE] Absolutely. In fact, after an all-zeroes residue the GEC fails and gpuowl reloads the last checkpoint file. For PRP, of course. |
[QUOTE=kriesel;517143]Thanks for the testing. What gpu was that on?
Gpuowl blithely accepting and continuing on all-0 res64 values is a missed opportunity for error detection. Zero and one are known error conditions in P-1 (CUDAPm1 for example). And the Gerbicz check is not applicable to P-1 computations, so adding that zero check back in for P-1 computations would be useful. Per Preda, there was a zero check present in the PRP code a while ago. [URL]https://www.mersenneforum.org/showpost.php?p=466658&postcount=189[/URL][/QUOTE] Radeon VII. It's not a bounds issue, as I've been testing 72K at 1K exponent intervals and it just doesn't work; it fails the zero check every time. It could be an initialisation error; whatever it is, it probably applies to all of these too: 48K, 72K, 80K, 768K, 1152K and 1280K. |
[QUOTE=SELROC;517145]Absolutely. The fact that after an all-zeroes-residue the GEC fails and gpuowl reloads the last checkpoint file. For PRP of course.[/QUOTE]Catching it earlier, when producing console/log output for PRP, has a time advantage.
If a zero error occurs, and its first appearance is uniformly distributed over iteration numbers, a separate zero res64 check can detect it on average console-output-interval/2 iterations later, while the Gerbicz error check would take on average blocksize-squared/2 iterations. For V6.5 default operation, those averages would be 10,000 and 500,000 iterations respectively. Per Preda and Ewmayer, res64 determination in gpuowl and Mlucas is fast, and reusing the res64 already determined for console output makes even that small cost vanish, leaving only the very small cost of a 64-bit compare or a 16-character string compare. A savings of 490,000 iterations on my RX480, at 3.8 ms/iter for current wavefront exponents, is of order 1862 seconds, just over half an hour. (That is about 59 ppm per occurrence per year, so it would take 17 occurrences per year to accumulate to a 0.1% performance difference.) Hopefully, though, these zero errors are rare in PRP. They seem to be rare from a casual look at my logs, and I don't recall ever seeing a zero from gpuowl. [QUOTE=M344587487;517148]Radeon VII. It's not a bounds issue as I've been testing 72K at 1K exponent intervals and it just doesn't work, fails the zero check every time. Could be an initialisation error, whatever it is it probably applies to all of these too: 48K, 72K, 80K, 768K, 1152K and 1280K.[/QUOTE]Fortunately, all of those are well below the sizes used for current production primality testing in GIMPS: ~4608K for first primality tests, ~2688K for (LL) double checks, and ~4M for PRP double checks. |
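The timing arithmetic in the post above can be reproduced as follows. The inputs are the post's own figures: a console output every 20,000 iterations (which gives the 10,000-iteration average), the blockSize 1000 seen in the logs with a Gerbicz check every blockSize-squared iterations, and 3.8 ms/iter on an RX480. The 365-day year is an assumption.

```python
# Sketch of the detection-latency arithmetic from the post above.
# Inputs are the post's figures; the 365-day year is an assumption.
import math


def mean_detection_delay(check_interval: int) -> float:
    """Average iterations from a uniformly placed error to the next check."""
    return check_interval / 2


CONSOLE_INTERVAL = 20_000   # iterations between res64 outputs (implied by the 10,000 average)
BLOCK_SIZE = 1000           # Gerbicz check every BLOCK_SIZE**2 iterations, per the post
MS_PER_ITER = 3.8           # RX480 at current wavefront exponents
SECONDS_PER_YEAR = 365 * 24 * 3600

zero_check = mean_detection_delay(CONSOLE_INTERVAL)       # 10,000 iterations
gec = mean_detection_delay(BLOCK_SIZE ** 2)               # 500,000 iterations
saved_iters = gec - zero_check                            # 490,000 iterations
saved_seconds = saved_iters * MS_PER_ITER / 1000          # about 1862 s
ppm_per_event = saved_seconds / SECONDS_PER_YEAR * 1e6    # about 59 ppm of a year
events_for_0_1_pct = math.ceil(1000 / ppm_per_event)      # 0.1% = 1000 ppm

print(f"mean delay: zero check {zero_check:,.0f} iters, GEC {gec:,.0f} iters")
print(f"saved per event: {saved_seconds:.0f} s, {ppm_per_event:.1f} ppm/year")
print(f"events per year to reach 0.1%: {events_for_0_1_pct}")
```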
[QUOTE=kriesel;517150]Catching it earlier from producing console / log output has a time advantage. [...] I don't recall ever seeing a zero from gpuowl.[/QUOTE] The zero error does happen to me, though not often, and I have not found a way to reproduce it reliably. It may happen two or three times one day and not at all the next. |
I suspect fft-64 is broken, along with all the sizes that use it. I need to investigate; give me a few days.
|
[QUOTE=preda;517195]I have a suspicion fft-64 is broken, and all the sizes that use it. I need to investigate. Give me a few days.[/QUOTE]
With the new version there is a remarkable speedup on a 332M exponent: it went from 4.13 ms/sq to 3.7 ms/sq. Good! However, I did change the FFT, from -fft +2 to the default FFT (no -fft argument); -fft +2 now fails to load. |
[QUOTE=M344587487;517138]I had a similar problem with a similar exponent where it failed on its preferred FFT of 72K and 80K but worked on 128K. Below is an example of an exponent that works on its preferred FFT of 64K and 128K but throws "error on load" for 72K and 80K.
[...] Some sort of bounds issue? I've encountered it a few times when trying to make a benchmark script that benches PRP at every FFT with an exponent at 90% of what gpuowl says is the maximum for that FFT and it fails in the same way for 48K, 72K, 80K, 768K, 1152K and 1280K.[/QUOTE] Thanks for the bug report! It turns out that in the current implementation, the MIDDLE step of the FFT can't be done correctly when H < 256. I think all your failing cases were in that situation. Anyway, I updated the FFTConfig so it no longer generates the invalid size combinations; please retry. |
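A sketch of the kind of filter described above, applied to the (WIDTH, SMALL_HEIGHT, MIDDLE) triples from the earlier logs. It assumes H in the post means SMALL_HEIGHT and that MIDDLE = 1 means no middle step at all (consistent with 128K working at SMALL_HEIGHT 64). middle_step_ok and valid_shapes are hypothetical helpers, not the actual FFTConfig change.

```python
# Hypothetical sketch: drop FFT shapes whose MIDDLE step would run with
# SMALL_HEIGHT < 256. Not gpuowl's FFTConfig code.
# ASSUMPTIONS: "H" == SMALL_HEIGHT, and MIDDLE == 1 means no middle step.
from typing import Iterable, List, Tuple

Shape = Tuple[int, int, int]  # (WIDTH, SMALL_HEIGHT, MIDDLE)

MIN_HEIGHT_FOR_MIDDLE = 256


def middle_step_ok(shape: Shape) -> bool:
    """True if the shape has no middle step, or its height is large enough."""
    _, small_height, middle = shape
    return middle == 1 or small_height >= MIN_HEIGHT_FOR_MIDDLE


def valid_shapes(shapes: Iterable[Shape]) -> List[Shape]:
    """Keep only the shapes the rule above allows, preserving order."""
    return [s for s in shapes if middle_step_ok(s)]
```

Applied to the four logged shapes, it keeps 64K (64, 512, 1) and 128K (1024, 64, 1) and drops 72K (64, 64, 9) and 80K (64, 64, 10), which matches which runs loaded correctly.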