mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing > GpuOwl

Old 2019-05-19, 13:33   #1178
M344587487
 
"Composite as Heck"
Oct 2017

1100111011₂ Posts

Quote:
Originally Posted by kriesel View Post
...
Not sure why 1398269 reliably fails and 3021377 correctly runs to completion. Haven't tried P-1.
...
I had a similar problem with a similar exponent: it failed at 72K (its preferred FFT) and at 80K but worked at 128K. Below is an example of an exponent that works at its preferred FFT of 64K and at 128K but throws "error on load" for 72K and 80K.

Code:
2019-05-19 14:07:39 Note: no config.txt file found
2019-05-19 14:07:39 config: -prp 1275001 
2019-05-19 14:07:39 1275001 FFT 64K: Width 8x8, Height 64x8; 19.45 bits/word
2019-05-19 14:07:39 using short carry kernels
2019-05-19 14:07:41 OpenCL compilation in 2079 ms, with "-DEXP=1275001u -DWIDTH=64u -DSMALL_HEIGHT=512u -DMIDDLE=1u  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-19 14:07:41 1275001.owl not found, starting from the beginning.
2019-05-19 14:07:42 1275001 OK     2000  0.16%; 0.12 ms/sq; ETA 0d 00:03; d19a9c6b08d199b6 (check 0.13s)
2019-05-19 14:07:44 1275001       20000  1.57%; 0.13 ms/sq; ETA 0d 00:03; 65e3704fff61d046
2019-05-19 14:07:45 Stopping, please wait..
2019-05-19 14:07:46 1275001 OK    31000  2.43%; 0.12 ms/sq; ETA 0d 00:03; 19d3b2da2559da70 (check 0.15s)
2019-05-19 14:07:46 Exiting because "stop requested"
2019-05-19 14:07:46 Bye
Code:
2019-05-19 14:07:07 Note: no config.txt file found
2019-05-19 14:07:07 config: -prp 1275001 -fft 72K 
2019-05-19 14:07:07 1275001 FFT 72K: Width 8x8, Height 8x8, Middle 9; 17.29 bits/word
2019-05-19 14:07:07 using short carry kernels
2019-05-19 14:07:10 OpenCL compilation in 1984 ms, with "-DEXP=1275001u -DWIDTH=64u -DSMALL_HEIGHT=64u -DMIDDLE=9u  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-19 14:07:10 1275001.owl not found, starting from the beginning.
2019-05-19 14:07:10 1275001 EE loaded: 0, blockSize 1000, 0000000000000000 (expected 0000000000000003x)
2019-05-19 14:07:10 Exiting because "error on load"
2019-05-19 14:07:10 Bye
Code:
2019-05-19 14:08:02 Note: no config.txt file found
2019-05-19 14:08:02 config: -prp 1275001 -fft 80K 
2019-05-19 14:08:02 1275001 FFT 80K: Width 8x8, Height 8x8, Middle 10; 15.56 bits/word
2019-05-19 14:08:02 using short carry kernels
2019-05-19 14:08:04 OpenCL compilation in 1985 ms, with "-DEXP=1275001u -DWIDTH=64u -DSMALL_HEIGHT=64u -DMIDDLE=10u  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-19 14:08:04 1275001.owl not found, starting from the beginning.
2019-05-19 14:08:05 1275001 EE loaded: 0, blockSize 1000, 0000000000000000 (expected 0000000000000003x)
2019-05-19 14:08:05 Exiting because "error on load"
2019-05-19 14:08:05 Bye
Code:
2019-05-19 14:08:15 Note: no config.txt file found
2019-05-19 14:08:15 config: -prp 1275001 -fft 128K 
2019-05-19 14:08:15 1275001 FFT 128K: Width 256x4, Height 8x8; 9.73 bits/word
2019-05-19 14:08:15 using long carry kernels
2019-05-19 14:08:17 OpenCL compilation in 1920 ms, with "-DEXP=1275001u -DWIDTH=1024u -DSMALL_HEIGHT=64u -DMIDDLE=1u  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-05-19 14:08:17 1275001.owl not found, starting from the beginning.
2019-05-19 14:08:18 1275001 OK     2000  0.16%; 0.15 ms/sq; ETA 0d 00:03; d19a9c6b08d199b6 (check 0.16s)
2019-05-19 14:08:20 1275001       20000  1.57%; 0.15 ms/sq; ETA 0d 00:03; 65e3704fff61d046
2019-05-19 14:08:23 1275001       40000  3.14%; 0.15 ms/sq; ETA 0d 00:03; ddca1e3b88d59ea2
2019-05-19 14:08:24 Stopping, please wait..
2019-05-19 14:08:24 1275001 OK    44000  3.45%; 0.15 ms/sq; ETA 0d 00:03; 50e59fd6714c3a09 (check 0.16s)
2019-05-19 14:08:24 Exiting because "stop requested"
2019-05-19 14:08:24 Bye
When you run P-1 instead of PRP, gpuowl erroneously runs stage 1 with zeroed residues and fails an assert only at the start of stage 2:
Code:
2019-05-19 14:23:05 1275001      710000 98.40%; 0.15 ms/sq; ETA 0d 00:00; 0000000000000000
2019-05-19 14:23:06 1275001      720000 99.79%; 0.15 ms/sq; ETA 0d 00:00; 0000000000000000
2019-05-19 14:25:19 Round 0 of 1: init 1.88 s; 0.17 ms/mul; 764090 muls
2019-05-19 14:25:19 1275001 P-1 stage1 GCD: no factor
gpuowl: GmpUtil.cpp:25: std::__cxx11::string GCD(u32, const std::vector<unsigned int>&, u32): Assertion `mpz_cmp_ui(b, 0)' failed.
Aborted (core dumped)
Some sort of bounds issue? I've hit it a few times while writing a benchmark script that runs PRP at every FFT size, using an exponent at 90% of what gpuowl says is the maximum for that FFT. It fails in the same way for 48K, 72K, 80K, 768K, 1152K and 1280K.
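For reference, the figures in the logs above are consistent with a simple relationship: the FFT size in words equals 2 × WIDTH × SMALL_HEIGHT × MIDDLE (the values on the OpenCL compilation line), and bits/word is the exponent divided by that size. This is inferred from the four logs only, not from the gpuowl source, and the function names are made up for illustration. A minimal Python sketch:

```python
# Sketch: reproduce the "FFT nK ... bits/word" figures from the logs above.
# The factor of 2 and the formula are inferred from the log output, not taken
# from gpuowl's source; fft_words() and bits_per_word() are hypothetical names.

def fft_words(width: int, small_height: int, middle: int) -> int:
    """FFT size in words, assuming size = 2 * WIDTH * SMALL_HEIGHT * MIDDLE."""
    return 2 * width * small_height * middle


def bits_per_word(exponent: int, words: int) -> float:
    """Average number of bits of the exponent carried per FFT word."""
    return exponent / words


# (WIDTH, SMALL_HEIGHT, MIDDLE) as printed on the OpenCL compilation lines.
CONFIGS = {
    "64K": (64, 512, 1),
    "72K": (64, 64, 9),
    "80K": (64, 64, 10),
    "128K": (1024, 64, 1),
}

if __name__ == "__main__":
    exponent = 1275001
    for name, (w, h, m) in CONFIGS.items():
        words = fft_words(w, h, m)
        print(f"{name}: {words} words, {bits_per_word(exponent, words):.2f} bits/word")
```

A benchmark script could use the same two helpers to pick an exponent for a target bits/word instead of relying on a fixed percentage of the reported maximum.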
Old 2019-05-19, 14:02   #1179
SELROC
 

19×151 Posts

Quote:
Originally Posted by M344587487 View Post
I had a similar problem with a similar exponent where it failed on its preferred FFT of 72K and 80K but worked on 128K. Below is an example of an exponent that works on its preferred FFT of 64K and 128K but throws "error on load" for 72K and 80K.

[...]
Some sort of bounds issue? I've encountered it a few times when trying to make a benchmark script that benches PRP at every FFT with an exponent at 90% of what gpuowl says is the maximum for that FFT and it fails in the same way for 48K, 72K, 80K, 768K, 1152K and 1280K.



Sometimes I see the same all-zeroes residue, but I don't know whether it is the same issue. Gpuowl should reload the last checkpoint after a failed check.
Old 2019-05-19, 14:18   #1180
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,437 Posts

Quote:
Originally Posted by M344587487 View Post
Some sort of bounds issue? I've encountered it a few times when trying to make a benchmark script that benches PRP at every FFT with an exponent at 90% of what gpuowl says is the maximum for that FFT and it fails in the same way for 48K, 72K, 80K, 768K, 1152K and 1280K.
Thanks for the testing. What gpu was that on?
Gpuowl blithely accepting and continuing on all-0 res64 values in P-1 is a missed opportunity for error detection. Printing that it completed stage one when the interim res64s are all zeros is unfortunate. Zero and one are known error conditions in P-1 (in CUDAPm1, for example). And the Gerbicz check is not applicable to P-1 computations, so adding that zero check back in for P-1 computations would be useful in this otherwise unchecked case.
Per Preda, there was a zero check present in the PRP code a while ago.
https://www.mersenneforum.org/showpo...&postcount=189
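
To illustrate the kind of check meant here, below is a minimal Python sketch that scans gpuowl console/log lines for an all-zero res64. It works on the log text from outside the program, which is an assumption of this sketch and not how gpuowl itself would implement the check. The regular expression is only fitted to the progress lines quoted in this thread, and the function names are hypothetical.

```python
import re

# Matches a 16-hex-digit res64 that follows a "; " separator, as in
# "...; ETA 0d 00:00; 0000000000000000". Fitted to the progress lines quoted
# in this thread; other gpuowl versions may print a different format.
RES64_RE = re.compile(r";\s([0-9a-f]{16})(?:\s|$)")


def find_res64(line: str):
    """Return the res64 hex string from a progress line, or None if absent."""
    match = RES64_RE.search(line)
    return match.group(1) if match else None


def is_suspect(line: str) -> bool:
    """True if the line carries an all-zero res64 (a known error condition)."""
    return find_res64(line) == "0" * 16


if __name__ == "__main__":
    sample = [
        "2019-05-19 14:08:20 1275001       20000  1.57%; 0.15 ms/sq; ETA 0d 00:03; 65e3704fff61d046",
        "2019-05-19 14:23:05 1275001      710000 98.40%; 0.15 ms/sq; ETA 0d 00:00; 0000000000000000",
    ]
    for line in sample:
        if is_suspect(line):
            print("zero res64:", line)
```

Only the zero case is covered; a res64 of one could be flagged the same way with a second comparison.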

Last fiddled with by kriesel on 2019-05-19 at 14:22
Old 2019-05-19, 14:25   #1181
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1010100111101₂ Posts

Quote:
Originally Posted by M344587487 View Post
Some sort of bounds issue? I've encountered it a few times when trying to make a benchmark script that benches PRP at every FFT with an exponent at 90% of what gpuowl says is the maximum for that FFT and it fails in the same way for 48K, 72K, 80K, 768K, 1152K and 1280K.
Thanks for the testing. What gpu was that on?
Gpuowl blithely accepting and continuing on all-0 res64 values is a missed opportunity for error detection. Zero and one are known error conditions in P-1 (CUDAPm1 for example). And the Gerbicz check is not applicable to P-1 computations, so adding that zero check back in for P-1 computations would be useful.
Per Preda, there was a zero check present in the PRP code a while ago.
https://www.mersenneforum.org/showpo...&postcount=189
Old 2019-05-19, 14:52   #1182
SELROC
 

101010000100₂ Posts

Quote:
Originally Posted by kriesel View Post
Thanks for the testing. What gpu was that on?
Gpuowl blithely accepting and continuing on all-0 res64 values is a missed opportunity for error detection. Zero and one are known error conditions in P-1 (CUDAPm1 for example). And the Gerbicz check is not applicable to P-1 computations, so adding that zero check back in for P-1 computations would be useful.
Per Preda, there was a zero check present in the PRP code a while ago.
https://www.mersenneforum.org/showpo...&postcount=189



Absolutely. The fact is that after an all-zeroes residue the GEC fails and gpuowl reloads the last checkpoint file. For PRP, of course.

Last fiddled with by SELROC on 2019-05-19 at 14:52
Old 2019-05-19, 15:23   #1183
M344587487
 
"Composite as Heck"
Oct 2017

1473₈ Posts

Quote:
Originally Posted by kriesel View Post
Thanks for the testing. What gpu was that on?
Gpuowl blithely accepting and continuing on all-0 res64 values is a missed opportunity for error detection. Zero and one are known error conditions in P-1 (CUDAPm1 for example). And the Gerbicz check is not applicable to P-1 computations, so adding that zero check back in for P-1 computations would be useful.
Per Preda, there was a zero check present in the PRP code a while ago.
https://www.mersenneforum.org/showpo...&postcount=189
Radeon VII. It's not a bounds issue: I've been testing 72K at 1K exponent intervals and it just doesn't work; it fails the zero check every time. It could be an initialisation error. Whatever it is, it probably applies to all of these: 48K, 72K, 80K, 768K, 1152K and 1280K.
Old 2019-05-19, 15:34   #1184
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,437 Posts

Quote:
Originally Posted by SELROC View Post
Absolutely. The fact that after an all-zeroes-residue the GEC fails and gpuowl reloads the last checkpoint file. For PRP of course.
Catching it earlier, from the console / log output PRP already produces, has a time advantage.

In the case where a zero error occurs, if first appearances are uniformly distributed over iteration numbers, a separate zero res64 check can detect it on average console-output-interval/2 iterations later, while the Gerbicz error check would take on average blocksize-squared/2 iterations. For V6.5 default operation, those averages would be 10,000 and 500,000 iterations respectively. Per Preda and Ewmayer, res64 determination in gpuowl and mlucas is fast. And using the res64 already determined for console output makes even that small cost vanish, leaving only the very small cost of a 64-bit compare or 16-char string compare. A savings of 490,000 iterations on my RX480 at 3.8 ms/iter for current wavefront exponents is about 1862 seconds, just over half an hour. (That is about 59 ppm of a year per occurrence, so it would take 17 of them per year to accumulate to a 0.1% performance difference.) But hopefully these zero errors are rare occurrences in PRP. They seem to be rare, from a casual look at my logs. I don't recall ever seeing a zero from gpuowl.
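
The arithmetic in the paragraph above can be checked with a few lines of Python. The log interval of 20,000 iterations and the block size of 1000 are the values the quoted averages imply, and the year length of 365.25 days is an assumption of this sketch; the names are illustrative only.

```python
# Sketch: check the zero-check vs. Gerbicz-check detection latency arithmetic.
LOG_INTERVAL = 20_000   # iterations between console lines (implied by the 10,000 average)
BLOCK_SIZE = 1_000      # Gerbicz block size (implied by the 500,000 average)
MS_PER_ITER = 3.8       # RX480 timing quoted in the paragraph above
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: 365.25-day year


def average_delays():
    """Average iterations until detection: (zero res64 check, Gerbicz check)."""
    zero_check = LOG_INTERVAL // 2
    gerbicz = BLOCK_SIZE ** 2 // 2
    return zero_check, gerbicz


def savings():
    """Return (iterations saved, seconds saved, ppm of a year) per occurrence."""
    zero_check, gerbicz = average_delays()
    iters = gerbicz - zero_check
    seconds = iters * MS_PER_ITER / 1000
    ppm = seconds / SECONDS_PER_YEAR * 1e6
    return iters, seconds, ppm


if __name__ == "__main__":
    iters, seconds, ppm = savings()
    print(f"{iters} iterations, {seconds:.0f} s, {ppm:.0f} ppm per occurrence")
    print(f"occurrences per year for 0.1%: {1000 / ppm:.1f}")
```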

Quote:
Originally Posted by M344587487 View Post
Radeon VII. It's not a bounds issue as I've been testing 72K at 1K exponent intervals and it just doesn't work, fails the zero check every time. Could be an initialisation error, whatever it is it probably applies to all of these too: 48K, 72K, 80K, 768K, 1152K and 1280K.
Fortunately, all of those are well below the sizes used for current production primality testing in GIMPS: ~4608K for first primality tests, ~2688K for (LL) double checks, ~4M for PRP double checks.

Last fiddled with by kriesel on 2019-05-19 at 15:42
Old 2019-05-19, 15:40   #1185
SELROC
 

2,713 Posts

Quote:
Originally Posted by kriesel View Post
Catching it earlier from producing console / log output has a time advantage.

In the case where a zero error occurs, if uniformly distributed over iteration numbers of first appearance, it can be detected on average console-output-interval/2 iterations later by a separate zero res64 check, while the Gerbicz error check would take on average blocksize-squared/2 iterations. For V6.5 default operation, those averages would be 10,000 and 500,000 iterations respectively. Per Preda and Ewmayer, res64 determination in gpuowl and mlucas are fast. And using the res64 determined already for console output makes even that small cost vanish, leaving only the very small cost of a 64-bit compare or 16-char string compare. A 490,000 iterations savings on my RX480 at 3.8ms/iter for current wavefront exponents is of order 1862 seconds, just over half an hour. (About 59 ppm per occurrence per year, so it would take 17 of them per year to accumulate to 0.1% performance difference.) But hopefully these zero errors are rare occurrences in PRP. They seem to be rare, from a casual look at my logs. I don't recall ever seeing a zero from gpuowl.

In my experience the zero error does not occur often, and I have not found a way to reproduce it reliably. It may happen two or three times in one day and then not at all the next.
Old 2019-05-19, 22:19   #1186
preda
 
"Mihai Preda"
Apr 2015

1371₁₀ Posts

I have a suspicion fft-64 is broken, and all the sizes that use it. I need to investigate. Give me a few days.
Old 2019-05-20, 04:52   #1187
SELROC
 

32·23·43 Posts

Quote:
Originally Posted by preda View Post
I have a suspicion fft-64 is broken, and all the sizes that use it. I need to investigate. Give me a few days.

With the new version there is a remarkable speedup on a 332M exponent! It went from 4.13 ms/sq to 3.7 ms/sq. Good!

I did change the FFT, however, from -fft +2 to the default FFT without arguments. -fft +2 now fails to load.

Last fiddled with by SELROC on 2019-05-20 at 05:14
Old 2019-05-20, 13:27   #1188
preda
 
"Mihai Preda"
Apr 2015

3·457 Posts

Quote:
Originally Posted by M344587487 View Post
I had a similar problem with a similar exponent where it failed on its preferred FFT of 72K and 80K but worked on 128K. Below is an example of an exponent that works on its preferred FFT of 64K and 128K but throws "error on load" for 72K and 80K.
[...]
Some sort of bounds issue? I've encountered it a few times when trying to make a benchmark script that benches PRP at every FFT with an exponent at 90% of what gpuowl says is the maximum for that FFT and it fails in the same way for 48K, 72K, 80K, 768K, 1152K and 1280K.
Thanks for the bug report!

Turns out in the current implementation, the MIDDLE step of the FFT can't be done correctly when H < 256. I think all your failing cases were in that situation. Anyway, I updated the FFTConfig to not generate the invalid size combinations anymore; please retry.
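
As a rough illustration of the restriction described above, the following Python sketch flags configurations that use a MIDDLE step with a small height. It assumes that "H" refers to the SMALL_HEIGHT value printed in the logs and that MIDDLE=1 means no middle step is performed. Both are readings of the logs in this thread rather than of the actual FFTConfig code, and `is_valid_config` is a hypothetical name.

```python
# Sketch of the rule "the MIDDLE step can't be done correctly when H < 256".
# Assumptions (not confirmed against the gpuowl source): H is SMALL_HEIGHT from
# the OpenCL compilation line, and MIDDLE == 1 means there is no middle step.
MIN_HEIGHT_FOR_MIDDLE = 256


def is_valid_config(small_height: int, middle: int) -> bool:
    """Return False for size combinations that need MIDDLE with H < 256."""
    uses_middle_step = middle > 1
    return not (uses_middle_step and small_height < MIN_HEIGHT_FOR_MIDDLE)


# (SMALL_HEIGHT, MIDDLE) for the four runs posted earlier in the thread.
RUNS = {"64K": (512, 1), "72K": (64, 9), "80K": (64, 10), "128K": (64, 1)}

if __name__ == "__main__":
    for name, (h, m) in RUNS.items():
        status = "ok" if is_valid_config(h, m) else "invalid"
        print(f"{name}: SMALL_HEIGHT={h}, MIDDLE={m} -> {status}")
```

Under these assumptions the sketch rejects 72K and 80K and accepts 64K and 128K, which matches the four logs posted earlier.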
All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.