 mersenneforum.org gpuOWL for Wagstaff
 Register FAQ Search Today's Posts Mark Forums Read  2019-11-29, 05:01 #12 axn   Jun 2003 466210 Posts My adventures in understanding DWT Using Pari/GP for prototyping: The functions. Note, pari vectors start with index 1. Code: weight(v,w,N)=vector( N, i, v[i]*w[i] ); \\ apply a weight w to signal v unweight(v,w,N)=vector( N, i, v[i]/w[i] ); \\ remove the weight w from signal v halve( v, N )=vector(N/2, i, v[i] + I*v[i+N/2]); \\ combine k and k+N/2 real elements into a single complex unhalve( v, N )=vector(N*2, i, if(i <= N, real(v[i]), imag(v[i-N]))); \\ reverse the complex elements back into real elements forward(v,N)=vector( N, k, sum(n=1,N, v[n]*exp(-I*2*Pi*(k-1)*(n-1)/N)) ); \\ simple O(N^2) DFT inverse(v,N)=vector( N, k, sum(n=1,N, v[n]*exp(+I*2*Pi*(k-1)*(n-1)/N))/N );\\ inverse DFT forward_weight(v,N,w)=forward( weight(v,w,N), N ); \\ DWT inverse_weight(v,N,w)=unweight( inverse(v, N), w, N ); \\ inverse DWT square(v,N)=vector( N, i, v[i]^2 ); \\ point-wise squaring Data and weights Code: N=8; v=[1592, 4744, 2670, 4702, 5898, 1185, 513, 1678]; \\ a random number 1038670476017199579990351480376 w=[1, 2^.375, 2^.75, 2^.125, 2^.5, 2^.875, 2^.25, 2^.625]; \\ irrational base weight for p=101, N=8 w2=vector(N, i, exp(-I*Pi*(i-1)/N)); \\ weight for negacyclic w3=vector(N, i, w[i]*w2[i]); \\ combined weight for IB + negacyclic Mersenne mod 2^101-1 Code: u=forward_weight(v, N, w); u=square(u, N); u=inverse_weight(u, N, w); round(u) = [131715320, 65806708, 55646526, 107656084, 85866497, 86563572, 114850764, 78330688] \\ results Mod(131715320 + 65806708*2^13+ 55646526* 2^26+ 107656084*2^38+ 85866497*2^51+ 86563572*2^64+ 114850764*2^76+ 78330688*2^89, 2^101-1) = Mod(320904239316807117549353868207, 2535301200456458802993406410751) Mod(1038670476017199579990351480376, 2^101-1)^2 = Mod(320904239316807117549353868207, 2535301200456458802993406410751) Match! Wagstaff mod 2^101+1 Code: u=forward_weight(v, N, w3); u=square(u, N); u=inverse_weight(u, N, w3); round(u) = [-126646392, -35596916, 6367106, 23618092, 69432719, 83120316, 103588028, 78330688] \\ results Mod(-126646392 + -35596916*2^13 + 6367106* 2^26 + 23618092*2^38 + 69432719*2^51+ 83120316*2^64+ 103588028*2^76+ 78330688*2^89, 2^101+1) = Mod(2005153614013422538592239271122, 2535301200456458802993406410753) Mod(1038670476017199579990351480376, 2^101+1)^2 = Mod(2005153614013422538592239271122, 2535301200456458802993406410753) Match! complex->complex right-angle convolution for wagstaff Code: vw=weight(v, w, N); \\ apply irrational base weights h=halve(vw, N); \\ combine into complex u=forward_weight(h, N/2, w2); \\ DWT with negacyclic weights sq=square(u, N/2); u=inverse_weight(sq, N/2, w2); \\ inverse DWT uh=unhalve(u, N/2); \\ back to reals u=unweight(uh, w, N); \\ undo IB weights round(u) = [-7430296, 9893668, 37926930, 23618092, -31874255, 36346212, 70633156, 78330688] Mod(-7430296 + 9893668*2^13 + 37926930* 2^26 + 23618092*2^38 + -31874255*2^51+ 36346212*2^64+ 70633156*2^76+ 78330688*2^89, 2^101+1) = Mod(2049592028740143596208787739827, 2535301200456458802993406410753) Mod(1038670476017199579990351480376, 2^101+1)^2 = Mod(2005153614013422538592239271122, 2535301200456458802993406410753) Does not match. What now?    2019-12-01, 02:10 #13 paulunderwood   Sep 2002 Database er0rr CFD16 Posts In your "data" section you should have: Code: w2=vector(N, i, exp(I*Pi*(i-1)/N)); \\ weight for negacyclic i.e. no "-", HTH, Last fiddled with by paulunderwood on 2019-12-01 at 02:42   2019-12-01, 03:06 #14 axn   Jun 2003 2×32×7×37 Posts Crandall & Fagin (http://www.faginfamily.net/barry/Pap...Transforms.pdf) says that you can use either, and they explicitly use the minus version (eg:- 5.3). Indeed it works for the naive wagstaff version. Nonetheless, let me try the right angle convolution with the other one... Code: ? weight(v,w,N)=vector( N, i, v[i]*w[i] ); \\ apply a weight w to signal v ? unweight(v,w,N)=vector( N, i, v[i]/w[i] ); \\ remove the weight w from signal v ? halve( v, N )=vector(N/2, i, v[i] + I*v[i+N/2]); \\ combine k and k+N/2 real elements into a single complex ? unhalve( v, N )=vector(N*2, i, if(i <= N, real(v[i]), imag(v[i-N]))); \\ reverse the complex elements back into real elements ? ? forward(v,N)=vector( N, k, sum(n=1,N, v[n]*exp(-I*2*Pi*(k-1)*(n-1)/N)) ); \\ simple O(N^2) DFT ? inverse(v,N)=vector( N, k, sum(n=1,N, v[n]*exp(+I*2*Pi*(k-1)*(n-1)/N))/N );\\ inverse DFT ? forward_weight(v,N,w)=forward( weight(v,w,N), N ); \\ DWT ? inverse_weight(v,N,w)=unweight( inverse(v, N), w, N ); \\ inverse DWT ? ? square(v,N)=vector( N, i, v[i]^2 ); \\ point-wise squaring ? N=8; ? v=[1592, 4744, 2670, 4702, 5898, 1185, 513, 1678]; \\ a random number 1038670476017199579990351480376 ? w=[1, 2^.375, 2^.75, 2^.125, 2^.5, 2^.875, 2^.25, 2^.625]; \\ irrational base weight for p=101, N=8 ? w2=vector(N, i, exp(I*Pi*(i-1)/N)); \\ weight for negacyclic ? w3=vector(N, i, w[i]*w2[i]); \\ combined weight for IB + negacyclic ? vw=weight(v, w, N); \\ apply irrational base weights ? h=halve(vw, N); \\ combine into complex ? u=forward_weight(h, N/2, w2); \\ DWT with negacyclic weights ? sq=square(u, N/2); ? u=inverse_weight(sq, N/2, w2); \\ inverse DWT ? uh=unhalve(u, N/2); \\ back to reals ? u=unweight(uh, w, N); \\ undo IB weights ? round(u) %22 = [-126646392, -35596916, 6367106, 23618092, 69432719, 83120316, 103588028, 78330688] Holy crap, it works! Thank you so much. Now, to see if I can fork cudaLucas to make it do wagstaff.   2019-12-11, 01:55 #15 paulunderwood   Sep 2002 Database er0rr 52·7·19 Posts @George Woltman. How hard is to write a modulo for 2^p+1 as well as 2^p-1? Is it something the average programmer can do with gpuowl.cl?    2019-12-11, 04:47   #16
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2·5·701 Posts Quote:
 Originally Posted by paulunderwood @George Woltman. How hard is to write a modulo for 2^p+1 as well as 2^p-1? Is it something the average programmer can do with gpuowl.cl? Non-trivial. You would need to replace real weights with complex weights in all the Carry kernels. You would need to change the tail kernels to not do its Hermetian symmetry stuff -- this makes that code simpler.

There is a learning curve playing gpuowl.cl. The rest of the code is C++ code with std library which I have not used over the last 2 decades. I do not know how to debug other than studying code and using printfs. Optimizations to working routines is much easier to code and debug than writing brand new code.   2020-01-05, 16:19   #17
axn

Jun 2003

2·32·7·37 Posts Here is cudaWagstaff v0.03 (adapted from cudalucas)
This is a cut down version with lot of the non-critical functionalities missing. For comparable FFT it is about 5% slower than CUDALucas (on my 1660Ti - YMMV).

Assignment syntax is
PRP=<exponent> or PRP=<exponent>,<fft>

Implements GEC PRP.
GEC failure / rollback logic is untested, but should work.
Supports cufftbench functionality, but no threadbench.

I have only built/tested in Linux; may or may not build/run under windows.
Please give it a try. Let me know if you have any questions / issues / feedback.

EDIT:- Attachment updated with minor change
Attached Files cudaWagstaff0.03.7z (27.5 KB, 43 views)

Last fiddled with by axn on 2020-01-05 at 16:29   2020-01-05, 18:12   #18
paulunderwood

Sep 2002
Database er0rr

52·7·19 Posts Quote:
 Originally Posted by axn Here is cudaWagstaff v0.03 (adapted from cudalucas) This is a cut down version with lot of the non-critical functionalities missing. For comparable FFT it is about 5% slower than CUDALucas (on my 1660Ti - YMMV). Assignment syntax is PRP= or PRP=, Implements GEC PRP. GEC failure / rollback logic is untested, but should work. Supports cufftbench functionality, but no threadbench. I have only built/tested in Linux; may or may not build/run under windows. Please give it a try. Let me know if you have any questions / issues / feedback. EDIT:- Attachment updated with minor change
WIll it build for a Linux Radeon VII?

Have you checked Ryan's two PRPs? If so how long did each take? How does that compared to a CPU?

How does one select "fft"?

Is there a coordinated search ongoing? If so, how does one get assignments?

Last fiddled with by paulunderwood on 2020-01-05 at 18:13   2020-01-05, 18:52   #19
kracker
ἀβουλία

"Mr. Meeseeks"
Jan 2012
California, USA

17×127 Posts Quote:
 Originally Posted by paulunderwood WIll it build for a Linux Radeon VII? Have you checked Ryan's two PRPs? If so how long did each take? How does that compared to a CPU? How does one select "fft"? Is there a coordinated search ongoing? If so, how does one get assignments?
CUDA runs only on nvidia hardware, it would need to be ported to OpenCL.   2020-01-05, 20:49   #20
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

105F16 Posts Quote:
 Originally Posted by paulunderwood WIll it build for a Linux Radeon VII?
No. CUDA is for NVIDIA. Radeon VII is AMD, OpenCL.
Quote:
 Have you checked Ryan's two PRPs? If so how long did each take? How does that compared to a CPU?
No idea, but a very fast gpu generally beats a very fast multicore cpu.
Quote:
 How does one select "fft"?
If it's derived from CUDALucas, chances are it determines an appropriate fft size itself, or accepts specification of fft size from the command line.
Quote:
 Is there a coordinated search ongoing? If so, how does one get assignments?
Good questions. diep, early in https://www.mersenneforum.org/showthread.php?t=23523 says systematic testing has been done through 10M on Wagstaff. Later there's a link to GP2's page http://mprime.s3-website.us-west-1.a....com/wagstaff/ and also the new Mersenne conjecture http://mprime.s3-website.us-west-1.a...onjecture.html. There's also a bit of discussion of the new mersenne conjecture status in the later pages of the CEMPLLA thread, https://www.mersenneforum.org/showth...ecture&page=20   2020-01-06, 03:03   #21
axn

Jun 2003

2·32·7·37 Posts Quote:
 Originally Posted by paulunderwood Have you checked Ryan's two PRPs? If so how long did each take? How does that compared to a CPU?
I have checked one of them. It took about 1 hr 6 minutes on a P100 (colab). For comparison my i3-8100 (4 real cores) would take 3x that time.

Quote:
 Originally Posted by paulunderwood How does one select "fft"?
fft is just the length. Once you run -cufftbench option, the program will have benchmark for optimal fft selection (plus some heuristics for appropriate size ffts). But, if the fft length selected by program is somehow not optimal, you can override.

Quote:
 Originally Posted by paulunderwood Is there a coordinated search ongoing? If so, how does one get assignments?
As of now, I am the only one doing doublecheck below Ryan's known tested range. I am currently working at 10.3-10.4m range (using CPU only - soon to be expanded to GPU as well). GP2 has the authoritative list of unfactored exponents.

I am also currently TFing the 14-15m range to 69 bits. This is first time territory (I think). If you're interested in first time tests, you can coordinate with GP2.   2020-01-10, 15:19 #22 lalera   Jul 2003 23·73 Posts hi, i do tf wagstaff numbers with mfaktc http://deep.alotspace.com   Thread Tools Show Printable Version Email this Page Similar Threads Thread Thread Starter Forum Replies Last Post GP2 Wagstaff PRP Search 385 2020-07-28 16:58 Prime95 GpuOwl 91 2019-12-30 08:30 M344587487 GpuOwl 14 2018-12-29 08:11 SELROC GpuOwl 29 2018-09-22 16:09 preda PrimeNet 2 2017-10-07 21:32

All times are UTC. The time now is 02:03.

Wed Aug 5 02:03:16 UTC 2020 up 18 days, 21:50, 1 user, load averages: 1.98, 1.94, 1.69