mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing > GpuOwl

Reply
 
Thread Tools
Old 2019-11-29, 05:01   #12
axn
 
axn's Avatar
 
Jun 2003

466210 Posts
Default My adventures in understanding DWT

Using Pari/GP for prototyping:

The functions. Note, pari vectors start with index 1.
Code:
weight(v,w,N)=vector( N, i, v[i]*w[i] );  \\ apply a weight w to signal v
unweight(v,w,N)=vector( N, i, v[i]/w[i] );  \\ remove the weight w from signal v
halve( v, N )=vector(N/2, i, v[i] + I*v[i+N/2]);  \\ combine k and k+N/2 real elements into a single complex 
unhalve( v, N )=vector(N*2, i, if(i <= N, real(v[i]), imag(v[i-N]))); \\ reverse the complex elements back into real elements

forward(v,N)=vector( N, k, sum(n=1,N, v[n]*exp(-I*2*Pi*(k-1)*(n-1)/N)) );  \\ simple O(N^2) DFT
inverse(v,N)=vector( N, k, sum(n=1,N, v[n]*exp(+I*2*Pi*(k-1)*(n-1)/N))/N );\\ inverse DFT
forward_weight(v,N,w)=forward( weight(v,w,N), N ); \\ DWT
inverse_weight(v,N,w)=unweight( inverse(v, N), w, N ); \\ inverse DWT

square(v,N)=vector( N, i, v[i]^2 ); \\ point-wise squaring
Data and weights
Code:
N=8;
v=[1592, 4744, 2670, 4702, 5898, 1185, 513, 1678];  \\ a random number 1038670476017199579990351480376
w=[1, 2^.375, 2^.75, 2^.125, 2^.5, 2^.875, 2^.25, 2^.625]; \\ irrational base weight for p=101, N=8
w2=vector(N, i, exp(-I*Pi*(i-1)/N)); \\ weight for negacyclic 
w3=vector(N, i, w[i]*w2[i]); \\ combined weight for IB + negacyclic
Mersenne mod 2^101-1
Code:
u=forward_weight(v, N, w); 
u=square(u, N);
u=inverse_weight(u, N, w);
round(u)
= [131715320, 65806708, 55646526, 107656084, 85866497, 86563572, 114850764, 78330688]

\\ results
Mod(131715320 + 65806708*2^13+ 55646526* 2^26+ 107656084*2^38+ 85866497*2^51+ 86563572*2^64+ 114850764*2^76+ 78330688*2^89, 2^101-1)
= Mod(320904239316807117549353868207, 2535301200456458802993406410751)
Mod(1038670476017199579990351480376, 2^101-1)^2
= Mod(320904239316807117549353868207, 2535301200456458802993406410751)
Match!

Wagstaff mod 2^101+1
Code:
u=forward_weight(v, N, w3);
u=square(u, N);
u=inverse_weight(u, N, w3);
round(u)
= [-126646392, -35596916, 6367106, 23618092, 69432719, 83120316, 103588028, 78330688]

\\ results
Mod(-126646392 +  -35596916*2^13 + 6367106* 2^26 + 23618092*2^38 + 69432719*2^51+ 83120316*2^64+ 103588028*2^76+ 78330688*2^89, 2^101+1)
= Mod(2005153614013422538592239271122, 2535301200456458802993406410753)
Mod(1038670476017199579990351480376, 2^101+1)^2
= Mod(2005153614013422538592239271122, 2535301200456458802993406410753)
Match!

complex->complex right-angle convolution for wagstaff
Code:
vw=weight(v, w, N); \\ apply irrational base weights
h=halve(vw, N);  \\ combine into complex
u=forward_weight(h, N/2, w2); \\ DWT with negacyclic weights
sq=square(u, N/2);  
u=inverse_weight(sq, N/2, w2); \\ inverse DWT
uh=unhalve(u, N/2); \\ back to reals
u=unweight(uh, w, N); \\ undo IB weights
round(u)
= [-7430296, 9893668, 37926930, 23618092, -31874255, 36346212, 70633156, 78330688]

Mod(-7430296 + 9893668*2^13 +  37926930* 2^26 + 23618092*2^38 +  -31874255*2^51+  36346212*2^64+  70633156*2^76+  78330688*2^89, 2^101+1)
= Mod(2049592028740143596208787739827, 2535301200456458802993406410753)
Mod(1038670476017199579990351480376, 2^101+1)^2
= Mod(2005153614013422538592239271122, 2535301200456458802993406410753)
Does not match. What now?
axn is online now   Reply With Quote
Old 2019-12-01, 02:10   #13
paulunderwood
 
paulunderwood's Avatar
 
Sep 2002
Database er0rr

CFD16 Posts
Default

In your "data" section you should have:

Code:
w2=vector(N, i, exp(I*Pi*(i-1)/N)); \\ weight for negacyclic
i.e. no "-", HTH,

Last fiddled with by paulunderwood on 2019-12-01 at 02:42
paulunderwood is online now   Reply With Quote
Old 2019-12-01, 03:06   #14
axn
 
axn's Avatar
 
Jun 2003

2×32×7×37 Posts
Default

Crandall & Fagin (http://www.faginfamily.net/barry/Pap...Transforms.pdf) says that you can use either, and they explicitly use the minus version (eg:- 5.3).

Indeed it works for the naive wagstaff version. Nonetheless, let me try the right angle convolution with the other one...
Code:
? weight(v,w,N)=vector( N, i, v[i]*w[i] );  \\ apply a weight w to signal v
? unweight(v,w,N)=vector( N, i, v[i]/w[i] );  \\ remove the weight w from signal v
? halve( v, N )=vector(N/2, i, v[i] + I*v[i+N/2]);  \\ combine k and k+N/2 real elements into a single complex 
? unhalve( v, N )=vector(N*2, i, if(i <= N, real(v[i]), imag(v[i-N]))); \\ reverse the complex elements back into real elements
? 
? forward(v,N)=vector( N, k, sum(n=1,N, v[n]*exp(-I*2*Pi*(k-1)*(n-1)/N)) );  \\ simple O(N^2) DFT
? inverse(v,N)=vector( N, k, sum(n=1,N, v[n]*exp(+I*2*Pi*(k-1)*(n-1)/N))/N );\\ inverse DFT
? forward_weight(v,N,w)=forward( weight(v,w,N), N ); \\ DWT
? inverse_weight(v,N,w)=unweight( inverse(v, N), w, N ); \\ inverse DWT
? 
? square(v,N)=vector( N, i, v[i]^2 ); \\ point-wise squaring
? N=8;
? v=[1592, 4744, 2670, 4702, 5898, 1185, 513, 1678];  \\ a random number 1038670476017199579990351480376
? w=[1, 2^.375, 2^.75, 2^.125, 2^.5, 2^.875, 2^.25, 2^.625]; \\ irrational base weight for p=101, N=8
? w2=vector(N, i, exp(I*Pi*(i-1)/N)); \\ weight for negacyclic 
? w3=vector(N, i, w[i]*w2[i]); \\ combined weight for IB + negacyclic
? vw=weight(v, w, N); \\ apply irrational base weights
? h=halve(vw, N);  \\ combine into complex
? u=forward_weight(h, N/2, w2); \\ DWT with negacyclic weights
? sq=square(u, N/2);  
? u=inverse_weight(sq, N/2, w2); \\ inverse DWT
? uh=unhalve(u, N/2); \\ back to reals
? u=unweight(uh, w, N); \\ undo IB weights
? round(u)
%22 = [-126646392, -35596916, 6367106, 23618092, 69432719, 83120316, 103588028, 78330688]
Holy crap, it works! Thank you so much.

Now, to see if I can fork cudaLucas to make it do wagstaff.
axn is online now   Reply With Quote
Old 2019-12-11, 01:55   #15
paulunderwood
 
paulunderwood's Avatar
 
Sep 2002
Database er0rr

52·7·19 Posts
Default

@George Woltman. How hard is to write a modulo for 2^p+1 as well as 2^p-1? Is it something the average programmer can do with gpuowl.cl?
paulunderwood is online now   Reply With Quote
Old 2019-12-11, 04:47   #16
Prime95
P90 years forever!
 
Prime95's Avatar
 
Aug 2002
Yeehaw, FL

2·5·701 Posts
Default

Quote:
Originally Posted by paulunderwood View Post
@George Woltman. How hard is to write a modulo for 2^p+1 as well as 2^p-1? Is it something the average programmer can do with gpuowl.cl?
Non-trivial. You would need to replace real weights with complex weights in all the Carry kernels. You would need to change the tail kernels to not do its Hermetian symmetry stuff -- this makes that code simpler.

There is a learning curve playing gpuowl.cl. The rest of the code is C++ code with std library which I have not used over the last 2 decades. I do not know how to debug other than studying code and using printfs. Optimizations to working routines is much easier to code and debug than writing brand new code.
Prime95 is offline   Reply With Quote
Old 2020-01-05, 16:19   #17
axn
 
axn's Avatar
 
Jun 2003

2·32·7·37 Posts
Default

Here is cudaWagstaff v0.03 (adapted from cudalucas)
This is a cut down version with lot of the non-critical functionalities missing. For comparable FFT it is about 5% slower than CUDALucas (on my 1660Ti - YMMV).

Assignment syntax is
PRP=<exponent> or PRP=<exponent>,<fft>

Implements GEC PRP.
GEC failure / rollback logic is untested, but should work.
Supports cufftbench functionality, but no threadbench.

I have only built/tested in Linux; may or may not build/run under windows.
Please give it a try. Let me know if you have any questions / issues / feedback.

EDIT:- Attachment updated with minor change
Attached Files
File Type: 7z cudaWagstaff0.03.7z (27.5 KB, 43 views)

Last fiddled with by axn on 2020-01-05 at 16:29
axn is online now   Reply With Quote
Old 2020-01-05, 18:12   #18
paulunderwood
 
paulunderwood's Avatar
 
Sep 2002
Database er0rr

52·7·19 Posts
Default

Quote:
Originally Posted by axn View Post
Here is cudaWagstaff v0.03 (adapted from cudalucas)
This is a cut down version with lot of the non-critical functionalities missing. For comparable FFT it is about 5% slower than CUDALucas (on my 1660Ti - YMMV).

Assignment syntax is
PRP=<exponent> or PRP=<exponent>,<fft>

Implements GEC PRP.
GEC failure / rollback logic is untested, but should work.
Supports cufftbench functionality, but no threadbench.

I have only built/tested in Linux; may or may not build/run under windows.
Please give it a try. Let me know if you have any questions / issues / feedback.

EDIT:- Attachment updated with minor change
WIll it build for a Linux Radeon VII?

Have you checked Ryan's two PRPs? If so how long did each take? How does that compared to a CPU?

How does one select "fft"?

Is there a coordinated search ongoing? If so, how does one get assignments?

Last fiddled with by paulunderwood on 2020-01-05 at 18:13
paulunderwood is online now   Reply With Quote
Old 2020-01-05, 18:52   #19
kracker
ἀβουλία
 
kracker's Avatar
 
"Mr. Meeseeks"
Jan 2012
California, USA

17×127 Posts
Default

Quote:
Originally Posted by paulunderwood View Post
WIll it build for a Linux Radeon VII?

Have you checked Ryan's two PRPs? If so how long did each take? How does that compared to a CPU?

How does one select "fft"?

Is there a coordinated search ongoing? If so, how does one get assignments?
CUDA runs only on nvidia hardware, it would need to be ported to OpenCL.
kracker is offline   Reply With Quote
Old 2020-01-05, 20:49   #20
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

105F16 Posts
Default

Quote:
Originally Posted by paulunderwood View Post
WIll it build for a Linux Radeon VII?
No. CUDA is for NVIDIA. Radeon VII is AMD, OpenCL.
Quote:
Have you checked Ryan's two PRPs? If so how long did each take? How does that compared to a CPU?
No idea, but a very fast gpu generally beats a very fast multicore cpu.
Quote:
How does one select "fft"?
If it's derived from CUDALucas, chances are it determines an appropriate fft size itself, or accepts specification of fft size from the command line.
Quote:
Is there a coordinated search ongoing? If so, how does one get assignments?
Good questions. diep, early in https://www.mersenneforum.org/showthread.php?t=23523 says systematic testing has been done through 10M on Wagstaff. Later there's a link to GP2's page http://mprime.s3-website.us-west-1.a....com/wagstaff/ and also the new Mersenne conjecture http://mprime.s3-website.us-west-1.a...onjecture.html. There's also a bit of discussion of the new mersenne conjecture status in the later pages of the CEMPLLA thread, https://www.mersenneforum.org/showth...ecture&page=20
kriesel is offline   Reply With Quote
Old 2020-01-06, 03:03   #21
axn
 
axn's Avatar
 
Jun 2003

2·32·7·37 Posts
Default

Quote:
Originally Posted by paulunderwood View Post
Have you checked Ryan's two PRPs? If so how long did each take? How does that compared to a CPU?
I have checked one of them. It took about 1 hr 6 minutes on a P100 (colab). For comparison my i3-8100 (4 real cores) would take 3x that time.

Quote:
Originally Posted by paulunderwood View Post
How does one select "fft"?
fft is just the length. Once you run -cufftbench option, the program will have benchmark for optimal fft selection (plus some heuristics for appropriate size ffts). But, if the fft length selected by program is somehow not optimal, you can override.

Quote:
Originally Posted by paulunderwood View Post
Is there a coordinated search ongoing? If so, how does one get assignments?
As of now, I am the only one doing doublecheck below Ryan's known tested range. I am currently working at 10.3-10.4m range (using CPU only - soon to be expanded to GPU as well). GP2 has the authoritative list of unfactored exponents.

I am also currently TFing the 14-15m range to 69 bits. This is first time territory (I think). If you're interested in first time tests, you can coordinate with GP2.
axn is online now   Reply With Quote
Old 2020-01-10, 15:19   #22
lalera
 
lalera's Avatar
 
Jul 2003

23·73 Posts
Default

hi,
i do tf wagstaff numbers with mfaktc
http://deep.alotspace.com
lalera is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Status of Wagstaff testing? and testing Mersenne primes for Wagstaff-ness GP2 Wagstaff PRP Search 385 2020-07-28 16:58
gpuOwl Windows setup for Radeon VII Prime95 GpuOwl 91 2019-12-30 08:30
gpuowl tuning M344587487 GpuOwl 14 2018-12-29 08:11
gpuowl: runtime error SELROC GpuOwl 29 2018-09-22 16:09
How to interface gpuOwl with PrimeNet preda PrimeNet 2 2017-10-07 21:32

All times are UTC. The time now is 02:03.

Wed Aug 5 02:03:16 UTC 2020 up 18 days, 21:50, 1 user, load averages: 1.98, 1.94, 1.69

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.