mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing > GpuOwl

Reply
 
Thread Tools
Old 2019-07-08, 16:50   #1
GP2
 
GP2's Avatar
 
Sep 2003

2,579 Posts
Default

mprime can do k*b^n+c

How feasible would it be, in principle, to adapt gpuOwL to be more flexible? Wagstaff in particular: (2^p+1)/3
GP2 is offline   Reply With Quote
Old 2019-07-08, 20:10   #2
paulunderwood
 
paulunderwood's Avatar
 
Sep 2002
Database er0rr

CFD16 Posts
Default

Quote:
Originally Posted by GP2 View Post
mprime can do k*b^n+c

How feasible would it be, in principle, to adapt gpuOwL to be more flexible? Wagstaff in particular: (2^p+1)/3


Working mod 2^p+1 is almost as easy as 2^p-1. Then a final division by 3 to get mod (2^p+1)/3. E.g:

Code:
p=127;Mod(3,2^p+1)^2^p
Mod(9, 170141183460469231731687303715884105729)
Gerbicz error checking can be done too,
paulunderwood is online now   Reply With Quote
Old 2019-07-08, 20:34   #3
GP2
 
GP2's Avatar
 
Sep 2003

A1316 Posts
Default

Quote:
Originally Posted by paulunderwood View Post


Working mod 2^p+1 is almost as easy as 2^p-1. Then a final division by 3 to get mod (2^p+1)/3.
In mprime, a type-5 residue for Wagstaff simply calculates 3^(2^p) mod (2^p + 1). So I don't think you need to do a division.

Probably type-4 would also be applicable to Wagstaff? Or perhaps type-2, which is similar to type-4 except using N−1 instead of N+1.

Code:
2:	SPRP variant, N is PRP if a^((N-1)/2) = +/-1 mod N
4:	SPRP variant. N is PRP if a^((N+1)/2) = +/-a mod N
GP2 is offline   Reply With Quote
Old 2019-07-09, 19:03   #4
GP2
 
GP2's Avatar
 
Sep 2003

2,579 Posts
Default

Quote:
Originally Posted by kriesel View Post
I'd like to see some P-1 related gpuowl fixes and extensions before Mihai tackles another endeavor such as extension to Wagstaff prp.
It's up to him to decide what he spends his time and effort doing. I was thinking that there might be some relatively trivial modification.

Like I mentioned earlier, the Wagstaff PRP calculation for type 5 is 3^(2^p) mod (2^p + 1) whereas for Mersenne (where type 1 and type 5 are the same thing), it's 3^(2^p − 2) mod (2^p − 1). I don't know if there is a similarly simple modification for type 4 or type 2 residues.

Since gpuOwL is a GitHub project, theoretically someone else could make the modification, possibly even forking from an earlier version that still used Mersenne type 1 residues.
GP2 is offline   Reply With Quote
Old 2019-07-10, 00:51   #5
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3×11×127 Posts
Default

Quote:
Originally Posted by GP2 View Post
It's up to him to decide what he spends his time and effort doing.
Of course. He volunteers his time, according to his talents and interests, like many others. None of us has a claim on him or each other, or authority to select one path versus another for him. To his credit, he sometimes accepts or asks for input from the user community. And if we users summarize outstanding issues or new desires, it can make him more efficient. Win-win.

Quote:
I was thinking that there might be some relatively trivial modification.
It seems to me that the power difference is trivial, but the mod difference is less so. mod 2p-1 result fits in p bits, and can be done rapidly in binary by adding the quotient to the remainder displaced rightward by p bits; mod 2p+1 can't. Seems like p+1 bits storage and subtract quotient after a p bit right shift would be in order. That in turn implies borrows rather than carries as in the existing code. But all that is from thinking in untransformed integer binary operand terms.

Quote:
Like I mentioned earlier, the Wagstaff PRP calculation for type 5 is 3^(2^p) mod (2^p + 1) whereas for Mersenne (where type 1 and type 5 are the same thing), it's 3^(2^p − 2) mod (2^p − 1). I don't know if there is a similarly simple modification for type 4 or type 2 residues.

Since gpuOwL is a GitHub project, theoretically someone else could make the modification, possibly even forking from an earlier version that still used Mersenne type 1 residues.
Which would be ~gpuowl v1.5 to 3.9. https://www.mersenneforum.org/showpo...3&postcount=15
There are other ways to do Wagstaff, to ~920M, though maybe not as high a p as you'd like to go to if you're thinking of taking the new Mersenne conjecture testing further.
There are also other ways to do p-1 factoring on Mersennes, although not above ~432.5M in CUDAPm1 in practice, or ~920M in mprime/prime95, and not on OpenCl at all.
kriesel is online now   Reply With Quote
Old 2019-07-11, 21:33   #6
preda
 
preda's Avatar
 
"Mihai Preda"
Apr 2015

21728 Posts
Default

Quote:
Originally Posted by GP2 View Post
I was thinking that there might be some relatively trivial modification.

Like I mentioned earlier, the Wagstaff PRP calculation for type 5 is 3^(2^p) mod (2^p + 1) whereas for Mersenne (where type 1 and type 5 are the same thing), it's 3^(2^p − 2) mod (2^p − 1). I don't know if there is a similarly simple modification for type 4 or type 2 residues.
This is what I don't know (maybe somebody could enlighten me) about the implementation of "mod 2^p+1":

In the Mersenne case, we want a cyclic convolution. The simple weighting that is done before/after the FFT achieves that.

For the "mod 2^p+1", we want a negacyclic convolution. Can this be achieved through a similar weighting (with different weights)? Or is something more involved needed?

To add a bit more detail: in the mersenne case, the weights are real. IF for 2^p+1 we need weighting with complex weights, this changes the implementation significantly because the FFT input is not real anymore.
preda is offline   Reply With Quote
Old 2019-07-12, 00:13   #7
Prime95
P90 years forever!
 
Prime95's Avatar
 
Aug 2002
Yeehaw, FL

2·5·701 Posts
Default

Quote:
Originally Posted by preda View Post
This is what I don't know (maybe somebody could enlighten me) about the implementation of "mod 2^p+1"
There are two weights. You still have the real weights to distribute the p bits uniformly over the FFTLEN words.

You also need to apply complex roots-of-minus-one to "trick" the FFT into doing a negacyclic convolution instead of a cyclic convolution. You don't need any extra FFT memory, but you do need a modified first pass that takes real inputs and produces weighted complex FFT'ed outputs. Not easy, but not hard either.

Next you need a new simpler second pass that scraps all the Hermetian symmetry computations before the point-wise squaring.
Prime95 is offline   Reply With Quote
Old 2019-07-19, 23:27   #8
ewmayer
2ω=0
 
ewmayer's Avatar
 
Sep 2002
República de California

2·5,693 Posts
Default

Quote:
Originally Posted by preda View Post
This is what I don't know (maybe somebody could enlighten me) about the implementation of "mod 2^p+1":

In the Mersenne case, we want a cyclic convolution. The simple weighting that is done before/after the FFT achieves that.
No, the FFT is inherently cyclic-convolutional ... the IBDWT weighting allow us to use a prime-length "bit folding" boundary in conjunction with an underlying polynomial-multiply which most naturally lends itself to a bitness which is highly composite, by way of being a multiple of the transform length.

Quote:
For the "mod 2^p+1", we want a negacyclic convolution. Can this be achieved through a similar weighting (with different weights)? Or is something more involved needed?

To add a bit more detail: in the mersenne case, the weights are real. IF for 2^p+1 we need weighting with complex weights, this changes the implementation significantly because the FFT input is not real anymore.
As George noted, for (mod 2^p+1) you need 2 distinct weightings: the IBDWT one to allow for a prime-length bit-folding, and the standard acyclic-effecting weighting, which for a length-n transform uses the first n complex (2*n)th roots of unity. That needs a complex FFT algorithm; for length-n real input vector you can use a length-(n/2) complex FFT. Noting that the [j]th and [j+n/2]th acyclic weights (call them 'awt') are related by awt[j+n/2] = I*awt[j], you can see that in this context it makes sense to group pairs of real inputs together not via the usual (x[j],x[j+1])-treated-as-a-complex-datum scheme but rather in (x[j],x[j+n/2]) pairs, since applying the acyclic-weights turns those 2 reals into (awt[j]*x[j],I*awt[j]*x[j+n/2]), i.e. we can pull out the shared complex acyclic-multiplier awt[j] = exp(I*j/(2*n) to get a weighted complex input awt[j]*(x[j] + I*x[j+n/2]). This is the so-called "right-angle transform" trick. Crandall & Fagin recapped it (since it wasn't new) in the Fermat-mod section of the same 1994 paper where they introduced the Mersenne-mod IBDWT.
ewmayer is offline   Reply With Quote
Old 2019-07-20, 00:51   #9
preda
 
preda's Avatar
 
"Mihai Preda"
Apr 2015

2·3·191 Posts
Default

Thank you, this is useful!

Quote:
Originally Posted by ewmayer View Post
No, the FFT is inherently cyclic-convolutional ... the IBDWT weighting allow us to use a prime-length "bit folding" boundary in conjunction with an underlying polynomial-multiply which most naturally lends itself to a bitness which is highly composite, by way of being a multiple of the transform length.


As George noted, for (mod 2^p+1) you need 2 distinct weightings: the IBDWT one to allow for a prime-length bit-folding, and the standard acyclic-effecting weighting, which for a length-n transform uses the first n complex (2*n)th roots of unity. That needs a complex FFT algorithm; for length-n real input vector you can use a length-(n/2) complex FFT. Noting that the [j]th and [j+n/2]th acyclic weights (call them 'awt') are related by awt[j+n/2] = I*awt[j], you can see that in this context it makes sense to group pairs of real inputs together not via the usual (x[j],x[j+1])-treated-as-a-complex-datum scheme but rather in (x[j],x[j+n/2]) pairs, since applying the acyclic-weights turns those 2 reals into (awt[j]*x[j],I*awt[j]*x[j+n/2]), i.e. we can pull out the shared complex acyclic-multiplier awt[j] = exp(I*j/(2*n) to get a weighted complex input awt[j]*(x[j] + I*x[j+n/2]). This is the so-called "right-angle transform" trick. Crandall & Fagin recapped it (since it wasn't new) in the Fermat-mod section of the same 1994 paper where they introduced the Mersenne-mod IBDWT.
preda is offline   Reply With Quote
Old 2019-11-22, 07:30   #10
axn
 
axn's Avatar
 
Jun 2003

2×32×7×37 Posts
Default

Forked the Wagstaff discussion into a separate thread (copy - original posts are still there).
axn is online now   Reply With Quote
Old 2019-11-22, 16:29   #11
paulunderwood
 
paulunderwood's Avatar
 
Sep 2002
Database er0rr

52·7·19 Posts
Default

Quote:
Originally Posted by axn View Post
Forked the Wagstaff discussion into a separate thread (copy - original posts are still there).
Now we need someone competent to fork the source code

I for one would run Wagstaff.
paulunderwood is online now   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Status of Wagstaff testing? and testing Mersenne primes for Wagstaff-ness GP2 Wagstaff PRP Search 385 2020-07-28 16:58
gpuOwl Windows setup for Radeon VII Prime95 GpuOwl 91 2019-12-30 08:30
gpuowl tuning M344587487 GpuOwl 14 2018-12-29 08:11
gpuowl: runtime error SELROC GpuOwl 29 2018-09-22 16:09
How to interface gpuOwl with PrimeNet preda PrimeNet 2 2017-10-07 21:32

All times are UTC. The time now is 02:26.

Wed Aug 5 02:26:38 UTC 2020 up 18 days, 22:13, 1 user, load averages: 1.70, 1.72, 1.65

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.