mersenneforum.org  

Go Back   mersenneforum.org > Prime Search Projects > Sierpinski/Riesel Base 5

Reply
 
Thread Tools
Old 2006-05-05, 12:33   #34
Joe O
 
Joe O's Avatar
 
Aug 2002

10000011012 Posts
Default

Quote:
4. It isn't important to use in all cases all small prime(power) divisors of p-1. That would slow down the calculation, what is your method for that?

I found this out separately myself. I reduced the size of array holding the p-1 factors down to 2 (from 8) and found that, for SoB+PSP but not Riesel, it was much faster.
I'm not sure that you are both talking about the same thing here. Since you both have the same source, how about specifying the array names. Or if you want to take this discussion offline, email me at factrange at yahoo dot com.

Last fiddled with by Joe O on 2006-05-05 at 12:34
Joe O is offline   Reply With Quote
Old 2006-05-05, 16:12   #35
Greenbank
 
Greenbank's Avatar
 
Jul 2005

2·193 Posts
Default

Quote:
Originally Posted by Joe O
I'm not sure that you are both talking about the same thing here. Since you both have the same source, how about specifying the array names. Or if you want to take this discussion offline, email me at factrange at yahoo dot com.
OK, I hardcoded sievewindowsize to 1<<22 (i.e. eratosthenes_window sieves 2^22 odd numbers each time).

I changed asievewindowprimebits[1<<22] to be an array with one unsigned char per odd number, not just one bit. I bzero the entire array at the start of eratosthenes_window(). If the eratosthenes sieve finds the number to be composite it puts 255 in there.

Then in the p-1 factoring part of eratosthenes_window() I can skip the number if( asievewindowprimebits[j] >= NOSP1FACS )

Otherwise I store the p-1 factor in the pminus1factors array which is defined as:-

uint32_t pminus1factors[1<<22][NOSP1FACS]

The main bit of code from eratosthenes_window() is this:-

Code:
                for( j = q; j < sievewindowsize; j += p ) {
                        if( asievewindowprimebits[j] >= NOSP1FACS ) {
                                /* Either composite or already have enough factors */
                                continue;
                        }
                        pminus1factors[j][asievewindowprimebits[j]]=p;
                        asievewindowprimebits[j]++;
                }
So it combines the two checks I need to make, 1) was the number composite and so I don't care about p-1 factors of it anyway, 2) do I already have enough p-1 factors for it.

NOSP1FACS is a defined constant, in the original proth_sieve it was hardcoded as 8 all throughout the code. I found that sieving SoB (or PSP, or SoB+PSP combined) was fastest when I reduced NOSP1FACS to 2.

As I said, these changes improved the speeds for SoB, PSP and SoB+PSP but they absolutely ruined the performance for the 83 k-value riesel.dat from RieselSieve, I still haven't had enough time to work out why...

I'd be willing to share this version of the code, although there are a few bits of G5 PPC specific assembly in there so it won't compile on x86. I mentioned this several months ago in an email to you and Chuck and got no response...;-)

Last fiddled with by Greenbank on 2006-05-05 at 16:12
Greenbank is offline   Reply With Quote
Old 2006-05-05, 16:26   #36
Citrix
 
Citrix's Avatar
 
Jun 2003

62E16 Posts
Default

Quote:
Originally Posted by Greenbank
Unless I'm misunderstanding it, it doesn't always work.

Let's take a simple example (proth numbers):-

k=1235, p = 10007

Solving for b=2: we want n where k*2^n+1 = 0 (mod p).

Solving the DL (2^n = 7722 mod 10007) we get:

n=4030

and since order(2,p) == 5003 we also have n=4030+5003 = 9033 as another solution where n < p.

Solving for b=5: we want n where k*5^n+1 = 0 (mod p).

Solving the DL (5^n = 7722 mod 10007) we get:

n=3446

and order(5,p) == 10006 so there is only one solution with n < p.

However, 2^m == 5 (mod 10007) has no solution.

Yes it doesn't always work. You have to try to solve 5^m=2^n. Think about testing if a^2=5 (mod p) and a^2=2 (mod p) have a solution or not. The quadratic residues of 2 and 5 must be identical or you must use 25 or 4. (Think about it, it is a bit complicated)

Citrix

Last fiddled with by Citrix on 2006-05-05 at 16:29
Citrix is offline   Reply With Quote
Old 2006-05-05, 17:33   #37
Greenbank
 
Greenbank's Avatar
 
Jul 2005

2×193 Posts
Default

Quote:
Originally Posted by Citrix
Yes it doesn't always work. You have to try to solve 5^m=2^n. Think about testing if a^2=5 (mod p) and a^2=2 (mod p) have a solution or not. The quadratic residues of 2 and 5 must be identical or you must use 25 or 4. (Think about it, it is a bit complicated)

Citrix
OK, I think I'm getting it (my brain is just thinking about beer as it is close to 6pm on a Friday night...).

Ok, so 5 is not a QR mod 10007 so you square it to make sure it is a QR mod 10007.

Solving the DL 25^n = 2^2 (mod 10007) we get:

n=1575 and n=6578

(25^1575)^2015 == 7722 (mod 10007)

So:-

25^(1575*2015 mod order(2,p)) == 7722 (mod 10007)

25^1723 == 7722 (mod 10007)

So 5^(1723*2) == 7722 (mod 10007).

Ah...I get it.

Right, there's a problem with this. We only solve 2^n = x (mod p) in proth_sieve where n is within some reasonable bounds (the high n value from the dat file. So this is around 50M).

Taking a random factor I recently found doing combined SoB+PSP sieving:-

115325353613543 | 21181*2^28992284+1

We'll try and use this method to work out the n for b=5.

Luckily 5^39449625295844 = 2 (mod p).

So 21181*5^n+1 = 0 (mod p) should be easy to compute:-

(39449625295844*28992284) mod order(5,p) = 43497181410834

Lo and behold 21181*5^43497181410834+1 = 0 (mod 115325353613543)

Which, if you are searching for 0 < n < 50M, is quite useless.

Here's a better example (took a few minutes for a quick program to find one).

p=1000048199

21181*5^276839+1 == 0 (mod p).

But to find this factor with the above method you'd first have to find:-

21181*2^256950671+1 == 0 (mod p)

Before you can use the fact that 5^308465688 = 2 (mod p) and compute:-

(308465688*256950671) mod order(2,p) = 276839.

You'd have to solve the DL for base 2 all the way up to n = order(2,p) which can be as high as p-1.

For the smallest 49-bit prime this is over 11 million times as large a search space as a simple check up to 50M.
Greenbank is offline   Reply With Quote
Old 2006-05-05, 18:47   #38
Citrix
 
Citrix's Avatar
 
Jun 2003

2·7·113 Posts
Default

There is a way around it too. I leave it as a puzzle for you. (May be it will come to you, after the first drink)
Hint:-Think of the algorithm Silver....
Citrix is offline   Reply With Quote
Old 2006-05-05, 19:41   #39
axn
 
axn's Avatar
 
Jun 2003

31·163 Posts
Default

Quote:
Originally Posted by Citrix
All of have to do is solve all the k's for base 2, and in addition solve k=5 for base 2, so 2^n=5 can be calculated.
Since there are 400+ k's one more k won't be much more work.
I don't know too much about the mathematics here. But sounds to me like there is not algorithmic improvement here. All you are doing here is solving the Discrete Logarithm problem -- twice. Does it matter whether you solve it in the "native base" directly or solve it for base 2 first and then convert that to the "native base"?!

Maybe I'm just way off base here
axn is online now   Reply With Quote
Old 2006-05-05, 22:03   #40
Joe O
 
Joe O's Avatar
 
Aug 2002

10158 Posts
Default

Quote:
I'd be willing to share this version of the code, although there are a few bits of G5 PPC specific assembly in there so it won't compile on x86. I mentioned this several months ago in an email to you and Chuck and got no response...;-)
But as you said at the time, it would be a lot of work to compare the two sources and port improvements. As a case in point, your interpretation of the original code and the direction you took it is radically different from the way I took it. One possibility is that the G5 does character manipulation better than bit manipulation. But I tried and rejected your approach and left the bit array intact. I also go after every factor of P-1. I find that it helps to have them all. yes it speeds up this section of code to get only 2 factors, but it slows you down elsewhere. And yes I tried it. You might want to go for 4 or 6 factors and see what that does.
I'ts more fun to swap bits and pieces of "philosophy" and technique than gobs of code.
Question. Did you time "classic" and "Montgomery" multiplication? Are we in the p ranges where their efficiency overlaps? Or is this another difference between the G5 and x86 instruction timings.
Joe O is offline   Reply With Quote
Old 2006-05-06, 02:25   #41
geoff
 
geoff's Avatar
 
Mar 2003
New Zealand

13×89 Posts
Default Version 0.0.6

I have simplified the giant steps whan p is large, and added a --prp switch which causes the sieve file to be written in a format suitable for input to prp (all +1/-1 k in the same file, sorted by n values).

Quote:
Originally Posted by Citrix
Why is everyone so intrested in reinventing the wheel? Why write the code from scratch when you can use the old optimized code.
The Sierpinski/Riesel Base 5 projects have been going for a while now. In that time NewPGen is the only siever I could find that was usable. I could not find source code to any sievers whatsoever.
geoff is offline   Reply With Quote
Old 2006-05-06, 03:51   #42
Joe O
 
Joe O's Avatar
 
Aug 2002

3×52×7 Posts
Default

Quote:
Originally Posted by geoff
The Sierpinski/Riesel Base 5 projects have been going for a while now. In that time NewPGen is the only siever I could find that was usable. I could not find source code to any sievers whatsoever.
Congratulations on your fine work. I'm sure you have a warm feeling of satisfaction on writing something that meets your needs specifically.
Joe O is offline   Reply With Quote
Old 2006-05-08, 03:52   #43
geoff
 
geoff's Avatar
 
Mar 2003
New Zealand

13×89 Posts
Default version 0.0.8

Version 0.0.8 fixes a bug that could have caused candidates to be wrongly eliminated. Sorry about that. The good news is that the --check option would have caught any problem, or if you used the --factors option then it is just a matter of checking that none of the factors are composite.

As well as the bug fix, mulmod32 has been inlined thanks to axn1, and there are some other speedups from inlining common cases in lookup() and insert(). Removed --benchmark until I can work out something a bit more realistic. Added --hashtable which allows the hashtable size to be increased.

I have had a read of the suggestions being posted, I will keep a list for the future but most of them are way over my head at the moment. If anyone else wants to take them up they are welcome to use any of the srsieve code in their own program if that would be helpful.
geoff is offline   Reply With Quote
Old 2006-05-08, 09:47   #44
Greenbank
 
Greenbank's Avatar
 
Jul 2005

2×193 Posts
Default

Quote:
Originally Posted by Joe O
But as you said at the time, it would be a lot of work to compare the two sources and port improvements. As a case in point, your interpretation of the original code and the direction you took it is radically different from the way I took it. One possibility is that the G5 does character manipulation better than bit manipulation. But I tried and rejected your approach and left the bit array intact. I also go after every factor of P-1. I find that it helps to have them all. yes it speeds up this section of code to get only 2 factors, but it slows you down elsewhere. And yes I tried it.
I base 'faster' and 'slower' on the kp/s ratings over a few different sized ranges after I make whatever changes need to be made.

I get a higher kp/s range across the board when I used a char array over a bit array.

And yes, there may be vast differences between the throughputs of G5 PPC over x86...

Quote:
Originally Posted by Joe O
You might want to go for 4 or 6 factors and see what that does.
I'ts more fun to swap bits and pieces of "philosophy" and technique than gobs of code.
4, and subsequently 6, p-1 factors are slower (based on the reported kp/s rating) than just 2. For SoB or PSP or SoB+PSP. For Riesel 2 p-1 factors is slower than 8 p-1 factors.

Quote:
Originally Posted by Joe O
Question. Did you time "classic" and "Montgomery" multiplication? Are we in the p ranges where their efficiency overlaps? Or is this another difference between the G5 and x86 instruction timings.
It is completely independent of the current p range. The efficiency is based on the time it takes to calculate two values based on b and p of (a*b mod p). Remember that this is a 64-bit implementation which has massive benefits for montgomery multiplication.

According to my tests, it is better to use Montgomery Multiplication when the number of required iterations (with the same b and p) is greater than 5. This takes into account the setup function and the few things that has to calculate.

Every time the original code used mul_mod_p_b() I check to see if the number of iterations will be greater than 5 (making sure not to use a divide) and then have to blocks of code to do the same thing. One using montgomery and one using classic multiplication.

So this was the original code for building the giant-steps hash-table:-

Code:
                for( ; j <= nhigh; j += grange ) {
                        store_hash( ht, exp++ );
                        ht = mul_mod_p_b( ht, gstep );
                }
This now becomes:-

Code:
        if( (grange*mont_limit) <= nhigh ) {
                montmul_setup( gstep ); /* setup montgomery multiplication */
                for( ; j <= nhigh; j += grange ) {
                        store_hash( ht, exp++ );
                        ht = mul_mod_p_b( ht, gstep ); /* use mont mul */
                }
        } else {
                for( ; j <= nhigh; j += grange ) {
                        store_hash( ht, exp++ );
                        ht = mul_mod_p( ht, gstep );
                }
        }
nhigh/grange is the number of iterations that we will need to call mul_mod_p_b(). So if grange*mont_limit <= nhigh then it is worthwhile.

Setting it up requires calculating:
mprime = -p mod 2^64
which is easy to do with shifts and adds, and:
yval = b*(2^64) mod p
which is just a pow2mod and mulmod call.

These can be cached, if just the b value is changing (but p is the same) we don't need to recalculate mprime.

With montgomery multiplication (mont_limit of 5) = 1171 kp/sec
Classic multiplication (mont_limit = 5000000 to disable it) = 959 kp/sec
Greenbank is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Very Prime Riesel and Sierpinski k robert44444uk Open Projects 587 2016-11-13 15:26
Sierpinski/ Riesel bases 6 to 18 robert44444uk Conjectures 'R Us 139 2007-12-17 05:17
Sierpinski/Riesel Base 10 rogue Conjectures 'R Us 11 2007-12-17 05:08
Sierpinski / Riesel - Base 23 michaf Conjectures 'R Us 2 2007-12-17 05:04
Sierpinski / Riesel - Base 22 michaf Conjectures 'R Us 49 2007-12-17 05:03

All times are UTC. The time now is 09:13.


Sat Jul 17 09:13:32 UTC 2021 up 50 days, 7 hrs, 1 user, load averages: 1.42, 1.67, 1.61

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.