mersenneforum.org  

Old 2011-05-12, 04:37   #551
axn
 

Quote:
Originally Posted by ET_ View Post
I'll have to study better. I confused the GCD final stage of ECM
Don't worry. It is not you who has to study better. Vincent is definitely conflating many different things here.

FWIW, P-1 stage 1 & 2 takes _days_ to run. The gcd phase lasts a few _minutes_. No matter how inefficient the gcd is, it will never be the bottleneck.
Old 2011-05-12, 07:33   #552
cheesehead
 
"Richard B. Woods"

Vincent,

P-1 is a deterministic algorithm, not a probabilistic one like Pollard Rho. P-1 will always find a factor if that factor is within the B1/B2 search space.

Quote:
Originally Posted by diep View Post
I'd argue that P-1 is a somewhat pensioned-off algorithm. One year older than Pollard Rho, yet suffering from the same randomness.
Quote:
Originally Posted by diep View Post
You can design an adapted version of the algorithm that randomly looks for factors of limited size.
Quote:
Originally Posted by diep View Post
Wiki uses a random number to initialize. See: http://en.wikipedia.org/wiki/Pollard's_p_−_1_algorithm

"randomly pick a coprime to n (note: we can actually fix a, random selection here is not imperative)"

Crandall & Pomerance write it differently in their book on primes, I see now. They use c = 2 as a starting point and note that you can use a random number there as well.
In addition to what axn has already replied:

If you compare the P-1 and Pollard Rho algorithms, you will find that the Pollard Rho algorithm uses a pseudorandom function within its main loop. That is not true of the P-1 algorithm, so P-1 is _not_ "suffering from the same randomness".
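For contrast, here is a minimal sketch of Pollard Rho (with Floyd cycle-finding); the pseudorandom map x → x² + c sits inside the main loop, which is exactly the randomness that P-1 lacks:

```python
from math import gcd

def pollard_rho(n, c=1):
    """Pollard's rho: iterate the pseudorandom map x -> x^2 + c (mod n)
    inside the main loop, with Floyd's tortoise-and-hare cycle detection."""
    x = y = 2
    d = 1
    while d == 1:
        x = (x * x + c) % n          # tortoise: one step
        y = (y * y + c) % n          # hare: two steps
        y = (y * y + c) % n
        d = gcd(abs(x - y), n)
    return d if d != n else None     # d == n: retry with a different c

print(pollard_rho(8051))  # → 97 (8051 = 83 * 97)
```

Note that a gcd is taken (or at least accumulated) every iteration here, unlike P-1's single gcd at the end.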

While it is true that a random number (as long as it is coprime to the number being factored) can be used for the P-1 initial value that is exponentiated, randomness of its choice is not necessary!

As your own quote from Wikipedia says, "(note: we can actually fix a, random selection here is not imperative)."

P-1 definitely does not do any pseudorandom number computation inside its loops ... and it need not do any pseudorandom number computation at all, since it is perfectly acceptable to use a fixed initial value (such as 3, which Prime95 uses) as long as it's properly coprime.
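As a concrete illustration of that structure, here is a toy stage 1 with the fixed base 3 and a single gcd at the end (a minimal sketch, not Prime95's implementation; the bounds and example numbers are chosen only for demonstration):

```python
from math import gcd, isqrt, log

def small_primes(bound):
    """Sieve of Eratosthenes up to `bound` inclusive."""
    sieve = bytearray([1]) * (bound + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, isqrt(bound) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(sieve[i * i :: i]))
    return [i for i, flag in enumerate(sieve) if flag]

def pminus1_stage1(n, B1, a=3):
    """Toy P-1 stage 1: raise the fixed base `a` to the product of all
    prime powers <= B1, then take one gcd at the very end.  There is no
    pseudorandom computation anywhere in the loop."""
    x = a
    for p in small_primes(B1):
        e = int(log(B1, p))          # largest e with p^e <= B1 (roughly)
        x = pow(x, p ** e, n)
    g = gcd(x - 1, n)
    return g if 1 < g < n else None  # g == n would mean: retry differently

# 299 = 13 * 23; 13 - 1 = 2^2 * 3 is 5-smooth, while 23 - 1 = 2 * 11 is not:
print(pminus1_stage1(299, 5))  # → 13
```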

Last fiddled with by cheesehead on 2011-05-12 at 07:59
Old 2011-05-12, 12:35   #553
Christenson
 

axn, thanks.
So we spend 2 days getting 3 to this large power -- how is the GCD operation carried out efficiently? (I'm doing light studying here, thinking long-term about maybe doing P-1 on CUDA).

And does additional TF, which has just gotten cheaper, help out P-1?
Old 2011-05-12, 12:54   #554
diep
 

Quote:
Originally Posted by cheesehead View Post
Vincent,

P-1 is a deterministic algorithm, not a probabilistic one like Pollard Rho. P-1 will always find a factor if that factor is within the B1/B2 search space.



In addition to what axn has already replied:

If you compare the P-1 and Pollard Rho algorithms, you will find that the Pollard Rho algorithm uses a pseudorandom function within its main loop. That is not true of the P-1 algorithm, so P-1 is _not_ "suffering from the same randomness".

While it is true that a random number (as long as it is coprime to the number being factored) can be used for the P-1 initial value that is exponentiated, randomness of its choice is not necessary!

As your own quote from Wikipedia says, "(note: we can actually fix a, random selection here is not imperative)."

P-1 definitely does not do any pseudorandom number computation inside its loops ... and it need not do any pseudorandom number computation at all, since it is perfectly acceptable to use a fixed initial value (such as 3, which Prime95 uses) as long as it's properly coprime.
I completely disagree with you here, as it all comes down to the B1/B2.

This is because of the loaded wording you chose: "P-1 will always find a factor if that factor is within the B1/B2 search space."

The real problem is which B1/B2 to pick.

If we compare this with a systematic search through all hills and valleys, then this is just a randomly behaving algorithm, since it depends upon your initial random selection of a starting number. If you pick a constant, you will miss the entire rest of the search space that is out there. Each time you're just busy in a single valley trying to climb a single mountain; if it doesn't happen to be Everest, you'll never succeed.

That's the kind of randomness I'm speaking about.

Furthermore, you somehow want to know whether you can factor the number, and you don't have unlimited RAM. Are you going to postpone the GCD on a GPU with its limited RAM, or try it each time, just like Pollard Rho?

For the biggest part of the calculation, the FFT of an LL test does not run from RAM; you can really limit the RAM dependency. With this algorithm you can't, except if you try that GCD each time (if I've studied it correctly). Otherwise you're going to be completely RAM-dependent.

A GPU can execute 1.35T instructions per second, while its RAM can deliver just 170 GB/s or so (and that's a theoretical number, whereas the 1.35T is achievable). If, additionally, each entity stores a few bits in a 64-bit unit (so 2×32 bits), then the number of instructions available per unit is quite large.

So you just can't use GPU RAM to store anything.

Apply the GCD directly?

That would be a solution, with only a factor-2 slowdown. The alternative I see is not going to be fun.

Vincent

Last fiddled with by diep on 2011-05-12 at 12:58
Old 2011-05-12, 13:01   #555
axn
 

Quote:
Originally Posted by Christenson View Post
So we spend 2 days getting 3 to this large power -- how is the GCD operation carried out efficiently? (I'm doing light studying here, thinking long-term about maybe doing P-1 on CUDA).
http://en.wikipedia.org/wiki/Euclidean_algorithm -- The basic algorithm is simple enough. The tricky bit is to do a "generalized" (i.e. involving numbers of no special form) mod operation. I do not know the state of the art in doing that part efficiently.
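The algorithm itself fits in a few lines; all the cost is hidden in the general-form mod on huge operands (a sketch; fast bignum libraries such as GMP switch to subquadratic "half-gcd" schemes for numbers this size):

```python
def euclid_gcd(a, b):
    """Classic Euclidean algorithm.  For multi-million-digit operands the
    general-form mod is the expensive step; production code uses
    subquadratic half-gcd methods instead of this naive loop."""
    while b:
        a, b = b, a % b
    return a

# A small check using the identity gcd(3^a - 1, 3^b - 1) = 3^gcd(a,b) - 1:
print(euclid_gcd(3**1000 - 1, 3**320 - 1) == 3**40 - 1)  # → True
```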

Quote:
Originally Posted by Christenson View Post
And does additional TF, which has just gotten cheaper, help out P-1?
Quite the opposite, actually. In general, smaller factors are more likely to be "smooth" (which P-1 requires) than larger ones. So additional TF screening means that P-1 would receive candidates that are less likely to have smooth factors ==> fewer successes. But the effect won't be that pronounced, if it is just 1 or 2 bit levels of addnl TF.
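The smoothness effect is easy to play with numerically. A rough sketch (trial division over random samples; the bit sizes and the bound 1000 are chosen arbitrarily) showing that smaller candidates are B-smooth far more often:

```python
import random

def is_b_smooth(m, B):
    """True if every prime factor of m is <= B (trial-division sketch)."""
    d = 2
    while d * d <= m:
        while m % d == 0:
            m //= d
        d += 1
    return m <= B

def smooth_fraction(bits, B, trials=300, seed=1):
    """Fraction of random odd `bits`-bit numbers m with m - 1 B-smooth."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        m = rng.getrandbits(bits) | 1 | (1 << (bits - 1))
        hits += is_b_smooth(m - 1, B)
    return hits / trials

# Smaller candidates are far more likely to have a smooth m - 1:
print(smooth_fraction(20, 1000), smooth_fraction(30, 1000))
```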
Old 2011-05-12, 13:05   #556
diep
 

Quote:
Originally Posted by axn View Post
http://en.wikipedia.org/wiki/Euclidean_algorithm -- The basic algorithm is simple enough. The tricky bit is to do a "generalized" (i.e. involving numbers of no special form) mod operation. I do not know the state of the art in doing that part efficiently.



Quite the opposite, actually. In general, smaller factors are more likely to be "smooth" (which P-1 requires) than larger ones. So additional TF screening means that P-1 would receive candidates that are less likely to have smooth factors ==> fewer successes. But the effect won't be that pronounced, if it is just 1 or 2 bit levels of addnl TF.
Our experience is quite the opposite: additional bits of TF find factors other than those P-1 finds. We compared the default 61-bit TF, which I had done, with what P-1 found around the 8M exponent range for Wagstaff.

In general, P-1 factoring is so random compared to TF that the statistical odds of it finding the same small factor that another few bits of TF would find (assuming it is there) are so small that P-1 has nearly no overlap with another few bits of TF.

Vincent

Last fiddled with by diep on 2011-05-12 at 13:06
Old 2011-05-12, 13:18   #557
ET_

Quote:
Originally Posted by Christenson View Post
Axn, roughly how much LL speedup do we get on CUDA?
Using CUDALucas on my GTX275, I have 2x-4.5x speedup depending on the exponent and the FFT chosen.

Luigi
Old 2011-05-12, 16:09   #558
axn
 

Quote:
Originally Posted by diep View Post
Additional bits of TF find factors other than those P-1 finds.
A finding that I never disputed.

Quote:
Originally Posted by diep View Post
In general, P-1 factoring is so random compared to TF that the statistical odds of it finding the same small factor that another few bits of TF would find (assuming it is there) are so small that P-1 has nearly no overlap with another few bits of TF.
Also consistent with what I wrote, to wit:
Quote:
Originally Posted by axn View Post
But the effect won't be that pronounced, if it is just 1 or 2 bit levels of addnl TF.
Old 2011-05-12, 16:36   #559
petrw1
"Wayne"

Quote:
Originally Posted by Christenson View Post
And does additional TF, which has just gotten cheaper, help out P-1?
It helps out the projects as a whole.

Each additional bit of TF eliminates about another 1.5% of the exponents per this page: http://www.mersenne.org/various/math.php

The question tends to center on: is a 1.5% success rate enough to justify taking the time to do another bit level, or is the time better spent doing the LL/DC?
This seems to depend mostly on two things (IMHO):
- How fast is TF relative to LL? The answer is very different depending on whether you use a CPU or a GPU.
- How many people choose TF over LL. Right now we have an excess of people choosing TF.

The bit levels being handed out now for the current LL wave are higher than recommended by that same math page, because of this excess of people doing TF, with more and more of them using GPUs.
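The per-bit yield follows from the heuristic on that math page: the chance that 2^p - 1 has a factor between 2^k and 2^(k+1) is roughly 1/k. A quick check of the numbers under that assumption:

```python
# Heuristic from the GIMPS math page: chance of a factor of 2^p - 1
# between 2^k and 2^(k+1) is roughly 1/k, independent of p.
for k in (64, 68, 72, 76):
    print(f"TF from 2^{k} to 2^{k+1}: ~{100.0 / k:.2f}% chance of a factor")
```

At current bit levels (2^64 and up), 1/k lands right around the 1.5% figure quoted above.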
Old 2011-05-12, 17:21   #560
diep
 

Quote:
Originally Posted by petrw1 View Post
It helps out the projects as a whole.

Each additional bit of TF eliminates about another 1.5% of the exponents per this page: http://www.mersenne.org/various/math.php

The question tends to center on: is a 1.5% success rate enough to justify taking the time to do another bit level, or is the time better spent doing the LL/DC?
This seems to depend mostly on two things (IMHO):
- How fast is TF relative to LL? The answer is very different depending on whether you use a CPU or a GPU.
- How many people choose TF over LL. Right now we have an excess of people choosing TF.

The bit levels being handed out now for the current LL wave are higher than recommended by that same math page, because of this excess of people doing TF, with more and more of them using GPUs.
TF is very cheap. It soon gets a lot less than 1.5% per additional bit level, of course; otherwise we'd only be doing TF.

Doing one additional bit of TF costs a factor of 2 more than the current bit level. It's not dependent upon caches or RAM speed (in the case of TheJudger's code it is, since he generates the factor candidates on the CPU, but that's another discussion). You work modulo the factor candidate, so another bit usually runs at the same speed per candidate.

Each factored candidate doesn't need a double check. Finding a factor is the hard part; it's very easy to verify that the factor divides the Mersenne (or Wagstaff) number.
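Both points show up in a toy trial-factoring loop (an illustrative sketch, not how the real GPU code organizes its sieving): candidates have the special form 2kp + 1 with q ≡ ±1 (mod 8), and `pow(2, p, q) == 1` both finds and proves the factor in one step, so no double check is ever needed:

```python
def trial_factor_mersenne(p, max_bits):
    """Toy TF for 2^p - 1: any prime factor q has the form q = 2*k*p + 1
    with q % 8 in (1, 7).  Each candidate costs one modular exponentiation
    on q-sized numbers, and pow(2, p, q) == 1 already proves q | 2^p - 1."""
    k = 1
    while True:
        q = 2 * k * p + 1
        if q.bit_length() > max_bits:
            return None  # no factor below 2^max_bits found
        if q % 8 in (1, 7) and pow(2, p, q) == 1:
            return q
        k += 1

print(trial_factor_mersenne(11, 16))  # → 23 (2^11 - 1 = 23 * 89)
print(trial_factor_mersenne(67, 40))  # → 193707721, Cole's factor of 2^67 - 1
```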

This is where LL has a lot of additional problems. It slows down by more than a factor of 2 as n increases; it slows down quadratically. You grow further out of the caches and become more and more dependent upon RAM speed as things get out of hand. That really sucks.

So TF really has major advantages when done really deep.

Additionally, once we get to the 100M-digit level, it's something like a year on a GPU now to run an LL test at 100M-digit size?

That's a pretty long time for a single run. Maybe you can double-check the first few million iterations of such a run, but I guess it'll take a really long time to double-check all of it.

Compared to all those problems with LL, I'd guess TF is pretty efficient and fast to do really, really deep.

However, as usual with lotteries: the majority wants a shot at finding a huge prime and won't run factoring-type software, as that for sure won't find them a giant prime number.

So I guess now is the time to TF really deep, because as soon as we get the 22nm GPUs, which might be able to do an LL at the 332M+ bit level within 1 or 2 months, I guess most will soon abandon TF and other factoring software.
Old 2011-05-12, 20:15   #561
James Heinrich
 

Quote:
Originally Posted by diep View Post
So TF really has major advantages to do real deep.
Using an example of M333000013, which TF's by default to 2^76, we find:
* TF to 4 levels above that (2^80) takes an additional 689 GHz-days and has a 5.16% chance of finding a factor
* P-1, for the same 5.16% chance of factor, takes 85GHz-days of work.

So your TF processor has to perform at least 8.1x faster than your P-1 processor to be worth it (this goes up to roughly 10x performance-per-chance difference at 2^86, 12%).
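The 8.1x figure is just the ratio of GHz-day costs at equal factoring probability; a one-line sanity check with the numbers quoted above:

```python
# Figures from the example above (M333000013, TF 2^76 -> 2^80 vs. P-1):
tf_cost, tf_chance = 689.0, 0.0516    # GHz-days, probability of a factor
p1_cost, p1_chance = 85.0, 0.0516

# Compare cost per unit of factor-finding probability:
ratio = (tf_cost / tf_chance) / (p1_cost / p1_chance)
print(f"TF must run ~{ratio:.1f}x faster than P-1 to break even")  # ~8.1x
```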