[QUOTE=ET_;261185]I'll have to study better. I confused the GCD final stage of ECM :redface:[/QUOTE]
Don't worry. It is not you who has to study better. Vincent is definitely conflating many different things here. FWIW, P-1 stages 1 & 2 take _days_ to run. The gcd phase lasts a few _minutes_. No matter how inefficient the gcd is, it will never be the bottleneck.
Vincent,

P-1 is a deterministic algorithm, not a probabilistic one like Pollard Rho. P-1 will [I]always[/I] find a factor if that factor is within the B1/B2 search space.

[QUOTE=diep;261102]I'd argue that P-1 is a tad pensionned algorithm. One year older than Pollard-Rho, yet suffering from the same randomness.[/QUOTE]
[QUOTE=diep;261120]You can design an adapted version algorithm which randomly looks for factors of limited size.[/QUOTE]
[QUOTE=diep;261136]Wiki uses a random number to initialize. See: [URL="http://en.wikipedia.org/wiki/Pollard%27s_p_%E2%88%92_1_algorithm"]http://en.wikipedia.org/wiki/Pollard's_p_−_1_algorithm[/URL] "randomly pick a coprime to n (note: we can actually fix a, random selection here is not imperative)" Crandall&Pomerance write it different in the book primes i see now. They use c=2 as a starting point and note you can use as well a random number there.[/QUOTE]

In addition to what axn has already replied: If you compare the P-1 and Pollard Rho algorithms, you will find that the Pollard Rho algorithm uses a pseudorandom function [I]within its main loop[/I]. That is not true of the P-1 algorithm, so P-1 is _not_ "suffering from the same randomness".

While it is true that a random number (as long as it is coprime to the number being factored) [I]can[/I] be used for the P-1 initial value that is exponentiated, [I]randomness of its choice is not necessary![/I] As your own quote from Wikipedia says, "(note: we can actually fix a, random selection here is not imperative)." P-1 definitely does [U]not[/U] do any pseudorandom number computation inside its loops ... and it need not do [I]any[/I] pseudorandom number computation at all, since it is perfectly acceptable to use a fixed initial value (such as 3, which Prime95 uses) as long as it's properly coprime.
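The fixed-base point above can be made concrete with a toy stage 1. This is an illustrative sketch in plain Python (my own, not Prime95's implementation): raise the fixed base 3 to every prime power up to B1, then take a single gcd at the end. There is no random choice anywhere in the loop.

```python
from math import gcd

def primes_up_to(limit):
    """Simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(sieve[i * i :: i]))
    return [i for i, is_p in enumerate(sieve) if is_p]

def pminus1_stage1(n, B1, base=3):
    """Toy P-1 stage 1 with a fixed starting value (default 3).
    Deterministic: it finds a prime factor p of n whenever p - 1 is
    B1-smooth (provided not every factor's p - 1 is smooth at once)."""
    a = base
    for q in primes_up_to(B1):
        e, pk = 1, q
        while pk * q <= B1:        # include the largest q^e <= B1
            pk *= q
            e += 1
        a = pow(a, q ** e, n)      # one modular exponentiation per prime
    g = gcd(a - 1, n)              # the single gcd at the very end
    return g if 1 < g < n else None
```

For example, `pminus1_stage1(1403, 5)` returns 61, because 1403 = 23 x 61 and 61 - 1 = 60 = 2^2 * 3 * 5 is 5-smooth; run it again with B1 = 3 and it finds nothing. Same bounds, same base, same answer every time.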
axn, thanks.
So we spend 2 days getting 3 to this large power -- how is the GCD operation carried out efficiently? (I'm doing light studying here, thinking long-term about maybe doing P-1 on CUDA.) And does additional TF, which has just gotten cheaper, help out P-1?
[QUOTE=cheesehead;261208]Vincent,
P-1 is a deterministic algorithm, not a probabilistic one like Pollard Rho. P-1 will [I]always[/I] find a factor if that factor is within the B1/B2 search space. In addition to what axn has already replied: If you compare the P-1 and Pollard Rho algorithms, you will find that the Pollard Rho algorithm uses a pseudorandom function [I]within its main loop[/I]. That is not true of the P-1 algorithm, so P-1 is _not_ "suffering from the same randomness". While it is true that a random number (as long as it is coprime to the number being factored) [I]can[/I] be used for the P-1 initial value that is exponentiated, [I]randomness of its choice is not necessary![/I] As your own quote from Wikipedia says, "(note: we can actually fix a, random selection here is not imperative)." P-1 definitely does [U]not[/U] do any pseudorandom number computation inside its loops ... and it need not do [I]any[/I] pseudorandom number computation at all, since it is perfectly acceptable to use a fixed initial value (such as 3, which Prime95 uses) as long as it's properly coprime.[/QUOTE]
I completely disagree with you here, as it all has to do with the B1/B2. This is because of the political wording you chose: "P-1 will [I]always[/I] find a factor if that factor is within the B1/B2 search space." The real problem is the selection of which B1/B2 to pick.

If we compare this with a systematic search through all hills and valleys, then this is just a randomly behaving algorithm, as it depends upon your initial random selection of a starting number. If you pick a constant then you will miss the entire other search space that is out there. Each time you're just busy in a single valley trying to hill-climb a single mountain. If it isn't the Everest by accident, you'll never succeed. That's the kind of randomness I speak about.

Furthermore, you somehow want to know whether you can factor the number, and you don't have unlimited RAM.
Are you going to postpone the GCD on a GPU with its limited RAM, or try it each time just like Pollard-Rho? The FFT of an LL test runs for the biggest part of the calculation not from RAM; you can really limit the RAM dependency. With this algorithm you can't, except if you try that GCD each time (if I studied it correctly). Otherwise you're going to be completely RAM dependent. The GPU can execute 1.35T instructions per GPU per second. The RAM can deliver just 170GB/s or so (that's a theoretical number, whereas that 1.35T is achievable). If, in addition to that, each entity stores a few bits in a 64-bit unit (so 2x32 bits), then the number of instructions available for each unit is quite huge. So you just can't use GPU RAM to store everything. Apply the GCD directly? That would be a solution. Only a factor 2 slowdown. The alternative I see is not going to be fun.

Vincent
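For contrast with the P-1 loop, here is the textbook Pollard rho iteration being argued about (again an illustrative sketch, not anyone's production code): the pseudorandom map x -> x^2 + c sits [I]inside[/I] the main loop, and the choice of starting point x0 and constant c does affect which factor is found and how quickly.

```python
from math import gcd

def pollard_rho(n, x0=2, c=1):
    """Floyd-cycle Pollard rho.  The map x -> x^2 + c (mod n) is the
    pseudorandom walk; the loop detects a cycle modulo a hidden factor
    of n.  Unlike P-1's fixed exponentiation, the walk itself depends
    on the (possibly random) choice of x0 and c."""
    x = y = x0
    d = 1
    while d == 1:
        x = (x * x + c) % n              # tortoise: one step
        y = ((y * y + c) ** 2 + c) % n   # hare: two steps
        d = gcd(abs(x - y), n)
    return d if d != n else None         # d == n: retry with new x0/c
```

On the classic example n = 8051 = 83 x 97 with the defaults, the walk collides after three rounds and the gcd yields 97.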
[QUOTE=Christenson;261227]So we spend 2 days getting 3 to this large power -- how is the GCD operation carried out efficiently? (I'm doing light studying here, thinking long-term about maybe doing P-1 on CUDA).[/quote]
[url]http://en.wikipedia.org/wiki/Euclidean_algorithm[/url] -- The basic algorithm is simple enough. The tricky bit is to do a "generalized" (i.e. involving numbers of no special form) mod operation. I do not know the state of the art in doing that part efficiently.

[QUOTE=Christenson;261227]And does additional TF, which has just gotten cheaper, help out P-1?[/QUOTE]
Quite the opposite, actually. In general, smaller factors are more likely to be "smooth" (which P-1 requires) than larger ones. So additional TF screening means that P-1 would receive candidates that are less likely to have smooth factors ==> fewer successes. But the effect won't be that pronounced, if it is just 1 or 2 bit levels of addnl TF.
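For reference, the basic loop from that Wikipedia page is tiny; it's only the multi-million-digit "mod" that is hard to do fast (production code uses subquadratic GCD methods for that).

```python
def euclid_gcd(a, b):
    """Classic Euclidean algorithm: repeatedly replace (a, b) with
    (b, a mod b) until the remainder vanishes.  The '%' here is the
    general-purpose reduction mentioned above -- the operands have no
    special form to exploit."""
    while b:
        a, b = b, a % b
    return a
```

A sanity check relevant to this thread: gcd(2^a - 1, 2^b - 1) = 2^gcd(a,b) - 1, so `euclid_gcd(2**20 - 1, 2**12 - 1)` gives 2^4 - 1 = 15.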
[QUOTE=axn;261233][url]http://en.wikipedia.org/wiki/Euclidean_algorithm[/url] -- The basic algorithm is simple enough. The tricky bit is to do a "generalized" (i.e. involving numbers of no special form) mod operation. I do not know the state of the art in doing that part efficiently.
Quite the opposite, actually. In general, smaller factors are more likely to be "smooth" (which P-1 requires) than larger ones. So additional TF screening means that P-1 would receive candidates that are less likely to have smooth factors ==> fewer successes. But the effect won't be that pronounced, if it is just 1 or 2 bit levels of addnl TF.[/QUOTE]
Our experience is quite the opposite: additional bits of TF find other factors than P-1 does. We compared the default 61 bits of TF, which I had done, with what P-1 found at around the 8 million bit range for Wagstaff. In general, P-1 factoring is so random compared to TF that the statistical odds that it finds the small factor that another few bits of TF would find, assuming it is there, are so small that P-1 has nearly no overlap with another few bits of TF.

Vincent
[QUOTE=Christenson;261195]Axn, roughly how much LL speedup do we get on CUDA? [/QUOTE]
Using CUDALucas on my GTX275, I have 2x-4.5x speedup depending on the exponent and the FFT chosen.

Luigi
[QUOTE=diep;261235]Additional bits of TF finds other factors than P-1.[/quote]
A finding that I never disputed.

[QUOTE=diep;261235]In general P-1 factoring is so randomly compared to TF that the statistical odds it finds that small factor that another few bits of TF finds, assuming it is there, is so small that P-1 has nearly no overlap with another few bits of TF.[/QUOTE]
Also consistent with what I wrote, to wit:

[QUOTE=axn;261233]But the effect won't be that pronounced, if it is just 1 or 2 bit levels of addnl TF.[/QUOTE]
[QUOTE=Christenson;261227]And does additional TF, which has just gotten cheaper, help out P-1?[/QUOTE]
It helps out the project as a whole. Each additional bit of TF eliminates about another 1.5% of the exponents, per this page: [url]http://www.mersenne.org/various/math.php[/url]

The question tends to center around: is a 1.5% success rate enough to justify another bit level, or is the time better spent doing the LL/DC? This seems to depend mostly on two things (IMHO):

- How fast is TF relative to LL? The answer is very different depending on whether you use a CPU or a GPU.
- How many people choose TF over LL? Right now we have an excess of people choosing TF.

The bit levels being handed out now at the current LL wave are higher than recommended by that same math page, because of this excess of people doing TF and more and more of them using GPUs.
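The ~1.5% figure follows from the heuristic on that math page: the chance that a Mersenne number has a prime factor between 2^x and 2^(x+1) is roughly 1/x. A quick illustration (the 1/x rule is the page's approximation, not an exact result):

```python
# Heuristic from mersenne.org/various/math.php: the chance that M_p has
# a prime factor in [2^x, 2^(x+1)) is roughly 1/x.
def bit_level_yield(x):
    return 1.0 / x

# Around the bit levels in use at the time, each extra level yields ~1.5%:
for x in range(66, 72):
    print(f"2^{x}..2^{x + 1}: {bit_level_yield(x):.2%}")
```

For x in the high 60s to low 70s this prints values between roughly 1.4% and 1.5%, matching the "about another 1.5% per bit" rule of thumb, and it slowly shrinks as x grows.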
[QUOTE=petrw1;261270]It helps out the projects as a whole.
Each additional bit of TF eliminates about another 1.5% of the exponents per this page: [url]http://www.mersenne.org/various/math.php[/url] The question tends to center around: Is a 1.5% success rate enough to take the time to do another bit level or is the time better spent doing the LL/DC? This seems to most depend on two things (IMHO): - How fast TF is relative to LL? The answer is very different if you use CPU or GPU? - How many people choose TF over LL. Right now we have an excess of people choosing TF. The bit levels being handed out now at the current LL wave are higher than recommended by that same Math page because of this excess of people doing TF and more and more of them using GPUs.[/QUOTE]
TF is very cheap. Of course the yield soon gets a lot less than 1.5% per additional bit level; otherwise we'd only be doing TF. Doing 1 additional bit of TF costs a factor of 2 more than the current bit level. It's not dependent upon caches or RAM speed (in the case of TheJudger's code it is, as he's generating the factor candidates on the CPU -- but that's another discussion). You do things modulo the factor candidate, so another bit usually runs at the same per-candidate speed. A factored candidate doesn't need a double check: finding a factor is hard, but it's very easy to verify that the factor divides the Mersenne cq Wagstaff number.

This is where LL has a lot of additional problems. It slows down by more than a factor of 2, of course, when your n increases -- it slows down quadratically. You grow out of the caches and become more and more dependent upon the RAM speed when things grow out of hand. That really sucks. So TF really has major advantages if you go really deep. In addition, once we get to the 100M-digit level, it's like 1 year on a GPU now to run an LL test of 100M-digit size? That's a pretty long time for a single run.
Maybe you can double-check the first few million iterations of such a run, but I guess it'll take a really long time to double-check all of it. Compared to all those problems with LL, I'd guess TF is pretty efficient and fast to do really, really deep. However, as usual with lotteries, the majority wants a shot at finding a huge prime, not running factoring-type software, as that for sure won't find them a giant prime number. So I guess now is the time to TF really deep, because as soon as we get the 22 nm GPUs, which might be able to do an LL test within 1 or 2 months at the 332M+ bit level, I guess most will soon abandon running TF and other factoring software.
[QUOTE=diep;261278]So TF really has major advantages to do real deep.[/QUOTE]
Using an example of M333000013, which TFs by default to 2^76, we find:

* TF to 4 levels above that (2^80) takes an additional [url=http://mersenne-aries.sili.net/credit.php?worktype=TF&exponent=333000013&f_exponent=&b1=&b2=&numcurves=&factor=&frombits=76&tobits=80]689 GHz-days[/url] and has a 5.16% chance of finding a factor
* P-1, for the same 5.16% chance of a factor, takes [url=http://mersenne-aries.sili.net/prob.php?exponent=333000013&prob=5.16&work=&factorbits=76]85 GHz-days[/url] of work.

So your TF processor has to perform at least 8.1x faster than your P-1 processor to be worth it (this goes up to roughly a 10x performance-per-chance difference at 2^86, 12%).
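As a sanity check, both numbers in this comparison can be reproduced from the ~1/x per-bit heuristic and the quoted GHz-day figures:

```python
# Chance of a factor between 2^76 and 2^80, summing the ~1/x per-bit
# heuristic over the four bit levels 76..79:
chance = sum(1.0 / x for x in range(76, 80))
print(f"{chance:.2%}")       # ~5.16%, matching the quoted figure

# Break-even speed ratio implied by the GHz-day costs in the post:
ratio = 689 / 85
print(f"{ratio:.1f}x")       # ~8.1x
```

Both reproduce the quoted values, so the 8.1x break-even is just the ratio of the two work estimates at equal probability of success.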