#551
Jun 2003
5086₁₀ Posts
Don't worry. It is not you who has to study better. Vincent is definitely conflating many different things here.

FWIW, P-1 stages 1 & 2 take _days_ to run. The gcd phase lasts a few _minutes_. No matter how inefficient the gcd is, it will never be the bottleneck.
#552
"Richard B. Woods"
Aug 2002
Wisconsin USA
2²×3×641 Posts
Vincent,

P-1 is a deterministic algorithm, not a probabilistic one like Pollard Rho. P-1 will always find a factor if that factor is within the B1/B2 search space.

If you compare the P-1 and Pollard Rho algorithms, you will find that the Pollard Rho algorithm uses a pseudorandom function within its main loop. That is not true of the P-1 algorithm, so P-1 is _not_ "suffering from the same randomness". While it is true that a random number (as long as it is coprime to the number being factored) can be used as the P-1 initial value that is exponentiated, randomness of that choice is not necessary! As your own quote from Wikipedia says, "(note: we can actually fix a, random selection here is not imperative)."

P-1 definitely does not do any pseudorandom number computation inside its loops ... and it need not do any pseudorandom number computation at all, since it is perfectly acceptable to use a fixed initial value (such as 3, which Prime95 uses) as long as it is properly coprime.

Last fiddled with by cheesehead on 2011-05-12 at 07:59
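The determinism is easy to see in a toy implementation. The sketch below is my own plain-Python illustration (the function names are mine; real clients like Prime95 use FFT-based multiplication and far larger bounds). With the fixed base 3 and no randomness anywhere, it always recovers the factor 193707721 of 2^67-1, because 193707720 = 2³·3³·5·67·2677 is built entirely from primes ≤ 2677:

```python
from math import gcd

def primes_upto(n):
    """Simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return [i for i in range(2, n + 1) if sieve[i]]

def pminus1_stage1(N, B1, a=3):
    """Toy P-1 stage 1: raise the fixed base a to every prime power <= B1,
    modulo N, then take a single gcd at the very end."""
    x = a
    for p in primes_upto(B1):
        q = p
        while q * p <= B1:      # include prime powers up to B1
            q *= p
        x = pow(x, q, N)
    g = gcd(x - 1, N)
    return g if 1 < g < N else None  # None: no factor in this B1 search space

# M67 = 2^67 - 1 = 193707721 * 761838257287 (Cole's factorization).
# 193707721 - 1 = 2^3 * 3^3 * 5 * 67 * 2677, so B1 = 2677 suffices.
print(pminus1_stage1(2**67 - 1, 2677))  # prints 193707721
```

Note that the single gcd comes after the whole exponentiation loop, which is why (as axn said above) it is never the bottleneck.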
#553
Dec 2010
Monticello
1795₁₀ Posts
axn, thanks.

So we spend two days getting 3 to this large power -- how is the GCD operation carried out efficiently? (I'm doing light studying here, thinking long-term about maybe doing P-1 on CUDA.) And does additional TF, which has just gotten cheaper, help out P-1?
#554
Sep 2006
The Netherlands
3⁶ Posts
This is because of the political wording you chose: "P-1 will always find a factor if that factor is within the B1/B2 search space." The real problem is the selection of B1/B2. If we compare this with a systematic search through all hills and valleys, then this is just a randomly behaving algorithm, as it depends upon your initial random selection of a starting number. If you pick a constant, you will miss the entire other search space that is out there. Each time you're just busy in a single valley trying to hill-climb a single mountain. If it isn't Everest by accident, you'll never succeed. That's the kind of randomness I speak about.

Furthermore, you somehow want to know whether you can factor the number, and you don't have unlimited RAM. Are you going to postpone the GCD on a GPU with its limited RAM, or try it each time, just like Pollard Rho? The FFT of an LL test runs, for the biggest part of the calculation, not from RAM; you can really limit the RAM dependency. With this algorithm you can't, except if you try that GCD each time (if I've studied it correctly). Otherwise you're going to be completely RAM-dependent.

The GPU can execute 1.35T instructions per GPU per second. The RAM can deliver just 170 GB/s or so (that's a theoretical number, whereas that 1.35T is achievable). If, additional to that, each entity stores a few bits in a 64-bit unit (so 2×32 bits), then the number of instructions available for each unit is quite huge. So you just can't use GPU RAM to store anything. Apply the GCD directly? That would be a solution; only a factor-2 slowdown. The alternative I see is not going to be fun.

Vincent

Last fiddled with by diep on 2011-05-12 at 12:58
#555
Jun 2003
2×2,543 Posts
Quite the opposite, actually. In general, smaller factors are more likely to be "smooth" (which P-1 requires) than larger ones. So additional TF screening means that P-1 would receive candidates that are less likely to have smooth factors ==> fewer successes. But the effect won't be that pronounced if it is just 1 or 2 bit levels of additional TF.
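This can be checked numerically. The sketch below is my own illustration (not GIMPS code): it measures what fraction of primes near 2^20 versus near 2^40 have a 1000-smooth p-1. The smaller primes are smooth far more often, which is why screening out small factors first lowers P-1's hit rate:

```python
def is_prime(n):
    """Deterministic Miller-Rabin (valid for n < 3.3e24)."""
    if n < 2:
        return False
    small = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in small:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in small:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def is_b_smooth(n, B):
    """True if every prime factor of n is <= B (naive trial division)."""
    for d in range(2, B + 1):
        while n % d == 0:
            n //= d
    return n == 1

def smooth_fraction(start, B, count=200):
    """Fraction of the first `count` primes above `start` with B-smooth p-1."""
    found = hits = 0
    q = start
    while found < count:
        q += 1
        if is_prime(q):
            found += 1
            hits += is_b_smooth(q - 1, B)
    return hits / count

small = smooth_fraction(2**20, 1000)   # ~20-bit candidate factors
large = smooth_fraction(2**40, 1000)   # ~40-bit candidate factors
print(f"20-bit primes: {small:.0%} smooth; 40-bit primes: {large:.0%} smooth")
```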
#556
Sep 2006
The Netherlands
1331₈ Posts
In general, P-1 factoring is so random compared to TF that the statistical odds that it finds the small factor that another few bits of TF would find, assuming it is there, are so small that P-1 has nearly no overlap with another few bits of TF.

Vincent

Last fiddled with by diep on 2011-05-12 at 13:06
#557
Banned
"Luigi"
Aug 2002
Team Italia
1001011010011₂ Posts
#558
Jun 2003
5086₁₀ Posts
A finding that I never disputed.
#559
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
3×5×313 Posts
Each additional bit of TF eliminates about another 1.5% of the exponents, per this page: http://www.mersenne.org/various/math.php

The question tends to center on: is a 1.5% success rate enough to justify doing another bit level, or is the time better spent doing the LL/DC? This seems to depend most on two things (IMHO):

- How fast is TF relative to LL? The answer is very different depending on whether you use a CPU or a GPU.
- How many people choose TF over LL? Right now we have an excess of people choosing TF.

The bit levels being handed out now at the current LL wave are higher than recommended by that same math page, because of this excess of people doing TF and more and more of them using GPUs.
#560
Sep 2006
The Netherlands
1011011001₂ Posts
One additional bit of TF costs a factor of 2 more than the current bit level. It's not dependent on caches or RAM speed (of course in the case of TheJudger's code it is, as he generates the factor candidates on the CPU, but that's another discussion). You work modulo the factor candidate, so each candidate at the next bit level usually tests at the same speed.

A factored candidate doesn't need a double check. Finding a factor is hard, but it's very easy to verify that a factor divides the Mersenne (cq. Wagstaff) number. This is where LL has a lot of additional problems. It slows down by more than a factor of 2 as your n increases; it slows down quadratically. You grow out of the caches more and more and become dependent on RAM speed as things grow out of hand. That really sucks. So TF has major advantages for going really deep.

Additionally, once we get to the 100M-digit level, it's something like one year on a GPU now to run a single LL test of 100M-digit size? That's a pretty long run for a single entity. Maybe you can double-check the first few million iterations of such a run, but I guess it will take a really long time to double-check all of it.

Compared to all those problems with LL, I'd guess TF is pretty efficient and fast to run really, really deep. However, as usual with lotteries, the majority wants a shot at finding a huge prime and won't run factoring software, as that for sure won't find them a giant prime number. So I guess now is the time to TF really deep, because as soon as we get the 22 nm GPUs, which might be able to do an LL test at the 332M+ level within 1 or 2 months, I guess most will soon abandon TF and other factoring software.
#561
"James Heinrich"
May 2004
ex-Northern Ontario
110101100011₂ Posts
Using the example of M333000013, which TFs by default to 2^76, we find:

* TF to 4 levels above that (2^80) takes an additional 689 GHz-days and has a 5.16% chance of finding a factor.
* P-1, for the same 5.16% chance of a factor, takes 85 GHz-days of work.

So your TF processor has to perform at least 8.1x faster than your P-1 processor to be worth it (this goes up to roughly a 10x performance-per-chance difference at 2^86, 12%).
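The break-even figure is just the cost quotient; a one-liner confirms it (figures taken from the post above):

```python
tf_cost = 689  # GHz-days: TF of M333000013 from 2^76 to 2^80
p1_cost = 85   # GHz-days: P-1 with the same 5.16% chance of a factor
print(f"TF hardware must be ~{tf_cost / p1_cost:.1f}x faster to break even")
# prints: TF hardware must be ~8.1x faster to break even
```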