mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Math (https://www.mersenneforum.org/forumdisplay.php?f=8)
-   -   prime factorization algorithms? (https://www.mersenneforum.org/showthread.php?t=14803)

MathBoy 2011-01-15 21:02

prime factorization algorithms?
 
Hi, I have been reading alot of the math behind factorization algorithms.

My questions are these

1)
Almost all the best factorization algorithms rely on finding congruences of squares. The only difference in the runtimes of these algorithms is based on how efficient they can find them. Correct me if I am wrong?

Currently GNFS is the most efficient for this but all the others quadratic sieve, continued fraction factorization, Dixon's factorization, ... etc are only not as fast because of not finding congruences of squares quick enough. No other aspect of these algorithms seems to be more efficient.

2)
Seems to me all these algorithms are based on probability because even if
x^2 = y^2 mod n is found this doesn't imply that gcd(x-y , n) will ever be equal to something different then 1 or n. Though it has a 50% changes their could exist numbers out their such that all the congruences are trival.
Unless their is a proof I don't know about?

3)
All the algorithms seem to depend on the factor bases you choose. If the factor bases is all the values under m then their will be pi(m) primes in it. As m gets larger it would be computational very very hard to run thru the whole factor bases looking for congruences. Not to mention their exist primes that probably don't have any congurences under 100000000000000000000000 for example how would one then ever find a factor to these number? I would think that even the GNFS only works on a portion of primes that you can find a good factor bases for (and which that factor bases can be looked thru in a reasonable amount of time).

4) If I am not correct on 3 then is their a quick way to find a factor bases that will work for any number you want to factor in a reasonable amount of time? And is their away to prove that this bases will contain at least on factor x,y pair that will work to give a nontrival factor of your number?

Thanks for any help in understanding
Correct me if I am wrong with anything I said

R.D. Silverman 2011-01-15 22:08

[QUOTE=MathBoy;246611]Hi, I have been reading alot of the math behind factorization algorithms.

My questions are these

1)
Almost all the best factorization algorithms rely on finding congruences of squares. The only difference in the runtimes of these algorithms is based on how efficient they can find them. Correct me if I am wrong?
[/QUOTE]

For general algorithms yes. But ECM is better at factoring when
factors are relatively small.

[QUOTE]
Currently GNFS is the most efficient for this but all the others quadratic sieve, continued fraction factorization, Dixon's factorization, ... etc are only not as fast because of not finding congruences of squares quick enough. No other aspect of these algorithms seems to be more efficient.
[/QUOTE]

I'm not sure what you mean by the last sentence, but otherwise yes.


[QUOTE]
2)
Seems to me all these algorithms are based on probability because even if
x^2 = y^2 mod n is found this doesn't imply that gcd(x-y , n) will ever be equal to something different then 1 or n. Though it has a 50% changes their could exist numbers out their such that all the congruences are trival.
[/QUOTE]

Such numbers do exist. They are called "primes". For NFS, one just tries
successive congruences until success.

[QUOTE]
3)
All the algorithms seem to depend on the factor bases you choose. If the factor bases is all the values under m then their will be pi(m) primes in it.
[/QUOTE]

They are not "all values less than m". For QS, the factor base primes
are quadratic residues of the discriminant of the polynomials used.
For NFS, the rational side factor base consists of all primes, but the
algebraic side consists of ideals of norm index 1 within the algebraic
number field. [sorry that I had to get technical]

[QUOTE]
As m gets larger it would be computational very very hard to run thru the whole factor bases looking for congruences.
[/QUOTE]

No it is not. The run time is proportional to the size of the sieve region
times loglog(M) where M is the largest prime in the factor base.

[QUOTE]
Not to mention their exist primes that probably don't have any congurences under 100000000000000000000000
[/QUOTE]

This last statement is meaningless gibberish.

[QUOTE]
for example how would one then ever find a factor to these number? I would think that even the GNFS only works on a portion of primes that you can find a good factor bases for.
[/QUOTE]

GNFS does not factor primes. It factors COMPOSITES. If you meant
something else, then I am afraid that I don't know what you mean.

[QUOTE]
4) is I am not correct on 3 then is their a quick way to find a factor bases that will work for any number you want to factor in a reasonable amount of time.
[/QUOTE]

Finding the factor base is easy. It is done for each composite at the start of
the computation. But the factor base will be (in general) different for
each number being factored.

No offense, but have you actually tried reading about how these methods work? Repeat after me: Google is my friend. Elementary descriptions
are available.

CRGreathouse 2011-01-15 23:46

[QUOTE=MathBoy;246611]Almost all the best factorization algorithms rely on finding congruences of squares. The only difference in the runtimes of these algorithms is based on how efficient they can find them. Correct me if I am wrong?

Currently GNFS is the most efficient for this but all the others quadratic sieve, continued fraction factorization, Dixon's factorization, ... etc are only not as fast because of not finding congruences of squares quick enough. No other aspect of these algorithms seems to be more efficient.[/QUOTE]

Well... sort of.

GNFS is asymptotically the most efficient, but that doesn't mean that it's more efficient than the other methods for a given number. It would take a long time to factor a 30-digit composite with GNFS compared to QS.

[QUOTE=MathBoy;246611]Seems to me all these algorithms are based on probability because even if
x^2 = y^2 mod n is found this doesn't imply that gcd(x-y , n) will ever be equal to something different then 1 or n. Though it has a 50% changes their could exist numbers out their such that all the congruences are trival.
Unless their is a proof I don't know about?[/QUOTE]

Yes, these methods are probabilistic. But there always exist congruences to find (unless the number we're trying to factor is actually prime).

MathBoy 2011-01-16 00:59

[QUOTE]Quote:
Originally Posted by MathBoy
Seems to me all these algorithms are based on probability because even if
x^2 = y^2 mod n is found this doesn't imply that gcd(x-y , n) will ever be equal to something different then 1 or n. Though it has a 50% changes their could exist numbers out their such that all the congruences are trival.
Unless their is a proof I don't know about?

Yes, these methods are probabilistic. But there always exist congruences to find (unless the number we're trying to factor is actually prime).
[/QUOTE]


Well is their a proof some where that say's you are going to find at least one
gcd(x-y,n) which is not equal to 1 or n? Because if not these algorithms in general may never find a factor of n? And hence are just a
pseudo-factorization algorithm. And the only true guaranteed algorithm is the trial and error method which is the most horrible runtime ever.

Another words if a factor of n is not found by exhausting all possible congruences in the factor bases then does this imply n is prime?


[QUOTE]Quote:
2)
Seems to me all these algorithms are based on probability because even if
x^2 = y^2 mod n is found this doesn't imply that gcd(x-y , n) will ever be equal to something different then 1 or n. Though it has a 50% changes their could exist numbers out their such that all the congruences are trival.

Such numbers do exist. They are called "primes". For NFS, one just tries
successive congruences until success.
[/QUOTE]

Is their any other numbers other then primes that meet this criteria? (this is answer by the first quote question for the most part )
Another words does the factor bases for the number guarantee us to find at least one non-trivial factor of n?

CRGreathouse 2011-01-16 01:02

[QUOTE=MathBoy;246642]Well is their a proof some where that say's you are going to find at least one
gcd(x-y,n) which is not equal to 1 or n?[/QUOTE]

It's a theorem that at least one such exists. Of course that doesn't mean you're going to find it.

[QUOTE=MathBoy;246642]And the only true guaranteed algorithm is the trial and error method which is the most horrible runtime ever.[/QUOTE]

No, there are others (e.g. Fermat's).

[QUOTE=MathBoy;246642]Is their any other numbers other then primes that meet this criteria?[/QUOTE]

Just units (1 and -1).

MathBoy 2011-01-16 01:48

Question 1
Do you know where I can get a proof for this no trivial gcd(x-y,n) factor base property ?
Is their a proof for all the algorithms that use a factor bases NFS , QS ,...etc or is the proof more general and holds for all algorithms that compute a factor bases in a certain way,...etc ?


Question 2
I am also curious is msieve the best/most optimal software I can get for factorization of large numbers?

Meaning does it implement the best matrix algorithm which is currently
Block Lanczos algorithm or block Wiedemann algorithm.
Does it allow you to distribute the data collection stage onto multiple different computers.
Does it implement GNFS , SNFS , ECM and QS variants for different flavors to use.
Is their any restriction on the numbers size you can test for. Apart from the fact that you will never find a factor in a reasonable amount of time. I saw some version on the net had a problem with testing a prime up to a certain size.

CRGreathouse 2011-01-16 02:41

[QUOTE=MathBoy;246653]Do you know where I can get a proof for this no trivial gcd(x-y,n) factor base property ?[/QUOTE]

Factor base property?

[QUOTE=MathBoy;246653]I am also curious is msieve the best/most optimal software I can get for factorization of large numbers?[/QUOTE]

No, but it's nice in that it's entirely automatic.

R.D. Silverman 2011-01-16 03:03

[QUOTE=MathBoy;246653]Question 1
Do you know where I can get a proof for this no trivial gcd(x-y,n) factor base property ?
[/QUOTE]

Huh???? You are confused. Why do you think that the GCD computation
is related to the factor base?

As for the proof that non-trivial GCD's always exist, it is elementary.
But it requires some knowledge that you seem to lack: basic number
theory. Sketch of proof: count the quadratic residues for a number with
two distinct prime factors.

[QUOTE]
Is their a proof for all the algorithms that use a factor bases NFS , QS ,...etc or is the proof more general and holds for all algorithms that compute a factor bases in a certain way,...etc ?
[/QUOTE]

You are thoroughly confused. Once again: go read how these methods
work. Then if you have questions, get back to us.

Mr. P-1 2011-01-16 10:58

[QUOTE=R.D. Silverman;246667]Huh???? You are confused. Why do you think that the GCD computation
is related to the factor base?[/QUOTE]

If I'm not mistaken, his question this:

Some congruences of squares are trivial. Do we know that a non-trivial congruence can always be constructed from the set of relations in a particular factor base? If so, how can this be proved?

CRGreathouse 2011-01-16 17:34

[QUOTE=Mr. P-1;246717]If I'm not mistaken, his question this:

Some congruences of squares are trivial. Do we know that a non-trivial congruence can always be constructed from the set of relations in a particular factor base? If so, how can this be proved?[/QUOTE]

If that was actually the question: there are factor bases small enough that you won't find relations at all. For example, the factor base {}. But this has nothing to do with finding congruences given relations.

MathBoy 2011-01-16 18:45

Well, basically I am struggling to see 2 aspects of these sieve methods
1) How can you generate a factor bases for a given number very quickly?
2) How do you know that this factor bases will give rise to a gcd(x-y,n) that isn't the trivial 1 or n divisors.

Silverman said something that their is a proof that a factor bases will contain a nontrivial factor. Using some relation to the quadratic resides....
I cann't think of how to prove it because I would think the proof would require a method of generating a factor bases. Maybe I am misunderstanding the definition of a factor base. I thought it was just a set of numbers that you look for x^2 = r mod n and that x and r belong to the factor bases set.

I know
[QUOTE]Modulo a prime, there are an equal number of quadratic residues and nonresidues.
Modulo a prime, the product of two quadratic residues is a residue, the product of a residue and a nonresidue is a nonresidue, and the product of two nonresidues is a residue.[/QUOTE]

But the above quote does not imply the set has a nontrivial gcd relation. Or at least I can't see it.

CRGreathouse 2011-01-16 18:47

You seem to have a serious misunderstanding of what factor bases are/are for. I don't think I'm going to be able to explain you out of that. Is someone else up to the task?

MathBoy 2011-01-16 19:05

[QUOTE]If I'm not mistaken, his question this:

Some congruences of squares are trivial. Do we know that a non-trivial congruence can always be constructed from the set of relations in a particular factor base? If so, how can this be proved?[/QUOTE]

yes this is basically what I meant in my question 2. (nontrivial being some congruence that gives rise to a gcd(x-y,n) that is not equal to 1 or n)


Question 1 is basically asking how can we find a factor bases quickly since all these seive methods rely on being able to generate a factor bases and find non trivial congruences/gcd's quickly.
But I don't believe I fully understand what a factor bases is.
I have read Wikipedia and a few other things on Google I get the impression that they are just randomly chosen sets of numbers that you look for congruences in that have the relation if x^2 = y mod n and x belongs to the factor bases set then y must belong to the factor bases set as well.

CRGreathouse 2011-01-16 20:22

It's not hard to "find a factor bases quickly"; this is the first step in the computation. The hard parts are finding relations and then turning those relations into congruences.

So if your question is, "Can we find the factor base quickly?", the answer is yes -- this is trivial. (I would never use the term "find"; there's nothing hidden about these, unlike the relations or factors.) If your question is, "Will there always be congruences to find?" then the answer is also yes.

Summary:
1. Build factor base (quick)
2. Find relations (slow, but can be done on many computers to speed it up)
3. Process relations (not too slow, but needs to be done on a single [fast] computer with lots of memory if the number is large)
4. Find factor using congruences found in step 3 (quick)

Analogy:
1. Rent a building big enough to fit an airplane.
2. Make 6 million airplane parts (slow, but you can have many companies make individual parts)
3. Assemble the parts (not too slow, compared to making all of the pieces one-by-one, but takes special expertise)
4. Taxi the plane out of the area

So your question, to me, sounds like "Can we rent a hanger to make an airplane?". My answer is "Yes, but that's not the hard part.". If the question is instead, "If we built an airplane, would we be able to taxi it out of the room?" then the answer is yes, but this relies on our understanding of airplanes (they have wheels and engines that allow them to taxi).

jasonp 2011-01-16 21:10

You've received many cogent responses to your questions; I'll only add that the time-consuming part of all the sieve-based factoring methods is the collection of relations. The rate at which this happens depends on how likely a random number of a given size is 'smooth', i.e. has only small factors. One can use analytic number theory to quantify the odds that this happens; it depends on the size of the facotr base and the size of the random number, which varies with the factorization method. We also know how many it times it has to happen so that we have 'enough' relations; multiply by the (asymptotic) time needed to find one relation and you get the asymptotic runtime of each method.

You are correct that the 'post processing' stage for all of the sieve-based methods is essentially the same. Some of the details are different, i.e. NFS needs a lot of additional work in the number field you have constructed, but those details don't detract from the general idea of what postprocessing does.

As for whether msieve is the best package for factoring numbers, you need to be a little more specific. Other packages will be better if you need to factor a huge number of very small inputs. Packages like YAFU will do a better job at factoring medium-size numbers, i.e. from maybe 60 to 90 digits in size, because YAFU's QS code is multithreaded and more efficient in general. Msieve uses GMP-ECM for the P+-1 and ECM algorithms, but uses an automatic framework for calling that library and it's not suitable for serious ECM work.

For GNFS, there are only three packages that are readily available and can scale to large problems: GGNFS, CADO-NFS and Msieve. Development on GGNFS has basically stopped, other than small improvements to the sieving tools. CADO-NFS is under very active development and incorporates state-of-the-art improvements, but needs more work before it can handle problems much larger than 512 bits.

Of course I'm a little biased about the remaining tool :) Msieve is the only publicly available tool that can handle larger NFS jobs. More seriously, you should not rely on the sieving tools in Msieve, they were written years ago and the siever in GGNFS and CADO-NFS are much much better. The state of currently-available NFS tools is a little confusing because to a certain extent all of these tools are research efforts.

cmd 2011-01-16 21:18

Perhaps it is time to do [URL="http://1.bp.blogspot.com/_rvR3ouziO8g/TTHavVyw2nI/AAAAAAAAAwg/T2gxOo5V3kw/s1600/p_zigzag_zvi.PNG"]something different[/URL]


[URL="http://www.youtube.com/watch?v=qbtpSxpryKY"]team[/URL]



[URL="http://www.youtube.com/watch?v=5JMwnatgkgg&feature=related"]m48-b[/URL]


ps
carlo = ( Karl ):
be just as difficult to understand [URL="http://3.bp.blogspot.com/_rvR3ouziO8g/TNh63k3uQ4I/AAAAAAAAAuY/ERV8Y7A3AaM/s1600/a1p2.PNG"]this pattern[/URL]?

Mr. P-1 2011-01-17 00:15

[QUOTE=MathBoy;246818]But I don't believe I fully understand what a factor bases is.
I have read Wikipedia and a few other things on Google I get the impression that they are just randomly chosen sets of numbers that you look for congruences in that have the relation if x^2 = y mod n and x belongs to the factor bases set then y must belong to the factor bases set as well.[/QUOTE]

My understanding which may be no better than yours, is as follows.

A factor base is a designated set of small prime numbers.

A relation is a congruence x^2 = y (mod n) satisfying the following criteria.

1. There is no restriction on x.

2. y is such that if p appears in the prime factorisation of y to an odd power, then p lies within the factor base. (Some algorithms allow p to be a "large prime" not in the factor base under certain circumstances, but this is a complication best ignored at this point.)

Additionally, it is a trivial theorem that if N is an odd composite with a * b a proper factorisation, the a congruence of squares yielding that factorisation must exist.

Also any congruence of squares is vacuously a relation as defined above, to any factor base, including the empty factor base.

Consequently, given odd composite n, it is trivially true that, given any factor base (including the empty factor base), the set of relations is non empty and contains every congruence of squares. Of course, if we could find the congruences of squares directly, we wouldn't need to bother with factor bases and relations. The clever part (which I don't remotely understand) of these algorithms is choosing a factor base small enough that we don't need too many relations to guarantee dependencies, from which congruence of squares can be derived, yet large enough that we can find the relations in the first place. And of course, actually finding the relations. That's clever too.

I should be very grateful if Mr. Silverman could tell me how much I suck, or if any of the other knowledgeable people here could explain how I'm wrong (or right?).

R.D. Silverman 2011-01-17 00:16

[QUOTE=MathBoy;246805]Well, basically I am struggling to see 2 aspects of these sieve methods
1) How can you generate a factor bases for a given number very quickly?
[/QUOTE]

I am about to lose patience. You simply do not have any understanding
about how these methods work. Go READ ABOUT THEM.

Computing the factor base is TRIVIAL.

[QUOTE]
Maybe I am misunderstanding the definition of a factor base. I thought it was just a set of numbers that you look for x^2 = r mod n and that x and r belong to the factor bases set.
[/QUOTE]

NO! NO! NO!

This is a total misunderstanding. Go READ.

science_man_88 2011-01-17 00:47

[QUOTE=R.D. Silverman;246888]I am about to lose patience. You simply do not have any understanding
about how these methods work. Go READ ABOUT THEM.

Computing the factor base is TRIVIAL.



NO! NO! NO!

This is a total misunderstanding. Go READ.[/QUOTE]

is [url]http://en.wikipedia.org/wiki/Factor_base[/url] a good source ? If so I partially understand but not to a level that I could help yet. When I figure this out maybe I'll try looking more into some of the algorithms.

cmd 2011-01-17 01:10

[URL="http://en.wikipedia.org/wiki/Generating_primes"]secur!ty[/URL] :
delicate subject ...
real sieve analysis ?

science_man_88 2011-01-17 01:15

[QUOTE=cmd;246894][URL="http://en.wikipedia.org/wiki/Generating_primes"]secur!ty[/URL] :
delicate subject ...
real sieve analysis ?[/QUOTE]

I'm not here to apply it to cryptography I'm just here to learn, of course like many I've learned this is not the place to learn as you get killed if you have interest in subjects beyond your current knowledge.

CRGreathouse 2011-01-17 03:03

[QUOTE=science_man_88;246891]is [url]http://en.wikipedia.org/wiki/Factor_base[/url] a good source ?[/QUOTE]

Not really, but it does at least give a reasonable definition.

cheesehead 2011-01-19 02:52

[QUOTE=science_man_88;246891]is [URL]http://en.wikipedia.org/wiki/Factor_base[/URL] a good source ? If so I partially understand but not to a level that I could help yet.[/QUOTE]Let's not forget that MathWorld (formerly Eric Weisstein's World of Mathematics, now at [URL]http://mathworld.wolfram.com/[/URL] ) is a good source of concise definitions, though it doesn't go into history and such.

MathBoy 2011-01-21 20:04

well, I have been looking at the Wikipedia , and Mathworld definitions but still don't fully think I understand it.

definition of a factor bases:
The primes with Legendre symbol (n/p)=1 (less than pi(d) for trial divisor d) which need be considered when using the quadratic sieve factorization method

Question
From this definition how do you know what d should be so that you can find a nontrivial square congruence relation for n? (i.e x^2 = y^2 mod n)


From this definition it seems that a factor bases is just a set of prime numbers under a randomly chosen pi(d) such that n is a quadratic residue mod all those primes in that set.

sorry if I am not getting the obvious.

CRGreathouse 2011-01-21 20:07

[QUOTE=MathBoy;248003]From this definition how do you know what d should be so that you can find a nontrivial square congruence relation for n?[/QUOTE]

Choose whatever you like. If it works, you find a factor; if it fails, choose a larger d and start over.

There are tables and heuristics and data to help you choose a number large enough without being so large that it's slow.

[QUOTE=MathBoy;248003]From this definition it seems that a factor bases is just a set of prime numbers under a randomly chosen pi(d) such that n is a quadratic residue mod all those primes in that set.[/QUOTE]

Right. Depending on what method you use, it's either easy to build the factor base or entirely trivial.

davar55 2011-01-21 20:13

[QUOTE=R.D. Silverman;246620]For general algorithms yes. But ECM is better at factoring when
factors are relatively small.
[/QUOTE]

Is that so? Could you please quantify this claim?

MathBoy 2011-01-21 20:19

[QUOTE]Choose whatever you like. If it works, you find a factor; if it fails, choose a larger d and start over.

There are tables and heuristics and data to help you choose a number large enough without being so large that it's slow.[/QUOTE]

So the QS , GNFS , and their variants
Won't give you any factors provided you don't choose d high enough / correctly?
If this is true then these methods really don't guarantee you a factor necessarily only if their is a guaranteed way to find d such that a nontrivial relation holds in your factor bases and you can reasonable calculate the factor bases set.

Correct me if I am wrong

CRGreathouse 2011-01-21 20:58

[QUOTE=MathBoy;248011]So the QS , GNFS , and their variants
Won't give you any factors provided you don't choose d high enough / correctly?[/QUOTE]

Only insofar as trial division won't give you a factor if you don't go high enough.

With trial division, you know that there is a factor to find if you search long enough. With, e.g., QS you know that there are enough relations if you search long enough. It would be easy to come up with a value that would be guaranteed high enough to work in all cases, but it's much better to choose a good value rather than a worst-case value. (In practice, you never change the size of the factor base; if the size was a little off you just sieve slightly longer than you'd otherwise sieve.)

MathBoy 2011-01-21 21:24

OK

But obtaining a factor bases would be equivalent to finding all primes that have the relation x^2 = n mod p in the set of numbers pi(d).

I would think this is still a long process to do and takes brute force.
Unless their is an easy way to compute if n is a quadratic residue mod a prime fast? And to compute it fast for all primes under pi(d). Either way won't this be equivalent to trial divisors for numbers up to pi(d) for runtime. Seems it to me.

Also is this definition of a factor bases for the QS the same for the GNFS , SNFS , RNFS ,...etc?
Or is their a different definition for each method?


Thank you for all your help in understanding this.

R.D. Silverman 2011-01-21 21:50

[QUOTE=MathBoy;248003]well, I have been looking at the Wikipedia , and Mathworld definitions but still don't fully think I understand it.

definition of a factor bases:
The primes with Legendre symbol (n/p)=1 (less than pi(d) for trial divisor d) which need be considered when using the quadratic sieve factorization method
[/QUOTE]

No. The part about "trial divisor d" is wrong.

We are discussing only QS here. The factor base is a set of primes
that are quadratic residues of the number being factored. The set consists
of all primes less than a specified bound B. This bound is an [i] input
parameter.[/i]

We use a quadratic polynomial Ax^2 + 2Bx + C whose discriminant
is a multiple k, of n. k is chosen to maximize the Knuth-Schroeppel
function.

READ my 1987 paper: The Multiple Polynomial Quadratic Sieve. Math. Comp.


[QUOTE]

From this definition it seems that a factor bases is just a set of prime numbers under a randomly chosen pi(d) such that n is a quadratic residue mod all those primes in that set.

[/QUOTE]

"under a randomly chosen pi(d)" is gibberish. The correct statement should
be "all primes less than a chosen bound B". [you also need the unit
-1 to hold the sign bit]

R.D. Silverman 2011-01-21 21:58

[QUOTE=MathBoy;248046]OK

But obtaining a factor bases would be equivalent to finding all primes that have the relation x^2 = n mod p in the set of numbers pi(d).
[/QUOTE]

This is more gibberish. You want all primes p < B, such that (p|n) = 1
[Legendre symbol]. This takes a few seconds at most for even fairly
large B [e.g. B = 10^7]
[QUOTE]

I would think this is still a long process to do and takes brute force.
Unless their is an easy way to compute if n is a quadratic residue mod a prime fast?
[/QUOTE]

Get hold of Crandall & Pomerance: Prime Numbers, A computational
perspective.

It is TRIVIAL to compute the Legendre symbol.

[QUOTE]
And to compute it fast for all primes under pi(d). Either way won't this be equivalent to trial divisors for numbers up to pi(d) for runtime. Seems it to me.
[/QUOTE]

"seems to me"???? You don't know enough to have an opinion.

You need to learn some basic number theory. Without it, understanding
these algorithms will be hopeless. Start by looking up "Quadratic Reciprocity"

And stop using the word "under". If you mean "less than", than say it!


[QUOTE]
Also is this definition of a factor bases for the QS the same for the GNFS , SNFS , RNFS ,...etc?
Or is their a different definition for each method?
[/QUOTE]

I already discussed this. Go back and read what I wrote.

R.D. Silverman 2011-01-21 22:03

[QUOTE=MathBoy;248011]So the QS , GNFS , and their variants
Won't give you any factors provided you don't choose d high enough / correctly?
If this is true then these methods really don't guarantee you a factor necessarily only if their is a guaranteed way to find d such that a nontrivial relation holds in your factor bases and you can reasonable calculate the factor bases set.

Correct me if I am wrong[/QUOTE]

Stop making guesses and go read my Math. Comp. paper. It explains
everything you need to know.

CRGreathouse 2011-01-22 03:40

[QUOTE=R.D. Silverman;248063]Stop making guesses and go read my Math. Comp. paper. It explains
everything you need to know.[/QUOTE]

Actually, he shouldn't -- it's not yet within his reach. He'll need to learn more before tackling that paper, and the forums are a passable place for that.

MathBoy 2011-01-22 18:39

ok in your paper

I am having trouble understanding

(iv) and (v)

what does add the value of [log(pi)] have to do with anything?
I get that if r1, r2 are the 2 solutions to Q(x) = 0 mod p_i then their exist other solutions of the form r1+kp_i for all k belonging to Z.

I also get how to find a factor bases quickly using Legendre symbol and using Tonelli–Shanks algorithm or Pocklington's algorithm to find the y value in y^2 = n mod p for the factor bases.

(v) I get M*sqrt(N) approx by the fact that if Q(x) = (x - [sqrt(n)])^2 - n so when x is small we get Q(x) approx = 2*x*[sqrt(n)]

Don't get where we are getting these log calculations in step (iv) and (v).

in the example here [URL="http://en.wikipedia.org/wiki/Quadratic_sieve"]http://en.wikipedia.org/wiki/Quadratic_sieve[/URL]
Data collection part

I get this completely they are using no log calculations and just look for the entries to be 1's in V array. This would imply that those V(x) elements = 1 factor completely in the factor bases. Which they call B-smooth.

But going by this how do we know we are going to have enough B-smooth numbers such that we can get FB+1 relations to make the matrix have non-trivial solutions. This would depend on if we chose are B bound and M sieve interval correctly.

Either way I would like to know how these log calculations in your paper relate to the example in wikipedia seems their doing something completely different in a few stages of the data collection part.

R.D. Silverman 2011-01-23 00:40

[QUOTE=MathBoy;248420]ok in your paper

I am having trouble understanding

(iv) and (v)

what does add the value of [log(pi)] have to do with anything?
I get that if r1, r2 are the 2 solutions to Q(x) = 0 mod p_i then their exist other solutions of the form r1+kp_i for all k belonging to Z.

[/QUOTE]

suppose that q(x) := Ax^2 + 2Bx + C = a mod n for x = x_0.

We want a to factor over the factor base. If

a = p1*p2*p3 * ....

Then log(a) = log(p1) + log(p2) + log(p3) .....

The sieve accumulates logs of the prime p_i at locations x_j such
that q(x_j) = 0 mod p_i. i.e. p_i divides a. Thus, by accumulating the
logs of the primes in the "right locations", you can attempt to factor
many values of q(x) all at once.
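To make the accumulation concrete, here is a toy Python sketch (my own illustration, not from the paper; n = 15347 and p = 17 are borrowed from the Wikipedia example, and the roots are found by brute force where a real siever would use Tonelli-Shanks):

```python
import math

n = 15347
m = math.isqrt(n) + 1          # ceil(sqrt(n)) for this n

def Q(x):
    # the basic QS polynomial: Q(x) = (x + m)^2 - n
    return (x + m) ** 2 - n

p = 17
# Find the two roots r of Q(x) = 0 (mod p) by brute force (fine for tiny p).
roots = [r for r in range(p) if Q(r) % p == 0]

# Every location r + k*p is then divisible by p, so the sieve adds log(p) there.
sieve_len = 100
A = [0.0] * sieve_len
for r in roots:
    for x in range(r, sieve_len, p):
        A[x] += math.log(p)

# Sanity check: p divides Q(x) exactly at the locations we touched.
assert all(Q(x) % p == 0 for r in roots for x in range(r, sieve_len, p))
```

Each prime touches only every p-th cell, which is exactly the "right locations" point: the work per prime shrinks as p grows.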

[QUOTE]

But going by this, how do we know we are going to have enough B-smooth numbers to get FB+1 relations so that the matrix has non-trivial solutions? This would depend on whether we chose our B bound and M sieve interval correctly.

Either way, I would like to know how the log calculations in your paper relate to the Wikipedia example; it seems they're doing something completely different in a few stages of the data collection part.[/QUOTE]

My paper gives values of B that work for various sized composites.

As for how they were first found? Experimentation.

MathBoy 2011-01-23 02:05

[QUOTE]suppose that q(x) := Ax^2 + 2Bx + C = a mod n for x = x_0.

We want a to factor over the factor base. If

a = p1*p2*p3 * ....

Then log(a) = log(p1) + log(p2) + log(p3) .....

The sieve accumulates logs of the prime p_i at locations x_j such
that q(x_j) = 0 mod p_i. i.e. p_i divides a. Thus, by accumulating the
logs of the primes in the "right locations", you can attempt to factor
many values of q(x) all at once.
[/QUOTE]

Confused; I get the properties of logs, etc., but when I look at the example given on Wikipedia, the data collection step uses nothing about logs, only the elements in V that are 1.

I guess I don't see where these logs are and why we are using them.

Sorry if I am not seeing the obvious.

axn 2011-01-23 02:54

[QUOTE=MathBoy;248584]Confused; I get the properties of logs, etc., but when I look at the example given on Wikipedia, the data collection step uses nothing about logs, only the elements in V that are 1.

I guess I don't see where these logs are and why we are using them.

Sorry if I am not seeing the obvious.[/QUOTE]

[url]http://en.wikipedia.org/wiki/Quadratic_sieve#Checking_smoothness_by_sieving[/url]

It clearly states where the logs are being added.

MathBoy 2011-01-23 05:19

Well, where are they using the log(p) stuff in this example?

[URL="http://en.wikipedia.org/wiki/Quadratic_sieve#Example_of_basic_sieve"]http://en.wikipedia.org/wiki/Quadratic_sieve#Example_of_basic_sieve[/URL]

A[], I am assuming, is the V in this example.

WraithX 2011-01-23 05:44

[QUOTE=MathBoy;248634]Well, where are they using the log(p) stuff in this example?

[URL="http://en.wikipedia.org/wiki/Quadratic_sieve#Example_of_basic_sieve"]http://en.wikipedia.org/wiki/Quadratic_sieve#Example_of_basic_sieve[/URL]

A[], I am assuming, is the V in this example.[/QUOTE]

I don't know where you got A[], but let's use it for this example. In that wikipedia page, they are dividing the entries in V[] by the primes in the factor base to see if that entry in V[] becomes 1.

Another way to do this is to set all the values in the array (let's call our array A[]) to zero. Then, for the prime 2 (from the example factor base), we find the first relation that has 2 as a factor (each relation corresponds to one location in A[]). For that location in A[] (let's call it x), we increment A[x + 2*k] (0 <= k <= len(A)/2) by log(2). Then, for the prime 17 (the next prime in the example factor base), we find the first relation that has 17 as a factor; for that location (let's call it y), we increment A[y + 17*j] (0 <= j <= len(A)/17) by log(17). And so on.

At the end, we will have an array A[] with some zero entries (not smooth at all, according to the factor base), while all the other entries represent different levels of smoothness. At that point, we choose a minimum smoothness to decide which entries of A[], and hence which relations, to keep. So, if we want the entries in A[] to be > 5.7 (the threshold itself is chosen by another algorithm), we search through the array for entries > 5.7, and then we know which relations to keep.
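A minimal Python sketch of this log-accumulation procedure (my own toy illustration, using the Wikipedia example's n = 15347 and factor base {2, 17, 23, 29}; the slack of 1.0 in the threshold is an arbitrary choice, and a real siever would find roots with Tonelli-Shanks rather than brute force):

```python
import math

n = 15347
m = math.isqrt(n) + 1            # 124, so Q(x) = (x + 124)^2 - 15347
factor_base = [2, 17, 23, 29]
sieve_len = 100

def Q(x):
    return (x + m) ** 2 - n

# Accumulate log(p) at every location p divides.
A = [0.0] * sieve_len
for p in factor_base:
    for r in range(p):
        if Q(r) % p == 0:        # r is a root of Q mod p
            for x in range(r, sieve_len, p):
                A[x] += math.log(p)

# A value Q(x) that splits completely over the base has A[x] close to
# log(Q(x)); prime powers are ignored, so leave some slack.
candidates = [x for x in range(sieve_len) if A[x] >= math.log(Q(x)) - 1.0]

def is_smooth(v):
    # exact check, done only for the few candidates
    for p in factor_base:
        while v % p == 0:
            v //= p
    return v == 1

relations = [x for x in candidates if is_smooth(Q(x))]
```

Only the handful of candidates ever get the expensive exact division; everything else is touched by cheap additions alone.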

MathBoy 2011-01-23 07:10

OK, but if you do it that way, then you don't know whether the nonzero entries can be completely factored over your factor base.

In the Wikipedia example, a 1 in V indicates that the element of V was completely factored over the factor base.

How do you know this if you use the log addition method?

I would assume the only way is to sum up all the log(p_i) in your factor base,
then search your A[] for entries equal to that sum over all FB elements log(p_i).

If this is how it's done, then I get it.

jasonp 2011-01-23 15:14

Using logarithms is a performance optimization; you have to factor a huge number of integers in a time much faster than trial dividing each one.

If you know that a small prime p divides a number x, you also know that it cannot divide the next p-1 numbers after x. This means you can skip trial division by p for a while. That's a long-winded way of describing a sieve procedure.

The sieve needs one entry for each x. You need to find which of the x are almost completely a product of small primes p. You could make each entry the integer itself and divide it by every p that divides it, or you can work in logarithms and perform an addition of log(p). The latter is much faster and more space-efficient than the former, and is what gives the sieve-based methods a great deal of their power.

MathBoy 2011-01-23 16:37

Ok, but what I am trying to get at is that in the example on Wikipedia, you know that an entry of 1 in V means the number completely factored over FB.

How do you know the equivalent when using logs?

Using a certain bound like > 5.5 doesn't mean the numbers in A[] with this property completely factor over FB.


I do understand that using logs is better computationally; it saves time and space.
I just don't understand how you determine which elements completely factor over FB.

Also, in the Wikipedia example an entry of 1 in V means the element completely factored, but this will only catch elements that are products of primes in the set to the first power, not products involving higher prime powers. So it will give you at most 2^pi(d) numbers in V that completely factor, where d is your chosen bound. There could be tons more that have higher powers, etc.

axn 2011-01-23 17:34

[QUOTE=MathBoy;248731]Ok, but what I am trying to get at is that in the example on Wikipedia, you know that an entry of 1 in V means the number completely factored over FB.

How do you know the equivalent when using logs?

Using a certain bound like > 5.5 doesn't mean the numbers in A[] with this property completely factor over FB.


I do understand that using logs is better computationally; it saves time and space.
I just don't understand how you determine which elements completely factor over FB.
[/QUOTE]

For the love of god, read
[QUOTE="the wiki page"]At the end of the factor base, any A[] containing a value [B]above a threshold of roughly log(n)[/B] will correspond to a value of y(x) which splits over the factor base. The information about exactly which primes divide y(x) has been lost, but it has only small factors, and there are many good algorithms (trial division by small primes, SQUFOF, Pollard rho, and ECM are usually used in some combination) for factoring a number known to have only small factors.[/QUOTE]
My bold.
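Of the small-factor methods the quoted passage names, Pollard rho is the shortest to sketch. A textbook Floyd-cycle version (my own illustration, not tied to any particular QS implementation):

```python
import math

def pollard_rho(n, c=1):
    # Finds a nontrivial factor of an odd composite n using the iteration
    # x -> x^2 + c mod n; on failure (d == n) it retries with a new c.
    if n % 2 == 0:
        return 2
    x = y = 2
    d = 1
    while d == 1:
        x = (x * x + c) % n          # tortoise: one step
        y = (y * y + c) % n          # hare: two steps
        y = (y * y + c) % n
        d = math.gcd(abs(x - y), n)
    return d if d != n else pollard_rho(n, c + 1)
```

It is well suited to the survivors of the sieve precisely because they are known to have only small factors, which rho finds quickly.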

fivemack 2011-01-23 18:52

[QUOTE=MathBoy;248731]Ok, but what I am trying to get at is that in the example on Wikipedia, you know that an entry of 1 in V means the number completely factored over FB.

How do you know the equivalent when using logs?

Using a certain bound like > 5.5 doesn't mean the numbers in A[] with this property completely factor over FB.
[/QUOTE]

That's fine; the point of all factor-base algorithms is that, if you've picked B correctly, you're looking for things of which there exist significantly more than the number you need. So it's worth making rough-and-ready choices to make the sieving faster, even at the price of sometimes getting false negatives and positives.

You take all the sievepoints that score highly enough, and then factor the associated number by trial division (or by some more clever method - a lot of the interesting bits of the GNFS sievers are in the choice of that method); it may be that it doesn't factor completely over the factor base. If it does factor completely, use it; if it doesn't, go on to the next high-scoring sievepoint.

The really good trick here is the large-primes variant - you sieve for things that have some factors in the factor base, then do a full factorisation by a cleverer method, and *add to the factor base* the small-enough extra factors you find that lie outside it. You can handle them specially rather than by full dense-matrix methods (for instance, if a prime appears only once in the entire job you just throw it away), and this technique turns out to be absolutely essential for getting adequate performance out of NFS and QS.
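The pairing step of the single-large-prime variant can be sketched like this (a toy illustration; the (x, L) relation format is made up for the example, standing for "sievepoint x, smooth except for one leftover prime L"):

```python
from collections import defaultdict

def combine_partials(partials):
    # Two partial relations sharing the same large prime L multiply into a
    # full relation: L^2 is a square, so it drops out of the x^2 = y^2 (mod n)
    # congruence. Partials whose L appears only once are simply thrown away.
    by_large_prime = defaultdict(list)
    for x, L in partials:
        by_large_prime[L].append(x)
    full = []
    for L, xs in by_large_prime.items():
        for a, b in zip(xs[::2], xs[1::2]):   # pair them up; singletons drop
            full.append((a, b, L))
    return full
```

The birthday paradox makes such collisions far more common than one would naively expect, which is why the variant pays off so well.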

MathBoy 2011-01-25 03:32

OK, so the log method doesn't guarantee that all the numbers over a given
threshold factor completely. You test them individually to see if they do, and either discard the ones that don't or add the few new primes to the FB.

But then how is this method better than just going through all V[r1 + k*p_i] and, as you go, dividing by the highest prime power? Just like the Wikipedia example does, except it doesn't divide through by the highest power (it just uses power = 1).

If it did, at the end the 1 entries in V would indicate the only numbers in V that could completely factor over FB.

And you could set it to count up to FB+1 relations and stop, so it doesn't go through the whole thing; it just goes until it finds enough.

There are tons of different variations, but they all seem to have the same runtime, apart from the actual factoring algorithm used to factor the small V numbers.

Basically, I think the Wikipedia way is just as good as the log way? Correct me if I am wrong.

CRGreathouse 2011-01-25 03:46

[QUOTE=MathBoy;249103]Basically I think the wikipedia way is just as good as the log way? Correct me if I am wrong.[/QUOTE]

No, it's much slower.

bsquared 2011-01-25 04:53

[QUOTE=MathBoy;249103]

Basically I think the wikipedia way is just as good as the log way? Correct me if I am wrong.[/QUOTE]

Good as in "more accurate", sure. Good as in "as fast", no way. To factor, say, a 75 digit number, a modern quadratic sieve would typically set up 100,000 or so vectors of length 524288. Each point in each one of these vectors represents a number to be trial divided. And each number is on the order of sqrt(n) in size (about 37 digits). Say you had the prime '3' in your factor base. What you propose would involve trial dividing '3' into every third position in every one of those vectors. That's on the order of 10 billion divisions. Divisions are slow to perform in the cpu: each might take 40 clock cycles. Just to check every position for divisibility by 3 might take 400e9 / 3e9 ~= 2 minutes. Additions, on the other hand, take 1-2 clocks (assuming every addition is a write to a cache memory location). I'd rather use additions.

Of course, modern QS implementations don't bother checking for divisibility by 3 at all. The key to speed is sloppiness. We approximate using logs. We don't use the smallest primes at all when sieving. We ignore prime powers. We only look closely at results that are "good enough". The entire exercise is to be able to find (by way of approximate sieving techniques) those locations which are probably good enough very quickly - only spend effort doing laborious divisions on vector locations which are likely to completely factor over the factor base.
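A quick back-of-envelope check of those figures (all constants are the post's own assumptions: 100,000 sieve vectors of length 524288, every third position divisible by 3, ~40 cycles per hardware division, a 3 GHz core):

```python
# How long would it take just to trial divide by 3 at every third position?
divisions = 100_000 * 524_288 // 3      # ~1.7e10, "on the order of 10 billion"
cycles_per_division = 40                # rough cost of an integer division
seconds = divisions * cycles_per_division / 3e9   # at 3 GHz
# An addition costs 1-2 cycles instead of ~40, so the same pass is roughly
# 20-40x cheaper when accumulating log(p) rather than dividing.
```

This comes out to minutes of CPU time for the single prime 3 alone, which is the scale the post is pointing at.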

Hope this helps.

xilman 2011-01-25 21:00

[QUOTE=bsquared;249108]... only spend effort doing laborious divisions on vector locations which are likely to completely factor over the factor base.[/QUOTE]And not even then if you can avoid it. The keyword here is "resieving".

Paul

fivemack 2011-01-25 21:29

Can you give me a good reference for resieving? Is there more to it than simply running the sieve pass again and accumulating the sieve primes that hit specially-marked locations into some other data structure?

R.D. Silverman 2011-01-25 22:55

[QUOTE=fivemack;249203]Can you give me a good reference for resieving? Is there more to it than simply running the sieve pass again and accumulating the sieve primes that hit specially-marked locations into some other data structure?[/QUOTE]

No. You got it.

bsquared 2011-01-25 22:59

[QUOTE=xilman;249188]And not even then if you can avoid it. The keyword here is "resieving".

Paul[/QUOTE]

Good point, although that's maybe more important for NFS lattice sieving. Profiling my QS code says that the actual time spent doing divisions is ~5% of the total time.

I'd also be interested in a re-sieving reference, if you know of one. I've googled it before and haven't turned up anything substantial.

[edit]
nevermind. saw RDS post.

bsquared 2011-01-25 23:05

[QUOTE=R.D. Silverman;249223]No. You got it.[/QUOTE]

One part is still confusing me. If the division is never performed, how do we determine either the value of any residual large prime or the value of a residual small composite to split when using two large primes (or three...)?

R.D. Silverman 2011-01-25 23:20

[QUOTE=bsquared;249228]One part is still confusing me. If the division is never performed, how do we determine either the value of any residual large prime or the value of a residual small composite to split when using two large primes (or three...)?[/QUOTE]

The division [i]is[/i] performed! It's just that the resieve tells you
exactly which primes will divide the norm. It isn't [i]trial[/i] division.
You only divide by primes that are known to divide.
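A toy two-pass sketch of that idea in Python (my own illustration; brute-force root finding stands in for Tonelli-Shanks, and the threshold is an arbitrary choice):

```python
import math

def resieve(Q, factor_base, sieve_len, threshold):
    # Pass 1 accumulates logs as usual; pass 2 re-runs the same sieve walk,
    # but at locations already marked promising it records WHICH prime hit.
    # The later division step then divides only by primes known to divide.
    A = [0.0] * sieve_len
    roots = {p: [r for r in range(p) if Q(r) % p == 0] for p in factor_base}
    for p in factor_base:                       # pass 1: log accumulation
        for r in roots[p]:
            for x in range(r, sieve_len, p):
                A[x] += math.log(p)
    promising = {x for x in range(sieve_len) if A[x] >= threshold}
    hits = {x: [] for x in promising}
    for p in factor_base:                       # pass 2: resieve
        for r in roots[p]:
            for x in range(r, sieve_len, p):
                if x in promising:
                    hits[x].append(p)           # exact divisors, no trial div
    return hits
```

The second pass costs about as much as the first, but it replaces trial division over the whole factor base with exact division by a short list of known divisors.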

bsquared 2011-01-25 23:23

[QUOTE=R.D. Silverman;249233]The division [i]is[/i] performed! It's just that the resieve tells you
exactly which primes will divide the norm. It isn't [i]trial[/i] division.
You only divide by primes that are known to divide.[/QUOTE]

:tu:

Ahh!!

Thanks - sorry for the mental block there.

- b.

MathBoy 2011-01-26 02:46

I am still unclear how these logs are faster.
For example, let V[] be the set that we are going to sieve using the log method.

Let N be the threshold log value; for example, let's use log(V[i]) > 5.7.

Using this, we still have to go through all the elements in V, and the elements > 5.7 still have to be factored. So where is the time saver?

The only time saver I can see is in the method used for factoring the elements in V. Maybe I don't use trial division but ECM or something different to speed this up...
But I would still think this is a slow process.

Once we have these FB+1 relations and their factorizations, the rest of the algorithm just relies on matrix solving and the gcd algorithm. I believe the matrix can be solved in O(n^2.3ish) and gcd(b1, b2) is O(log(max{#b1, #b2})). So the only way to create a better algorithm like the QS, NFS, etc. variants is to improve the data collection part, for the most part the sieving piece. Finding a factor base really can't be improved that much, since it is almost trivial to find one in O(n) using the Legendre symbol (provided we already know the primes in the set). And finding the x in x^2 = n mod p is also pretty fast: just find an a such that a^2 - n is a quadratic non-residue (50% chance when picking randomly)
and compute (a + sqrt(a^2 - n))^((p+1)/2).
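That square-root computation is Cipolla's algorithm; a minimal Python sketch (my own illustration, assuming p is an odd prime and n is a quadratic residue mod p):

```python
def cipolla_sqrt(n, p):
    # Square root of n mod p via Cipolla: pick a with a^2 - n a non-residue,
    # then compute (a + w)^((p+1)/2) in F_p[w] / (w^2 - (a^2 - n)).
    n %= p
    if n == 0:
        return 0
    assert pow(n, (p - 1) // 2, p) == 1, "n must be a quadratic residue mod p"
    a = 0
    while pow((a * a - n) % p, (p - 1) // 2, p) != p - 1:
        a += 1                      # ~50% of choices work, as noted above
    w2 = (a * a - n) % p            # w^2 = a^2 - n, a non-residue

    def mul(u, v):                  # (u0 + u1*w) * (v0 + v1*w) mod (p, w^2 - w2)
        return ((u[0] * v[0] + u[1] * v[1] * w2) % p,
                (u[0] * v[1] + u[1] * v[0]) % p)

    r, base, e = (1, 0), (a, 1), (p + 1) // 2
    while e:                        # square-and-multiply exponentiation
        if e & 1:
            r = mul(r, base)
        base = mul(base, base)
        e >>= 1
    return r[0]                     # the w-coefficient is provably zero
```

For example, cipolla_sqrt(10, 13) returns one of the two roots of 10 mod 13; the other is its negation mod p.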


Side question: these fastest/best-known algorithms all use the same thought process, looking for relations x^2 = y^2 mod N.
Would there be any benefit in looking not only at power = 2 but also at powers >= 3? Or maybe varying the power as a way of adding another edge to the algorithm, just the way MPQS varies over many different polynomials.

alexhiggins732 2011-02-11 21:06

[QUOTE=MathBoy;249254]I am still unclear how these logs are faster.
For example, let V[] be the set that we are going to sieve using the log method.

Let N be the threshold log value; for example, let's use log(V[i]) > 5.7.

Using this, we still have to go through all the elements in V, and the elements > 5.7 still have to be factored. So where is the time saver?

The only time saver I can see is in the method used for factoring the elements in V. Maybe I don't use trial division but ECM or something different to speed this up...
[/QUOTE]

There are many time savers.

Let's suppose we are factoring N, and let f(i) = (x + i)^2 mod N, with x = ceil(sqrt(N)) (the first integer at or above the square root of N).

Let's suppose that to accumulate enough relations we must consider all residues from i = 0 to 2^32.

Using your example (assuming our computers are capable), we then have v[2^32].

Let's also suppose the factor base contains 2,000 primes, just to make things easier.

Now, not using the log method:

1) Before you can even trial divide a v[i], you need to actually evaluate (x + i)^2 mod N for each v[]. That would require 2^32 squarings and multi-precision divisions to calculate v[0] ... v[2^32].

2) Then we need to trial divide each v by the corresponding primes in our factor base. Say a prime p divides v[i] n times; then you need to divide v[i] n+1 times, because you have to divide v[i] by p until v[i] mod p <> 0.

Now using the log method:

1) You need not calculate (x + i)^2 mod N for all v. It suffices to compute a single size estimate, MAXLOG, roughly the log of the largest value of (x + i)^2 mod N over the interval.

2) Then, for the 2^32 entries v[i], we simply do v[i] += log(p) for each p in our factor base that hits that location. In practice, to save time, we calculate the log of each p only once before we start sieving.

From here we subtract some fudge factor from MAXLOG and check all v[] for v[i] > MAXLOG - fudge. We will find maybe several thousand v[i] that meet this criterion. We then calculate (x + i)^2 mod N for those i and attempt to trial divide only those values.

So by using the log method we exchange the need to calculate (x + i)^2 mod N at all 2^32 locations, and to trial divide every one of those locations, for on the order of 2^32 additions, plus only a few thousand evaluations of (x + i)^2 mod N and trial divisions at those locations.

How much does this save? Say we sieve 2^16 locations per block.

Let's say that to calculate (x + i)^2 mod N for those 65536 locations and trial divide each of them takes 2 seconds. To sieve all 2^32 locations would require 2^32/2^16 blocks of size 2^16, so sieving would take 2^16 * 2 seconds = 131072 seconds, which is ~36.4 hours.

Now using the log method (with subtraction instead of addition), we just set those 65536 locations to MAXLOG, subtract log(p) from each location divisible by p, and then only trial divide the v[i] below some fudge factor. This may take 0.0002 seconds per block (not counting the further trial division of the surviving v[i]) on a slow computer, so the sieving would take 13 seconds or so for all 2^32 locations. Granted, we still need to trial divide the locations that pass the fudge-factor test. Since our factor base contains 2,000 primes, we may only need to trial divide 65,536 locations to actually find 2,000 relations smooth over our factor base, giving a final run time of 13 seconds for sieving and 2 seconds for trial division.

15 seconds for sieving and trial division using logs, vs. 36 hours for sieving and trial division not using logs.

