mersenneforum.org  

2007-02-06, 14:12   #1
bdodson (Jun 2005, lehigh.edu)

Distribution of Parallel Linear Algebra (Thorsten's report)

Tucked away inside of Thorsten's report of a new DL record (mod a
160-digit prime) is a description of what would appear to be the first
public announcement of the latest development in the type of linear
algebra used in NFS factorizations. Here's the NMBRTHRY paragraph:

Quote:
Linear algebra:

The matrix was solved using the Wiedemann algorithm. Most of the
computation was done on the Himalaya cluster at the Institute for
Numerical Simulation. Up to eight jobs, each using 12 to 24 processors,
were run simultaneously on the cluster. In total, solving the linear
system of equations took 14 CPU years for a 3.2 GHz Xeon64.
Spending more time in the sieving step would probably reduce the total
runtime.
It appears that setting up the eight jobs is the [new!] distributed step, and
that each job may not need the complete matrix. Each part seems to have
gotten (the usual?) parallel linear algebra, after which there presumably
would be some final assembly. One hopes that more explicit details of the
distribution will appear. -bdodson
2007-02-06, 14:27   #2
R.D. Silverman (Nov 2003)

Quote:
Originally Posted by bdodson View Post
Tucked away inside of Thorsten's report of a new DL record (mod a
160-digit prime) is a description of what would appear to be the first
public announcement of the latest development in the type of linear
algebra used in NFS factorizations. Here's the NMBRTHRY paragraph:



It appears that setting up the eight jobs is the [new!] distributed step, and
that each job may not need the complete matrix. Each part seems to have
gotten (the usual?) parallel linear algebra, after which there presumably
would be some final assembly. One hopes that more explicit details of the
distribution will appear. -bdodson
Indeed. I hope that more details are forthcoming.

Note that this was a DL computation. The matrix must be solved modulo the
order of the group. One typically does this by solving it modulo small primes,
then pasting the results together with the CRT. Solving the system modulo
one of the small primes can be done completely independently of the others.
This is a natural parallel partition of the problem. One then brings everything
together with the CRT.
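Bob's "pasting together with the CRT" can be sketched in a few lines. This is a toy of the combining step only (my own illustration, not code from any DL implementation): each independent per-prime job reports an unknown's value in its own field, and the CRT recovers the value modulo the product of the primes.

```python
# Toy CRT combining step, assuming the per-prime solves are already done.
# In a real DL computation each residue would come from a (block)
# Lanczos/Wiedemann solve mod p_i; here the residues are just simulated.

def crt(residues, moduli):
    """Combine x = r_i (mod m_i), pairwise-coprime m_i, into x mod prod(m_i)."""
    M = 1
    for m in moduli:
        M *= m
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x = (x + r * Mi * pow(Mi, -1, m)) % M  # pow(., -1, m): modular inverse
    return x

moduli = [101, 103, 107]
secret = 123456                          # the value mod q we're after
residues = [secret % m for m in moduli]  # what each independent job hands back
assert crt(residues, moduli) == secret
```

(`pow(Mi, -1, m)` needs Python 3.8+; older versions would use the extended Euclidean algorithm.)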

Such a partition does not seem to exist for solving it over GF(2). I wish it
did.
2007-02-06, 14:52   #3
FactorEyes (Oct 2006, vomit_frame_pointer)

Quote:
Originally Posted by R.D. Silverman View Post
Such a partition does not seem to exist for solving it over GF(2). I wish it did.
The curse of characteristic 2, yet again.

So many factorization algorithms depend on solving x^2 = y^2 mod n, which ties us to a characteristic 2 computation in the matrix stage. How about jumping to the congruence x^k = y^k mod n (k large and smooth)? Two huge problems with this, of course: (i) sieving, or whatever approach one uses to collect relations, could be inefficient -- but we'd already be at the point where the matrix was much more inefficient anyway; (ii) the linear algebra is now over Z/(k), which ain't a field, but this could be a minor issue.

Anyone think it might go in this direction some day?
2007-02-06, 14:57   #4
R.D. Silverman (Nov 2003)

Quote:
Originally Posted by FactorEyes View Post
The curse of characteristic 2, yet again.

How about jumping to the congruence x^k = y^k mod n (k large and smooth)?

Anyone think it might go in this direction some day?
No, it will never go in this direction.

Hint: Ask yourself what are the requirements on the factors of n (and n)
for a solution to the congruence x^3 = y^3 mod n to even exist?

Further hint: such a congruence does not exist for most n and almost all k.
2007-02-06, 18:45   #5
bdodson (Jun 2005, lehigh.edu)

Quote:
Originally Posted by R.D. Silverman View Post
Indeed. I hope that more details are forthcoming.

Note that this was a DL computation.
There is such a note in the very first line of my post.

Quote:
Such a partition does not seem to exist for solving it over GF(2).
I wish it did.
We're in disagreement on this topic once again. I didn't Quote the
section of Thorsten's post on solving logs for small primes, which was
"done at the end of the matrix step". I stand by my description; let's
discuss again in six months. -Bruce
2007-02-07, 18:06   #6
FactorEyes (Oct 2006, vomit_frame_pointer)

Quote:
Originally Posted by R.D. Silverman View Post
No, it will never go in this direction.

Hint: Ask yourself what are the requirements on the factors of n (and n)
for a solution to the congruence x^3 = y^3 mod n to even exist?

Further hint: such a congruence does not exist for most n and almost all k.
Oh, I get it -- for some reason I thought the x^2 = y^2 congruence at the heart of these things was nothing special, only that it was the easiest to solve.

For a prime exponent k, x^k = x and y^k = y (mod k), so

x^(k-1) + x^(k-2)y + ... + y^(k-1) = (x^k - y^k)/(x - y) will be congruent to 1 (mod k) whenever x and y are incongruent mod k. Hence you can only split off factors which are congruent to 1 modulo k. Yech.
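FactorEyes's conclusion can be checked numerically. The snippet below (my own illustration; it assumes x and y coprime, which the quadratic method also requires) verifies that for prime exponents k, every prime dividing (x^k - y^k)/(x - y) is either k itself or congruent to 1 mod k.

```python
# Check: for coprime x > y and prime k, every prime factor of
# S = (x^k - y^k)/(x - y) is either k itself or = 1 (mod k).
from math import gcd

def prime_factors(n):
    """Trial division; fine for the small numbers used here."""
    fs, d = [], 2
    while d * d <= n:
        while n % d == 0:
            fs.append(d)
            n //= d
        d += 1
    if n > 1:
        fs.append(n)
    return fs

for k in (3, 5, 7):
    for x in range(2, 15):
        for y in range(1, x):
            if gcd(x, y) != 1:
                continue
            s = (x**k - y**k) // (x - y)
            for r in prime_factors(s):
                assert r == k or r % k == 1, (k, x, y, r)
```

For instance (3^5 - 1)/(3 - 1) = 121 = 11^2, and 11 = 1 (mod 5); the coprimality condition matters, since e.g. (4^3 - 2^3)/2 = 28 picks up a factor of 2 from gcd(4, 2).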
2007-02-07, 18:13   #7
R.D. Silverman (Nov 2003)

Quote:
Originally Posted by FactorEyes View Post
Oh, I get it -- for some reason I thought the x^2 = y^2 congruence at the heart of these things was nothing special, only that it was the easiest to solve.

For a prime exponent k, x^k = x and y^k = y (mod k), so

x^(k-1) + x^(k-2)y + ... + y^(k-1) = (x^k - y^k)/(x - y) will be congruent to 1 (mod k) whenever x and y are incongruent mod k. Hence you can only split off factors which are congruent to 1 modulo k. Yech.

Bingo! The exponent 2 works because all primes (except 2) are 1 mod 2.
2007-02-07, 21:03   #8
bdodson (Jun 2005, lehigh.edu)

Quote:
Originally Posted by bdodson View Post
I didn't Quote the
section of Thorsten's post on solving logs for small primes, which was
"done at the end of the matrix step". I stand by my description; let's
discuss again in six months. -Bruce
Well. Here's another part I neglected to Quote (and/or read carefully):
there's a reference to Joux-Lercier, Math. Comp., Nov. 2002 (from a
revision in Oct. 2000). From that article ("improvements in gnfs for DL .."),
Thorsten reports 3 "small differences", including an improvement in the
linear algebra.

I'm still somewhat puzzled by Bob's suggestion that DL gnfs is more
inherently parallel than gnfs for integer factorization. One possible
reduction uses the prime factors of p-1; but Thorsten takes the prime p
in the "usual form", with q = (p-1)/2 also prime. The linear algebra is
then mod q (not 2, indeed). J-L describes an initial structured gauss
step; which sounds as though it may correspond to merging large primes
during filtering (where J-L doesn't use large primes). I'm still seeing a
central computation on a single large matrix --- parallel Lanczos for J-L,
"Wiedemann" for Thorsten. The result of this "matrix step" does supply
logs for some small primes ("good" ones in J-L, "many" in Th). We haven't
yet gotten to the particular number whose log we're supposed to be
computing --- the entire matrix step is a "pre-computation" (J-L) before
getting to the particular log; but the post-computation is then relatively
routine, and not so intensive. So that's 14 cpu-years for the matrix, and
"a few hours per individual log" (Th).

J-L report an additional sieving step (after the matrix solution) to write
x = A/B; Th has y = -(d/n) mod p. In either case, A,B or d,n are supposed
to be products of small "good" primes or medium primes ("larger" in Th).
The logs of these primes are, in fact, reduced to logs of primes below
2^32. Is this the step Bob wants to distribute? If so, it's just this final
post-computation, not the main matrix "pre-computation" (independent of,
and before, the specific number whose log is being computed).

I do know for a fact (I was there to hear Contini's version) that block
Lanczos mod 2 was adaptable to mod q. So far as I can tell (pending
further enlightenment, please), the other stuff being discussed doesn't
touch the main step of the computation: the "linear algebra" before
getting to the specific log being computed. The central question (to me)
remains those 8 jobs, 12-24 cpus each, in the main matrix computation. I'd
like to be able to say that mod q -vs- mod 2 doesn't matter, and that it's
incidental that those 8 jobs were on a single cluster, instead of being
done separately in 8 different countries, across several continents. I'll
admit that doesn't appear to fit Thorsten's description of a "small difference",
unless it's included in the "Wiedemann" that's currently part of the gnfs suite
of Franke-Bahr-Kleinjung. Anyone here who's seen the current code?
-bdodson
2007-02-08, 17:32   #9
R.D. Silverman (Nov 2003)

Quote:
Originally Posted by bdodson View Post

I'm still somewhat puzzled by Bob's suggestion that DL gnfs is more
inherently parallel than gnfs for integer factorization.

<snip>

getting to the specific log being computed. The central question (to me)
remains those 8 jobs, 12-24 cpus each in the main matrix computation. I'd
like to be able to say that mod q -vs- mod 2 doesn't matter; and that it's
incidental that those 8 jobs are on a single cluster,
-bdodson
Given that one wants to find the null-space of a (given) matrix mod q,
for fairly large q, we can do the following instead.

Select a set of 32 bit primes p_i such that product(p_i) > q.

Reduce the matrix mod each p_i, then solve it mod p_i for each p_i.
This can be done totally independently for each p_i.

Then find the null space mod q by pasting together the solutions for
the p_i via the Chinese Remainder Theorem.

This reduces the problem of solving a big matrix modulo a (moderately large)
multi-precision integer to one of solving multiple matrices modulo
single-precision primes. One solves these smaller-coefficient matrices totally
independently.

Each of the reduced matrices can of course also be solved in parallel.

We have changed the problem from solving one matrix with big-number entries
on many processors in parallel to solving multiple small-number matrices.
Each of the smaller matrices is still solved in parallel, but we use fewer
processors for each of them. Processor utilization *decreases* as large
numbers of processors work on a single parallel linear algebra problem, so
breaking a large pool of processors into smaller clusters, each working
independently, should increase per-processor utilization.
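A minimal sketch of the recipe above (my own toy, not code from any NFS/DL implementation): each prime is handled by dense Gaussian elimination over GF(p) for brevity, where the real computation would run block Lanczos or block Wiedemann, and the toy solves A x = b with a unique solution rather than hunting for null-space vectors.

```python
# Bob's recipe in miniature: solve A x = b modulo each small prime p_i
# independently (the embarrassingly parallel part), then paste the
# per-prime solutions together with the CRT to get x mod prod(p_i).

def solve_mod_p(A, b, p):
    """Dense Gauss-Jordan over GF(p); A must be invertible mod p."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] % p)
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], p - 2, p)           # Fermat inverse
        M[col] = [v * inv % p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(v - f * w) % p for v, w in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def crt_pair(r1, m1, r2, m2):
    """Merge x = r1 (mod m1) and x = r2 (mod m2), coprime m1, m2."""
    t = (r2 - r1) * pow(m1, -1, m2) % m2
    return r1 + m1 * t, m1 * m2

primes = [101, 103, 107]
Q = 101 * 103 * 107
A = [[3, 1, 4], [1, 5, 9], [2, 6, 5]]              # invertible mod each p_i
x_true = [31415, 9265, 35897]                      # answer we hope to recover
b = [sum(a * x for a, x in zip(row, x_true)) % Q for row in A]

# "Distributed" step: each prime handled completely independently.
partial = {p: solve_mod_p(A, [bi % p for bi in b], p) for p in primes}

# Final assembly: componentwise CRT.
x = []
for j in range(3):
    r, m = partial[primes[0]][j], primes[0]
    for p in primes[1:]:
        r, m = crt_pair(r, m, partial[p][j], p)
    x.append(r % Q)
assert x == x_true
```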
2007-02-08, 19:35   #10
sean (Aug 2004, New Zealand)

Quote:
Originally Posted by bdodson View Post
unless it's included in the "Wiedemann" that's currently part of the gnfs suite
of Franke-Bahr-Kleinjung. Anyone here who's seen the current code?
-bdodson
I can't claim to understand much of this discussion, and I haven't seen the current code. However, I have used the Wiedemann implementation in Franke's MPQS suite. In that version the matrix job had some small (short) initial step, and then chunks were distributed to various machines (I think the most machines I ever used was 4). The chunks were processed independently of each other (no communication until the chunks finished). Processing the chunks was the time-consuming part. Finally there was more processing once all the chunks finished; this also took significant time, maybe on the order of the time to do 1 chunk.

All this was for runs I did with MPQS in the C110 range.
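The shape sean describes (a short setup, chunks that run with no communication, then a significant combining step) is what one would expect from block Wiedemann, where each job can iterate the matrix on its own starting column independently. The toy below illustrates only that independence; the guess that Franke's chunks correspond to the columns of the block vector is mine, and the final combining step (a matrix Berlekamp-Massey) is not shown.

```python
# Toy of the "independent chunks" phase of block Wiedemann mod p:
# job j owns one starting column v_j and computes its slice of the
# sequence s_j[i] = u . (A^i v_j) (mod p) with no communication at all.
import random

p = 101
n = 8           # matrix dimension (tiny; real matrices have millions of rows)
jobs = 4        # number of independent chunks
steps = 2 * n   # sequence length each job produces

random.seed(1)
A = [[random.randrange(p) for _ in range(n)] for _ in range(n)]
u = [random.randrange(p) for _ in range(n)]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) % p for row in A]

def chunk(v):
    """One job's entire workload: iterate A on its own column, dot with u."""
    seq = []
    for _ in range(steps):
        seq.append(sum(ui * vi for ui, vi in zip(u, v)) % p)
        v = matvec(A, v)
    return seq

starts = [[random.randrange(p) for _ in range(n)] for _ in range(jobs)]
slices = [chunk(v) for v in starts]   # could run on 4 separate machines

# A job's slice depends only on its own starting column, so recomputing
# any one slice in isolation reproduces it exactly -- no communication
# is needed until all slices feed the combining step.
assert chunk(starts[2]) == slices[2]
```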
2007-02-09, 06:34   #11
bdodson (Jun 2005, lehigh.edu)

Quote:
Originally Posted by sean View Post
... I have used the Wiedemann implementation in Franke's MPQS suite. In that version the matrix job had some small (short) initial step and then chunks were distributed to various machines (I think the most machines I ever used was 4). The chunks processed independently of each other (no communication until the chunks finished). Processing the chunks was the time consuming part. Finally there was more processing once all the chunks finished, this also took significant time, maybe on the order of the time to do 1 chunk. ...
Sure sounds distributed parallel to me. While I'd been thinking I was
ahead of the curve on deciphering/anticipating posts from Franke-Bahr-
Kleinjung, I'm now inclined towards the view that Thorsten's description
wasn't referring to a new feature at all. (That would fit the "small difference
... in the linear algebra" in his post.) Rather, we seem to be getting,
at long last, new details about how the matrix for RSA200 was done.
Anyone care to wager on whether there was a version of these eight
"chunks" that we have to thank for RSA200 being factored?

On Bob's post: thanks for a clear description of the classical linear
algebra for mod p DL. I recognize the Chinese Remaindering from Peter's
ecm/fft output. Does anyone know whether the method described is still
applied post-Lanczos (much less post-Wiedemann)? I certainly didn't hear
it in Scott's description (which appears to have been the first use of Peter's
Lanczos applied to the mod p DL matrix; if I recall, this would have been in
1995, before Crypto95); and it's not referred to in Thorsten's reference
to Joux-Lercier's paper (revised, 2000), which otherwise seems to be
fairly complete in its detail. -Bruce