mersenneforum.org  

Factoring Projects > Factoring
2011-03-17, 16:52   #23
R.D. Silverman
Nov 2003
1110100100100₂ Posts

Quote:
Originally Posted by jasonp View Post
Only to a first approximation; we've noticed that parallel runs will speed up if the matrix is packed somewhat more densely than would be optimal by that measure, probably because the communication cost is much more important than the arithmetic intensity in that setting and a smaller matrix reduces overhead by more than the arithmetic increases it. Schemes that pack the matrix for cache efficiency can make this even more confusing, since arithmetic becomes free if matrix elements are packed densely enough.

Cavallar's 2001 NFS filtering paper models the runtime as (linear function of the matrix size) times (the number of nonzeros), with the coefficients of the former determined by experiment. The complete iteration also needs about 5-10% more overhead to account for vector-vector products.
Of course, by packing it more densely, one also makes it smaller.
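Cavallar's model from the quote above can be put in code. This is a toy sketch only: the coefficients `a` and `b` below are invented placeholders (in practice they would be fitted by experiment, as the paper does), and the 8% overhead stands in for the "5-10% more for vector-vector products".

```python
# Toy version of the cost model discussed above: total runtime modeled as
# (linear function of matrix size) * (number of nonzeros), plus a small
# overhead for the per-iteration vector-vector products.
# The coefficients a and b are made-up placeholders, NOT values from the paper.

def estimated_runtime(n_cols, nnz, a=1.0e-9, b=2.0e-16, overhead=1.08):
    """Return a unitless runtime estimate: (a + b*n_cols) * nnz * overhead."""
    return (a + b * n_cols) * nnz * overhead

# Example trade-off: a sparser 20M-column matrix with 2.0e9 nonzeros vs.
# a denser but smaller 19M-column matrix with 2.2e9 nonzeros.
sparse = estimated_runtime(20_000_000, 2.0e9)
dense = estimated_runtime(19_000_000, 2.2e9)
print(sparse, dense)
```

Which of the two wins depends entirely on the ratio of `a` to `b`, which is the point of fitting them by experiment.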
2011-03-17, 17:32   #24
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England
3×2,141 Posts

http://www.mersenneforum.org/showpos...45&postcount=2 has filtering of a large set of relations at various target densities, indicating that (for the machine I was using, a single i7)

Code:
| TD|        matrix size|sparse weight|lit/col|ETA (h)|  nc*sw|
| 70|23116648 x 23116873|   1506911489|   65.2|    991|3.48e16|
| 90|21207898 x 21208124|   1748363936|   82.4|    956|3.71e16|
|110|19922444 x 19922669|   1978439421|   99.3|    936|3.94e16|
|130|18982850 x 18983075|   2197143316|  115.7|    923|4.17e16|
the runtime is a decreasing function of (number of columns * sparse weight).
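As a throwaway sanity check on the table above, one can recompute the nc*sw column from the raw figures and confirm the trend:

```python
# Recompute n_cols * sparse_weight for each target density in the table
# above and check that the reported ETA falls as that product rises.
rows = [  # (target_density, n_cols, sparse_weight, eta_hours)
    (70, 23116873, 1506911489, 991),
    (90, 21208124, 1748363936, 956),
    (110, 19922669, 1978439421, 936),
    (130, 18983075, 2197143316, 923),
]

products = [c * w for _, c, w, _ in rows]
etas = [eta for *_, eta in rows]

# nc*sw increases monotonically while ETA decreases, matching the
# 3.48e16 ... 4.17e16 column in the table.
assert products == sorted(products)
assert etas == sorted(etas, reverse=True)
print([f"{p:.2e}" for p in products])
```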

Of course, looking at this table, I ran another job with target density 130 and ended up without any usable dependencies. Ze milage, she varies.

Last fiddled with by fivemack on 2011-03-17 at 18:33
2011-03-17, 21:46   #25
Christenson
Dec 2010
Monticello
5×359 Posts

Guys:
Bob Silverman asked me to read Crandall and Pomerance.
My copy of Shilov's Linear Algebra is AWOL (but Rudin's Linear Systems was by the front door).
http://kolxo3.tiera.ru/_Papers/Numer...bra(16s).ps.gz
should let me read Montgomery's paper.

What else do I need to move into my brain by way of my bookshelf? (A good intro to number theory wouldn't hurt; I'm trolling for that and anything else required. I do know standard calculus and ODEs, read topology once, and think I have an infinite-dimensional integral somewhere.)
2011-03-18, 02:47   #26
jasonp
Tribal Bullet
Oct 2004
3·1,181 Posts

See this thread.
2011-03-18, 07:17   #27
frmky
Jul 2003
So Cal
2²·23² Posts

Sorry I've been AWOL in this thread. It's been a crazy two weeks in a busy semester. Answering a few things that came up in this thread:

- It took a month to get 900 cores only because both the Lonestar (in Texas) and Lomonosov (in Moscow) clusters were down for the month of January.

- The LA for the "smaller" NFS@Home numbers is currently run on an 8-node, 32-core Core 2 Beowulf cluster connected by Gigabit Ethernet, all on a single dedicated fast switch. (It actually has 10 nodes, but for various reasons only 8 are used.) Most of the time is spent in communication: watching the cluster-wide "top", it is clear that the cores spend well over half their time waiting for data. On this cluster, a 17.7M matrix from 6,377- took just over 2 weeks to solve.

- I applied for and received a small grant to upgrade these eight nodes to InfiniBand for these LA problems, but haven't yet received the money. I'm hoping to have everything ordered and installed by the end of April. I expect this to halve the time required to solve a given matrix.

- Jason perhaps overstates my contribution to the current code. A back-of-the-envelope calculation demonstrated that the old architecture spent a tremendous amount of time in communication. Being a physicist, I simplified the problem. Assuming that computation is completely free, how would I design the algorithm to minimize communication? I came up with a rough sketch of the current architecture and proposed it to Jason. Almost overnight (or perhaps it was overnight) he filled in all the details, wrote an elegant implementation, and optimized the hell out of it! I still don't understand all of the nuances of his implementation, but I'm ok with that.
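The Gigabit Ethernet bottleneck above can be put on the back of an envelope. This is a toy model with stated assumptions, not measurements from the cluster; in particular the "two full block vectors per iteration" traffic figure is a crude guess:

```python
# Back-of-envelope sketch of why Gigabit Ethernet chokes the LA.
# All figures below are illustrative assumptions, not cluster measurements.

n = 17_700_000                  # matrix dimension, from the 6,377- example
block_bits = 64                 # block Lanczos works on 64-bit-wide vector blocks
bytes_per_vector = n * block_bits // 8   # ~142 MB per block vector
iterations = n // block_bits             # ~N/64 iterations to finish

gige_bytes_per_sec = 125e6      # Gigabit Ethernet, ideal payload rate
# Assume each iteration moves roughly two full block vectors across the
# network (one per multiply direction) -- a crude approximation.
comm_sec = iterations * 2 * bytes_per_vector / gige_bytes_per_sec
print(f"communication alone: ~{comm_sec / 86400:.1f} days")
```

Even under these ideal-bandwidth assumptions, the communication alone comes out at around a week, the same order as the two-week solve time reported above.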

Ummm... I think that covers the major highlights of this thread...
2011-03-19, 02:19   #28
Christenson
Dec 2010
Monticello
1795₁₀ Posts

Quote:
Originally Posted by jasonp View Post
See this thread.
I think the basic list, except for C&P, is too light on the fundamentals I need at the moment, as the Montgomery paper has me at least temporarily confused.

So here are my picks from that thread:
`The Theory of Algebraic Numbers' by Harry Pollard and Harold Diamond.
Is this on the list? (Pointing out that the underlying number theory is probably quite important.)
`Solving Sparse Linear Equations Over Finite Fields' by Douglas Wiedemann (the competition for block Lanczos).
2011-03-19, 02:34   #29
Christenson
Dec 2010
Monticello
703₁₆ Posts

Quote:
Originally Posted by frmky View Post
- I applied for and received a small grant to upgrade these eight nodes to Infiniband for these LA problems, but haven't yet received the money. I'm hoping to have these ordered and installed by the end of April. I expect this to halve the time required to solve a given matrix.
A new question: what InfiniBand link aggregation and signalling rate do you think you'll end up with? And roughly how much do these InfiniBand adapters and cables run?

And an old one: Does it make sense to get a machine with 10s of Gigs of memory and a few processors to do these Lancsoz problems, on the grounds that avoiding communications as much as possible is the best way to speed the algorithm? Or possibly do some really fancy footwork with CUDA on a GPU and some mass storage?
2011-03-19, 02:45   #30
Christenson
Dec 2010
Monticello
11100000011₂ Posts

And one more dumb question:
What's a good basic book on linear algebra over finite fields? (Shilov is mostly concerned with linear algebra over the reals or the complex numbers.)
2011-03-19, 11:40   #31
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England
3·2,141 Posts

Quote:
Originally Posted by Christenson View Post
A new question: What InfiniBand link aggregation and signalling rate do you think you are going to end up with? How much, roughly, do these InfiniBand adapters and cables run?
The source of InfiniBand kit that I'm aware of is http://www.colfaxdirect.com/store/pc/home.asp

The cables are about $60. Small SDR switches have existed at $750 for a long time; a small (8-port) QDR switch now exists for $1850. A 24-port SDR switch is $2150, an 18-port QDR switch $3690.

An SDR network card is $125; a QDR card is $562.

Quote:
And an old one: Does it make sense to get a machine with 10s of Gigs of memory and a few processors to do these Lancsoz problems, on the grounds that avoiding communications as much as possible is the best way to speed the algorithm?
Please spell Lanczos correctly (it's Hungarian, the Z makes the C sound 'ts' rather than 'ch'; pronunciation is lantsosh)

I don't think that does make sense; in particular, sixteen computers give you prodigiously more aggregate memory bandwidth than one. Greg had a system with eight quad-core Opterons for a while and it was, IIRC, very disappointing in linalg performance; I have a box with two quad-core Opterons and 32G, and its linalg performance is also not great (running -t8 is slower than -t4, and -t4 runs at 40% of the speed of -t4 on an i7).
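To put rough numbers on the bandwidth point (all figures here are illustrative assumptions of mine, not benchmarks): a sparse matrix-vector product streams the whole packed matrix from RAM every iteration, so aggregate memory bandwidth, not core count, sets the speed.

```python
# Toy model: SpMV over the whole packed matrix is bandwidth-bound, so
# sixteen boxes beat one big box if communication were free.
# nnz and bytes_per_nonzero are illustrative guesses, not measured values.

nnz = 2.0e9                   # nonzeros in a typical large NFS matrix
bytes_per_nonzero = 3         # packed column indices, a rough figure
matrix_bytes = nnz * bytes_per_nonzero

def secs_per_iteration(bandwidth_gb_s):
    """Time to stream the packed matrix once at the given bandwidth."""
    return matrix_bytes / (bandwidth_gb_s * 1e9)

one_box = secs_per_iteration(20)        # one socket, ~20 GB/s DDR3-era
sixteen = secs_per_iteration(16 * 20)   # sixteen boxes, if comm were free
print(one_box, sixteen)
```

The 16x gap is the best case; the communication cost discussed earlier in the thread is what eats into it.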

Maybe 24G of DDR3/1600 in a hex-core i7 box is now affordable and interestingly fast (or maybe the earthquake in Japan means I've missed the cheap memory). It would cost me £443 for the chip and £253 for the memory to try, which is a lot cheaper than the last time I did the sums, but still not all that tempting.

Last fiddled with by fivemack on 2011-03-19 at 11:42
2011-03-19, 13:00   #32
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England
3·2,141 Posts

fivemack's largest matrices

Quote:
Originally Posted by frmky View Post
- The LA for the "smaller" NFS@Home numbers is currently run on an 8-node, 32-core Core 2 Beowulf cluster connected by Gigabit Ethernet all on a dedicated single fast switch. (It actually has 10 nodes, but for various reasons only 8 are used.) Most of the time is spent in communication. It is clear watching the cluster-wide "top" that the cores spend well over half their time waiting for data. On this cluster, a 17.7M matrix from 6,377- took just over 2 weeks to solve.
For comparison, a 16.9M matrix of mine took 891 hours (5.3 weeks) on four cores of a 2.5GHz Phenom with DDR2; a 15.2M matrix took 404 hours on four cores of a 2.8GHz i7 with DDR3. The 19.5M from 2801^79-1 took 830 hours (with a couple of outages) on the i7; a 24.1M from 2^941-1 took 1081 hours on the i7 and just over 8G of memory; the 22.2M from a GNFS C187 is using just under 8G and has an expected total runtime of 830 hours. So I think a 50M-60M matrix would be entirely possible on the i7 with a load of extra memory, but it would take a year.
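A crude way to sanity-check that extrapolation: iterations scale with N and per-iteration cost with the number of nonzeros, so for roughly constant density the total runtime grows like N². The script below (my modeling assumption, not a fit) checks the i7 figures above against that and extrapolates to a 55M matrix:

```python
# Check the i7 data points above against a runtime ~ N^2 model
# (iterations ~ N, cost per iteration ~ nonzeros ~ N at fixed density),
# then extrapolate to a 55M matrix. Crude model, not a least-squares fit.

i7_runs = [(15.2e6, 404), (19.5e6, 830), (24.1e6, 1081)]  # (dimension, hours)

def predict(n, base=(24.1e6, 1081)):
    """Scale the base run's hours by (n / base_n)^2."""
    n0, h0 = base
    return h0 * (n / n0) ** 2

for n, hours in i7_runs:
    print(f"{n/1e6:.1f}M: observed {hours}h, N^2 model {predict(n):.0f}h")

print(f"55M extrapolated: {predict(55e6):.0f} hours")
```

The model gives several months of wall-clock time for a 55M matrix, the same order as the year quoted above once memory pressure and outages are allowed for.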
2011-03-19, 13:34   #33
jasonp
Tribal Bullet
Oct 2004
3·1,181 Posts

Quote:
Originally Posted by Christenson View Post
I think the basic list, except for C&P, is too light on the fundamentals I need at the moment, as the Montgomery paper has me at least temporarily confused.

So here's my picks from that thread
`The Theory of Algebraic Numbers' by Harry Pollard and Harold Diamond.
This on the list? (Pointing out that the underlying number theory is probably quite important)
`Solving Sparse Linear Equations Over Finite Fields' by Douglas Wiedemann (The competition for block Lanczos)
Pollard and Diamond was mentioned in the first response to that thread; unfortunately it's no longer available unless you live near a university library or cruise Russian e-book sites.

What you want is uniquely difficult to provide: there are many basic references on the number field sieve, but they all skip over the linear algebra because it's a complex subject. There are many books on computational sparse linear algebra, but none that I know of deal with finite fields. The block versions of the Lanczos and Wiedemann algorithms are pretty much unique to the NFS domain, so the original papers describing them, and the few follow-ons, are the best we can do outside of analyzing code. One of those follow-ons, not in the list but pretty readable, is here: a comparatively recent addition.
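For anyone wanting to see what linear algebra over GF(2) looks like in code: the "block" in block Lanczos/Wiedemann just means a machine word carries 64 vector bits, so one pass over the sparse matrix multiplies it by 64 vectors at once using XOR. A minimal sketch, nothing like the optimized NFS implementations:

```python
# y = A*x over GF(2), with A stored as a list of column-index lists (one
# per row) and each entry of x a 64-bit word packing one bit from each of
# 64 independent vectors. GF(2) addition is XOR, so summing a row's
# selected entries is just XORing words together.

def spmv_gf2_block(rows, x):
    y = []
    for cols in rows:
        acc = 0
        for j in cols:
            acc ^= x[j]        # GF(2) addition is XOR
        y.append(acc)
    return y

# Tiny example: the 3x3 matrix [[1,1,0],[0,1,1],[1,0,1]] applied to three
# words, i.e. to 64 length-3 vectors simultaneously.
A = [[0, 1], [1, 2], [0, 2]]
x = [0b01, 0b10, 0b11]
print(spmv_gf2_block(A, x))    # -> [3, 1, 2]
```

Block Lanczos builds its recurrence out of exactly this operation plus 64x64 bit-matrix products, which is why dense packing and communication cost, not arithmetic, dominate the discussion earlier in the thread.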

Last fiddled with by jasonp on 2011-03-19 at 13:41
