mersenneforum.org  

Old 2005-06-30, 17:42   #1
Istari
 
Improvements for the CWI NFS software package

Hey all,

Let me first introduce myself: I am a student at the University of Amsterdam. I am currently finishing the first year of a two-year Master's programme in Computer Science called Grid Computing. Besides computer science I have a great interest in theoretical mathematics; for my Bachelor's project last year I implemented several factorization methods (Fermat, Pollard rho, Pollard p-1, BQS and the MPQS) under the supervision of H.J.J. te Riele at the CWI. Next year I have almost a whole year for my Master's graduation project and I will work on the NFS software package from the CWI (almost certainly again under the supervision of te Riele). My main goal is to speed up the implementation and to implement some new ideas from a point of view which is more related to computer science.

I have been reading this forum for almost a year now; many of the people here are experts in this field and have experience with an NFS software package. Some of you have even worked with the software from the CWI, which is why I am posting this message here. I am interested in ideas or improvements you would like to see in the implementation of NFS, and, more specifically for the people who have played with the software suite from the CWI, what they (dis)liked about this piece of software.

Of course I cannot promise that I will implement all of your suggestions, but I will do my best; I already have the code and am working on it besides my normal school work. I am a big fan of the GMP library, and right now I am busy replacing the current big number library (FreeLIP) with GMP (this replacement should result in a speedup). This is a very time consuming task (and of course the new code must be tested extensively). Right now I am doing this only for the sieving stage, since this is one of the more time consuming tasks. My second goal is to make the sieving program able to run on big clusters using MPI, and to improve the sieving code with the help of the bucket sorting techniques which I encountered here and which are currently also being used in, for instance, msieve. I have some other smaller ideas which could improve the code, but right now these are the main three things I would like to add to the software package.
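To give an idea of the kind of change involved, here is a minimal sketch of the sort of routine that has to be ported, evaluating the homogeneous polynomial norm |F(a,b)| directly against GMP's mpz_t. The function and its calling convention are my own illustration, not code from the CWI package:

Code:
#include <gmp.h>

/* Evaluate |F(a,b)| = |sum_{i=0}^{d} c[i] * a^i * b^(d-i)| with GMP.
   Horner's rule in a; the caller supplies the coefficients c[0..d]
   and an initialized mpz_t for the result. */
void poly_norm(mpz_t norm, const mpz_t *c, int d, long a, unsigned long b)
{
    mpz_t bpow;
    mpz_init_set_ui(bpow, 1);

    mpz_set(norm, c[d]);                 /* start with leading coefficient */
    for (int i = d - 1; i >= 0; i--) {
        mpz_mul_si(norm, norm, a);       /* norm  = norm * a               */
        mpz_mul_ui(bpow, bpow, b);       /* bpow  = b^(d-i)                */
        mpz_addmul(norm, c[i], bpow);    /* norm += c[i] * b^(d-i)         */
    }
    mpz_abs(norm, norm);
    mpz_clear(bpow);
}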

Of course I will (eventually) also be working on the other stages, but first things first: for now I will concentrate mainly on the sieving stage. I hope the people here have some great ideas I didn't think of, or complaints about this piece of software. I have read a lot about the GNFS, but only in the last few months have I been able to play with this piece of software. So a lot of people here are certainly more experienced with this software package and could share that experience.

I hope this discussion will result in an even faster NFS program,

kind regards,

Joppe Bos
Old 2005-06-30, 19:14   #2
xilman

Quote:
Originally Posted by Istari
<snip>
Excellent news!

I've been using the CWI package for many years now and have a source code license myself. Herman will vouch for me if you ask him...

Please contact me directly by email (paul AT leyland.vispa.com) and I'll discuss several possible enhancements. I may be able to help with some of the work, though I am extremely busy at the moment and can not make any firm promises.

Strictly speaking, I think the CWI arithmetic package is not freeLIP itself but was developed from a precursor to freeLIP. Independent of that, moving to GMP would almost certainly improve the performance significantly, but PLEASE read the license terms of GMP very carefully. So far, CWI have not released their NFS suite under a license which is compatible with the LGPL that GMP uses.

Just to get things started: I believe we need factor base primes to at least 2^32 and large primes to at least 2^40 in the near future (to support a kilobit SNFS, for example). The software should still run well on 32-bit machines. The post-processing tools need rewriting to use much less memory, especially in the duplicate and singleton removal algorithms when filtering (Peter Montgomery, Don Leclair and I all have code which addresses parts of the problem). Likewise, the post-processing tools need to be able to work with very large datasets on architectures which do not support very large files. I have code that addresses this problem. There are other things we can discuss, but that will do for a start.
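To make the first point concrete, a relation record along these lines would be needed; this is a hypothetical layout for illustration only, not the current CWI format, and the fixed-width types keep it identical on 32- and 64-bit machines:

Code:
#include <stdint.h>

/* Hypothetical relation record: factor base primes up to 2^32 fit in
   a uint32_t, while large primes up to 2^40 need a 64-bit field. */
typedef struct {
    int64_t  a;              /* sieve coordinates of the relation      */
    uint32_t b;
    uint32_t nrat, nalg;     /* counts of rational/algebraic primes    */
    uint32_t fb_primes[32];  /* factor base primes below 2^32          */
    uint64_t lp[4];          /* up to four large primes below 2^40     */
} relation_t;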


Paul
Old 2005-06-30, 19:59   #3
akruppa
 

Excellent news indeed!

If you discuss possible enhancements by private mail, may I ask you to post summaries of your ideas occasionally? This would be of great interest for me (and undoubtedly others) to follow.

A few things I noticed that should be easily fixed but can be quite annoying for the user are these (some are very similar to the ones Paul mentioned):

- add large file support on 32 bit architectures. Right now filter aborts when files get >2GB (I think piping through gzip may avoid the problem, but I'm not sure about it, and it's pretty slow). (Are there any systems left in popular use that do not support large files at all?)

- allow doing duplicate removal and singleton pruning completely separately. Right now the maxrelsinp parameter to filter specifies up to how many hash table entries should be used for duplicate removal, so setting that parameter to a low value helps conserve memory. Unfortunately the parameter doubles as the initial value for the size of the table of ideals, so setting it too low will make filter spend a lot of time increasing that table (it dumps the old contents to disk and re-reads). I think either mergelevel 1 should not do duplicate removal at all, or there should be a -nodupe parameter.

- Herman once mentioned to me that the binary file format would get retired. That would be a shame, imo, as it allows for much faster file I/O. The format will need to be adapted to allow for ideal norms >2^32, though.

- Now that dual-core cpus appear to be becoming commonplace, having gauss support dual (maybe quad) cpu machines without much hassle would be nice.

- sqrt should learn to deal with ideal norms >1e9 on 32 bit machines. Even increasing the limit to 2^32 or 2^31 would make a huge difference for everyday NFS work, i.e. factorisations for the Cunningham project.

- The siever could benefit from some modest architecture specific optimisation. Especially AMD64, which appears to be quickly becoming the workhorse of choice for computational number theory, should offer much room for improvement, for example by using the 64x64->128 bit product.
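As an illustration of that last point, a full 64-bit mulmod becomes trivial once the 64x64->128 bit product is available. This is just a sketch of the idea (GCC's unsigned __int128 on AMD64), not code from the CWI siever:

Code:
#include <stdint.h>

/* One place the 64x64->128 bit multiply pays off: modular arithmetic
   on operands up to 64 bits, e.g. on large ideal norms.  On AMD64,
   unsigned __int128 maps onto the full 128-bit MUL result. */
static inline uint64_t mulmod64(uint64_t a, uint64_t b, uint64_t m)
{
    return (uint64_t)(((unsigned __int128)a * b) % m);
}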

This is all I can think of off the top of my head; when other ideas occur to me, I'll post more wish lists.

Alex

Old 2005-06-30, 20:05   #4
xilman

Quote:
Originally Posted by akruppa
Excellent news indeed!

If you discuss possible enhancements by private mail, may I ask you to post summaries of your ideas occasionally? This would be of great interest for me (and undoubtedly others) to follow.
Of course. My suggestion for private email was mostly so we could discuss things which at present can only be discussed by holders of source code licenses.

Your point about Gauss (really Lanczos hiding behind a pseudonym) is a very good one. Personally, I would like to see the full MPI version released.

Paul

P.S. Apologies for typos, etc, in my posts today. I'm seriously short on sleep.
Old 2005-06-30, 20:30   #5
Wacky
 

Welcome to the group.

Having used the CWI suite for NFSNet, I welcome any improvements that you might make. Of course, everyone wants things that make it run faster. If GMP, or some other MP arithmetic package, will help, by all means consider it. However, I would strongly recommend that you try to be implementation agnostic wherever possible and isolate the parts that are not.
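For example, a thin wrapper layer along these lines would keep the siever proper independent of the arithmetic package. The names here are invented for illustration, and only the GMP branch is shown:

Code:
/* Hypothetical abstraction layer: the siever never mentions GMP or
   LIP types directly, only the nfs_mp_* names. */
#ifdef USE_GMP
#include <gmp.h>
typedef mpz_t nfs_mp_t;
#define nfs_mp_init(x)        mpz_init(x)
#define nfs_mp_clear(x)       mpz_clear(x)
#define nfs_mp_mul(r, a, b)   mpz_mul(r, a, b)
#define nfs_mp_mod(r, a, m)   mpz_mod(r, a, m)
#else
/* ... corresponding FreeLIP/CWI-LIP definitions would go here ... */
#endif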

I would hope that, even if you don't actually implement it, you can start to accommodate larger primes in the factor base and factors. When we eventually do a kilobit factorization, I think that we are going to need to exceed the present upper limits. Raising the limits on 64-bit machines should not be difficult, at least conceptually. Although these 64-bit machines are becoming more readily available, there are still significantly more 32-bit machines around. I think it will be a long time before we want to "write off" their ability to contribute usefully to the sieving stage.

I also have a number of suggestions with respect to portability and inter-operability, or rather the lack thereof. I will send you more information on our experiences via e-mail.

As for using MPI on big clusters, improvements in the sieving will be much less significant than they would be in other parts of the suite. Since the sieving is extremely loosely coupled, big long-running batch jobs are generally quite adequate. One suggestion I would make is modularization. Separate the sieving engine from the control parts rather than having them woven together like spaghetti.

I look forward to further discussions.

Richard
Old 2005-06-30, 20:49   #6
jasonp

Quote:
Originally Posted by Istari
My second goal is to make the sieving program able to run on big clusters using MPI and improving the sieving code with the help of the bucket sorting techniques which I encountered here and are currently also being used in for instance msieve. I have some other smaller ideas which could improve the code but right know these are the main three things I would like to add to the software package.
I'm a bit surprised that nobody has suggested adding a lattice siever to the CWI suite; IMO this could speed up sieving more than any amount of optimization applied to the current siever. Of course it's a big job, but you did ask.

jasonp
Old 2005-06-30, 21:21   #7
akruppa
 

In fact the siever already contains some code that aims at turning it into a lattice siever eventually. I.e. you can specify vectors for the transform of the sieving space, but you have to do it manually at present. I once played with them a little but ran into a bug (the siever exited with an error) which I reported to Peter Montgomery. He acknowledged the bug, but I don't know what became of the issue.
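For reference, the basic transform a lattice siever needs is just this: enumerate small (c,d) and map them back to (a,b) with the reduced basis vectors of the special-q lattice. An illustrative sketch only; the names are not taken from the CWI code:

Code:
/* Map sub-lattice coordinates (c,d) back to sieve coordinates (a,b)
   using the reduced basis vectors u = (u0,u1) and v = (v0,v1). */
typedef struct { long a, b; } ab_pair;

static ab_pair lattice_to_ab(long c, long d,
                             long u0, long u1, long v0, long v1)
{
    ab_pair p;
    p.a = c * u0 + d * v0;
    p.b = c * u1 + d * v1;
    return p;
}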

So, some preparations for lattice sieving have been made already. This would definitely be worthwhile to pursue.

Alex
Old 2005-07-01, 13:16   #8
Istari
 

Wow! It is great to see so many replies in such a short time. I posted my request now because the summer vacation is right around the corner; I am going on vacation and plan to investigate a lot of the ideas presented here (I will bring my laptop with me).
I will make a large list of all the things that could be implemented or changed; later I will assign a priority to every point on the list and hopefully have enough time to work through the whole list (while simultaneously doing some research to speed up certain parts of the NFS).

So far I have only taken a close look at the polynomial selection and the sieving stage (and am still reading the code), so improvements to other stages are more than welcome but will be incorporated at a later stage. I will contact the people here with a source code license so we can discuss some things in detail; my mail address is jwbos <at> science <dot> uva <dot> nl.
But first I will respond to some of the suggestions and ideas.

Quote:
Originally Posted by xilman
Strictly speaking, I think the CWI arithmetic package is not freeLIP but developed from a precursor to freeLIP itself. Independent of that, moving to GMP would almost certainly improve the performance significantly but PLEASE read the license terms of GMP very carefully. So far, CWI have not released their NFS suite under a license which is compatible with the LPGP that GMP uses.
I had already found some non-freeLIP code, and since it was developed from a precursor that explains everything. I have read the LGPL very carefully and don't think this is a problem for the current CWI license. From Section 6 of the LGPL:

Quote:
Originally Posted by LGPL
6. As an exception to the Sections above, you may also combine or link a "work that uses the Library" with the Library to produce a work containing portions of the Library, and distribute that work under terms of your choice, provided that the terms permit modification of the work for the customer's own use and reverse engineering for debugging such modifications.
You must give prominent notice with each copy of the work that the Library is used in it and that the Library and its use are covered by this License. You must supply a copy of this License. If the work during execution displays copyright notices, you must include the copyright notice for the Library among them, as well as a reference directing the user to the copy of this License. Also, you must do one of these things:
Five options are then presented, and I think option (b) is the way to go for the NFS:
Quote:
Originally Posted by LGPL
b) Use a suitable shared library mechanism for linking with the Library. A suitable mechanism is one that (1) uses at run time a copy of the library already present on the user's computer system, rather than copying library functions into the executable, and (2) will operate properly with a modified version of the library, if the user installs one, as long as the modified version is interface-compatible with the version that the work was made with.
This just says that using a shared library mechanism is enough. So when running the binary, the user must have GMP installed on their machine or the program won't run.

Quote:
Originally Posted by xilman
I believe we need factobase primes to at least 2^32 and large primes to at least 2^40 in the near future (to support a kilobit SNFS for example).
I will put this at the top of my to-do list. I haven't looked into this in much detail yet, but it should not be a very big problem (at least if it is implemented the way I think it is currently implemented).

Quote:
Originally Posted by akruppa
The siever could benefit from some modest architecture specific optimisation. Especially AMD64, which appears to be quickly becoming the workhorse of choice for computational number theory, should offer much room for improvement, for example by using the 64x64->128 bit product.
64-bit support was indeed one of the smaller ideas I had in mind; I think it is only logical to implement it.

About the Lanczos algorithm: the current version already has multi-processor support. I haven't looked into this part of the code yet, but as far as I know this isn't done with MPI, and converting the code to MPI would make it much more scalable to, for instance, big clusters.

Quote:
Originally Posted by Wacky
As for using MPI on big clusters, improvements in the sieving will be much less significant than they would be in other parts of the suite. Since the sieving is extremely loosely coupled, big long-running batch jobs are generally quite adequate. One suggestion I would make is modularization. Separate the sieving engine from the control parts rather than having them woven together like spaghetti.
You are right that using MPI instead of long-running batch jobs would have only a marginal effect on performance. But I think a program using MPI has a better design than a program which needs to be called by all kinds of scripts in order to run; I am still in doubt whether all the effort needed to add MPI support is worth it. I think I will eventually implement it, though, because it is closely related to my Grid Computing studies.
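Just to sketch what I have in mind (a toy example of my own, not CWI code): each MPI rank could simply take its own slice of the special-q or line range, with sieve_range() standing in for the real siever entry point.

Code:
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    long q_lo = 3000000, q_hi = 3100000;   /* hypothetical q interval */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* split [q_lo, q_hi) into one contiguous chunk per rank */
    long chunk = (q_hi - q_lo + size - 1) / size;
    long lo = q_lo + (long)rank * chunk;
    long hi = lo + chunk;
    if (lo > q_hi) lo = q_hi;
    if (hi > q_hi) hi = q_hi;

    printf("rank %d sieves q in [%ld, %ld)\n", rank, lo, hi);
    /* sieve_range(lo, hi);  -- placeholder for the real work */

    MPI_Finalize();
    return 0;
}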
Your second suggestion is also a very good one! The program would have a much better design if all the 'real' sieving code could be totally separated from the rest of the code in the sieving stage.

As akruppa already mentioned, there actually is lattice sieve support in the program. I only have experience with the normal line siever but will (of course) investigate the (dis)advantages of this sieving method. I am particularly interested in that bug; I don't know whether it has been fixed either, but if I knew what caused it I could take a closer look at it.

Again, thanks for all the fast replies. I will take a closer look at all of the suggested ideas during the summer vacation, since right now I am quite busy with the last things I have to finish for my studies. Sorry for this rather long reply.

Kind regards, Joppe
Old 2005-07-01, 14:06   #9
R.D. Silverman
 

Quote:
Originally Posted by akruppa
Excellent news indeed!

If you discuss possible enhancements by private mail, may I ask you to post summaries of your ideas occasionally? This would be of great interest for me (and undoubtedly others) to follow.
One thing that would *greatly* enhance the CWI suite would be to move from a
line siever to a lattice siever.

When I re-worked my line siever into a lattice siever I got (roughly) a factor
of 5 speed improvement. I'd like to improve it further, but I have a real job
that requires my time. Weekends are spent being a single parent.....

If the OP likes, I can provide a source copy of my lattice siever. It is much more readable than the Franke siever. No disrespect intended: Franke et al. did a good job on their siever, but the coding style and lack of comments make it unreadable.
Old 2005-07-01, 14:30   #10
R.D. Silverman
 

Quote:
Originally Posted by R.D. Silverman
One thing that would *greatly* enhance the CWI suite would be to move from a
line siever to a lattice siever.

<snip>
I include here some timing information from my lattice siever. The data
is from 2,791+, which is in progress. I am using factor bases of 1.5 million
primes for each polynomial, and the lattice being sieved is 6K x 12K.
The sieving is being done on a 1.6GHz Pentium 4 Laptop.

qindex is the index into the factor base of the special q being used.
lattice is the reduced lattice for this special q.

I first sieve from (0 to 6K) x (0 to 6K), then sieve from (0 to -6K) x (0 to 6K).
The "after scan" lines show how many *potential* successes there are that survived the algebraic sieve, first for the top half plane, then for the bottom half.

num_total_success is the number of *potential* successes surviving both the algebraic and integer sieve. Many (most) are false hits with a prime cofactor that is too large. A lot are also successes where the (c,d) grid coordinates are not coprime.

Siever built on Feb 11 2005 12:59:58
Clock rate = 1600000.000000
We try to factor: 538979174260030195774806654998460878107494491770558596271026789309120102762361871555156107665792312371183237710356585532466537782307442433670499269977421434908027478823657788916992881425413513
qindex, special_q, root, log, = 226440 3144173 896787 33
lattice = (101,2321), (-1342,291)
after scan, num_alg_success = 267026
num_total_success = 10115
after scan, num_alg_success = 257476
num_total_success = 9885
Int LP bad split: 2
Alg LP bad split: 89
Int cofactor prime: 724
Alg cofactor prime: 8136
Int big cofactor: 20
Alg big cofactor: 5531
Total algebraic hits: 257476
Total combined hits: 20000
Not coprime: 5069
Relations found: 97
==========================================
Total time to process 226440 : 30.290281
Total sieve time = 14.508987
Int line: 0.302529
Alg line: 0.297463
Int vector: 6.974603
Alg vector: 6.934392
Total resieve time = 9.550174
Int resieve: 4.791307
Alg resieve: 4.758867
Trial Int time: 0.097341
Trial Alg time: 1.854874
Find startpts time: 3.034763
Alg scan time: 0.161952
Lattice reduce time: 1.393615
QS/Squfof time: 1.068258
Prepare regions time: 2.366303
Inverse time = 1.141454
Prime Test time = 0.303485

I line sieve the smallest primes. Resieve time refers to the time spent resieving while factoring the potential candidates.

Find startpts shows the time to compute the sieve start points for
all 3 million primes for this special q.

As one can see, 80% of the run time is spent sieving. 4.6% is spent
doing lattice reductions on the factor base primes, and 7.6% is spent
preparing the sub-lattice sieve region boundaries. Less than 8% is spent
doing 64 bit arithmetic. I could cut the time to prepare the sieve
boundaries in half by storing, then reusing some computations, but it
would *greatly* increase memory requirements.
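For the curious, the lattice reduction step is the usual 2D Gauss/Lagrange reduction of the special-q (or factor base prime) basis. A simplified sketch of the idea, not my actual code, with overflow handling omitted:

Code:
#include <math.h>

typedef struct { long long x, y; } vec2;

static long long ip(vec2 a, vec2 b) { return a.x * b.x + a.y * b.y; }

/* Gauss/Lagrange reduction: repeatedly subtract the nearest integer
   multiple of the shorter vector from the longer one until the basis
   cannot be shortened any further. */
static void gauss_reduce(vec2 *u, vec2 *v)
{
    for (;;) {
        if (ip(*u, *u) > ip(*v, *v)) { vec2 t = *u; *u = *v; *v = t; }
        long long m = llround((double)ip(*u, *v) / (double)ip(*u, *u));
        if (m == 0)
            break;
        v->x -= m * u->x;
        v->y -= m * u->y;
    }
}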

Even if we could cut the 64-bit arithmetic time to ZERO, it would not
speed the code by very much. Thus, I have doubts about whether
64 x 64 bit arithmetic on the (say) Opteron would help very much.

Comments are solicited.

I will provide the source to those who ask via email.
Old 2005-07-01, 16:02   #11
akruppa
 

But faster modular arithmetic will change the break-even point between how much sieving and how much factoring of residuals should be done. To take the point to the extreme, if we could do 64 bit arithmetic in ZERO time, we wouldn't sieve at all, because trial division would provide an arbitrary number of smooth relations in zero time.
How much sieving work can we save (by smaller factor base or smaller sieve region) if we can handle larger large primes inexpensively?

Alex