mersenneforum.org General Questions

2013-02-23, 11:15   #1
pastcow

Feb 2013

7₁₀ Posts
General Questions

Hi,

If any of you have some spare time it would be great if you could answer some of my beginner questions :)

I've been looking into factoring for the last few days - I'm hoping to eventually build a rig to factor C15x. The way I understand it at the moment is that there are a few steps to doing this. The first is using msieve to generate some "good" polynomials.

So to start with, I split up msieve and made it distributed so I can run it on Amazon EC2 - this seems to work well and I get loads of results.

How do I determine which is the best polynomial? E.g. I've read that I have to look at the "e" value, but are these OK for a C15x?

Quote:
 # norm 5.550210e-15 alpha -8.037245 e 2.745e-12 rroots 3
 skew: 7966027.18
 c0: -1093047428917558581339935547562703702661
 c1: 131925493694074760219251593532155
 c2: 138836323623084375165467015
 c3: -24962492672785352963
 c4: -2822482971114
 c5: 129168
 Y0: -614841046671961074280414623092
 Y1: 104974279042196101

 # norm 5.205393e-15 alpha -7.492211 e 2.629e-12 rroots 5
 skew: 6278276.80
 c0: -213894779492634144131491722169682354111
 c1: 265623048828533468540109249087755
 c2: 89972557414556913289650545
 c3: -31074467407315923443
 c4: -2448005563914
 c5: 129168
 Y0: -614840985804724857243849380262
 Y1: 104974279042196101
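[Editor's note: as a practical sketch of ranking candidates like the two above, here is a small helper (my own, not part of msieve) that pulls the Murphy E score out of msieve's "# norm ... alpha ... e ... rroots ..." comment lines and sorts by it. The regex assumes exactly that line shape.]

```python
import re

# Hypothetical helper: rank msieve polynomial candidates by their "e"
# (Murphy E) score, largest first. Assumes msieve's score-comment format.
def best_polynomials(text, top=5):
    scored = []
    for m in re.finditer(r"# norm (\S+) alpha (\S+) e (\S+) rroots (\d+)", text):
        scored.append({"norm": float(m.group(1)),
                       "alpha": float(m.group(2)),
                       "e": float(m.group(3)),
                       "rroots": int(m.group(4))})
    return sorted(scored, key=lambda p: p["e"], reverse=True)[:top]

lines = """# norm 5.550210e-15 alpha -8.037245 e 2.745e-12 rroots 3
# norm 5.205393e-15 alpha -7.492211 e 2.629e-12 rroots 5"""
print(best_polynomials(lines)[0]["e"])  # 2.745e-12
```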
I've also been doing some heavy Googling, but my biggest question is: what is the polynomial selection actually doing? (I'm not a mathematician.) How is it used to create the matrix in the next step (ggnfs), and how does the polynomial selection affect the later stages of processing?

Thanks

P.

 2013-02-23, 13:52 #2 jasonp Tribal Bullet     Oct 2004 DD9₁₆ Posts

Those polynomials are pretty good, but you could probably find polynomials that make the sieving run 20% faster if you spent a bit longer looking. Of course, if you have EC2 instances waiting to sieve, and the sieving will take at most a few days, then it's better to take what you have and get started. Beyond an ill-defined point, the time you save by sieving with a better polynomial is wasted in the search for that better polynomial.

You want the polynomial that sieves the fastest, and that's loosely correlated with the polynomial that has the largest E-value. In practice you have to 'test sieve', i.e. run the sieving tools for maybe 1/1000 of the complete job, and find which of your top 2-5 polynomials generates the highest relations-per-second rate with all the other sieving parameters held constant. Folks here have a lot of practice doing that; I don't. Again, with EC2 instances waiting it's less important to find the absolute best polynomial, because the job will be so short anyway.

Explaining what exactly the number field sieve does is tricky; the references in the Msieve readme do a much better job than a forum post will. Each relation from the sieving is a congruence a == b, and we multiply millions of those congruences together so that the product of the a's and the product of the b's are both squares, but the square roots of the two products are different modulo the number N to be factored. That they are different means that gcd(sqrt(product(a)) +- sqrt(product(b)), N) is a factor of N about half the time.

The quadratic sieve is easier to understand, because each relation has a and b as integers at all times. The number field sieve is both much more powerful and much more difficult to understand, because the 'b' part of each relation is not a number but a 'thing' in an algebraic number field.
At the end of the algorithm, before the gcd, we apply a 'trap door' that converts the square root of the product of all the b's from thing-world into a number, and that number is a usable square root. This is more powerful because the size of each 'thing' is, on average, smaller and easier to deal with than if we worked with numbers all the time.

The polynomial you choose determines the number field, the shape of the trap door, and the size of the things in thing-world. The sieving evaluates the polynomials at x = c/d for lots of different c and d, looking for polynomial values that factor into small primes. Those become the relations that are the raw material for the square roots, and the matrix that is built determines which relations participate in the products of the a's and b's.

Last fiddled with by jasonp on 2013-02-23 at 14:22 Reason: lots of corrections
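[Editor's note: the congruence-of-squares endgame jasonp describes can be shown in miniature. The values below are hand-picked for a toy N; the whole point of QS/NFS is to manufacture such x and y for numbers far too large to handle by inspection.]

```python
from math import gcd

# Toy demonstration of the congruence-of-squares endgame: if
# x^2 == y^2 (mod N) with x != +-y (mod N), then gcd(x - y, N)
# is a nontrivial factor of N. Values are hand-picked, not sieved.
N = 8051                  # = 83 * 97
x, y = 90, 7              # 90^2 - 7^2 = 8100 - 49 = 8051, so x^2 == y^2 (mod N)
assert (x * x - y * y) % N == 0
f = gcd(x - y, N)
print(f, N // f)          # 83 97
```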
 2013-02-23, 17:01 #3 debrouxl     Sep 2009 2×3×163 Posts

For 512-bit semi-primes, I've already seen polynomials with E-values above 3.2e-12, so you would indeed get a better polynomial by having your CPUs (or preferably GPUs, which dwarf CPUs in throughput on that task) search for longer. Maybe you're trying to factor a C15x with x a bit higher than 4 or 5, in which case E ~2.7e-12 is good.

With recent processors, one can sieve a 512-bit semi-prime on 32 cores in less than 4 days of wall-clock time, and post-process it in less than half a day.
 2013-02-23, 17:32 #4 chris2be8     Sep 2009 3⁴·29 Posts

I suggest you go to http://gilchrist.ca/jeff/factoring/n...ers_guide.html and follow the instructions to factor a 100-digit number. Seeing the whole process through with an easy case will make it easier to understand. You may want to try 110-, 120-, 130- and 140-digit numbers before starting on 15x-digit numbers.

Chris (still a relative beginner)
 2013-02-24, 05:47 #5 pastcow     Feb 2013 7 Posts

Thanks for all the replies, they've made things a lot clearer :) I managed to find E ~3.06e-12 for a C159 using msieve - I converted it to a .poly file for use with ggnfs. I'm now working on a script to get ggnfs distributed over EC2 using spot instances. As it stands, I'm looking at ~16 days on a Core i7 (8 cores) for the sieving - hopefully once I have it working on EC2 I can bring this down to ~1 hour for under $500 :D

 2013-02-24, 06:18 #6 Dubslow Basketry That Evening!     "Bunslow the Bold" Jun 2011 40

 2013-02-26, 08:38 #7 pastcow     Feb 2013 7₈ Posts

I've used 60 Amazon "Spot" instances (32-core Xeons) - a total of 1920 cores - this brought the second stage down to 2 hours and a total cost of $40. :)
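[Editor's note: the economics quoted in these posts can be sanity-checked with simple arithmetic. The spot price is the ~37 US cents/hour per 32-core instance figure given later in this thread; real spot prices fluctuate, which is presumably why the result only roughly matches the $40 quoted.]

```python
# Back-of-envelope check of the cluster size and cost quoted above.
# Assumes a flat spot price of ~$0.37/hour per 32-core instance
# (the figure given later in the thread); spot prices fluctuate.
instances = 60
cores_per_instance = 32
price_per_instance_hour = 0.37   # USD
hours = 2

total_cores = instances * cores_per_instance
cost = instances * price_per_instance_hour * hours
print(total_cores)               # 1920
print(round(cost, 2))            # 44.4
```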
 2013-02-26, 09:47 #8 Batalov     "Serge" Mar 2008 Phi(4,2^7658614+1)/2 2³·3²·137 Posts

Can you take the matrix step down to 2 hours, though?
 2013-02-26, 18:21 #9 debrouxl     Sep 2009 1722₈ Posts

Heh, what kind of software setup are you using to distribute jobs on so many cores?
 2013-02-27, 00:01 #10 pastcow     Feb 2013 7 Posts

I'm using 2 scripts I wrote: one sits on a server (PHP) and distributes work via an API that a client (Python) requests. I've set up the spot instances so that as soon as they boot they request work. The work is basically a set of arbitrary commands that the client runs (32 cores means 32 separate clients) - results are then sent back to the server and the client requests more work. It's pretty much just a version of the factmsieve.py script that runs over HTTP - crude but effective.

Costs for the EC2 spot instances (2x 16-core Xeons) are around 37 US cents per hour, which is quite reasonable. I'm sending the work out in 10-minute chunks (in case the spot instance is terminated - however, this didn't happen at all during the processing). I've not looked at distributing the later stages of it yet; I'll probably look at that tonight. Once I've got a bunch of scripts that are relatively stable I'll stick them up on github.
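[Editor's note: the server/client scheme described above can be sketched in a few dozen lines of stdlib Python. The endpoints (`/work`, `/result`) and the q-range payload are illustrative guesses, not taken from the poster's actual PHP/Python scripts, and this omits the real system's authentication, retries, and chunk re-issue on timeout.]

```python
import json
import queue
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Minimal sketch of an HTTP work-distribution scheme: the server hands
# out sieving chunks and collects results. All names are hypothetical.
work_units = queue.Queue()
results = []

class WorkHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # hand the client its next chunk (null when the queue is empty)
        try:
            wu = work_units.get_nowait()
        except queue.Empty:
            wu = None
        body = json.dumps(wu).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        # accept the relations the client found for its chunk
        length = int(self.headers["Content-Length"])
        results.append(json.loads(self.rfile.read(length)))
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging for the demo

def client_fetch(base_url):
    with urllib.request.urlopen(base_url + "/work") as resp:
        return json.loads(resp.read())

def client_submit(base_url, result):
    req = urllib.request.Request(base_url + "/result",
                                 data=json.dumps(result).encode())
    urllib.request.urlopen(req).close()

if __name__ == "__main__":
    work_units.put({"q0": 1_000_000, "q1": 1_001_000})
    server = ThreadingHTTPServer(("127.0.0.1", 0), WorkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_address[1]

    wu = client_fetch(base)                        # the chunk to sieve
    client_submit(base, {"q0": wu["q0"], "relations": 4242})
    print(results[0]["relations"])                 # 4242
    server.shutdown()
```

In the real setup each 32-core host would run many such clients in a loop, executing the commands in each chunk and posting the output back before asking for more.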
2013-02-27, 07:01   #11
debrouxl

Sep 2009

1722₈ Posts

Quote:
 I'm using 2 scripts I wrote: one sits on a server (PHP) and distributes work via an API that a client (Python) requests. I've set up the spot instances so that as soon as they boot they request work. The work is basically a set of arbitrary commands that the client runs (32 cores means 32 separate clients) - results are then sent back to the server and the client requests more work. It's pretty much just a version of the factmsieve.py script that runs over HTTP - crude but effective.
Interesting!
Several months ago, I started making a similar, simple HTTP-based reservation system for distributing the factorization of a 512-bit semi-prime, as it was easier than setting up a local instance of a RSALS-derived BOINC server (NFS@Home, Tom Ritter's cloud-and-control) and attaching clients to it.
I had the first version of a server (Perl) mostly done: it could handle reservations and submissions (files were not uploaded to the server in the first version), and it kept a log that it used to correctly resume distribution after the last distributed WU.
I never finished the client, because in the end it was easy enough to make sh scripts that called gnfs-lasieve4I14e directly and shuffled the files around through scp / sftp. I had "only" 40-42 cores on 8 hosts or so.
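[Editor's note: the "sh scripts calling gnfs-lasieve4I14e directly" approach still needs to carve the special-q range into per-host chunks. A minimal sketch of that bookkeeping, as an illustrative helper of my own rather than anything from either poster's scripts:]

```python
def split_range(q0, q1, n_chunks):
    """Split the special-q interval [q0, q1) into n_chunks near-equal
    work units, one per host or core, suitable for handing to a lattice
    siever such as gnfs-lasieve4I14e (illustrative helper only)."""
    step, extra = divmod(q1 - q0, n_chunks)
    chunks, lo = [], q0
    for i in range(n_chunks):
        hi = lo + step + (1 if i < extra else 0)  # spread the remainder
        chunks.append((lo, hi))
        lo = hi
    return chunks

# e.g. 40 cores sharing the range 20M-40M:
units = split_range(20_000_000, 40_000_000, 40)
print(len(units), units[0])   # 40 (20000000, 20500000)
```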

Quote:
 Once I've got a bunch of scripts that are relatively stable I'll stick them up on github.
Great!
You'll be doing everyone a favor.

