mersenneforum.org  

Go Back   mersenneforum.org > Factoring Projects > Factoring

Reply
 
Thread Tools
Old 2013-02-23, 11:15   #1
pastcow
 
pastcow's Avatar
 
Feb 2013

7 Posts
Default General Questions

Hi,

If any of you have some spare time it would be great if you could answer some of my beginner questions :)

I've been looking into factoring for the last few days - I'm hoping to eventually build a rig to factor C15x. The way I understand it at the moment is that there are a few steps to doing this. The first is using msieve to generate some "good" polynomials.

So to start with I split up msieve and made it distributed so I can run it in Amazon EC2 - this seems to work well and I get loads of results.

How do I determine what is the best polynomial? e.g. I've read I have to look at the "e" value but are these ok for a C15x?

Quote:
# norm 5.550210e-15 alpha -8.037245 e 2.745e-12 rroots 3
skew: 7966027.18
c0: -1093047428917558581339935547562703702661
c1: 131925493694074760219251593532155
c2: 138836323623084375165467015
c3: -24962492672785352963
c4: -2822482971114
c5: 129168
Y0: -614841046671961074280414623092
Y1: 104974279042196101

# norm 5.205393e-15 alpha -7.492211 e 2.629e-12 rroots 5
skew: 6278276.80
c0: -213894779492634144131491722169682354111
c1: 265623048828533468540109249087755
c2: 89972557414556913289650545
c3: -31074467407315923443
c4: -2448005563914
c5: 129168
Y0: -614840985804724857243849380262
Y1: 104974279042196101
I've also been doing some heavy Googling but my biggest question is: What is the polynomial selection actually doing? (i'm not a mathematician) - how is this used to create the matrix used in the next step (ggnfs) and how does the polynomial selection affect the next stage of processing?


Thanks

P.
pastcow is offline   Reply With Quote
Old 2013-02-23, 13:52   #2
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

3×1,181 Posts
Default

Those polynomials are pretty good, but you can probably find polynomials that make the sieving run 20% faster if you spent a bit longer looking. Of course if you have EC2 instances waiting to sieve, and the sieving will take at most a few days, then it's better to take what you have and get started. Beyond an ill-defined point, the time you save in sieving a better polynomial will be wasted in the search for that better polynomial.

You want the polynomial that sieves the fastest, and that's loosely correlated to the polynomial with the largest E-value. In practice you have to 'test sieve', i.e. run the sieving tools for maybe 1/1000 of the complete job, and find which of your top 2-5 polynomials generates the highest relations-per-second rate with all the other sieving parameters held constant. Folks here have a lot of practice doing that, I don't. Again, with EC2 instances waiting it's less important to find the absolute best polynomial, because the job will be so short anyway.

Explaining what exactly the number field sieve does is tricky, the references in the Msieve readme do a much better job than a forum post will. Each relation from the sieving is a congruence a == b, and we multiply millions of those congruences together so that the product of the a's and the b's are squares, but the square root of each product is different modulo the number N to be factored. That they are different means that gcd(sqrt(product(a))+-sqrt(product(b)), N) is a factor of N about half the time.

The quadratic sieve is easier to understand, because each relation has a and b as integers at all times. The number field sieve is both much more powerful and much more difficult to understand, because the 'b' part of each relation is not a number but a 'thing' in an algebraic number field. At the end of the algorithm, before the gcd, we apply a 'trap door' that converts the square root of the product of all the b's from thing-world into a number, and that number is a usable square root. This is more powerful because the size of each 'thing' is smaller and easier to deal with on average, compared to working in numbers all the time.

The polynomial you choose determines the number field, and the shape of the trap door, and the size of the things in thing-world. The sieving searches through the polynomials, evaluated at x = c/d for lots of different c and d, looking for polynomial values that factor into small primes. Those become the relations that are raw material for the square roots, and the matrix that is built helps determine which relations participate in the product of the a' and b's.

Last fiddled with by jasonp on 2013-02-23 at 14:22 Reason: lots of corrections
jasonp is offline   Reply With Quote
Old 2013-02-23, 17:01   #3
debrouxl
 
debrouxl's Avatar
 
Sep 2009

2×3×163 Posts
Default

For 512-bit semi-primes, I've already seen polynomials with E values above 3.2e-12, so you would indeed get a better polynomial by having your CPUs (or preferably, GPUs, which dwarf CPUs in throughput on that task).
Maybe you're trying to factor a C15x with x a bit higher than 4 or 5, in which case E ~2.7e-12 is good.

With recent processors, one can sieve a 512-bit semi-prime on 32 cores in less than 4 days wall clock time, and post-process it in less than half a day.
debrouxl is offline   Reply With Quote
Old 2013-02-23, 17:32   #4
chris2be8
 
chris2be8's Avatar
 
Sep 2009

42028 Posts
Default

I suggest you go to http://gilchrist.ca/jeff/factoring/n...ers_guide.html and follow the instructions to factor a 100 digit number. Seeing the whole process through with an easy case will make it easier to understand.

You may want to try 110, 120, 130 and 140 digit numbers before starting on 15x digit numbers.

Chris (still a relative beginner)
chris2be8 is offline   Reply With Quote
Old 2013-02-24, 05:47   #5
pastcow
 
pastcow's Avatar
 
Feb 2013

7 Posts
Default

Thanks for all the replies, they've made things a lot clearer :) I managed to find E ~3.06e-12 for a C159 using mseive - I converted it to a .poly file for use with gnfs. I'm now working on a script to get gnfs distributed over EC2 using spot instances. As it stands, i'm looking at ~16 days on a CoreI7 (8 cores) for the sieving - hopefully once I have it working on EC2 I can bring this down to ~1 hour for under $500 :D
pastcow is offline   Reply With Quote
Old 2013-02-24, 06:18   #6
Dubslow
Basketry That Evening!
 
Dubslow's Avatar
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

722110 Posts
Default

Yeesh, that's a lot of money for a single factorization! Depending on your motives, some of us might have (or maybe will) help out with the hardware we already have...

Last fiddled with by Dubslow on 2013-02-24 at 06:19 Reason: s/in/out (first time i've made that mistake...)
Dubslow is offline   Reply With Quote
Old 2013-02-26, 08:38   #7
pastcow
 
pastcow's Avatar
 
Feb 2013

78 Posts
Default

I've used 60 Amazon "Spot" instances (32 core Xeons) - a total of 1920 cores - this bought the second stage down to 2 hours and a total cost of $40. :)
pastcow is offline   Reply With Quote
Old 2013-02-26, 09:47   #8
Batalov
 
Batalov's Avatar
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

17×563 Posts
Default

Can you take the matrix step down to 2 hours, though?
Batalov is offline   Reply With Quote
Old 2013-02-26, 18:21   #9
debrouxl
 
debrouxl's Avatar
 
Sep 2009

2×3×163 Posts
Default

Heh, what kind of software setup are you using to distribute jobs on so many cores ?
debrouxl is offline   Reply With Quote
Old 2013-02-27, 00:01   #10
pastcow
 
pastcow's Avatar
 
Feb 2013

7 Posts
Default

Im using 2 scripts I wrote, 1 sits on a sever (PHP) and distributes work via an API that a client (Python) requests. I've setup the spot instances so as soon as they boot they request work. The work is basically a set of arbitrary commands that the client runs (32 cores means 32 separate clients) - work is then sent back to the server and it requests more work. Its pretty much just a version of the factmsieve.py script that runs over HTTP - crude but effective.

Costs for the EC2 spot instances (2x16 core xeons) are around 37 us/cents per hour which is quite reasonable.

I'm sending the work out in 10 minute chunks (in case the spot instance is terminated - however, this didn't happen at all during the processing) - I've not looked at distributing the later stages of it yet, i'll probably look at that tonight.

Once I've got a bunch of scripts that are relatively stable I'll stick them up on github.
pastcow is offline   Reply With Quote
Old 2013-02-27, 07:01   #11
debrouxl
 
debrouxl's Avatar
 
Sep 2009

3D216 Posts
Default

Quote:
Im using 2 scripts I wrote, 1 sits on a sever (PHP) and distributes work via an API that a client (Python) requests. I've setup the spot instances so as soon as they boot they request work. The work is basically a set of arbitrary commands that the client runs (32 cores means 32 separate clients) - work is then sent back to the server and it requests more work. Its pretty much just a version of the factmsieve.py script that runs over HTTP - crude but effective.
Interesting
Several months ago, I had precisely started making a similar, simple HTTP-based reservation system for distributing the factorization of a 512-bit semi-prime more easily then setting up a local instance of RSALS-derived BOINC (NFS@Home, Tom Ritter's cloud-and-control) and attaching clients to it.
I had the first version of a server (Perl) mostly done: it could handle reservations, submissions (files not uploaded to the server, for the first version) and it kept a log that it used for correctly resuming distribution after the last distributed WU.
I never finished the client, because it was, in the end, easy enough to make sh scripts that called gnfs-lasieve4I14e directly and shuffle the files around through scp / sftp. I had "only" 40-42 cores on 8 hosts or so.

Quote:
Once I've got a bunch of scripts that are relatively stable I'll stick them up on github.
Great
You'll be doing everyone a favor.
debrouxl is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
Is this solvable in general. jwaltos Homework Help 34 2017-09-21 14:12
General Status??? R.D. Silverman NFSNET Discussion 4 2007-07-19 18:43
General formula pacionet Miscellaneous Math 15 2005-12-08 08:00
some questions of mine, in general jerico2day Software 5 2005-03-30 09:19
general Mersenne Val 15k Search 10 2004-03-13 20:56

All times are UTC. The time now is 04:06.


Sat Oct 23 04:06:57 UTC 2021 up 91 days, 22:35, 0 users, load averages: 1.79, 1.77, 1.36

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.