mersenneforum.org Factored my first SemiPrime -Some Questions
 Register FAQ Search Today's Posts Mark Forums Read

 2010-12-01, 21:57 #1 tal   Nov 2010 32 Posts Factored my first SemiPrime -Some Questions Hey all - I have some beginners questions about the use of msieve in conjunction with ggnfs, hopefully I put this in (mostly) the right place. The idea of factoring some large numbers appealed to me, but I know it's a bit more complicated than just running a script - so I've been reading up on it a lot lately. I followed the guide and successfully factored RSA-100, which was really exciting - but I have some questions so I actually understand what I was doing besides just doing the magical incantations. I realize some of this process might be overkill when talking about a 100-digit semiprime, but I want to get the process right for something small, so when I move on to larger numbers I can understand it all. For the record, I'm running on a Linux box, 2GB RAM, Dual-Core Athlon64 4200+ with Hyperthreading enabled, in 32 bit (x86) - yes that's not a typo. When I installed the OS I made the mistake of not doing x64, and rebuilding the toolchain in gentoo is impossible, so I will probably wipe and reinstall soon, but for now it's 32 bit. I intend to do benchmarking to compare with and without Hyperthreading, x32 and x64, how optimal and suboptimal L1_BITS affects it... I have two GeForce 480's in my desktop (running Win7) to do CUDA testing on, and I have some more boxes I can borrow or turn on to do distributed sieving. Part 1: Polynomial Selection Selecting a good polynomial reduces the time spent sieving, and there are tests to determine what a 'good' polynomial is - the one I see mentioned a lot being the Murphy test. And I saw people comparing polynomials using a Murphy Score, which was of the form 2.64e-12... here are some selections from rsa100.dat.p: ... # norm 7.551775e-14 alpha -4.695883 e 1.040e-08 rroots 2 skew: 1607312.30 c0: -15717476569555150190764775856 c1: 9064072593136772886192 c2: 29855101586021 c3: 1661649458 c4: 7320 Y0: -675330080117467742399765 Y1: 80844668967923 # norm 5.777448e-14 alpha -4.550075 e 9.261e-09 rroots 0 skew: 1995747.27 c0: 37257076699243727116282745836 c1: -3080766778179746568716 c2: -21401399698772505 c3: 1237242768 c4: 8316 Y0: -654133332715871134713393 Y1: 77694547530973 ...I recognize c0-c4 are the coefficients used in the polynomial, and I assume that Y0 and Y1 are more polynomial search parameters that are way over my head. But what are norm, e, alpha, and rroots (real roots?)? Is e the Murphy Score? It goes up and down over the course of the search - I thought it would continually get better. (Why keep track of polynomials that are worse than the one you have?) Here is the log for rsa-100.dat.p Later on, I'd like to run multiple polynomial selections in parallel over different ranges and choose the best. How can I objectively compare two polynomials to see which is 'better'? Is it only worth comparing Murphy Scores? Part 2: Sieving After polynomial selection ran for about an hour (longer than the .35 hours it promised) I killed it, and ran the script to start sieving. It estimated about 4M relations needed, and kicked off four jobs at once to compute them. It started q at q = 900000. I understand Q is the range over which we are searching for relations, and the most distrubutable part of the operation, and it seems that you find more relations at lower Q values - but I have a few academic questions. Why did it start Q at 900K? Similarly in over in this thread Q is started at 20M, and here, Q=30M. How do you determine a good Q-start-value? Related, I have no idea what the -M 1 parameter means. I know -M 0 means start on the "algebraic side of mpqs treatment" and -M 1 means the rational side - but I haven't the faintest what that _actually_ means. I also don't understand the difference between the gnfs-lasieve4IXXe binaries. 11 through 16, they seem to be suited for different jobs. I've seen one thread talk about using 15e, another 16e. Looking at the code of ./factmsieve.py, it seems each is for a different range of digits in the composite (when working with the gnfs, I'm not too interested in the sfns): - <95 = lasieve4I11e - 96-110 = lasieve4I12e - 111-140 = lasieve4I13e - 141-158 = lasieve4I14e - 158-185 = lasieve4I15e - >185 = lasieve4I16e If that is correct, what is the (approximate) difference between them? I suppose as far as telling when it is better to take a step up (or down) - it would just do test sieving and see which performs faster? So I sieved and I sieved and it didn't take very long, by the time Q got to 1.3M, I had 4778012 relations, 116.7% of the estimated minimum (4095000). Time to move on. Part 3: Linear Algebra Once upon a time I took a computer vision course, things like edge detection and pattern recognition, and movement prediction - without taking linear algebra. I'm proud (or ashamed) to say that after me they made Linear Algebra a prequisite for the course. So I'm not too hot at it. Tue Nov 30 15:43:12 2010 -> ./msieve -s ./rsa100.dat -l ./rsa100.log -i ./rsa100.ini -nf ./rsa100.fb -t 4 -nc1 Tue Nov 30 15:47:24 2010 LatSieveTime: 2859.37 Tue Nov 30 15:47:24 2010 -> Running matrix solving step ... Tue Nov 30 15:47:24 2010 -> ./msieve -s ./rsa100.dat -l ./rsa100.log -i ./rsa100.ini -nf ./rsa100.fb -t 4 -nc2 Tue Nov 30 15:55:16 2010 -> Running square root step ... Tue Nov 30 15:55:16 2010 -> ./msieve -s ./rsa100.dat -l ./rsa100.log -i ./rsa100.ini -nf ./rsa100.fb -t 4 -nc3 Tue Nov 30 15:57:30 2010 -> Computing 1.29115e+09 scale for this machine... Tue Nov 30 15:57:30 2010 -> procrels -speedtest> PIPE Tue Nov 30 15:57:46 2010 -> Factorization summary written to g100-rsa100.txt Step -nc1 has to do with checking the relations. Can this be done incrementally as sieve jobs complete? Or must it be run on the entire sieve results? Can it be run, and the results discarded, to check the validity of a sieve job, as a sanity check? If possible, can you explain what it's actually *doing*? Anway, I was so excited when I saw the news - my primes were ready to come out of the oven. $cat g100-rsa100.txt Number: rsa100 N = 1522605027922533360535618378132637429718068114961380688657908494580122963258952897654000350692006139 (100 digits) Divisors found: Version: Msieve-1.40 Total time: 2.93 hours. Factorization parameters were as follows: n: 1522605027922533360535618378132637429718068114961380688657908494580122963258952897654000350692006139 Y0: -871428731799383793275219 Y1: 81811708972999 c0: 22384488491775383738875545270 c1: -8624944207655121457719 c2: -16162116011420596 c3: 3616907164 c4: 2640 skew: 2095497.31 type: gnfs Factor base limits: 1800000/1800000 Large primes per side: 3 Large prime bits: 26/26 Sieved algebraic special-q in [0, 0) Total raw relations: 4778012 Relations: Pruned matrix : Polynomial selection time: 0.00 hours. Total sieving time: 2.93 hours. Total relation processing time: 0.00 hours. Matrix solve time: 0.00 hours. time per square root: 0.00 hours. Prototype def-par.txt line would be: gnfs,99,4,58,1500,0.003,0.4,220,15,10000,2000,1800000,1800000,26,26,48,48,2.5,2.5,100000 total time: 2.93 hours. I have no primes, no primes for me =( I had no idea what went wrong. But I typed up nearly this whole post, and finally found them!$ tail rsa100.log.mpi00 -n 3 Tue Nov 30 15:57:30 2010 prp50 factor: 37975227936943673922808872755445627854565536638199 Tue Nov 30 15:57:30 2010 prp50 factor: 40094690950920881030683735292761468389214899724061 Tue Nov 30 15:57:30 2010 elapsed time 00:02:14 My primes! I haven't the faintest why the python script failed to give the result of all my (okay not really 'my') hard work, but I found them. So I have a lot of academic questions wrapped up in this narrative (I tried to make them stand out), but I also wanted to thank everyone (especially jasonp) profusely for all the hard work they've put in to making these tools - it's really exciting.
 2010-12-02, 02:18 #2 jasonp Tribal Bullet     Oct 2004 2×52×71 Posts For the poly selection, Y0 and Y1 are the coefficients of the rational polynomial. The other numbers attempt to measure how good the polynomial will be if you sieved with it; the Murphy E score is the most accurate, and the most expensive to compute. It was basically invented so that two polynomials could be directly compared by their E values. The reason the code prints out large numbers of polynomials is that the E value is not a perfect measure of the potential in a polynomial, and for larger problems there will be many polynomials that have very similar E values. So for the largest problems you have to choose the best polynomial via test sieving, and the E score comparison is there to reduce the amount of test sieving you have to do. I agree that it's more helpful to print out, say, the top 10 polynomials, but you also don't want the code to print nothing for weeks until it's done. Right now any polynomial whose E value exceeds a given (fairly stringent) bound is printed. The linear algebra is divided into two halves; -nc1 does the first and -nc2 does the second. Both require all the relations to be available at once. The first half is the 'filtering' phase, which throws away relations that will not help complete the factorization, and combines the rest into small groups that make the resulting matrix much smaller. The actual linear algebra then selects a very specific collection of (about half of) those groups that allow the factorization to proceed. I'm afraid the details here are complicated. Congrats on getting started!
 2010-12-02, 03:21 #3 R.D. Silverman     "Bob Silverman" Nov 2003 North of Boston 23×937 Posts Full (very long) quote removed Please only quote what you are replying to Would you like references? If you really want to learn about the algorithm, I can provide them. Otherwise, it will probably remain just a black box. And I would be happy to answer any mathematical questions you might have about the algorithm. And you should start by learning about QS and how it works. Trying to understand NFS before studying QS will be nigh to impossible unless you have a very strong math background. (advanced undergrad with at least one course in algebraic number theory or abstract algebra; do you know what a prime ideal is?) An excellent general discussion can be found in the Crandall & Pomerance book: Prime Numbers; A Computational Perspective. It will also give the number theory background material that is needed. Last fiddled with by smh on 2010-12-16 at 20:38 Reason: eliminate some redundancy
2010-12-02, 03:27   #4
CRGreathouse

Aug 2006

5,987 Posts

tal -- For what it's worth, Silverman here is a great resource on this subject, but he doesn't like beginner questions. :) He's a real expert on the NFS and quite generous with his time.

Quote:
 Originally Posted by R.D. Silverman An excellent general discussion can be found in the Crandall & Pomerance book: Prime Numbers; A Computational Perspective. It will also give the number theory background material that is needed.
Yes, this is an excellent introduction to the subject.

Quote:
 Originally Posted by R.D. Silverman And you should start by learning about QS and how it works. Trying to understand NFS before studying QS will be nigh to impossible unless you have a very strong math background. (advanced undergrad with at least one course in algebraic number theory or abstract algebra; do you know what a prime ideal is?)
I'd call it impossible, frankly, without either knowing the QS or having at least that much background. Even with that background I'd certainly learn the QS first.

Last fiddled with by CRGreathouse on 2010-12-02 at 03:28

2010-12-02, 03:41   #5
R.D. Silverman

"Bob Silverman"
Nov 2003
North of Boston

23·937 Posts

Quote:
 Originally Posted by CRGreathouse tal -- For what it's worth, Silverman here is a great resource on this subject, but he doesn't like beginner questions. :) .
On the contrary. I respond quite well to questions. What I don't like
is beginner assertions and presumptions. I also openly admit to intolerance
toward "willful ignorance": those the claim to be interested, but are unwilling
to learn.

2010-12-02, 05:18   #6
CRGreathouse

Aug 2006

5,987 Posts

Quote:
 Originally Posted by R.D. Silverman On the contrary. I respond quite well to questions. What I don't like is beginner assertions and presumptions. I also openly admit to intolerance toward "willful ignorance": those the claim to be interested, but are unwilling to learn.
I think you respond extremely well to questions from those who have appropriate background and who have done appropriate preparation. You have in the past faired much less well with questions from beginners (though, I grant, not uniformly poorly). My comment was meant only as guidance to the OP: learn your material, then when you have a decent understanding come to you with questions.

2010-12-02, 14:26   #7

May 2008
Worcester, United Kingdom

72·11 Posts

Hi Tal,

With reference to your 'missing primes' when using my factmsieve.py script, I have had bug reports indicating that the script sometimes fails to run the square root step for no apparent reason. If the final summary file is deleted and the script is then run again, it then completes the factorisation. I have not been able to debug this as I have not seen the behaviour.

But I have now spent some time trying to find what might cause this and I think that I have may have found the issue. It seems to be caused by some Python versions becoming confused about indentation when a script contains mixed tab/space indents.

I hope the attached version (v0.76) corrects this problem. This file now uses only spaces (no tabs) and is a Windows convention file with CRLF line endings (if it is run on *nix, it _might_ need to be converted for *nix line endings).

I have also made a few other minor changes to improve the scripts termination behaviour. Please report any issues here.

Brian
Attached Files
 factmsieve.76.zip (19.1 KB, 225 views)

 2010-12-02, 14:51 #8 tal   Nov 2010 32 Posts R.D. - I certainly don't want to waste your time or patience, so I'll treat it more as a black box for now, brush up on my background and read up about the Quadratic Sieve for now. I think I understand enough right now to go through factoring larger numbers, and benchmarking each step should give me an idea going forward if things are taking more or less time than expected, and if I should start experimenting with other options. Brian - I was running v0.74 on a Linux box, under Python 2.6.5 - I run gentoo, so everything, including python, is compiled from source. I'll take a look at the newest version and hopefully time will permit me to run through the whole process again today and see what might have gone awry.
2010-12-02, 15:15   #9
CRGreathouse

Aug 2006

176316 Posts

Quote:
 Originally Posted by tal brush up on my background

2010-12-02, 15:40   #10
R.D. Silverman

"Bob Silverman"
Nov 2003
North of Boston

23×937 Posts

Quote:
 Originally Posted by tal R.D. - I certainly don't want to waste your time or patience, so I'll treat it more as a black box for now,
I'm not trying to be mean. (But many people might believe otherwise).
But you can't ask any meaningful questions until you have at least minimal
understanding of how the algorithms work.

of Y0 and Y1 in the inputs will not make much sense unless you have
some understanding of the algorithm itself. Similarly for other
parametric inputs.

There are some simplified explanations that you can find for both QS
and NFS on the web. I will give you one of my favorite mantras:

What is your CS background? Have you had a course in algorithms?
You say that you have taken Linear Algebra. Have you taken any
other math courses?

2010-12-02, 16:01   #11
tal

Nov 2010

32 Posts

Quote:
 Originally Posted by R.D. Silverman Even something simple as a reply to your question about the meaning of Y0 and Y1 in the inputs will not make much sense unless you have some understanding of the algorithm itself. Similarly for other parametric inputs.
Completely reasonable.

Quote:
 Originally Posted by R.D. Silverman There are some simplified explanations that you can find for both QS and NFS on the web. I will give you one of my favorite mantras: "Google is my friend" What is your CS background? Have you had a course in algorithms? You say that you have taken Linear Algebra. Have you taken any other math courses?
I have a strong CS background - compilers, databases, esoteric C techniques, networking, data structures, and algorithms (asymptotic notation included of course) - reading, editing, or compiling the code doesn't scare me in the least. I've been reading and writing code for the past 6 years full time at both work and home. It's my math background that's lacking. I did not take linear algebra - nor much Calculus (neither Differential Equations nor Multivariable). I took a Discrete Math for Cryptography course a few years ago that I loved -I recently wanted to relearn the details of ElGamal encryption, so I spent the time relearning the basics of Group Theory and by the time I had a solid grasp on that, the actual ElGamal part was obvious.

I'm not scared to start at QS, and work my way backwards googling and reading, ingesting each building block as I come across it - it's just going to take some time.

Last fiddled with by tal on 2010-12-02 at 16:04 Reason: clarification

 Similar Threads Thread Thread Starter Forum Replies Last Post sweety439 sweety439 11 2020-09-23 01:42 danaj Programming 17 2017-09-15 16:41 aketilander Factoring 4 2012-08-08 18:09 JHansen Factoring 47 2005-06-29 17:59 xilman Factoring 23 2005-06-02 03:24

All times are UTC. The time now is 14:37.

Wed Nov 30 14:37:22 UTC 2022 up 104 days, 12:05, 1 user, load averages: 0.54, 0.95, 1.10