mersenneforum.org > EdH Comparison of Several gnfs Factoring Methods
 Register FAQ Search Today's Posts Mark Forums Read

2019-09-01, 02:49   #1
EdH

"Ed Hall"
Dec 2009

31·109 Posts
Comparison of Several gnfs Factoring Methods

I will be comparing methods of gnfs factoring in this thread. The three basic comparisons will be factmsieve.py (using msieve and ggnfs packages), CADO-NFS, and a hybrid method that uses CADO-NFS for polynomial selection and lattice sieving, and then uses msieve for the Linear Algebra and Square Root steps.

For the initial comparison, I have chosen RSA-120 and will be running the three methods on a single i7 machine with 4 cores and 2 threads/core. The machine has 16GB of RAM.

All program installations were performed in the manners I describe elsewhere in my "How I . . ." threads. The current installations may not be the very latest versions. The CADO-NFS package is using the "Improved params files. . ." c120 parameters from this post.

To start off somewhere, I will run RSA-120 with the current setup on my testing machine. I have determined elsewhere that the "improved" c120 file performed better for me on average, so I'm not going to provide a standard run using the default c120 file.

First set of tests:

factmsieve.py: I used factmsieve.py which was set up as described here.

First, I created an input number file:
Code:
echo n: 227010481295437363334259960947493668895875336466084780038173258247009162675779735389791151574049166747880487470296548479 >comp.n
Then, I ran factmsieve.py:
Code:
python ../factmsieve.py comp.n
This resulted in factors in 3:39:33:
Code:
Sat Aug 31 08:59:46 2019 -> factmsieve.py (v0.86)
. . .
Sat Aug 31 12:39:19 2019 -> Factorization summary written to g120-comp.txt

I used the basic format described in the README file to run the CADO-NFS test:
Code:
./cado-nfs.py 227010481295437363334259960947493668895875336466084780038173258247009162675779735389791151574049166747880487470296548479
The factors were returned in 3:02:33:

Code:
Info:Complete Factorization: Total cpu/elapsed time for entire factorization: 86397.6/10953.3
Hybrid version of the above: I used CADO-NFS for polynomial selection and lattice sieving, then used msieve for the Linear Algebra and Square Root phases. This initially entailed a lot more setup than either of the other two tests, but once I created a bash script and made a conversion file** to address CADO/msieve differences, I can now reuse that script in a more timely manner.

**I had to write one, or use someone else's conversion to take the CADO-NFS poly and make a compatible msieve fb file. The source for my conversion file (poly2fb.cpp) can be found attached to this post at the bottom. The compiled name for my setup is poly2fb and is located in the ~Math/cado-nfs directory.

The final script I use with some comments (cmhybrid.sh):
Code:
 #!/bin/bash/

# This is a default number to be used if no composite is supplied on the command line.
# It is RSA-100.
comp=1522605027922533360535618378132637429718068114961380688657908494580122963258952897654000350692006139

if [ ${#1} -gt 60 ] then comp=$1
fi

rm -r /tmp/tests

# Most of the below command should be familiar, but "tasks.filter.run=false" is used to drop out of CADO-NFS after sieving.

./cado-nfs.py $comp server.whitelist=192.168.1.0/24 server.port=11111 server.ssl=no tasks.workdir=/tmp/tests tasks.filter.run=false echo "Finished cado-nfs!" cd /tmp/tests # Wildcards are used here because, by default, CADO-NFS names files based on digit size. cat c*.upload/*.gz >comp.dat.gz cat *.poly >comp.polyT mv comp.polyT comp.poly echo "n:$comp" >comp.n
echo "N $comp" >comp.fb ~/Math/cado-nfs/poly2fb ~/Math/msieve/msieve -i comp.n -s comp.dat.gz -l compmsieve.log -nf comp.fb -t 8 -nc cat compmsieve.log | grep " factor: " echo$comp >> ~/FactorList
cat compmsieve.log | grep " factor: " >> ~/FactorList
cp compmsieve.log ~/.
echo "Elapsed time is $(($SECONDS / 60)) minutes and $(($SECONDS % 60)) seconds"
Once everything was in place, I ran the script:
Code:
bash cmhybrid.sh 22701048129543736333425996094749366889587533646608478003817325824700916267577973538979115157404916674788048747029654847
The factors were returned in 3:00:43:
Code:
Elapsed time is 180 minutes and 43 seconds
Summary: The hybrid version barely squeaked by the CADO-NFS run with the modified c120 parameters. Considering the setup involved, this clearly leans toward using CADO-NFS for gnfs jobs of RSA-120 size. However, if the default parameter c120 file is to be used, I think there would be a clear advantage to the hybrid version. I also believe that VBCurtis made some very good adjustments to the c120 parameter file for CADO-NFS.

First Set of Tests Final Timings:
factmsieve.py - - - - - - 3:39:33
CADO-NFS- - - - - - - - - 3:02:33
hybrid version - - - - - - 3:00:43

Second set of tests:

This second set of tests will duplicate the first with the exception of the chosen composite. For the second set I am using RSA-130 on the same machine as before. It has been commonly accepted that for every 5 digits increase in length of a composite, the factoring time roughly doubles. If that holds true, each of these tests in this second set should take approximately 4 times as long to run.

factmsieve.py: This time it took 12:01:34 to complete. This is actually a couple hours less than 4 times the previous run:
Code:
Sun Sep 01 15:26:44 2019 -> factmsieve.py (v0.86)
. . .
Mon Sep 02 03:28:18 2019 -> Factorization summary written to g130-comp.txt
CADO-NFS: This time it took 9:26:00 to complete. This is about 2.5 hours less than 4 times the previous run:
Code:
Info:Complete Factorization: Total cpu/elapsed time for entire factorization: 275053/33959.9
Hybrid version of the above: This time it took 9:02:53 to complete:
Code:
Elapsed time is 542 minutes and 53 seconds
Summary: This time there was a pretty clear difference between the three methods, with CADO-NFS clearly beating factmsieve.py. But, the hybrid version came in even faster with a savings of about 25% over factmsieve.py and about 5% over CADO-NFS.

Second Set of Tests Final Timings:
factmsieve.py - - - - - - 12:01:34
CADO-NFS- - - - - - - - - 09:26:00
hybrid version - - - - - - 09:02:53

Third set of tests:

This set of tests will be as close to the last two as possible, but with three i7 machines that match the above single machine used for the first two sets. The differences in the testing will be limited to involving more machines for the polynomial testing which factmsieve.py does not do. I again used RSA-120 to perform this set of tests. A possible discrepancy in this testing is that I am not using a calculated time for polynomial selection based on the composite size. For this run, I chose a value of 15 minutes divided among the three machines. Of course, the effectiveness might be offset by the polynomial pair quality in either direction.

I am still not incorporating mpi for any portion of my tests due to earlier experience with msieve mpi LA, which proved to be detrimental without extreme communication capability between the nodes. Gigabit is not sufficient. A portion of CADO-NFS can be set up to perform with mpi, but I have been too ignorant to figure this out, yet. I hope I can cure that affliction and bring mpi into the CADO-NFS testing for a later set.

factmsieve.py: This time I used the setup described here and ran three machines, as described. I placed the commands in a script so I could time the whole run without it needing manual intervention.

It took 1:24:03 to complete:
Code:
 Elapsed time is 84 minutes and 3 seconds.
CADO-NFS: Using three machines this time, it took 1:46:39 to complete:
Code:
Info:Complete Factorization: Total cpu/elapsed time for entire factorization: 118983/6398.83
Hybrid version: This time it took 1:44:40 to complete:
Code:
Elapsed time is 104 minutes and 40 seconds
Summary: factmsieve.py definitely came in the fastest in this test, but the polynomial pair selection time was artificially shortened. With larger composites a longer time would be needed and it is hard to say whether the timing would still be ahead of a comparable CADO-NFS run. Also the manual intervention may outweigh the ease of use of the CADO-NFS package.

Clearly, CADO-NFS with modified parameter files is so close in these tests that it outweighs the miniscule time savings with all the added work for the Hybrid setup, though. This would clearly move CADO-NFS well ahead if I can learn how to use the mpi portion of the CADO-NFS Linear Algebra phase. Perhaps at that point, it would even catch factmsieve.py.

Something that never occurred to me until these tests is that, although using multiple machines will factor composites in a more timely fashion, it is clear from these limited tests that it is not efficient. Three individual machines can factor three separate composites in less time than three machines working together can factor two of the same size.

Third Set of Tests Final Timings:
factmsieve.py - - - - - - 1:24:03
CADO-NFS- - - - - - - - - 1:46:39
hybrid version - - - - - - 1:44:40

More testing is always in order, but for now I'll let things sit. . .
Attached Files
 poly2fb.cpp.txt (2.9 KB, 39 views)

Last fiddled with by EdH on 2019-09-04 at 03:15

 2019-09-01, 07:24 #2 VBCurtis     "Curtis" Feb 2005 Riverside, CA 112A16 Posts Should I try to improve on the params in factmsieve.py, for a fair(er) fight? I've done quite a lot of factmsieve tweaking over the years; perhaps you might be curious to try one or two alternate versions? I no longer mess with it, as I prefer CADO these days; in fact, I can't say I'm sure anymore what all I changed. Anyway, let me know if you'd like to test alternate params.
2019-09-01, 14:41   #3
EdH

"Ed Hall"
Dec 2009

64638 Posts

Quote:
 Originally Posted by VBCurtis Should I try to improve on the params in factmsieve.py, for a fair(er) fight? I've done quite a lot of factmsieve tweaking over the years; perhaps you might be curious to try one or two alternate versions? I no longer mess with it, as I prefer CADO these days; in fact, I can't say I'm sure anymore what all I changed. Anyway, let me know if you'd like to test alternate params.
That would be more fair since I'm using your modified params for CADO-NFS. The trouble I'm concerned with, though, is if this could bloom into a huge tree of parameter branches, especially when I start adding machines into the mix with larger composites. When I add machines I should adjust the las/LA ratio. But I also need to get the mpi portion of CADO-NFS LA working at some point, which will change things, too.

 2019-09-01, 15:11 #4 VBCurtis     "Curtis" Feb 2005 Riverside, CA 10001001010102 Posts The main change to the python script is to set target relations high enough that filtering only happens once. Pausing sieving jobs to filter 3-5 times is a waste of wall clock time. At 165+ digits, this is set higher than the minimum to build a matrix, to make for smaller matrices. I haven't done too much else, though I did reduce the digit cutoffs for each large-prime size; I don't mess with lim or lambda choices. So, the changes are simple, nothing like the meticulous fine-tuning that I've attempted on CADO. Last fiddled with by VBCurtis on 2019-09-01 at 15:12
2019-09-01, 16:31   #5
EdH

"Ed Hall"
Dec 2009

31×109 Posts

Quote:
 Originally Posted by VBCurtis The main change to the python script is to set target relations high enough that filtering only happens once. Pausing sieving jobs to filter 3-5 times is a waste of wall clock time. At 165+ digits, this is set higher than the minimum to build a matrix, to make for smaller matrices. I haven't done too much else, though I did reduce the digit cutoffs for each large-prime size; I don't mess with lim or lambda choices. So, the changes are simple, nothing like the meticulous fine-tuning that I've attempted on CADO.
I did some changes long ago also, but memory isn't serving me well ATM. I know I edited the code to work with msieve mpi LA runs (I renamed that one so it wouldn't get mixed up with the original), but that was back when gigabit switching helped between machines (Pentium 4 days).

I also remember tweaking def-par.txt, which was suggested within its description, but I don't think there is any target info there.

 2019-09-04, 03:19 #6 EdH     "Ed Hall" Dec 2009 Adirondack Mtns 64638 Posts Since the editing of the first post in the thread doesn't flag a new post signal, I'm bumping the thread to mention that three sets of tests are now complete. This is all I plan to post for the near term.

 Similar Threads Thread Thread Starter Forum Replies Last Post CRGreathouse Factoring 3 2018-02-05 14:55 BotXXX Factoring 25 2015-09-02 08:27 mickfrancis Factoring 2 2015-01-11 18:31 WraithX Msieve 18 2012-05-20 22:19 ixfd64 Factoring 1 2004-04-27 09:41

All times are UTC. The time now is 07:24.

Tue Oct 27 07:24:24 UTC 2020 up 47 days, 4:35, 0 users, load averages: 2.08, 1.67, 1.59