mersenneforum.org  

Old 2018-03-17, 19:38   #1
EdH
 
 
"Ed Hall"
Dec 2009
Adirondack Mtns

D7D₁₆ Posts
How I Run a Larger Factorization Using Msieve, gnfs and factmsieve.py on Several Ubuntu Machines

(Note: I expect to keep the first post of each of these "How I..." threads up-to-date with the latest version. Please read the rest of each thread to see what may have led to the current set of instructions.)

This thread will explain the steps I use to run msieve and gnfs on several computers which are already running Ubuntu and have msieve and the ggnfs package installed* and tested per:

How I Install msieve onto my Ubuntu Machines
and
How I Install ggnfs onto my Ubuntu Machines

*In this instance "install" is referring to the acquiring and compilation of the msieve and ggnfs packages only. The binaries and scripts will have to be called using their respective paths.

I will be creating the folders Math/factorMain and Math/factorMain/factorWork on every machine for this example. All of my machines can communicate via ssh, and I will be using sshfs to map the factorWork folder of the main machine onto the factorWork folder of each of the others. You can use other forms of mapping, but basically, each machine needs to see the factorWork folder of the main machine. Again, adjust anything you need to for local folders, etc.

In my case, I have developed scripts for all my machines, but for this thread, I will only supply the command lines that will run everything because every script is machine specific. A reader can build their own scripts easily from the commands given.

Sieving and Linear Algebra (LA) are controlled and driven by a separate copy of the factmsieve.py script on each machine. The script has limitations along with its assets. One limitation is that it only runs polynomial pair selection on a single machine. Assets include aggregation of relations and automatically running the LA and subsequent stages.

Since factmsieve.py only runs a single machine for polynomial pair selection, I use msieve on each machine to generate poly pairs, which are later combined, and the best is selected via a script from user chris2be8 in this thread. I also restrict the poly time instead of letting msieve choose, because msieve isn't aware of the other machines. Note that with a time limit I sometimes won't get the best poly that more time might find, and this method may not provide the best results for composites larger than 150 decimal digits. Where I use a value of "poly_deadline=300" in my example, a larger value may well be worth using.

First, let's make our folders and sub-folders. Open a terminal on each machine and type:
Code:
mkdir Math/factorMain
mkdir Math/factorMain/factorWork
Now, acquire the factmsieve.py script from:

Factoring Large Composite Numbers

Place a copy into each Math/factorMain folder on every machine. On every machine, using a text editor, open each factmsieve.py file and look for the following section:
Code:
# Set binary directory paths
GGNFS_PATH = '/home/<user>/Math/ggnfs/bin/'
MSIEVE_PATH = '/home/<user>/Math/msieve/'

# Set the number of CPU cores and threads
NUM_CORES = 4
THREADS_PER_CORE = 2

USE_CUDA = False
GPU_NUM = 0
MSIEVE_POLY_TIME_LIMIT = 0
MIN_NFS_BITS = 264
Make sure that the PATHs, COREs and CUDA are all set properly for each individual machine. Replace <user> above with your username.

Save/close the factmsieve.py file(s).

Go to the post referenced above and acquire a copy of refindpoly.pl.txt and remove the .txt from the name to leave refindpoly.pl. Place this file in the factorMain folder on the main machine.

Let's choose a composite and run it using three machines. For comparisons, I will use the 94 digit composite chosen for the CADO-NFS multi-machine example:
Code:
1975636228803860706131861386351317508435774072460176838764200263234956507563682801432890234281
In a terminal, on the main machine, go to the Math/factorMain/factorWork folder and enter:
Code:
echo "n: 1975636228803860706131861386351317508435774072460176838764200263234956507563682801432890234281" > comp94.n
Now go to the other machines and map the main machine's factorWork directory into the local machine's factorWork directory. I use sshfs in the following manner:
Code:
sshfs <mainmachine@IP>:Math/factorMain/factorWork ~/Math/factorMain/factorWork
Check to see if the map is working by going to the Math/factorMain folder and typing:
Code:
ls factorWork
If it is working you should see comp94.n listed. If it isn't, work out the trouble before continuing.

On the main machine, move into the factorWork folder. If you've run anything in this folder previously, there may be a hidden file that will cause trouble later. Use the following command to remove it:
Code:
rm -f .params
Now, let's start creating poly candidates from all three machines. On the main machine enter:
Code:
../../msieve/msieve -i comp94.n -s comp94.1 -nf comp94-1.fb -t 8 -np "poly_deadline=300 1,3000"
On the other machines, while in the factorWork folders, enter:
Code:
../../msieve/msieve -i comp94.n -s comp94.2 -nf comp94-2.fb -t 8 -np "poly_deadline=300 3001,6000"
and
Code:
../../msieve/msieve -i comp94.n -s comp94.3 -nf comp94-3.fb -t 8 -np "poly_deadline=300 6001,9000"
For more information on the above commands see the readme.nfs file in the msieve folder.
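
The three commands above differ only in the save-file suffix, the factor-base file name and the coefficient range. As a minimal sketch (not part of the original procedure), the Python helper below prints one such command per machine. The function name poly_commands and the fixed block of 3000 coefficients per machine are assumptions taken from this example; the msieve path is the relative path used above.

```python
def poly_commands(name, machines, block=3000, deadline=300, threads=8,
                  msieve="../../msieve/msieve"):
    """Return one msieve poly-selection command line per machine.

    Machine k (1-based) searches coefficients (k-1)*block+1 .. k*block,
    so no two machines share a range. All defaults mirror the example
    in this post and are assumptions, not msieve requirements.
    """
    commands = []
    for k in range(1, machines + 1):
        low = (k - 1) * block + 1
        high = k * block
        commands.append(
            f'{msieve} -i {name}.n -s {name}.{k} -nf {name}-{k}.fb '
            f'-t {threads} -np "poly_deadline={deadline} {low},{high}"'
        )
    return commands


if __name__ == "__main__":
    # Print the command to paste into a terminal on each machine.
    for line in poly_commands("comp94", 3):
        print(line)
```

Run with three machines, this reproduces the three command lines shown above; change the arguments to suit your own machine count and time limit.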

Five minutes after the start of the last machine, they should all be finished. Now we need the refindpoly.pl Perl script from chris2be8 that was placed in the main machine's factorMain folder earlier. Call it while in the factorWork folder:
Code:
perl ../refindpoly.pl comp94
This will choose the best polynomial and create the file comp94.poly. Now we're ready to run the factmsieve.py script to finish factoring the composite. On the main machine, while in the factorWork folder, enter:
Code:
python ../factmsieve.py comp94.n 1 3
The 1 and 3 signify that this is the first machine of three. It will sieve and perform post sieving activities. On the two other machines, from within the factorWork folder, enter:
Code:
python ../factmsieve.py comp94.n 2 3
and
Code:
python ../factmsieve.py comp94.n 3 3
They will sieve and stop after completing their last sieving processes. The first machine will continue through the final processing and the factors will be placed in the comp94.log file. You can review the log or you can see the factors with the following command:
Code:
grep -i "factor:" comp94.log
Code:
p45 factor: 179231227423414197451601378315047105853969879
p50 factor: 11022834899950977366949652871606409040980556071039
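
As a quick sanity check, the reported factors can be multiplied back together and compared with the composite. The sketch below is only an illustration: it assumes the factor lines look like the two shown above (a label, "factor:", then the digits), and the names extract_factors and product are hypothetical helpers, not part of msieve or factmsieve.py.

```python
import re

# The 94-digit composite from this example.
N = 1975636228803860706131861386351317508435774072460176838764200263234956507563682801432890234281

# Sample lines copied from the grep output above.
LOG_LINES = [
    "p45 factor: 179231227423414197451601378315047105853969879",
    "p50 factor: 11022834899950977366949652871606409040980556071039",
]

# Assumed line shape: anything, then "factor:", then a run of digits.
FACTOR_RE = re.compile(r"factor:\s*(\d+)", re.IGNORECASE)


def extract_factors(lines):
    """Collect the integers that follow 'factor:' on each matching line."""
    factors = []
    for line in lines:
        match = FACTOR_RE.search(line)
        if match:
            factors.append(int(match.group(1)))
    return factors


def product(values):
    """Multiply a list of integers together (1 for an empty list)."""
    result = 1
    for value in values:
        result *= value
    return result


if __name__ == "__main__":
    factors = extract_factors(LOG_LINES)
    print("factors found:", len(factors))
    print("product equals composite:", product(factors) == N)
```

To check a real run, read the lines from comp94.log instead of using the hard-coded sample list.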

Last fiddled with by EdH on 2019-08-31 at 21:15
Old 2019-08-19, 10:31   #2
aokle
 
Aug 2019

3 Posts
I do not know about the max_coeff=9000, Help Me

Hello EdH, regarding "msieve -i comp94.n -s comp94.3 -nf comp94-3.fb -t 8 -np "poly_deadline=300 6001,9000"": can max_coeff be set to any integer?
Is "-t 8" = (NUM_CORES = 4) * (THREADS_PER_CORE = 2)? When I factor an RSA 512-bit number,
is poly_deadline=300 acceptable?

Thank You.
Old 2019-08-19, 13:34   #3
xilman
Bamboozled!
 
 
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

3²×11×103 Posts

Quote:
Originally Posted by EdH View Post
...

This thread will explain the steps I use to run msieve and gnfs on several computers which are already running Ubuntu and have msieve and the ggnfs package installed* and tested per:

...
Excellent!


I've been using factMsieve.pl et al. for years but haven't yet accumulated enough round tuits to automate it to the degree which you have achieved.
Old 2019-08-19, 21:50   #4
EdH
 
 
"Ed Hall"
Dec 2009
Adirondack Mtns

6575₈ Posts

Quote:
Originally Posted by aokle View Post
Hello EdH, regarding "msieve -i comp94.n -s comp94.3 -nf comp94-3.fb -t 8 -np "poly_deadline=300 6001,9000"": can max_coeff be set to any integer?
Is "-t 8" = (NUM_CORES = 4) * (THREADS_PER_CORE = 2)? When I factor an RSA 512-bit number,
is poly_deadline=300 acceptable?

Thank You.
Hi aokle,

RSA-512 is a considerably larger number (155 decimal digits) than my example, so you would need to adjust your parameters accordingly. A poly_deadline of 300 (5 minutes) would most assuredly not produce a good enough polynomial pair for your number. I'm not sure I understand the other question, but you should adjust all the parameters based on how many machines will be used and what range you would like to use for your polynomial pair search.

Having said all the above, I've moved to using CADO-NFS across several machines and let it make "almost" all the parameter choices. All but the Linear Algebra phase runs pretty much automatically across all machines. LA runs on the server machine only. I have the setup for CADO-NFS in a similar thread to this one.

Ed
Old 2019-08-19, 21:56   #5
EdH
 
 
"Ed Hall"
Dec 2009
Adirondack Mtns

3·1,151 Posts

Quote:
Originally Posted by xilman View Post
Excellent!


I've been using factMsieve.pl et al. for years but haven't yet accumulated enough round tuits to automate it to the degree which you have achieved.
Thanks xilman. I've never really played with the .pl version. I came on the scene when Brian was just about finalizing his .py code and got more familiar with it. Now I pretty much just use CADO-NFS, although I can factor a large composite quicker using a hybrid CADO-NFS/msieve script I occasionally play with.
Old 2019-08-20, 12:20   #6
aokle
 
Aug 2019

3₁₆ Posts

Hi EdH,
Thank you for your help.
I'm a little uncertain about:
"poly_deadline=300 1,3000"
"poly_deadline=300 3001,6000"
"poly_deadline=300 6001,9000"

When I execute "./msieve --help", I get something like this:
poly_deadline=X stop searching after X seconds (0 means search forever)
X,Y same as 'min_coeff=X max_coeff=Y'

But I still do not understand:
Is the coeff range 1-9000?
Could max_coeff (9000) be another integer in your case?

Thank you.
Old 2019-08-20, 13:26   #7
EdH
 
 
"Ed Hall"
Dec 2009
Adirondack Mtns

3453₁₀ Posts

Quote:
Originally Posted by aokle View Post
Hi EdH,
Thank you for your help.
I'm a little uncertain about:
"poly_deadline=300 1,3000"
"poly_deadline=300 3001,6000"
"poly_deadline=300 6001,9000"

When I execute "./msieve --help", I get something like this:
poly_deadline=X stop searching after X seconds (0 means search forever)
X,Y same as 'min_coeff=X max_coeff=Y'

But I still do not understand:
Is the coeff range 1-9000?
Could max_coeff (9000) be another integer in your case?

Thank you.
Hi aokle,

If you run an instance without min_coeff/max_coeff, the coefficient value is chosen at random from 1 through a max_coeff that msieve picks based on the composite. If you are running multiple machines, there is a "slight" chance of more than one running the same coeff. To prevent this in my example, I set each machine to use a unique range. That way, all the random values of one machine are outside those of the others. As the composite gets larger, the max_coeff for the entire group of machines gets larger, but it is divided among all the machines using the min_coeff and max_coeff values.

The poly_deadline chosen was just for the example size composite. With three machines searching, that was roughly equal to 15 minutes of searching across all three machines. In practice, I choose the total time I want to search and divide that by the number of machines I'll be using, similar to the max/min_coeffs.
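
The division described here can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name split_search is hypothetical, and the total of 900 seconds and the overall max_coeff of 9000 are simply the values implied by the three-machine example in the first post.

```python
def split_search(total_seconds, max_coeff, machines):
    """Split a total search time and the range 1..max_coeff across machines.

    Returns (deadline_per_machine, [(min_coeff, max_coeff), ...]).
    """
    deadline = total_seconds // machines
    block = max_coeff // machines
    ranges = []
    for k in range(machines):
        low = k * block + 1
        # The last machine absorbs any remainder so the full range is covered.
        high = max_coeff if k == machines - 1 else (k + 1) * block
        ranges.append((low, high))
    return deadline, ranges


if __name__ == "__main__":
    deadline, ranges = split_search(900, 9000, 3)
    print(deadline, ranges)  # 300 [(1, 3000), (3001, 6000), (6001, 9000)]
```

Each machine would then be given poly_deadline equal to the returned deadline and its own min,max pair, as in the commands in the first post.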


Ed
Old 2019-08-21, 02:26   #8
aokle
 
Aug 2019

3 Posts
Excellent!

Quote:
Originally Posted by EdH View Post
Hi aokle,

If you run an instance without min_coeff/max_coeff, the coefficient value is chosen at random from 1 through a max_coeff that msieve picks based on the composite. If you are running multiple machines, there is a "slight" chance of more than one running the same coeff. To prevent this in my example, I set each machine to use a unique range. That way, all the random values of one machine are outside those of the others. As the composite gets larger, the max_coeff for the entire group of machines gets larger, but it is divided among all the machines using the min_coeff and max_coeff values.

The poly_deadline chosen was just for the example size composite. With three machines searching, that was roughly equal to 15 minutes of searching across all three machines. In practice, I choose the total time I want to search and divide that by the number of machines I'll be using, similar to the max/min_coeffs.


Ed
Excellent!

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.