mersenneforum.org

https://www.mersenneforum.org/showthread.php?t=23165

EdH 2018-03-17 19:38

How I Run a Larger Factorization Using Msieve, gnfs and factmsieve.py on Several Ubuntu Machines
 
(Note: I expect to keep the first post of each of these "How I..." threads up-to-date with the latest version. Please read the rest of each thread to see what may have led to the current set of instructions.)

This thread will explain the steps I use to run msieve and gnfs on several computers which are already running Ubuntu and have msieve and the ggnfs package installed* and tested per:

[URL="http://www.mersenneforum.org/showthread.php?t=23085"]How I Install msieve onto my Ubuntu Machines[/URL]
and
[URL="http://www.mersenneforum.org/showthread.php?t=23081"]How I Install ggnfs onto my Ubuntu Machines[/URL]

*In this instance, "install" refers only to acquiring and compiling the msieve and ggnfs packages. The binaries and scripts have to be called using their respective paths.

I will be creating the folders Math/factorMain and Math/factorMain/factorWork on every machine for this example. All of my machines can communicate via ssh, and I use sshfs to map the main machine's factorWork folder onto the factorWork folders of all the others. You can use other forms of mapping, but each machine needs to see the main machine's factorWork folder. Again, adjust anything you need to for local folders, etc.

In my case, I have developed scripts for all my machines, but for this thread I will only supply the command lines that run everything, because every script is machine-specific. Readers can easily build their own scripts from the commands given.

Sieving and Linear Algebra (LA) are controlled and driven by a separate copy of the factmsieve.py script on each machine. The script has some limitations along with its assets. One limitation is that it only uses one machine for polynomial pair selection. Its assets include aggregating the relations and automatically running the LA and subsequent stages.

Since factmsieve.py only uses a single machine for polynomial pair selection, I run msieve on each machine to generate poly pairs, which are later combined, and the best is selected via a script from user chris2be8 in [URL="http://www.mersenneforum.org/showthread.php?p=482157#post482157"]this thread[/URL]. I also restrict the poly search time instead of letting msieve choose it, because msieve isn't aware of the other machines. Note that this sometimes misses a better poly that more search time might have found, and this method may not give the best results for composites larger than about 150 digits. Where I use "poly_deadline=300" (300 seconds) in my example, a larger value may well be worth using.
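As a rough illustration of the "select the best" step, here is a hypothetical Python sketch. It is not chris2be8's refindpoly.pl, and it assumes that msieve's .p output files mark each candidate with a `# norm ... alpha ... e ... rroots ...` comment line, where e is the Murphy E score (higher is better). It only reports the best candidate; it does not create a .poly file.

```python
import glob
import re

# ASSUMPTION: msieve's .p output precedes each candidate with a comment line such as
#   # norm 1.234567e-13 alpha -6.543210 e 1.234e-08 rroots 3
# where "e" is the Murphy E score (higher is better).
SCORE_RE = re.compile(r"^#\s*norm\s+\S+\s+alpha\s+\S+\s+e\s+(\S+)")


def best_polynomial(paths):
    """Return (score, path, lines) for the candidate with the highest E score, or None."""
    best = None
    for path in paths:
        with open(path) as fh:
            lines = fh.read().splitlines()
        for i, line in enumerate(lines):
            match = SCORE_RE.match(line)
            if not match:
                continue
            score = float(match.group(1))
            # A candidate runs from its "# norm" line up to the next "# norm" line (or EOF).
            end = i + 1
            while end < len(lines) and not SCORE_RE.match(lines[end]):
                end += 1
            if best is None or score > best[0]:
                best = (score, path, lines[i:end])
    return best


if __name__ == "__main__":
    # Hypothetical usage: scan every .p file in the shared factorWork folder.
    result = best_polynomial(sorted(glob.glob("*.p")))
    if result is None:
        print("no scored candidates found")
    else:
        score, path, lines = result
        print(f"best E = {score:g} (from {path})")
        print("\n".join(lines))
```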

First, let's make our folders and sub-folders. Open a terminal on each machine and type:
[code]
mkdir Math/factorMain
mkdir Math/factorMain/factorWork
[/code]Now, acquire the factmsieve.py script from:

[URL="http://brg.a2hosted.com//oldsite/computing/factoring.php"]Factoring Large Composite Numbers[/URL]

Place a copy into each Math/factorMain folder on every machine. On every machine, using a text editor, open each factmsieve.py file and look for the following section:
[code]
# Set binary directory paths
GGNFS_PATH = '/home/<user>/Math/ggnfs/bin/'
MSIEVE_PATH = '/home/<user>/Math/msieve/'

# Set the number of CPU cores and threads
NUM_CORES = 4
THREADS_PER_CORE = 2

USE_CUDA = False
GPU_NUM = 0
MSIEVE_POLY_TIME_LIMIT = 0
MIN_NFS_BITS = 264
[/code]Make sure the paths, the core and thread counts, and the CUDA settings are correct for each individual machine. Replace <user> above with your username.

Save/close the factmsieve.py file(s).
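If you have several machines to configure, the same edit can be scripted. A minimal Python sketch, assuming each setting is a top-level `NAME = value` line as in the excerpt above; `set_config`, `patch_file` and the username `ed` are hypothetical, not part of factmsieve.py:

```python
import re
from pathlib import Path


def set_config(text, **settings):
    """Return `text` with each top-level `NAME = value` line replaced.

    Values are written with repr(), so strings get quotes while ints and
    booleans stay bare, matching the style of the factmsieve.py excerpt.
    """
    for name, value in settings.items():
        new_line = f"{name} = {value!r}"
        pattern = re.compile(rf"^{re.escape(name)}\s*=.*$", re.MULTILINE)
        # A function replacement avoids re treating backslashes in the value specially.
        text, count = pattern.subn(lambda _match: new_line, text)
        if count == 0:
            raise KeyError(f"{name} not found")
    return text


def patch_file(path, **settings):
    """Rewrite the given settings in a factmsieve.py file in place."""
    file = Path(path)
    file.write_text(set_config(file.read_text(), **settings))


if __name__ == "__main__":
    sample = (
        "GGNFS_PATH = '/home/<user>/Math/ggnfs/bin/'\n"
        "NUM_CORES = 4\n"
        "USE_CUDA = False\n"
    )
    # "ed" is a hypothetical username.
    print(set_config(sample, GGNFS_PATH="/home/ed/Math/ggnfs/bin/", NUM_CORES=8))
```

On each machine you would call something like `patch_file("Math/factorMain/factmsieve.py", NUM_CORES=4, THREADS_PER_CORE=2, USE_CUDA=False)` with that machine's values. Keep a backup first, since `patch_file` overwrites the file and replaces every matching top-level line.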

Go to chris2be8's post referenced above, download refindpoly.pl.txt and remove the .txt from the name to leave refindpoly.pl. Place this file in the factorMain folder on the main machine.

Let's choose a composite and run it using three machines. For comparison, I will use the 94-digit composite chosen for the CADO-NFS multi-machine example:
[code]
1975636228803860706131861386351317508435774072460176838764200263234956507563682801432890234281
[/code]In a terminal, on the main machine, go to the Math/factorMain/factorWork folder and enter:
[code]
echo "n:
1975636228803860706131861386351317508435774072460176838764200263234956507563682801432890234281" > comp94.n
[/code]Now go to the other machines and map the main machine's factorWork directory into the local machine's factorWork directory. I use sshfs in the following manner:
[code]
sshfs <user>@<mainmachineIP>:Math/factorMain/factorWork ~/Math/factorMain/factorWork
[/code]Check to see if the map is working by going to the Math/factorMain folder and typing:
[code]
ls factorWork
[/code]If it is working you should see comp94.n listed. If it isn't, work out the trouble before continuing.
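The same check can be scripted on each secondary machine (the main machine's factorWork is a local folder, not a mount). A small Python sketch; the function name `check_shared_folder` is hypothetical, and it assumes the folder layout used in this thread:

```python
import os


def check_shared_folder(work_dir, expected_file):
    """Return a list of problems; an empty list means the shared folder looks usable."""
    if not os.path.isdir(work_dir):
        return [f"{work_dir} does not exist"]
    problems = []
    # On the secondary machines, factorWork should be an sshfs mount point.
    if not os.path.ismount(work_dir):
        problems.append(f"{work_dir} is not a mount point (is sshfs running?)")
    # The number file written on the main machine should be visible through the mount.
    if not os.path.isfile(os.path.join(work_dir, expected_file)):
        problems.append(f"{expected_file} is not visible in {work_dir}")
    return problems


if __name__ == "__main__":
    work = os.path.expanduser("~/Math/factorMain/factorWork")
    problems = check_shared_folder(work, "comp94.n")
    print("\n".join(problems) if problems else "shared folder looks OK")
```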

On the main machine, move into the factorWork folder. If you've run anything in this folder previously, there may be a hidden file that will cause trouble later. Use the following command to remove it:
[code]
rm -f .params
[/code]Now, let's start creating poly candidates from all three machines. On the main machine enter:
[code]
../../msieve/msieve -i comp94.n -s comp94.1 -nf comp94-1.fb -t 8 -np "poly_deadline=300 1,3000"
[/code]On the second machine, while in its factorWork folder, enter:
[code]
../../msieve/msieve -i comp94.n -s comp94.2 -nf comp94-2.fb -t 8 -np "poly_deadline=300 3001,6000"
[/code]and on the third machine, also from its factorWork folder:
[code]
../../msieve/msieve -i comp94.n -s comp94.3 -nf comp94-3.fb -t 8 -np "poly_deadline=300 6001,9000"
[/code]For more information on the above commands see the readme.nfs file in the msieve folder.
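The three command lines above follow one pattern: the coefficient range 1-9000 is split into non-overlapping pieces, and each machine gets an equal share of the total search time. A hypothetical Python sketch that generates such lines for any number of machines; the function names, the 900-second total and the fixed `-t 8` are illustrative assumptions (set `threads` to suit each machine):

```python
def split_range(lo, hi, parts):
    """Split the inclusive range lo..hi into `parts` contiguous, non-overlapping ranges."""
    total = hi - lo + 1
    if not 1 <= parts <= total:
        raise ValueError("parts must be between 1 and the size of the range")
    base, extra = divmod(total, parts)
    ranges, start = [], lo
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # spread any remainder over the first ranges
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def msieve_commands(name, machines, max_coeff, total_seconds, threads=8):
    """Build one msieve -np command line per machine, in the style shown above."""
    deadline = total_seconds // machines  # equal share of the total search time
    commands = []
    for i, (lo, hi) in enumerate(split_range(1, max_coeff, machines), start=1):
        commands.append(
            f"../../msieve/msieve -i {name}.n -s {name}.{i} -nf {name}-{i}.fb "
            f'-t {threads} -np "poly_deadline={deadline} {lo},{hi}"'
        )
    return commands


if __name__ == "__main__":
    for command in msieve_commands("comp94", machines=3, max_coeff=9000, total_seconds=900):
        print(command)
```

With 3 machines, max_coeff=9000 and 900 seconds in total, it prints exactly the three command lines shown above.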

Five minutes after the start of the last machine, they should all be finished. Next we use chris2be8's perl script from [URL="http://www.mersenneforum.org/showthread.php?p=482157#post482157"]this post[/URL], which should already be in the main machine's factorMain folder as refindpoly.pl. Call it while in the factorWork folder:
[code]
perl ../refindpoly.pl comp94
[/code]This will choose the best polynomial and create the file comp94.poly. Now we're ready to run the factmsieve.py script to finish factoring the composite. On the main machine, while in the factorWork folder, enter:
[code]
python ../factmsieve.py comp94.n 1 3
[/code]The 1 and 3 signify that this is the first machine of three. It will sieve and then perform the post-sieving steps. On the other two machines, from within the factorWork folder, enter:
[code]
python ../factmsieve.py comp94.n 2 3
[/code]and
[code]
python ../factmsieve.py comp94.n 3 3
[/code]They will sieve and stop after completing their last sieving processes. The first machine will continue through the final processing and the factors will be placed in the comp94.log file. You can review the log or you can see the factors with the following command:
[code]
grep -i "factor:" comp94.log
[/code]For this composite, the factors are:
[code]
p45 factor: 179231227423414197451601378315047105853969879
p50 factor: 11022834899950977366949652871606409040980556071039
[/code]
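To double-check the result, a short Python sketch confirms that the two reported factors multiply back to the 94-digit composite and runs a Miller-Rabin probable-prime test on each. The choice of bases is arbitrary; for numbers this size, passing means "very probably prime", not a proof:

```python
def is_probable_prime(n, bases=(2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)):
    """Miller-Rabin test. False means n is certainly composite;
    True means n passed every base and is very probably prime (not a proof)."""
    if n < 2:
        return False
    for p in bases:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in bases:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


N = 1975636228803860706131861386351317508435774072460176838764200263234956507563682801432890234281
p45 = 179231227423414197451601378315047105853969879
p50 = 11022834899950977366949652871606409040980556071039

print("product equals N:", p45 * p50 == N)
for f in (p45, p50):
    print(f"p{len(str(f))} probable prime: {is_probable_prime(f)}")
```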

aokle 2019-08-19 10:31

I don't understand max_coeff=9000, please help
 
Hello EdH, in "msieve -i comp94.n -s comp94.3 -nf comp94-3.fb -t 8 -np "poly_deadline=300 6001,9000"", can max_coeff be set to any integer?
Is "-t 8" = (NUM_CORES = 4) * (THREADS_PER_CORE = 2)? When I factor RSA 512-bit, is poly_deadline=300 acceptable?

Thank you.

xilman 2019-08-19 13:34

[QUOTE=EdH;482640]...

This thread will explain the steps I use to run msieve and gnfs on several computers which are already running Ubuntu and have msieve and the ggnfs package installed* and tested per:

...

[/QUOTE]Excellent! :tu: :bow:


I've been using [C]factMsieve.pl[/C] et al. for years but haven't yet accumulated enough round tuits to automate it to the degree which you have achieved.

EdH 2019-08-19 21:50

[QUOTE=aokle;523927]Hello EdH, "msieve -i comp94.n -s comp94.3 -nf comp94-3.fb -t 8 -np "poly_deadline=300 6001,9000"". The max_coeff can set any integer?
"-t 8"=(NUM_CORES = 4)*(THREADS_PER_CORE = 2) ? when I calculate RSA 512bit,
poly_deadline=300 is acceptable?

Thank You.[/QUOTE]Hi aokle,

RSA-512 is a much larger number (about 155 digits) than my 94-digit example, so you would need to adjust your parameters accordingly. A poly_deadline of 300 seconds (5 minutes) would most assuredly not find a good enough polynomial pair for your number. I'm not sure I understand your other question, but you should adjust all the parameters based on how many machines will be used and what coefficient range you would like to use for your polynomial pair search.

Having said all the above, I've moved to using CADO-NFS across several machines and let it do "almost" all the parameter choice. All but the Linear Algebra phase will be run pretty much automatically across all machines. LA will be run on the server machine only. I have the [URL="https://www.mersenneforum.org/showthread.php?t=23091"]setup for CADO-NFS[/URL] in a similar thread to this one.

Ed

EdH 2019-08-19 21:56

[QUOTE=xilman;523935]Excellent! :tu: :bow:


I've been using [C]factMsieve.pl[/C] et al. for years but haven't yet accumulated enough round tuits to automate it to the degree which you have achieved.[/QUOTE]
Thanks xilman. I've never really played with the .pl version. I came on the scene when Brian was just about finalizing his .py code and got more familiar with it. Now I pretty much just use CADO-NFS, although I can factor a large composite quicker using a hybrid CADO-NFS/msieve script I occasionally play with.

aokle 2019-08-20 12:20

Hi EdH,
Thank you for your help.
I'm a little uncertain about:
"poly_deadline=300 1,3000"
"poly_deadline=300 3001,6000"
"poly_deadline=300 6001,9000"

When I execute "./msieve --help", I see something like this:
poly_deadline=X stop searching after X seconds (0 means search forever)
X,Y same as 'min_coeff=X max_coeff=Y'

But I still do not understand:
Is the coeff range 1-9000?
Could max_coeff (9000) be a different integer in your case?

Thank you.

EdH 2019-08-20 13:26

[QUOTE=aokle;523999]Hi EdH,
...
but I stil do not understand:
the coeff range is 1-9000 ?
max_coeff(9000) can be another integer at your case?
...[/QUOTE]Hi aokle,

If you run an instance without min_coeff/max_coeff, the coefficient value is chosen at random from 1 through a max_coeff that msieve chooses based on the composite. If you are running multiple machines, there is a "slight" chance of more than one running the same coeff. To prevent this in my example, I set each machine to use a unique range. That way, the random choices of one machine can never overlap those of the others. As the composite gets larger, the max_coeff for the entire group of machines gets larger, but it is divided among all the machines using min_coeff= and max_coeff=.

The poly_deadline chosen was just for the example size composite. With three machines searching, that was roughly equal to 15 minutes of searching across all three machines. In practice, I choose the total time I want to search and divide that by the number of machines I'll be using, similar to the max/min_coeffs.


Ed

aokle 2019-08-21 02:26

Excellent!
 
[QUOTE=EdH;524003]Hi aokle,
...
Ed[/QUOTE]

Excellent!


All times are UTC.