mersenneforum.org  

2017-09-27, 01:48   #1
jacky
Sep 2017
3² Posts

Msieve parallel poly selection with several GPU cards

Hello everyone,
I would like some advice about msieve.
I want to compile msieve to work with several GPU cards. How do I run the polynomial selection stage in parallel?

Thanks.
2017-09-27, 06:45   #2
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England
7²·131 Posts

Divide up the c5 range manually and run one process with -g 0, another with -g 1, another with -g 2 ...
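Dividing the range by hand is easy to get wrong with many cards, so the split could be computed instead. The following is a minimal sketch in Python, not part of msieve: `split_range` is a hypothetical helper name, and the range 1 to 2,000,000 with two cards is only an example.

```python
def split_range(lo, hi, n_parts):
    """Split the inclusive integer range [lo, hi] into n_parts contiguous,
    non-overlapping inclusive sub-ranges of near-equal size."""
    if n_parts < 1 or hi < lo:
        raise ValueError("need n_parts >= 1 and hi >= lo")
    total = hi - lo + 1
    if n_parts > total:
        raise ValueError("more parts than values in the range")
    base, extra = divmod(total, n_parts)
    ranges = []
    start = lo
    for i in range(n_parts):
        # The first `extra` parts absorb the remainder, one value each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# One sub-range per GPU index; the index is what would be passed to -g.
for gpu, (lo, hi) in enumerate(split_range(1, 2_000_000, 2)):
    print(f"GPU {gpu}: leading coefficients {lo} to {hi}")
```

Each tuple would then go to a separate msieve process, one per card.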
2017-09-27, 06:49   #3
VBCurtis
"Curtis"
Feb 2005
Riverside, CA
4,861 Posts

Just give each card a different coefficient range to search.
If you leave the stage1 norm at default, and are searching for a large input number, msieve splits the search into small slices (say, 40 slices, searching just one) such that 5 different cards will each search a random 1/40th, producing little overlap in efforts even if they all searched the same range.

I suggest altering stage 1 norm (divide default by 10 to 30), and having each GPU search a different coefficient range.

Last fiddled with by VBCurtis on 2017-09-27 at 06:50
2017-09-28, 01:17   #4
jacky
Sep 2017
3² Posts

Quote:
Originally Posted by VBCurtis
Just give each card a different coefficient range to search.
If you leave the stage1 norm at default, and are searching for a large input number, msieve splits the search into small slices (say, 40 slices, searching just one) such that 5 different cards will each search a random 1/40th, producing little overlap in efforts even if they all searched the same range.

I suggest altering stage 1 norm (divide default by 10 to 30), and having each GPU search a different coefficient range.

If I have 10 cards, I want to split the search into 10 slices and give each card a different range, rather than have each card search a random slice. Is that right?
2017-09-28, 01:53   #5
jacky
Sep 2017
3² Posts

Quote:
Originally Posted by fivemack
Divide up the c5 range manually and run one process with -g 0, another with -g 1, another with -g 2 ...

How do I give each process its range? Is it passed as a parameter to msieve?
2017-09-28, 04:14   #6
VBCurtis
"Curtis"
Feb 2005
Riverside, CA
4,861 Posts

Invoke it like so:
./msieve -np1 -nps 1,1000
Once this starts to run, cancel it; you only want a default run to learn what msieve chooses for default stage1 norm and stage2 norm (see msieve.log for those numbers). As soon as msieve.log is created, you can ctrl-c the run.
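Reading the two default norms out of msieve.log could be scripted roughly as follows. This is a minimal sketch in Python, assuming the log lines look like the "max stage 1 norm: 5.07e+13" lines in the log excerpt posted later in this thread; `read_default_norms` and the regular expression are illustrative and not part of msieve.

```python
import re

# Matches e.g. "... max stage 1 norm: 5.07e+13" (format assumed from the
# log excerpt in this thread).
NORM_RE = re.compile(r"max stage (\d) norm:\s*([0-9.]+e[+-]?\d+)", re.IGNORECASE)


def read_default_norms(log_text):
    """Return {stage_number: norm} for every 'max stage N norm' line found.
    If the log holds several runs, the last value seen for a stage wins."""
    norms = {}
    for line in log_text.splitlines():
        match = NORM_RE.search(line)
        if match:
            norms[int(match.group(1))] = float(match.group(2))
    return norms


sample = (
    "Mon Sep 18 09:31:15 2017 max stage 1 norm: 5.07e+13\n"
    "Mon Sep 18 09:31:15 2017 max stage 2 norm: 2.34e+13\n"
)
norms = read_default_norms(sample)
print(norms)
```

In practice the text would come from reading msieve.log in the working directory rather than from a sample string.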

./msieve -np1 -nps "stage1norm=xeyy stage2norm=uevv 1000000,10000000" -g0 -s gpu1polyfile

-np1 is the first step, run on the GPU; if you divide the default stage1norm by 10, you'll be better off; depending on how fast the CPU is, you might divide by 20 or even 30 to reduce the amount of work the GPU sends to the CPU. Experiment.

-nps is the size optimizing step, where each stage 1 hit is refined to make a polynomial. Only the best hits from this step deserve the intensive root-optimizing step (which is generally done separately). I usually set the stage 2 norm such that msieve saves 100 lines per day to disk (specifically, the gpu1polyfile.ms file). With your super-gpu-setup, I'd set stage2 even tighter; dividing the default stage2norm by 30 might be enough, or might produce too much output. Each invocation of msieve (1 per GPU) will be using CPU to do the -nps step. If this is too much work for your CPU, you can omit the -nps flag above, and the raw GPU output will be saved to disk. These files can get *big*; it makes more sense to me to run -np1 and -nps at the same time, but you might saturate a quad-core with 10 separate msieve instances and 10 GPUs. If you reduce stage1norm, you also reduce CPU load because fewer hits are being sent from GPU stage to CPU stage.

The numbers after stage2norm above, e.g. 1000000,10000000 are the coefficient range of c5 you're asking that GPU to run. Make the ranges non-overlapping for best results.
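With several cards, the per-GPU command lines can be generated rather than typed. The sketch below is Python and only builds and prints the commands; it does not run msieve. The flags and the `stage1norm=`/`stage2norm=` spelling are copied as written in this thread (the msieve readme is the authority on the exact key names), and `build_command`, the example ranges, and the norm values are assumptions chosen for illustration.

```python
import shlex


def build_command(gpu, lo, hi, stage1, stage2, msieve="./msieve"):
    """Build the argument list for one msieve process bound to one GPU.
    Key names in the quoted string follow the thread's spelling."""
    poly_args = f"stage1norm={stage1:.2e} stage2norm={stage2:.2e} {lo},{hi}"
    return [
        msieve, "-np1", "-nps", poly_args,
        "-g", str(gpu),
        "-s", f"gpu{gpu}polyfile",  # a distinct save file per GPU
    ]


# Example ranges for two cards; the norms are the thread's sample defaults
# (5.07e+13 and 2.34e+13) divided by 10 and 30 respectively.
ranges = [(1, 1_000_000), (1_000_000, 2_000_000)]
for gpu, (lo, hi) in enumerate(ranges):
    print(shlex.join(build_command(gpu, lo, hi, 5.07e12, 7.80e11)))
```

Returning a list instead of a single string keeps the quoted parameter string intact if the command is later passed to `subprocess.Popen`.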

Run one invocation per GPU, with different save files. After you're done with the GPU phase, cat all the files together, and run:
./msieve -npr -s nameofbigfileofpolys
(The file needs to have the .ms ending, but don't put the .ms part in after -s.)
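The "cat all the files together" step can also be done portably in Python, which helps on Windows where `cat` may be missing. This is a minimal sketch: `concatenate_ms_files` is a hypothetical helper, and the demo writes placeholder text into a temporary directory rather than real msieve output.

```python
import tempfile
from pathlib import Path


def concatenate_ms_files(inputs, combined_stem):
    """Concatenate the given .ms files into '<combined_stem>.ms'.
    A newline is added after any file that does not end with one, so
    lines from different files never run together."""
    out_path = Path(f"{combined_stem}.ms")
    with out_path.open("w") as out:
        for name in inputs:
            text = Path(name).read_text()
            out.write(text)
            if text and not text.endswith("\n"):
                out.write("\n")
    return out_path


# Demo with placeholder contents (not real stage 1/size-opt output).
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "gpu0polyfile.ms").write_text("placeholder line from GPU 0\n")
    (tmp / "gpu1polyfile.ms").write_text("placeholder line from GPU 1")
    combined = concatenate_ms_files(
        sorted(tmp.glob("gpu*polyfile.ms")), tmp / "allpolys"
    )
    demo_lines = combined.read_text().splitlines()
    print(len(demo_lines), "lines written to", combined.name)
```

The stem passed as `combined_stem` is the name that would then be given to -s, without the .ms ending.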

This last step is CPU-only (and single-threaded!), and quite slow; the bigger the input number, the longer the root-opt takes. On a C180, 100 lines in the input file might take an hour or more.

When complete, a file called nameofbigfileofpolys.p will have a list of every optimized poly generated in the last step, and msieve.fb will have the highest-scoring poly, ready for NFS work.

I likely omitted important details in this explanation; please ask for more info if I've confused you.

Last fiddled with by VBCurtis on 2017-09-28 at 04:15
2017-09-28, 07:40   #7
jacky
Sep 2017
9₁₆ Posts

Thanks for your detailed answer.
How is the upper limit of the coefficient range calculated? I tried to calculate it according to the documentation, but my result does not match the number in the log file, shown here:

Mon Sep 18 09:31:15 2017 Msieve v. 1.53 (SVN 998)
Mon Sep 18 09:31:15 2017 random seeds: a04bee34 3009d85b
Mon Sep 18 09:31:15 2017 factoring 1383685099763667105632400713678004435475890381619245554755184129298742220082186377 (82 digits)
Mon Sep 18 09:31:15 2017 searching for 15-digit factors
Mon Sep 18 09:31:15 2017 commencing number field sieve (82-digit input)
Mon Sep 18 09:31:15 2017 commencing number field sieve polynomial selection
Mon Sep 18 09:31:15 2017 polynomial degree: 4
Mon Sep 18 09:31:15 2017 max stage 1 norm: 5.07e+13
Mon Sep 18 09:31:15 2017 max stage 2 norm: 2.34e+13
Mon Sep 18 09:31:15 2017 min E-value: 9.06e-08
Mon Sep 18 09:31:15 2017 poly select deadline: 281
Mon Sep 18 09:31:15 2017 time limit set to 0.08 CPU-hours
Mon Sep 18 09:31:15 2017 expecting poly E from 1.99e-07 to > 2.29e-07
Mon Sep 18 09:31:15 2017 searching leading coefficients from 1 to 1925891

My questions are:
1. How is the number "1925891" obtained?
2. Which number should be divided: the stage 1 norm "5.07e+13" or "1925891"? I'm confused.
3. For this example, if I have 2 GPU cards, should I invoke msieve like this:
../msieve.exe -np1 -nps "stage1norm=5.07e+13 stage2norm=2.34e+13 1,1000000" -g 0 -s gpu0polyfile

../msieve.exe -np1 -nps "stage1norm=5.07e+13 stage2norm=2.34e+13 1000000,2000000" -g 1 -s gpu1polyfile

Is that right?

Last fiddled with by jacky on 2017-09-28 at 08:29
2017-09-28, 14:13   #8
VBCurtis
"Curtis"
Feb 2005
Riverside, CA
11375₈ Posts

In short, yes it's right!

A C82 isn't even worth running GNFS on, let alone using a GPU for poly select, so this isn't a good test case.

I have no idea how msieve gets "1925891". That number is also not relevant: it's an arbitrary choice msieve makes, and the start and stop values on our command line supersede it. If we don't put numbers inside the quotes to tell msieve which coefficient values to search, it would for this number choose to search from 1 to 1925891. Instead, we're splitting the range into 1-1million on GPU0, 1M-2M on GPU1, etc. There's no cap on how high you can search, but there's not enough NFS-time-savings to justify spending vast efforts on poly searching (3-5% of expected project length is the rule of thumb: say, a GPU-day on a C155, 3-4 GPU-days on a C170, etc.). Your GPU rig is overkill for any project smaller than C180; a single GPU for a week or less is plenty for anything smaller.
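The 3-5% rule of thumb is simple arithmetic, sketched here in Python. The 30-day project length is a made-up example figure, not a measurement, and `poly_select_budget` is a hypothetical helper name.

```python
def poly_select_budget(expected_project_days, low=0.03, high=0.05):
    """Return the (low, high) time budget for polynomial selection,
    taken as 3% to 5% of the expected total project length."""
    return expected_project_days * low, expected_project_days * high


# Hypothetical project expected to take 30 days in total.
lo_days, hi_days = poly_select_budget(30.0)
print(f"poly select budget: {lo_days:.1f} to {hi_days:.1f} days")
```

With several GPUs working on disjoint ranges, that budget in GPU-days is divided across the cards.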

The numbers you care about are the max stage1 norm and max stage2 norm. Divide stage 1 by 10 (that is, reduce the exponent by 1) and divide stage 2 by 30 or so. That'll get you close to the correct amount of output.
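Applied to the defaults in the log above, the suggested divisors work out as follows. This is a minimal Python sketch; `tightened_norms` is an illustrative name, and the divisors 10 and 30 are the starting points suggested in this thread, to be adjusted by experiment.

```python
def tightened_norms(stage1_default, stage2_default,
                    stage1_divisor=10.0, stage2_divisor=30.0):
    """Divide msieve's default norms by the chosen factors."""
    return stage1_default / stage1_divisor, stage2_default / stage2_divisor


# Defaults taken from the log excerpt: 5.07e+13 and 2.34e+13.
s1, s2 = tightened_norms(5.07e13, 2.34e13)
print(f"stage 1 norm: {s1:.2e}")
print(f"stage 2 norm: {s2:.2e}")
```

Dividing the stage 1 norm by 10 only lowers its exponent by one, while dividing the stage 2 norm by 30 changes the mantissa as well, which is why the second value is worth computing rather than guessing.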
2017-09-29, 13:05   #9
jasonp
Tribal Bullet
Oct 2004
3,541 Posts

The upper bound on the leading coefficient is a function of the minimum and maximum skew.

For polynomial degree d, once you choose a leading coefficient the next coefficient after it is easy to make small, but all the others will be the same (too large) size on average when the skew is 1. As the skew gets larger the third highest coefficient starts to reduce in size if you use Kleinjung's algorithms, but if the skew is too large then the fourth-to-highest coefficient takes over as the largest contributor to the size norm. That bounds the acceptable range of skews, and by extension the range of leading coefficients. Details are in Kleinjung's 2006 Math. Comp. paper.
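How the skew shifts the weight between coefficients can be seen with a toy calculation. The Python sketch below uses the common convention of weighting coefficient a_i of a degree-d polynomial by skew^(i - d/2); whether this matches msieve's internal size measure exactly is an assumption, and the toy coefficients are invented purely for illustration.

```python
def skewed_sizes(coeffs, skew):
    """coeffs[i] is the coefficient of x**i.
    Returns |a_i| * skew**(i - d/2) for each i, where d is the degree."""
    d = len(coeffs) - 1
    return [abs(a) * skew ** (i - d / 2) for i, a in enumerate(coeffs)]


def dominant_index(coeffs, skew):
    """Index of the coefficient with the largest skew-weighted size."""
    sizes = skewed_sizes(coeffs, skew)
    return max(range(len(sizes)), key=sizes.__getitem__)


# Toy degree-3 polynomial 8 + 4x + 2x^2 + x^3 (not a real NFS polynomial).
toy = [8, 4, 2, 1]
for skew in (1.0, 2.0, 4.0):
    sizes = skewed_sizes(toy, skew)
    print(f"skew {skew}: " + ", ".join(f"{s:.3g}" for s in sizes))
```

For this toy input the low-order coefficients dominate at skew 1, the weighted sizes are roughly balanced at skew 2, and the leading coefficient dominates at skew 4, which is the kind of trade-off that bounds the useful range of skews.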

Paul Zimmermann has empirically observed that one can find better size properties on average if the leading coefficient is deliberately too small given the target skew; I don't remember the details.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.