mersenneforum.org Optimization and selection of the top scoring polynomial from a larger set
 Register FAQ Search Today's Posts Mark Forums Read

2021-10-04, 20:58   #1
Plutie

"Evan"
Dec 2020
Montreal

23·32 Posts
Optimization and selection of the top scoring polynomial from a larger set

I'm working on a bash script to facilitate optimization and selection of the top scoring polynomial from a larger set (i.e. the msieve.dat.ms file from running msieve -np1 -nps). I have the actual optimization process done, but I'm still trying to figure out what the optimal choices for sopteffort and ropteffort based on the composite's size. I think that VBCurtis' optimized CADO parameters would be useful as a first reference, but those only have ropteffort values. Does anyone have an idea of what the scaling should be for sopteffort, as well as for ropteffort?

I'm attaching my current script to this post. The only requirement right now is that you have an input file that will work with CADO's sopt binary.

My current plans are:
- figure out a way to convert msieve's nps output into CADO-compatible polynomials
- split the sopt effort over multiple threads (currently only ropt has a threads parameter)
- implementing duplicate polynomial removal before every step

P.S. I'm not sure if this belongs here or if it should be moved to the programming sub-forum instead.
Attached Files
 Scripts.zip (1.2 KB, 39 views)

Last fiddled with by EdH on 2021-10-06 at 12:08 Reason: forgot something!

2021-10-04, 21:23   #2
EdH

"Ed Hall"
Dec 2009

2·3·17·41 Posts

Quote:
 Originally Posted by Plutie . . . Feedback/advice is appreciated, thank you! P.S. I'm not sure if this belongs here or if it should be moved to the programming sub-forum instead.
In my spin scripts, based on Max0526's spin routine, I use sopt 0, 1, 10, 100 and ropt 0, 0.01, 1, 2, 10, 100. The initial discussion on the script is here. My current work is much more complicated, adding in some Msieve ropt. But, I'm not getting any better "spins." I look forward to seeing your script to see if I can find something to move mine forward. I'm also waiting to hear from Max on a couple notes I sent.

I think this thread might be better in the "Factoring" sub-forum, only because it involves more than just CADO-NFS, but let's leave it here for now and if the time comes to move it, I can do so.

 2021-10-04, 22:20 #3 VBCurtis     "Curtis" Feb 2005 Riverside, CA 142116 Posts Below 145 digits, I find that using sopt in a CADO poly select is slower than using that time to search a wider ad-range. I've just recently begun trying to determine when sopt=1 provides a benefit greater than its time cost. By C180+, a high number for sopt like 8 or 10 seems to be worthwhile and perhaps even higher, so there ought to be some kind of scaling-up in between.
2021-10-06, 03:19   #4
Plutie

"Evan"
Dec 2020
Montreal

7210 Posts

Got the script into a working state, should be ready for use for anyone!

Unzip the scripts zip into your CADO build/[user]/polyselect directory. All you need in order to run it is a file from either msieve's size-optimization output (msieve -np1 -nps) or already in CADO's poly format. Run the script with ./polyFinder.sh (might need to run chmod +x polyFinder.sh first.) It should just run through all the steps on its own. The default sopteffort and ropteffort are higher, but I'll be fixing them up eventually. They shouldn't necessarily go any higher than what they're currently set to though.

Enjoy!

P.S. @mods: could the thread be renamed to something more appropriate? (and moved if necessary.)

Forgot to actually attach the zip, here it is. Could this be merged into the post above? (Done -EdH)
Attached Files
 polyopt-1.0.0.zip (2.0 KB, 36 views)

Last fiddled with by EdH on 2021-10-06 at 12:12 Reason: OP Request for merge

 2021-10-06, 13:29 #5 EdH     "Ed Hall" Dec 2009 Adirondack Mtns 2×3×17×41 Posts I d/l'd your latest and am having some difficulties getting it set up. First, I had to install pv and gawk. They were not defaults in my Ubuntu, but were readily available. I am totally unfamiliar with pv, but it looks like a pipe monitoring tool. Is this supplying the "0:00:00 [76.9k/s] [>..." line? I then, first tried an Msieve polynomial: Code: n: 346779657259791246313554044276401199475050208141387409790076498969471905612322027394740984520613397049141959352962783652552620174555843715898338877709473791695077522065984368363437509 R0: -633156115026647924966242353927160849 R1: 1838892199331450047 A0: -4055358059413560189793755056163937883873351483520 A1: 16581144338788276414460714496465421079736 A2: -636936169159140542086060941338 A3: -29838939405307828973869 A4: -3553046772992 A5: 3408 skew 1367091201.68 # size 6.377e-018, alpha -8.529, combined = 5.338e-014 rroots = 5 Then I tried manually converting it to CADO-NFS fomat: Code: n: 346779657259791246313554044276401199475050208141387409790076498969471905612322027394740984520613397049141959352962783652552620174555843715898338877709473791695077522065984368363437509 Y0: -633156115026647924966242353927160849 Y1: 1838892199331450047 c0: -4055358059413560189793755056163937883873351483520 c1: 16581144338788276414460714496465421079736 c2: -636936169159140542086060941338 c3: -29838939405307828973869 c4: -3553046772992 c5: 3408 skew: 1367091201.68 # size 6.377e-018, alpha -8.529, combined = 5.338e-014 rroots = 5 For both, I got: Code: $bash polyFinder.sh Enter the number being factored: 346779657259791246313554044276401199475050208141387409790076498969471905612322027394740984520613397049141959352962783652552620174555843715898338877709473791695077522065984368363437509 What file are your polynomials stored in? polys How many threads will you be using? 24 What degree polynomial are you optimizing? (4-6)5 How many polynomials would you like to run through root optimization? 12 Invalid input, attempting to reformat. Parse error: parameter for key c0 is not an mpz: 0:00:00 [76.9k/s] [> ] 1% ETA 09:09:35 Size-optimization completed with (estimated) 0 polynomials. Beginning root-optimization with best 12 polynomials. # Warning: parameter B is checked by this program but is undocumented. # Warning: parameter A is checked by this program but is undocumented. 0:00:00 [1.15k/s] [ <=> ] Best polynomial has been saved to the file: bestpoly # (8ab2eea72) ./polyselect_ropt -inputpolys toppolys -ropteffort 100 -t 24 # Compiled with gcc 9.3.0 # Compilation flags (C) -std=c99 -g -W -Wall -O2 -msse3 -mssse3 -msse4.1 -mpopcnt -mavx -mpclmul # Compilation flags (C++) -export-dynamic -std=c++11 -Wno-c++11-compat -g -W -Wall -O2 -Wno-literal-suffix -msse3 -mssse3 -msse4.1 -mpopcnt -mavx -mpclmul # Info: Will use 24 threads # Info: ropteffort = 100 # Info: L1_cachesize = 32768, size_tune_sievearray = 16384 # Reading polynomials from toppolys # 0 polynomial(s) read. # Info: Using OpenMP with 24 thread(s) # Stat: total phase took 0.19s # Stat: rootsieve took 0.00s # Stat: (stage 1 took 0.00s) # Stat: (tuning took 0.00s) # Stat: (stage 2 (sieving) took 0.00s) # WARNING: No polynomials were found in the input file toppolys Should I format the polynomial differently? 2021-10-06, 14:26 #6 Plutie "Evan" Dec 2020 Montreal 23×32 Posts Quote:  Originally Posted by EdH I d/l'd your latest and am having some difficulties getting it set up. First, I had to install pv and gawk. They were not defaults in my Ubuntu, but were readily available. I am totally unfamiliar with pv, but it looks like a pipe monitoring tool. Is this supplying the "0:00:00 [76.9k/s] [>..." line? I then, first tried an Msieve polynomial: Code: n: 346779657259791246313554044276401199475050208141387409790076498969471905612322027394740984520613397049141959352962783652552620174555843715898338877709473791695077522065984368363437509 R0: -633156115026647924966242353927160849 R1: 1838892199331450047 A0: -4055358059413560189793755056163937883873351483520 A1: 16581144338788276414460714496465421079736 A2: -636936169159140542086060941338 A3: -29838939405307828973869 A4: -3553046772992 A5: 3408 skew 1367091201.68 # size 6.377e-018, alpha -8.529, combined = 5.338e-014 rroots = 5 Then I tried manually converting it to CADO-NFS fomat: Code: n: 346779657259791246313554044276401199475050208141387409790076498969471905612322027394740984520613397049141959352962783652552620174555843715898338877709473791695077522065984368363437509 Y0: -633156115026647924966242353927160849 Y1: 1838892199331450047 c0: -4055358059413560189793755056163937883873351483520 c1: 16581144338788276414460714496465421079736 c2: -636936169159140542086060941338 c3: -29838939405307828973869 c4: -3553046772992 c5: 3408 skew: 1367091201.68 # size 6.377e-018, alpha -8.529, combined = 5.338e-014 rroots = 5 For both, I got: Code: $ bash polyFinder.sh Enter the number being factored: 346779657259791246313554044276401199475050208141387409790076498969471905612322027394740984520613397049141959352962783652552620174555843715898338877709473791695077522065984368363437509 What file are your polynomials stored in? polys How many threads will you be using? 24 What degree polynomial are you optimizing? (4-6)5 How many polynomials would you like to run through root optimization? 12 Invalid input, attempting to reformat. Parse error: parameter for key c0 is not an mpz: 0:00:00 [76.9k/s] [> ] 1% ETA 09:09:35 Size-optimization completed with (estimated) 0 polynomials. Beginning root-optimization with best 12 polynomials. # Warning: parameter B is checked by this program but is undocumented. # Warning: parameter A is checked by this program but is undocumented. 0:00:00 [1.15k/s] [ <=> ] Best polynomial has been saved to the file: bestpoly # (8ab2eea72) ./polyselect_ropt -inputpolys toppolys -ropteffort 100 -t 24 # Compiled with gcc 9.3.0 # Compilation flags (C) -std=c99 -g -W -Wall -O2 -msse3 -mssse3 -msse4.1 -mpopcnt -mavx -mpclmul # Compilation flags (C++) -export-dynamic -std=c++11 -Wno-c++11-compat -g -W -Wall -O2 -Wno-literal-suffix -msse3 -mssse3 -msse4.1 -mpopcnt -mavx -mpclmul # Info: Will use 24 threads # Info: ropteffort = 100 # Info: L1_cachesize = 32768, size_tune_sievearray = 16384 # Reading polynomials from toppolys # 0 polynomial(s) read. # Info: Using OpenMP with 24 thread(s) # Stat: total phase took 0.19s # Stat: rootsieve took 0.00s # Stat: (stage 1 took 0.00s) # Stat: (tuning took 0.00s) # Stat: (stage 2 (sieving) took 0.00s) # WARNING: No polynomials were found in the input file toppolys Should I format the polynomial differently?
https://transfer.sh/KWRkr9/polyFinder.sh Replace the script with this one, I did some quick (not-so-clean) fixes. It works with your initial poly now.

2021-10-06, 15:16   #7
EdH

"Ed Hall"
Dec 2009

101268 Posts

Quote:
 Originally Posted by Plutie https://transfer.sh/KWRkr9/polyFinder.sh Replace the script with this one, I did some quick (not-so-clean) fixes. It works with your initial poly now.
That got it working and it's underway ATM:
Code:
n: 346779657259791246313554044276401199475050208141387409790076498969471905612322027394740984520613397049141959352962783652552620174555843715898338877709473791695077522065984368363437509
Y0: -633156115026647924966242353927160849
Y1: 1838892199331450047
c0: -4055358059413560189793755056163937883873351483520
c1: 16581144338788276414460714496465421079736
c2: -636936169159140542086060941338
c3: -29838939405307828973869
c4: -3553046772992
c5: 3408
skew: 1367091201.68
# size 6.377e-018, alpha -8.529, combined = 5.338e-014 rroots = 5

File is valid sopt input.
0:00:01 [36.9 /s] [==============================] 133%             ETA 11:03:16
Size-optimization completed with (estimated) 1 polynomials.
Beginning root-optimization with best 12 polynomials.
# Warning: parameter B is checked by this program but is undocumented.
# Warning: parameter A is checked by this program but is undocumented.
08:19 [46.1m/s] [=====================>         ] 74% ETA 0:02:53 ETA 11:14:29
What are the "undocumented" lines referring to?

The last line seems a bit misleading. It got to 74% and then just started clocking the time values upward.

2021-10-06, 15:41   #8
Plutie

"Evan"
Dec 2020
Montreal

23·32 Posts

Quote:
 Originally Posted by EdH That got it working and it's underway ATM: Code: n: 346779657259791246313554044276401199475050208141387409790076498969471905612322027394740984520613397049141959352962783652552620174555843715898338877709473791695077522065984368363437509 Y0: -633156115026647924966242353927160849 Y1: 1838892199331450047 c0: -4055358059413560189793755056163937883873351483520 c1: 16581144338788276414460714496465421079736 c2: -636936169159140542086060941338 c3: -29838939405307828973869 c4: -3553046772992 c5: 3408 skew: 1367091201.68 # size 6.377e-018, alpha -8.529, combined = 5.338e-014 rroots = 5 File is valid sopt input. 0:00:01 [36.9 /s] [==============================] 133% ETA 11:03:16 Size-optimization completed with (estimated) 1 polynomials. Beginning root-optimization with best 12 polynomials. # Warning: parameter B is checked by this program but is undocumented. # Warning: parameter A is checked by this program but is undocumented. 08:19 [46.1m/s] [=====================> ] 74% ETA 0:02:53 ETA 11:14:29 What are the "undocumented" lines referring to? The last line seems a bit misleading. It got to 74% and then just started clocking the time values upward.
Both of the progress lines are (very) rough estimates of the time remaining. It's inherently flawed because it bases the progress on the lines output from the actual program. Since the ropt program takes a while to give any output once it starts working on a polynomial, it's going to be inaccurate. I initially didn't have any ETA functionality, but I gave it a try. The script is meant more for large lists, of possibly thousands of polys, in which case it should end up being a bit more accurate. I know the sopt progress bar is mostly correct, since there's a pretty consistent output from that binary.

The undocumented lines are output from CADO's ropt binary, they output to stderr. You can replace the line that calls ropt with this to hide them.
./polyselect_ropt -inputpolys toppolys -ropteffort 100 -t "$THREADS" 2>/dev/null | pv -pltIaes$((\$(wc -l < toppolys) * 21 / 8)) > ropt.out

If you want a list to test the script on, here's one. https://transfer.sh/KagkW6/msieve.dat.ms. deg 6, head -1 msieve.dat.ms for composite.

Last fiddled with by Plutie on 2021-10-06 at 16:20 Reason: added example file

 2021-10-06, 16:38 #9 EdH     "Ed Hall" Dec 2009 Adirondack Mtns 2×3×17×41 Posts Success! It completed with a better poly than supplied: Code: Best polynomial has been saved to the file: bestpoly # Stat: rootsieve took 1625.22s # Stat: (stage 1 took 266.42s) # Stat: (tuning took 1317.08s) # Stat: (stage 2 (sieving) took 41.04s) # Best polynomial found (revision 8ab2eea72): # n: 346779657259791246313554044276401199475050208141387409790076498969471905612322027394740984520613397049141959352962783652552620174555843715898338877709473791695077522065984368363437509 # Y0: -633156115434329678267970664676644305 # Y1: 1838892199331450047 # c0: -942416842833037178008794982721825660284661427968 # c1: 12659775140765359789715754874236115108152 # c2: 17789742230088331244638103557030 # c3: -25013045517601501098285 # c4: -7330808774912 # c5: 3408 # skew: 1038278114.355 # # lognorm 59.22, E 50.34, alpha -8.88 (proj -1.40), 5 real roots # # MurphyE(Bf=1.000e+07,Bg=5.000e+06,area=1.000e+16)=5.688e-14 # Average exp_E: 51.00, average E: 50.34 Your script is much more clean and professional than mine. Mine has a lot more going on. I'll do some comparisons later. I'm interested in seeing if yours will find the same Best poly with subsequent runs and what my script might turn up. For info, the poly I chose for input was from the Polynomial Request Thread, and has some spun versions later in the thread.
2021-10-07, 03:39   #10
Plutie

"Evan"
Dec 2020
Montreal

23·32 Posts

"Release" 1.0.1

- Added some more conversion logic, should be compatible with the standard poly format used on this forum now (just make sure it has the n value added manually, I'll figure that out eventually.)
Attached Files
 polyopt-1.0.1.zip (2.4 KB, 46 views)

 2021-10-07, 12:42 #11 EdH     "Ed Hall" Dec 2009 Adirondack Mtns 10000010101102 Posts My script did come up with a better scoring polynomial, but it came up with the same one as yours initially: Code: # MurphyE(Bf=1.000e+07,Bg=5.000e+06,area=1.000e+16)=5.882e-14 Best poly cownoise values: 1431009352.14992 5.96454364e-14 The better one was after going back to Msieve and then through CADO a second time. I'll eventually post in the other thread about the changes to that script, but let's keep this thread focused on building your script.

 Similar Threads Thread Thread Starter Forum Replies Last Post MattcAnderson MattcAnderson 0 2021-08-16 00:39 Rodrigo Information & Answers 67 2019-09-20 06:33 retina Forum Feedback 9 2012-07-13 02:03 R.D. Silverman Cunningham Tables 14 2010-09-29 19:56 R.D. Silverman Cunningham Tables 11 2006-03-06 18:46

All times are UTC. The time now is 02:01.

Tue Jan 18 02:01:30 UTC 2022 up 178 days, 20:30, 0 users, load averages: 1.29, 1.50, 1.44