mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Aliquot Sequences (https://www.mersenneforum.org/forumdisplay.php?f=90)
-   -   Reserved for MF - Sequence 4788 (https://www.mersenneforum.org/showthread.php?t=11615)

charybdis 2021-08-17 15:28

[QUOTE=bur;585877]So is it really worthwhile if I search a range with my moderate incr/nq combination? How likely is it that the range will need to be redone later on with more demanding parameters? Or are incr=420 and nq=46656 considered good enough?

I think about that especially in light of users with a lot of resources who could easily do the range with nq=6^7.[/QUOTE]

Yes, it's definitely worthwhile. You could miss a good polynomial, but if instead someone were to use lots of resources to run your range with incr=210 and nq=6^7, that would mean there's some higher range where they don't search (lots of resources =/= unlimited resources!) and there could be a good polynomial hiding there too.

Leading coefficients with more small prime factors are more likely to produce good polynomials, so incr=210 would take twice as long but probably won't find twice as many good polynomials in a given range. But there's a trade-off, because, all other things kept equal, smaller coefficients are better than larger ones. This is why VBCurtis uses larger incr values like 4620 for larger coefficients: in order to make up for the coefficients being larger, you need even more small prime factors.
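
For reference, here's a quick sketch (plain Python with sympy, nothing CADO-specific) of how the usual incr choices break down into small primes; the larger values just buy an extra factor of 11:

[code]# Factor the usual incr choices to see their small-prime content.
# Illustration only; this is not anything CADO does internally.
from sympy import factorint

for incr in (210, 420, 2310, 4620):
    factors = factorint(incr)          # e.g. 420 -> {2: 2, 3: 1, 5: 1, 7: 1}
    pretty = " * ".join(f"{p}^{e}" if e > 1 else str(p)
                        for p, e in sorted(factors.items()))
    print(f"incr = {incr:5d} = {pretty}")
# Every leading coefficient searched is a multiple of incr, so an incr with
# more small prime factors guarantees more small primes in c6 - at the cost
# of skipping coefficients that are not multiples of it.[/code]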

As for nq, Gimarel showed that 6^7 finds polynomials more slowly than 6^6. The same trade-off applies, so perhaps larger nq is justified for smaller coefficients, but we might already have adjusted enough for this by using different incr values across the range.

VBCurtis 2021-08-17 15:36

I've done a couple of degree-6 poly searches before, and I've previously used 6^5 for nq; I consider 6^6 to be *very* thorough.

While I used incr=210 on the very-smallest coefficient range, I can assure you that nobody will consider incr=420 / nq = 46656 as anything but exhaustively searched.

Part of the "spin" that Max does is to take a posted poly and try size-opt and root-opt again with different settings (or with msieve instead of CADO, since they use slightly different procedures) in an attempt to get luckier than the original discovery. For a big job like this that Max is likely to help out with, that may make size-opt-effort something we don't really need to play with? (I'm not sure about this part, it just occurred to me)

bur 2021-08-17 15:41

Ok, thanks, that's good to know. The yield so far isn't so good, best poly has cownoise 8E-17 at 85% done, so I got gloomy, haha. On the other hand I have no idea what is an average result for that range...

charybdis 2021-08-17 15:55

[QUOTE=bur;585888]Ok, thanks, that's good to know. The yield so far isn't so good, best poly has cownoise 8E-17 at 85% done, so I got gloomy, haha. On the other hand I have no idea what is an average result for that range...[/QUOTE]

Pay no attention to the scores of the polynomials produced by stage 1 and sizeopt. Rootopt will improve them *a lot*.

swellman 2021-08-17 23:35

[CODE]
n: 5272066026958413205513021090082556639441277154855572268239336980532402881465013381219738819137405617219594641652778228990107677072837959240400905630530715435664638237254654768053674178165309345879869729405448588458952991
skew: 338136.855
c0: 189900042836812221982619647806688806023974608
c1: -2568098587555066549163576737385262919596
c2: 3958780384162120695641805109874764
c3: 72063043688644509410587348689
c4: -48639995036819186219299
c5: -355279972659030750
c6: -30550905000
Y0: -224860621784631260553024182454512411
Y1: 23949076096264622399754953
# MurphyE (Bf=3.436e+10,Bg=1.718e+10,area=8.590e+17) = 1.429e-09
# f(x) = -30550905000*x^6-355279972659030750*x^5-48639995036819186219299*x^4+72063043688644509410587348689*x^3+3958780384162120695641805109874764*x^2-2568098587555066549163576737385262919596*x+189900042836812221982619647806688806023974608
# g(x) = 23949076096264622399754953*x-224860621784631260553024182454512411
[/CODE]

Which cownoise scores as 2.709e-16. Not really worth reporting but I got this result in only one day with a search over a range of 1M using nq=6^6. Is a rerun with nq=6^7 worth trying? A million ad per day is a decent search pace for a c220 but not worth doing if good polys are being missed.
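
As an aside, a quick way to sanity-check a posted pair like the one above: f and g must share a root mod n. A rough Python sketch, with the values pasted from the block above:

[code]# Sanity check: a valid GNFS pair shares a root m = -Y0/Y1 mod n, so f(m)
# must vanish mod n. Values taken from the poly above.
n = 5272066026958413205513021090082556639441277154855572268239336980532402881465013381219738819137405617219594641652778228990107677072837959240400905630530715435664638237254654768053674178165309345879869729405448588458952991
c = [189900042836812221982619647806688806023974608,    # c0
     -2568098587555066549163576737385262919596,        # c1
     3958780384162120695641805109874764,               # c2
     72063043688644509410587348689,                    # c3
     -48639995036819186219299,                         # c4
     -355279972659030750,                              # c5
     -30550905000]                                     # c6
Y0 = -224860621784631260553024182454512411
Y1 = 23949076096264622399754953

m = (-Y0 * pow(Y1, -1, n)) % n                         # root of g(x) = Y1*x + Y0
print(sum(ci * pow(m, i, n) for i, ci in enumerate(c)) % n)   # should print 0[/code]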

charybdis 2021-08-18 00:39

[QUOTE=swellman;585933]Is a rerun with nq=6^7 worth trying? A million ad per day is a decent search pace for a c220 but not worth doing if good polys are being missed.[/QUOTE]

If you want to try something that isn't quite that long and doesn't duplicate the work you've already done, you could re-run the range with the P value doubled.

swellman 2021-08-18 01:20

[QUOTE=charybdis;585936]If you want to try something that isn't quite that long and doesn't duplicate the work you've already done, you could re-run the range with the P value doubled.[/QUOTE]

P was 10e6 for this recent run but I could double it.

I’ve got another Linux box finishing another task tonight, maybe I can vary nq and P simultaneously. I have no personal experience with a poly search of this size. Nothing to lose but time…

bur 2021-08-18 09:23

Root-opt did increase the score:

[CODE]skew: 1860098.552
c0: -3427420519132645353382556907201718624117659312
c1: -52709839966423296709910539950475485051020
c2: 249681704899452712763350090243160
c3: 42123095563723982267747196055
c4: 880184122151042174607
c5: -5049619263096960
c6: 79732800
Y0: -423246853258132204806966803542792143
Y1: 2073030540718482278351646889
# MurphyE (Bf=3.436e+10,Bg=1.718e+10,area=8.590e+17) = 1.579e-09[/CODE]Cownoise: 3.228e-16
Range 1e6-2e6 with P=10e6, nq=6^6, incr=420, keep=200, ropt-effort=50, sopt-effort=0.


On a 10-core it took 4 days, so repeating with nq=6^7 is not really feasible for me, but maybe it would be interesting to compare results if someone has the hardware to do it faster? Also the influence of P would be interesting.

I think keep=200 doesn't make sense, it'd probably be better to halve the value and maybe double ropt-effort? Root-opt took less than 4 hours on the 200 polys with ropt-effort 50, not much compared to the overall time of 95 hours. Would it be worthwhile to redo root-opt with higher effort on that specific poly?

And how should I continue, range 2e6-3e6 with the same settings (just smaller keep)?


From the screenlog (if of interest):
[CODE](size optimized): Aggregate statistics:
(size optimized): potential collisions: 203998
(size optimized): raw lognorm (nr/min/av/max/std): 206980/65.780/77.170/78.620/0.998
(size optimized): optimized lognorm (nr/min/av/max/std): 193466/60.280/65.563/70.410/0.895
(size optimized): Total time: 3.13783e+06
[...]
(root optimized): Best overall polynomial was 0-th in list after size optimization
(root optimized): Aggregate statistics:
(root optimized): Total time: 120654
(root optimized): Rootsieve time: 120652[/CODE]

swellman 2021-08-18 11:56

[QUOTE=bur;585961]Root-opt did increase the score:

[CODE]skew: 1860098.552
c0: -3427420519132645353382556907201718624117659312
c1: -52709839966423296709910539950475485051020
c2: 249681704899452712763350090243160
c3: 42123095563723982267747196055
c4: 880184122151042174607
c5: -5049619263096960
c6: 79732800
Y0: -423246853258132204806966803542792143
Y1: 2073030540718482278351646889
# MurphyE (Bf=3.436e+10,Bg=1.718e+10,area=8.590e+17) = 1.579e-09[/CODE]Cownoise: 3.228e-16
Range 1e6-2e6 with P=10e6, nq=6^6, incr=420, keep=200, ropt-effort=50, sopt-effort=0.[/quote]

What did you use for adrange? 4*incr is an often used rule of thumb, and it has worked for me on many c19x poly searches (albeit deg 5).

You’ve found a very good value in an early search - suggesting P=10M is perfectly fine. I’m rerunning my job above with P=20M right now, and I should get results in a few days. Maybe that will shed more light on the effect of P. In my past testing I’ve had higher e-scores with lower P! I suspect CADO “tries harder” with higher P but internal time limits cloud the results(?)

I have never found a best poly that was worse than 8th on the list, so nrkeep=50 works for me. Tried a lot of experimentation last year or two and this has always held true though one cannot prove a negative. Others may have counter examples.

Also, I've found that higher incr values do not preclude good e-scores, and as you climb up the ad range the time to complete a search becomes unwieldy. You may want to move to the next range and try incr of 840 or even 1680 (with adrange = 4x). But that's just my two cents.

One last note - sopteffort has a minor effect on the total runtime and has the potential for bigger payoff. Using a value of 10 in my runs. I believe CADO uses 1+ the input value.

charybdis 2021-08-18 12:30

[QUOTE=bur;585961]I think keep=200 doesn't make sense, it'd probably be better to halve the value and maybe double ropt-effort?[/QUOTE]

[QUOTE=swellman;585967]I have never found a best poly that was worse than 8th on the list, so nrkeep=50 works for me. Tried a lot of experimentation last year or two and this has always held true though one cannot prove a negative. Others may have counter examples.[/QUOTE]

I've seen as low as 90th, though admittedly examples like this are uncommon. But we ought to remember that this was the 90th best over the entire search. If you set nrkeep=50 for a run that makes up 1% of the search, that would be like setting nrkeep=5000 over the whole search, which is overkill; it would probably be better spending most of that time doing more stage-1 searching instead.

The very best polynomials in the entire search will be either exceptionally good stage-1/sizeopt hits that get at least an average boost from rootopt, or okay sizeopt hits - possibly down as low as 200th over the whole search, hence the default CADO nrkeep value - that get an exceptionally good rootopt hit. If you're running a small range, setting nrkeep=6 or 12 might cause you to miss the best possible poly from your range, but since such a poly would not even be an "okay" sizeopt hit by the above definition, it would be extremely unlikely to be one of the top scoring polys in the whole search.

bur 2021-08-18 12:40

To not make things too mixed up, [B]I'll reserve 5e6 to 10e6[/B], if that isn't too much (depends how ETA scales with range). VBCurtis had 0-1e6 and 2e6-5e6 and swellman 40e6-41e6. Does anyone keep track of the ranges?


[QUOTE=swellman;585967]What did you use for adrange? 4*incr is an often used rule of thumb, and it has worked for me on many c19x poly searches (albeit deg 5).[/QUOTE]Yes, it was adrange=1680.

[quote]I have never found a best poly that was worse than 8th on the list, so nrkeep=50 works for me. Tried a lot of experimentation last year or two and this has always held true though one cannot prove a negative. Others may have counter examples.[/quote]Same here, whenever I saw it, cado said something about best poly was originally 1st or 2nd on the list. So I'll decrease it to 50 and increase ropt-effort to 200, whatever good that may do, but that step didn't take long anyway, so I might as well keep the time as it was.

[quote]Also found higher incr values does not preclude good e-scores, and as you climb up the ad range the time to complete a search become unwieldy. You may want to move to the next range and try incr of 840 or even 1680 (with adrange =4x). But that’s just my two cents.[/quote]I think Curtis wrote somewhere in here that he'd go to 2310 once the range was above 2e7. That large jump was due to a factor of 11 doing some good. I'll begin with incr=420 and if ETA is extreme I'll increase it to 840.

[quote]One last note - sopteffort has a minor effect on the total runtime and has the potential for bigger payoff. Using a value of 10 in my runs. I believe CADO uses 1+ the input value.[/QUOTE]Yes, I think it should increase time linearly, but if the actual size opt is only a very small part, we might as well increase it.

So I'll test 5e6-6e6 next with adrange=1680, incr=420, nq=6^6, sopt-effort=10, keep=50, ropt-effort=200. If too slow, I'll try incr=840.

swellman 2021-08-18 13:36

For the record, I am also repeating the range 40-41e6 with nq=6^7 and P=10M. It will take a week to complete but it should be interesting to see what CADO spits out. Personally I’m not expecting a big improvement (if any) but willing to hope.

The search range reservations are easy enough to track within this thread, but perhaps a new thread should be started so that the input parameters and output e-score can be recorded for later reference? Could be useful to future searchers.

Gimarel 2021-08-18 14:31

[QUOTE=bur;585961]Root-opt did increase the score:

[CODE]skew: 1860098.552
c0: -3427420519132645353382556907201718624117659312
c1: -52709839966423296709910539950475485051020
c2: 249681704899452712763350090243160
c3: 42123095563723982267747196055
c4: 880184122151042174607
c5: -5049619263096960
c6: 79732800
Y0: -423246853258132204806966803542792143
Y1: 2073030540718482278351646889
# MurphyE (Bf=3.436e+10,Bg=1.718e+10,area=8.590e+17) = 1.579e-09[/CODE]Cownoise: 3.228e-16
Range 1e6-2e6 with P=10e6, nq=6^6, incr=420, keep=200, ropt-effort=50, sopt-effort=0.
[/QUOTE]

ropt of this poly gives:
[CODE]### root-optimized polynomial 0 ###
n: 5272066026958413205513021090082556639441277154855572268239336980532402881465013381219738819137405617219594641652778228990107677072837959240400905630530715435664638237254654768053674178165309345879869729405448588458952991
Y0: -423267925613578608179326248033418828
Y1: 2073030540718482278351646889
c0: -773074974413028422431926573350819467006568232
c1: -7270194150193898049263463138887225927755
c2: -1256823921278067033708319813403655
c3: 42082087970958626673016315435
c4: 1136954599732427366607
c5: -5054482166568960
c6: 79732800
skew: 1718787.677
# lognorm 62.43, E 52.81, alpha -9.62 (proj -2.47), 6 real roots
# MurphyE(Bf=1.000e+07,Bg=5.000e+06,area=1.000e+16)=3.423e-16
### Best MurphyE so far is 3.423e-16, av. exp_E 53.32, av. E 52.81[/CODE]

swellman 2021-08-18 14:38

Nice! That makes it an arguably viable poly for the c220 but maybe better can be found?

VBCurtis 2021-08-18 14:42

[QUOTE=bur;585974]To not make things too mixed up, [B]I'll reserve 5e6 to 10e6[/B], if that isn't too much (depends how ETA scales with range). VBCurtis had 0-1e6 and 2e6-5e6 and swellman 40e6-41e6. Does anyone keep track of the ranges?
.......
I think Curtis wrote somewhere in here that he'd go to 2310 once the range was above 2e7. That large jump was due to a factor of 11 doing some good. I'll begin with incr=420 and if ETA is extreme I'll increase it to 840.
[/QUOTE]

Seems you are the one tracking ranges! :)

I don't think adding more 2's to incr is very useful. If 420 is too slow, I'd skip to 2310.
Glad to hear sizeopteffort doesn't add much time to the search; I guess I'll throw that in to my next range. I'll be done with 2e6 to 5e6 Thursday.

Nice poly find, bur! 3.42e-16 is serviceable. We'd like to beat it by 10% or so, and there's a reasonable chance we can do so.

charybdis 2021-08-18 15:26

[QUOTE=VBCurtis;585987]I'll be done with 2e6 to 5e6 Thursday.[/QUOTE]

Roughly how many thread-seconds is a 1M range taking? Should be able to start searching tomorrow and I'd like to have an idea how big a range I should take.

bur 2021-08-18 17:18

[QUOTE=charybdis;585996]Roughly how many thread-seconds is a 1M range taking? Should be able to start searching tomorrow and I'd like to have an idea how big a range I should take.[/QUOTE]The 1e6-2e6 with setting as posted above took 3137830 s (size) + 120654 s (root) = 3258484 s on an i9-10900k.

The 5e6-6e6 range with identical settings has an ETA that makes it seem similar.

VBCurtis 2021-08-19 15:36

My first decent hit:
[code]n: 5272066026958413205513021090082556639441277154855572268239336980532402881465013381219738819137405617219594641652778228990107677072837959240400905630530715435664638237254654768053674178165309345879869729405448588458952991
skew: 244493.103
c0: -5668839593972764105835315556201456845921004
c1: -31798709872233758681019267015167174947
c2: 1033507213021369650910017558429451
c3: -481752321251275723176492791
c4: -20937797913410105835205
c5: 1865890162275696
c6: 81285120
Y0: -338922499431150849187274538075964740
Y1: 404346679164023626048993733
# MurphyE (Bf=1.718e+10,Bg=1.718e+10,area=3.221e+17) = 1.238e-09[/code]
cownoise says skew 479764.36842 score 3.28788074e-16

Edit: And another!
[code]n: 5272066026958413205513021090082556639441277154855572268239336980532402881465013381219738819137405617219594641652778228990107677072837959240400905630>
skew: 1130398.647
c0: 1460970719324580302884397570964425573808852630
c1: -8446464120695272475513669260441719380443
c2: 5723620292044986192940398634388588
c3: 17612914952424667248800986219
c4: -13421101303146825222542
c5: -6930604474874388
c6: 379365840
Y0: -327885433653443790045728834604845926
Y1: 1080049699948936551530063521
# MurphyE (Bf=1.718e+10,Bg=1.718e+10,area=3.221e+17) = 1.254e-09[/code]
cownoise: skew 1200878.55688 score 3.29243504e-16

I split 2e6-5e6 in half and ran on two machines. Each poly was the best from its run.
So, between bur and me we have at least three 3.2's (before spin) from 1e6-5e6. We just need better luck next.

EDIT2: Taking 10e6-15e6 at incr=420.

swellman 2021-08-19 15:56

bur’s result was spun up to 3.427e-16 by Gimarel, a high mark to beat.

I’ll search 41-50M.

charybdis 2021-08-19 18:37

[QUOTE=bur;586004]The 1e6-2e6 with setting as posted above took 3137830 s (size) + 120654 s (root) = 3258484 s on an i9-10900k.[/QUOTE]

Thank you!

I'll do 15e6-25e6 at incr=420.

swellman 2021-08-21 11:38

[QUOTE=charybdis;585936]If you want to try something that isn't quite that long and doesn't duplicate the work you've already done, you could re-run the range with the P value doubled.[/QUOTE]

Reran same range and parameters but with P=20M. Cownoise scores it as 2.578e-16, substantially lower than with P=10M (2.709e-16).

[CODE]n: 5272066026958413205513021090082556639441277154855572268239336980532402881465013381219738819137405617219594641652778228990107677072837959240400905630530715435664638237254654768053674178165309345879869729405448588458952991
skew: 582384.469
c0: 2770118192685862159583239922268451472246246256
c1: 5479619360120434195124072093663515668702
c2: -94209227206580697483986629210745699
c3: -120798956449256492019224358219
c4: 244655244869438401570700
c5: 35478722045649204
c6: 30348244080
Y0: -225093408154878415043623645928916813
Y1: 1771886126480601451474428793
# MurphyE (Bf=3.436e+10,Bg=1.718e+10,area=8.590e+17) = 1.391e-09
# f(x) = 30348244080*x^6+35478722045649204*x^5+244655244869438401570700*x^4-120798956449256492019224358219*x^3-94209227206580697483986629210745699*x^2+5479619360120434195124072093663515668702*x+2770118192685862159583239922268451472246246256
# g(x) = 1771886126480601451474428793*x-225093408154878415043623645928916813[/CODE]

Sticking to P=10M from here on out.

I should have results back with P=10M and nq=6^7 in a few days.

bur 2021-08-21 16:04

Do we know if it would be generally advisable to use nq=6^7? Has a range been done with both settings?

swellman 2021-08-21 16:22

[QUOTE=bur;586197]Do we know if it would be generally advisable to use nq=6^7? Has a range been done with both settings?[/QUOTE]

Yes, as I stated above I have a job running from 40-41M with P=10M and nq=6^7. Results expected Sunday night or Monday morning.

VBCurtis 2021-08-21 16:35

Comparing settings by the best poly produced isn't going to tell you much- a bit like comparing weather in different cities by their record-high-temps for each month.

I suggest comparing 10th-best poly scores from runs of the same wall-clock time (meaning an nq=6^7 run should be compared to a 6^6 run over a 6-times-longer adrange). The 10th-best will be much less subject to the luck of the peak, thus giving a somewhat better indication of how well those settings produce "good" polys.

If you're keeping the folders where CADO records the data, *.poly.0 is the best poly while *.poly.9 is the 10th-best.
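
If anyone wants to automate that, something along these lines should work (a rough sketch; it assumes those files carry a "# MurphyE (...) = ..." comment line like the polys posted in this thread, so adjust the glob and regex for your setup):

[code]# Rank the *.poly.N files in a CADO work directory by their MurphyE comment.
import glob
import re

scores = []
for path in glob.glob("*.poly.*"):
    with open(path) as fh:
        m = re.search(r"MurphyE.*=\s*([0-9.eE+-]+)", fh.read())
    if m:
        scores.append((float(m.group(1)), path))

for rank, (score, path) in enumerate(sorted(scores, reverse=True), 1):
    print(f"{rank:2d}  {score:.3e}  {path}")[/code]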

swellman 2021-08-21 17:08

I agree with your point, but unfortunately I did not save the logs.

But seeing the results you and bur got with P=10M and nq=6^6 (right?), using those parameters seems like a reasonable strategy. I’ll post the 10th best poly from my nq=6^7 run when it finishes just for laughs. I can rerun it with nq=6^6 again (only takes a day) and save/post the 10th best polys and any logs folks are interested in seeing.

Ultimately, I am using the poly that CADO spits out. But maybe we would all be better served saving the top 5 or 10 polys and then doing additional optimization/spin/magic on the lot to achieve higher scores? Just asking the question. Such a strategy would take a lot of coordination.

VBCurtis 2021-08-21 21:15

Naw, spin is only useful to get at best 10% better score; we won't be sieving a poly that scores below 3.5e-16, so it's pointless to spin polys below 3.2e-16. I suppose a run could be lucky enough to have more than one poly above 3.2e-16...

Also, I don't think there's any benefit to re-running ranges- just keep hunting, and over time we will build a semblance of consensus about settings. I'm doubtful it matters much- search time makes polys, over magic settings.

bur 2021-08-22 06:19

My reasoning about re-running ranges, or rather about using the most thorough feasible settings from the beginning, is that I thought lower coefficient ranges had a higher chance of producing good polys. So a poly missed at a low range due to too coarse an incr or too small an nq could not be compensated for by searching for the same amount of time at higher ranges.


In other words: if you have x hours to spend, are you more likely to produce a good poly by spending them on a smaller, low-coefficient range with more thorough incr/nq settings, or on a larger, high-coefficient range with less demanding settings? Of course, whether it's worthwhile depends on how much more likely a good poly becomes with the more thorough settings. From what you write, I assume the six-fold longer search time would be better spent on searching more ranges?



I checked the 1e6-2e6 range and the second best poly had a cownoise score of 2.928e-16, the 10th had a 2.636e-16 score.

bur 2021-08-22 16:17

5e6-6e6 finished. P=10e6, incr=420, nq=6^6, sopt-effort=10, ropt-effort=200, keep=50


Best poly: 2.613e-16, 10th poly: 2.445e-16.


[CODE](size optimized): potential collisions: 203998
(size optimized): raw lognorm (nr/min/av/max/std): 205759/66.090/76.957/78.230/0.993
(size optimized): optimized lognorm (nr/min/av/max/std): 199598/60.300/65.381/70.150/0.811
(size optimized): Total time: 3.13786e+06

(root optimized): Best overall polynomial was 3-th in list after size optimization
(root optimized): Total time: 92878.2
(root optimized): Rootsieve time: 92877.6[/CODE]

charybdis 2021-08-22 17:31

:w00t:

[code]n: 5272066026958413205513021090082556639441277154855572268239336980532402881465013381219738819137405617219594641652778228990107677072837959240400905630530715435664638237254654768053674178165309345879869729405448588458952991
skew: 1677636.441
Y0: -256368443813618831466708127928344755
Y1: 57410022887758634925443897
c0: -6432998852802319578730132186023855248338556016
c1: -1885042439120902119375835441237563637190
c2: 38014843918570711244403339019948367
c3: 255627954821968894266780614
c4: -14860362930136626207980
c5: -295780948942080
c6: 390020400[/code]

Cownoise says 3.66829272e-16. Not done with my range yet, just spotted a particularly good stage1/sizeopt hit and ran rootopt on it early.

Edit: for what it's worth, I used P=12M and this poly would not have been found with 10M (or 20M).

Gimarel 2021-08-22 17:56

My best poly at this time is[CODE]# norm 7.281232e-16 alpha -11.657411 e 3.534e-16 rroots 4
skew: 136068650.31
c0: 27680344439067470290809361307486495967413680207186278
c1: -447858068584845721161312687367308917933310105
c2: -19478555356472742220574131686725155164
c3: 197738485983250412209995159733
c4: 1908714894005357456986
c5: -2645822906888
c6: 3360
Y0: -1878453280983944894105747993662053230
Y1: 15349830508030558303291[/CODE]But this doesn't seem to be competitive any more.

Gimarel 2021-08-22 17:57

[QUOTE=charybdis;586264]:w00t:
[code]n: 5272066026958413205513021090082556639441277154855572268239336980532402881465013381219738819137405617219594641652778228990107677072837959240400905630530715435664638237254654768053674178165309345879869729405448588458952991
skew: 1677636.441
Y0: -256368443813618831466708127928344755
Y1: 57410022887758634925443897
c0: -6432998852802319578730132186023855248338556016
c1: -1885042439120902119375835441237563637190
c2: 38014843918570711244403339019948367
c3: 255627954821968894266780614
c4: -14860362930136626207980
c5: -295780948942080
c6: 390020400[/code][/QUOTE]
msieve rootsieve result:[CODE] norm 8.006896e-16 alpha -9.642905 e 3.849e-16 rroots 6
skew: 1998682.69
c0: -941780930937912992870400249010487587324648656
c1: -1560004830805915154414596156183671024166
c2: 37169728449777888510770741963225639
c3: 5764737867639720255211643014
c4: -14671660455564392319980
c5: -513927159070080
c6: 390020400
Y0: -256373795575952428326655877808423095
Y1: 57410022887758634925443897[/CODE]

swellman 2021-08-22 18:06

[quote][B]e 3.849e-16[/B][/quote] :bow:

Impressive!

bur 2021-08-22 18:12

Great, that's close to the record!


So does that mean we should use P=12e6, or could the same happen the other way around, i.e. missing a poly by using that P? Would you mind explaining the influence of P? Why would both 10M and 20M have ruled out finding that specific poly?

charybdis 2021-08-22 18:16

[QUOTE=Gimarel;586267]msieve rootsieve result:[CODE] norm 8.006896e-16 alpha -9.642905 e 3.849e-16 rroots 6[/CODE][/QUOTE]

Very nice (and quick)!

[QUOTE=bur;586270]So does that mean we should use P=12e6 or could the same happen the other way around, i.e. missing a poly by using that P? Would you mind explaining the influence of P, why will both 10M and 20M rule out finding that specific poly?[/QUOTE]

To be honest I think that was complete luck, and there's probably not much to choose between 10M and 12M. Each Y1 value searched is a product of a bunch of small primes - nq being the number of such products searched - and two larger primes. The search range for the larger primes is P to 2*P; the Y1 value here has large factors 13763161 and 22862009.
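
If anyone wants to see that structure for themselves, a quick check on the Y1 above (plain Python with sympy, illustration only):

[code]# Y1 of the 3.67e-16 poly is smooth apart from two primes in [P, 2P] for
# P = 12M - but not both in [P, 2P] for P = 10M or P = 20M.
from sympy import factorint

Y1 = 57410022887758634925443897
P = 12_000_000
factors = factorint(Y1)
print(factors)                              # small primes plus 13763161 and 22862009
large = [p for p in factors if p > 1_000_000]
print(all(P <= p <= 2 * P for p in large))  # True[/code]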

bur 2021-08-22 18:23

I see. So if the smaller large factor had been 11.5e6, then neither P=10e6 nor 12e6 would have worked?

Is it known whether polys with Y1 comprising large "larger primes" are more probable to yield high scores?




And could you also explain what's behind the range (it seems c6 can be larger than the range value)? Also, what is size and what is root optimization doing? And why can we assign a score but apparently aren't able to quickly construct high-scoring polys? Is it a fundamental limit or do we just not know at this point how to construct the optimal poly?

charybdis 2021-08-22 19:02

[QUOTE=bur;586272]I see. So if the smaller large factor had been 11.5e6, then neither P=10e6 nor 12e6 would have worked?[/quote]

Correct.

[quote]Is it known whether polys with Y1 comprising large "larger primes" are more probable to yield high scores?[/quote]

I think that it's harder to find polys with larger P but they are better on average. But as we're interested in the best score rather than the average score, we still want to find a decent number of polynomials, so there's a sweet spot that we're trying to aim for. We don't know exactly where it is :smile:

[quote]And could you also explain what's behind the range (it seems c6 can be larger than the range value)? Also, what is size and what is root optimization doing? And why can we assign a score but apparently aren't able to quickly construct high-scoring polys? Is it a fundamental limit or do we just not know at this point how to construct the optimal poly?[/QUOTE]

Stage 1: search for raw polynomials
Sizeopt: adjust raw polynomials to improve their size properties, i.e. the size of the norms that we want to be smooth in the sieving step.
Rootopt: adjust polynomials to improve their root properties. If the algebraic polynomial has lots of roots modulo small primes, then the algebraic norms will be more likely to have lots of small prime factors, which in turn makes them more likely to be smooth. (We can't do this for rational norms because the rational poly will always have exactly one, possibly projective, root modulo every prime.)

The raw polynomials produced by stage 1 do have c6 in the range you expect. The tricks used by sizeopt and rootopt can change the leading coefficient, but I believe it will still always be a multiple of the original one. In my poly from a few posts ago, c6 = 390020400 was originally 18572400, which got multiplied by 21 in sizeopt.
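
To make the "roots modulo small primes" point concrete, here's a tiny sketch (plain Python, not CADO's alpha machinery) counting affine roots of the algebraic poly I posted above:

[code]# Count roots of f mod small primes; more roots mod small p means the
# algebraic norms pick up small prime factors more often. Affine roots only;
# CADO also accounts for projective roots and multiplicities.
c = [-6432998852802319578730132186023855248338556016,   # c0
     -1885042439120902119375835441237563637190,         # c1
     38014843918570711244403339019948367,               # c2
     255627954821968894266780614,                       # c3
     -14860362930136626207980,                          # c4
     -295780948942080,                                  # c5
     390020400]                                         # c6

def roots_mod(coeffs, p):
    return sum(1 for x in range(p)
               if sum(ci * pow(x, i, p) for i, ci in enumerate(coeffs)) % p == 0)

for p in (2, 3, 5, 7, 11, 13, 17, 19, 23):
    print(p, roots_mod(c, p))[/code]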

VBCurtis 2021-08-22 19:54

Our number has leading digit 5, while RSA-220 has leading digit 2. So, relative to the difficulty of the number, Charybdis' poly is better than the RSA-220 poly.

I'll stop my poly search tonight or tomorrow and root-opt whatever hits I have, but I think we have a winner!

Sure would be nice to beat one of the CADO-team RSA poly score records, though. I'm a little inclined to try a search on RSA-180 to see if the settings we use now (nq 15625, size-opt of 5 or 10) can beat the CADO-team's score without an excessive search time.

charybdis 2021-08-22 20:07

[QUOTE=bur;586272]And why can we assign a score but apparently aren't able to quickly construct high-scoring polys? Is it a fundamental limit or do we just not know at this point how to construct the optimal poly?[/QUOTE]

Oops, forgot to answer this bit! The score is not a simple function of the coefficients, it's an integral involving the [URL="https://en.wikipedia.org/wiki/Dickman_function"]Dickman function[/URL] and Murphy's alpha (the "alpha -9.642905" bit in msieve's output), both of which have to be approximated. Plus we need the coefficients to be integers, the polys need to have a common root mod N, and the alpha value jumps all over the place with very small changes to the algebraic poly. Finding the optimal poly is out of the question; we just aim for one that's good enough to do the job.
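
For anyone curious what that Dickman function looks like, here's a very rough numerical sketch (naive integration of the defining delay equation; not what CADO or msieve actually use):

[code]# Dickman's rho: rho(u) = 1 on [0,1], and u*rho'(u) = -rho(u-1) for u > 1.
# Tabulate it on a grid with a simple trapezoid step.
import numpy as np

h = 0.001
u_max = 6.0
steps = int(round(1 / h))                   # grid points per unit of u
rho = np.ones(int(round(u_max / h)) + 1)    # rho = 1 on [0, 1]

for i in range(steps + 1, len(rho)):
    u_prev, u = (i - 1) * h, i * h
    g_prev = rho[i - 1 - steps] / u_prev    # rho(u_prev - 1) / u_prev
    g_curr = rho[i - steps] / u             # rho(u - 1) / u
    rho[i] = rho[i - 1] - 0.5 * h * (g_prev + g_curr)

for u in (1, 2, 3, 4, 5):
    print(u, rho[int(round(u / h))])        # roughly 1, 0.307, 0.049, 0.0049, 0.00035[/code]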

VBCurtis 2021-08-23 03:01

[QUOTE=VBCurtis;586068]EDIT2: Taking 10e6-15e6 at incr=420.[/QUOTE]

Stopped the search at 11.01e6. Root-opt turned up nothing of significance, about 6% worse than my previous best.

bur 2021-08-23 04:56

[QUOTE=VBCurtis;586278]I'll stop my poly search tonight or tomorrow and root-opt whatever hits I have, but I think we have a winner![/QUOTE]We found a very good poly rather quickly, wouldn't it make sense to keep searching? Especially if there isn't a best P value, maybe re-doing a range with different P would be worthwhile?



[QUOTE=charybdis]Finding the optimal poly is out of the question; we just aim for one that's good enough to do the job.[/QUOTE]Thanks for the answers, interesting that so much involving NFS is still not really known. That's not really what the layman thinks about math.

charybdis 2021-08-23 12:23

The poly I posted yesterday was the best from my range, but not by that much. Here are the next best two, let's see if Gimarel can work his magic.

[code]n: 5272066026958413205513021090082556639441277154855572268239336980532402881465013381219738819137405617219594641652778228990107677072837959240400905630530715435664638237254654768053674178165309345879869729405448588458952991
skew: 1907251.281
c0: 8236427797628293049599408725893247072637239772
c1: 19162528353248595305446715812300596109295
c2: -21091088260708627616667662646742165
c3: -12419116707177921879187220801
c4: 7785775652933998252593
c5: 948954265351506
c6: 43482600
Y0: -249636408603218968365364360310503950
Y1: 22334710996682366819620031
# cownoise: e 3.62425400e-16

n: 5272066026958413205513021090082556639441277154855572268239336980532402881465013381219738819137405617219594641652778228990107677072837959240400905630530715435664638237254654768053674178165309345879869729405448588458952991
skew: 1760341.42
c0: 15354638866604858505124720786978674723100123170
c1: -61469554170413960080201338721573680907367
c2: -29684864243265761798289322803166219
c3: 74136878484020854315362790097
c4: 5081461673273545660609
c5: -1949330215164330
c6: 699388200
Y0: -246835163382771953077067062497572256
Y1: 11904096208130921118914953
# cownoise: e 3.47247226e-16[/code]

If Gimarel can get one of these close to 3.8e-16, then it might not be worth calling off the search just yet, as it would suggest the 3.849 isn't a huge outlier. And for the record, both of these would have been found with P=10M too.

EdH 2021-08-23 12:36

[QUOTE=charybdis;586279]. . . we just aim for one that's good enough to do the job.[/QUOTE]Couldn't almost any found poly be good enough to produce factors, while what we search for is the best poly, of those found, that will produce relations the fastest overall? And throughout the search, we try to balance the poly search time against the sieve time, minimizing the total?

When I was using Msieve across my farm (and working with smaller composites), I put a short hard time on polynomial searching and then went with whatever showed up in that limited search. Many of those polynomials weren't really good, but I still got factors.

charybdis 2021-08-23 12:49

[QUOTE=EdH;586307]Couldn't almost any found poly be good enough to produce factors, while what we search for is the best poly, of those found, that will produce relations the fastest overall? And throughout the search, we try to balance the poly search time against the sieve time, minimizing the total?[/QUOTE]

Yes, I should have chosen my words better. What I meant is that we don't try to search for the best possible poly, but rather we stop when we find one good enough - where "good enough" means that further poly searching would be unlikely to save time overall.

Poly selection is possibly the part of GNFS with most room for improvement. For example, the best polys for this c220 might have degrees 3 and 4 rather than 1 and 6, but we don't have a good way to find such polys.

swellman 2021-08-23 13:40

My redo of 40-41M with nq=6^7 and P=10M gave a best cownoise score of 2.898e-16.

10th best cownoise was 2.586e-16.

Going to finish my search range of 40-50M with P=10M and nq=6^6.

Gimarel 2021-08-23 14:03

[QUOTE=charybdis;586306]The poly I posted yesterday was the best from my range, but not by that much. Here are the next best two, let's see if Gimarel can work his magic.

If Gimarel can get one of these close to 3.8e-16, then it might not be worth calling off the search just yet, as it would suggest the 3.849 isn't a huge outlier. And for the record, both of these would have been found with P=10M too.[/QUOTE]
I'm on vacation right now. I'll see what I can do next week.

charybdis 2021-08-23 17:13

[QUOTE=Gimarel;586312]I'm on vacation right now. I'll see what I can do next week.[/QUOTE]

Might as well see if I can get any more hits in the meantime. Taking 25e6 to 40e6 at incr=420.

bur 2021-08-23 17:32

[QUOTE=charybdis;586308]What I meant is that we don't try to search for the best possible poly, but rather we stop when we find one good enough - where "good enough" means that further poly searching would be unlikely to save time overall.[/QUOTE]Is that point already reached? When I extrapolate from my few factorizations in the 160-170 range a c220 should take about 1000 core-years (veery roughly). So even shaving 1% of the sieving time should save a lot of time. Wouldn't a few more days do some good? Or is the probability to find something better too low?

charybdis 2021-08-23 18:57

[QUOTE=bur;586334]Is that point already reached? When I extrapolate from my few factorizations in the 160-170 range a c220 should take about 1000 core-years (veery roughly). So even shaving 1% of the sieving time should save a lot of time. Wouldn't a few more days do some good? Or is the probability to find something better too low?[/QUOTE]

1000 is an overestimate; a very rough estimate from some quick test-sieving (I=16, lim 500M/800M, lpb 34/35, mfb 64/101) is 150 core-years, and it's probably possible to improve this with better parameters. That said, if NFS@home end up sieving this then it will likely take longer than this because they don't go above 33-bit large primes.

We must have spent at least 2 core-years on polyselect so far, and I've just queued up another ~1.5 core-years, or 1% of the estimated sieving time. As the current best poly doesn't look like an outlier to me, I reckon there's enough chance of beating it by a few percent to make the extra polyselect time worthwhile.

ryanp 2021-08-23 20:09

[QUOTE=charybdis;586337]1000 is an overestimate; a very rough estimate from some quick test-sieving (I=16, lim 500M/800M, lpb 34/35, mfb 64/101) is 150 core-years, and it's probably possible to improve this with better parameters. That said, if NFS@home end up sieving this then it will likely take longer than this because they don't go above 33-bit large primes.[/QUOTE]

I volunteer to sieve this, once you've finalized the poly and params.

VBCurtis 2021-08-23 21:42

[QUOTE=ryanp;586341]I volunteer to sieve this, once you've finalized the poly and params.[/QUOTE]

Thanks Ryan!

If you end up with a matrix 80 million dimensions or smaller, I can do the matrix-solving (though it would take me 3-4 months for a 70-80M matrix on a dual-10-core 128GB ram machine).

As for sieve-time estimates: the C207 team sieve we did in 2019 took about 125 thread-years (some were not hyperthreaded, so say 70 core-years). C220 is more than two doublings bigger, so I estimate 250-300 core-years of sieve.
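
Roughly how I get that number (back-of-the-envelope; the "doubling every ~6-6.5 digits" scaling at this size is an assumption, not a law):

[code]# Scale the C207 effort up to C220 under an assumed doubling rate.
c207_core_years = 70
for digits_per_doubling in (6.0, 6.5):
    estimate = c207_core_years * 2 ** ((220 - 207) / digits_per_doubling)
    print(f"{digits_per_doubling}: ~{estimate:.0f} core-years")
# ~314 and ~280 core-years - the same ballpark as the 250-300 guess.[/code]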

Note that using 34/35LP, as we did on the C207, will require nearly 4e9 raw relations- these are the largest LP settings that don't need the extra CADO-compile flag set to handle more than 2^32 raw relations. In hindsight, 33/34 would likely have been optimal for C207, so 34/35 should be fine for C220. We learned from other suboptimal sieve choices in that effort, and I am confident we could get the sieve time below 60 core-years for C207 if we did it again. For instance we started Q far too small, leading to yucky duplicate rates. We also shifted to I=15 to save memory and because it looked faster, which exacerbated the duplicate-relations problems.

I bring this stuff up since Ryan suggested we select the params...

charybdis 2021-08-23 23:44

[QUOTE=ryanp;586341]I volunteer to sieve this, once you've finalized the poly and params.[/QUOTE]

Awesome, thank you!

[QUOTE=VBCurtis;586346]As for sieve-time estimates: the C207 team sieve we did in 2019 took about 125 thread-years (some were not hyperthreaded, so say 70 core-years). C220 is more than two doublings bigger, so I estimate 250-300 core-years of sieve.

Note that using 34/35LP, as we did on the C207, will require nearly 4e9 raw relations- these are the largest LP settings that don't need the extra CADO-compile flag set to handle more than 2^32 raw relations.[/QUOTE]

Okay, my 150 core-years was really 150 i5-8500-years and I know they're pretty fast for sieving, I don't know what Ryan uses. I'm willing to believe that I underestimated the number of required relations and 200 i5-8500-years was more like it. But are close to 4G raw relations really necessary? Surely we want to choose parameters so that we don't have a duplication rate near 50%?

Edit: the usual doubling rule probably doesn't hold up around the degree 5/6 boundary does it? The degree 6 curve intercepts the degree 5 curve, so once you switch to degree 6 the number of digits per doubling should go up...

VBCurtis 2021-08-24 00:43

[QUOTE=charybdis;586357] ... But are close to 4G raw relations really necessary? Surely we want to choose parameters so that we don't have a duplication rate near 50%?

Edit: the usual doubling rule probably doesn't hold up around the degree 5/6 boundary does it? The degree 6 curve intercepts the degree 5 curve, so once you switch to degree 6 the number of digits per doubling should go up...[/QUOTE]

You're quite right- The 2330L Cunningham C207 job took 3300M raw relations to yield 1800M uniques, a terrible ratio. However, my point remains that if we went for bigger LP bounds than 34/35, we *would* risk needing more than 2^32 relations.

I imagine something like 2.8-3.0G raw relations to yield 2.0G uniques for this job? I was adding 15% to the previous job's numbers, forgetting just how bad the unique ratio was.

We don't have much data on deg 6 poly score scaling- pretty much just the CADO group's polys for RSA-220/230/240/250. It hasn't crossed my mind to fit a curve to those scores and test for scaling... Good idea!

charybdis 2021-08-24 01:13

[QUOTE=VBCurtis;586361]However, my point remains that if we went for bigger LP bounds than 34/35, we *would* risk needing more than 2^32 relations.[/quote]

Indeed. And if msieve is used for postprocessing there's a 2^32 limit there too; duplicates could be removed beforehand, so 35/35 would be possible but maybe not 35/36.

[quote]We don't have much data on deg 6 poly score scaling- pretty much just the CADO group's polys for RSA-220/230/240/250. It hasn't crossed my mind to fit a curve to those scores and test for scaling... Good idea![/QUOTE]

Be careful with these types of score comparisons. Msieve/cownoise scores are not suitable for this purpose, as I discovered when comparing them with the CADO scores for my c220 polys. Cownoise still puts the polys pretty much in the same order as CADO, but the ratios are larger: for example, CADO scored the best poly 13% higher than the 10th best, but cownoise scored it 22% higher. I'm inclined to trust the CADO ratios more, because it estimated Murphy-E using the sieving parameters I gave it, rather than the standard values used by msieve/cownoise which are far too small for numbers of this size.

So if you want to compare the scores of the RSA polynomials, you'll need to use CADO's "polyselect3" with some appropriate parameters. This is left as an exercise for the reader :)

bur 2021-08-24 06:06

[QUOTE=charybdis;586337]We must have spent at least 2 core-years on polyselect so far, and I've just queued up another ~1.5 core-years, or 1% of the estimated sieving time.[/QUOTE]My mistake was I underestimated how quickly core-time racks up. Even my two small searches came to nearly 3 core-months already.

VBCurtis 2021-08-24 06:24

[QUOTE=bur;586378]My mistake was I underestimated how quickly core-time racks up. Even my two small searches came to nearly 3 core-months already.[/QUOTE]

No mistake, sir- the typical rule of thumb is 5% of expected sieve time for poly-select time. We're at about 2%?

It's just that we found a poly scoring almost exactly what we hoped for. As you and Charybdis point out, finding one good one doesn't rule out finding an even-better one; I support a continued search, and finding additional 3.8's would support the idea that a 3.9 or higher is "out there" to be found. I'm taking a break from the search, but if we become convinced the 3.8 is not a massive outlier I'll return to this in a week or three.

bur 2021-08-24 12:32

By mistake I meant that I'd have guessed we maybe did 3-4 core-months in total.

So I'll continue the poly search at 6e6-7e6, it might at least give some idea how rare 3.5+ polys are.

bur 2021-08-24 17:05

If it's not sure whether larger or smaller P is better and some polynomials will never be found with a specific P, wouldn't it make sense to search the small value ranges again with different P?


If small leading coefficients tend to produce better polys, then going through these ranges with several different P could be worthwhile.

charybdis 2021-08-24 17:30

[QUOTE=bur;586417]If small leading coefficients tend to produce better polys, then going through these ranges with several different P could be worthwhile.[/QUOTE]

I'm not sure we've got to large enough coefficients that "small leading coefficients tend to produce better polys" is having much of an effect. For degree 5 jobs it would definitely be noticeable at this point, but so far the best 3 CADO polys have come from (pre-sizeopt) leading coeffs around 19e6, 21e6 and 23e6. In addition, basically every polynomial is getting its leading coefficient multiplied up in the optimization process, so there can't be too much harm in having a larger coefficient. In contrast, with large degree 5 searches there are usually a few polys in the top 10 that keep their original leading coefficient.

bur 2021-08-24 19:04

The searches of low value ranges are faster though, right? It might still make sense to go over them again with a different P. On the other hand, P=10e6 and P=12e6 still have a lot of overlap, so the time could very well be better spent on a different range.


BTW, there is no random element to the search or is there?

VBCurtis 2021-08-24 21:30

No random element.

I believe altering P can be useful, but I wouldn't repeat coefficient ranges.

Smaller coeff's usually take a little longer to search; that is, 30-31M should run slightly faster than 2-3M.

Smaller P values do run faster.

bur 2021-08-25 05:46

[QUOTE]I believe altering P can be useful, but I wouldn't repeat coefficient ranges.[/QUOTE]Because the limited time is better spent on doing a fresh range?

[QUOTE]Smaller coeff's usually take a little longer to search;[/QUOTE]I thought it was the other way around since larger incr is used for larger value ranges to prevent long search times.

VBCurtis 2021-08-25 08:07

Changing P leads to some duplication of search, better to try an entirely new range. You'd need a P value half or double the previous one, and we have reason to think either of those options is suboptimal (though that's a judgement of "per unit time", not a prediction that you wouldn't find a better poly). We don't know what the "Best" P is for a given size of input, but hopefully we're close enough to "best" that halving or doubling P would be less efficient than running a new range with P altered by a little (say, something in a range of 8 to 20M). All the default CADO params files from C195 to C230 use P = 10e6, while only C180/185/190 use 5e6 and C240 uses 20e6. To me, that indicates that not much testing was done on the big sample-params files for "best" P, and we should experiment a little.

As for the second question, my comment was intended to be "for constant param choices", i.e. same incr such as Charybdis searching ~20M while you and I searched at 2 or 3M.

bur 2021-08-25 09:31

[QUOTE]As for the second question, my comment was intended to be "for constant param choices", i.e. same incr such as Charybdis searching ~20M while you and I searched at 2 or 3M.[/QUOTE]But won't a larger incr decrease the search time? It was recommended to use incr=2310 for larger value ranges, I assumed this was done because otherwise the search takes too long?

charybdis 2021-08-25 11:52

In previous searches (mostly degree 5), larger incr has been used to compensate for the disadvantage of searching at higher leading coefficients. Roughly speaking, polys with larger coeffs tend to have worse size properties, and using an incr value with more small prime factors gives a small boost to the root properties to make up for this.

bur 2021-08-25 14:39

Ah, I thought incr controlled how thoroughly a range is searched and larger incr resulted in shorter search times. So it is the other way around? Would it have a slight advantage to use larger incr even for smaller coefficients (regardless of whether it's worth it)?

charybdis 2021-08-25 15:00

You've got it right: the leading coefficients that are searched are the multiples of incr, so a 1M range at incr=2310 will be much quicker than a 1M range at incr=420. We don't use larger incr from the start because we don't want to get to large coefficients too quickly.

bur 2021-08-25 18:02

At the risk of looking like an idiot, but I still don't get it.


[QUOTE]so a 1M range at incr=2310 will be much quicker than a 1M range at incr=420[/QUOTE][QUOTE]Smaller coeff's usually take a little longer to search; that is, 30-31M should run slightly faster than 2-3M[/QUOTE]If larger incr is faster, I assume it's less thorough. If 30M-31M is faster than 1M-2M, then why do we use a large incr for 30M-31M? It is faster than 1M-2M even at the same incr, so why make it even faster and less thorough? Or isn't a large incr less thorough?

charybdis 2021-08-25 19:00

The rate at which we find raw polynomials stays roughly constant (assuming fixed P), so we want the average score of the polynomials we find to be as high as possible through the search. In general, we observe:

1. If incr is kept constant, lower leading coeffs give higher average scores.

2. Among leading coeffs of similar size, those with more small prime factors give higher average scores. Note that this is why we choose incr values like 420 in the first place.

It's not hard to see the effects of point 2: look at the leading coefficients of the best polynomials from your searches, and see how smooth they tend to be, even given that they're multiples of 420. For example, the current leading poly came from a stage 1 hit with c6 = 18572400 = 2^4 * 3^2 * 5^2 * 7 * 11 * 67, so it would still have been found with incr = 2310 or 4620.
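
A quick check of that, if anyone wants to reproduce it:

[code]# The stage 1 hit's leading coefficient is smooth enough that coarser incr
# values would still have reached it.
c6 = 18572400
assert c6 == 2**4 * 3**2 * 5**2 * 7 * 11 * 67            # factorization quoted above
print([incr for incr in (420, 2310, 4620) if c6 % incr == 0])   # all three qualify
print(c6 * 21)                                            # 390020400, the post-sizeopt c6[/code]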

So if we started off at a larger incr, we would start off finding better polynomials than with incr=420 due to point 2... but we would run into the effects of point 1 much more quickly. Once we get to leading coeffs high enough that scores have dropped noticeably due to point 1, the theory is that increasing incr to 2310 or 4620 will then give a temporary boost to the average score.

I don't think we're seeing much of the effect of point 1 yet, which is why I've stuck with incr=420 for the 25M to 40M range. It's possible that incr=2310 or 4620 over a much larger range would have been better, but that's difficult to test in advance.

I wouldn't mind CADO adding an option to put a smoothness bound on the leading coefficients...

swellman 2021-08-26 21:18

41-45M best poly found was only 2.921e-16. 10th best was 2.506e-16.

I should be finished with 45-50M late Friday.

bur 2021-08-27 07:48

So large incr gives better polys on average due to the many small factors, but it skips a lot of coefficients that aren't multiples of it? If so, then I finally understood.

6e6-7e6 is at 75%, should be finished in about 30 hours.

VBCurtis 2021-08-27 08:25

[QUOTE=bur;586631] If so, then I finally understood.[/QUOTE]

:tu: :tu:

bur 2021-08-27 17:02

[QUOTE=VBCurtis;586635]:tu: :tu:[/QUOTE]Yes, I'm quite proud of myself. ;)


And yet one more question. Wouldn't it make sense to go over one of the lower ranges again with larger incr? Or does the same argument apply as with varying P that the time is better spent on a fresh range?

charybdis 2021-08-27 17:19

Going over a range with larger incr would be repeating work on the coefficients that are multiples of both incr values. If the new incr (say 4620) was a multiple of the old incr (say 420), you wouldn't be doing any new work at all.

swellman 2021-08-28 02:02

Completed the 40-50M range. Best cownoise score was 3.0276e-16. Fairly blah.

Searching 50-60M next.

bur 2021-08-29 05:42

Completed 6e6-7e6, best poly was 3.066e-16 cownoise.


Seeing swellman's result, I'll stop the poly search now since it seems the 3.8e-16 will be very hard to beat. I did 1e6-2e6 and 5e6-7e6.


Btw, did you finish the range you did twice with different P?

swellman 2021-08-29 10:52

[QUOTE=bur;586781]

Btw, did you finish the range you did twice with different P?[/QUOTE]

If you’re asking me, the answer is yes but the results are spread over several pages of this thread. Summing up:

40-41M w/P=10M, nq=6^6 gave a best poly of [URL="https://www.mersenneforum.org/showpost.php?p=585933&postcount=3012"]2.709e-16[/URL] per cownoise.

40-41M w/P=20M, nq=6^6 gave a best poly of [URL="https://www.mersenneforum.org/showpost.php?p=586182&postcount=3028"]2.578e-16[/URL].

40-41M w/P=10M, nq=6^7 gave a [URL="https://www.mersenneforum.org/showpost.php?p=586311&postcount=3051"]best poly of 2.898e-16 and the 10th best poly scored 2.586e-16[/URL].

(All used incr of 4620 and adrange 4x that.)


So nq=6^7 helped in this case, producing the highest scoring poly. But it took a week rather than a day and the results were still rather unimpressive.

My current search of 50-60M (P of 10M, nq is 6^6 and incr is 4620) should finish up on Wednesday. I’m prepared to search up to 300M but that will take a long time.

charybdis 2021-08-29 11:51

[QUOTE=bur;586781]Seeing swellman's result, I'll stop the poly search now since it seems the 3.8e-16 will be very hard to beat. I did 1e6-2e6 and 5e6-7e6.[/QUOTE]

Sean was using incr=4620 so his 10M range was more comparable to one of your 1M ranges.

And about the "hard to beat" bit...

[code]n: 5272066026958413205513021090082556639441277154855572268239336980532402881465013381219738819137405617219594641652778228990107677072837959240400905630530715435664638237254654768053674178165309345879869729405448588458952991
skew: 331125.033
c0: 3394949349018158588040756570853974067801857
c1: -6653335906218714378144808303035495969
c2: 74040391174021644872973194328541
c3: -276448343494407068460138157
c4: -4606872840152057004918
c5: 1393805639345406
c6: 816114600
Y0: -226657691871392565816046108637660098
Y1: 69216809380453058295685069[/code]

Cownoise says skew 534276.01166 score 4.17772314e-16! This one actually does look like a possible outlier; let's see if Gimarel can make it even better.

Second place from 25M-40M was
[code]n: 5272066026958413205513021090082556639441277154855572268239336980532402881465013381219738819137405617219594641652778228990107677072837959240400905630530715435664638237254654768053674178165309345879869729405448588458952991
skew: 259828.731
c0: 291564310755372745873385870920629996500880
c1: -11770010484070210819325995157141005616
c2: 804097425342022282773053964128739
c3: -1607028399777058166539822136
c4: -15556228087219154844031
c5: -1494532571792100
c6: 584942400
Y0: -237694813432371337441531043422949258
Y1: 47124166781711191105282781[/code]

which scores 3.78642741e-16, so with some spin it might also beat the previous 3.85 - but that's purely academic now.

Gimarel 2021-08-29 12:56

[QUOTE=charybdis;586801]Sean was using incr=4620 so his 10M range was more comparable to one of your 1M ranges.

And about the "hard to beat" bit...
[/QUOTE]
I couldn't improve the previous two. I'll try these now.

Gimarel 2021-08-29 16:33

[QUOTE=charybdis;586801]
Cownoise says skew 534276.01166 score 4.17772314e-16! This one actually does look like a possible outlier; let's see if Gimarel can make it even better.[/QUOTE]Sorry, I didn't find anything better.

swellman 2021-08-29 22:22

[QUOTE=charybdis;586801]Sean was using incr=4620 so his 10M range was more comparable to one of your 1M ranges.

And about the "hard to beat" bit...

[code]n: 5272066026958413205513021090082556639441277154855572268239336980532402881465013381219738819137405617219594641652778228990107677072837959240400905630530715435664638237254654768053674178165309345879869729405448588458952991
skew: 331125.033
c0: 3394949349018158588040756570853974067801857
c1: -6653335906218714378144808303035495969
c2: 74040391174021644872973194328541
c3: -276448343494407068460138157
c4: -4606872840152057004918
c5: 1393805639345406
c6: 816114600
Y0: -226657691871392565816046108637660098
Y1: 69216809380453058295685069[/code]

Cownoise says skew 534276.01166 score [B]4.17772314e-16[/B]! This one actually does look like a possible outlier; let's see if Gimarel can make it even better.
[/QUOTE]

:party: Nice one! Too bad it couldn’t be increased but that’s a breathtaking score.


For reasons of learning and stubborn pride, I’m going to finish my search range in a vain effort to improve on it.

charybdis 2021-08-29 23:18

[QUOTE=Gimarel;586818]Sorry, I didn't find anything better.[/QUOTE]

No worries, thanks for checking!

bur 2021-08-30 18:09

Oh, that's impressive... :D It did beat the RSA-220 poly by 7.5%.


It was nice learning a lot about polynomial search. Now let Ryan Propper work his server farm magic.

VBCurtis 2021-08-31 03:33

Not just yet- we have to pick sieve parameters first!

I used lpb0=34 and lpb1=35 for the GNFS-207 team sieve two years ago, and both LP bounds were higher than they needed to be. I believe those are big enough for this job.

The default C220 params file uses 500M/800M for lim's. Are those big enough?

mfb0 should be test-sieved at 66-67-68, and mfb1 should be test-sieved at 99-100-101-102. I used 66 and 99 for the C207, but this job is over four times tougher so likely 67 and 100-101 will be best.

One of the CADO publications observed that a very large ncurves on the 2LP side helped, say 50 or 75 (again, test-sieve?)
ncurves1 of 20ish is likely fine (default is 21).

We'd like to determine yield for a couple of test Q values, say at 100M intervals; once we estimate yield across the sieve region we can choose starting Q such that ending Q is expected to be 8 * starting Q.

charybdis 2021-08-31 14:32

I'll do some test-sieving.
Unsurprisingly, the 4.18e-16 poly is clearly better than the 3.85e-16 poly; I won't bother testing any of the others.

bur 2021-08-31 16:51

If it makes sense I could also test-sieve, though only with 10 cores. If so, which would be a good set of parameters?

charybdis 2021-08-31 18:52

There's nothing stopping you from test-sieving, but it wouldn't be useful for me because I'm comparing speeds as well as yields for lots of different parameter choices, and for speeds to be consistent I have to run everything on the same machine.

bur 2021-08-31 20:56

Ok, then I won't get into it.

charybdis 2021-08-31 23:33

Test-sieving done, at least for today.

I only used I=16, lpb 34/35, and given the yields I don't expect anything smaller than this to be better. Even if 35/35 does sieve faster, it will make postprocessing more of a pain. The machine I used only has 16GB RAM so I can't test A=32 which requires ~26GB, but I don't think we're at a size where that would be clearly better.

Raising ncurves0 from 25 doesn't find any more relations. It could possibly be dropped a bit, but 2LP cofactorization is a tiny contribution to the total time so the effect would be negligible. ncurves1=21 does miss a few relations, but it's just a fraction of a percent, so with the extra 3LP cofactorization time I don't think there's any need to change this from the default.

There is a clear benefit of several percent from using -adjust-strategy 2, as seems to be standard with large jobs. The price paid for this is that memory use with I=16 is 13GB rather than 10GB, but I assume this won't be a problem.

Higher lims = more yield and slower sieving, and with the added complication of the duplication rate depending on the lim choice, it's not easy to measure the effects with test-sieving. 500M/800M probably isn't optimal but it'll be close enough.

mfb0 choice is always complicated by the fact that higher mfb0 leads to a small increase in the number of relations required, and we don't have a good handle on this. This makes 68 less attractive, but in the interests of keeping yield high I stuck with 67 for the main test.

mfb1 can go quite high and still show a decent improvement in sec/rel. There doesn't seem to be any problem with going as high as 102.

I test-sieved the 500M/800M, 67/102 combination at various intervals between 200M and 2500M. Sec/rel worsened from ~0.7 to ~1.5 over this range. If the aim is 3G raw relations, the range 250M-2000M looks about right, and I estimate this will take only(!) ~105 thread-years on the machine I used. This figure seems suspiciously low even given that the threads in question are fast, but I can't find a mistake anywhere, so I have to believe it. Hopefully this doesn't mean Ryan's in for a nasty surprise with the duplication rate!
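
Roughly where the ~105 figure comes from (back-of-the-envelope, assuming an average of ~1.1 sec/rel over the whole Q range):

[code]# 3G relations at ~1.1 thread-seconds each, converted to thread-years.
relations = 3.0e9
sec_per_rel = 1.1                     # between the ~0.7 and ~1.5 endpoints
thread_years = relations * sec_per_rel / (365.25 * 24 * 3600)
print(f"~{thread_years:.0f} thread-years")   # ~105[/code]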

VBCurtis 2021-09-01 02:45

I appreciate the detail in the writeup- for e.g. Bur who sees a future in test-sieving, that post is a nice place to start.
Here's the conclusion for Ryan:
[code]tasks.lim0 = 500000000
tasks.lim1 = 800000000
tasks.lpb0 = 34
tasks.lpb1 = 35
tasks.sieve.mfb0 = 67
tasks.sieve.mfb1 = 102
tasks.sieve.ncurves0 = 25
tasks.sieve.ncurves1 = 21
tasks.I = 16
tasks.qmin = 250000000
tasks.sieve.qrange = 5000
tasks.sieve.adjust_strategy = 2
tasks.sieve.rels_wanted = 3000000000[/code]

If memory is plentiful, consider running A=32 for the first ~500M relations for the extra yield at small Q, and then changing down to I=16. That should help with the duplicate-relations rate, too- making it more likely to get 2G unique relations before Q=2000M.

charybdis 2021-09-01 03:02

I'd make one small change for practical purposes, which is to set rels_wanted artificially high (say 4G rather than 3G) as the intention is presumably to run msieve filtering. We don't want CADO filtering to be triggered, for a couple of reasons:

1. There is a bug whereby sieving with adjust_strategy = 2 can produce a relation with a (square) composite factor. This is extremely rare, but in 3G relations there's a decent chance it will happen, and unlike msieve, I don't think CADO filtering handles it well.
2. Even if this doesn't happen, CADO filtering on a job this big likely requires hundreds of GB of memory.

Setting a large value of rels_wanted has the added benefit that ETA values are more accurate early in the sieving while the yield is high :smile:

bur 2021-09-01 17:19

[QUOTE]for e.g. Bur who sees a future in test-sieving, that post is a nice place to start.[/QUOTE]That sounds like test-sieving is a career option. Professional test-siever.



But seriously, thanks, I always enjoy these posts.

ryanp 2021-09-12 18:56

[QUOTE=VBCurtis;586961]I appreciate the detail in the writeup- for e.g. Bur who sees a future in test-sieving, that post is a nice place to start.
Here's the conclusion for Ryan:[/QUOTE]

I'm sieving now. Thanks to those who found the poly and params!

Will give an update in a few days.

ryanp 2021-09-14 13:58

Update: I'm at 555M uniques now. I currently seem to be pulling about 250M unique (not total) relations per day, so hopefully another 6-10 days of sieving should be enough, if the rate keeps up.

bur 2021-09-14 16:56

How can the percentage of uniques be found during sieving? By doing a manual filtering step on the relations?


Impressive progress anyway, about the same timeframe I need for a C167...

EdH 2021-09-14 20:21

[QUOTE=bur;587861]How can the percentage of uniques be found during sieving? By doing a manual filtering step on the relations?


Impressive progress anyway, about the same timeframe I need for a C167...[/QUOTE]remdups4 is what I use. It can be found in the ggnfs package. (ggnfs/contrib/remdups/remdups4.c)


Edit: Depending on which sieving package you are using, you will need to gather all the relations into a single file and:[code][z]cat <file> | ./remdups4 1000 ><uniquerels>[/code]

bur 2021-09-16 14:42

Thanks, to find the number of uniques I just did a [C]wc -l[/C] on the lines not starting with #, does that yield the correct result? Is there a faster way - does remdups4 output the number somewhere?
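
For reference, the check I described is essentially this (a rough Python equivalent of the wc -l approach, streamed so it copes with a huge relations file):

[code]# Count relation lines (everything not starting with '#') in a relations file.
import sys

count = 0
with open(sys.argv[1]) as fh:          # e.g. python count_rels.py uniquerels
    for line in fh:
        if not line.startswith("#"):
            count += 1
print(count)[/code]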

EdH 2021-09-16 15:00

[QUOTE=bur;587972]Thanks, to find the number of uniques I just did a [C]wc -l[/C] on the lines not starting with #, does that yield the correct result? Is there a faster way, does remdup output the number somewhere?[/QUOTE]I can't answer the first part, but remdups4 outputs counts to the terminal as it runs.

