mersenneforum.org  

Go Back   mersenneforum.org > Factoring Projects > Msieve

Reply
 
Thread Tools
Old 2010-01-18, 11:50   #1
fivemack
(loop (#_fork))
 
fivemack's Avatar
 
Feb 2006
Cambridge, England

72·131 Posts
Default bad parameters for 153-digit CPU polsel ?

On Friday afternoon, I left sixteen msieve-cpu jobs running over the weekend on a fairly fast machine (-np 1,100000; -np 100000,200000 ...) and got in on Monday morning to find a total of four polynomials in the msieve.dat.p files.

This sounds to me as if some thresholds aren't quite right; I will poke around a bit and see if I can figure out what's going on.
fivemack is offline   Reply With Quote
Old 2010-01-19, 00:00   #2
Batalov
 
Batalov's Avatar
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

224058 Posts
Default this... and the "ratio of 0.69"

I agree. This region was left uncorrected. (data below was taken from GGNFS, data above was sampled by you and Ben, and me, and others)

If you take the expected (reporting threshold) E from msieve source (gnfs/poly/poly_skew.c) and take the log10 of the 3rd column, you will get a logE vs logN plot. You will observe a distinct elbow around 154. This elbow has to be smoothed.

____________
here ends the post. the rest is a tangent.

Furthermore, after correcting for the elbow, the slope of that dependency (for gnfs) would be almost exactly 0.060 (+-0.001). The slope of snfs is anecdotally known to be 1/30 in a fairly wide range. The ratio of these slopes is ~0.56.

This (together with additional sampling as well as M.Kamada's observations) leads to a revised gnfs/snfs decision boundary rule-of-thumb:

if gnfs-difficulty < snfs-diff * 0.56 + 30, then use gnfs, else snfs.
(with tests, if desired, in the borderline cases)

This is the update to the infamous ratio of 0.69, which was a good one-parametric guesstimate for the top of the home-computing range (it is known to be distorted in the easy range; there, one should use 0.75 and even 0.80; it is easy to see that the new heuristic describes that). Interestingly, this virtual decision ratio keeps declining as low as 0.65 and 0.64 for the NFS @ Home type of numbers and above.

P.S. M.Kamada's variant of the boundary (as far as I understand, implemented on his site) is
if gnfs-difficulty < snfs-diff * 0.6 + 18

Last fiddled with by Batalov on 2010-01-19 at 00:59 Reason: off on a tangent
Batalov is offline   Reply With Quote
Old 2010-01-19, 00:05   #3
FactorEyes
 
FactorEyes's Avatar
 
Oct 2006
vomit_frame_pointer

23·32·5 Posts
Default

I'm so confused. I believe the original post is about polynomial-selection parameters: e-values or some other threshold earlier in the search process.

I had a 158-digit gnfs polynomial selection run to completion without a single polynomial last year, but I imagine my experience is unusual, since I and others have found pallets of succulent, juicy gnfs polynomials via msieve since then.
FactorEyes is offline   Reply With Quote
Old 2010-01-19, 00:32   #4
Batalov
 
Batalov's Avatar
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

250516 Posts
Default

The other two norms can be improved, too; this is left as an excercise to the reader.

However, it is easy to see that the E threshold being too strict will directly lead to absent (or very few) reported polynomials. I was reacting to this symptom (which I had in my few projects in the same range, too).

Additional though after that post. I was not saying that the table needs to be straightened to a line with the slope 0.060; no, I think it should be fit into a L(1/3,c), based on the available points. If we do that, the curvature will remain, but the elbow will get smoother, and this exactly what doctor oredered for gnfs-153-156-158s. Why should E follow the theoretical L(1/3,c) complexity estimate? Because E is the complexity in some sense, as far as I learned from experience: Comparing Es from different projects, one can observe that it strongly (inversely) correlates with the total needed time (at least sieving time), regardless of the FBLIMs and many other parameters being equal or different as they may be.
Batalov is offline   Reply With Quote
Old 2010-01-19, 01:31   #5
jrk
 
jrk's Avatar
 
May 2008

21078 Posts
Default

Quote:
Originally Posted by fivemack View Post
On Friday afternoon, I left sixteen msieve-cpu jobs running over the weekend on a fairly fast machine (-np 1,100000; -np 100000,200000 ...)
So the 16th job would have started at 1500000 which seems to be much much too high for msieve's code. Perhaps it would've been better to make the ranges a bit thinner. For a c153, the time-per-coeff limit is 800 seconds, so a weekend-long (~64 hours?) task would likely cover a range of less than 20000, not 100000.

Quote:
Originally Posted by fivemack View Post
and got in on Monday morning to find a total of four polynomials in the msieve.dat.p files.
I take it that you were running each range in a separate directory, so the dat files weren't being clobbered, yes?

But also, did the msieve instances actually get to use a lot of CPU time or could they have been interrupted by other running tasks? Did you measure the consumed CPU time for each instance? Sorry if these are stupid questions.
jrk is offline   Reply With Quote
Old 2010-01-19, 03:47   #6
Batalov
 
Batalov's Avatar
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

947710 Posts
Default the other two norms

If we plot N1, N2 (stage1_norm and stage2_norm in the source) and E0 vs Nd (number of digits), then their tentative fit
N1 = 10 ^ ( 0.153 Nd - 0.262)
N2 = 10 ^ ( 0.153 Nd - 1.800)
E0 = 10 ^ (-0.060 Nd - 2.473)

suggests that based on all the previous thinkering N2 = C * N1, where C = 0.030. Notably, this rule is broken in the 150-digit area under question (N2 is approximately halved from the above interpolation).

I suggest doubling N2 in this area and testing.
(Alternatively, later simply letting N2 = 0.030 N1 in the code)
Batalov is offline   Reply With Quote
Old 2010-02-05, 16:47   #7
fivemack
(loop (#_fork))
 
fivemack's Avatar
 
Feb 2006
Cambridge, England

72·131 Posts
Default

I doubled N1 and N2 and set E small enough to expect everything to pass, which gave enough results to decide that E=2e-12 is about right at the 153-digit level. I'm now getting a few dozen polynomials a minute over the sixteen threads, which is a much more reasonable rate - should be able to push 3432 on one iteration by about the end of next week.
fivemack is offline   Reply With Quote
Old 2010-02-05, 17:17   #8
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

3,541 Posts
Default

Looks like we need an overhaul of the parameters even for input sizes that have parameters in GGNFS. Do you now get polynomials that exceed the original E-value bound?

I've been running the GPU code on RSA512 because Paul Zimmermann is collecting polynomials for testing purposes, and running my medium-end GPU for 12 days produced less than a dozen polynomials, even though stage 1 was producing 3-5 hits per second the whole time. Looks like N2 should be raised and the target E value should be lowered a little bit, but I have a hard time believing that stage 2 can 'save' thousands of polynomials that we throw away now. One would think that if a polynomial has very good size properties that it would exceed any reasonable bound we use.
jasonp is offline   Reply With Quote
Old 2010-02-08, 10:35   #9
fivemack
(loop (#_fork))
 
fivemack's Avatar
 
Feb 2006
Cambridge, England

72·131 Posts
Default

After a weekend on 16 threads, which is not quite enough to run even a 20,000 range of A5 values per thread, I have got 850,000 polynomials with E>2e-12, amongst which 23 with E>3e-12 (which is I think about the bound which would previously have been used). Unfortunately I deleted the results of the previous search and the OS X Time Machine doesn't appear to have backed them up.

Code:
11/msieve.dat.p:# norm 6.617963e-15 alpha -7.208546 e 3.309e-12
11/msieve.dat.p-skew: 3975894.40
11/msieve.dat.p-c0:  38864990584778529316293889269088572117
11/msieve.dat.p-c1:  42407759519512407627039861848145
11/msieve.dat.p-c2: -27227921023325206954968328
11/msieve.dat.p-c3: -34969966602945341462
11/msieve.dat.p-c4:  1855338003744
11/msieve.dat.p-c5:  236520
11/msieve.dat.p-Y0: -255608848447291680317902830970
11/msieve.dat.p-Y1:  919617793474307609
and (sorting on alpha; I think this is just a curiosity)
Code:
# norm 4.682652e-15 alpha -8.820531 e 2.614e-12
skew: 14581133.00
c0: -54625560516118891465822440063303774939381
c1: -20222977001593187674250552388158108
c2:  472573460092231841462940909
c3:  215752730536161183312
c4: -4396304506300
c5:  142800
Y0: -282757813391022193251627226150
Y1:  882713449622448257
fivemack is offline   Reply With Quote
Reply



Similar Threads
Thread Thread Starter Forum Replies Last Post
msieve polsel trouble on GTX1080Ti fivemack GPU Computing 2 2018-04-02 12:28
At least one upcoming paper on polsel Dubslow Msieve 0 2016-03-16 06:24
Feature request: multithreaded polsel Andi47 Msieve 1 2010-02-20 01:16
Some results from a live GNFS180 polsel fivemack Msieve 22 2009-12-22 02:57
160 digit factor found of 366 digit (PRP-1) AntonVrba Factoring 7 2005-12-06 22:02

All times are UTC. The time now is 01:18.


Sat Jul 17 01:18:27 UTC 2021 up 49 days, 23:05, 1 user, load averages: 1.24, 1.16, 1.26

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.