2021-03-12, 23:11  #1 
∂^{2}ω=0
Sep 2002
República de California
29×401 Posts 
Nonlinear curve fits needed
I am working on adding p-1 factoring support to Mlucas v20. This is related to choosing optimal Stage 2 parameters. I have two (x,y)-point datasets for which I need high-quality nonlinear curve fits. There is no appreciable "experimental error" in the data, so it is my hope that a suitable 2- or 3-parameter function can be found which does the job. Alas, I have no stats-package familiarity, and have found the Mac Grapher application useful only for displaying results for some 3-parameter curves I tried, where I would play with one parameter and solve the resulting pair of simple linear equations for the other 2, based on a simplistic "let's nail the 2 endpoints and see how the resulting middle fits". None gave a fit of the desired accuracy for all points:
y = a - b*ln(x)^c, manually play with c
y = a*x^b + c, play with b
y = a*exp(b*x) + c, play with b

Any help in the form of candidate fit functions would be appreciated! Datasets attached.

Last fiddled with by ewmayer on 2021-03-14 at 22:12. Reason: Revised set2 to eliminate x = 25 duplicate entry 
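For anyone without a stats package, SciPy's curve_fit is one free way to try these candidate models. A minimal sketch follows; the data below is synthetic (standing in for set1.txt, which I don't reproduce here), so the recovered parameters are illustrative only.

```python
# Sketch: fitting one of the candidate 3-parameter models with SciPy's
# curve_fit. The dataset here is SYNTHETIC, for illustration only.
import numpy as np
from scipy.optimize import curve_fit

def f_pow(x, a, b, c):
    """Candidate model y = a*x^b + c."""
    return a * x**b + c

# Synthetic stand-in: decays quickly at small x, then flattens.
x = np.arange(1.0, 100.0, 2.0)
y = 0.43 * x**-0.30 + 0.38

params, _ = curve_fit(f_pow, x, y, p0=[0.5, -0.3, 0.4])
a, b, c = params
max_resid = np.max(np.abs(f_pow(x, *params) - y))
print(f"a={a:.4f}, b={b:.4f}, c={c:.4f}, max|resid|={max_resid:.1e}")
```

The other two candidates (a - b*ln(x)^c and a*exp(b*x) + c) drop in the same way by swapping the model function and initial guess.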
2021-03-13, 01:31  #2 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2
2·3·1,571 Posts 
More data is needed, specifically in the gaps between 15-99 and 11-59.
The reason is: there is a very strong fit if you throw away the last datapoint. Therefore there is missing information to be discovered in the gaps. Suggestion: sample by hierarchical interval halving, e.g. (15+99)/2 = 57, then (15+57)/2 = 36, (57+99)/2 = 78, and so on; also perhaps past 99. 
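The halving order above can be generated mechanically; a small sketch (illustrative only, not anything from Mlucas):

```python
# Hierarchical interval halving: sample the midpoint of [lo, hi] first,
# then recurse into each half, so the gap is filled coarsest-first.
def halving_order(lo, hi):
    """Yield interior sample points of [lo, hi], breadth-first."""
    queue = [(lo, hi)]
    while queue:
        nxt = []
        for a, b in queue:
            m = (a + b) // 2
            if m != a and m != b:
                yield m
                nxt += [(a, m), (m, b)]
        queue = nxt

print(list(halving_order(15, 99))[:7])  # → [57, 36, 78, 25, 46, 67, 88]
```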
2021-03-13, 04:46  #3 
Jul 2003
So Cal
7×13×23 Posts 
Yes, more data is needed. For the first set, here are three models that do a reasonable job describing the data.
https://www.desmos.com/calculator/punkfe5upd 
2021-03-13, 21:14  #4 
∂^{2}ω=0
Sep 2002
República de California
2D6D_{16} Posts 
I was curious whether any of the nonlinear curve-fit software readers brought to bear would be able to take the partial point sets -- complete at the left end, where the 'action' is, then just a far-right point -- and do a good job fitting the omitted gap points. But your wish is my command -- complete sets attached to the OP.

2021-03-14, 10:48  #5 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3·5·11·31 Posts 
What is the desired accuracy of fit?
Attachments are all re. set 1. 
2021-03-14, 15:05  #6 
Feb 2017
Nowhere
11B8_{16} Posts 
The file set2.txt has a repeated line:
25 0.5382455
25 0.5382455 
2021-03-14, 19:34  #7 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3·5·11·31 Posts 
set 2
Not only is point 25 duplicated; point 61 is absent, and x = 61 roughly coincides with what looks like a discontinuity of ~9% in the related y-values. Is there really a step in the data? That will not be so easy to represent in a simple continuous function determined by a few parameters.
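One quick way to check for such a step is to scan the relative change between consecutive y-values: a genuine discontinuity shows up as one jump much larger than its neighbours. A sketch on synthetic data (with a 9% step injected at x = 61 for illustration; these are not the actual set2 values):

```python
# Locate the largest relative jump between consecutive datapoints.
def relative_jumps(xs, ys):
    """Return (x, |dy|/y) for each consecutive pair of points."""
    return [(x1, abs(y1 - y0) / y0)
            for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:]))]

# Synthetic data: slowly varying trend with a 9% step injected at x = 61.
xs = list(range(1, 100, 2))
ys = [(0.5 - 0.001 * x) * (1.09 if x >= 61 else 1.0) for x in xs]

worst = max(relative_jumps(xs, ys), key=lambda t: t[1])
print(worst)  # largest jump sits at the injected step, x = 61
```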

2021-03-14, 19:43  #8 
"Carlos Pinho"
Oct 2011
Milton Keynes, UK
7·19·37 Posts 
Have you done an ANOVA analysis in Excel?
(Trying to do this in Matlab)

Last fiddled with by pinhodecarlos on 2021-03-14 at 19:58 
2021-03-14, 20:27  #9  
Feb 2017
Nowhere
1000110111000_{2} Posts 
The values seem to decline relatively rapidly for small x, then flatten out. The idea occurred to me: try a function a/(x + b) + c. For a given b, you can do a least-squares fit for a and c. I'm too lazy to pursue the idea. 
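For anyone who wants to pursue it: at fixed b the model y = a/(x + b) + c is linear in (a, c), so a 1-D scan over b plus a linear least-squares solve handles the whole fit. A sketch on synthetic data (illustrative only):

```python
# Fit y = a/(x+b) + c by scanning b and solving linearly for (a, c).
import numpy as np

def fit_a_c(x, y, b):
    """Least-squares (a, c) for y ~ a/(x+b) + c at fixed b; returns (a, c, sse)."""
    A = np.column_stack([1.0 / (x + b), np.ones_like(x)])
    (a, c), *_ = np.linalg.lstsq(A, y, rcond=None)
    sse = float(np.sum((A @ np.array([a, c]) - y) ** 2))
    return a, c, sse

# Synthetic data generated from the model itself: a=2, b=3, c=0.4.
x = np.arange(1.0, 100.0, 2.0)
y = 2.0 / (x + 3.0) + 0.4

bs = np.linspace(0.5, 10.0, 96)                    # grid with step 0.1
best_b = min(bs, key=lambda b: fit_a_c(x, y, b)[2])
a, c, sse = fit_a_c(x, y, best_b)
print(f"b={best_b:.2f}, a={a:.4f}, c={c:.4f}, sse={sse:.2e}")
```

In practice one would refine the b-grid (or hand the scan's winner to a 1-D minimizer) rather than rely on a coarse grid.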

2021-03-14, 20:57  #10 
"Carlos Pinho"
Oct 2011
Milton Keynes, UK
7×19×37 Posts 
y = a*x^b + c, play with b (dataset used was set 2, with the duplicate taken out)
Matlab: b1 is a, b2 is b and b3 is c. Code:
modelfun = @(b,x)b(1)*x(:,1).^b(2)+b(3);
beta0 = [.1 .1 .1];
mdl = fitnlm(x,y,modelfun,beta0)

mdl =
Nonlinear regression model:
    y ~ b1*x1^b2 + b3

Estimated Coefficients:
          Estimate        SE        tStat       pValue
          ________    ________    _______    __________
    b1     0.43127    0.015483     27.855    1.9937e-21
    b2    -0.30448    0.023713     -12.84    5.1923e-13
    b3     0.38114    0.017444      21.85    1.0697e-18

Number of observations: 30, Error degrees of freedom: 27
Root Mean Squared Error: 0.0056
R-Squared: 0.993, Adjusted R-Squared: 0.993
F-statistic vs. constant model: 1.97e+03, p-value = 5.48e-30

Last fiddled with by pinhodecarlos on 2021-03-14 at 21:02 
2021-03-14, 22:08  #11 
∂^{2}ω=0
Sep 2002
República de California
26555_{8} Posts 
I've uploaded a revised set2.txt with the duplicate at x = 25 removed. I probably have Excel-or-somesuch on one of my Linux boxes, but can't afford more hours dicking around learning the basics of yet another software app which I will not be making further use of -- see my note at the end of this post re. "2 options". Thanks, Carlos, for the Matlab result.
Ken, the curves in your PDF -- did those try to fit an imagined discontinuity at the right end? As described below, we only need to fit the interval x = [1,99] for set1 and x = [1,59] for set2; x < 1 and x greater than those are irrelevant for the present problem.

A bit of context: the optimization in question relates to minimizing the needed number of mod-M(p) FFT-muls in p-1 stage 2. We use a baby-step/giant-step approach here. No time to detail all that, but the gist:

o The two leading candidates for the giant step are D = 210 = 2·3·5·7 and D = 330 = 2·3·5·11;

o For a given choice of D, we find all integers coprime to D in [1, D/2-1]. For D = 210 there are 24 such: [1,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103]. For D = 330 we have 40 coprimes in [1,164]. Starting with a p-1 stage 1 residue, each such coprime needs a precomputed power of the stage 1 residue, thus we have 24 and 40 such precomputed powers -- Mihai Preda uses the word "buffers", so I'll use it as well -- for each D-sized step in stage 2. For the current GIMPS wavefront at FFT = 6M doubles, D = 210 thus needs ~24*50MB = 1.2GB of RAM, and D = 330 needs ~2GB, at a bare minimum;

o But wait! In order to further reduce the effort, we additionally use -- again too technical to get sidetracked by the gory details here -- an "extended window prime pairing scheme". My implementation of this needs an odd number of such pairing windows, 1,3,5,...; the x-values in my datasets represent the number of such windows. Each unit of x means a stage 2 needing at least (x * #buffers)' worth of RAM. Thus the right end of the D = 210 (set1.txt) graph corresponds to 99*24 = 2376 buffers, or around 120GB at the current wavefront. For D = 330 each unit of x represents 40 buffers, thus the right end of the D = 330 (set2.txt) graph corresponds to 59*40 = 2360 buffers, or again around 120GB at the current wavefront. 
o The y-values of the dataset points are scaled stage 2 #modmuls -- thus lower is better, assuming we have enough RAM. For a given amount of available system RAM, we overlay the 2 datasets, with the set2 abscissa properly scaled by 5/3, and pick whichever datapoint lying just left of the x-value corresponding to available RAM has the smaller y-value. That determines which D-value and number of pairing windows get used in the stage 2 prime-pairing scheme.

There is little point extending the plots further because few GIMPS participants have systems with > 128GB RAM, and as the 2 datasets show, most of the gains are to be had at smaller values of x, corresponding to ~1000 buffers or fewer.

There are 2 obvious options for making the above stage 2 parameter choice -- the curve fit is one, a simple discrete table lookup is the other. So to the curve-fitters: if your favorite tool gives a decent 2- or 3-parameter fit, great, but please don't spend any inordinate effort on this.

Edit: I've attached a plot of the 2 datasets overlaid, with the set2 x-values (black-dotted circles) scaled by 5/3 to allow direct comparison -- each unit of x corresponds to 24 M(p)-sized memory buffers, let's assume in forward-FFTed form, thus again ~1.2GB RAM per unit of x for exponents at the current wavefront. Our choices here are the discrete centers of each circle, nothing in between. We see that for x > 13 (~16GB available RAM), D = 330 always wins; it's only for lower amounts of available memory that we need to choose between the 2 big-step values.

Last fiddled with by ewmayer on 2021-03-15 at 20:08. Reason: Added comparative plot 
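As a cross-check of the coprime counts and RAM figures quoted above, a short sketch (illustrative Python, not the actual Mlucas code):

```python
# Cross-check: count integers coprime to D in [1, D/2-1] and estimate
# the minimum buffer RAM, assuming ~50 MB per buffer at the current
# 6M-double FFT wavefront (an assumption, per the post above).
from math import gcd

def stage2_coprimes(D):
    """Integers in [1, D/2 - 1] coprime to D."""
    return [k for k in range(1, D // 2) if gcd(k, D) == 1]

for D in (210, 330):
    cop = stage2_coprimes(D)
    print(f"D = {D}: {len(cop)} buffers, "
          f"~{len(cop) * 50 / 1024:.2f} GB minimum")
# D = 210 gives 24 buffers (~1.2 GB); D = 330 gives 40 (~2 GB).
```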