mersenneforum.org Nonlinear curve fits needed

2021-03-12, 23:11   #1
ewmayer
∂2ω=0

Sep 2002
República de California

29×401 Posts
Nonlinear curve fits needed

I am working on adding p-1 factoring support to Mlucas v20 - this is related to choosing optimal Stage 2 parameters. I have 2 (x,y)-point datasets for which I need high-quality nonlinear curve fits. There is no appreciable "experimental error" in the data, so it is my hope that a suitable 2- or 3-parameter function can be found which does the job. Alas, I have no stats-package familiarity, and have found the Mac Grapher application useful only for displaying results for some 3-parameter curves I tried, where I would play with one parameter and solve the resulting pair of simple linear equations for the other 2, based on the simplistic "let's nail the 2 endpoints and see how the resulting middle fits". None gave a fit of the desired accuracy for all points:

y = a - b*ln(x)^c, manually play with c
y = a*x^b + c, play with b
y = a*exp(b*x) + c, play with b .
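The "nail the 2 endpoints, play with the remaining parameter" procedure above can be sketched in code. This is a minimal illustration for the y = a*x^b + c case; the data below are synthetic, generated from the model itself with made-up coefficients, not the attached sets:

```python
# Sketch of the endpoint-pinning fit for y = a*x^b + c: with b held fixed,
# the two endpoint equations are linear in a and c and solve in closed form.
# Data are synthetic (from a = 0.43, b = -0.3, c = 0.38), for illustration only.

def fit_endpoints(pts, b):
    """Solve for a, c so the curve passes through the first and last points."""
    (x1, y1), (xn, yn) = pts[0], pts[-1]
    a = (y1 - yn) / (x1**b - xn**b)
    c = y1 - a * x1**b
    return a, c

def max_abs_err(pts, a, b, c):
    """Worst-case deviation of the model from the data points."""
    return max(abs(y - (a * x**b + c)) for x, y in pts)

pts = [(x, 0.43 * x**-0.3 + 0.38) for x in (1, 3, 5, 9, 15, 25, 59)]

# "Play with b": scan candidate exponents, keep the one whose
# endpoint-pinned curve fits the interior points best.
best_b, best_err = None, float("inf")
for i in range(-60, 0):
    b = i / 100.0
    a, c = fit_endpoints(pts, b)
    err = max_abs_err(pts, a, b, c)
    if err < best_err:
        best_b, best_err = b, err

print(best_b)  # recovers b = -0.3 on this synthetic data
```

The same two-equation trick works for the other two candidate forms, with ln(x)^c or exp(b*x) in place of x^b.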

Any help in form of candidate fit functions would be appreciated! Datasets attached.
Attached Files
 set1.txt.bz2 (299 Bytes, 16 views) set2.txt.bz2 (209 Bytes, 5 views)

Last fiddled with by ewmayer on 2021-03-14 at 22:12 Reason: Revised set2 to eliminate x = 25 duplicate entry

 2021-03-13, 01:31 #2 Batalov     "Serge" Mar 2008 Phi(4,2^7658614+1)/2 2·3·1,571 Posts More data is needed, specifically in the gaps between 15-99 and 11-59. The reason: there is a very strong fit if you throw away the last datapoint, so there is missing information to be discovered in the gaps. I suggest sampling by hierarchical interval halving, e.g. (15+99)/2 = 57, then (15+57)/2 = 36, (57+99)/2 = 78, and so on; also perhaps past 99.
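The hierarchical halving order can be sketched as a breadth-first midpoint queue; a small illustrative snippet, not from any actual tool:

```python
# Breadth-first interval halving: emit the midpoint of the widest-known gap
# first, then refine each half, reproducing the 57, 36, 78 order above.
from collections import deque

def halving_samples(lo, hi, rounds):
    """Yield `rounds` integer midpoints of [lo, hi], refined breadth-first."""
    out = []
    queue = deque([(lo, hi)])
    for _ in range(rounds):
        a, b = queue.popleft()
        mid = (a + b) // 2
        out.append(mid)
        queue.append((a, mid))
        queue.append((mid, b))
    return out

print(halving_samples(15, 99, 3))  # → [57, 36, 78]
```

Each new point halves the largest remaining uncertainty about where the curve bends, which is why this beats uniform sampling when only a few extra runs are affordable.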
 2021-03-13, 04:46 #3 frmky     Jul 2003 So Cal 7×13×23 Posts Yes, more data is needed. For the first set, here are three models that do a reasonable job describing the data. https://www.desmos.com/calculator/punkfe5upd
 2021-03-13, 21:14 #4 ewmayer ∂2ω=0     Sep 2002 República de California 2D6D16 Posts I was curious whether any nonlinear curve-fit software readers brought to bear would be able to take the partial point sets - complete at the left end, where the 'action' is, plus just a single far-right point - and do a good job of fitting the omitted gap points. But your wish is my command - complete sets are now attached to the OP.
2021-03-14, 10:48   #5
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3·5·11·31 Posts

What is the desired accuracy of fit?
The attachments are all re set 1.
Attached Files
 function fit.7z (81.4 KB, 8 views) function fit.ods.bz2 (54.8 KB, 7 views) function fit.pdf (31.9 KB, 18 views)

 2021-03-14, 15:05 #6 Dr Sardonicus     Feb 2017 Nowhere 11B816 Posts The file set2.txt has a repeated line:
Code:
25 0.5382455
25 0.5382455
2021-03-14, 19:34   #7
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3·5·11·31 Posts
set 2

Not only is the point x = 25 duplicated, the point x = 61 is absent, and x = 61 roughly coincides with what looks like a ~9% discontinuity in the related y-values. Is there really a step in the data? That will not be so easy to represent with a simple continuous function determined by a few parameters.
Attached Files
 function fit set 2.pdf (30.7 KB, 10 views)

 2021-03-14, 19:43 #8 pinhodecarlos     "Carlos Pinho" Oct 2011 Milton Keynes, UK 7·19·37 Posts Have you done an ANOVA analysis in Excel? (Trying to do this in Matlab) Last fiddled with by pinhodecarlos on 2021-03-14 at 19:58
2021-03-14, 20:27   #9
Dr Sardonicus

Feb 2017
Nowhere

10001101110002 Posts

Quote:
 Originally Posted by kriesel Not only is point 25 duplicated, point 61 is absent, and point 61 roughly coincides with what looks like a discontinuity of ~9% in the related values. Is there really a step in the data? That will be not so easy to represent in a simple continuous function determined with a few parameters.
In the files I downloaded, the values in set1.txt and set2.txt for x = 1, 3, ... 59 are different. I looked at the values in set1.txt alone.

The values seem to decline relatively rapidly for small x, then flatten out. The idea occurred to me, try a function a/(x + b) + c. For a given b, you can do a least-squares fit for a and c. I'm too lazy to pursue the idea.
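The idea is easy to automate, since for each fixed b the model y = a/(x+b) + c is linear in a and c, so they drop out of the 2x2 normal equations; one then just scans b. A pure-Python sketch on synthetic data (generated from a = 2, b = 4, c = 0.5 for illustration; not the actual set1 values):

```python
# For y = a/(x + b) + c with b held fixed, substitute u = 1/(x + b) and
# do ordinary least squares in a, c; then grid-search over b.
# Data are synthetic, generated from a = 2.0, b = 4.0, c = 0.5.

def fit_ac(pts, b):
    """Least-squares a, c for y = a/(x + b) + c, b held fixed."""
    n = len(pts)
    u = [1.0 / (x + b) for x, _ in pts]
    y = [yy for _, yy in pts]
    su, sy = sum(u), sum(y)
    suu = sum(ui * ui for ui in u)
    suy = sum(ui * yi for ui, yi in zip(u, y))
    a = (n * suy - su * sy) / (n * suu - su * su)
    c = (sy - a * su) / n
    return a, c

def sse(pts, a, b, c):
    """Sum of squared residuals for the fitted curve."""
    return sum((yy - (a / (x + b) + c)) ** 2 for x, yy in pts)

pts = [(x, 2.0 / (x + 4.0) + 0.5) for x in range(1, 100, 2)]

# Coarse scan over b; a finer grid or a 1-D minimizer would refine this.
best_b, best_sse = None, float("inf")
for i in range(0, 101):
    b = i / 10.0
    a, c = fit_ac(pts, b)
    s = sse(pts, a, b, c)
    if s < best_sse:
        best_b, best_sse = b, s

print(best_b)  # recovers b = 4.0 on this synthetic data
```

This "decline fast, then flatten" shape is exactly what a/(x+b) + c produces for a, b > 0, with c as the large-x asymptote.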

 2021-03-14, 20:57 #10 pinhodecarlos     "Carlos Pinho" Oct 2011 Milton Keynes, UK 7×19×37 Posts y = a*x^b + c, play with b (dataset used was 2, taking out duplicates) Matlab b1 is a, b2 is b and b3 is c. Code: modelfun = @(b,x)b(1)*x(:,1).^b(2)+b(3); beta0 = [.1 .1 .1]; mdl = fitnlm(x,y,modelfun,beta0) mdl = Nonlinear regression model: y ~ b1*x1^b2 + b3 Estimated Coefficients: Estimate SE tStat pValue ________ ________ ______ __________ b1 0.43127 0.015483 27.855 1.9937e-21 b2 -0.30448 0.023713 -12.84 5.1923e-13 b3 0.38114 0.017444 21.85 1.0697e-18 Number of observations: 30, Error degrees of freedom: 27 Root Mean Squared Error: 0.0056 R-Squared: 0.993, Adjusted R-Squared 0.993 F-statistic vs. constant model: 1.97e+03, p-value = 5.48e-30 Tomorrow I will spend more time looking at the other functions since I struggle to get good initial values for the iteration. Last fiddled with by pinhodecarlos on 2021-03-14 at 21:02
 2021-03-14, 22:08 #11 ewmayer ∂2ω=0     Sep 2002 República de California 265558 Posts I've uploaded a revised set2.txt with the duplicate at x = 25 removed. I probably have Excel-something on one of my Linux boxes, but can't afford more hours dicking around learning the basics of yet another software app which I will not be making further use of - see my note at the end of this post re. "2 options". Thanks, Carlos, for the Matlab result. Ken, the curves in your PDF - did those try to fit an imagined discontinuity at the right end? As described below, we only need to fit the interval x = [1,99] for set1 and x = [1,59] for set2; x < 1 and x greater than those are irrelevant for the present problem.

A bit of context: the optimization in question relates to minimizing the needed number of mod-M(p) FFT-muls in p-1 stage 2. We use a baby-step/giant-step approach here. No time to detail all that, but the gist:

o The two leading candidates for the giant step are D = 210 = 2.3.5.7 and D = 330 = 2.3.5.11;

o For a given choice of D, we find all integers coprime to D in [1,D/2-1]. For D = 210 there are 24 such: [1,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103]. For D = 330 we have 40 coprimes in [1,164]. Starting with a p-1 stage 1 residue, each such coprime needs a precomputed power of the stage 1 residue, thus we have 24 and 40 such precomputed powers - Mihai Preda uses the word "buffers", so I'll use it as well - for each D-sized step in stage 2. For the current GIMPS wavefront at FFT = 6Mdoubles, D = 210 thus needs ~24*50MB = 1.2GB of RAM, and D = 330 needs ~2GB, at a bare minimum;

o But wait! In order to further reduce the effort, we also use - again too technical to get sidetracked by the gory details here - an "extended window prime pairing scheme". My implementation of this needs an odd number of such pairing windows, 1,3,5,...; the x-values in my datasets represent the number of such windows. Each unit of x means a stage 2 needing at least x * #buffers worth of RAM. Thus the right end of the D = 210 (set1.txt) graph corresponds to 99*24 = 2376 buffers, or around 120GB at the current wavefront. For D = 330 each unit of x represents 40 buffers, thus the right end of the D = 330 (set2.txt) graph corresponds to 59*40 = 2360 buffers, or again around 120GB at the current wavefront;

o The y-values of the dataset points are scaled stage 2 #modmuls - thus lower is better, assuming we have enough RAM. For a given amount of available system RAM, we overlay the 2 datasets, with the set2 abscissa scaled by 5/3, and pick whichever datapoint lying just left of the x-value corresponding to available RAM has the smaller y-value. That determines which D-value and number of pairing windows get used in the stage 2 prime-pairing scheme.

There is little point extending the plots further, because few GIMPS participants have systems with > 128GB RAM, and as the 2 datasets show, most of the gains are to be had at smaller values of x, corresponding to ~1000 buffers or fewer. There are 2 obvious options for making the above stage 2 parameter choice - the curve-fit is one, a simple discrete table lookup is the other. So to the curve-fitters: if your favorite tool gives a decent 2- or 3-parameter fit, great, but please don't spend any inordinate effort on this.

Edit: I've attached a plot of the 2 datasets overlaid, with the set2 x-values (black-dotted circles) scaled by 5/3 to allow direct comparison - each unit of x corresponds to 24 M(p)-sized memory buffers, let's assume in forward-FFTed form, thus again ~1.2GB RAM per unit of x for exponents at the current wavefront. Our choices here are the discrete centers of each circle, nothing in between. We see that for x > 13 (~16GB available RAM), D = 330 always wins; it's only for lower amounts of available memory that we need to choose between the 2 bigstep values.
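The coprime/buffer counts quoted above are easy to verify directly; a quick sanity check (not code from Mlucas):

```python
# Buffers needed per pairing window: one per integer in [1, D/2 - 1]
# that is coprime to the giant-step D.
from math import gcd

def stage2_coprimes(D):
    """Residues coprime to D in [1, D/2 - 1]."""
    return [k for k in range(1, D // 2) if gcd(k, D) == 1]

print(len(stage2_coprimes(210)))  # 24 buffers per unit of x for D = 210
print(len(stage2_coprimes(330)))  # 40 buffers per unit of x for D = 330
```

By symmetry these are exactly half of phi(D) = 48 and 80 respectively, since the coprime residues pair up as (k, D-k) about D/2.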
Attached Thumbnails   Last fiddled with by ewmayer on 2021-03-15 at 20:08 Reason: Added comparative plot

