mersenneforum.org > Great Internet Mersenne Prime Search > Software
2021-03-12, 23:11   #1
ewmayer ("2ω=0", Sep 2002, República de California)

Nonlinear curve fits needed

I am working on adding p-1 factoring support to Mlucas v20; this relates to choosing optimal Stage 2 parameters. I have two (x,y)-point datasets for which I need high-quality nonlinear curve fits. There is no appreciable "experimental error" in the data, so my hope is that a suitable 2- or 3-parameter function can be found which does the job. Alas, I have no stats-package familiarity, and have found the Mac Grapher application useful only for displaying results for some 3-parameter curves I tried. For each, I would play with one parameter and solve the resulting pair of simple linear equations for the other two, on the simplistic principle "nail the 2 endpoints and see how the resulting middle fits". None gave a fit of the desired accuracy for all points:

y = a - b*ln(x)^c, manually play with c
y = a*x^b + c, play with b
y = a*exp(b*x) + c, play with b .
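For what it's worth, the endpoint-pinning procedure above can be sketched in a few lines for the y = a*x^b + c candidate (Python purely for illustration; the endpoint values below are hypothetical stand-ins, not from the actual datasets):

```python
def fit_endpoints_powerlaw(x1, y1, x2, y2, b):
    """Given a manually chosen exponent b, solve the 2x2 linear system
         y1 = a*x1^b + c
         y2 = a*x2^b + c
       for a and c, so the curve passes exactly through both endpoints."""
    a = (y1 - y2) / (x1**b - x2**b)
    c = y1 - a * x1**b
    return a, c

# Hypothetical endpoints; sweep b by hand and eyeball how the middle fits.
a, c = fit_endpoints_powerlaw(1.0, 0.9, 99.0, 0.4, -0.3)
# By construction the curve reproduces both endpoints:
assert abs(a * 1.0**-0.3 + c - 0.9) < 1e-12
assert abs(a * 99.0**-0.3 + c - 0.4) < 1e-12
```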

Any help in form of candidate fit functions would be appreciated! Datasets attached.
Attached Files
File Type: bz2 set1.txt.bz2 (299 Bytes, 16 views)
File Type: bz2 set2.txt.bz2 (209 Bytes, 5 views)

Last fiddled with by ewmayer on 2021-03-14 at 22:12 Reason: Revised set2 to eliminate x = 25 duplicate entry
2021-03-13, 01:31   #2
Batalov ("Serge", Mar 2008)

More data is needed, specifically in the gaps between x = 15 and 99 (set1) and between x = 11 and 59 (set2).

The reason: there is a very strong fit if you throw away the last datapoint, so there is missing information to be discovered in the gaps. I suggest sampling by hierarchical interval halving, e.g. (15+99)/2 = 57, then (15+57)/2 = 36, (57+99)/2 = 78, and so on; perhaps also past 99.
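The halving schedule is mechanical to generate; a small illustrative sketch (Python, not tied to any particular tool):

```python
def halving_schedule(lo, hi, depth):
    """Sample points by hierarchical interval halving: first the midpoint
    of [lo, hi], then the midpoints of the two halves, and so on down to
    the given depth. Returns points in the order they should be sampled."""
    points = []
    intervals = [(lo, hi)]
    for _ in range(depth):
        next_intervals = []
        for a, b in intervals:
            m = (a + b) // 2
            points.append(m)
            next_intervals += [(a, m), (m, b)]
        intervals = next_intervals
    return points

print(halving_schedule(15, 99, 2))  # -> [57, 36, 78]
```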
2021-03-13, 04:46   #3
frmky (Jul 2003, So Cal)

Yes, more data is needed. For the first set, here are three models that do a reasonable job describing the data.

https://www.desmos.com/calculator/punkfe5upd
2021-03-13, 21:14   #4
ewmayer ("2ω=0", Sep 2002, República de California)

I was curious whether any nonlinear curve-fit software readers brought to bear could take the partial point sets - complete at the left end, where the 'action' is, plus just one far-right point - and do a good job of fitting the omitted gap points. But your wish is my command: complete sets are now attached to the OP.
2021-03-14, 10:48   #5
kriesel ("TF79LL86GIMPS96gpu17", Mar 2017, US midwest)

What is the desired accuracy of fit?
All attachments are re set 1.
Attached Files
File Type: 7z function fit.7z (81.4 KB, 8 views)
File Type: bz2 function fit.ods.bz2 (54.8 KB, 7 views)
File Type: pdf function fit.pdf (31.9 KB, 18 views)
2021-03-14, 15:05   #6
Dr Sardonicus (Feb 2017, Nowhere)

The file set2.txt has a repeated line:

25 0.5382455
25 0.5382455
2021-03-14, 19:34   #7
kriesel ("TF79LL86GIMPS96gpu17", Mar 2017, US midwest)

set 2

Not only is the point at x = 25 duplicated, the point at x = 61 is absent, and x = 61 roughly coincides with what looks like a discontinuity of ~9% in the y-values. Is there really a step in the data? That will not be so easy to represent with a simple continuous function determined by a few parameters.
Attached Files
File Type: pdf function fit set 2.pdf (30.7 KB, 10 views)
2021-03-14, 19:43   #8
pinhodecarlos ("Carlos Pinho", Oct 2011, Milton Keynes, UK)

Have you done an ANOVA analysis in Excel?

(Trying to do this in Matlab.)

Last fiddled with by pinhodecarlos on 2021-03-14 at 19:58
2021-03-14, 20:27   #9
Dr Sardonicus (Feb 2017, Nowhere)

Quote:
Originally Posted by kriesel
Not only is the point at x = 25 duplicated, the point at x = 61 is absent, and x = 61 roughly coincides with what looks like a discontinuity of ~9% in the y-values. Is there really a step in the data? That will not be so easy to represent with a simple continuous function determined by a few parameters.
In the files I downloaded, the values in set1.txt and set2.txt for x = 1, 3, ... 59 are different. I looked at the values in set1.txt alone.

The values seem to decline relatively rapidly for small x, then flatten out. The idea occurred to me to try a function a/(x + b) + c. For a given b, you can do a least-squares fit for a and c. I'm too lazy to pursue the idea.
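Since the model is linear in a and c once b is fixed, the least-squares step reduces to the ordinary 2x2 normal equations in the transformed variable u = 1/(x+b). A stdlib-only sketch on synthetic data (illustrative only; the y-values below are generated from known parameters, not taken from set1.txt):

```python
def fit_a_c(xs, ys, b):
    """Least-squares fit of y = a/(x+b) + c for fixed b.
    Substituting u = 1/(x+b) makes the model y = a*u + c, an ordinary
    linear regression; solve its 2x2 normal equations directly."""
    n = len(xs)
    us = [1.0 / (x + b) for x in xs]
    su, sy = sum(us), sum(ys)
    suu = sum(u * u for u in us)
    suy = sum(u * y for u, y in zip(us, ys))
    a = (n * suy - su * sy) / (n * suu - su * su)
    c = (sy - a * su) / n
    return a, c

# Sanity check on synthetic data generated from known a = 2.0, b = 0.5, c = 0.4:
xs = [1, 3, 5, 9, 15, 25, 59]
ys = [2.0 / (x + 0.5) + 0.4 for x in xs]
a, c = fit_a_c(xs, ys, 0.5)
assert abs(a - 2.0) < 1e-9 and abs(c - 0.4) < 1e-9
```

One could then grid-search or bisect over b, minimizing the residual sum of squares at each step.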
2021-03-14, 20:57   #10
pinhodecarlos ("Carlos Pinho", Oct 2011, Milton Keynes, UK)

y = a*x^b + c, play with b (the dataset used was set 2, with the duplicate removed).

Matlab results below; b1 is a, b2 is b, and b3 is c.
Code:
modelfun = @(b,x)b(1)*x(:,1).^b(2)+b(3);
beta0 = [.1 .1 .1];
mdl = fitnlm(x,y,modelfun,beta0)

mdl = 

Nonlinear regression model:
    y ~ b1*x1^b2 + b3

Estimated Coefficients:
          Estimate       SE       tStat       pValue  
          ________    ________    ______    __________

    b1     0.43127    0.015483    27.855    1.9937e-21
    b2    -0.30448    0.023713    -12.84    5.1923e-13
    b3     0.38114    0.017444     21.85    1.0697e-18


Number of observations: 30, Error degrees of freedom: 27
Root Mean Squared Error: 0.0056
R-Squared: 0.993,  Adjusted R-Squared 0.993
F-statistic vs. constant model: 1.97e+03, p-value = 5.48e-30
Tomorrow I will spend more time looking at the other functions since I struggle to get good initial values for the iteration.

Last fiddled with by pinhodecarlos on 2021-03-14 at 21:02
2021-03-14, 22:08   #11
ewmayer ("2ω=0", Sep 2002, República de California)

I've uploaded a revised set2.txt with the duplicate at x = 25 removed. I probably have Excel-something on one of my Linux boxes, but can't afford more hours dicking around learning the basics of yet another software app which I will not be making further use of - see my note at end of this post re. "2 options". Thanks, Carlos, for the Matlab result.

Ken, the curves in your PDF - did those try to fit an imagined discontinuity at the right end? As described below, we only need to fit the interval x = [1,99] for set1 and x = [1,59] for set2; x < 1 and x beyond those ranges are irrelevant for the present problem.

A bit of context: the optimization in question relates to minimizing the needed number of mod-M(p) FFT-muls in p-1 stage 2. We use a baby-step/giant-step approach here. No time to detail all that, but the gist:

o The two leading candidates for the giant step are D = 210 = 2·3·5·7 and D = 330 = 2·3·5·11;

o For a given choice of D, we find all integers coprime to D in [1,D/2-1]. For D = 210 there are 24 such: [1,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101,103]. For D = 330 we have 40 coprimes in [1,164]. Starting with a p-1 stage 1 residue, each such coprime needs a precomputed power of the stage 1 residue, thus we have 24 and 40 such precomputed powers - Mihai Preda uses the word "buffers", so I'll use it as well - for each D-sized step in stage 2. For the current GIMPS wavefront at FFT = 6Mdoubles, D = 210 thus needs ~24*50MB = 1.2GB of RAM, and D = 330 needs ~2GB, at a bare minimum;
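As a quick sanity check on the quoted buffer counts (illustrative Python, not Mlucas code), the coprime residues are easy to enumerate:

```python
from math import gcd

def coprime_residues(D):
    """Integers in [1, D/2 - 1] coprime to D; each one needs a
    precomputed power of the stage 1 residue (a "buffer")."""
    return [r for r in range(1, D // 2) if gcd(r, D) == 1]

print(len(coprime_residues(210)))  # 24 buffers for D = 210
print(len(coprime_residues(330)))  # 40 buffers for D = 330
print(coprime_residues(210)[:5])   # [1, 11, 13, 17, 19]
```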

o But wait! In order to further reduce the effort, we also use - again too technical to get sidetracked by the gory details here - an "extended window prime pairing scheme". My implementation of this needs an odd number of such pairing windows, 1,3,5,...; the x-values in my datasets represent the number of such windows. Each unit of x means a stage 2 needing at least (x * #buffers)-worth of RAM. Thus the right end of the D = 210 (set1.txt) graph corresponds to 99*24 = 2376 buffers, or around 120GB at the current wavefront. For D = 330 each unit of x represents 40 buffers, thus the right end of the D = 330 (set2.txt) graph corresponds to 59*40 = 2360 buffers, again around 120GB at the current wavefront.

o The y-values of the dataset points are scaled stage 2 #modmuls - thus lower is better, assuming we have enough RAM. For a given amount of available system RAM, we overlay the 2 datasets, with the set2 abscissa scaled by 5/3, and pick whichever datapoint lying just left of the x-value corresponding to available RAM has the smaller y-value. That determines which D-value and number of pairing windows get used in the stage 2 prime-pairing scheme.

There is little point extending the plots further because few GIMPS participants have systems with > 128GB RAM, and as the 2 datasets show, most of the gains are to be had at smaller values of x, corresponding to ~1000 buffers or fewer.

There are 2 obvious options for making the above stage 2 parameter choice: the curve-fit is one, a simple discrete table lookup is the other. So to the curve-fitters: if your favorite tool gives a decent 2- or 3-parameter fit, great, but please don't spend any inordinate effort on this.
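The table-lookup option amounts to the overlay-and-pick rule described above. A sketch (illustrative Python; the toy tables below are made-up stand-ins, the real y-values come from set1.txt/set2.txt):

```python
def choose_stage2_params(set1, set2, max_units):
    """set1: list of (x, y) points for D = 210; set2: for D = 330.
    Scale set2 x-values by 5/3 so one unit of x means the same RAM for
    both, then among all points fitting in max_units of RAM, pick the
    one with the smallest y (scaled #modmuls). Returns (D, windows)."""
    candidates = [(y, 210, x) for x, y in set1 if x <= max_units]
    candidates += [(y, 330, x) for x, y in set2 if x * 5 / 3 <= max_units]
    if not candidates:
        raise ValueError("not enough RAM for any tabulated choice")
    y, D, x = min(candidates)
    return D, x

# Made-up toy tables: (pairing windows, scaled #modmuls), lower y is better.
set1 = [(1, 1.00), (3, 0.80), (5, 0.72)]
set2 = [(1, 0.95), (3, 0.70)]
print(choose_stage2_params(set1, set2, 5))  # -> (330, 3)
print(choose_stage2_params(set1, set2, 1))  # -> (210, 1)
```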

Edit: I've attached a plot of the 2 datasets overlaid, with the set2 x-values (black-dotted circles) scaled by 5/3 to allow direct comparison - each unit of x corresponds to 24 M(p)-sized memory buffers, let's assume in forward-FFTed form, thus again ~1.2GB RAM per unit of x for exponents at the current wavefront. Our choices here are the discrete centers of each circle, nothing in between. We see that for x > 13 (~16GB available RAM), D = 330 always wins; it's only for lower amounts of available memory that we need to choose between the 2 bigstep values.
Attached Thumbnails
stage2_210vs330.png (74.7 KB)

Last fiddled with by ewmayer on 2021-03-15 at 20:08 Reason: Added comparative plot


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.