mersenneforum.org Prime k-value density rating and factorizations

2009-07-29, 01:06   #12
Flatlander
I quite division it

"Chris"
Feb 2005
England

31·67 Posts

Yeah, I did spot it. I had done some analysis. I have an old (messy) spreadsheet somewhere with some data in it. I think I tested loads of k's (mostly multiples of 15) to n=10K and then the interesting ones to 50K. Then I looked for top-5000 primes in the best of them. They all went a bit 'jumpy' except for 3045, which had the most even distribution of primes and so became my fave.

Sheet one is a partial record of my searches.
On sheet two you can see colour coded (by size) lists of primes for my fave ks. Cell L4 shows that I marked 3045 as being the most fruitful. Column AD shows the increases in size from prime to prime for k=3045. A beautiful distribution imho. The mean and standard deviation are in AE.
The percentages in row 2 show some sort of score I gave them in comparison to k=1785. 'Ruff' means a 'jumpy' distribution. 15345+ was a Proth. (Wash my mouth out with soap.)

Now I'm waiting for you to find a relationship between the 'performance' of a k and the number and size of its factors. In other words, there must be a reason why some k's perform better than others???
Attached Files
 a prime log 300908.zip (44.0 KB, 58 views)

Last fiddled with by Flatlander on 2009-07-29 at 01:08

2009-07-29, 01:43   #13
gd_barnes

May 2007
Kansas; USA

2×7^2×103 Posts

That sounds interesting. I'll take a look at it later tonight.

There's no specific reason that I can think of why a k would be better than its cohorts of the same weight, with the exception of Mersennes, which have been proven to contain only factors of a specific form. Most of it would be random fluctuation, and the difficult part is telling what is random and what is not. To prove that a k with a weight of 3822 was "better" (as in candidates searched per prime) than another k with the same weight would likely require testing each k to n=1G or even 1T or higher. In most cases, there would need to be several hundred primes to get statistical significance.

Also, there are degrees of freedom to concern ourselves with. As it applies here, if you look at 100's or 1000's of k's with the same weight, there are bound to be some that are 2, 3, and possibly even 4 standard deviations from the mean, so that if you compare the highest and lowest of them, the deviation could seem huge on the surface but not necessarily mean a hill of beans about future primes on them. The key is to determine what is statistically significant for the number of k's that you are looking at. For 100 k's, 3 st. devs. from the mean for the highest amongst them might only give you an 80% confidence that that particular k will continue to produce above-average primes in the future. But if you had only 10 k's, 3 sd's could give you a 95% confidence level. (Made-up figures, but probably not too far off.) If you were looking at 1000 k's, I'm certain that having a few k's at 3 sd's would be very normal. You'd likely need 3.5-4 sd's to have reasonable statistical significance for any one k.

As I remember from stats class, for 1 degree of freedom, something 1 s.d. from the mean will occur about 32% of the time (counting both above and below the mean), 2 sd's from the mean will occur about 5% of the time, and 3 sd's about 1% of the time. But those figures change greatly as you get up to 100 or 1000 degrees of freedom.

It is my conjecture (or really speculation at this point) that high-weight k's, because they have fewer small factors below a typical sieve depth, will also have fewer large factors above a typical sieve depth. Therefore, I speculate that high-weight k's will give more top-5000 prime score per CPU hour than low-weight k's of approximately the same size, if you take into account the total sieving plus prime-testing time.

Gary

Last fiddled with by gd_barnes on 2009-07-29 at 01:50
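The degrees-of-freedom point in the post above can be sanity-checked numerically. This is only an illustrative sketch under an assumption the post doesn't make (that the k "performances" behave like independent standard normals); the function names are mine:

```python
import math

def normal_tail_two_sided(z):
    """P(|Z| > z) for a standard normal Z, via the complementary error function."""
    return math.erfc(z / math.sqrt(2))

def expected_outliers(n_ks, z):
    """Expected number of k's, out of n_ks, lying more than z sd from the mean."""
    return n_ks * normal_tail_two_sided(z)

def prob_some_outlier(n_ks, z):
    """P(at least one of n_ks independent k's lies more than z sd out)."""
    return 1 - (1 - normal_tail_two_sided(z)) ** n_ks

for n_ks in (10, 100, 1000):
    print(n_ks, round(expected_outliers(n_ks, 3), 2),
          round(prob_some_outlier(n_ks, 3), 3))
```

With 1000 k's, about 2.7 of them are expected beyond 3 sd, and the chance of seeing at least one is over 90% — consistent with "having a few k's at 3 sd's would be very normal" for large samples.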
2009-07-29, 02:02   #14
Mini-Geek
Account Deleted

"Tim Sorbera"
Aug 2006
San Antonio, TX USA

4267₁₀ Posts

Did someone say factors? Here are the full factorizations of 3045*2^n-1, with 1c90s that withstood a moderate amount of ECM (someone can work on 'em if they want; the ECM statuses are all in the DB, and some are ready for QS/GNFS): http://factordb.com/search.php?query...F=1&v=n&pp=350

It's interesting how easy it is to spot factors that repeat at certain n == x mod y, e.g. for 3045 (| means "divides", N refers to 3045*2^n-1, n refers to... n):
11|N when n == 4 mod 10
13|N when n == 8 mod 12
61|N when n == 8 mod 60
(these cover n = {4, 14, 24, 34, 44, 54, 8, 20, 32, 44, and 56} mod 60)
and 17|N when n == 7 mod 8 (lcm(60,8) = 120, so I'd have to multiply all that stuff out to include this case)

It doesn't seem so daunting to discover a covering set, if one exists. All you need is some small factors.

Last fiddled with by Mini-Geek on 2009-07-29 at 02:23
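For what it's worth, the four congruences above are easy to reproduce by brute force; a minimal sketch (the helper name `residues` is made up for illustration, not from any tool mentioned in the thread):

```python
def residues(k, f, b=2):
    """Return (period, residues): the multiplicative order of b mod f, and
    the n mod period for which f divides k*b^n - 1."""
    period, x = 1, b % f
    while x != 1:          # naive loop for the order of b mod f
        x = x * b % f
        period += 1
    return period, [n for n in range(period) if (k * pow(b, n, f) - 1) % f == 0]

for f in (11, 13, 17, 61):
    print(f, residues(3045, f))
# -> 11 (10, [4])
#    13 (12, [8])
#    17 (8, [7])
#    61 (60, [8])
```

The output matches the post: 11|N when n == 4 mod 10, 13|N when n == 8 mod 12, 17|N when n == 7 mod 8, and 61|N when n == 8 mod 60.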
2009-07-29, 10:57   #15
gd_barnes

May 2007
Kansas; USA

2×7^2×103 Posts

You just went through exactly what I've gone through 1000 times with covering sets at CRUS, trying to find patterns that make full covering sets for specific k's in other bases. Occasionally I'll get one that gives a full covering set when combining it with algebraic factors. It's extremely rare to find one that unexpectedly gives a full covering set. What I usually end up with is a partial covering set like you did, with the n-values narrowed down much more than what you have. Alas, at CRUS I'm looking at low weights, whereas you are looking at a very high weight.

If a prime factor f occurs at all, it will always repeat every f-1 n-values -or- every (f-1)/q n-values for k*b^n-c forms. q can be just about anything but is usually 1 or 2. The most interesting factors to find are very big ones where q is large and which hence occur far more frequently than every f-1 n-values. I've seen factors over 2000 occur every < 30 n. This is a normal thing for GFNs, which can only be prime if n is a power of 2, but for more random k-values it doesn't happen often; mostly on low weights.

Syd's DB is a boon for this. I used to do it manually using Alpertron's factorization applet. Now it's quick and easy to see factors that occur unusually frequently.

An example of an exercise that I have gone through is for the lowest k without a prime: 2293*2^n-1 and its sister form 2^n-2293, which has the same covering set and occurrences of factors. If you look in the DB, you'll see that there is a specific covering set that covers 23 out of every 24 n. I've fully factored all except a couple of n-values for both forms for n<400. But the interesting part is that there is a secondary effort that I took on to attempt to factor that one n out of every 24 that doesn't have a known recurring factor.

If you want to see what I've done so far with both sequences, look at:
2293*2^(24*n-1)-1
2^(24*n+1)-2293

This introduces a whole new way of looking at covering sets. If you only concentrate on every 24th value of these forms, you'll notice that on a typical sieve they would be more "average" weight, in that they contain far fewer small factors than the original form. Alas, for this k as well as many others I've observed that contain large #'s of small factors, that 24th occurrence of each n apparently contains an abnormally high # of large factors, and I speculate that is because the main form itself contains so many small factors. There seems to be something going on with a large # of high factors when there is also a large # of small factors on the original form itself, even though the n-values being looked at can contain only few small factors.

Interestingly enough, even reduced to only looking at every 24th n-value, the factors still occur at the (f-1)/q frequency, as is the case if you looked at all n's. There are just very few small factors once you get outside of the main covering set. For instance, the factor of 2017 occurs every (2017-1)/144 = 14 exponents for these "every 24 n" forms. In the original form, 2017 occurs every (2017-1)/6 = 336 n. To have a q-value of 144 is huge, and it does happen occasionally in the original form itself. This frequency of the factor of 2017 within every 24th n is part of what makes k=2293 so hard to find a prime for. But the small factors of 23 and 29, which occur every 11 and 7 n respectively within every 24th n, are a bigger part of the reason. Most of the factors that DO occur within every 24th n occur much more frequently than they otherwise should.

BTW, I've also tested 2^n-2293 up to n=200K. No primes there either so far!

Gary

Last fiddled with by gd_barnes on 2009-07-29 at 11:11
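The (f-1)/q arithmetic quoted for f=2017 can be reproduced in a few lines; this is a sketch using a naive multiplicative-order loop (not anything from Syd's DB), with the function name made up here:

```python
def mult_order(b, f):
    """Multiplicative order of b modulo a prime f (naive loop)."""
    period, x = 1, b % f
    while x != 1:
        x = x * b % f
        period += 1
    return period

f = 2017
period = mult_order(2, f)   # how often f divides 2293*2^n - 1 as n increases
q = (f - 1) // period
print(period, q)            # -> 336 6

# Restricted to every 24th n, the factor recurs every 336/24 = 14 steps,
# i.e. with an effective q of (f - 1)/14 = 144 on the reduced sequence.
print(period // 24, (f - 1) // (period // 24))  # -> 14 144
```

This matches the post: 2017 divides the original form every (2017-1)/6 = 336 n, and every (2017-1)/144 = 14 exponents of the "every 24 n" forms.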
2009-07-29, 16:30   #16
henryzz
Just call me Henry

"David"
Sep 2007
Cambridge (GMT)

2^4×353 Posts

Could this conversation about prime density/Nash weight be moved to its own thread? It is an incredibly interesting subject which is worth investigating further.

Admin edit: done

Last fiddled with by gd_barnes on 2009-07-29 at 20:16 Reason: admin edit
2009-07-29, 20:35   #17
Mini-Geek
Account Deleted

"Tim Sorbera"
Aug 2006
San Antonio, TX USA

17×251 Posts

Quote:
 Originally Posted by gd_barnes If a prime factor f occurs at all, it will always repeat every f-1 n-values -or- every (f-1)/q n-values for k*b^n-c forms. q can be just about anything but is usually 1 or 2. The most interesting factors to find are very big ones where q is large and hence occur far more frequently than every f-1 n-values. I've seen factors over 2000 occur every < 30 n. This is a normal thing for GFNs, which can only be prime if n is a power of 2 but for more random k-values, it doesn't happen often; mostly on low weights.
It appears to me that though this could not be used to prove the lack of a complete covering set, it could at least be used to show a minimum size of a potential covering set, and a minimum size of the largest factor in this covering set. Has any such work been done on Riesel or Sierpinski number candidates? If so, what are some of the minimum covering set sizes? If not, how would one go about this? Doing it all by hand (even with Syd's DB to help) would be quite slow.
By the way, saying f-1 or (f-1)/q and then that q is usually 1 or 2 is partially redundant, since the q=1 case is clearly covered by "f-1". I think it'd be better to ignore the separate "f-1" case, and just include it as (f-1)/q with q=1. Or alternately say that it will repeat every f-1 n-values, and more specifically every (f-1)/q n-values (with q usually 1 or 2).
Is it proven that for any f that divides k*b^n+c, f|k*b^n+c for every (f-1)/q n-values? From our older discussion on the topic (which I cannot locate), I thought this had to be somehow proven for each candidate factor, but it seems to hold up all the time, and I'm pretty sure you're saying this is so.
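For what it's worth, the standard argument needs no per-factor proof: f | k*b^n - c exactly when b^n ≡ c*k^(-1) (mod f), and b^n cycles with period ord_f(b), which by Fermat's little theorem divides f-1 — so q is just (f-1)/ord_f(b), provided f divides neither k nor b. A brute-force sketch of that claim (the setup and helper name are mine):

```python
import random

def mult_order(b, f):
    """Multiplicative order of b modulo a prime f (naive loop)."""
    period, x = 1, b % f
    while x != 1:
        x = x * b % f
        period += 1
    return period

random.seed(1)
for _ in range(100):
    f = random.choice([11, 13, 17, 61, 2017])       # small primes, f doesn't divide b=2
    k, b, c = random.randrange(1, f), 2, random.randrange(1, f)
    period = mult_order(b, f)
    assert (f - 1) % period == 0                    # period = (f-1)/q for an integer q
    hits = [n for n in range(3 * period) if (k * pow(b, n, f) - c) % f == 0]
    # every divisible n recurs exactly `period` n-values later...
    assert all(h + period in hits for h in hits if h + period < 3 * period)
    # ...and all hits fall in a single residue class mod the period
    assert len({h % period for h in hits}) <= 1
print("all random cases consistent")
```

The hits form (at most) one residue class mod ord_f(b) because b generates a cyclic group mod f, so b^n ≡ c*k^(-1) has either no solution or exactly one n per cycle.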

Maybe sieves already take advantage of this, maybe it would be terribly inefficient, maybe it is completely unnecessary, but I am still curious and feel I must ask:
How good would it be to check for a divisor based on the n? E.g. if you discover that 7|k*2^7-1, and find that it divides only when n == 1 mod 6 (and not also with q=2 for n == 1 mod 3), then you can remove all n == 1 mod 6 candidates. This is simple enough, but is probably not worth the effort for such a small factor. But what if you were to do this for many factors? If you were to factor a small group of n's well beyond the normal depth (maybe to the extent of 0<n<500 being fully factored and n<10,000 factored to the normal depth of an n=1,000,000 search, or something like that) and build a table of such exclusions, you could eliminate a great many n's at any n size without needing to directly handle such large numbers.
Depending on the depths you pre-sieved to and how far you're going, further traditional sieving may be best.
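The exclusion-table idea above can be sketched in a few lines. This is only a toy version under my own naming; the prime list and range are illustrative, and a real sieve would be far more careful:

```python
def mult_order(b, f):
    """Multiplicative order of b modulo a prime f (naive loop)."""
    period, x = 1, b % f
    while x != 1:
        x = x * b % f
        period += 1
    return period

def build_exclusions(k, small_primes, b=2):
    """For each prime f, record the residues n mod ord_f(b) with f | k*b^n - 1."""
    table = []
    for f in small_primes:
        if k % f == 0:
            continue  # if f | k then k*b^n - 1 == -1 (mod f): never divisible
        period = mult_order(b, f)
        res = {n for n in range(period) if (k * pow(b, n, f) - 1) % f == 0}
        if res:
            table.append((f, period, res))
    return table

# build the table once from small numbers, then sieve any n-range with it
table = build_exclusions(3045, [7, 11, 13, 17, 19, 23, 61])
survivors = [n for n in range(1, 101)
             if not any(n % period in res for _, period, res in table)]
print(len(survivors), "of 100 n-values survive")
```

The point of the design is that the table is built from small arithmetic only, yet eliminates candidates at any n size, since divisibility depends only on n mod the period.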
Quote:
 Originally Posted by gd_barnes An example of an exercise that I have gone through is for the lowest k without a prime: 2293*2^n-1 and it's sister form 2^n-2293, which has the same covering set and occurrences of factors. If you look in the DB, you'll see that there is a specific covering set that covers 23 out of every 24n. I've fully factored all except a couple of n-values for both forms for n<400. But the interesting part is that there is a secondary effort that I took on to attempt to factor that one n out of every 24 that doesn't have a known recurring factor. If you want to see what I've done so far with both sequences, look at: 2293*2^(24*n-1)-1 2^(24*n+1)-2293
After looking at these a bit, it seems that in n == y mod ((f-1)/q) (figured I should start calling it this instead of x == y mod z), the f and q are identical for these two series of numbers, but the y can vary. E.g. for 2017, for the k*2^n-1 form y=1, and for the 2^n-k form y=-1=13. As far as I can see, there is no simple mapping that deals with this (like y_2=(-1)*y_1).
e.g. 2017|2293*2^(24*(14*n+1)-1)-1 for every n,
but 2017|2^(24*(14*n-1)+1)-2293 for every n.
(14*n+1 vs 14*n-1)
This would mean that they share the covering set's factors, but not the all-important positions that let the covering set be complete and tell you exactly where to search for the ones you're missing.
Or am I just missing something?
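For whatever it's worth, the 14*n+1 vs. 14*n-1 observation above checks out numerically (using Gary's two sequences from the quote; the variable names here are mine):

```python
# 2017 should divide 2293*2^(24*m-1) - 1 when m == 1 (mod 14),
# but divide 2^(24*m+1) - 2293 when m == -1 == 13 (mod 14).
f = 2017
riesel_hits = [m % 14 for m in range(1, 200)
               if (2293 * pow(2, 24 * m - 1, f) - 1) % f == 0]
dual_hits = [m % 14 for m in range(1, 200)
             if (pow(2, 24 * m + 1, f) - 2293) % f == 0]
print(sorted(set(riesel_hits)), sorted(set(dual_hits)))  # -> [1] [13]
```

So the f and the period agree between the two forms, but the residues y really do differ, as the post says.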

2009-07-29, 20:35   #18
henryzz
Just call me Henry

"David"
Sep 2007
Cambridge (GMT)

5648₁₀ Posts

I have attached a spreadsheet for k's up to 49 which compares Gary's formula with the Nash weights.
Sheet1 contains the data for where they all currently stand
Sheet2 is a graph based on Sheet1 that compares Prime Density and Nash Weight
Sheet3 is similar to Sheet1 but contains only data up to n=1M
Sheet4 is similar to Sheet2 but is based on Sheet3 instead of Sheet1

Attached Files
 PrimeDensityCalculations.xls.zip (3.4 KB, 68 views)

2009-07-30, 01:50   #19
gd_barnes

May 2007
Kansas; USA

2·7^2·103 Posts

A mathematical nitpick: in order to properly compare the ratings, they have to be converted to a percentage, because we are taking arbitrary candidates remaining (weight) and dividing by # of primes. Otherwise things that are based off of an average, such as the least-squares fit line, will be strongly skewed by k's with a very high rating (low prime density) as a result of having the bad luck to have few primes. Taken to the extreme, if you included k=2293 in there, its rating would be infinite since it has zero primes. It's kind of hard to average or fit a least-squares line to infinity. lol

Example:
Thing A occurs 1 in every 1000 times so its "rating" is 1000.
Thing B occurs 1 in every 250 times so its "rating" is 250.
(i.e. similar to the way these ratings work)

If you average the two, you get that the average occurrence is 1 in every 625, but that would be incorrect. You have to invert them to percentages, such as:

Thing A occurs 1/1000 = .001 of the time.
Thing B occurs 1/250 = .004 of the time.

Averaging the two gives .0025. This converts to 1 in every 400, or a "rating" of 400. That would be correct. If you attempt to pick something out of hat A with a 1/1000 chance and hat B with a 1/250 chance, over the long run you'll pick that thing out of one of the random hats 1 in every 400 times.
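In code, the hat example above is just the harmonic mean in disguise (a minimal sketch, with made-up variable names):

```python
# Averaging "1 in every R" ratings directly gives the wrong answer;
# average the rates 1/R instead, then convert back.
ratings = [1000, 250]

wrong = sum(ratings) / len(ratings)            # 625 -- incorrect
rates = [1 / r for r in ratings]               # 0.001 and 0.004
right = 1 / (sum(rates) / len(rates))          # 400 -- correct
print(wrong, round(right, 6))  # -> 625.0 400.0
```

`right` is exactly the harmonic mean of the two ratings, which is why inverting to percentages before averaging is the correct move.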

Therefore, for the spreadsheet to have a correct least-squares fit line, the ratings have to be inverted. The reason I chose it the way I did is that I didn't want ratings like .003, .004, etc., where the higher rating meant a more prime-dense k.

Well, what do you know: at least from this small sampling, the lower-weight k's have a slightly higher prime density. I even removed the Mersennes and the lower weights were still better. Alas, this is a very small sampling. More data is needed.

I took your spreadsheet, changed the ratings by inverting them to percentages, removed the Mersennes, and deleted sheets 3 and 4. The first sheet has more data and should make for more accurate analysis.

The above inversion could be done by inverting the original formula or simply taking 1 divided by the rating. I inverted the formula. For future comparison purposes and least-squares line fitting, the Prime k-value density rating "percentage" would be:

# of primes / (Nash weight × log(search depth))

Anything lower than a rating of 200 or higher than a rating percentage of .005 would be outstanding, IF the k-value has more than ~30 primes so as to have statistical significance. Obviously the Mersennes fit the "outstanding" category.
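As a formula-in-code, the rating "percentage" above might look like the sketch below. The log base and the example inputs are my assumptions (the post doesn't specify a base, and the Nash weight and search depth shown are made-up numbers, not real data):

```python
import math

def density_rating(num_primes, nash_weight, search_depth):
    """Prime k-value density rating 'percentage':
    # of primes / (Nash weight * log10(search depth)).  Base 10 is assumed."""
    return num_primes / (nash_weight * math.log10(search_depth))

# e.g. a hypothetical k with 35 primes, Nash weight 2000, searched to n = 1,000,000:
print(round(density_rating(35, 2000, 1_000_000), 5))  # -> 0.00292
```

On this scale a higher number means a more prime-dense k, which is the whole point of the inversion.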

Gary
Attached Files
 PrimeDensityCalculations.zip (9.3 KB, 74 views)

 2009-07-30, 05:53 #20 henryzz Just call me Henry     "David" Sep 2007 Cambridge (GMT) 2^4×353 Posts What sort of quantity of data do we need? How many k's? Do we need an automated process?
 2009-07-30, 07:50 #21 gd_barnes     May 2007 Kansas; USA 2×7^2×103 Posts All 501 k's up to k=1001 should be sufficient to get close to statistical significance within weight ranges, if there is any significance to be found.
2009-07-30, 08:33   #22
henryzz
Just call me Henry

"David"
Sep 2007
Cambridge (GMT)

1011000010000₂ Posts

Quote:
 Originally Posted by gd_barnes All 501 k's up to k=1001 should be sufficient to get close to statistical significance within weight ranges if there is any significance to be found.
How do you plan to get that much data into a spreadsheet?


