mersenneforum.org > Prime Search Projects > Conjectures 'R Us
Old 2012-02-03, 23:45   #100
rogue
 
 
"Mark"
Apr 2003
Between here and the


Quote:
Originally Posted by gd_barnes View Post
It's interesting that you came up with a relative difficulty value. I set up something like this in a spreadsheet on my PC about a year ago for bases <=256. It is actually how I come up with bases to recommend. But I have to keep it manually updated. The main difference is that in my spreadsheet, I only consider base, # of k's remaining, and the test limit...not weights. I also add a "bias" towards smaller bases because I consider them a little more important even if they are somewhat more difficult to advance. Essentially I do (base * n search depth)^2 * # of k's remaining. I then apply the low-base bias to that. Effectively I assume that all k's are the same weight. It gives an OK estimate of difficulty but obviously not as accurate as considering the weight of the k's.

I'm still curious to find out how you are computing the average weights for multiple k's. Do you have srsieve automatically run for every k in every base (for all bases with <= 25 k's)?
I run srsieve to 1e6 for all k (for the given conjecture) at one time. I could have gone to 511 or some other small value, but I noticed that some conjectures remove a much higher proportion of candidates between 511 and 1e6 than others. Going to 1e6 also kept the relative difficulties a little smaller. I think that the computation for Proth weight is okay when k and b are small, but when they are larger, such as on this project, 511 is too small. I would suggest that the weight be computed by sieving to 1e6 or maybe 1e7. One could then estimate that the number of remaining candidates in any n-range of 10000 would be somewhere between 40 and 50% of that number.

As for computing the k for the conjectures with more than 25 remaining, I d/l'd the other pages then wrote a gawk script to pull the k off of them and store them in srsieve input files, then read that list into my other script to compute the relative difficulty with srsieve. Those values are stored in a .txt file so that subsequent runs don't need to recalculate the difficulty. Automating that could be done, but it requires a bit more work. I hope that makes sense.
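For illustration, the spreadsheet-style estimate quoted above ((base * n search depth)^2 * # of k's remaining, with a bias toward smaller bases) can be written out directly. The quoted post does not give the exact form of the low-base bias, so the one used here is a guess:

```python
def spreadsheet_difficulty(base, n_depth, k_remaining, bias_strength=1.0):
    """Rough relative difficulty in the style described above:
    (base * n search depth)^2 * number of k remaining, then scaled so
    smaller bases score as relatively more important.  The exact bias
    function is not given in the post; 1 / base**bias_strength is a
    hypothetical stand-in."""
    raw = (base * n_depth) ** 2 * k_remaining
    return raw / base ** bias_strength  # hypothetical low-base bias

# Example: two conjectures at the same search depth, differing in base
# and in the number of k remaining.
small_base = spreadsheet_difficulty(base=3, n_depth=200_000, k_remaining=1)
large_base = spreadsheet_difficulty(base=63, n_depth=200_000, k_remaining=14)
```

This assumes all k have equal weight, which is exactly the limitation noted in the quoted post.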
Old 2012-02-04, 01:11   #101
gd_barnes
 
 
May 2007
Kansas; USA


Quote:
Originally Posted by rogue View Post
I run srsieve to 1e6 for all k (for the given conjecture) at one time. I could have gone to 511 or some other small value, but I noticed that some conjectures remove a much higher proportion of candidates between 511 and 1e6 than others. Going to 1e6 also kept the relative difficulties a little smaller. I think that the computation for Proth weight is okay when k and b are small, but when they are larger, such as on this project, 511 is too small. I would suggest that the weight be computed by sieving to 1e6 or maybe 1e7. One could then estimate that the number of remaining candidates in any n-range of 10000 would be somewhere between 40 and 50% of that number.

As for computing the k for the conjectures with more than 25 remaining, I d/l'd the other pages then wrote a gawk script to pull the k off of them and store them in srsieve input files, then read that list into my other script to compute the relative difficulty with srsieve. Those values are stored in a .txt file so that subsequent runs don't need to recalculate the difficulty. Automating that could be done, but it requires a bit more work. I hope that makes sense.
So does that mean that every time someone pulls the page up, srsieve has to run automatically for every base with <= 25 k's remaining? If so, how do you make it run so quickly? It seems like that would be necessary, since at any time a base can be updated to remove one or several k's. I'm mainly asking because my server machine already runs so many programs every 15 minutes, hour, or day.

Old 2012-02-04, 02:34   #102
rogue
 

Quote:
Originally Posted by gd_barnes View Post
So does that mean that every time someone pulls the page up, srsieve has to run automatically for every base with <= 25 k's remaining? If so, how do you make it run so quickly? It seems like that would be necessary, since at any time a base can be updated to remove one or several k's. I'm mainly asking because my server machine already runs so many programs every 15 minutes, hour, or day.
Someone opening up a page doesn't trigger generation of the stats. I assume that the stats pages are generated via some type of cron job that probably runs only once per day.

The script is not automated to pull remaining k from conjectures with more than 25 left. I think that would be easy to do. It only needs to run srsieve when the number of remaining k changes. If the number of remaining k doesn't change, the entire script can run in a few seconds. For conjectures with fewer than 25 k's remaining, running srsieve is quick, probably taking only a second or two. Those with 1000 or so can run in about a minute. It is the ones like R51 and S63 that can take a few minutes. Once R51 and R79 get to n=25000, I doubt anyone will take them (or S63) further anytime soon.

This is the pseudo-code:

read base + kleft + difficulty from difficulty.txt file

for each base from conjecture page
    if base and kleft from that page = base and kleft from difficulty file
        use value from input file
    else
        compute difficulty with srsieve
end for

for each base
    write base + kleft + difficulty to difficulty.txt file
end for

Similar logic is done for getting the weight for 1k conjectures.
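The pseudo-code above can be realized as a small cache keyed on (base, k remaining). This is a sketch in Python rather than the gawk/shell of the actual scripts; the difficulty.txt format ("base kleft difficulty" per line) and the compute_difficulty callback standing in for the srsieve run are assumptions:

```python
def load_cache(path="difficulty.txt"):
    """Read previously computed difficulties, keyed on (base, kleft)."""
    cache = {}
    try:
        with open(path) as f:
            for line in f:
                base, kleft, diff = line.split()
                cache[(int(base), int(kleft))] = float(diff)
    except FileNotFoundError:
        pass  # first run: everything gets computed from scratch
    return cache

def update_difficulties(bases, cache, compute_difficulty):
    """bases: (base, kleft) pairs scraped from the conjecture pages.
    Only call the expensive srsieve-backed function on a cache miss,
    i.e. when the number of remaining k has changed for that base."""
    result = {}
    for base, kleft in bases:
        key = (base, kleft)
        if key not in cache:
            cache[key] = compute_difficulty(base, kleft)  # srsieve run
        result[key] = cache[key]
    return result

def save_cache(cache, path="difficulty.txt"):
    """Rewrite the difficulty file with all current values."""
    with open(path, "w") as f:
        for (base, kleft), diff in sorted(cache.items()):
            f.write(f"{base} {kleft} {diff}\n")
```

As in the pseudo-code, an unchanged base costs only a dictionary lookup, so a run where nothing changed finishes in seconds.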

The worst part is the code that reads from the extra reservation pages, i.e. those with more than 25 k left. I had to manually edit some of the html so that it could be parsed cleanly. For example, one page had an embedded &nbsp; after the k. I had to remove that. Another had leading spaces before the k on each line. I had to remove those. Another had a space before a comma. Fortunately these were few and far between.
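The three hand-edits described here (an embedded &nbsp; after the k, leading spaces, a space before a comma) could also be done mechanically before parsing. A sketch of such a normalizer; any quirks beyond the three mentioned would still need handling:

```python
import re

def normalize_line(line):
    """Clean up the html quirks mentioned above before pulling out the
    k values: embedded &nbsp; entities, leading/trailing whitespace,
    and stray spaces before commas."""
    line = line.replace("&nbsp;", " ")  # stray non-breaking spaces
    line = re.sub(r"\s+,", ",", line)   # "1234 ," -> "1234,"
    return line.strip()                 # leading/trailing spaces
```

Running every downloaded page through this would remove the need to manually re-edit the html each time the reservation pages are regenerated.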
Old 2012-02-04, 02:42   #103
gd_barnes
 

I wasn't thinking too well when I asked that. The pages are regenerated every 15 mins. So...the way you have it set up right now, srsieve only needs to run for bases that have changed? That would be very good. How are you able to tell whether a base has changed or not? Do you keep a file of all bases, with how many k's are left and their search depth, to compare against? Sorry if you've already implied that by the pseudo code. It's still not 100% clear to me.

Regardless, I assume that srsieve would need to run once for ALL bases <= 25 k's when we first go live.

Old 2012-02-04, 03:01   #104
rogue
 

Quote:
Originally Posted by gd_barnes View Post
I wasn't thinking too well when I asked that. The pages are regenerated every 15 mins. So...the way you have it set up right now, srsieve only needs to run for bases that have changed? That would be very good. How are you able to tell whether a base has changed or not? Do you keep a file of all bases, with how many k's are left and their search depth, to compare against? Sorry if you've already implied that by the pseudo code. It's still not 100% clear to me.

Regardless, I assume that srsieve would need to run once for ALL bases <= 25 k's when we first go live.
I'm sorry. I thought that you had more familiarity with the scripts as they are currently written. Here is what the scripts do:

Use wget to pull down the 6 pages linked at the top of http://www.noprimeleftbehind.net/crus/. One gawk script extracts data from the conjecture pages, another from the reservation pages. Those scripts put the data into temp files. Those temp files are read by another script, which then generates the html that comprises the main tables, putting that into another temp file. Finally, the temp files generated by the previous (but not the first) steps are merged to generate the final pages. The temp files and the wget'ed html are then deleted. What isn't deleted are the text files holding the conjectured k (which are static) and the difficulty/weight, which are rewritten every time the script is run.

This takes only a few seconds. One would think that the gawk script to do the parsing takes a lot of CPU, but it really doesn't take much at all.

I can provide the files with pre-computed weights and difficulties, so they will only need to be re-computed when the number of k changes. The new values will be written to the weight and difficulty files. The important thing is that the difficulty only needs to be computed when the search limit or the number of remaining k changes. The project has slowed down, and only a few conjectures are changed each day.

IMO, there is little point in running the script every fifteen minutes, because the html that you maintain only changes once or twice a day. The scripts could be run once every hour or even as little as once per day.
Old 2012-02-04, 03:21   #105
gd_barnes
 

Quote:
Originally Posted by rogue View Post
I'm sorry. I thought that you had more familiarity with the scripts as they are currently written. Here is what the scripts do:

Use wget to pull down the 6 pages linked at the top of http://www.noprimeleftbehind.net/crus/. One gawk script extracts data from the conjecture pages, another from the reservation pages. Those scripts put the data into temp files. Those temp files are read by another script, which then generates the html that comprises the main tables, putting that into another temp file. Finally, the temp files generated by the previous (but not the first) steps are merged to generate the final pages. The temp files and the wget'ed html are then deleted. What isn't deleted are the text files holding the conjectured k (which are static) and the difficulty/weight, which are rewritten every time the script is run.

This takes only a few seconds. One would think that the gawk script to do the parsing takes a lot of CPU, but it really doesn't take much at all.

I can provide the files with pre-computed weights and difficulties, so they will only need to be re-computed when the number of k changes. The new values will be written to the weight and difficulty files. The important thing is that the difficulty only needs to be computed when the search limit or the number of remaining k changes. The project has slowed down, and only a few conjectures are changed each day.

IMO, there is little point in running the script every fifteen minutes, because the html that you maintain only changes once or twice a day. The scripts could be run once every hour or even as little as once per day.

EXCELLENT!! Thanks. That all sounds good.

I'll put a bug (pun intended :-] ) in Max's ear to check out this thread and coordinate the implementation of all of this. He's pretty busy right now but makes time for the important efforts that come up on the two projects. It sounds like everything runs quickly even if the difficulty stat has to be completely generated from scratch the first time, so I'll leave it up to you guys whether you want to provide it with a reference file to compare against.

Sometimes I'll update the pages 3 or 4 times in a day, but on average it's twice a day when I'm in town, if people have posted status updates, and once every day or two when I'm out of town. While every 15 mins. is more frequent than necessary, I'd still like the stats to run once an hour. I'll suggest that to Max. Part of the reason is that when I make a change to the main pages, I want to make sure that the stats pages properly reflect it in all places.

Old 2012-02-04, 03:35   #106
rogue
 

Sounds good to me. I have some clean-up to do on the scripts (next week). I'll post updates and once all of us are in agreement about what we want to see and how we want to see it, then Max can make the updates.
Old 2012-02-04, 05:12   #107
mdettweiler
A Sunny Moo
 
 
Aug 2007
USA (GMT-5)


Hi guys,

Yes, I've been following this discussion--more or less skimming some of the finer points as they went by but I've got a pretty good idea where we're standing. Mark, if I'm reading you correctly, you're saying you'll have the scripts ready next week? (Definitely no rush--I expect to be a little less busy next week than this week.)

The original vstats scripts were basically self-contained, so I could just drop them in a directory on the web server, set up the cron job, and put the link on the home page; I'm assuming these are set up similarly? In that case, it would be quite simple for me to implement them--I could just replace the existing contents of the vstats/ directory with the new scripts. Also, since there are additional pages being generated by the new scripts, is the idea to have those linked individually from the home page, or as subpages linked from the main page (crus-stats.htm)? (Or is this still yet to be decided?)

Thanks,
Max
Old 2012-02-04, 14:40   #108
rogue
 

Yes, they would be linked to from the main page. The new scripts should run in the same manner as the current scripts.
Old 2012-02-05, 05:08   #109
gd_barnes
 

Max,

When the pages are implemented, please set the cron job (update job?) to run once/hour at 30 minutes after the hour. Since the NPLB jobs run at the top of the hour, that will spread the work out a little on the machine.


Gary
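That schedule corresponds to a one-line crontab entry. A sketch; the script path is a placeholder, not the actual location on the server:

```shell
# min hour dom mon dow  command
# Run the CRUS stats generator hourly at :30, offset from the NPLB
# jobs that fire at the top of the hour.
30 * * * * /path/to/crus-stats/generate.sh
```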
Old 2012-02-05, 06:05   #110
mdettweiler

Quote:
Originally Posted by gd_barnes View Post
Max,

When the pages are implemented, please set the cron job (update job?) to run once/hour at 30 minutes after the hour. Since the NPLB jobs run at the top of the hour, that will spread the work out a little on the machine.


Gary
Okay, will do. Currently I have it running at :01, :15, :30, and :45; but yeah, just once at :30 would be quite sufficient. I think the only reason I had it going every 15 minutes was because I copied/pasted the job command from the one that runs the LLRnet/PRPnet status pages every 15 minutes.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.