mersenneforum.org

Newer milestone thread (https://www.mersenneforum.org/showthread.php?t=13871)

airsquirrels 2015-12-30 02:54

Ok so here is what we have so far from my read, and some thoughts on how we could phase this in with some relatively small changes.

I. Prime95 client modified to support the following features:
1. Output interim result lines (partial residues) to results.txt that are uploaded to primenet. I would say every million iterations would be a sane interval. This would be a simple line containing the exponent, what iteration it is at, and the 64-bit residue. 5KB or so in total DB space per exponent. We should have that disk space/bandwidth.
2. Save a full local backup of the residue corresponding to each million-iteration checkpoint.
   Erase these under the following circumstances:
   a. The corresponding exponent is completed.
   b. The exponent is aborted (see #4).
   c. The number of saved checkpoints exceeds the newly added "max checkpoints to save" setting. Remove the oldest.
   d. The server "accepts" a partial residue and indicates it is safe to clear older checkpoints.
   e. The server requests a rollback and all checkpoints newer than the rollback point are removed.
3. Accept a response from the server indicating a rollback point (via worktodo?)
   a. Find the latest checkpoint at or before that rollback point and restart from there, or from the beginning if necessary.
   b. Clean up older iterations.
   c. Optionally log the reason for the restart/rollback.
4. If not already supported, accept a response from the server indicating work on a given exponent should be aborted.
   a. Remove the exponent from the work queue.
   b. Restart the worker with the next work unit.
   c. Clean up checkpoints.
   d. Optionally log the reason for the abort.
5. Accept a response from the server requesting a given full checkpoint be uploaded.
   a. Queue another thread or process to upload the checkpoint. This could simply be a worktodo entry?
   b. Handle failure modes gracefully. (Retry? Just quit and let the server request the next checkpoint?)
   c. Compression? (There was a thread discussing that the checkpoints aren't very compressible. I'd love to show empirically whether it's worth the effort in practice, but I haven't tried yet and it may well not be. Alternatively I have a great compression method, where the data can simply be encoded as n^iteration mod exponent. Pretty CPU intensive to decompress on the server though :) )
6. Accept worktodo from the server that includes information indicating that a full checkpoint at a given iteration is available for download.
   a. Support attempting to download the full checkpoint before starting work.
   b. Handle a reasonable number of retries before just starting from the beginning. Graceful failure modes.
7. Accept a response from the server that indicates checkpoints older than a given iteration are no longer needed.
   a. Clean up unless other settings call for the checkpoints to be preserved.
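
To make item 1 concrete, here is a minimal Python sketch of a Lucas-Lehmer loop that emits an interim line every `interval` iterations. The line layout and the name `interim_residue_lines` are made up for illustration and are not Prime95's actual results.txt format; the real client works on FFT data rather than Python integers. For scale: an exponent near 60M would produce about 60 such lines at a one-million interval, which fits the "5KB or so" estimate if each line is under 100 bytes.

```python
def interim_residue_lines(p, interval):
    """Run a Lucas-Lehmer test on M_p = 2**p - 1 and collect interim lines.

    Returns (lines, is_prime). Each line holds the exponent, the iteration
    number, and the low 64 bits of the current LL value in hex.
    The line format is hypothetical, not Prime95's real output.
    """
    m = (1 << p) - 1
    s = 4
    lines = []
    for i in range(1, p - 1):  # p - 2 squarings in total
        s = (s * s - 2) % m
        if i % interval == 0:
            res64 = s & 0xFFFFFFFFFFFFFFFF
            lines.append(f"M{p} interim iteration {i} res64 {res64:016X}")
    # M_p is prime exactly when the final LL value is zero.
    return lines, s == 0
```

Two machines testing the same exponent should produce identical interim lines, which is what lets the server compare them long before the test finishes.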

That's it for the primenet client. Those features could be added and lie dormant, or be tested non-destructively. Only the incremental results would need to be ignored by the server. Someone can poke holes in this and look for opportunities for abuse, etc., but it would give us all the capabilities in the client while reserving all the 'How should this behave' logic for the server, where it is easier to tune.
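
The checkpoint retention rules in client items 2, 3 and 7 can be sketched roughly as below. This is only an in-memory illustration in Python: `CheckpointStore` and its method names are hypothetical, real checkpoints would be files on disk, and item 3a is read here as "the most recent checkpoint at or before the rollback point".

```python
class CheckpointStore:
    """In-memory stand-in for the saved checkpoints of one exponent."""

    def __init__(self, max_checkpoints):
        self.max_checkpoints = max_checkpoints
        self.saved = {}  # iteration -> full residue bytes

    def save(self, iteration, residue):
        self.saved[iteration] = residue
        # Rule 2c: over the configured limit, remove the oldest.
        while len(self.saved) > self.max_checkpoints:
            del self.saved[min(self.saved)]

    def server_accepted(self, iteration):
        # Rule 2d / item 7: checkpoints older than the accepted one can go.
        for it in [i for i in self.saved if i < iteration]:
            del self.saved[it]

    def rollback(self, target):
        # Rule 2e: drop every checkpoint newer than the rollback point.
        for it in [i for i in self.saved if i > target]:
            del self.saved[it]
        # Item 3a: restart from the nearest remaining checkpoint,
        # or from iteration 0 (the beginning) if none is left.
        return max(self.saved, default=0)

    def clear(self):
        # Rules 2a and 2b: the exponent was completed or aborted.
        self.saved.clear()
```

For example, with `max_checkpoints=3` and checkpoints saved at 1M through 4M, the 1M checkpoint is dropped on the fourth save, and `rollback(3_500_000)` removes 4M and returns 3,000,000 as the restart point.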

II. The server would need the following modifications, which could be added in stages:
1. Accept the incremental result residue lines.
   a. Store these in a DB record 1-* with the assignment.
   b. Award credit incrementally for the work since the last incremental result? It may be better not to do this at this point. See #3.
   c. If the class of work and server configuration warrant it, add an assignment/send a response that requests the full checkpoint for that iteration be uploaded (Client #5).
   d. If the work class is DC, or there are two tests running in parallel, and we have one or more residues for this iteration to compare: (Discussion needed here!)
      i. If it matches another residue from another user, return an Accept response (Client #7). This is the 'keep going, all is good' state.
      ii. If it matches another residue from this user/assignment, return an Accept response (Client #7) for this user (the rollback was good on DC). Queue a worktodo/response/message for any non-matching users to abort (if still active) and indicate a hardware/software error (Client #4), or possibly to roll back (if running simultaneously, Client #3). A state chart is needed to make sure we handle all corner cases/event combinations here. It's likely the second part of this is handled by iii below.
      iii. If there are no matching residues and it mismatches at least one residue, return a rollback response (Client #3): we haven't matched our own, so go back and see if we do. Queue a rollback response (Client #3) for the other assignment as well, if it is still active?

2. Accept the upload of a full checkpoint as requested from Server #1c/Client #5.
   a. Store this in a DB record 1-* with the assignment.
   b. Award partial credit at this time?

3. Support assigning work for which existing checkpoints are available from the server in a form handled by Client #6. Both LL and DC work should accept this.
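
As a starting point for the discussion flagged under Server #1d, here is one possible reading of cases i to iii as a small Python decision function. Everything here is an assumption for illustration: the names `Action` and `compare_interim`, the idea of returning a list of other users to notify, and the choice to simply store the residue when there is nothing to compare against. It is not a full state chart and does not decide between abort and rollback for the other users.

```python
from enum import Enum


class Action(Enum):
    STORE = "store"        # nothing to compare against yet
    ACCEPT = "accept"      # Client #7: keep going, older checkpoints can go
    ROLLBACK = "rollback"  # Client #3: go back and recompute


def compare_interim(user, residue, earlier):
    """Decide how to answer one interim residue.

    earlier: (user, residue) pairs already stored for the same exponent
    and iteration. Returns (action for the submitter, other users whose
    stored residue disagrees and who should get an abort or rollback).
    """
    if not earlier:
        return Action.STORE, []
    disagreeing = sorted({u for u, r in earlier if u != user and r != residue})
    if any(u != user and r == residue for u, r in earlier):
        return Action.ACCEPT, []            # case i
    if any(u == user and r == residue for u, r in earlier):
        return Action.ACCEPT, disagreeing   # case ii
    return Action.ROLLBACK, disagreeing     # case iii
```

Keeping this as a pure function on the server side makes it easy to tune or replace without touching the client, which is the point of putting the behavior there.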


I'm sure I am missing some pieces, but the framework of this doesn't seem overly arduous. Thoughts? This needs a review to ensure the credit system would remain intact, the chain of proof remains intact, and that there aren't new avenues for abuse.

Who here actually owns the responsibility for these components? I know George writes prime95, is he open to external help/patches for review? I have no idea who actually handles the server side of primenet.

Edit: Argggg, formatting / indentation lost. I will clean this up when I'm not on a mobile device.

Dubslow 2015-12-30 05:34

:direction:

Madpoo 2015-12-30 07:21

[QUOTE=airsquirrels;420476]Who here actually owns the responsibility for these components? I know George writes prime95, is he open to external help/patches for review? I have no idea who actually handles the server side of primenet.[/QUOTE]

George would be the go-to for Prime95 and on the Primenet side of things, there's George, James, and Scott. My own role seems to be to suggest wild ideas that other people would be responsible for implementing. :smile:

Of course I help out when/where I can... helping with any DB schema changes, sproc updates and what not. I've learned enough PHP to "skin" the website design and update things here and there like the milestone page, etc. The client/server communication parts of the site are something I try to avoid. Too scary for me.

For everything it does, the Primenet server is pretty straightforward, which is a good thing... made it easy for me to peek in and see the flow of things. As you'd imagine, a system that's evolved over the years has a lot of nooks and crannies though. With that said, grafting on new features doesn't need to be too difficult as long as backwards compatibility with older clients is maintained. Heck, it still accepts chatter from old v4 (Prime95 v24.x and earlier) clients, which have their own little bit of code just to handle them.

NBtarheel_33 2016-01-02 12:23

Taking a look at the old [URL=http://www.mersenne.org/report_classic/]"colorful stats report"[/URL]:[LIST][*]Less than 6,000,000 P90-years = 30,450,000 GHz-days remaining to having every Mersenne number under M79300000 tested at least once.[*]Only 146 double-checks (or factors found!) remain between exponents 30.15M and 35.1M before the row for those exponents can be collapsed into the row for 0-30.15M.[*]Only 8,600+ Mersenne numbers need factored to reach a total of 3,000,000 factored Mersenne numbers in the "classical" exponent range of 0-79.3M.[/LIST]

cuBerBruce 2016-01-14 03:29

What was the #1 first-LL exponent has been completed by an ANONYMOUS user with an expired assignment. User markr now has [url=http://www.mersenne.org/report_exponent/?exp_lo=60356927&full=1]M60356927[/url] as a double-check. The lowest first LL is now M60371299.

I note another user checked in results for 2 60M range exponents and 2 61M range exponents all at the same time less than two days ago. All four of these were for expired assignments, leaving Patrik Johansson, Zr40, and vats09 with double-checks.

EDIT: Also...
[QUOTE] Countdown to double-checking all exponents below 35M: [size=3][b]10[/b][/size] (Estimated completion : 2016-02-01) [/QUOTE]

cuBerBruce 2016-01-14 16:19

[QUOTE]Countdown to first time checking all exponents below 61M: [color=red]10[/color] [color=green](Estimated completion : 2016-02-11)[/color][/QUOTE]

It will be down to 9 within 6 hours.

petrw1 2016-01-14 16:50

Countdown to double-checking all exponents below 35M: 10 (Estimated completion : 2016-02-01)
 
I have 2 on a machine with a failed hard drive.
I will make sure they get done somehow soon enough to not hold up that range.

Uncwilly 2016-01-19 19:22

Regarding the Milestones page. I was thinking about the page as I fell asleep last night. It seems to be a bit hard coded. What about a rules based page? Snapshot for reference
[CODE]All exponents below 34,969,871 have been tested and double-checked.
All exponents below 60,371,299 have been tested at least once.

Countdown to first time checking all exponents below 61M: 6 (Estimated completion : 2016-02-11)
Countdown to first time checking all exponents below 62M: 13 (Estimated completion : 2016-02-11)
Countdown to first time checking all exponents below 63M: 25 (Estimated completion : 2016-02-11)
Countdown to first time checking all exponents below 64M: 84 (Estimated completion : 2016-05-12)
Countdown to first time checking all exponents below 65M: 134 (Estimated completion : 2016-05-12)
Countdown to first time checking all exponents below 66M: 200 (Estimated completion : 2016-05-12)
Countdown to first time checking all exponents below 67M: 764 (2 still unassigned)
Countdown to first time checking all exponents below M(74207281): 81,863

Countdown to double-checking all exponents below 35M: 7 (Estimated completion : 2016-02-02)
Countdown to double-checking all exponents below 36M: 5,932 (5,201 still unassigned)

Countdown to proving M(37156667) is the 45th Mersenne Prime: 16,191
Countdown to proving M(42643801) is the 46th Mersenne Prime: 99,202
Countdown to proving M(43112609) is the 47th Mersenne Prime: 108,391
Countdown to proving M(57885161) is the 48th Mersenne Prime: 400,202
Countdown to proving M(74207281) is the 49th Mersenne Prime: 636,148[/CODE]
The "All exponents below" lines are good.

The "Countdown to first time checking all exponents below XXM" lines should follow these rules in order:[LIST=1][*]Display the next 2 first time LL millions milestones.[*]Display any additional first time LL milestones with counts below 100.[*]Display any additional first time LL milestones where all exponents below it have been assigned.[*]Display any open first time milestones at Mprimes up to the largest known.[/LIST]The same rules for the double checks, except:
Rule 3 would need to check that there are no outstanding first LL assignments.
Rule 4 would apply to the "Countdown to proving" section.

This would change the current page a little and could change over the course of a day, but it would save having to monkey with it all of the time. But if you make the query run once an hour at a set time, that would be fine.
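
A rough Python sketch of the four first-time rules, just to show they are mechanical enough to automate. The `Milestone` fields and the function name `select_first_time` are made up for illustration; the real page is driven by queries on the server, and the double-check variant is not covered here.

```python
from dataclasses import dataclass


@dataclass
class Milestone:
    limit: int               # exponent bound, e.g. 61_000_000
    remaining: int           # exponents below the bound still untested
    unassigned: int          # how many of those have no assignment
    at_mprime: bool = False  # bound is a known Mersenne prime exponent


def select_first_time(milestones):
    """Return the first-time LL milestones to display, in ascending order."""
    open_ms = sorted((m for m in milestones if m.remaining > 0),
                     key=lambda m: m.limit)
    millions = [m for m in open_ms if not m.at_mprime]
    shown = []
    for idx, m in enumerate(millions):
        next_two = idx < 2                # rule 1
        nearly_done = m.remaining < 100   # rule 2
        all_assigned = m.unassigned == 0  # rule 3
        if next_two or nearly_done or all_assigned:
            shown.append(m)
    shown += [m for m in open_ms if m.at_mprime]  # rule 4
    return sorted(shown, key=lambda m: m.limit)
```

Applied to the snapshot above, this would keep the 61M through 66M lines and the M(74207281) line, and drop the 67M line until its last 2 exponents are assigned.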

:two cents:

cuBerBruce 2016-01-20 01:43

Countdown to first time checking all exponents below 61M: [size=5][b]5[/b][/size]

Madpoo 2016-01-20 03:26

[QUOTE=Uncwilly;423091]Regarding the Milestones page. I was thinking about the page as I fell asleep last night. It seems to be a bit hard coded. What about a rules based page?
...
The "All exponents below" lines are good.

The "Countdown to first time checking all exponents below XXM" lines should follow these rules in order:[LIST=1][*]Display the next 2 first time LL millions milestones.[*]Display any additional first time LL milestones with counts below 100.[*]Display any additional first time LL milestones where all exponents below it have been assigned.[*]Display any open first time milestones at Mprimes up to the largest known.[/LIST]The same rules for the double checks, except:
Rule 3 would need to check that there are no outstanding first LL assignments.
Rule 4 would apply to the "Countdown to proving" section.

This would change the current page a little and could change over the course of a day, but it would save having to monkey with it all of the time. But if you make the query run once an hour at a set time, that would be fine.

:two cents:[/QUOTE]

Yeah, the way it's set up now, there's a SQL stored procedure that pulls the data and pops it onto the page. That same sproc will either use cached info or, if more than the default time has elapsed, re-run the queries to refresh the values. Kind of a convenient version of NoSQL in a sense, which happens to use *actual* SQL to hold the data.
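
The cache-or-refresh behavior described here can be illustrated with a few lines of Python. This is only an analogy: the real thing is a SQL stored procedure, and the class name `CachedQuery` and its parameters are invented for the example.

```python
import time


class CachedQuery:
    """Serve a cached result until it is older than max_age_seconds."""

    def __init__(self, run_query, max_age_seconds, clock=time.monotonic):
        self._run_query = run_query  # the expensive query to run
        self._max_age = max_age_seconds
        self._clock = clock          # injectable so it can be tested
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        stale = (self._fetched_at is None
                 or now - self._fetched_at > self._max_age)
        if stale:
            # Re-run the query and remember when the result was taken.
            self._value = self._run_query()
            self._fetched_at = now
        return self._value
```

With `max_age_seconds=3600` this gives roughly the "run once an hour" behavior suggested earlier in the thread, except that the refresh happens on the first page view after the hour is up rather than at a set time.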

It's actually set up to be kind of flexible... the text itself and the data it shows can be customized for various things, and it's just limited to the type of query I can write. I actually have it create data for countdowns up to 70M, but going past 67M right now is kind of useless... but it's there for later.

I did just add the countdown to first-time checking below M49 since it's kind of relevant and fresh, and it shows just how many possibilities there are for an unknown prime below that one.

Once we start clearing out this 61M - 64M stuff it won't be quite so cluttered. As usual there are some foot draggers in there but they'll go away eventually.

At least this system lets me add other interesting milestones if/when the mood strikes me, and it'll be cached along with the rest.

cuBerBruce 2016-01-22 13:23

[QUOTE]Countdown to double-checking all exponents below 35M: [color=red]5[/color] [color=green](Estimated completion : 2016-02-04)[/color][/QUOTE]

A couple poachings (apparently by 2 different users) of stuck assignments and the countdown is down to 5.

The smallest exponent recently was recycled and there is a race between the prior assignee and the new assignee to see who finishes it first. The next 3 will almost certainly be recycled as well even though the current assignees will likely finish them very soon after they expire.

The last one is also about to be recycled. That's a little farther from being finished (in terms of ETA) than the others.


All times are UTC.