mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Data (https://www.mersenneforum.org/forumdisplay.php?f=21)
-   -   Newer milestone thread (https://www.mersenneforum.org/showthread.php?t=13871)

Madpoo 2015-12-26 23:40

[QUOTE=henryzz;420221]How fast is the needed bandwidth increasing in comparison with the cost per MB decreasing?
I would guess that this sort of setup would be getting cheaper over time. When would this look like being viable?[/QUOTE]

Good question.

Depends on many factors... at what point would an interim file be sent to the server, and how long would it be kept for?

I imagine that in whatever case, once a result is turned in, any interim files stored for that exponent would be removed... no use for them anymore. Perhaps the partial residue is still saved at whatever percent for comparison by double-checks...

In the case of abandoned work where it managed to upload an interim file at some point, would we expect the server to hand that out to the new assignee so it can continue at the same point?

Would the server keep several interims for the same exponent, or let's say it uploaded at 30% and 60% (just for example), should it delete the 30% file when the 60% comes in?

Or maybe not even bother with multiple interims of the same exponent... just have the client upload one when it reaches 33% or something and then nothing else... just delete that when it checks in a result, or hand it to someone else if the assignment expires.

It would be useful to know how many iterations, on average, an exponent reports in before it expires because the user wandered off. It's probably a big range... and that's if they even report back at all. It probably won't surprise anyone that a lot of anonymous assignments are never heard from again after the day they checked out a number.

So... yeah, a lot of variables on when to collect, how many to collect, how many and when to save them/hand them out to new folks, etc.

Madpoo 2015-12-27 00:39

[QUOTE=cuBerBruce;420253]The 60M milestone has been reached!
:party: :toot:[/QUOTE]

Milestone page updated.

henryzz 2015-12-27 17:07

Assuming weekly uploads, 260 GB / 7 ≈ 37 GB/day. That needs an average of around a 3.4 megabit connection. My home connection could handle that without any additional cost. It all depends on how the cost is calculated.

This is probably an overestimate as well, since many people won't upload every week.
Storage on the server shouldn't be an issue, I would have thought. Special disks shouldn't be needed to provide that bandwidth; a normal disk could handle the traffic.
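That back-of-the-envelope figure can be checked in a few lines. This is just a sketch of my own, assuming uploads are spread evenly across the week and decimal units (1 GB = 10^9 bytes):

```python
# Average sustained line rate needed for a given weekly upload volume.
# Assumes uploads are spread evenly over the week (they wouldn't be in
# practice, so peak demand would be higher).

def avg_bandwidth_mbit(weekly_gb: float) -> float:
    """Average line rate in Mbit/s for `weekly_gb` gigabytes per week."""
    bytes_per_day = weekly_gb * 10**9 / 7
    bits_per_sec = bytes_per_day * 8 / 86_400  # 86,400 seconds per day
    return bits_per_sec / 10**6

print(f"{avg_bandwidth_mbit(260):.2f} Mbit/s")  # ~3.44 for 260 GB/week
```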

chalsall 2015-12-27 18:02

[QUOTE=henryzz;420296]It all depends on how the cost is calculated.[/QUOTE]

There are, in my mind, two other issues...

1. Who gets the credit for the cycles (and, perhaps more importantly, when the next Mersenne Prime is found)? The one who completes the last cycle, the one who contributed the most cycles, or all who contributed cycles?

1.1. If either (or both) of the latter two, trust that some will "game the system" by contributing only enough cycles to be in the game for others to then finish.

2. Those who don't complete an assignment probably aren't that serious, and thus (heuristically) might have unreliable machines.

2.1. Is it worth investing in new software development (humans are expensive), bandwidth and storage (cheap) when the number of bad tests will almost certainly go up (possibly without detection for years)?

2.1.1. Where, exactly, do the economic curves cross?

henryzz 2015-12-27 19:21

[QUOTE=chalsall;420297]There are, in my mind, two other issues...

1. Who gets the credit for the cycles (and, perhaps more importantly, when the next Mersenne Prime is found)? The one who completes the last cycle, the one who contributed the most cycles, or all who contributed cycles?
[/quote]
I assume credit will have to be shared. Any recognition would need to be proportional to the number of iterations done.
[QUOTE=chalsall;420297]
1.1. If either (or both) of the latter two, trust that some will "game the system" by contributing only enough cycles to be in the game for others to then finish.
[/quote]
A sensible minimum number of iterations would help with this. 1M maybe? Credit would be divided according to the number of iterations done anyway.
[QUOTE=chalsall;420297]
2. Those who don't complete an assignment probably aren't that serious, and thus (heuristically) might have unreliable machines.
[/quote]
Maybe an upload should be double-checked before it is used.
[QUOTE=chalsall;420297]
2.1. Is it worth investing in new software development (humans are expensive), bandwidth and storage (cheap) when the number of bad tests will almost certainly go up (possibly without detection for years)?
[/quote]
With double-checking, will they go up significantly?
The aim here is to avoid wasting work when someone leaves GIMPS with a large percentage done.
Human hours are an issue. Madpoo, Prime95, etc. will need to decide whether it is worth their time.
[QUOTE=chalsall;420297]
2.1.1. Where, exactly, do the economic curves cross?[/QUOTE]
We could do with an estimate of how much work this would actually save.

edit: Maybe this discussion should be split off into its own thread.

Madpoo 2015-12-27 19:24

[QUOTE=chalsall;420297]There are, in my mind, two other issues...

1. Who gets the credit for the cycles (and, perhaps more importantly, when the next Mersenne Prime is found)? The one who completes the last cycle, the one who contributed the most cycles, or all who contributed cycles?[/QUOTE]

No credit for quitters. :smile:

[QUOTE=chalsall;420297]2. Those who don't complete an assignment probably aren't that serious, and thus (heuristically) might have unreliable machines.[/QUOTE]

Possibly. It's hard to say without digging into details. I suppose a basic analysis of machines that have only checked in one result, and what percent of those were bad, might offer a clue. I don't know if that could be extrapolated to machines that never checked in a result, but maybe it'll get you in the ballpark.

S485122 2015-12-27 19:35

I wouldn't want to start on the basis of work done by an unknown machine.

Another thing is that one would end up with a mixture of hardware, software versions, and so on.

Is the total work done on almost-finished but abandoned work units really so big? Of course they stand out, because they will remain among the active assignments for a long time (until they expire). But compared to the total throughput, I am sure the work done on those "almost finished assignments" is negligible.

In my opinion, an idea that sounds good but isn't.

Jacob

Madpoo 2015-12-27 19:43

[QUOTE=henryzz;420296]Assuming weekly uploads, 260 GB / 7 ≈ 37 GB/day. That needs an average of around a 3.4 megabit connection. My home connection could handle that without any additional cost. It all depends on how the cost is calculated.

This is probably an overestimate as well, since many people won't upload every week.
Storage on the server shouldn't be an issue, I would have thought. Special disks shouldn't be needed to provide that bandwidth; a normal disk could handle the traffic.[/QUOTE]

Disk storage and performance isn't really an issue. Right now the server does NOT have enough space for something like this, but that can always be upgraded.

The main thing would be handling the network bandwidth of all that upload/download.

The current colocation provider offers a certain amount of monthly total data transferred, or I think alternatively they can do a bandwidth cap at 100 Mb/s.

Let's say we assumed 500 GB of weekly data, that's a couple TB per month on top of the normal server functions.
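As a quick check on that estimate (my own sketch, taking an average calendar month of 365.25/12 days):

```python
# Sanity check of the "couple TB per month" figure: 500 GB of interim
# uploads per week, averaged over a calendar month.
weekly_gb = 500
days_per_month = 365.25 / 12                     # average Gregorian month
monthly_tb = weekly_gb / 7 * days_per_month / 1000
print(f"{monthly_tb:.2f} TB/month")              # ~2.17
```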

In theory it could be possible... ultimately, though, this gets into an area that might be better suited to hosting the files and transfers on AWS in an S3 bucket, both for convenience and to keep the core server functions isolated. I guess if it turned out the server could handle the bandwidth and storage, it could be moved in-house.

This is all speculative at this point anyway... but suffice to say that technology now would make a feature like this feasible, unlike years ago when Prime95 started out and dial-up was common.

It would involve changes to the client, the server, new options to enable/disable that feature (clients might not want a bunch of data going "into the cloud" regularly), consideration of the "upload points" (at what % does the interim file get sent up), who gets credit (only the one who finishes, if you ask me), should the server start saving temporary partial residues along the way to provide more "point in time" comparisons besides just the final, etc.

And what's the end goal? To me the end goals are essentially:
1) abandoned work can be picked up by someone else without starting over
2) partial residues along the way let us know when one or the other tests goes screwy, before the final iteration, so a 3rd (or more) test can be assigned immediately if desired.

Madpoo 2015-12-27 20:09

[QUOTE=S485122;420306]I wouldn't want to start on the basis of work done by an unknown machine.[/QUOTE]

I wouldn't mind, but probably only on double-checks so that a good/bad answer will be known at the end. I'm less sure if I'd be willing to do that when the answer as to the veracity of the result is a decade out.

[QUOTE=S485122;420306]Another thing is that one would get a mixture of hardware, software versions an all.[/QUOTE]

True, but everything still gets double-checked so it may not matter too much that something was tested at first by mprime 27.9 and finished by Prime95 28.7 or by mlucas or by a GPU. The final residue should still match unless the hardware bombed.

There are already folks out there (myself included) who may start on one machine and finish on another, or who upgrade their Prime95 mid-run so the start/end versions are different. And I think others have probably moved interim files from Win<->Linux or to a GPU. In other words, all those situations probably already happen.

[QUOTE=S485122;420306]Is the total work done on on almost finished, but abandoned, work units so big ? Of course they stand out because they will remain on the active assignments for a long time (until they expire.) But compared to the total of the throughput I am sure the work done on those "almost finished assignments" is negligible.
[/QUOTE]

Good question... I wasn't sure just how many assignments are abandoned and at what stage/% done.

There have been 720,670 assignments that expired with zero work done. I'd guess almost all of those (95%+) never checked in again after it was assigned.

Another 174,815 assignments expired after checking in "some" progress. Of those, 159,206 were in the LL stage (the others reported some progress in the TF or P-1 stages but never started LL).

The average % done of those that started LL is 12.15%, but here's a breakdown in 10% bands. And yes, oddly, there are 3 results that expired even though the LL % done is 100%. Perhaps those were rounded up from 99.95%+?

[CODE]% Done Count
0 116600
10 11806
20 7164
30 5474
40 4103
50 3551
60 3084
70 2749
80 2565
90 2107
100 3[/CODE]

So if we hypothesize a system where the interim file is uploaded at 10%, there are 42,606 abandoned LL tests that could pick up again at 10%. Is a 10% leg up enough of a time-saver to make it worthwhile? If it were 20% that would mean 30,800 tests that could be picked up a fifth of the way in... is that a good enough time save?

The alternative could be simply uploading the interim file every 10% rather than at a single fixed point, so the time saved would depend entirely on how far they got, to the nearest 10%. But then the monthly bandwidth requirement goes up quite a bit.
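The 42,606 and 30,800 figures fall straight out of the decile table. A quick sketch of the cumulative counts (bucket keys are the lower edge of each 10% band; counts copied from the table above):

```python
# Expired LL assignments by completion decile, as quoted in the post.
expired_by_decile = {
    0: 116600, 10: 11806, 20: 7164, 30: 5474, 40: 4103,
    50: 3551, 60: 3084, 70: 2749, 80: 2565, 90: 2107, 100: 3,
}

def resumable_at(threshold_pct: int) -> int:
    """Expired LL tests that had reached at least `threshold_pct` done."""
    return sum(n for pct, n in expired_by_decile.items() if pct >= threshold_pct)

print(resumable_at(10))  # 42606 - could resume from a 10% interim file
print(resumable_at(20))  # 30800 - could resume from a 20% interim file
```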

Maybe it's an option that only new accounts would use by default since they're the most likely to abandon work before finishing. That would save significant bandwidth and storage from people like Curtis and other heavy users who go through a lot of work and actually finish it.

Might even be the kind of thing that's enabled for the first couple LL tests a machine does and then shuts itself off? Just thinking out loud here...

chalsall 2015-12-27 20:17

[QUOTE=Madpoo;420311]Might even be the kind of thing that's enabled for the first couple LL tests a machine does and then shuts itself off? Just thinking out loud here...[/QUOTE]

But... That would involve a lot of work on the Server(s) and Client side software (with the risk of new bugs), for very little real upside.

An interesting idea, but not worth the risk (IMEO).

NBtarheel_33 2015-12-27 21:58

[QUOTE=Madpoo;420311][CODE]% Done Count
0 116600
10 11806
20 7164
30 5474
40 4103
50 3551
60 3084
70 2749
80 2565
90 2107
100 3[/CODE]

So if we hypothesize a system where the interim file is uploaded at 10%, there are 42,606 abandoned LL tests that could pick up again at 10%. Is a 10% leg up enough of a time-saver to make it worthwhile? If it were 20% that would mean 30,800 tests that could be picked up a fifth of the way in... is that a good enough time save?[/QUOTE]

42,606 LL tests at 10% completion is an amount of work equivalent to ~4,261 completed LL tests.

30,800 LL tests at 20% completion is an amount of work equivalent to ~6,160 completed LL tests.

At 200 GHz-days per LL test, we are looking at 852,200 GHz-days and 1,232,000 GHz-days of salvaged work, respectively. This represents roughly the combined total throughput of GIMPS as a whole over approximately 5-7 days. I suppose the next question would be over what time frame would we be "gaining" these extra days of throughput (and, of course, not all of it would be a gain, as some of the interim files may have errors).
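That arithmetic can be sketched in a couple of lines, using the flat 200 GHz-days per LL test assumed above (an approximation; actual credit varies with exponent size):

```python
# Work already banked in abandoned interim files, at a flat
# 200 GHz-days per completed LL test (an approximation; actual
# credit depends on the exponent being tested).
GHZ_DAYS_PER_LL = 200

def salvaged_ghz_days(tests: int, fraction_done: float) -> float:
    """GHz-days of work recoverable from `tests` files at `fraction_done`."""
    return tests * fraction_done * GHZ_DAYS_PER_LL

print(round(salvaged_ghz_days(42_606, 0.10)))  # 852120; the 852,200 figure
                                               # rounds 4,260.6 equivalent
                                               # tests up to 4,261 first
print(round(salvaged_ghz_days(30_800, 0.20)))  # 1232000
```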

Question: what do percentiles 1-9 look like in terms of the number of tests abandoned?


All times are UTC.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.