Modifications to DC assignment rules
The LL assignment rules change has been implemented and seems to be working OK. Obviously, we need months of experience before we know if the self-adjusting features are working as envisioned.
Now it's time to look at the current DC rules. Feel free to weigh in. I imagine the goals should be similar to the LL assignment goals, with one major difference. We need to be far more reluctant to expire exponents. When we erroneously expire an LL test, it becomes a DC, which we need anyway -- no wasted work. When we expire a DC erroneously, and both the new assignment and the expired assignment send results, then we have wasted work.

But first, I'll start gathering some data....

In the last year, 81761 LL tests have been completed on exponents between 28M and 47M.

In the last 48 hours, we've assigned the following number by category:
cat 1: 139
cat 2: 12
cat 3: 150
cat 4: 3403

DC results reported in the last 30 days, grouped by category:
cat 1: 1649
cat 2: 273
cat 3: 2431
cat 4: 5176 |
[QUOTE=Prime95;431527]We need to be far more reluctant to expire exponents.[/QUOTE]
Perhaps for DC all we need to do is adjust the boundaries a bit, and perhaps introduce a "Cat 0" (although stragglers seem to be far less frequent). Further, to your point about not wishing to recycle unless absolutely necessary, stick with the "Must promise" requirement for Cats 0 through 2. Auto-promotion from Cat 4 might be useful, although I suspect that most users/machines who are currently getting DC Cat 4 will be getting LL Cat 3 once they've proven themselves. |
[QUOTE=chalsall;431557]Perhaps for DC all we need to do is adjust the boundaries a bit, and perhaps introduce a "Cat 0" (although stragglers seem to be far less frequent).
Further, to your point about not wishing to recycle unless absolutely necessary, stick with the "Must promise" requirement for Cats 0 through 2.[/QUOTE] I have a feeling that exponents originally assigned as cat 0-2 are going to finish, as long as the steps to make sure "good" machines are getting them are working out well. There are likely going to be issues with exponents assigned as cat 4 that take WAY longer than they should and wind up getting promoted all the way down to the cat 1 area, even though the person is still working on them. I think that's going to happen sometimes no matter how hard we try to avoid it. But hey, even if so, the assignment rules apply to cat 4 as well, and if they wind up taking too long and get expired, well... too bad I guess. It happens. |
Aside possibly from the small size of category 2, I'm quite happy with how DC rules are working now. (Of course I was quite happy with the LL tests too.)
It seems, compared to the LL anyway (maybe just because the tests don't take as long), that we have been progressing through the cat 1 exponents in a steady fashion without many huge outliers. I am not too worried about lost work from people really making slow progress on cat 4 exponents after a whole year. |
[QUOTE=Siegmund;432034]I am not too worried about lost work from people really making slow progress on cat 4 exponents after a whole year.[/QUOTE]
Excellent. Good to know your opinion. |
[QUOTE=chalsall;431557]Perhaps for DC all we need to do is adjust the boundaries a bit, and perhaps introduce a "Cat 0" (although stragglers seem to be far less frequent).
Further, to your point about not wishing to recycle unless absolutely necessary, stick with the "Must promise" requirement for Cats 0 through 2.[/QUOTE] I also think the current rules are not seriously deficient. The only "must have" items are the addition of cat 0 and banning any computers that have recent expirations or bad/suspect results from the cat 1 area. I also agree that since we have good cat 1 participation now, we should continue limiting cat 1 assignments to those that have signed up for smallest exponents. I'll put together some changes consistent with my thoughts above and that make the rules similar to LL testing, and keep y'all posted. |
As before, my major complaint is that the throughput bounds on Cat 1 are not anywhere near tight enough. The current throughput should be Cat 2, and you could easily triple the throughput bound for Cat 1 and still not even come close to what a five year old Sandy Bridge desktop core can put out.
|
See the new thresholds page: [url]http://mersenne.org/thresholds/[/url]
The cat 3/4 boundary is going to move to 52000 exponents. I'll keep an eye on how many assignments are going to each category and post the results here. As always, let me know if you see anything suspicious -- no guarantees I implemented it all correctly. |
[QUOTE=Dubslow;432095]As before, my major complaint is that the throughput bounds on Cat 1 are not anywhere near tight enough.[/QUOTE]
In the last 24 hours, we've assigned the following number by category:
cat 1: 48
cat 2: 98
cat 3: 46
cat 4: 1494

So it looks like the rules for cat 1 are now a bit tighter. I did not break it down as to why -- it could be the no-expireds, no-bad/suspect, days-of-work <= 5, or GHz-days/day/worker requirement. There is a big jump in cat 2 assignments, as there is no longer a requirement that the user sign up for smallest exponents. At the rates above, the cat 1/2 boundary should be 6000 exponents (120-day supply @50/day) and the cat 2/3 boundary should be increased to 32000 exponents (240-day supply @150/day).

We didn't discuss this earlier: are we happy with the 60/120/240/360 day expirations for cat 1/2/3/4 DCs??

My random thoughts: The generous timeframes do decrease the chance we waste work due to early expirations. A downside is that generous timeframes increase the cat boundaries, which ends up giving users bigger exponents. A DC test takes about 1/4 the CPU time of an LL test, and we need a way for GIMPS' slowest contributors to participate without fear of poaching, but is 360 days too generous? I wonder if I can conjure up a SQL query that would tell me how many DC results come in that take 240 to 360 days. If the answer is very, very few, then maybe a 60/120/180/240 day time limit for cat 1/2/3/4 would be better. |
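For anyone curious what such a query might look like, here is a minimal sketch using Python's built-in sqlite3 module. The table name `dc_results` and its columns are hypothetical (PrimeNet's real schema isn't described in this thread), and the three sample rows are invented purely to exercise the query. The idea is just to count results whose assigned-to-returned time falls between two limits.

```python
import sqlite3

# Hypothetical schema: one row per returned DC result, dates as ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dc_results ("
    "exponent INTEGER, date_assigned TEXT, date_received TEXT)"
)
# Invented sample rows, only to exercise the query.
conn.executemany(
    "INSERT INTO dc_results VALUES (?, ?, ?)",
    [
        (1, "2015-01-01", "2015-03-01"),  # 59 days
        (2, "2015-01-01", "2015-10-01"),  # 273 days
        (3, "2015-01-01", "2016-01-10"),  # 374 days
    ],
)

def count_between(conn, lo_days, hi_days):
    """Count results assigned for more than lo_days and at most hi_days."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM dc_results "
        "WHERE julianday(date_received) - julianday(date_assigned) > ? "
        "AND julianday(date_received) - julianday(date_assigned) <= ?",
        (lo_days, hi_days),
    ).fetchone()
    return n

print(count_between(conn, 240, 360))  # 1
```

SQLite's `julianday()` turns each date into a fractional day number, so the subtraction gives the assignment duration directly in days.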
I think we should err on the side of extra time with DC. If an LL assignment expires or is poached and then finishes, the work is not wasted. With DC, the work is usually wasted. I think we should aim for all cat 2/3/4 assignments to be finished before they become cat 1, for the extra time margin.
|
There may be an unintended consequence of using 'suspect' results as a reason for downgrading a machine in double-checks. A double-check which disagrees with the first LL check is apparently counted as 'suspect', regardless of whether it is later verified as correct by a third check. As I've found a number of faulty first-time checks (later verified by another user) in the past year, I now find I'm considered unreliable and excluded from the higher categories... :cry:
|
Wow. Querying LL results below 50e6 returned this year:
Assigned more than 240 days: 373
Assigned more than 270 days: 294
Assigned more than 300 days: 217
Assigned more than 330 days: 158
Assigned more than 360 days: 133
Assigned more than 450 days: 82
Assigned more than 540 days: 55
Assigned more than 720 days: 20
Assigned more than 2100 days: 1

So moving the cat 4 requirement from 360 to 240 would possibly have expired and reassigned 240 DC results that later completed (in less than 4 months). I guess lowering the 360-day limit is not a great idea. Also, the perseverance award goes to Mr. 2184 days to complete a single LL test!!! I wonder if it matched.... |
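The "more than N days" figures above are cumulative, so the number of results inside any window is just a difference of two counts. That is where the 240 comes from (373 - 133). A small sketch, assuming each count includes every result in the higher buckets:

```python
# "Assigned more than N days" counts from the post above (cumulative).
more_than = {240: 373, 270: 294, 300: 217, 330: 158, 360: 133,
             450: 82, 540: 55, 720: 20, 2100: 1}

def between(lo, hi):
    """Results assigned for more than `lo` days but not more than `hi` days."""
    return more_than[lo] - more_than[hi]

# Results a 240-day cat 4 limit would have expired but a 360-day limit did not.
print(between(240, 360))  # 240

# Per-bucket breakdown between consecutive thresholds.
thresholds = sorted(more_than)
for lo, hi in zip(thresholds, thresholds[1:]):
    print(f"{lo}-{hi} days: {between(lo, hi)}")
```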
[QUOTE=Syntony;432185] A double-check which disagrees with the first LL check is apparently counted as 'suspect'[/QUOTE]
That is not my understanding. A suspect result is one where the error code returned by prime95 indicates there may have been a hardware error. There are rare times when operating near the limit of an FFT where you could get a non-reproducible roundoff error. I don't think it has ever happened to me, so it is definitely rare. |
[QUOTE=Prime95;432187]There are rare times when operating near the limit of an FFT where you could get a non-reproducible roundoff error. I don't think it has ever happened to me, so it is definitely rare.[/QUOTE]
I would have thought that since FFT computation is deterministic, this shouldn't even be possible in theory! What could be the possible source(s) of s/w non-reproducibility? |
OK, my fault, I've just spotted my mistake - I tried freeing DC 35963129 (Cat 1 currently) and got DC 36809363 (Cat 2) back as a replacement. I didn't get Cat 1 as my 'days of work' was still set to 10 (now reduced). I guess I was misled by the e-mails I get whenever my DC doesn't agree with the original LL, which tell me that my result was suspect! Got 35963129 back now... :blush:
|
[QUOTE=axn;432191]I would have thought that since FFT computation is deterministic, this shouldn't even be possible in theory! What could be the possible source(s) of s/w non-reproducibility?[/QUOTE]
FFT computation is deterministic. However, the assembly carry propagation code does not guarantee FFT data will be in balanced notation. It does guarantee it will get really, really close -- enough so that roundoff errors will only increase by an insignificant amount. But, when FFT data is read from a save file, the FFT data is in 100% balanced notation. So, after a roundoff error we could go back to the last save file and get a different roundoff error. |
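As a toy illustration of what "balanced notation" means (fixed base 16 here; prime95 itself uses a weighted transform with variable-size digits, so this is not its carry code), the sketch below converts ordinary digits in [0, b) into balanced digits in [-b/2, b/2) without changing the value. The function names are made up for the example. Because the FFT operates on the digit vector rather than on the integer, two vectors that represent the same number can round slightly differently, which fits George's explanation of why resuming from a fully balanced save file can change whether a roundoff error reappears.

```python
def to_digits(n, base, length):
    """Standard digits of n in [0, base), least significant first."""
    digits = []
    for _ in range(length):
        n, d = divmod(n, base)
        digits.append(d)
    return digits

def balance(digits, base):
    """Rewrite digits into [-base/2, base/2) while preserving the value.

    Assumes an even base and input digits in [0, base).
    Returns the new digits and any carry out of the top digit.
    """
    out, carry = [], 0
    for d in digits:
        d += carry
        if d >= base // 2:
            d -= base   # subtract base here and carry 1 into the next digit
            carry = 1
        else:
            carry = 0
        out.append(d)
    return out, carry

def value(digits, base):
    """Integer represented by a (possibly signed) digit vector."""
    return sum(d * base**i for i, d in enumerate(digits))

plain = to_digits(2471, 16, 4)
balanced, carry = balance(plain, 16)
print(plain)     # [7, 10, 9, 0]
print(balanced)  # [7, -6, -6, 1]
print(value(plain, 16) == value(balanced, 16) + carry * 16**4)  # True
```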
[QUOTE=Prime95;432200]However, the assembly carry propagation code does not guarantee FFT data will be in balanced notation. It does guarantee it will get really, really close -- enough so that roundoff errors will only increase by an insignificant amount. But, when FFT data is read from a save file, the FFT data is in 100% balanced notation. So, after a roundoff error we could go back to the last save file and get a different roundoff error.[/QUOTE]
Ah! Would it make sense to use the balanced data used for savefile (is it balanced when written or only upon reading back?) for normal computation so that we have full reproducibility or will there be a net performance loss? |
Manual assignments
To get Cat 2 manual DC or LL assignments, is there more to it than signing up for the smallest exponents? The server gives me Cat 3 DCs and LL. I have a vague recollection of reading about a possible throughput criterion for manual tests in the new LL rules thread.
That aside, the new rules look very good & it will be interesting to see how the system settles down - and then the self-adjusting part! But perhaps better than that is the process of George et al [B]and the community[/B] settling the new rules. I raise my cup to you all! :coffee: |
[QUOTE=Prime95;432187]That is not my understanding. A suspect result is one where the error code returned by prime95 indicates there may have been a hardware error.
There are rare times when operating near the limit of an FFT where you could get a non-reproducible roundoff error. I don't think it has ever happened to me, so it is definitely rare.[/QUOTE] I get those sometimes... I've had two in the past week, but that could be due to testing exponents in the 39M range which is near an FFT boundary. In each case, I think the initial FFT size test put me just barely into the lower FFT size and then it started having roundoff errors along the way. In one of the two cases, I think a roundoff error occurred near the end of the run so it may not have had a chance to try reproducing it before it just finished. In both cases, my result matched the first check, but it did get marked as a suspect result. But yeah, ditto what you said... a mismatch is not a suspect result...only certain errors reported during the run will mark a result as suspect, not whether or not it mismatched. However, many mismatches *are* due to a suspect result during the first run, and that's because a first-time test coming in as "suspect" means it's re-assigned as another first-time check right away. Which is good, because roughly 50% of all tests marked "suspect" do end up being bad. It's not uncommon at all to look at mismatches and see that the first one is suspect but the second one is simply listed as unverified. |
Interesting how many DCs were completed very slowly.
And I have to reiterate my objection from the other thread about a very short queue length as a criterion for small exponents. (Briefly, I want to be able to leave for a week and be confident my machine won't have gone idle if it has a network hiccup, and one Friday to the subsequent Monday is conveniently 10 days, not 5.) Guess I better be happy I got one more 35M assignment yesterday before the new rules took effect... |
[QUOTE=Siegmund;432249]And I have to reiterate my objection from the other thread about a very short queue length as a criterion for small exponents. (Briefly, I want to be able to leave for a week and be confident my machine won't have gone idle if it has a network hiccup, and one Friday to the subsequent Monday is conveniently 10 days, not 5.)[/QUOTE]
Personally I find this to be a bit of a straw-man argument. Some of my machines which work for GIMPS go years without my ever being in front of them. Heck, three are over five years old, and I've never once even seen them (they're rented co-located servers, somewhere...). [QUOTE=Siegmund;432249]Guess I better be happy I got one more 35M assignment yesterday before the new rules took effect...[/QUOTE] Not to be widely encouraged, but if you're *really* concerned about this before a trip you can always "cheat" the system. Bring your "Days to reserve" down, reserve a few assignments (cutting and pasting your worktodo.txt file(s)) and then amalgamate the assignments. But, honestly, when was the last time you had a network issue which would have resulted in your systems being idle? |
Yes, I am aware that I can manipulate the system. (The fact still remains that I don't think the queue length requirement serves any purpose, so long as there is a requirement about time from assignment to completion.)
[quote] But, honestly, when was the last time you had a network issue which would have resulted in your systems being idle? [/quote] Since you asked... I had just such an issue most of the time from October 2014 to December 2015. I had two systems sitting on my desktop at work (one of them with the screen off and ignored most of the time). That latter system, for some bizarre reason, could report results, but threw an odd proxy error when it tried to fetch new assignments. If, however, I logged in to mersenne.org in a web browser on that machine, it became able to fetch them again (for a short time). I did not have any idle time for Prime95 as a result, but I did have several days, twice, when MISFIT was idle for lack of factoring assignments -- I became aware of the problem when I checked my recent results from another computer and saw that no factoring had happened. (And then it happened again because I thought it was just a transient issue.) MISFIT was only retrieving about 1 week's worth of factoring work at a time, and each task finished quickly. As it happened, Prime95 was fetching 10 days ahead and the DCs took 5 days each on that machine (LLs about 3 weeks), so I noticed the lack of factoring first. I have actually used that machine regularly this spring. But I am very aware of the possibility of it happening again the next time I am away from the office for an extended period. * * * I agree that most network hiccups are of shorter duration, unless there is a persistent configuration issue. |
[QUOTE=Siegmund;432326]Yes, I am aware that I can manipulate the system. (The fact still remains that I don't think the queue length requirement serves any purpose, so long as there is a requirement about time from assignment to completion.)[/QUOTE]
Let's do a "for instance" where someone has a queue length (days of work to get) of 60 days... they just like to have a 2 month buffer of work for whatever reason. Since the server is making assignment decisions based on how many GHz-days / throughput the system is capable of, it could see this machine asking for an assignment, see that it's a nice and fast machine, and go ahead and give them a cat 0 exponent. I'm sure you see the problem already... with a 60 day buffer, this new assignment is not going to even start for 60 days and then it may take another couple weeks to complete. That's the exact opposite of what we're trying to accomplish with cat 0, which is making sure these "lowest of the low" exponents are finished quickly. In other words, it's not just how fast the machine is, it's how soon it's going to start that assignment. 10 days seems like a good value, and it's been that way for cat 1 for a long time... I don't know if you had another # of days in mind, but I don't think asking for someone to start a cat 0 (or 1) assignment in 10 days from being assigned is too big an ask. :smile: |
[quote]10 days seems like a good value, and it's been that way for cat 1 for a long time... I don't know if you had another # of days in mind, but I don't think asking for someone to start a cat 0 (or 1) assignment in 10 days from being assigned is too big an ask. :smile:
[/quote] I don't think 10 days is too much to ask either - I DO think 3 or 5 days, perhaps, is. Alternatively, rather than imposing both a queue length restriction and a ghz/day restriction, one could simply require that (queue length + estimated completion time) be below 10/30/90/180 days or similar. I will shut up about it now :) |
[QUOTE=Siegmund;432513]
Alternatively, rather than imposing both a queue length restriction and a ghz/day restriction, one could simply require that (queue length + estimated completion time) be below 10/30/90/180 days or similar.[/QUOTE] We'll try that in cat 1 DC and see how it goes. |
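A minimal sketch of the single combined test Siegmund describes, in which a machine qualifies for a category if its queued work plus the estimated run time of the new assignment fits under that category's limit. The mapping of the 10/30/90/180-day limits onto categories 0 through 3 (with cat 4 as the fallback), the function names, and the use of one GHz-day cost for every category are all assumptions for illustration, not how PrimeNet implements it.

```python
# Hypothetical per-category limits, in days, on (queue length + run time).
LIMITS = {0: 10, 1: 30, 2: 90, 3: 180}

def days_until_done(queue_days, assignment_ghz_days, ghz_days_per_day):
    """Days of queued work ahead of the assignment plus its own run time."""
    return queue_days + assignment_ghz_days / ghz_days_per_day

def best_category(queue_days, assignment_ghz_days, ghz_days_per_day):
    """Lowest (most demanding) category whose limit is met; cat 4 otherwise."""
    eta = days_until_done(queue_days, assignment_ghz_days, ghz_days_per_day)
    for cat in sorted(LIMITS):
        if eta <= LIMITS[cat]:
            return cat
    return 4

# Same fast machine, different queue lengths (Madpoo's 60-day buffer case).
print(best_category(5, 45, 5))   # 1  (5 + 9 = 14 days)
print(best_category(60, 45, 5))  # 2  (60 + 9 = 69 days)
```

One check covers both concerns raised above: a long buffer pushes a fast machine out of cat 0/1, while a slightly longer buffer on a very fast machine is still acceptable.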
Received Cat 4 DC
My machine that has been doing Cat 1 LL with the occasional Cat 1 DC (it is set to What Makes Sense) has just received a Cat 4 DC.
When I look at the Computer Details there is an old assignment listed because it was poached and the assignment auto expired. The assignment is not in my Worktodo file or in the assignment list on the site. The exponent in question is [URL="http://www.mersenne.org/report_exponent/?exp_lo=34652603&full=1"]34652603[/URL]; attached is a portion from the Computer Details page. |
[QUOTE=gjmccrac;432558]When I look at the Computer Details there is an old assignment listed because it was poached and the assignment auto expired.[/QUOTE]
Just for the record, I didn't poach you. This was assigned to one of my machines 2015-09-23 14:38:48 and it completed it 2015-09-26 16:30:36. The AID given by Primenet was E4E549B56F9239195DD3DE63E1142361, if that helps George and/or Aaron drill-down on what happened. |
[QUOTE=gjmccrac;432558]
When I look at the Computer Details there is an old assignment listed because it was poached and the assignment auto expired.[/QUOTE] This bug should now be fixed. If you unreserve the cat 4 DC you should get a cat 1 back. |
[QUOTE=chalsall;432566]Just for the record, I didn't poach you. This was assigned to one of my machines 2015-09-23 14:38:48 and it completed it 2015-09-26 16:30:36. The AID given by Primenet was E4E549B56F9239195DD3DE63E1142361, if that helps George and/or Aaron drill-down on what happened.[/QUOTE]
True, it wasn't poached... it was just one of those things where it took a bit too long (it was assigned to "For Research" on 2015-07-24 and expired after 60 days because it was a cat1 DC... the cat 1 DC upper limit for that day was 34818672). So it expired, was reassigned pretty fast (by then it was probably one of the few 34M remaining), and "For Research" turned it in 3 days late. It happens. |
[QUOTE=Madpoo;432590]So it expired, was reassigned pretty fast (by then it was probably one of the few 34M remaining), and "For Research" turned it in 3 days late. It happens.[/QUOTE]
Oh! I didn't drill down far enough in my DB... That would have been when one of my clusters blew a UPS, and it took a while to get the replacement in. Sorry about that! |
I am not as worried about what happened last Sept with the exponent 34652603.
But the fact is, it still shows up on my CPU Details screen as I had shown in the attachment. The cat 4 DC has already started and will be done in a few days. |
[QUOTE=gjmccrac;432612]I am not as worried about what happened last Sept with the exponent 34652603.
But the fact is, it still shows up on my CPU Details screen as I had shown in the attachment. The cat 4 DC has already started and will be done in a few days.[/QUOTE] Oh, the CPU Details page. That one probably shows expired assignments (not sure if it should or not, but it does). I don't pay much attention to that page, personally, since I check in/out manually... it's probably worth my while at some point to see if that page could be made a little cleaner. Check this for *active* assignments for your account if that's what you're interested in: [URL="http://www.mersenne.org/workload/"]http://www.mersenne.org/workload/[/URL] |
[QUOTE=gjmccrac;432612]
But the fact is, it still shows up on my CPU Details screen as I had shown in the attachment.[/QUOTE] If you happen to have the pristine AID from last September's worktodo.ini (by means of a backup, for example) [B]&&[/B] you're really_really annoyed by it appearing on that CPU page, there's a workaround to make it disappear from there. To be clear, the AID is that lengthy string appearing in your worktodo.ini just to the right of the = sign. Alas, this same workaround is absolutely worthless if both of the above conditions are not met. (Naturally, there's no need to post the AID here, you only have to have it handy.) |
[QUOTE=ric;432764]If you happen to have the pristine AID from last September's worktodo.ini (by means of a backup, for example) [B]&&[/B] you're really_really annoyed by it appearing on that CPU page, there's a workaround to make it disappear from there. To be clear, the AID is that lengthy string appearing in your worktodo.ini just to the right of the = sign.
Alas, this same workaround is absolutely worthless if both of the above conditions are not met. (naturally, there's no need to post the AID here, you only have to have it handy)[/QUOTE] So, what's the workaround? I have the same problem with one of my machines. |
[QUOTE=endless mike;432832]So, what's the workaround? I have the same problem with one of my machines.[/QUOTE]
Apparently, the DB table feeding the CPU page keeps track of all assigned candidates for a specific user/machine and does not clear them if one is completed with a different AID (including no AID: e.g. poaching or factoring/P-1 success without assignment). So, the workaround is to make it believe that the specific assignment is still there, with its original AID, and then unreserve it, so as to make it disappear. In practice:[LIST]
[*]Test/Stop your running instance of p95;
[*]open worktodo.ini, or its equivalent (any text editor will do);
[*]add the original reservation, anywhere (usually at the end). Save and close;
[*]Test/Status, to make p95 re-read the worktodo;
[*]Advanced/Unreserve exponent, to send it to the big bit bucket (aka /dev/null).
[/LIST]This workaround has proved effective a number of times on my machines, even across them (i.e. unreserving from one machine an exponent assigned to another one) [B]provided that the assignment has its original AID[/B]. I sincerely hope that appropriate per-user control is in place on the server, to avoid unreserving someone else's exponents. hth PS: this has nothing to do with the thread topic: mods, please move it somewhere else if deemed appropriate. thx |
Perhaps a little off-topic, but I had a dream a few nights ago that someone found a new Mersenne prime in the 44M range during a double-check. Alas, a triple-check showed that it was wrong. :-(
|
I got a new box at work two weeks ago and had an opportunity to test out what kind of assignments it got. I set it to double-checks, with days of work = 3.
The first assignment was of course Category 4. So was the second, since it was assigned before the first completed. The third assignment was also Category 4. One DC in 3 days is "enough LL and DC GHz-days over the last 120 days to indicate the assignment will be completed in 90 days", but I assume that there is still a "computer has been proven reliable" restriction (requiring 2 error-free results) on all categories except 4. The fourth assignment was in Category 3. The fifth and sixth assignments, after 3 results had been returned, were Category 2. The seventh assignment, after 5 results had been returned, is Category 1. This seems to be a nice confirmation that the system works as advertised. (And after the 7th double-check is done, it will start requesting first-time checks - I'll know by the weekend whether it receives Category 1 assignments there too, like I expect it to.) |
[QUOTE=Siegmund;434885]...
This seems to be a nice confirmation that the system works as advertised. (And after the 7th double-check is done, it will start requesting first-time checks - I'll know by the weekend whether it receives Category 1 assignments there too, like I expect it to.)[/QUOTE] Did it start to receive Category 1 first time LL assignments? |
[quote]Did it start to receive Category 1 first time LL assignments?[/quote]
It didn't: apparently 7 double-checks in a couple weeks don't meet the criteria for "enough GHz-days to indicate an assignment will be finished in 30 days." I got 1 cat 3 LL exponent, and then 1 cat 2 exponent. Guessing that my next after that may be cat 1 finally. That's not necessarily a bug - just an interpretation whether "X GHz per 30 days" requires X GHz to be submitted, or X/3 GHz in the last 10 days. |
Unexpected assignment...
Just noticed what seems to be a strange LLD assignment to one of the machines in my team. It's just nearing completion of M37268507 double-check, which was a cat 2 assignment when it was allocated (now in the cat 1 range) and will likely be completed within 60 days. Today (14th July) it was allocated M45825977 which is well into cat 4. It hasn't processed the current assignment quite as quickly as the previous one, but slipping 2 categories is unexpected - no suspect results or other black marks afaik. Are all the cat 2 and cat 3 assignments already allocated, or am I missing something?
Tony |
[QUOTE=Syntony;438135]Just noticed what seems to be a strange LLD assignment to one of the machines in my team. It's just nearing completion of M37268507 double-check, which was a cat 2 assignment when it was allocated (now in the cat 1 range) and will likely be completed within 60 days. Today (14th July) it was allocated M45825977 which is well into cat 4. It hasn't processed the current assignment quite as quickly as the previous one, but slipping 2 categories is unexpected - no suspect results or other black marks afaik. Are all the cat 2 and cat 3 assignments already allocated, or am I missing something?[/QUOTE]
You won't be getting any cat 2 work because that system fails these cat 2 requirements: [QUOTE]Computer must have enough LL and DC GHz-days over the last 120 days to indicate the assignment will be completed in 60 days. ... Computer must have returned at least 3 results in the last 120 days.[/QUOTE] In the past 120 days, that machine has turned in 2 results.

You also wouldn't get cat 3 work because: [QUOTE]Computer must have enough LL and DC GHz-days over the last 120 days to indicate the assignment will be completed in 90 days.[/QUOTE] You've done 89.416685 GHz-days in the past 120 days, but only just barely that much in the past 90 days. A current cat 3 might take 60-70 GHz-days, so given the track record of the past 120/90 days, it is questionable if a cat 3 could be done in 90 days (as far as Primenet knows anyway... you might have a different perspective based on info Primenet doesn't have). :smile:

Here's that machine's history when looking back 6 months:
[CODE]exponent  dt_received               GHz_days
35211697  2016-02-10 18:53:30.940   44.01462125
35412301  2016-03-14 23:37:33.277   44.26537625
35629049  2016-04-17 22:07:17.643   44.53631125
35904299  2016-05-30 16:55:15.003   44.88037375[/CODE]
I don't know for sure, but since it's looking at the past 120 days to figure out how much you could do in 90 days, the math might look like:
44.53631125 + 44.88037375 = 89.416685 GHz-days every 120 days
89.416685 * 90 / 120 = 67.062514 GHz-days every 90 days

There may not have been any cat 3 available that takes less than 67 GHz-days. If you had requested a new exponent just 3 days earlier, that result from March 14th would have been in the past 120 days and included in the math, which would have helped out significantly and resulted in a 90-day rolling average closer to 100 GHz-days. Whatever the case, it looks like that machine had been turning in one result a month, more or less, but then slowed down, I guess as the exponents get larger. |
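Madpoo's arithmetic can be reproduced with a short sketch. It assumes, as he does, that PrimeNet sums the GHz-days returned in the last 120 days and pro-rates that total to the 90-day cat 3 target; that is his guess, not confirmed server logic. The function names are hypothetical, the request time is taken as midnight on 14 July 2016 (the date in Tony's post), and the fractional seconds in the timestamps are dropped.

```python
from datetime import datetime, timedelta

# (exponent, dt_received, GHz-days) from the table above, seconds truncated.
results = [
    (35211697, datetime(2016, 2, 10, 18, 53, 30), 44.01462125),
    (35412301, datetime(2016, 3, 14, 23, 37, 33), 44.26537625),
    (35629049, datetime(2016, 4, 17, 22, 7, 17), 44.53631125),
    (35904299, datetime(2016, 5, 30, 16, 55, 15), 44.88037375),
]

def recent(results, now, window_days=120):
    """Results received within the last `window_days` days."""
    cutoff = now - timedelta(days=window_days)
    return [r for r in results if r[1] >= cutoff]

def projected_ghz_days(results, now, window_days=120, target_days=90):
    """Scale the GHz-days done in the window to a `target_days` period."""
    total = sum(ghz for _, _, ghz in recent(results, now, window_days))
    return total * target_days / window_days

july14 = datetime(2016, 7, 14)
print(len(recent(results, july14)))                                  # 2
print(round(projected_ghz_days(results, july14), 6))                 # 67.062514
print(round(projected_ghz_days(results, datetime(2016, 7, 11)), 2))  # 100.26
```

The third line is the "3 days earlier" case: the 120-day window then starts on 13 March, picks up the 14 March result, and the projection jumps to about 100 GHz-days.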
Thanks for the analysis Madpoo, and my apologies for a rather tardy response.
That particular system is a SOHO office system belonging to a neighbour, and the slowdown was because the owner had been out of the office more than usual in the last month. They're back in the office now... :smile: I don't have regular access to the system, but it just happened that I was doing Win 10 preparation work for the owner as the most recent LLD, M37268507, completed. Out of curiosity I freed the M45825977 cat 4 assignment and was rewarded with M38196283, which was cat 2 at the time, and is now into cat 1. Given that this system has been turning in results in the 35-55 day range long term, cat 2 seems perfectly appropriate. I had wondered if the assignment algorithm might prove to be a bit 'choppy' for cat 2 & 3, given that the assessment period (120 days) is rather short compared with the target completion times (60 & 90 days), and so it seems to be. While the algorithm obviously does work to exclude the slowest producers from the 'wave front', it maybe isn't making the best use of the 'middle of the road' reliable producers currently. I can understand that it might not be a good idea to extend the assessment period too far, in order to pick up systems which have recently slowed down, but perhaps some adjustment for 'work in progress' would help even things out? I don't imagine that many systems would get cat 3 assignments as things stand! Just a thought :smile: Tony |
Apologies, but any idea why my machine chanakya is suddenly being assigned Cat 3 DC work? It has returned over 20 DC results in the last 120 days, or about 900 GHz-days. I don't think anything expired recently. As of Aug 11th I was still getting Cat 1 work, on Aug 12 I got Cat 2 and on Aug 16 I got Cat 3.
DaysOfWork=9 and the machine takes about 15 days to finish a DC. PS: I was on holiday and didn't return anything from July 16 to Aug 10. But the 120 day throughput meets the requirements. |
[QUOTE=garo;440344]Apologies, but any idea why my machine chanakya is suddenly being assigned Cat 3 DC work? It has returned over 20 DC results in the last 120 days, or about 900 GHz-days. I don't think anything expired recently. As of Aug 11th I was still getting Cat 1 work, on Aug 12 I got Cat 2 and on Aug 16 I got Cat 3.
DaysOfWork=9 and the machine takes about 15 days to finish a DC. PS: I was on holiday and didn't return anything from July 16 to Aug 10. But the 120 day throughput meets the requirements.[/QUOTE] That CPU had exponent [URL="http://www.mersenne.org/M36651799"]M36651799[/URL] expire on August 13. It looks like it might still be (slowly?) working and turning in results even though the exponent has already been reassigned and completed by someone else. That exponent was cat 1 at the time it was assigned (just barely...I think it slipped into cat 0 on July 24, 10 days later). As a cat 1 assignment, you had 60 days to finish, but only 30 days to start. I'm not 100% certain but it looks like the first time it reported any results was on August 15th...it had already expired at that point. |
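The cat 1 timing Madpoo describes (30 days to start, 60 days to finish) can be sketched as below, to show why a machine that started in time but hadn't yet reported progress looked abandoned to the server. The function name is hypothetical, "started" is taken to mean "the server has received a progress report", and the server's exact day-boundary handling is unknown. The dates are garo's: assigned July 14, work begun Aug 10, first report seen Aug 15.

```python
from datetime import date, timedelta
from typing import Optional

def is_expired(assigned: date, first_report: Optional[date], today: date,
               start_days: int = 30, finish_days: int = 60) -> bool:
    """True if an assignment has missed its start or finish deadline.

    first_report is the first progress report the server has seen
    (None if it has seen none yet).
    """
    start_deadline = assigned + timedelta(days=start_days)
    finish_deadline = assigned + timedelta(days=finish_days)
    if today > finish_deadline:
        return True
    not_started = first_report is None or first_report > start_deadline
    return today > start_deadline and not_started

assigned = date(2016, 7, 14)
# No report by mid-August: from the server's side, the work never started.
print(is_expired(assigned, None, date(2016, 8, 14)))               # True
# A report on Aug 10, the day work actually began, would have kept it alive.
print(is_expired(assigned, date(2016, 8, 10), date(2016, 8, 14)))  # False
```

This is why a short "days between reports" setting matters for cat 0/1 work: the server can only judge the start deadline by what the client has actually sent.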
Thanks for the reply Madpoo. It was assigned July 14 but I went on holiday soon after, so the computer was off for 15 days. I did start working on it on Aug 10, so within the 30-day period, but since the days to report was set to 7 on the machine, it never reported progress. I am still working on it and it's 84% complete, so I suppose I will let it finish. Will also set the days between reports to 1. Do I need to wait 120 days before the box starts getting Cat1 and 2 again?
|
[QUOTE=garo;440399]Do I need to wait 120 days before the box starts getting Cat1 and 2 again?[/QUOTE]
I'm not sure. When your computer returns the result that ought to remove the expired assignment from the assignments table. If so, the new assignment code shouldn't be able to tell there ever was an expired assignment. |
[QUOTE=garo;440399] did start working on it on Aug 10 - so within the 30 day period but since the days to report was set to 7 on the machine, it never reported progress. [/QUOTE]
That sounds like a bug. I think the easiest "solution" would be to add a note to the page explaining the categories, saying that your reporting period must be short. |
[QUOTE=Dubslow;440416]That sounds like a bug.
I think the easiest "solution" would be to add a note to the page explaining the categories that your reporting period must be low.[/QUOTE] Or have the client be aware of when it next needs to report to avoid expiry. I can't see any reason why the client shouldn't communicate immediately when something major happens, such as an exponent being started or finished. The only issue I can see with more communication is the extra load on the server. |
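One way the client-side idea above could look, as a hypothetical policy (the function `next_checkin`, its parameters and the one-day `margin_days` are assumptions, not existing Prime95 options): report at once when a major event such as starting an assignment is pending; otherwise keep the regular interval, but never let it run past the start deadline while the server has not yet seen any progress.

```python
from datetime import datetime, timedelta
from typing import Optional


def next_checkin(now: datetime,
                 last_checkin: datetime,
                 interval_days: float,
                 event_pending: bool,
                 start_deadline: Optional[datetime] = None,
                 margin_days: float = 1.0) -> datetime:
    """When to contact the server next (hypothetical client policy).

    Pass start_deadline=None once the server has seen progress, so the
    deadline no longer constrains the schedule.
    """
    # Something major happened (e.g. work on an assignment began): report now.
    if event_pending:
        return now
    regular = last_checkin + timedelta(days=interval_days)
    if start_deadline is None:
        return max(now, regular)
    latest_safe = start_deadline - timedelta(days=margin_days)
    # Take the earlier of the two, but never schedule in the past.
    return max(now, min(regular, latest_safe))
```

With a 7-day interval and a start deadline of August 13, a client that last checked in on August 8 would contact the server on August 12 rather than August 15. The cost is the extra server contacts mentioned in the post.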
[QUOTE=garo;440344]...
and on Aug 16 I got Cat 3. ...[/QUOTE] If you free the Cat 3 assignment as soon as the expired exponent completes, you might expect to get a Cat 2 (or even Cat 1) replacement (depending on how the 30-day break affects the 120-day CPU expenditure). |
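A sketch of a trailing-window throughput figure, to show why a break matters: a result only counts while it stays inside the window, so a gap in returns lowers the total until older work ages out. The 120-day window comes from the thread; summing GHz-days per result is an assumption about how the server measures CPU expenditure, and the history below is made up.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple


def trailing_throughput(results: Iterable[Tuple[date, float]],
                        as_of: date,
                        window_days: int = 120) -> float:
    """GHz-days credited in the window_days ending on as_of (inclusive)."""
    start = as_of - timedelta(days=window_days)
    return sum((ghz for done, ghz in results if start < done <= as_of), 0.0)


# Made-up history: one 45 GHz-day result on May 1 and one on August 15.
history = [(date(2016, 5, 1), 45.0), (date(2016, 8, 15), 45.0)]
print(trailing_throughput(history, date(2016, 8, 16)))  # 90.0
print(trailing_throughput(history, date(2016, 9, 1)))   # 45.0
```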
Thanks all for your replies. The exponent finishes tomorrow so I will report back after it is done.
|
[QUOTE=garo;440464]Thanks all for your replies. The exponent finishes tomorrow so I will report back after it is done.[/QUOTE]
George, as usual, has the best answer, which is to do as you're doing... let the work finish. The moment it checks in, the "expired" assignment disappears and you'll be back in business. If that doesn't work out for some reason, let us know but I'm pretty sure that's going to be the ticket. |
So the exponent finished and I unreserved a queued Cat 3 exponent and got back a Cat 2 exponent. I lowered the DaysOfWork setting to 5 days and tried again, and this time got Cat 1. Looks good! Thanks again everybody.
|
Category boundaries didn't change today...
It looks like something on the server has stalled, as the DC category boundaries 'Exponents below [I]exponent[/I]' didn't change at the usual time yesterday... :confused2:
|
It did change:
[url]http://www.mersenne.org/thresholds/?dt=2016-09-26[/url] [url]http://www.mersenne.org/thresholds/?dt=2016-09-27[/url]
       2016-09-26  2016-09-27
Cat0   38021438    38021858
Cat1   38940642    38936534
Cat2   40947438    40932458
Cat3   42658454    42646634 |
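Subtracting the two snapshots above is a quick check that all four boundaries did move from one day to the next:

```python
# Category boundaries from the two threshold snapshots quoted above.
before = {"Cat0": 38021438, "Cat1": 38940642, "Cat2": 40947438, "Cat3": 42658454}
after = {"Cat0": 38021858, "Cat1": 38936534, "Cat2": 40932458, "Cat3": 42646634}

deltas = {cat: after[cat] - before[cat] for cat in before}
print(deltas)  # {'Cat0': 420, 'Cat1': -4108, 'Cat2': -14980, 'Cat3': -11820}
```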
[QUOTE=ATH;443586]It did change:
[URL]http://www.mersenne.org/thresholds/?dt=2016-09-26[/URL] [URL]http://www.mersenne.org/thresholds/?dt=2016-09-27[/URL]
       2016-09-26  2016-09-27
Cat0   38021438    38021858
Cat1   38940642    38936534
Cat2   40947438    40932458
Cat3   42658454    42646634[/QUOTE] :redface: Doh! Fooled by the unusually high number of free Cat 0 assignments! |
Cat 4 not using the lowest available exponents
I notice the Cat 4 assignments do not fetch the lowest available exponents. According to the [url=http://www.mersenne.org/thresholds/]thresholds[/url] page, cat 4 starts at 42702950 as of today. But newer assignments skip over many available exponents and primenet is giving out exponents in the 50M range. It appears as though older expired assignments above the threshold are left in limbo. Is this an oversight or by design?
|
[quote]I notice the Cat 4 assignments do not fetch the lowest available exponents[/quote]
I've been wondering about that too. It's going to be mighty interesting if the runaway keeps going a few more weeks. I'm assuming some hiccup is keeping it from starting back at the bottom of the heap daily, but I only have questions, not answers. |
The problem is with the SQL clause that prohibits a user from getting an exponent that was assigned to him previously and had expired within the last 180 days. This was added to help avoid double-checks by the same user (an expired assignment that reports in late plus a new assignment that reports in).
The problem is that Anonymous has SO many exponents and SO many expireds, there aren't a lot of non-churned exponents to assign. I changed the SQL clause for anonymous users to exclude anonymous exponents expired in the last 60 days. Keep an eye on things to see how this works out. |
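A sketch of the exclusion George describes, combined with "take the lowest available exponent above the threshold". The 180-day and 60-day windows come from the post; the `ANON` id, the data layout, the function names and the exponents in the example are all hypothetical, and this is not the actual SQL clause. With a long window, a requester who has let many exponents expire skips straight past them to much higher exponents, which matches the 50M jump reported above.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple

ANON = "ANONYMOUS"  # hypothetical id for the shared Anonymous account

# (exponent, {user: date that user's earlier assignment of it expired})
Candidate = Tuple[int, Dict[str, date]]


def recycle_window_days(user: str) -> int:
    # 180 days in general; 60 days for Anonymous after the change above.
    return 60 if user == ANON else 180


def pick_lowest(candidates: List[Candidate], user: str, today: date,
                threshold: int,
                window_days: Optional[int] = None) -> Optional[int]:
    """Lowest exponent >= threshold that `user` has not let expire
    within the recycle window (a sketch of the rule, not the real SQL)."""
    if window_days is None:
        window_days = recycle_window_days(user)
    cutoff = today - timedelta(days=window_days)
    eligible = [
        exponent for exponent, expired in candidates
        if exponent >= threshold
        and not (user in expired and expired[user] > cutoff)
    ]
    return min(eligible, default=None)


# Made-up pool of available exponents.
pool = [
    (45000001, {ANON: date(2016, 10, 1)}),  # anonymous expiry 2 weeks ago
    (45000013, {ANON: date(2016, 6, 1)}),   # anonymous expiry months ago
    (50000017, {}),                         # never expired
]
today = date(2016, 10, 15)
print(pick_lowest(pool, ANON, today, 42702950))                   # 45000013
print(pick_lowest(pool, ANON, today, 42702950, window_days=180))  # 50000017
```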
[QUOTE=Prime95;445339]The problem is with the SQL clause that prohibits a user from getting an exponent that was assigned to him previously and had expired within the last 180 days. This was added to help avoid double-checks by the same user (an expired assignment that reports in late plus a new assignment that reports in)
The problem is that Anonymous has SO many exponents and SO many expireds, there aren't a lot of non-churned exponents to assign. I changed the SQL clause for anonymous users to exclude anonymous exponents expired in the last 60 days. Keep an eye on things to see how this works out.[/QUOTE] That was probably the clause I'd added back when... The trouble was that anonymous users were getting assignments and letting them expire, but they'd still check them in eventually, much later. Meanwhile the exponent was assigned to another anonymous user who did finish it, thus creating a "self-verified" result. I realize I could be less stringent about my own definition of "self verified" when it comes to anonymous results...my OCD is having a hard time with that though. LOL |
[QUOTE=Prime95;445339]The problem is with the SQL clause that prohibits a user from getting an exponent that was assigned to him previously and had expired within the last 180 days. This was added to help avoid double-checks by the same user (an expired assignment that reports in late plus a new assignment that reports in)
The problem is that Anonymous has SO many exponents and SO many expireds, there aren't a lot of non-churned exponents to assign. I changed the SQL clause for anonymous users to exclude anonymous exponents expired in the last 60 days. Keep an eye on things to see how this works out.[/QUOTE] Wow, so much worry! I have uneasy feelings associated with prior, now-defunct regimes where extreme attention to compliance was soon followed by their demise. |
@madpoo: seems to me that a restriction that an Anonymous LL test must have a non-anonymous double-check is all that is really necessary (and if the DC doesn't match, a non-anonymous triple-check, but you already harvest a lot of those)...why do we care whether Anonymous gets two shots at a double-check?
|
[QUOTE=Siegmund;445564]@madpoo: seems to me that a restriction that an Anonymous LL test must have a non-anonymous double-check is all that is really necessary (and if the DC doesn't match, a non-anonymous triple-check, but you already harvest a lot of those)...why do we care whether or not Anonymous gets two shots at a doublecheck or not?[/QUOTE]
It was more an issue with first-time tests since anonymous workers would quite frequently keep working on assignments even after they'd expired. If the new assignment was another anonymous user, then both would often finish resulting in a first/second check by "anonymous". For double-check assignments, that wasn't as big a deal as long as the first check wasn't an anon user, but even then it wasn't too odd to have cases where the first check was wrong and then the 2nd/3rd checks, both by anonymous users, became the matching pair. Not as common as that first scenario, but still, it happened. George's solution to limit how exponents are excluded based on past assignment expirations will hopefully be sufficient, but maybe in terms of double-checks in particular it could be loosened up even more. |
[QUOTE=Madpoo;445761]George's solution to limit how exponents are excluded based on past assignment expirations will hopefully be sufficient, but maybe in terms of double-checks in particular it could be loosened up even more.[/QUOTE]
I would argue that we need to err on the side of caution, to ensure the database is trusted. At least one of the tests needs to be done by a known, trusted entity. Two anonymous tests should not be trusted. |
Yes, requiring one non-anonymous test is very sensible. I think we were only advocating removing the restriction in the case of DC assignments that already had a non-anonymous first test.
I don't actually know how commonly anonymous users request first-time checks, but I'd be fine with not allowing two consecutive anonymous assignments there. |
[QUOTE=chalsall;445762]I would argue that we need to err on the side of caution, to ensure the database is trusted.
At least one of the tests needs to be done by a known, trusted entity. Two anonymous tests should not be trusted.[/QUOTE] I always thought that the keepers of the database have additional non-public information at their disposal, like the ComputerGUID. Doesn't any of that help to distinguish among the various Anonymii? And the shift count helps guard against accidental duplications of the same result. Sometimes MadPoo publishes lists of strategic double checks, where some of the exponents have first LL tests that were done by Anonymous. So presumably it's possible to track the stats of how many good results vs. how many bad results were returned by any machine. Regardless of whether a public username was declared, there's an internal ID. Maybe the internal ID could be spoofed, but that's easy to do with the username too. |
[QUOTE=GP2;445776] Doesn't any of that help to distinguish among the various Anonymii? And the shift count helps guard against accidental duplications of the same result.[/QUOTE]
The database accepts two results on the same exponent from any user including anonymous as long as the shift count is different. For me, that's good enough to declare an exponent double-checked. Madpoo has a higher standard - he wants a different user id as well. Madpoo went back and triple-checked all exponents where the matching LL tests were done by the same user. We changed the reservation system to reduce the chance of such occurrences in the future. |
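The two standards above can be written as predicates. This is a sketch that models only the fields mentioned in the thread; `LLResult`, `ANON` and the function names are hypothetical. Because every anonymous result carries the same id, the stricter rule also enforces the point that at least one of a matching pair must come from a named account.

```python
from dataclasses import dataclass

ANON = "ANONYMOUS"  # hypothetical id shared by all anonymous results


@dataclass(frozen=True)
class LLResult:
    user: str     # account name, or ANON
    shift: int    # shift count the test was run with
    residue: int  # final 64-bit residue


def verified_by_shift(a: LLResult, b: LLResult) -> bool:
    """Matching residues from runs with different shift counts."""
    return a.residue == b.residue and a.shift != b.shift


def verified_strict(a: LLResult, b: LLResult) -> bool:
    """Also require different users. All anonymous results share one id,
    so two anonymous tests never verify each other."""
    return verified_by_shift(a, b) and a.user != b.user
```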
[QUOTE=Siegmund;445775]Yes, requiring one non-anonymous test is very sensible. I think we were only advocating removing the restriction in the case of DC assignments that already had a non-anonymous first test.
I don't actually know how commonly anonymous users request first-time checks, but I'd be fine with not allowing two consecutive anonymous assignments there.[/QUOTE] All I know is, since we had that big undertaking to resolve all self-verified exponents, there are 123 new ones that came up since I last tidied them up, about 5-6 months back (I've been doing triple-checking since then, figured I'd get back to those). Of the 123, 49 of them are from anonymous users, and they're mostly in the 40M-50M range of double-checks. I think those were from something that changed in the new assignment rules... I don't know if the exclusion on the same user getting work that did the first check got modified somehow, but whatever... it's mostly okay now. 54 of those are from CurtisC and those are definitely cases where he had an assignment that expired and then another of his systems got the new assignment. It's a numbers game there... could have happened to any user but for him with so many CPUs involved, it just showed up a lot more often. All 54 of those were first-time checks that got assigned, expired, re-assigned to the same user, with both machines checking in eventually. Hopefully that was a one-time surge when the new expiration rules went into place... in theory, those slow boxes that expired and checked in anyway are no longer going to get those smallish 68M-70M exponents where that's an issue, but it's also the reason I added a clause so that if you let an exponent expire, you're not getting auto-assigned that same exponent. :smile: The rest (10 exponents) are essentially from the stinkers who do self-verifications on purpose as well as one or two that I assume got re-assigned an expired exponent by pure luck. 
I can tell which ones did it on purpose because they either didn't have an assignment for one or the other (or neither), or if they did have assignments on both, they were made *after* I'd added the clause that kept it from assigning automatically (meaning they modified their worktodo manually and picked up an assignment the back door way). [QUOTE=Prime95;445777]The database accepts two results on the same exponent from any user including anonymous as long as the shift count is different. For me, that's good enough to declare an exponent double-checked. Madpoo has a higher standard - he wants a different user id as well.[/QUOTE] That's my minor OCD I mentioned. LOL Basically, the integrity of the system relies on the fact that the person doing the double-check doesn't know the full residue of the first check, otherwise why mask it at all? And while it's difficult to bypass the mechanism that makes sure results are authentic, it's not impossible... the only thing standing in the way there is that someone up to no good doesn't know the masked byte of the residue. Even then, if someone wanted to be obnoxious and cheat, they could create multiple accounts, but just like with door locks, the real intent is to keep honest people honest even if it won't really stop a determined bad guy. Meanwhile, there is a CPU id based on the unique characteristics of the system (the GUID that shows up in the local.txt file) so it is possible to differentiate between different anonymous computers, but all that tells us is that one or more people we don't know used one or more CPUs to arrive at the same residue. (The GUID can and does change when the OS is reinstalled, hardware changes are made like a replacement drive, etc.) Anyway, yeah, my definition of self-verified *does* include the group of anonymous users, simply for the reason that it *might* be the same person. :smile: |
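A sketch of why masking one byte matters: someone who copies the public residue still has to guess among 256 possible values for the hidden byte, so a single forged submission matches only 1 time in 256. The masking position (the last byte), the hex formatting and the function names are assumptions; the thread only says that one byte of the residue is masked.

```python
from typing import List


def masked_residue(residue: int) -> str:
    """Public view of a 64-bit residue with one byte hidden.

    Assumption: the hidden byte is the last one (two hex digits).
    """
    full = f"{residue:016X}"
    return full[:-2] + "xx"


def completions(public: str) -> List[int]:
    """Every full residue consistent with the public view."""
    prefix = public[:-2]
    return [int(prefix + f"{b:02X}", 16) for b in range(256)]


public = masked_residue(0x0123456789ABCDEF)
print(public)                    # 0123456789ABCDxx
print(len(completions(public)))  # 256
```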
[QUOTE=Madpoo;445819]Anyway, yeah, my definition of self-verified *does* include the group of anonymous users, simply for the reason that it *might* be the same person. :smile:[/QUOTE]
...which *might* be true for any group of other users as well. ;) |
[QUOTE=ramgeis;445821]...which *might* be true for any group of other users as well. ;)[/QUOTE]
True. But few people have enough compute resources to fake their own results across two or more accounts. Such patterns and behaviour would quickly be noticed and then scrutinised; the link between the first-time tests and the second-time tests would stand out. Besides, what would be the point? |
[QUOTE=ramgeis;445821]...which *might* be true for any group of other users as well. ;)[/QUOTE]
True enough. It's been a while, but at one point I did do some looking into whether that could be the case, either by accident or other. I forget the details of what I was looking for... things like 2 tests of the same exponent that were both done by the same software version, same hardware, but different users (and I think some other things too, to keep the false matches down). It was kind of a needle-in-a-haystack situation though, and one where you're not sure there's a needle at all in there to be found, so I didn't push it further. |
[QUOTE=retina;445187]I notice the Cat 4 assignments do not fetch the lowest available exponents. According to the [url=http://www.mersenne.org/thresholds/]thresholds[/url] page, cat 4 starts at 42702950 as of today. But newer assignments skip over many available exponents and primenet is giving out exponents in the 50M range. It appears as though older expired assignments above the threshold are left in limbo. Is this an oversight or by design?[/QUOTE]And now it is happening again. I get 50M exponents now while more than 27000 exponents are available in the 45M and 46M ranges.
|
[QUOTE=retina;451007]And now it is happening again. I get 50M exponents now while more than 27000 exponents are available in the 45M and 46M ranges.[/QUOTE]
Anonymous recycle rule changed to 45 days instead of 60. |
[QUOTE=Prime95;451012]Anonymous recycle rule changed to 45 days instead of 60.[/QUOTE]I just now got another 50M exponent.
|
Only anonymous users should be getting 50M exponents.
I've reduced the 45 day delay in reassigning expireds to 30 days. I now see 11000+ exponents available between 44M and 50M. Anonymous users are excluded from 8000 exponents since anonymous did the first LL test. |
[QUOTE=Prime95;451176]Only anonymous users should be getting 50M exponents.[/QUOTE]:hello:
|