[QUOTE=chalsall;427583]Perhaps implement something like the "3 results in 90, rolling average of 1500" metric for Cat 1 and the "2 results in 120, average of 1500" metric for Cat2? Handy because the rolling average variable is easy to tweak over time.[/QUOTE]
I rather object to the Rolling Average of 1500 requirement. It seems to me Rolling Average is not a good measure of actual productivity. I have two machines with very similar productivity in terms of GHz-days/day, but their rolling averages are currently 1389 and 678. Both machines have been getting Cat 1 LL assignments and completing them well before the Cat 1 LL expiration time. It could perhaps be argued that these machines are at a performance level more suited to Cat 2, but I certainly don't think it makes sense to demote them to Cat 3, which is what Chris's proposal would do, at least based on their current Rolling Average values.

The one with the lower Rolling Average is a laptop with thermal throttling issues that limit its overall throughput. I use a Throttle setting to try to keep thermal throttling from slowing down the processor too much, and set its hours/day setting to approximately compensate for that.

My main point is that two machines with similar overall throughput have two very different Rolling Average values (and both are below the 1500 threshold). On the other hand, I am also guessing I could change some settings (such as the hours/day value) and possibly get a quite different Rolling Average value. I might then get them to meet the required threshold while changing nothing about their actual productivity. It seems rather silly to me to have to do that.
[QUOTE=Madpoo;427547][*]Rolling average of 1500 or higher (no strangely slow systems)[/QUOTE]
How is this rolling average calculated? My Haswell-E 5960X has "RollingAverage=1112" in local.txt, and it is faster than most "normal" systems, except your big Xeon servers with lots of cores.

Edit: I found the answer myself in this old thread: [url]http://www.mersenneforum.org/showthread.php?t=8652[/url]

So this is not a comparable estimate of overall speed; it just measures how fast your computer is running compared to how fast Prime95 assumed it would be running. If you set "Hours per day this program will run" to less than 24 hours and then run it 24 hours anyway, you can inflate the rolling average up to a maximum of 4000.
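The mechanism described above can be sketched as a toy model. This is hypothetical illustration code, not Prime95's actual implementation (see the linked thread for that): the score sits near 1000 when the machine runs exactly as the client predicted, scales with the ratio of actual to configured run time, and is capped at 4000.

```python
# Toy model of the rolling average described above (hypothetical sketch;
# not Prime95's actual code). A score of 1000 means the machine is running
# exactly as fast as the client predicted; the value is capped at 4000.

def rolling_average(hours_configured: float, hours_actually_run: float,
                    cap: float = 4000.0) -> float:
    """Score = 1000 * (actual run time) / (configured run time), capped."""
    score = 1000.0 * hours_actually_run / hours_configured
    return min(score, cap)

# "Hours per day" set to 6 but the machine really runs 24 hours/day:
print(rolling_average(6, 24))   # 4000.0 (hits the cap)
# Honest configuration: score stays near 1000.
print(rolling_average(24, 24))  # 1000.0
# The opposite mistake: configured for 24 hours but really running only 2:
print(round(rolling_average(24, 2)))  # 83
```

The last case is the one that produces the suspiciously low values discussed later in the thread.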
Rolling average is not a reliable measure. On my i7-5820K it is at 3600 and the computed completion dates are realistic. Sometimes, after Prime95 is restarted, the rolling average drops to around 1000, which gives unrealistic completion dates. Those fluctuations have existed for a long time...
Jacob
[QUOTE=S485122;427625]Rolling average is not a reliable measure.[/QUOTE]
OK, fine. Then we use a different metric. Perhaps simply the number of completions over the last (say) 90 days. The revised algorithm is simply being discussed at this point. Let's converge on what would (or, at least, might) work best, and maybe give it a whirl.
[QUOTE=chalsall;427640]OK, fine. Then we use a different metric. Perhaps simply the number of completions over the last (say) 90 days.
The revised algorithm is simply being discussed at this point. Let's converge on what would (or, at least, might) work best, and maybe give it a whirl.[/QUOTE] Yeah, the rolling average, to me, is a quick and dirty way of assessing whether the system is performing at "expected" speed. I'd certainly consider anything below 1000 to be underperforming, since that's the definition of the metric itself... it should be near 1000 (although it's typically higher in my experience). I know it's inaccurate as heck, but it's better than nothing. Maybe 1500 is too much to expect, but I think it should still be above 1000.

I get what cuberbruce is saying about the one system that has a rolling average of 1389, but the one that's at 678 I would humbly suggest is not ideally suited for Cat 1 assignments. Technically I suppose it's still (probably) capable of completing a first-time LL assignment in 90 days, or a DC in 60 days, but it's (literally) a half-hearted attempt. :smile:

I should be less controversial... LOL... if the assignment completes in the 60/90 days required, I suppose that's all that matters.
[QUOTE=Madpoo;427656]Yeah, the rolling average, to me, is a quick and dirty way of assessing whether the system is performing at "expected" speed.[/QUOTE]
On reflection, after looking at some actual stats on machines that are eligible to get Cat 1 work right now and then looking at their rolling averages, it really is all over the place. There are systems with unusually low rolling averages, like one with 82. It's completed 2 results in the past 120 days though, and has a CPU speed of 2.8 GHz. No idea why its rolling average is so low... that's really bizarre. If Prime95 isn't even running, that's fine; it won't affect the rolling average, you just won't be making any progress. So something must be happening that makes it so slow when it's actually running: thermal throttling, something else taking nearly all the cycles, etc.

All I know is, the rolling average is kind of a snapshot in time... if it's low for some spurious reason, it won't stay low once things are back to normal. But if it's really *SUPER* low, like < 500, maybe getting Cat 1 isn't the best idea for it right then? Next time it gets an assignment it might be just fine and there'll be no problems.

Anyway, besides using that as a quick metric, reported by the client itself, the other option is to look at the # of results over a period of time. There's probably some clever calculation I could do to take the rolling average, CPU speed, and the ratio of cores to workers into account and come up with a "performance factor" that could approximate how many LL or DC results a machine would be expected to clear in a set period of time.

It's also worth noting that a system might have completed a couple of smaller double-checks in the past 120 days, which would then qualify it to get a larger first-time LL Cat 1 assignment, but there's no basis to assume that completing a couple of smaller tests at the rate of one every 2 months means it can complete a 67M exponent in 3 months' time. See the problem there? When looking at the past # of results in some time frame, the *size* of those tests should be factored in... were they similar in size to the type of work they want (DC or LL)?

So there is that aspect I hadn't considered until just now... if/how to judge suitability for first-time LL assignments when a machine's recent history only has smaller DC exponents. A system that can barely do 1 DC test every 2 months is not likely to be able to do a single Cat 1 LL test in 3 months. I don't know if that's actually an issue. Someone doing DC work will probably stick with that, but then again someone doing DC might be looking for a change and ask for first-time tests, and they may not be up to that task. Hmm... maybe just look at the GHz-days they've done in the past 120 days instead of just how many results they've turned in.
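The "performance factor" mused about above might look something like this. The formula and its weighting are entirely invented for illustration; a real version would need calibrating against actual GHz-days/day figures from the server.

```python
# Hypothetical "performance factor" combining rolling average, CPU speed,
# and the cores-to-workers ratio, as suggested above. The formula is
# invented for illustration and would need calibration against real data.

def performance_factor(rolling_avg: float, cpu_ghz: float,
                       cores: int, workers: int) -> float:
    """Rough per-worker throughput estimate, in GHz-days per day.

    Each worker gets cores/workers cores; rolling_avg/1000 scales the
    nominal speed by how the machine has actually been performing.
    """
    return cpu_ghz * (cores / workers) * (rolling_avg / 1000.0)

# The 2.8 GHz machine above with a rolling average of 82, assuming (for
# illustration only) 4 cores feeding 1 worker:
print(performance_factor(82, 2.8, 4, 1))    # ~0.92 GHz-days/day
# The same machine performing at expected speed (rolling average 1000):
print(performance_factor(1000, 2.8, 4, 1))  # 11.2 GHz-days/day
```

Dividing an assignment's GHz-day cost by this figure would give an expected completion time, which is essentially the idea the thread converges on below.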
[QUOTE=Madpoo;427657]There are systems with unusually low rolling averages, like one with 82. It's completed 2 results in the past 120 days though, and has a CPU speed of 2.8 GHz.
No idea why its rolling average is so low... that's really bizarre. If Prime95 isn't even running, that's fine; it won't affect the rolling average, you just won't be making any progress. So something must be happening that makes it so slow when it's actually running: thermal throttling, something else taking nearly all the cycles, etc.[/QUOTE] If "Hours per day this program will run" is set to 24 hours but the user is only running it 2 hours per day: 1000 * 2/24 ≈ 83.

That is why this is a poor metric for speed: it is entirely dependent on that variable being correct.
[QUOTE=ATH;427669]If "Hours per day this program will run" is set to 24 hours but the user is only running it 2 hours per day: 1000 * 2/24 ≈ 83.

That is why this is a poor metric for speed: it is entirely dependent on that variable being correct.[/QUOTE] My current line of thinking is to look at their (GHz-days / # of workers) over the past 90 days.

The way I figure it, a 67M LL test will take ~170 GHz-days. And since the Cat 1 LL rules say it should be done in 90 days, we should be looking for systems that have done at least that much (per worker) in the past 90 days... right? Otherwise why would we assume they'd finish this new assignment in time? So then, who even cares about the # of results in that 90 days? Just look at the actual GHz-days they clocked and go on that. Plus the stuff I'd like to see, such as "no expirations and no bad results" in that same 90-day period.

For DC work, it's 60 days, and I think 35M exponents are taking 47 GHz-days, so we'd be looking for at least that much in 60 days for them to get DC Cat 1. For Cat 2, a similar thing: look at their past XX days and whether they've done at least YY GHz-days in that time, per worker.

How is that sounding? Are we getting closer to something that people could be happy with? I figure this is probably the best idea so far, because we're actually looking at past *real* performance and using that to estimate the likelihood they'd finish a Cat 1 or 2 in the time allowed.
[QUOTE=Madpoo;427673]My current line of thinking is to look at their (GHz-days / # of workers) over the past 90 days.[/QUOTE]
This all sounds good. But at that point, we should probably get rid of the discrete, coarse-grained "Cats" altogether and just give each machine an exponent that is far enough out that it will be completed before it becomes a milestone-blocker. (This is easier said than done; I haven't fully thought out how to forecast when an exponent becomes a milestone-blocker.)
This is going in the right direction!

The amount of work done in the past and the absence of expired assignments and bad results are indeed good metrics for what we want to achieve. I would prefer that those be measured over a longer period than 90 days: it would also remove the necessity of returning 2 results. Let us say the measure should cover at least 90 days for DC and 120 for LL.

Jacob
[QUOTE=Madpoo;427673]How is that sounding? Are we getting closer to something that people could be happy with? I figure this is probably the best idea so far because we're actually looking at past *real* performance and using that to estimate the likelihood they'd finish a cat 1 or 2 in the time allowed.[/QUOTE]
I think this is sounding really good. Real performance metrics as observed by the server are always going to be better than what the client reports.

If you really wanted to go all out, you could do what "axn" suggested and try to calculate the trend of trailing-edge completion, then assign candidates to machines such that they /should/ complete just in time. But I would suggest that would be much more work and much more difficult to get right. Sticking with the current categories and their associated expiry rules would probably be best (at least for now).

Also, perhaps set things up such that machines given Cat 1 are expected to complete within, say, 45 days rather than the 90 days allowed. Same thing with Cat 2: 75 days expected rather than the 150 days allowed. This way you would have fewer candidates chugging along at the trailing edges of the waves.
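The tightened windows suggested above could be expressed as follows. The day counts come from the post; the data structure and function are hypothetical, and the throughput figure would come from observed GHz-days/day as discussed earlier in the thread.

```python
# Sketch of the "expect completion in half the allowed time" idea above
# (day counts from the post; the data structure is hypothetical).

CAT_RULES = {
    1: {"allowed_days": 90,  "expected_days": 45},
    2: {"allowed_days": 150, "expected_days": 75},
}

def fits_expected_window(assignment_ghz_days: float,
                         ghz_days_per_day: float, cat: int) -> bool:
    """Would this machine finish within the *expected* (not allowed) window?"""
    days_needed = assignment_ghz_days / ghz_days_per_day
    return days_needed <= CAT_RULES[cat]["expected_days"]

# A ~170 GHz-day LL test on a machine doing 4 GHz-days/day: 42.5 days.
print(fits_expected_window(170, 4.0, 1))  # True (just inside 45 days)
# The same test at 3 GHz-days/day would take ~57 days: too slow for Cat 1.
print(fits_expected_window(170, 3.0, 1))  # False
```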