Suddenly I'm getting only trivial TF tests
I decided that probably it was more sensible on my slightly-old 12-physical-core hardware to run 12 double-checks, one per thread, rather than one double-check over 12 threads.
I edited local.txt to have WorkerThreads=12 ThreadsPerTest=1 rather than the other way around, and restarted mprime. It started collecting trial-factor-to-67-bits jobs for numbers around 212.383 million, fifteen for each of the eleven threads that weren't working on the single double-check that I had assigned. Each of these jobs seems to take about 55 minutes; presumably they'd take a few minutes on a GPU, so I can't see why I'm doing them at all. This is odd, because I have WorkPreference=101, which I thought meant 'only give me double-checks'. Is hardware which takes twenty days to do a double-check (but which will do twelve double-checks in parallel over those twenty days) now so totally obsolete that it should be given only make-work? |
"... [I]is hardware which takes twenty days to do a double-check (but which will do twelve double-checks in parallel over those twenty days) now so totally obsolete that it should be given only make-work?[/I]"
Certainly not. Log in to Primenet Server, go to My Account -> CPUs and make sure the correct work type is selected for all threads on that particular machine. |
You can also check the settings here:
[URL="http://www.mersenne.org/thresholds/?setting=1"]http://www.mersenne.org/thresholds/?setting=1[/URL] if you log in with your account. |
That only means that, [U]in case you choose LL or DC work[/U], you'll get the smallest available numbers, if your machine meets the stated requirements.
To actually check what type of work the server will assign, you'd better look at the page I mentioned in my previous post. |
[QUOTE=lycorn;412873]"... [I]is hardware which takes twenty days to do a double-check (but which will do twelve double-checks in parallel over those twenty days) now so totally obsolete that it should be given only make-work?[/I]"
Certainly not. Log in to Primenet Server, go to My Account -> CPUs and make sure the correct work type is selected for all threads on that particular machine.[/QUOTE] On the CPUs page the line for the computer says 'D' under preferred work type; but when I go to the page for the specific computer it has one thread down as D and all the rest as TF-LMH. I have set them all to 'D' and will see what happens - there's no immediate reaction, but I guess I have fifteen hours * 11 cores of p~212M TF-to-2^67 queued up. I was expecting that anything I set in local.txt would override anything the server might decide, but that doesn't seem to be the case in the specific situation where I increase the number of workers on a single computer. There seem to be rather more computers doing TF-LMH jobs than I would expect given how well-suited those jobs are to GPU, so I wonder if this bug has bitten other people - could someone with database access check if there are many computers with all-but-one thread doing TF-LMH? |
[QUOTE=fivemack;412927]On the CPUs page the line for the computer says 'D' under preferred work type; but when I go to the page for the specific computer it has one thread down as D and all the rest as TF-LMH. I have set them all to 'D' and will see what happens[/QUOTE]
Bingo! I'm quite sure that will do the trick. |
It did - I now have nine DC tasks queued on that computer, and I'm sure it will go up to twelve as the queue of TF-LMH ones drains.
|
[QUOTE=fivemack;412927]...could someone with database access check if there are many computers with all-but-one thread doing TF-LMH?[/QUOTE]
I did a quick check... 8423 machines (from non anonymous users) have more than one work type. A bunch of 8 different work types...they're really covering their bases I guess. One anonymous user's CPU had 11 different kinds spread between the different workers (32 total workers). That's when I decided not to count anon users, but if you did, the total CPUs with multiple types goes up to 11,455. :smile: |
Perhaps the real question here should be [i]why[/i] is TF being given out to ordinary CPUs as a default work type?
|
[QUOTE=retina;413130]Perhaps the real question here should be [I]why[/I] is TF being given out to ordinary CPUs as a default work type?[/QUOTE]
Yes. |
[QUOTE=Madpoo;412986]I did a quick check...
8423 machines (from non anonymous users) have more than one work type. A bunch of 8 different work types...they're really covering their bases I guess. One anonymous user's CPU had 11 different kinds spread between the different workers (32 total workers). That's when I decided not to count anon users, but if you did, the total CPUs with multiple types goes up to 11,455. :smile:[/QUOTE] Thanks for that. Is there any way you can do the more specific query of whether a user has precisely two work types, with one of them running on only one core? I think there is actually an underlying bug here - when you increase the number of cores by editing local.txt, the new ones get allocated the wrong work type. |
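The more specific query asked for here (exactly two work types on a machine, one of them held by a single worker) can be sketched offline in Python. This is only an illustration: the row layout `(cpu_id, worker_number, work_type)` and the function name `lone_worker_cpus` are hypothetical and are not the PrimeNet database schema.

```python
from collections import Counter, defaultdict


def lone_worker_cpus(rows):
    """Return CPUs that have exactly two work types, one of them on a single worker.

    rows: iterable of (cpu_id, worker_number, work_type) tuples.
    This layout is an assumption made for the sketch, not the real schema.
    """
    per_cpu = defaultdict(Counter)
    for cpu_id, _worker, work_type in rows:
        # Count how many workers on each CPU use each work type.
        per_cpu[cpu_id][work_type] += 1

    hits = []
    for cpu_id, counts in per_cpu.items():
        # Exactly two distinct work types, and at least one of them
        # is assigned to only one worker.
        if len(counts) == 2 and 1 in counts.values():
            hits.append(cpu_id)
    return sorted(hits)
```

A machine with one worker on type 101 and eleven on type 1 would be reported; a machine with all workers on one type, or with three or more types, would not.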
@fivemack: Are you using a proxy (like gpu72 or so) in your prime95 connection settings? We discussed this problem here at length. For example, one of my 4 cores is getting P-1 assignments when I use a proxy, but when no proxy is used, all cores get the correct "first time LL" assignments. I always blamed Chris for it :razz: but if you are right, it may have nothing to do with him...
We could not get rid of this behavior: even though we repeatedly switched the work type, both overall and per core, and reduced the amount of memory to 8MB (to forbid the P-1 assignments), we still get P-1 assignments when we use the proxy. The solution was not to use the proxy, in spite of the fact that we are "losing" gpu72 credit if we take the assignments directly from PrimeNet. Also, and this may or may not be related, when a bunch of P-1 results is reported, one of them always appears as "expired" (usually the first, but that is not a rule), and all the others appear as "completed" on the server. This is certainly a bug, almost harmless, but still a bug. I say almost, because the effect is that the expired assignment is not deleted from worktodo, and there is a risk it will be worked again (wasting time) if the computer is not manually attended. |
No proxy, I'm running mprime directly.
|
[QUOTE=retina;413136] Perhaps the real question here should be why is TF being given out to ordinary CPUs as a default work type?[/QUOTE]
I've seen this a lot too. Anytime you go from say, 12 threads up to 24, and assign LL or DC it'll take TF instead. The only way to fix it is to let it run through its workload, or set the job type in the web interface, quit and rejoin. Seems like a bug instead of a feature to me. edit:meant to quote retina and kladner, edited quote. |
[QUOTE=aurashift;413156]I've seen this a lot too. Anytime you go from say, 12 threads up to 24, and assign LL or DC it'll take TF instead. The only way to fix it is to let it run through its workload, or set the job type in the web interface, quit and rejoin. Seems like a bug instead of a feature to me.[/QUOTE]
I'll admit that I have no idea why it would do that. :smile: I do my work manually getting assignments, updating the worktodo files and manually reporting them in. Easier for me since most of the time I'm picking and choosing assignments using queries of the DB. I'd have to take a peek at what the local.txt file looks like when adding additional workers to the mix. If I'd had to guess I would think it'd default to "whatever makes the most sense" and the server would look at the system and give it whatever. I can't imagine it would hand out TF to a CPU worker, but who knows. If you've ever changed that for your other workers to pick first time/DC/whatever, then add additional workers, could just be that something is defaulting in a way that's not ideal. If we can get George's attention on this he could probably spit out the answer before I type another keystroke, so before I go poking around I'll see if he's available to weigh in. |
[QUOTE=Madpoo;413181]I'll admit that I have no idea why it would do that. :smile: I do my work manually getting assignments, updating the worktodo files and manually reporting them in. Easier for me since most of the time I'm picking and choosing assignments using queries of the DB.
I'd have to take a peek at what the local.txt file looks like when adding additional workers to the mix. If I'd had to guess I would think it'd default to "whatever makes the most sense" and the server would look at the system and give it whatever. I can't imagine it would hand out TF to a CPU worker, but who knows. If you've ever changed that for your other workers to pick first time/DC/whatever, then add additional workers, could just be that something is defaulting in a way that's not ideal. If we can get George's attention on this he could probably spit out the answer before I type another keystroke, so before I go poking around I'll see if he's available to weigh in.[/QUOTE] Okay, I lied... I poked a little at the client side to see what it does.
I have a system with 2 workers and when I first check the box to use Primenet, both workers default to "whatever makes the most sense". There aren't any worker specific entries in prime.txt.
Then I set worker #1 to first time, #2 to double-check, and I get this in prime.txt:
[CODE][Worker #1]
WorkPreference=100
[Worker #2]
WorkPreference=101[/CODE]
I then add a 3rd worker, set it to do double-checks and I get this as expected:
[CODE][Worker #3]
WorkPreference=101[/CODE]
I add a 4th worker and don't set the type of work to do, and there is no "Worker #4" entry in prime.txt.
I then go ahead and let it connect to primenet (I had it blocked up 'til now)... Workers 1 and 2 already had some assignments so nothing new there. Worker #3 got a double-check like I told it to. Worker #4 now gets this entry in prime.txt:
[CODE][Worker #4]
WorkPreference=0[/CODE]
Which is the same as "whatever makes the most sense". It got a double-check assigned.
So, I tried to replicate some of the things y'all mentioned with adding workers, but I never did get any TF. I admit, it's not exhaustive...
I tried other scenarios but it always seemed to properly add either a work pref of zero to my new workers, or it wasn't there at all and defaulted to zero the next time it talked to primenet or the client restarted.
I'd suggest looking at your prime.txt file to see what kind of work preference is set for each worker. Here's the key for what's what:
[CODE]type  description
0     what makes sense [default]
1     trial factoring LMH
2     trial factoring
4     factor P-1 large
5     factor ECM small
6     factor ECM Fermat
100   LL first test
101   LL double-check
102   LL test for world record
103   LL test 10+ million digits
104   LL test 100+ million digits
105   LL test with no factoring[/CODE]
(not all of those may be implemented on the client and/or the server) |
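For anyone who wants to check their own file against that key, here is a minimal Python sketch. It assumes prime.txt holds one setting per line, with `[Worker #N]` section headers as in the excerpts above; the names `worker_preferences` and `describe` are made up for this example and are not part of Prime95/mprime.

```python
import re

# Work-preference codes, copied from the key in the post above.
WORK_TYPES = {
    0: "what makes sense [default]",
    1: "trial factoring LMH",
    2: "trial factoring",
    4: "factor P-1 large",
    5: "factor ECM small",
    6: "factor ECM Fermat",
    100: "LL first test",
    101: "LL double-check",
    102: "LL test for world record",
    103: "LL test 10+ million digits",
    104: "LL test 100+ million digits",
    105: "LL test with no factoring",
}

WORKER_HEADER = re.compile(r"\[Worker #(\d+)\]")


def worker_preferences(text):
    """Collect WorkPreference values from prime.txt-style text.

    Returns a dict keyed by worker number, plus the key "main" for a
    WorkPreference line that appears before any section header.
    The one-setting-per-line layout is an assumption of this sketch.
    """
    prefs = {}
    section = "main"
    for raw in text.splitlines():
        line = raw.strip()
        header = WORKER_HEADER.fullmatch(line)
        if header:
            section = int(header.group(1))
        elif line.startswith("["):
            # Some other section: ignore its contents.
            section = None
        elif section is not None and line.startswith("WorkPreference="):
            prefs[section] = int(line.split("=", 1)[1])
    return prefs


def describe(prefs):
    """Translate numeric preferences into the descriptions from the key."""
    return {key: WORK_TYPES.get(code, "unknown") for key, code in prefs.items()}
```

Calling `describe(worker_preferences(open("prime.txt").read()))` would then show, per worker, which kind of work the client believes it has been asked to do; a worker missing from the result has no explicit entry in the file.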
The difference with your test is probably that they just have the "WorkPreference=" in prime.txt not in local.txt under each worker.
|
[QUOTE=ATH;413206]The difference with your test is probably that they just have the "WorkPreference=" in prime.txt not in local.txt under each worker.[/QUOTE]
That's where I saw the "WorkPreference=" entries, in the prime.txt under each worker, not in local.txt. I suppose it would have made more sense for it to be in the local.txt along with other worker specific things (# of threads, affinity, etc) but I just called it like I saw it. :smile: (in other words, if people are adding those entries to the local.txt where they might logically seem to go, they'd be wrong) |
[QUOTE=Madpoo;413212]That's where I saw the "WorkPreference=" entries, in the prime.txt under each worker, not in local.txt. I suppose it would have made more sense for it to be in the local.txt along with other worker specific things (# of threads, affinity, etc) but I just called it like I saw it. :smile:
(in other words, if people are adding those entries to the local.txt where they might logically seem to go, they'd be wrong)[/QUOTE] Sorry, I forgot there are "[Worker #1]" sections in prime.txt as well, so I thought you were talking about local.txt. But they probably had "WorkPreference=" in the main section of prime.txt, not down under each worker. |
[QUOTE=ATH;413216]Sorry I forgot there is "[Worker #1]" sections in prime.txt as well, so I thought you were talking about local.txt. But they probably had "WorkPreference=" in the main section of prime.txt not down under each worker.[/QUOTE]
Gotcha. And to be honest, I've taken a peek a couple of times at the assignment code on the server and there are a lot of decisions happening in there... it's looking at all kinds of variables about the speed, reliability and history of the CPU making the assignment request, what kinds of work preferences it has, the currently available pool of exponents in the different categories, etc. It was enough to make my head spin a bit, to the point where I found it nearly impossible to manually walk through the process and predict what kind of work and exponent it would offer up.
However, that being said, I can't imagine any circumstance where it would normally assign TF work to a CPU... it should almost always find at least a doublecheck assignment for it unless that worker is specifically set to get TF work. As best I could figure, it'll go down the list and if there simply aren't any exponents in the requested type (like first time checks, record breaking stuff), it defaults down to the next best thing, and so on and so on until finally it'll spit out a TF assignment, but like I said, I just don't see that happening unless all available exponents were simply unavailable for any kind of LL work.
That's not to say it couldn't happen as a result of some strange twist in the decision tree that I'm missing, or a flaw in the client that makes it default to requesting TF work for new workers on a system. Basically I'm saying the server is probably okay in its handling of things, and it's something on the client side that needs a tweak.
Best way to tell would be to capture the request itself where an LL assignment is expected but it got back TF instead. Looking at the request from the client would reveal all, however that would mean the client side would need to be capturing that traffic with Wireshark or something as it's communicating with Primenet.
The server logs might give a clue... I'm not sure how much of the request is passed in the URL itself using all of those query parameters listed in the Primenet API. Might be worth looking at. |
[QUOTE=Madpoo;413242]...
The server logs might give a clue... I'm not sure how much of the request is passed in the URL itself using all of those query parameters listed in the Primenet API. Might be worth looking at.[/QUOTE] Okay, well, I just looked at the server logs. I can't see what gets sent back to the client, but the URL that it logs has the basic request info like the CPU's unique identifier and which CPU (worker) is requesting work. On the server itself it keeps track of what type of work each worker prefers.
I tracked down the machine fivemack was having this problem with and here's what I can tell from looking at the logs from last week:
Oct 15 @ 20:19 UTC: request to update the computer info (version, cpu details, etc), followed by an update to an existing assignment
Oct 15 @ 22:50 UTC: request to set the number of worker threads to 12
Oct 15 @ 22:50-22:53 UTC: a bunch of requests to get assignments for workers 1-12
For the rest of Oct 15 the system was returning its TF results and getting new ones.
Flash forward to Oct 17 @ 20:20 UTC when that system sends a bunch of requests to update the individual worker preferences. It's actually @ 20:28 UTC when I see the request come in to set all worker types to "101" (double check).
Essentially all I can see is that mprime updated the server to say "hey, I've got 12 workers now" and then started requesting assignments for them all. Based on the fact that these are new workers without any preference set for them at all, it boils down to what the default on the server would be for them and then the type of CPU it is if indeed it defaults to "whatever makes the most sense". It wasn't until 2 days later that the program communicated with the server and set the preference.
So, I guess for now if you're adding more workers, you'll want to set the work type again *after* doing so, so it can communicate that info to the server.
That system from fivemack had previously set the number of workers to 1 on Oct 1, and then set the work type to DC back on Oct 4, but apparently adding the additional workers without any preference was causing the issue.
Now that gives me more to go on... I can look at how it handles updates to the number of workers and see if it's creating new database entries with a default type of "0" (whatever makes the most sense) or if maybe they're not created until an option is actually set for them and would return NULL for any requests to get that worker's preference. It could even be a race condition I suppose, if the num of workers is set and then requests for assignments are made immediately after. Probably not though since the requests for each worker came in over a period of several minutes, and it surely wouldn't have taken more than a split second to create some new entries for each additional worker. |
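One way to follow the advice about setting the work type again after adding workers is to write explicit per-worker sections into prime.txt. The Python sketch below only generates the text for such sections, in the `[Worker #N]` / `WorkPreference=` form shown in the prime.txt excerpts earlier in the thread. The helper name `worker_sections` is hypothetical, and whether a hand-edited file is picked up and reported to the server in every client version is an assumption that has not been checked here.

```python
def worker_sections(first_worker, last_worker, work_preference):
    """Build prime.txt-style sections for workers first_worker..last_worker.

    Each worker gets a "[Worker #N]" header followed by a WorkPreference
    line, mirroring the excerpts quoted in this thread.
    """
    lines = []
    for number in range(first_worker, last_worker + 1):
        lines.append(f"[Worker #{number}]")
        lines.append(f"WorkPreference={work_preference}")
    return "\n".join(lines) + "\n"
```

For the machine discussed here, `worker_sections(2, 12, 101)` would produce double-check entries for the eleven newly added workers.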
[QUOTE=Madpoo;413250]...
Now that gives me more to go on... I can look at how it handles updates to the number of workers and see if it's creating new database entries with a default type of "0" (whatever makes the most sense) or if maybe they're not created until an option is actually set for them and would return NULL for any requests to get that worker's preference. [/QUOTE] Worked my way through the process that handles a request to update the number of workers... it does indeed create a new DB entry for each new one, and it will either use the user's default work type (if set), or "whatever makes the most sense". In fivemack's case, there's no "user default work type" set, so it should have defaulted to whatever makes sense. Maybe it actually is assigning TF work in some cases... that'd be weird. So, yeah... not really sure what happened there. Could be that it thought the CPU was good for nothing more than TF? It's one of these: "AMD Opteron(tm) Processor 6168" |
Also, there is an account-wide work-type default setting that interacts with the work-type settings; workers without an explicit setting get the default work type enumerated in the account-wide setting. Or used to at least - I can't find that setting now...
|
[QUOTE=Madpoo;413257]Could be that it thought the CPU was good for nothing more than TF? It's one of these: "AMD Opteron(tm) Processor 6168"[/QUOTE]
That's a five year old CPU. Better than my current desktop at home, actually. It's more than fine for LL. |
where is this global preference setting you speak of? And I'm getting it on brand newish Xeons (Ivy Bridge) so it shouldn't be a generational thing.
|
[QUOTE=aurashift;413281]where is this global preference setting you speak of? And I'm getting it on brand newish Xeons (Ivy Bridge) so it shouldn't be a generational thing.[/QUOTE]
Good question... I see a place where that can go in the table for user info, but can't see anywhere on the site to set it. Weird. Maybe that column in the table isn't what I think it is. LOL It might be telling that out of 140,000+ user accounts, only 800 have that value set to something besides NULL. In fact, I think what I'm looking at may be the setting that users can check to say they want to get preferred assignments (lowest available). Hmm... Well, that's weird. Not sure what the thing was I saw in the account update code where it could set the new worker "work preference" to some kind of account default, if such an account default doesn't exist. Maybe it was a planned feature that isn't there yet. Anyway, back to the investigation. I'll see if I can't find an actual answer for this without bugging George. :smile: |
[QUOTE=Madpoo;413306]Good question... I see a place where that can go in the table for user info, but can't see anywhere on the site to set it. Weird. Maybe that column in the table isn't what I think it is. LOL
It might be telling that out of 140,000+ user accounts, only 800 have that value set to something besides NULL. In fact, I think what I'm looking at may be the setting that users can check to say they want to get preferred assignments (lowest available). Hmm... Well, that's weird. Not sure what the thing was I saw in the account update code where it could set the new worker "work preference" to some kind of account default, if such an account default doesn't exist. Maybe it was a planned feature that isn't there yet. Anyway, back to the investigation. I'll see if I can't find an actual answer for this without bugging George. :smile:[/QUOTE] Ahh... I totally missed a *different* per-user setting in the database, and it really is a setting for the default work type for a user. "fivemack", your default work type is 1 (trial factoring LMH). "aurashift", yours is also 1. Going back to the 140,000+ accounts, 22,000+ have some default work type set. Methinks this is a bug. In the code for creating a new user, it's setting this to 1 for all new users even though there's a comment that mentions setting it to "whatever makes the most sense". I'll email George and James for clarification on what to do about it. I've got details on it and proposed solutions... |
:tu: You did a very good job! Hope this is fixed and saves us the future headache.
|
[QUOTE=Madpoo;413309]Ahh...I totally missed a *different* per-user setting in the database, and it really is a setting for the default work type for a user.
[snip] Methinks this is a bug. In the code for creating a new user, it's setting this to 1 for all new users even though there's a comment that mentions setting it to "whatever makes the most sense". I'll email George and James for clarification on what to do about it. I've got details on it and proposed solutions...[/QUOTE] I seem to recall there was a page that explained the hierarchy of worktype settings as well; it might not have been the same page that changed the setting. This all may have disappeared around the time the assignment recycling rules were modified, but it might have disappeared even further back when the web pages were cleaned up. Apparently it wasn't an often used setting/page. |
Many thanks to Madpoo for tracking down the bug!
|
[QUOTE=Madpoo;413250]
I tracked down the machine fivemack was having this problem with and here's what I can tell from looking at the logs from last week:
Oct 15 @ 20:19 UTC: request to update the computer info (version, cpu details, etc), followed by an update to an existing assignment
Oct 15 @ 22:50 UTC: request to set the number of worker threads to 12
Oct 15 @ 22:50-22:53 UTC: a bunch of requests to get assignments for workers 1-12
For the rest of Oct 15 the system was returning its TF results and getting new ones.
Flash forward to Oct 17 @ 20:20 UTC when that system sends a bunch of requests to update the individual worker preferences. It's actually @ 20:28 UTC when I see the request come in to set all worker types to "101" (double check). [/quote] That's interesting - the server doesn't seem to make a distinction between changes made by the computer's master to its configuration (Oct 15 @ 22:50), automated requests made by the client running on a computer (Oct 15 @ 22:50-22:53), and requests made by the computer's master manually at a Web page (Oct 17 @ 20:28). |
[QUOTE=fivemack;413326]Many thanks to Madpoo for tracking down the bug![/QUOTE]
No problem, it was definitely a weird one. Okay, so here's the long and short of it:
Back in April 2014, the server started defaulting user accounts to "TF LMH" work unless the CPU specified something else. I think this was just a typo or confusion over the fact that in the database, zero means "do what makes sense" and one means "TF LMH", and it was mistakenly set to 1.
Fortunately Prime95/mprime will default to whatever makes sense... if not for that, this would have been discovered much sooner when *all* new users started getting TF work. :smile: The only time this seemed to rear its ugly head was when an existing CPU had more workers added... it left them in just enough of an ambiguous state that it picked up that *account* default and passed out the TF work as reported by some of you.
After chatting with George, I've fixed the new account creation code to leave the account default empty, and also changed all the user accounts created since April of 2014 to set that to NULL as well.
On the server, the code that handles what happens when a new worker shows up for a CPU would look for the presence of that "account default", and if it was set (which now it won't be), it would end up using that in most cases. The preference order stated in the code is to honor the worker-specific setting first, then the account default, then a team default (hasn't been implemented), and then finally "whatever makes sense" if nothing else said otherwise by that point. I think it was more a quirk in the way a new worker versus a new CPU was being handled which made it act funny.
I *think* that a new worker on an existing system might still ask for "whatever makes sense"... I suppose that's better than always asking for TF LMH work though. And that's because adding workers will only tell the server about the new ones, but Prime95 isn't sending the info that says "oh, and set these new workers to type XYZ" at the same time.
If in doubt, when you add additional workers, try also setting their work type specifically. Either in the program "Worker Windows" section, or adding entries to "prime.txt" for each worker section as I noted earlier.
And if you're the type of GIMPSer who really, truly wants to have their account default set to something in particular like double-checking or first-time or 100M, let me know and I can update your account default accordingly. However, since that account default wasn't really ever set up or implemented, I can't say that it'll actually be around for long. Maybe if enough people showed an interest in it we could make that setting visible on the site somewhere so you can change it. George's fear was that the concept of an account default work type wouldn't be readily understandable or at least not well-used, to the point where it made sense to bubble that setting up to the user settings page. |
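The preference order described in that post (worker-specific setting, then account default, then team default, then "whatever makes sense") can be illustrated with a short Python sketch. This is not the server's code: the function name `effective_work_type` is hypothetical, and modelling an unset value as `None` (standing in for a database NULL) is an assumption made for the example.

```python
from typing import Optional

WHAT_MAKES_SENSE = 0   # default work type
TF_LMH = 1             # trial factoring LMH
DOUBLE_CHECK = 101     # LL double-check


def effective_work_type(
    worker_pref: Optional[int],
    account_default: Optional[int],
    team_default: Optional[int] = None,
) -> int:
    """Pick the first setting that is actually set, in the stated order.

    None stands for "not set"; the real server logic is more involved and
    is not reproduced here.
    """
    for candidate in (worker_pref, account_default, team_default):
        if candidate is not None:
            return candidate
    return WHAT_MAKES_SENSE
```

Under this model, a newly added worker with no preference on an account whose default had been mistakenly created as 1 resolves to TF LMH, `effective_work_type(None, TF_LMH)`, which matches the reports in this thread. Once the account default is cleared, the same worker falls through to `WHAT_MAKES_SENSE`, and a worker explicitly set to `DOUBLE_CHECK` is unaffected either way.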
[QUOTE=Madpoo;413529]And if you're the type of GIMPSer who really, truly wants to have their account default set to something in particular like double-checking or first-time or 100M, let me know and I can update your account default accordingly.[/QUOTE]
I'd really like first time tests, please. Kudos for the work. A+!!1 |
[QUOTE=aurashift;413552]I'd really like first time tests, please. Kudos for the work. A+!!1[/QUOTE]
You should be set for that automatically then, based on the "what makes sense". I'm pretty sure (and someone please correct me if I'm wrong) that "what makes sense" will pick first time LL tests by default as long as your system isn't ancient. If you come across a situation where you've added additional workers, or even just a plain default install using the "what makes sense" default and you get something besides first time LL tests, let me know and we'll tackle that. |
[QUOTE=Madpoo;413560]You should be set for that automatically then, based on the "what makes sense". I'm pretty sure (and someone please correct me if I'm wrong) that "what makes sense" will pick first time LL tests by default as long as your system isn't ancient.
If you come across a situation where you've added additional workers, or even just a plain default install using the "what makes sense" default and you get something besides first time LL tests, let me know and we'll tackle that.[/QUOTE] On two of my twelve-cores where I went from six to twelve LL workers I got all DC's. Other than that, looked good. |