mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   Suddenly I'm getting only trivial TF tests (https://www.mersenneforum.org/showthread.php?t=20549)

LaurV 2015-10-20 07:36

@fivemack: Are you using a proxy (like gpu72 or so) in your prime95 connection settings? We have discussed this problem here at length. For example, one of my 4 cores gets P-1 assignments when I use a proxy, but when no proxy is used, all cores get the correct "first time LL" assignments. I always blamed Chris for it :razz: but if you are right, it may have nothing to do with him...

We could not get rid of this behavior. Even though we repeatedly switched the work type, both globally and per core, and reduced the amount of memory to 8MB (to forbid the P-1 assignments), we still get P-1 assignments when we use the proxy. The solution was not to use the proxy, in spite of the fact that we are "losing" gpu72 credit if we take the assignments directly from PrimeNet.

Also, and this may or may not be related: when a bunch of P-1 results are reported, one of them always appears as "expired" (usually the first, but not always), and all the others appear as "completed" on the server. This is certainly a bug; almost harmless, but still a bug. I say almost, because the effect is that the expired assignment is not deleted from worktodo, and there is a risk it will be worked again (wasting time) if the computer is not manually attended.

fivemack 2015-10-20 11:01

No proxy, I'm running mprime directly.

aurashift 2015-10-20 14:46

[QUOTE=retina;413136] Perhaps the real question here should be why is TF being given out to ordinary CPUs as a default work type?[/QUOTE]

I've seen this a lot too. Anytime you go from say, 12 threads up to 24, and assign LL or DC it'll take TF instead. The only way to fix it is to let it run through its workload, or set the job type in the web interface, quit and rejoin. Seems like a bug instead of a feature to me.

edit:meant to quote retina and kladner, edited quote.

Madpoo 2015-10-20 18:17

[QUOTE=aurashift;413156]I've seen this a lot too. Anytime you go from say, 12 threads up to 24, and assign LL or DC it'll take TF instead. The only way to fix it is to let it run through its workload, or set the job type in the web interface, quit and rejoin. Seems like a bug instead of a feature to me.[/QUOTE]

I'll admit that I have no idea why it would do that. :smile: I do my work manually: getting assignments, updating the worktodo files, and reporting the results. That's easier for me since most of the time I'm picking and choosing assignments using queries of the DB.

I'd have to take a peek at what the local.txt file looks like when adding additional workers to the mix. If I had to guess, I'd think it defaults to "whatever makes the most sense" and the server looks at the system and gives it whatever. I can't imagine it would hand out TF to a CPU worker, but who knows.

If you've ever changed that for your other workers to pick first time/DC/whatever, then add additional workers, could just be that something is defaulting in a way that's not ideal.

If we can get George's attention on this he could probably spit out the answer before I type another keystroke, so before I go poking around I'll see if he's available to weigh in.

Madpoo 2015-10-20 19:26

[QUOTE=Madpoo;413181]...
If we can get George's attention on this he could probably spit out the answer before I type another keystroke, so before I go poking around I'll see if he's available to weigh in.[/QUOTE]

Okay, I lied... I poked a little at the client side to see what it does.

I have a system with 2 workers and when I first check the box to use Primenet, both workers default to "whatever makes the most sense". There aren't any worker specific entries in prime.txt.

Then I set worker #1 to first time, #2 to double-check, and I get this in prime.txt:
[CODE][Worker #1]
WorkPreference=100

[Worker #2]
WorkPreference=101[/CODE]

I then add a 3rd worker, set it to do double-checks and I get this as expected:
[CODE][Worker #3]
WorkPreference=101[/CODE]

I add a 4th worker and don't set the type of work to do, and there is no "Worker #4" entry in prime.txt.

I then go ahead and let it connect to primenet (I had it blocked up 'til now)...
Workers 1 and 2 already had some assignments so nothing new there. Worker #3 got a double-check like I told it to. Worker #4 now gets this entry in prime.txt:
[CODE][Worker #4]
WorkPreference=0[/CODE]

Which is the same as "whatever makes the most sense". It got a double-check assigned.

So, I tried to replicate some of the things y'all mentioned with adding workers, but I never did get any TF.

I admit, it's not exhaustive... I tried other scenarios but it always seemed to properly add either a work pref of zero to my new workers, or it wasn't there at all and defaulted to zero the next time it talked to primenet or the client restarted.

I'd suggest looking at your prime.txt file to see what kind of work preference is set for each worker. Here's the key for what's what:
[CODE]type  description
0     what makes sense [default]
1     trial factoring LMH
2     trial factoring
4     factor P-1 large
5     factor ECM small
6     factor ECM Fermat
100   LL first test
101   LL double-check
102   LL test for world record
103   LL test 10+ million digits
104   LL test 100+ million digits
105   LL test with no factoring[/CODE]

(not all of those may be implemented on the client and/or the server)
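For anyone who would rather not read the file by eye, here is a rough Python sketch that pulls the "WorkPreference=" entries out of the text of a prime.txt and decodes them with the key above. The function names, the "main" label for lines that appear before any "[Worker #N]" header, and the sample text are made up for illustration; only the section layout and the codes come from this thread.

```python
import re

# Work-preference codes as listed in the key above.
WORK_TYPES = {
    0: "what makes sense [default]",
    1: "trial factoring LMH",
    2: "trial factoring",
    4: "factor P-1 large",
    5: "factor ECM small",
    6: "factor ECM Fermat",
    100: "LL first test",
    101: "LL double-check",
    102: "LL test for world record",
    103: "LL test 10+ million digits",
    104: "LL test 100+ million digits",
    105: "LL test with no factoring",
}

SECTION = re.compile(r"^\[(.+)\]$")


def work_preferences(text):
    """Map each section name to its WorkPreference value.

    Lines before the first [section] header are filed under "main"
    (a label chosen here, not something the client uses).
    """
    prefs = {}
    section = "main"
    for raw in text.splitlines():
        line = raw.strip()
        header = SECTION.match(line)
        if header:
            section = header.group(1)
        elif line.startswith("WorkPreference="):
            prefs[section] = int(line.split("=", 1)[1])
    return prefs


def describe(code):
    """Translate a numeric code into the description from the key."""
    return WORK_TYPES.get(code, f"unknown code {code}")


# Sample text mirroring the entries shown earlier in this post.
sample = """\
[Worker #1]
WorkPreference=100

[Worker #2]
WorkPreference=101
"""

for name, code in work_preferences(sample).items():
    print(f"{name}: {code} ({describe(code)})")
```

A worker that has no entry at all simply won't show up in the result, which is the case to look out for.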

ATH 2015-10-21 02:51

The difference with your test is probably that they just have the "WorkPreference=" in prime.txt not in local.txt under each worker.

Madpoo 2015-10-21 03:46

[QUOTE=ATH;413206]The difference with your test is probably that they just have the "WorkPreference=" in prime.txt not in local.txt under each worker.[/QUOTE]

That's where I saw the "WorkPreference=" entries, in the prime.txt under each worker, not in local.txt. I suppose it would have made more sense for it to be in the local.txt along with other worker specific things (# of threads, affinity, etc) but I just called it like I saw it. :smile:

(in other words, if people are adding those entries to the local.txt where they might logically seem to go, they'd be wrong)

ATH 2015-10-21 05:09

[QUOTE=Madpoo;413212]That's where I saw the "WorkPreference=" entries, in the prime.txt under each worker, not in local.txt. I suppose it would have made more sense for it to be in the local.txt along with other worker specific things (# of threads, affinity, etc) but I just called it like I saw it. :smile:

(in other words, if people are adding those entries to the local.txt where they might logically seem to go, they'd be wrong)[/QUOTE]

Sorry, I forgot there are "[Worker #1]" sections in prime.txt as well, so I thought you were talking about local.txt. But they probably had "WorkPreference=" in the main section of prime.txt, not down under each worker.

Madpoo 2015-10-21 15:28

[QUOTE=ATH;413216]Sorry I forgot there is "[Worker #1]" sections in prime.txt as well, so I thought you were talking about local.txt. But they probably had "WorkPreference=" in the main section of prime.txt not down under each worker.[/QUOTE]

Gotcha.

And to be honest, I've taken a peek a couple of times at the assignment code on the server and there are a lot of decisions happening in there... it's looking at all kinds of variables about the speed, reliability and history of the CPU making the assignment request, what kinds of work preferences it has, the currently available pool of exponents in the different categories, etc. It was enough to make my head spin a bit, to the point where I found it nearly impossible to manually walk through the process and predict what kind of work and exponent it would offer up.

However, that being said, I can't imagine any circumstance where it would normally assign TF work to a CPU... it should almost always find at least a doublecheck assignment for it unless that worker is specifically set to get TF work.

As best I could figure, it'll go down the list, and if there simply aren't any exponents of the requested type (like first time checks, record breaking stuff), it falls back to the next best thing, and so on until finally it'll spit out a TF assignment. But like I said, I just don't see that happening unless no exponents at all were available for any kind of LL work.
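To make that "go down the list" idea concrete, here is a toy Python sketch of a fallback chain. It is not the server's code: the real logic also weighs CPU speed, reliability and history, and both the ordering in FALLBACK_ORDER and the function name are assumptions made purely for illustration.

```python
# Hypothetical fallback order (an assumption, not taken from the server):
# world record LL, first-time LL, double-check, then trial factoring.
FALLBACK_ORDER = [102, 100, 101, 2]


def pick_work_type(requested, available):
    """Return the first work type, starting at `requested` in the fallback
    order, that still has exponents available; None if nothing is left.

    `available` maps a work-type code to the number of free exponents.
    A requested type that is not in the list starts from the top.
    """
    start = FALLBACK_ORDER.index(requested) if requested in FALLBACK_ORDER else 0
    for work_type in FALLBACK_ORDER[start:]:
        if available.get(work_type, 0) > 0:
            return work_type
    return None


# First-time LL requested, none free, double-checks available.
print(pick_work_type(100, {100: 0, 101: 500, 2: 10000}))
# Only TF left in the pool.
print(pick_work_type(100, {2: 5}))
```

In this toy model TF only comes out when every LL category ahead of it is empty, which is why a TF assignment for a CPU worker looks so odd.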

That's not to say it couldn't happen as a result of some strange twist in the decision tree that I'm missing, or a flaw in the client that makes it default to requesting TF work for new workers on a system. Basically I'm saying the server is probably okay in its handling of things, and it's something on the client side that needs a tweak.

Best way to tell would be to capture the request itself where an LL assignment is expected but it got back TF instead. Looking at the request from the client would reveal all, however that would mean the client side would need to be capturing that traffic with Wireshark or something as it's communicating with Primenet.

The server logs might give a clue... I'm not sure how much of the request is passed in the URL itself using all of those query parameters listed in the Primenet API. Might be worth looking at.

Madpoo 2015-10-21 16:08

[QUOTE=Madpoo;413242]...
The server logs might give a clue... I'm not sure how much of the request is passed in the URL itself using all of those query parameters listed in the Primenet API. Might be worth looking at.[/QUOTE]

Okay, well, I just looked at the server logs. I can't see what gets sent back to the client, but the URL that it logs has the basic request info, like the CPU's unique identifier and which CPU (worker) is requesting work.

On the server itself it keeps track of what type of work each worker prefers.

I tracked down the machine fivemack was having this problem with and here's what I can tell from looking at the logs from last week:
Oct 15 @ 20:19 UTC: request to update the computer info (version, cpu details, etc), followed by an update to an existing assignment
Oct 15 @ 22:50 UTC: request to set the number of worker threads to 12
Oct 15 @ 22:50-22:53 UTC: a bunch of requests to get assignments for workers 1-12
For the rest of Oct 15 the system was returning its TF results and getting new ones

Flash forward to Oct 17 @ 20:20 UTC when that system sends a bunch of requests to update the individual worker preferences. It's actually @ 20:28 UTC when I see the request come in to set all worker types to "101" (double check).

Essentially all I can see is that mprime updated the server to say "hey, I've got 12 workers now" and then started requesting assignments for them all.

Based on the fact that these are new workers without any preference set for them at all, it boils down to what the default on the server would be for them and then the type of CPU it is if indeed it defaults to "whatever makes the most sense".

It wasn't until 2 days later that the program communicated with the server and set the preference.

So, I guess for now if you're adding more workers, you'll want to set the work type again *after* doing so, so it can communicate that info to the server. That system from fivemack had previously set the number of workers to 1 on Oct 1, and then set the work type to DC back on Oct 4, but apparently adding the additional workers without any preference was causing the issue.
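The ordering problem is easy to see if the log entries are laid out as a timeline. This is a minimal Python sketch under stated assumptions: the tuple layout and the event names are invented, the Oct 4 entry has no time of day in the log summary (midnight is a placeholder), and every assignment request is stamped with the start of the 22:50-22:53 window because the per-worker times aren't given.

```python
from datetime import datetime


def requests_before_preference(events):
    """Return workers that asked for work before any preference was
    recorded for them, in the order they were first seen."""
    has_preference = set()
    flagged = []
    for when, kind, worker in sorted(events, key=lambda e: e[0]):
        if kind == "set_preference":
            has_preference.add(worker)
        elif kind == "get_assignment":
            if worker not in has_preference and worker not in flagged:
                flagged.append(worker)
    return flagged


# Stand-in for the log entries summarized above (layout is hypothetical).
events = [(datetime(2015, 10, 4), "set_preference", 1)]  # time of day unknown
events += [(datetime(2015, 10, 15, 22, 50), "get_assignment", w) for w in range(1, 13)]
events += [(datetime(2015, 10, 17, 20, 28), "set_preference", w) for w in range(1, 13)]

print(requests_before_preference(events))
```

With these entries, workers 2 through 12 come out flagged: they all requested work two days before any preference reached the server.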

Now that gives me more to go on... I can look at how it handles updates to the number of workers and see if it's creating new database entries with a default type of "0" (whatever makes the most sense) or if maybe they're not created until an option is actually set for them and would return NULL for any requests to get that worker's preference.

It could even be a race condition I suppose, if the num of workers is set and then requests for assignments are made immediately after. Probably not though since the requests for each worker came in over a period of several minutes, and it surely wouldn't have taken more than a split second to create some new entries for each additional worker.

Madpoo 2015-10-21 17:00

[QUOTE=Madpoo;413250]...
Now that gives me more to go on... I can look at how it handles updates to the number of workers and see if it's creating new database entries with a default type of "0" (whatever makes the most sense) or if maybe they're not created until an option is actually set for them and would return NULL for any requests to get that workers preference.
[/QUOTE]

Worked my way through the process that handles a request to update the number of workers... it does indeed create a new DB entry for each new one, and it will either use the user's default work type (if set), or "whatever makes the most sense".
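In rough Python terms, the behavior described here looks something like the sketch below. It only models the default-picking step; the function name, the list representation of per-worker preferences, and the assumption that existing workers keep their settings are mine, not the server's.

```python
WHATEVER_MAKES_SENSE = 0  # work preference code 0


def resize_workers(current_prefs, new_count, user_default=None):
    """Return a preference list with `new_count` entries.

    Existing entries are kept (an assumption); each new worker gets the
    user's default work type if one is set, otherwise code 0.
    """
    default = WHATEVER_MAKES_SENSE if user_default is None else user_default
    kept = list(current_prefs[:new_count])
    return kept + [default] * (new_count - len(kept))


# One worker set to double-checks, no user default, grown to 12 workers.
print(resize_workers([101], 12))
```

Under this model the eleven new workers would all start at 0, so any TF they receive has to come from how the server interprets "whatever makes the most sense" for that CPU.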

In fivemack's case, there's no "user default work type" set, so it should have defaulted to whatever makes sense. Maybe it actually is assigning TF work in some cases... that'd be weird.

So, yeah... not really sure what happened there. Could be that it thought the CPU was good for nothing more than TF? It's one of these: "AMD Opteron(tm) Processor 6168"


All times are UTC.
