mersenneforum.org  

Old 2015-08-26, 00:21   #89
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

9,767 Posts
Default

Quote:
Originally Posted by Dubslow View Post
I wouldn't let it go, this is a bug somewhere in someone's code, which means it could happen again (wasting more cpu time). We (by which I mean not me) should be looking into the root cause and possible fixes for it.
OK. Let's get "Down and Dirty". This is what the GPU72 proxy saw with regards to this:

Code:
mysql> select * from Traffic where Client like "%A3BA119A8F7E3C0F4CB2F4B11C16DCFA%" or Server like "%A3BA119A8F7E3C0F4CB2F4B11C16DCFA%";
+---------+------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------------------+---------------------+
| ID      | Type | Log  | Client                                                                                                                                                                                                           | Server                                                                                                                                                                                                           | Date                | Sent                | James               |
+---------+------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------------------+---------------------+
| 6207082 | NULL | NULL | http://v5.mersenne.org/v5server/?v=0.95&px=GIMPS&t=ga&g=621a95c6678557e6154afcde3cfeae90&c=0&ss=18467&sh=C42C8B775F6AACBDD1533ED7AD3D8585                                                                        | pnErrorResult=0
pnErrorDetail=Server assigned Lucas Lehmer primality double-check work.
g=621a95c6678557e6154afcde3cfeae90
k=A3BA119A8F7E3C0F4CB2F4B11C16DCFA
A=1
b=2
n=34698809
c=-1
w=101
sf=71
p1=1
==END==

 | 2015-08-20 11:43:46 | 0000-00-00 00:00:00 | 0000-00-00 00:00:00 | 
| 6207083 | NULL | NULL | http://v5.mersenne.org/v5server/?v=0.95&px=GIMPS&t=ap&g=621a95c6678557e6154afcde3cfeae90&k=A3BA119A8F7E3C0F4CB2F4B11C16DCFA&c=0&p=0.0000&d=86400&e=1724589&ss=6334&sh=E09A349C8FBB850985DF5C822D9B2F03           | pnErrorResult=0
pnErrorDetail=SUCCESS
==END==

                                                                                                                                                                  | 2015-08-20 11:43:47 | 0000-00-00 00:00:00 | 0000-00-00 00:00:00 | 
| 6207093 | NULL | NULL | http://v5.mersenne.org/v5server/?v=0.95&px=GIMPS&t=ap&g=621a95c6678557e6154afcde3cfeae90&k=A3BA119A8F7E3C0F4CB2F4B11C16DCFA&c=0&p=0.0000&d=86400&e=1724317&ss=24661&sh=7AFEA889EDE86DDE2BDD09F4CFA59C2E          | pnErrorResult=0
pnErrorDetail=SUCCESS
==END==

                                                                                                                                                                  | 2015-08-20 11:49:47 | 0000-00-00 00:00:00 | 0000-00-00 00:00:00 | 
| 6209157 | NULL | NULL | http://v5.mersenne.org/v5server/?v=0.95&px=GIMPS&t=ap&g=621a95c6678557e6154afcde3cfeae90&k=A3BA119A8F7E3C0F4CB2F4B11C16DCFA&stage=LL&c=0&p=0.9050&d=86400&e=1466470&ss=28218&sh=043E7AF28EADCA209183645E4F2BA502 | pnErrorResult=0
pnErrorDetail=SUCCESS
==END==

                                                                                                                                                                  | 2015-08-22 13:03:50 | 0000-00-00 00:00:00 | 0000-00-00 00:00:00 | 
| 6210202 | NULL | NULL | http://v5.mersenne.org/v5server/?v=0.95&px=GIMPS&t=ap&g=621a95c6678557e6154afcde3cfeae90&k=A3BA119A8F7E3C0F4CB2F4B11C16DCFA&stage=LL&c=0&p=15.7541&d=86400&e=1194268&ss=6599&sh=035D23194788BF56F16F269535885AFF | pnErrorResult=0
pnErrorDetail=SUCCESS
==END==

                                                                                                                                                                  | 2015-08-23 19:03:15 | 0000-00-00 00:00:00 | 0000-00-00 00:00:00 | 
| 6211574 | NULL | NULL | http://v5.mersenne.org/v5server/?v=0.95&px=GIMPS&t=ap&g=621a95c6678557e6154afcde3cfeae90&k=A3BA119A8F7E3C0F4CB2F4B11C16DCFA&stage=LL&c=0&p=36.2721&d=86400&e=801621&ss=31343&sh=DB4AA25A61AF38DB3F9D100F331C13A1 | pnErrorResult=43
pnErrorDetail=ap: no such assignment key, GUID: 621a95c6678557e6154afcde3cfeae90, key: A3BA119A8F7E3C0F4CB2F4B11C16DCFA
==END==

                                                               | 2015-08-25 09:44:24 | 0000-00-00 00:00:00 | 0000-00-00 00:00:00 | 
+---------+------+------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+---------------------+---------------------+
I am pretty sure that this doesn't release any sensitive information.

Equally, I have no idea why this candidate was assigned by Primenet and then declared invalid.

(If, on the other hand, this was a bug on my end, please let me know.)
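For anyone who wants to slice these log rows themselves, here is a minimal sketch in Python that splits one logged request URL and one response body into fields. The reading of `g` as the machine GUID and `k` as the assignment key is taken from the error text above ("GUID: ..., key: ..."); the helper names `parse_request` and `parse_response` are made up for this example and are not part of the proxy or of Primenet.

```python
from urllib.parse import urlsplit, parse_qs

# Request URL and response body copied from the last row of the log above.
URL = ("http://v5.mersenne.org/v5server/?v=0.95&px=GIMPS&t=ap"
       "&g=621a95c6678557e6154afcde3cfeae90"
       "&k=A3BA119A8F7E3C0F4CB2F4B11C16DCFA"
       "&stage=LL&c=0&p=36.2721&d=86400&e=801621"
       "&ss=31343&sh=DB4AA25A61AF38DB3F9D100F331C13A1")

BODY = ("pnErrorResult=43\n"
        "pnErrorDetail=ap: no such assignment key, "
        "GUID: 621a95c6678557e6154afcde3cfeae90, "
        "key: A3BA119A8F7E3C0F4CB2F4B11C16DCFA\n"
        "==END==\n")


def parse_request(url):
    """Return the query parameters of a logged request as a flat dict."""
    query = urlsplit(url).query
    return {name: values[0] for name, values in parse_qs(query).items()}


def parse_response(body):
    """Split a 'name=value' response body into a dict, stopping at ==END==."""
    fields = {}
    for line in body.splitlines():
        line = line.strip()
        if line == "==END==":
            break
        if "=" in line:
            name, value = line.split("=", 1)
            fields[name] = value
    return fields


req = parse_request(URL)
resp = parse_response(BODY)
print(req["t"], req["k"], req["p"], resp["pnErrorResult"])
```

Filtering the parsed rows on `req["k"]` gives the same result as the `like` query above, without depending on where the key happens to sit in the URL.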
Old 2015-08-26, 00:32   #90
TObject
 
Feb 2012

3^4·5 Posts
Cool

Quote:
Originally Posted by Madpoo View Post
Looks like someone checked in the matching DC. When that happens, any assignments are expired since they're no longer needed:
M34698809

The result that came in today did have an assignment for it. Was your assignment the one that expired back in May of 2014 but you were still working on it anyway? I'm guessing not since that one hasn't checked in since March of 2014 and was only 0.1% done at that time.
It used to be that LL tests not completed by an official client displayed their residues in lowercase letters on the status page. This one is displayed in capital letters; I wonder why the change…
Old 2015-08-26, 02:23   #91
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

22665_8 Posts
Default

Quote:
Originally Posted by TObject View Post
It used to be that LL tests not completed by an official client displayed their residues in lowercase letters on the status page. This one is displayed in capital letters; I wonder why the change…
No change. P95 was used. A related issue: there was a time when P95 would keep an exponent if work on it had already started (and would cancel/abandon those in the queue whose keys became invalid only if work on them hadn't started yet). If that behavior changed in the new versions of P95, then someone should tell me and I will immediately roll back. This test should have been kept running and recorded as a DC (or TC, etc.), with the credit given to the user. We do lots of (credited) TCs anyhow. I would be quite upset if this happened to me!

I hope one can still use N/A as a key to work on some unassigned expo (otherwise I would have to disconnect from the net during the tests).

OTOH, magic with DCs assigned through the GPU72 proxy used to happen in the past - that is why I gave up, and I am now requesting DC work directly from PrimeNet - this was discussed at length in a nearby topic. From the log shown above, however, it seems that this is not a proxy problem, and it was not a PrimeNet thing either: the exponent was completed, so the key became nil. But P95, the local P95, should continue the work if it was started (and replace the key with N/A), or at least not delete the checkpoint files, so the user may continue if he wants. Otherwise it is bullshit: I* will look for exponents close to completion (80% done or so), put my GPUs on them (a good GPU needs less than half a day for one DC) and sabotage the credit for the assignee. I know a few guys to whom I would do that.

------
* The "general I", not me in particular.

Last fiddled with by LaurV on 2015-08-26 at 02:39 Reason: many...
Old 2015-08-26, 03:24   #92
Madpoo
Serpentine Vermin Jar
 
Jul 2014

3,313 Posts
Default

Quote:
Originally Posted by LaurV View Post
...OTOH, magic with DCs assigned through the GPU72 proxy used to happen in the past - that is why I gave up, and I am now requesting DC work directly from PrimeNet - this was discussed at length in a nearby topic. From the log shown above, however, it seems that this is not a proxy problem, and it was not a PrimeNet thing either: the exponent was completed, so the key became nil. But P95, the local P95, should continue the work if it was started (and replace the key with N/A), or at least not delete the checkpoint files, so the user may continue if he wants. Otherwise it is bullshit: I* will look for exponents close to completion (80% done or so), put my GPUs on them (a good GPU needs less than half a day for one DC) and sabotage the credit for the assignee. I know a few guys to whom I would do that ...
Oooo, evil.

In this case, I'm not convinced the assignment ever showed up on the Primenet server (as in, it didn't get logged in the assignment table, regardless of what the response code back to the GPU72 proxy was). I'll have to look back at the raw logs for that date and see what showed up.

The reason I say that... let's say you have an assignment for something, and it gets poached. The way the server handles that is to "soft" expire the assignment. It gets a date stamp in the database for when it expired, and gives an expiration reason of "poached".

If the user continues the assignment and checks it in, the result may not really be needed, but the user will still get credit for it.

This is true in other cases... if you're doing a test and someone "poaches" it by doing extra TF or P-1 and finds a factor, they check it in and all LL/DC assignments get expired the same way, but if you check in your result eventually, it's accepted and you'll get credit, it was just not needed anymore.

In this case though, I don't even see a "soft expired" entry. The only time an assignment actually gets removed from the database is when the result is checked in, or if it's manually removed using the website, or I guess if the user quits GIMPS? In other words, purposeful things.

I guess there could be something else going on... not sure what. Maybe the logs will help me when I look through them.
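To make the difference between a "soft" expire and an actual removal concrete, here's a rough Python sketch. It is only a toy model of the behavior described above, not the real Primenet schema or code: the class names, the field names, and the `AID-1` key are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


@dataclass
class Assignment:
    # Hypothetical row layout; the real table is not shown in this thread.
    aid: str
    exponent: int
    expired_at: Optional[datetime] = None
    expire_reason: Optional[str] = None


class AssignmentTable:
    def __init__(self) -> None:
        self.rows: Dict[str, Assignment] = {}

    def assign(self, aid: str, exponent: int) -> None:
        self.rows[aid] = Assignment(aid, exponent)

    def soft_expire(self, exponent: int, reason: str) -> None:
        """Stamp every live assignment for this exponent, but keep the row."""
        now = datetime.now(timezone.utc)
        for row in self.rows.values():
            if row.exponent == exponent and row.expired_at is None:
                row.expired_at = now
                row.expire_reason = reason

    def hard_delete(self, aid: str) -> None:
        """Remove the row outright (result checked in, manual unreserve, ...)."""
        self.rows.pop(aid, None)

    def check_in(self, aid: str) -> Tuple[bool, bool]:
        """Return (credited, still_needed) for a result on a known assignment."""
        row = self.rows.pop(aid)
        return True, row.expired_at is None


table = AssignmentTable()
table.assign("AID-1", 34698809)
table.soft_expire(34698809, "poached")
reason_after_poach = table.rows["AID-1"].expire_reason  # row is still there
credited, still_needed = table.check_in("AID-1")        # credit, but not needed
```

The point of the sketch is that a poached assignment leaves a stamped row behind, so a missing row suggests something deliberately deleted it.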
Old 2015-08-26, 03:46   #93
Madpoo
Serpentine Vermin Jar
 
Jul 2014

3313_10 Posts
Default

Quote:
Originally Posted by chalsall View Post
OK. Let's get "Down and Dirty". This is what the GPU72 proxy saw with regards to this:

Code:
| 6207082 | NULL | NULL | http://v5.mersenne.org/v5server/?v=0.95&px=GIMPS&t=ga&g=621a95c6678557e6154afcde3cfeae90&c=0&ss=18467&sh=C42C8B775F6AACBDD1533ED7AD3D8585 | pnErrorResult=0
pnErrorDetail=Server assigned Lucas Lehmer primality double-check work.
g=621a95c6678557e6154afcde3cfeae90
k=A3BA119A8F7E3C0F4CB2F4B11C16DCFA
A=1
b=2
n=34698809
c=-1
w=101
sf=71
p1=1
==END==

| 2015-08-20 11:43:46 | 0000-00-00 00:00:00 | 0000-00-00 00:00:00 |
Well, I found the Primenet log entry:
Code:
2015-08-20 15:43:45 107.155.126.203 GET /v5server/ v=0.95&px=GIMPS&t=ga&g=621a95c6678557e6154afcde3cfeae90&c=0&ss=18467&sh=C42C8B775F6AACBDD1533ED7AD3D8585 80 - <your ip address was here> - - v5.mersenne.org 200 0 0 363 202 202
I can't see the server response though. One thing I do notice... the proxy server must not be synced to an NTP time source like Primenet is. Apart from the time zone offset, there's a 1 second difference (the Primenet stamps are UTC).

So... yeah, I can't really explain what happened there. Primenet got the assignment okay. I guess if I'm curious enough I can restore a previous backup of the DB from a few days ago and look at that table just to see that, yeah, it really was in there and just got expunged for whatever reason.
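For the curious, the clock comparison can be sketched in a few lines of Python. The one assumption here is that the proxy logs local time at UTC-4; that isn't stated anywhere in the logs, it's just the offset that makes the two stamps line up.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the proxy writes local time at UTC-4 (not confirmed in the thread).
PROXY_TZ = timezone(timedelta(hours=-4))

# "Date" column of the t=ga row in the proxy log.
proxy_stamp = datetime(2015, 8, 20, 11, 43, 46, tzinfo=PROXY_TZ)

# Timestamp of the same request in the Primenet web log (UTC).
primenet_stamp = datetime(2015, 8, 20, 15, 43, 45, tzinfo=timezone.utc)

# Subtracting two timezone-aware datetimes compares them as instants.
skew = proxy_stamp - primenet_stamp
print(skew.total_seconds())  # 1.0
```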
Old 2015-08-26, 04:28   #94
Madpoo
Serpentine Vermin Jar
 
Jul 2014

CF1_16 Posts
Default

Quote:
Originally Posted by Madpoo View Post
So... yeah, I can't really explain what happened there. Primenet got the assignment okay. I guess if I'm curious enough I can restore a previous backup of the DB from a few days ago and look at that table just to see that, yeah, it really was in there and just got expunged for whatever reason.
So, I just restored a copy of the DB from a couple days ago (Sunday). I see that assignment in there:
Code:
aid: A3BA119A8F7E3C0F4CB2F4B11C16DCFA
exponent: 34698809
assigned: 2015-08-20 15:43:46.413
last updated: 2015-08-22 17:03:50.233
next expected: 2015-08-23 17:03:50.233
est. completion: 2015-09-08 16:25:00.233
% done: 0.9
stage: LL
So... it did get assigned. And then for whatever reason it got reassigned to someone else on "2015-08-24 01:48:54.020"

The server wouldn't have reassigned it if that original assignment were in place and still going. It's impossible... if it's assigned, it's not available for assignment to anyone else (excepting overlapping ECM work).

My best guess? Somehow, something cancelled that assignment made on the 20th. And given its low #, it would have been reassigned almost right away.

Think you could look in the GPU72 logs again and see if there was anything passing through there later on that may have sent a request to cancel that assignment? I'd roll through some diff logs as well as the full backup, but that's like work for me.

If I look at the IIS logs, I can see that on the 23rd it checked in again and updated the % done to 15.75%. And then on the 24th at 01:22:06 (UTC) it connected and sent a t=au for that assignment. I'm not sure what t=au is... t=ap is an update to the assignment for the latest progress, I saw that, but I'd have to dig through that code and see what t=au means. I'm guessing it's an assignment unassign? It came from the same GPU72 proxy IP address (50.21.x.x)

On the 25th it tried to connect again and update with the t=ap command. That must be when it found out it was unassigned.

Hope that helps. And now everyone has a weirdly fresh, blood-and-guts look at how the servers all talk to each other. LOL

This is why you shouldn't share your assignment IDs with anyone... someone could send a command to unassign that work, leaving you high and dry. Well, at least in this case, since someone else had gotten the assignment and finished it real quick. Otherwise it would have accepted the unknown assignment ID (or generated a new one?)
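If it helps to see the whole sequence in one place, here's a small Python sketch that merges the events mentioned so far into a single UTC timeline. The Primenet times are used as given; the two events I only have proxy times for are shifted by four hours, which assumes the proxy logs at UTC-4 (that is a guess that happens to match the other rows, not something confirmed). The event labels are my own wording.

```python
from datetime import datetime, timedelta

# Assumption: proxy log times are UTC-4; Primenet times are already UTC.
PROXY_OFFSET = timedelta(hours=4)


def utc(stamp):
    """Parse a 'YYYY-MM-DD HH:MM:SS' stamp that is already in UTC."""
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")


def proxy_to_utc(stamp):
    """Shift a proxy-log stamp to UTC under the UTC-4 assumption."""
    return utc(stamp) + PROXY_OFFSET


# Deliberately listed out of order; sorting puts them on one timeline.
events = [
    (proxy_to_utc("2015-08-25 09:44:24"), "t=ap rejected: pnErrorResult=43"),
    (utc("2015-08-24 01:48:54"), "exponent reassigned to someone else"),
    (utc("2015-08-20 15:43:46"), "t=ga: assignment created"),
    (utc("2015-08-24 01:22:06"), "t=au: assignment removed"),
    (proxy_to_utc("2015-08-23 19:03:15"), "t=ap: progress 15.75%"),
    (utc("2015-08-22 17:03:50"), "t=ap: progress 0.9%"),
]

timeline = sorted(events)
for when, what in timeline:
    print(when.isoformat(sep=" "), what)

# How long the exponent sat unassigned before it was handed out again.
gap = utc("2015-08-24 01:48:54") - utc("2015-08-24 01:22:06")
print(gap)
```

Laid out this way, the t=au lands between the last successful t=ap and the reassignment, which fits the "something cancelled it" guess.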
Old 2015-08-26, 04:43   #95
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

7^2×197 Posts
Default

Quote:
Originally Posted by Madpoo View Post
The reason I say that... let's say you have an assignment for something, and it gets poached. The way the server handles that is to "soft" expire the assignment. It gets a date stamp in the database for when it expired, and gives an expiration reason of "poached".

If the user continues the assignment and checks it in, the result may not really be needed, but the user will still get credit for it.

This is true in other cases... if you're doing a test and someone "poaches" it by doing extra TF or P-1 and finds a factor, they check it in and all LL/DC assignments get expired the same way, but if you check in your result eventually, it's accepted and you'll get credit, it was just not needed anymore.

In this case though, I don't even see a "soft expired" entry. The only time an assignment actually gets removed from the database is when the result is checked in, or if it's manually removed using the website, or I guess if the user quits GIMPS? In other words, purposeful things.
Yes, what you describe is exactly how it should be, and this is the correct and fair way. Let me do quadruple checks if I want to waste my time and money (maybe a checkbox with "I know what I am doing", or a local.txt key, should ensure this - but that is exactly why the N/A key was introduced, wasn't it?), and in case I am doing some legally assigned DC, don't interrupt me (and rob me of the credit) if some poacher does my work faster.

The Prime95 I know used to erase the work from the worktodo if it was not needed anymore and had not been started. There should not be any changes in the newer versions (are there any?). We talked in the past about "notifying the user" (maybe he doesn't want to waste his time anymore on not-needed work, and he will willingly switch to the next exponent in the queue), and even about canceling the current test (to save time), but if something like that was (or will be) implemented in P95, then the checkpoint files should not be deleted, to allow the user to continue the work later if he wants.
Old 2015-08-26, 13:13   #96
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

9,767 Posts
Default

Quote:
Originally Posted by Madpoo View Post
Think you could look in the GPU72 logs again and see if there was anything passing through there later on that may have sent a request to cancel that assignment?
What I posted above is everything related to that candidate / assignment. There was no "unassign" request made -- the client was told it no longer had it when it checked in.

Further, there's no record of that candidate being assigned to anyone except kladner.

Definitely strange.
Old 2015-08-26, 18:43   #97
Madpoo
Serpentine Vermin Jar
 
Jul 2014

3,313 Posts
Default

Quote:
Originally Posted by chalsall View Post
What I posted above is everything related to that candidate / assignment. There was no "unassign" request made -- the client was told it no longer had it when it checked in.

Further, there's no record of that candidate being assigned to anyone except kladner.

Definitely strange.
Well, I looked through the code and I can confirm that a t=au parameter when calling the Primenet API will indeed delete the assignment.

That request came through the GPU72 proxy, so I guess the question is where did that originate? I've never used the proxy, but I'm assuming clients point to it, and it will pass along requests to Primenet if/when needed, or handle assignments using its own pool of work it has at the ready.

If I had to guess, a request from the end client to unassign (t=au) some work came in and that was passed along to the Primenet server. Your logs around that same time might show that original request since it did pass through the proxy, so you could see if it's the same IP address as "kladner" normally uses. Feel free to PM me for any additional info you might need to help narrow it down... we may be boring everyone else here with all of the inside baseball chats (or maybe they enjoy it?)
Old 2015-08-26, 19:11   #98
Madpoo
Serpentine Vermin Jar
 
Jul 2014

3,313 Posts
Default

Quote:
Originally Posted by Madpoo View Post
...blah blah...
Well, I looked again at the web logs, and comparing the different updates (t=ap) to the unassign request (t=au) revealed something interesting.

The assignment updates all had the CPU guid of kladner's machine. The unassign request actually came from a different user's machine: KYOJI_KAMEI, and the public CPU name of the machine in question is "unreserve"

Which is weird, because it had that assignment ID. Now, the assignment IDs are 128-bit GUIDs so the odds of any 2 being the same are pretty astronomically low... and all of them did come through the GPU72 proxy.

I'll once again assume the best intentions, especially since that other user is a prolific contributor, and somehow maybe it got the wrong assignment ID and that got things jumbled up when it was trying to unassign one of its actual exponents? Especially since you didn't see that specific request when you ran the query... makes it sound like that wasn't the AID his machine sent (but that's definitely the AID that Primenet received).

Here's the partial GUID of the other cpu so you could query for anything from it around that UTC time when it was removed:
20fc911afa622d279ac97...

Oh, and I remembered where I saw the Primenet API info:
Web API Specification ... it's in the menu of the site under the more info/help area.
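The comparison I did by eye could be scripted roughly like this in Python: group the logged query strings by assignment key and flag any key that shows up under more than one CPU GUID. The first query is a trimmed copy of one of the t=ap requests from the proxy log earlier in the thread. The second is a hypothetical reconstruction of the unassign request, built only from what I've said here (t=au, the same key, and the partial GUID, which is left truncated on purpose). The function name is made up for the example.

```python
from collections import defaultdict
from urllib.parse import parse_qs

KEY = "A3BA119A8F7E3C0F4CB2F4B11C16DCFA"

queries = [
    # Trimmed copy of a progress update from the proxy log.
    "v=0.95&px=GIMPS&t=ap&g=621a95c6678557e6154afcde3cfeae90"
    "&k=A3BA119A8F7E3C0F4CB2F4B11C16DCFA&c=0&p=0.0000&d=86400&e=1724589",
    # Hypothetical reconstruction of the unassign request; the GUID is
    # only partially known, so the trailing "..." is kept as-is.
    "t=au&g=20fc911afa622d279ac97...&k=A3BA119A8F7E3C0F4CB2F4B11C16DCFA",
]


def keys_used_by_multiple_guids(query_strings):
    """Map each assignment key to its GUIDs, keeping keys seen with 2+ GUIDs."""
    guids_by_key = defaultdict(set)
    for qs in query_strings:
        params = parse_qs(qs)
        if "k" in params and "g" in params:
            guids_by_key[params["k"][0]].add(params["g"][0])
    return {k: g for k, g in guids_by_key.items() if len(g) > 1}


shared = keys_used_by_multiple_guids(queries)
for key, guids in shared.items():
    print(key, sorted(guids))
```

Run against a full day of logs, a check like this would show whether other assignment IDs have also been touched by more than one machine.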
Old 2015-08-26, 20:16   #99
kladner
 
"Kieren"
Jul 2011
In My Own Galaxy!

10011110101110_2 Posts
Default

Quote:
The unassign request actually came from a different user's machine: KYOJI_KAMEI, and the public CPU name of the machine in question is "unreserve"

Which is weird, because it had that assignment ID.
Having a machine name of 'unreserve' is kind of strange. It reminds me of the instance of P95 I keep named 'P95_Stress.' Still, this user did not complete the assignment. Apparently, this user just caused it to be thrown back.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.