mersenneforum.org

Mark Rose 2013-10-24 21:27

Stuck assignment
 
So in the assignments page, I have:

Manual 50000387 LL TF 69 72 2013-10-23 15:06 1 16.73

But if you look at [url=http://www.mersenne.org/report_exponent/?exp_lo=50000387]mersenne.org for 50000387[/url] it shows complete to 72.

chalsall 2013-10-24 21:46

Yeah... Not quite sure how that happened.

I'm guessing that since you tend to ask for the lowest TF level, you've managed to discover a "temporal window" bug on GPU72, between the time a low LL is completed and its release back to the system.

Please feel free to throw that candidate back. I'll drill down when I have some time (sorry -- currently dealing with several screaming children...).

blahpy 2013-10-24 21:47

I'm thinking that what happened is you reserved 69 to 72, but then you or MISFIT or whatever submitted it as 69 to 70, 70 to 71, 71 to 72. Since these are technically different assignments from a single 69 to 72, it wouldn't have taken it off the list.

edit: That, or what chalsall said.

Mark Rose 2013-10-25 01:11

[QUOTE=chalsall;357326]Yeah... Not quite sure how that happened.

I'm guessing that since you tend to ask for the lowest TF level, you've managed to discover a "temporal window" bug on GPU72, between the time a low LL is completed and its release back to the system.

Please feel free to throw that candidate back. I'll drill down when I have some time (sorry -- currently dealing with several screaming children...).[/QUOTE]

I'll hold onto it until it expires so no one else wastes time factoring it. Hopefully you can look into it by then.

I ask for up to 72 via mfloop.py. But if I see anything lower, like I did earlier today, I'll manually grab it all and stick it on the GTX 760.

[QUOTE=blahpy;357327]I'm thinking that what happened is you reserved 69 to 72, but then you or MISFIT or whatever submitted it as 69 to 70, 70 to 71, 71 to 72. Since these are technically different assignments from a single 69 to 72, it wouldn't have taken it off the list.

edit: That, or what chalsall said.[/QUOTE]

It's possible mfloop.py did that. It submits 70->72 jobs all the time without issue, but those are in the 64-69M range. I'll manually merge the result lines and see what happens... and nope, didn't clear it.
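
For anyone wanting to try the same kind of merge, here is a rough Python sketch. It assumes each result has already been reduced to an (exponent, from_bits, to_bits) tuple; that tuple layout and the name merge_tf_ranges are made up for illustration and are not how mfloop.py, mfaktc, or GPU72 actually represent results.

```python
from typing import List, Tuple

# Hypothetical representation: (exponent, from_bits, to_bits)
TFRange = Tuple[int, int, int]


def merge_tf_ranges(results: List[TFRange]) -> List[TFRange]:
    """Collapse contiguous per-bit TF results into one range per exponent."""
    merged: List[TFRange] = []
    for exponent, lo, hi in sorted(results):
        # Extend the previous entry only if it is the same exponent and
        # ends exactly where this one starts.
        if merged and merged[-1][0] == exponent and merged[-1][2] == lo:
            prev = merged.pop()
            merged.append((exponent, prev[1], hi))
        else:
            merged.append((exponent, lo, hi))
    return merged


steps = [(50000387, 69, 70), (50000387, 70, 71), (50000387, 71, 72)]
print(merge_tf_ranges(steps))  # [(50000387, 69, 72)]
```

Only strictly contiguous steps are joined, so a gap (say 69 to 70 and 71 to 72) stays as two separate entries.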

Mark Rose 2013-11-04 16:19

I've got the same thing now with [URL="http://www.mersenne.org/report_exponent/?exp_lo=50000243"]50000243[/URL].

Seems to be a problem with the 50M range.

chalsall 2013-11-04 16:40

[QUOTE=Mark Rose;358374]Seems to be a problem with the 50M range.[/QUOTE]

Arg!!! Sorry.

I don't understand how you're being assigned LLTF in the 50M range. It must be a race condition, or some other Stupid Programmer Error on my part.

I've "told" Spidy to watch that range more closely; perhaps that will help.

Could you please tell me how you're asking for these assignments? Are you using the GPU72 manual assignment page, MISFIT, or some other automatic assignment methodology? This will help me drill down on this issue.

What is a bit strange is you're the only one who is experiencing this. But then, "strange" is where problems are found, and discoveries are made. :)

chalsall 2013-11-04 17:01

[QUOTE=chalsall;358376]Could you please tell me how you're asking for these assignments?[/QUOTE]

OK, I went through some of the system's logs, and see that you're using (either directly or through a spider) the manual assignment page.

I've added a quick hack to ensure that nothing below 60M is assigned for LLTFing, which should prevent this issue. Although I would really like to understand how it occurred in the first place -- I suspect a race condition between when an LL is completed by a GPU72 worker and when the candidate is released back to Primenet.
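
Conceptually, a guard like that amounts to something along these lines. This is only an illustrative Python sketch with made-up names (LLTF_MIN_EXPONENT, eligible_for_lltf) and a made-up third exponent; it is not the actual GPU72 code.

```python
from typing import Iterable, List

# "Nothing below 60M" from the post; the constant name is hypothetical.
LLTF_MIN_EXPONENT = 60_000_000


def eligible_for_lltf(candidates: Iterable[int],
                      floor: int = LLTF_MIN_EXPONENT) -> List[int]:
    """Keep only exponents at or above the floor for LLTF assignment."""
    return [p for p in candidates if p >= floor]


# The two stuck 50M exponents from this thread plus an invented 64M one.
print(eligible_for_lltf([50000387, 50000243, 64123457]))  # [64123457]
```

A floor like this hides the symptom without explaining the suspected race, which is why the logs still need a closer look.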

I will drill down further into the logs when I have some more time; currently I don't.

Mark Rose 2013-11-04 17:33

In both cases, the assignments were fetched with mfloop.py from [url]https://github.com/teknohog/primetools[/url], which was called from cron like this:

5 * * * * /home/lol/primetools/mfloop.py -e 72 -u shifted -p lolno -U shifted -P nuh-uh -n 4 -t 0 -w /home/lol/mfaktc

Mark Rose 2013-11-04 17:35

Is there anything special that happens at 5 minutes past the hour? I offset my cron calls to be nicer on the server, assuming most people would pick 0.

chalsall 2013-11-04 17:51

[QUOTE=Mark Rose;358386]In both cases, the assignments were fetched with mfloop.py...[/QUOTE]

Thank you for that information. Useful. The script (which I haven't studied) appears to fetch only one assignment at a time, and thus makes requests more frequently than MISFIT. This supports my theory that what we're seeing here is a race condition.

To be clear, this was my error. teknohog has brought a tool to Linux users which I had promised, but wasn't able to deliver because of other pressing matters. I thank him for his work and contribution.

[QUOTE=Mark Rose;358386]Is there anything special that happens at 5 minutes past the hour?[/QUOTE]

Only that GPU72 doesn't "talk" to Primenet between 55 minutes after the hour and 10 minutes after the hour. Primenet is busy during that time.

[QUOTE=Mark Rose;358386]I offset my cron calls to be nicer on the server, assuming most people would pick 0.[/QUOTE]

It would actually be better if you set your cron job to run at some time between 30 and 45 minutes after the hour.
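
As a rough illustration of the timing, here is a Python sketch. It is not anything GPU72 or mfloop.py actually runs; the function names are hypothetical, and how the exact boundary minutes (55 and 10) are treated is an assumption.

```python
def in_primenet_quiet_window(minute: int) -> bool:
    """True if the minute falls in the window where GPU72 does not talk to
    Primenet: from :55 through the top of the hour up to :10.
    Assumption: :55 counts as inside the window, :10 as outside."""
    if not 0 <= minute <= 59:
        raise ValueError("minute must be in 0..59")
    return minute >= 55 or minute < 10


def in_recommended_window(minute: int) -> bool:
    """True if the minute is in the suggested 30 to 45 range."""
    return 30 <= minute <= 45


for m in (0, 5, 35):
    print(m, in_primenet_quiet_window(m), in_recommended_window(m))
# 0 True False
# 5 True False
# 35 False True
```

Both minute 0 and the "polite" offset of minute 5 land inside the quiet window, while something like minute 35 avoids it.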

Mark Rose 2013-11-04 18:59

[QUOTE=chalsall;358390]Thank you for that information. Useful. The script (which I haven't studied) appears to fetch only one assignment at a time, and thus makes requests more frequently than MISFIT. This supports my theory that what we're seeing here is a race condition.

To be clear, this was my error. teknohog has brought a tool to Linux users which I had promised, but wasn't able to deliver because of other pressing matters. I thank him for his work and contribution.
[/QUOTE]

It does fetch multiple assignments in a single execution. I don't know whether it makes multiple API calls; I haven't checked, as it "just works". If you're seeing multiple requests from me, it may be because I'm running multiple copies, one for each card I have. I also run mfloop.py hourly because I like to see my stats updated more often.

[QUOTE]
Only that GPU72 doesn't "talk" to Primenet between 55 minutes after the hour and 10 minutes after the hour. Primenet is busy during that time.

It would actually be better if you set your cron job to run at some time between 30 and 45 minutes after the hour.[/QUOTE]

Done :)


All times are UTC.