mersenneforum.org (https://www.mersenneforum.org/index.php)
GPU to 72 status... (https://www.mersenneforum.org/showthread.php?t=16263)

chalsall 2020-05-16 15:40

[QUOTE=Runtime Error;545507]It seems that a notebook instance likes to first start a new exponent upon launch, and if it finishes, it will move on to any partially completed jobs. But unless it gets a T4, it will not finish the job within 12 hours. I currently have a handful of partially completed 81-bit jobs, and this evening's notebooks just started fresh exponents. Do you have any advice?[/QUOTE]

A new instance will first be re-issued work partially completed, in descending order. However, an assignment is only re-issued after no updates for 30 minutes.

This can lead to a bit of a "queue" if someone does the "Factory Reset" trick a few times -- anything which hasn't had work done on it is thrown back into the pool, but if the instance isn't reset within two minutes it will report back some progress, and then the candidate is held until completion.
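A minimal sketch of the re-issue rule described above, not the actual GPU to 72 server code. The `Assignment` class, its field names, and the choice to sort "descending" by percent complete are all assumptions made for illustration; only the 30-minute threshold comes from the post.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# From the post: an assignment is only re-issued after 30 minutes without updates.
REISSUE_AFTER = timedelta(minutes=30)


@dataclass
class Assignment:          # hypothetical record, not the real schema
    exponent: int
    percent_done: float    # 0.0 means no progress was ever reported
    last_update: datetime


def reissue_candidates(assignments, now):
    """Return stale, partially completed assignments, most complete first.

    Assignments with no reported progress are skipped here because, per the
    post, they are thrown back into the general pool instead of being held.
    """
    stale = [
        a for a in assignments
        if a.percent_done > 0.0 and now - a.last_update >= REISSUE_AFTER
    ]
    # Assumption: "descending order" means descending percent complete.
    return sorted(stale, key=lambda a: a.percent_done, reverse=True)
```

Under these assumptions, a job that reported progress five minutes ago is not handed to a fresh instance yet, which is how cycling instances quickly can leave several partial jobs waiting.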

My advice is to just stick with it -- all work will (eventually) be completed.

P.S. Oh, also... I set up a P-1 assignment for myself in 332M as a test. 17 days on the lone CPU core... I don't think it will make sense to make this worktype available to the Colab instances.

Runtime Error 2020-05-16 16:37

[QUOTE=chalsall;545531]A new instance will first be re-issued work partially completed, in descending order. However, an assignment is only re-issued after no updates for 30 minutes.

This can lead to a bit of a "queue" if someone does the "Factory Reset" trick a few times -- anything which hasn't had work done on it is thrown back into the pool, but if the instance isn't reset within two minutes it will report back some progress, and then the candidate is held until completion.

My advice is to just stick with it -- all work will (eventually) be completed.

P.S. Oh, also... I set up a P-1 assignment for myself in 332M as a test. 17 days on the lone CPU core... I don't think it will make sense to make this worktype available to the Colab instances.[/QUOTE]

Got it. I've been cycling until I get P100s, that makes sense. Thanks!

And wow, 17 days of compute = 34+ days of wall-clock time with the session time limits. Ouch!
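The arithmetic behind that estimate, as a rough sketch. It assumes an instance gets about 12 hours of runtime per calendar day (the 12-hour figure mentioned earlier in the thread); the function name and parameters are made up for illustration.

```python
import math


def wall_clock_days(compute_days, session_hours_per_day=12.0):
    """Calendar days needed when only part of each day is usable compute."""
    compute_hours = compute_days * 24
    return math.ceil(compute_hours / session_hours_per_day)


print(wall_clock_days(17))  # 17 * 24 / 12 = 34
```

The "+" in "34+" covers anything that makes the effective session shorter than 12 hours, such as restarts or early disconnects.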

chalsall 2020-05-17 17:19

[QUOTE=Runtime Error;545534]Got it. I've been cycling until I get P100s, that makes sense. Thanks![/QUOTE]

So, I've been thinking about how to handle these long-running tasks (and the restarting of same) in a better way.

One thing which could be done immediately would be to drop the number of assigned tasks per instance down to two, instead of three. The chances of a factor being found immediately after the start of the next job are very, very small, so mfaktc would (almost) never run out of work.

However, something to put out there... It would also be possible to only assign a single job at a time. The downside to this is there would be about 30 seconds of wasted compute between a job being finished, the next job being fetched, and mfaktc starting up again (along with the short self-test).

Thoughts? Perhaps make this optional, on a per-instance basis?

axn 2020-05-17 17:45

[QUOTE=chalsall;545633]The downside to this is there would be about 30 seconds of wasted compute between a job being finished, the next job being fetched, and mfaktc starting up again (along with the short self-test).[/QUOTE]
The horror, the horror!

James Heinrich 2020-05-17 17:58

I think the benefit of having fewer half-done assignments hanging around is well worth losing a minute a day or thereabouts.
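A back-of-the-envelope sketch of that trade-off. The 30-second gap is the figure quoted above; the 6-hour job length and 12 hours of runtime per day are assumed values chosen only to show the order of magnitude.

```python
def idle_fraction(job_seconds, gap_seconds=30.0):
    """Share of each job cycle spent idle between assignments."""
    return gap_seconds / (job_seconds + gap_seconds)


def idle_seconds_per_day(job_seconds, gap_seconds=30.0, run_hours_per_day=12.0):
    """Total idle time per day when one job is fetched at a time."""
    cycle = job_seconds + gap_seconds
    jobs_per_day = run_hours_per_day * 3600 / cycle
    return jobs_per_day * gap_seconds


# Hypothetical 6-hour job: roughly 0.14% idle, about a minute per day.
print(f"{idle_fraction(6 * 3600):.4%}")
print(f"{idle_seconds_per_day(6 * 3600):.1f} s")
```

Shorter jobs pay the same 30 seconds more often, so the idle percentage grows as job length shrinks.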

Runtime Error 2020-05-17 18:51

[QUOTE=chalsall;545633]So, I've been thinking about how to handle these long-running tasks (and the restarting of same) in a better way.

One thing which could be done immediately would be to drop the number of assigned tasks per instance down to two, instead of three. The chances of a factor being found immediately after the start of the next job are very, very small, so mfaktc would (almost) never run out of work.

However, something to put out there... It would also be possible to only assign a single job at a time. The downside to this is there would be about 30 seconds of wasted compute between a job being finished, the next job being fetched, and mfaktc starting up again (along with the short self-test).

Thoughts? Perhaps make this optional, on a per-instance basis?[/QUOTE]

Sweet! I currently have 9 in-progress assignments at the 81-bit level. One of them has been stuck at 95% for a few days, presumably due to me cycling. The one-assignment per notebook rule for 81-bits would be welcomed!

Also, I just got kicked out of Colab after 1hr30min :sad:

Uncwilly 2020-05-18 03:47

GPUto72 is not seeing (in the stats) the factor found up in the 332M range on one of my Colab sessions.

kladner 2020-05-18 04:10

[QUOTE=Runtime Error;545646]Sweet! I currently have 9 in-progress assignments at the 81-bit level. One of them has been stuck at 95% for a few days, presumably due to me cycling. The one-assignment per notebook rule for 81-bits would be welcomed!

[U]Also, I just got kicked out of Colab after 1hr30min [/U]:sad:[/QUOTE]
That hasn't happened to me, yet. However, I am feeling that "THEY" are onto my dodges around usage limits. The GPU Nazi is giving me a lot of "No GPU for You!" I only seem to be able to run GPUs on my paid account, and that has shot its wad, at the moment.

EDIT: But when I turned off GPU in the settings of all the notebooks, and then tried running, I got 4 CPU/P-1 instances running with RAM showing 'BUSY' on all of them. I think I was getting more push-back when I left GPUs enabled, when I knew it wasn't going to let me have them. There were more 'excess sessions' warnings and it seemed the system would only allow one. This preemptive disabling of GPUs might help with free accounts, too. I can usually run two notebooks on free accounts. Trying to run multiple accounts simultaneously has not worked out for me at least, so 2 (free) or 4 (paid) notebooks are the limits in my experience. I would be happy to hear if others have gotten more going. (I know that chalsall and others are running multiple instances through VPNs and other more abstruse means, but I am not running in that class so far. Ignorance and laziness stand in the way.)

chalsall 2020-05-18 20:37

[QUOTE=Runtime Error;545646]The one-assignment per notebook rule for 81-bits would be welcomed![/QUOTE]

OK, version 0.423 is now in production. It assigns one job at a time.

A bit of wasted compute (as a %) for the shorter running jobs, but I'm too busy at the moment on other things to make this smarter. I have mapped out a solution of giving the next assignment a few minutes before the first expires, but it will take a little while to implement.
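One way the "fetch the next assignment a few minutes early" idea could look, as a sketch only. It is not the planned implementation: the function name, the 5-minute lead time, and the linear time-remaining estimate are all assumptions.

```python
PREFETCH_LEAD_SECONDS = 5 * 60  # hypothetical stand-in for "a few minutes"


def should_prefetch(percent_done, elapsed_seconds,
                    lead_seconds=PREFETCH_LEAD_SECONDS):
    """True once the estimated time remaining is within the lead time.

    Assumes progress is roughly linear, so the total runtime can be
    extrapolated from the elapsed time and the percent completed.
    """
    if percent_done <= 0.0:
        return False  # no progress yet, so nothing to extrapolate from
    if percent_done >= 100.0:
        return True
    total_estimate = elapsed_seconds * 100.0 / percent_done
    remaining = total_estimate - elapsed_seconds
    return remaining <= lead_seconds
```

With a check like this, a long job holds only one assignment for nearly its whole run, and the second one is requested just before the gap would otherwise occur.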

chalsall 2020-05-18 20:50

[QUOTE=Uncwilly;545684]GPUto72 is not seeing (in the stats) the factor found up in the 332M range on one of my Colab sessions.[/QUOTE]

Copy. Will drill down in the next 48 hours.

Uncwilly 2020-05-18 23:19

[QUOTE=chalsall;545767]Copy. Will drill down in the next 48 hours.[/QUOTE]
Specifically, I should have said the graph.


All times are UTC.