mersenneforum.org

Google Diet Colab Notebook (https://www.mersenneforum.org/showthread.php?t=24646)

chalsall 2020-01-16 16:10

[QUOTE=axn;535234]Do you run CPU-only work when you don't get a GPU backend? If not, can you (on the theory that this will help the Google overmind unlearn your usage pattern)?[/QUOTE]

In order, not currently, and yes -- when I have some cycles. /Really/ busy with "real" work at the moment...
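A minimal sketch of the fallback discussed above: check whether the session actually received a GPU backend and pick CPU-only work otherwise. It assumes `nvidia-smi` is on the PATH of GPU-backed sessions; the worker names `gpu_tf` and `cpu_only` are hypothetical placeholders, not part of the GPU72 notebook.

```python
import shutil
import subprocess


def gpu_available(run=subprocess.run, which=shutil.which):
    """Return True if nvidia-smi exists and lists at least one GPU."""
    if which("nvidia-smi") is None:
        return False
    try:
        # "nvidia-smi -L" prints one line per device, e.g. "GPU 0: Tesla T4 (UUID: ...)"
        result = run(["nvidia-smi", "-L"], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and "GPU " in result.stdout


def choose_worker(has_gpu):
    """Pick which (hypothetical) job to launch for this session."""
    return "gpu_tf" if has_gpu else "cpu_only"
```

Calling `choose_worker(gpu_available())` at the top of a notebook cell would then decide whether to start trial factoring or fall back to CPU work.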

Dylan14 2020-01-16 18:39

[QUOTE=chalsall;535232]
This tells me that while Colab have not blocked the GPU72_TF Notebook specifically, it does appear there is some kind of a "lifetime" GPU quota. It will be interesting to observe what happens to others doing GPU work in the future.
[/QUOTE]


Have you tried making a copy of the GPU72 notebook and seeing if the copy is able to fetch a GPU (perhaps after a short wait)? Maybe the "limits" are notebook-specific.

chalsall 2020-01-16 18:57

[QUOTE=Dylan14;535250]Have you tried making a copy of the GPU72 notebook and seeing if the copy is able to fetch a GPU (perhaps after a short wait)? Maybe the "limits" are notebook-specific.[/QUOTE]

I tried creating a brand new Notebook on an entirely new account. Killed in ~25 minutes, never to be given a GPU backend ever again...

kriesel 2020-01-16 19:56

[QUOTE=kriesel;535214]21+ hours and running on dual instance, cpu and K80 gpu, mprime and mfaktc, top -d 120 last output:[CODE]top - 14:11:57 [B]up 21:07[/B], ...
[/CODE](No complaints, just have never seen it do that before.)[/QUOTE]

New personal record duration: 26 hours 9-11 minutes for one Colab cpu/gpu session, as indicated by top after termination.

[CODE]top - 19:14:11 up [B]1 day, 2:09[/B], 0 users, load average: 1.02, 1.04, 1.00

%Cpu(s): 1.2 us, 1.3 sy, 49.9 ni, 47.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 13335188 total, 9153228 free, 1100268 used, 3081692 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 12126072 avail Mem


421 root 30 10 564648 416868 7928 S 99.5 3.1 598:38.16 mprime
416 root 20 0 36.335g 94060 82872 S 0.7 0.7 4:19.71 mfaktc.e+
127 root 20 0 653980 153232 62208 S 0.5 1.1 3:32.14 python3
27 root 20 0 405212 101000 26152 S 0.1 0.8 0:54.88 jupyter-+
9 root 20 0 691856 63768 24944 S 0.0 0.5 0:24.02 node
9739 root 20 0 1388044 37948 25264 S 0.0 0.3 0:03.63 drive [/CODE]
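The session length above is read off the `up` field of the last `top` header. A small sketch of that conversion, assuming the header formats seen in this thread (`up 21:07` and `up 1 day, 2:09`, optionally wrapped in `[B]` tags) plus the `up N min` form; the function name `uptime_minutes` is made up for illustration.

```python
import re

# Matches "up 21:07", "up 1 day, 2:09", "up 3 days, 0:05" and "up 12 min".
_UPTIME = re.compile(
    r"\bup\s+(?:(?P<days>\d+)\s+days?,\s+)?"
    r"(?:(?P<h>\d+):(?P<m>\d{2})|(?P<min>\d+)\s+min)"
)


def uptime_minutes(header):
    """Convert the uptime field of a top header line to whole minutes."""
    # Drop the BBCode bold tags that appear in the pasted output.
    clean = re.sub(r"\[/?B\]", "", header)
    match = _UPTIME.search(clean)
    if match is None:
        raise ValueError("no uptime field found in: %r" % header)
    days = int(match.group("days") or 0)
    if match.group("min") is not None:
        hours, minutes = 0, int(match.group("min"))
    else:
        hours, minutes = int(match.group("h")), int(match.group("m"))
    return days * 24 * 60 + hours * 60 + minutes
```

For the final header quoted above (`up 1 day, 2:09`) this gives 1569 minutes, that is 26 hours 9 minutes. With `top -d 120` the display only refreshes every two minutes, which is why the duration is stated as a 9-11 minute range.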

Uncwilly 2020-01-17 15:15

Just got a T4.

Let's see how long it lasts......

[Update] Completed a 101,000,000 73->74. Must be AFT for a while. Will report in when I get back.

De Wandelaar 2020-01-17 19:04

Also got a T4 on one of my accounts. Was a long long time …

On the other account, I receive the message 'No GPU backend available' as usual

[Edit] : Killed after 20 min

chalsall 2020-01-17 19:21

[QUOTE=De Wandelaar;535366]Also got a T4 on one of my accounts. Was a long long time … On the other account, I receive the message 'No GPU backend available' as usual[/QUOTE]

Hmmm... I haven't seen a GPU backend for over a week, so I've been having my four local accounts running CPU work. Today after almost exactly five hours of running, all four disconnected.

I just happened to notice this within minutes and reconnected. Three of the four reported the "No GPU available; want a CPU?" message, which resulted in the CPU tasks continuing without manually restarting.

However, one of them didn't do this, and simply gave me a back-end. On a hunch, I ran the GPU72_TF Section, and it happily started TF'ing away on a full T4 (read: ~1700 GHzD/D) which survived for 36 minutes.

So, perhaps the quota is a sliding value? :unsure:
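For scale, a quick back-of-the-envelope sketch of how much work such a short session delivers, using only the figures quoted in this post (about 1700 GHzD/D for 36 minutes). The helper name is hypothetical, and the rate is the approximate value reported here, not a measured constant.

```python
def session_ghz_days(rate_ghzd_per_day, minutes):
    """Work done in one session: throughput times the fraction of a day it ran."""
    return rate_ghzd_per_day * minutes / (24 * 60)


# About 1700 GHzD/D sustained for 36 minutes, as reported above.
print(session_ghz_days(1700, 36))  # 42.5
```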

Uncwilly 2020-01-17 20:56

[QUOTE=Uncwilly;535344]Just got a T4.
[Update] Completed a 101,000,000 73->74 Must be AFT for a while. Will report in when I get back.[/QUOTE]About 35 minutes total.

chalsall 2020-01-17 21:45

[QUOTE=chalsall;535368]So, perhaps the quota is a sliding value? :unsure:[/QUOTE]

OK... More empirical data on this theory...

I tried getting GPU backends on my two SOCKS-tunneled accounts, which Google /shouldn't/ know are associated with my four Barbados-based accounts. Neither had gotten a backend for about two weeks.

The RPi environment got a full T4, and is still running after an hour. The CentOS virtualized environment also got a full T4, but disconnected after 32 minutes. Reconnecting was successful, and is still running after 40 minutes.

I'll let everyone know how long these survive for, but there's hope! :tu:

Edit: RPi lasted for 70 minutes; CentOS lasted for 60.

kriesel 2020-01-18 01:40

Google drive preview file size limit?
 
I was routinely viewing log files for Colab runs accumulating by append from run to run, by double clicking. Today one hit the point where it no longer worked. It's 706KB. Google Docs will open it but ordinary preview won't any longer.
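One possible workaround, sketched under assumptions: instead of appending to a single ever-growing file, let the run log roll over before it gets large. The 500,000 byte threshold below is a guess chosen to stay under the 706 KB file that stopped previewing; the actual preview limit is not known, and `make_run_logger` is an illustrative name.

```python
import logging
from logging.handlers import RotatingFileHandler


def make_run_logger(path, max_bytes=500_000, backups=5):
    """Return a logger that appends to path and rolls over near max_bytes.

    Older segments are kept as path.1, path.2, ... up to `backups` files.
    """
    logger = logging.getLogger("colab-run:%s" % path)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:  # avoid stacking handlers when a cell is re-run
        handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

Each segment then stays small enough to open on its own, at the cost of the history being split across several files.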

storm5510 2020-01-18 17:23

My instance run-times have dropped off greatly in the past month. When I started, the typical run was nine hours. I had one which ran eleven. Now, somewhere between three and four is the max. I do not run either notebook every day. I alternate between two. I do not know if this is the reason for the drop-off or if it is something else.


All times are UTC.