mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Cloud Computing (https://www.mersenneforum.org/forumdisplay.php?f=134)
-   -   Google Diet Colab Notebook (https://www.mersenneforum.org/showthread.php?t=24646)

chalsall 2019-10-05 14:11

[QUOTE=ric;527346]Are we the victims of Colab's success in crunching gpu72's allotted work? :smile: :smile:[/QUOTE]

Nope... You're the victim of yet another SPE... Sorry...

I've started the process of bringing in additional work from 73 to 74 "bits", in 97M.

This is actually a bug -- the system currently doesn't "gracefully fail" in cases like this; it should assign a different type of work range when the requested range is exhausted.

chalsall 2019-10-05 14:16

[QUOTE=EdH;527333]I see some things scroll instead of refreshing in place, such as top. This should let me play for a while. . .[/QUOTE]

Enjoy. It really is pretty cool!

And, yeah... The console is scroll-only; no full-screen refreshing, etc. In fact, even the "CR" trick of returning the cursor to the beginning of the line and then overwriting it doesn't work.
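For reference, the usual in-place update is just a bare carriage return with no newline; a minimal sketch:

```python
import sys
import time

# Classic single-line progress update: "\r" returns the cursor to column 0,
# so each write overwrites the previous one. Works in a normal terminal;
# in the Colab console every write just scrolls instead.
for pct in (0, 25, 50, 75, 100):
    sys.stdout.write(f"\rprogress: {pct:3d}%")
    sys.stdout.flush()
    time.sleep(0.05)
sys.stdout.write("\n")
```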

Actually, a thought just came to me... Although incoming (to the VM) TCP connections are firewalled, outgoing is fully allowed. And, thus, it should be possible to launch a "reverse tunnel" SSH connection to a server somewhere, and then be able to SSH into the VM and have a fully interactive shell.

I'm going to have to try that; it should work.

chalsall 2019-10-05 14:20

[QUOTE=axn;527352]Couple of questions:
1) Should I keep the browser tab open while things are running? What happens if I close the browser?[/QUOTE]

Yes. If you close the browser the VM detects this, and shuts down.

Note that on Kaggle you can "Commit" a Section (up to twice for GPU instances), which then runs in the background (for up to 9 hours). You can close your browser and the batch jobs continue to run.

[QUOTE=axn;527352]2) Tesla K80 is a 2x GPU in a single package. Anyone tried to run a second copy using device 1, to effectively double the throughput?[/QUOTE]

Doesn't work.

Each VM gets one of the two GPUs on the K80. Running "nvidia-smi" within the VM shows this.

Just FYI, this is the same over on GCE, AWS/EC2, etc.
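For anyone who wants to check for themselves, here's a minimal sketch that counts the devices "nvidia-smi -L" reports (the helper function is mine, not part of any toolkit):

```python
import subprocess

def visible_gpus(listing: str) -> int:
    """Count the 'GPU N: ...' lines printed by `nvidia-smi -L`."""
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))

try:
    # One line per device the VM can actually see; on a Colab K80
    # instance this reports a single GPU, not the K80's two dies.
    out = subprocess.run(["nvidia-smi", "-L"],
                         capture_output=True, text=True).stdout
    print(visible_gpus(out))
except FileNotFoundError:
    print("no NVIDIA driver visible on this machine")
```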

axn 2019-10-05 14:43

[QUOTE=chalsall;527364]Yes. If you close the browser the VM detects this, and shuts down.

Note that on Kaggle you can "Commit" a Section (up to twice for GPU instances), which then runs in the background (for up to 9 hours). You can close your browser and the batch jobs continue to run.



Doesn't work.

Each VM gets one of the two GPUs on the K80. Running "nvidia-smi" within the VM shows this.

Just FYI, this is the same over on GCE, AWS/EC2, etc.[/QUOTE]

Thanks for the info!

ric 2019-10-05 15:05

A few comments "from the road" after using Colab for about 3 weeks (and, as usual, YMMV):[LIST]
[*]chalsall's notebook, integrated with gpu72, rocks :cool:
[*]Colab in itself has huge potential, and not only for TF/PM1 crunching, as mentioned in some comments above
[*]during these weeks I've been served almost exclusively instances based on Tesla K80s; however, I was able to run two/three instances in parallel (same IP, same MAC address) using different gmail IDs (all of them non-professional ones)
[*]Colab instances are easily available on this side of the pond (=Europe) during nighttime (after 20:00 UTC), less so during daytime: there's likely some profiling and prioritizing applied to requests
[*]once assigned, a GPU mostly remains available for the full 12-hour period. In rare cases instances were terminated earlier, but this happened mostly during daytime
[*]FWIW, in this period I got some 10k GHzDays worth of TF: thanks, Google!
[*]notebooks are better suited to high bit-levels: I've run some TF work around 70-72 bits, in both Firefox and Chrome, but the browser's memory consumption -- driven by the progress lines, despite having set [I]PrintMode=1[/I] in [I]mfaktc.ini[/I] -- grows fast, to the order of multiple GBs, which leads to the local machine swapping and becoming unresponsive
[*]instances remain allocated and running even if the browser is minimized or moved to another workspace; when the tab is closed, the instance is usually terminated within 15 minutes or so
[*]Colab is effective even for CUDAPM1 work, even though the GHzDays yield is around two orders of magnitude less than for TF. FWIW, a P-1 run on a 52M exponent takes about the same time as it does on a Xeon Silver CPU (with the same B1/B2 bounds)[/LIST]
These comments are from the top of my mind: I'll eventually update them, should something else relevant come in.
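On the memory point: one workaround I've been considering is launching mfaktc through a small wrapper that drops most progress lines before they ever reach the notebook's output. A sketch (the "%"-based heuristic for spotting progress lines is my assumption):

```python
import subprocess
import sys

def run_filtered(cmd, every=100):
    """Run cmd, forwarding only every Nth progress line to stdout.
    Lines without a '%' (results, errors) are always forwarded, so
    the notebook cell's scrollback stays small."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    seen = 0
    for line in proc.stdout:
        if "%" in line:          # heuristic: progress lines show a percentage
            seen += 1
            if seen % every:
                continue
        sys.stdout.write(line)
    return proc.wait()

# e.g. run_filtered(["./mfaktc.exe"], every=500)
```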

chalsall 2019-10-05 16:21

So close, but no cigar...
 
[QUOTE=chalsall;527362]Actually, a thought just came to me... Although incoming (to the VM) TCP connections are firewalled, outgoing is fully allowed. And, thus, it should be possible to launch an "reverse tunnel" SSH connection to a server somewhere, and then be able to SSH into the VM and have a fully interactive shell.[/QUOTE]

I just ***had*** to try this. Got most of the way, but...[CODE]%cd ~
!mkdir .ssh
!chmod go-rwx .ssh
!ssh-keygen -t rsa -f .ssh/id_rsa -N ""
!ls -lah
!echo
!ls -lah .ssh
!echo
!cat .ssh/id_rsa.pub[/CODE]

Run, and then copy-and-paste the public key over to a machine with a public-facing IP (and SSH port). I use unprivileged accounts for my reverse tunnels, with non-standard ports. Be sure to get the permissions correct for the "authorized_keys" file(s).
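For the server side, something along these lines (the PUBKEY value is a placeholder for the key printed by ssh-keygen above):

```shell
# Append the Colab-generated public key to the tunnel account's
# authorized_keys, with the permissions sshd insists on.
PUBKEY="ssh-rsa AAAA...placeholder"
mkdir -p ~/.ssh
echo "$PUBKEY" >> ~/.ssh/authorized_keys
chmod 700 ~/.ssh                    # sshd ignores keys if .ssh is group/world accessible
chmod 600 ~/.ssh/authorized_keys
```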

Then, run this (with ports, user and server changed to your particular configuration):[CODE]!ssh -N -T -p [SSH_PORT] -R[TUNNEL_PORT]:localhost:22 tunnel_kaggle@gpu72.com -o StrictHostKeyChecking=no -i .ssh/id_rsa[/CODE]

...and a reverse tunnel should be brought up.

From an SSH session on GPU72, I then tried a "ssh -p [TUNNEL_PORT] root@localhost"; the request failed, and the tunnel dropped.

An SSH server isn't running on the instance...

Tried installing and then starting the SSH server, and got a warning about "Failed to connect to bus: No such file or directory".

Turns out that because these instances run inside Docker (or similar) containers, systemd isn't available, and so the service manager can't launch the server.

I don't have time right now to work this further, but it should be possible to bring in as a "payload" a custom-compiled OpenSSH package which will be able to launch and listen on (local) port 22. Remote clients would then be able to attach through the reverse tunnel (which does already work).
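For what it's worth, a custom build may not even be needed: since the failure is only systemd's absence, one could probably start the stock sshd by hand. An untested provisioning sketch (package names and paths assume the Ubuntu image these instances appear to run):

```shell
# Install OpenSSH server, then launch sshd directly instead of via
# systemd (which isn't running inside the container).
apt-get -qq update && apt-get -qq install -y openssh-server
mkdir -p /run/sshd                 # sshd refuses to start without this directory
echo 'root:CHOOSE_A_PASSWORD' | chpasswd
sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin yes/' /etc/ssh/sshd_config
/usr/sbin/sshd -e                  # -e: log to stderr instead of (unavailable) syslog
```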

Having a "real" interactive shell into these VMs would make payload development a whole lot easier!!!

chalsall 2019-10-05 18:24

OH MY GOD!!!
 
[QUOTE=chalsall;527371]Having a "real" interactive shell into these VMs would make payload development a whole lot easier!!![/QUOTE]

I actually got this to work!!!

I'll definitely be writing this one up! Probably easiest to do a Notebook on Github -- there are a few steps...

But this is what you see when you log into a Colab instance as root:

[CODE][chalsall@gpu72 ~]$ ssh -p [TUNNEL_PORT] root@localhost
root@localhost's password:
debug1: permanently_set_uid: 0/0
Environment:
USER=root
LOGNAME=root
HOME=/root
PATH=/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin
MAIL=/var/mail/root
SHELL=/bin/bash
TERM=xterm-256color
SSH_CLIENT=127.0.0.1 44332 22
SSH_CONNECTION=127.0.0.1 44332 127.0.0.1 22
SSH_TTY=/dev/pts/1
root@61ae30b3834b:~# who
root pts/1 Oct 5 18:19 (127.0.0.1)
root@61ae30b3834b:~# uptime
18:19:12 up 30 min, 1 user, load average: 0.02, 0.06, 0.09
root@61ae30b3834b:~# df -h
Filesystem Size Used Avail Use% Mounted on
overlay 49G 25G 22G 54% /
tmpfs 64M 0 64M 0% /dev
tmpfs 6.4G 0 6.4G 0% /sys/fs/cgroup
tmpfs 6.4G 8.0K 6.4G 1% /var/colab
/dev/sda1 55G 27G 29G 48% /etc/hosts
shm 6.0G 4.0K 6.0G 1% /dev/shm
tmpfs 6.4G 0 6.4G 0% /proc/acpi
tmpfs 6.4G 0 6.4G 0% /proc/scsi
tmpfs 6.4G 0 6.4G 0% /sys/firmware
root@61ae30b3834b:~#
[/CODE]

Edit: I almost feel guilty letting the CPU sit at 0% usage...

[CODE]top - 18:32:34 up 44 min, 1 user, load average: 0.00, 0.00, 0.02
Tasks: 9 total, 1 running, 8 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.5 us, 0.3 sy, 0.0 ni, 99.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 13335192 total, 10824260 free, 535504 used, 1975428 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 12525424 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
119 root 20 0 635844 154160 61004 S 0.7 1.2 0:11.39 python3
1 root 20 0 39196 6476 4984 S 0.0 0.0 0:00.04 run.sh
9 root 20 0 686740 55104 24888 S 0.0 0.4 0:04.09 node
24 root 20 0 403932 99240 25816 S 0.0 0.7 0:07.31 jupyter-noteboo
110 root 20 0 35888 4760 3652 S 0.0 0.0 0:00.20 tail
10710 root 20 0 56296 8776 6924 S 0.0 0.1 0:00.03 ssh
10885 root 20 0 54784 8932 6984 S 0.0 0.1 0:00.03 sshd
10893 root 20 0 18508 3524 3080 S 0.0 0.0 0:00.00 bash
11107 root 20 0 38720 3264 2800 R 0.0 0.0 0:00.03 top
[/CODE]

Happy happy dance!!! :chalsall:

axn 2019-10-06 03:04

Another question:

How do you connect your google drive in kaggle?

Prime95 2019-10-06 03:15

Thanks for the gpu72 colab scripts -- works like a champ.

Would it be possible to also install mprime and run ECM jobs from Primenet?

Dylan14 2019-10-06 04:46

[QUOTE=Prime95;527397]Thanks for the gpu72 colab scripts -- works like a champ.

Would it be possible to also install mprime and run ECM jobs from Primenet?[/QUOTE]


Indeed it is possible:


[CODE]# Notebook to run mprime on a Colab instance
import os.path
from google.colab import drive

# Mount Google Drive so the mprime directory persists across sessions
if not os.path.exists('/content/drive/My Drive'):
    drive.mount('/content/drive')

%cd '/content/drive/My Drive/'
if not os.path.exists('/content/drive/My Drive/mprime/'):
    !mkdir mprime
%cd '/content/drive/My Drive/mprime/'

# Fetch the mprime executable if we don't have it
if not os.path.exists('mprime'):
    !wget http://www.mersenne.org/ftp_root/gimps/p95v298b6.linux64.tar.gz
    !tar -zxvf p95v298b6.linux64.tar.gz

# Run mprime
# First, create prime.txt and local.txt if they don't already exist
if not os.path.exists('prime.txt'):
    !echo V24OptionsConverted=1 > prime.txt
    !echo WGUID_version=2 >> prime.txt
    !echo StressTester=0 >> prime.txt
    !echo UsePrimenet=1 >> prime.txt
    !echo DialUp=0 >> prime.txt
    # Change the user ID to your own, or use ANONYMOUS to work anonymously
    !echo V5UserID=Dylan14 >> prime.txt
    !echo Priority=1 >> prime.txt
    # Since Drive is persistent, DaysOfWork can be set as desired:
    !echo DaysOfWork=1 >> prime.txt
    # This comes from undoc.txt:
    !echo MaxExponents=1 >> prime.txt
    !echo RunOnBattery=1 >> prime.txt
    # This sets the work preference; 5 is ECM on Mersennes with no known factors
    !echo WorkPreference=5 >> prime.txt
    !echo [PrimeNet] >> prime.txt
    !echo Debug=0 >> prime.txt
    !echo ProxyHost= >> prime.txt
if not os.path.exists('local.txt'):
    !echo WorkerThreads=1 >> local.txt
    !echo CoresPerTest=2 >> local.txt
    !echo ComputerID=colab >> local.txt
    !echo Memory=8192 during 7:30-23:30 else 8192 >> local.txt

# Now run
!chmod +x mprime
!cat prime.txt
!cat local.txt
!./mprime[/CODE] Of course, change the user ID and the work preference as desired. This also lets us get past the welcome text and start computing right away.
And I can confirm that it works: I have an ECM assignment assigned to me on a computer called colab.
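As an aside, the long run of !echo lines can be collapsed into a single Python write, which sidesteps shell quoting entirely (same settings as above; adjust V5UserID and WorkPreference to taste):

```python
import os.path

# Same prime.txt contents as built above, written in one shot.
PRIME_TXT = """\
V24OptionsConverted=1
WGUID_version=2
StressTester=0
UsePrimenet=1
DialUp=0
V5UserID=ANONYMOUS
Priority=1
DaysOfWork=1
MaxExponents=1
RunOnBattery=1
WorkPreference=5
[PrimeNet]
Debug=0
ProxyHost=
"""

# Only write the file on first run; Drive keeps it afterwards.
if not os.path.exists('prime.txt'):
    with open('prime.txt', 'w') as f:
        f.write(PRIME_TXT)
```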

pinhodecarlos 2019-10-06 09:41

Anyway of not having to babysit every 12 hours?


All times are UTC.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.