mersenneforum.org  

Old 2019-10-05, 14:11   #199
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
23066₈ Posts

Quote:
Originally Posted by ric
Are we the victims of Colab's success in crunching gpu72's allotted work?
Nope... You're the victim of yet another SPE... Sorry...

I've started the process of bringing in additional work from 73 to 74 "bits", in 97M.

This is actually a bug -- the system currently doesn't "gracefully fail" in cases like this; it should assign a different type of work range when the requested range is exhausted.
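For illustration only, the "graceful fail" described above could look something like the sketch below. The pool names, the `assign_work` helper and the sample assignment are all hypothetical; this is not GPU72's actual code.

```python
from typing import Dict, List, Optional, Tuple


def assign_work(pools: Dict[str, List[str]], requested: str,
                fallbacks: List[str]) -> Optional[Tuple[str, str]]:
    """Return (work_type, assignment), trying fallbacks if a pool is empty."""
    for work_type in [requested, *fallbacks]:
        pool = pools.get(work_type, [])
        if pool:
            # Hand out the oldest assignment from the first non-empty pool.
            return work_type, pool.pop(0)
    return None  # every pool is exhausted; the caller decides what to report


# Hypothetical data: the requested range is empty, another range is not.
demo_pools = {"requested_range": [], "other_range": ["assignment-1"]}
print(assign_work(demo_pools, "requested_range", ["other_range"]))
```

The point of the sketch is only that an empty pool leads to a different kind of work rather than to an error.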
Old 2019-10-05, 14:16   #200
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
2×67×73 Posts

Quote:
Originally Posted by EdH
I see some things scroll instead of refreshing in place, such as top. This should let me play for a while. . .
Enjoy. It really is pretty cool!

And, yeah... The console is scroll-only; no full-screen refreshing, etc. In fact, even the "CR" trick (a carriage return to send the cursor back to the beginning of the line, which is then overwritten) doesn't work.
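As a small sketch of the difference: on a normal terminal a carriage return (`\r`) lets a progress line be overwritten in place, while on a scroll-only console each update has to be a new line. The `report_progress` helper below is a made-up name for illustration.

```python
import sys


def report_progress(done: int, total: int, in_place: bool) -> str:
    """Build one progress update for either kind of console."""
    line = f"{done}/{total} ({100 * done // total}%)"
    # "\r" moves the cursor to column 0 so the next write overwrites the
    # line; a scroll-only console needs a plain newline-terminated line.
    return "\r" + line if in_place else line + "\n"


# Scroll-only style, as needed in the notebook console described above.
for step in range(0, 101, 25):
    sys.stdout.write(report_progress(step, 100, in_place=False))
    sys.stdout.flush()
```

With `in_place=False` it also makes sense to print updates less often, since every one of them stays in the output.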

Actually, a thought just came to me... Although incoming (to the VM) TCP connections are firewalled, outgoing is fully allowed. And, thus, it should be possible to launch a "reverse tunnel" SSH connection to a server somewhere, and then be able to SSH into the VM and have a fully interactive shell.

I'm going to have to try that; it should work.
Old 2019-10-05, 14:20   #201
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
9782₁₀ Posts

Quote:
Originally Posted by axn
Couple of questions:
1) Should I keep the browser tab open while things are running? What happens if I close the browser?
Yes. If you close the browser the VM detects this, and shuts down.

Note that on Kaggle you can "Commit" a Section (up to twice for GPU instances), which then runs in the background (for up to 9 hours). You can close your browser and the batch jobs continue to run.

Quote:
Originally Posted by axn
2) Tesla K80 is a 2x GPU in a single package. Anyone tried to run a second copy using device 1, to effectively double the throughput?
Doesn't work.

Each VM gets one of the two GPUs on the K80. Running "nvidia-smi" within the VM shows this.

Just FYI, this is the same over on GCE, AWS/EC2, etc.
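A quick way to check this from a notebook cell, as a rough sketch: `nvidia-smi -L` prints one `GPU n: ...` line per visible device, so counting those lines gives the number of GPUs the VM can see. The helper names are made up for this example.

```python
import subprocess


def count_gpus(listing: str) -> int:
    """Count devices in `nvidia-smi -L` output (one 'GPU n: ...' line each)."""
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))


def visible_gpu_count() -> int:
    """Run nvidia-smi; report 0 if the tool is missing or fails."""
    try:
        result = subprocess.run(["nvidia-smi", "-L"], capture_output=True,
                                text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return 0
    return count_gpus(result.stdout)


print(visible_gpu_count())
```

If this prints 1, there is no "device 1" for a second copy of the program to use.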
Old 2019-10-05, 14:43   #202
axn
Jun 2003
13DF₁₆ Posts

Quote:
Originally Posted by chalsall
Yes. If you close the browser the VM detects this, and shuts down.

Note that on Kaggle you can "Commit" a Section (up to twice for GPU instances), which then runs in the background (for up to 9 hours). You can close your browser and the batch jobs continue to run.



Doesn't work.

Each VM gets one of the two GPUs on the K80. Running "nvidia-smi" within the VM shows this.

Just FYI, this is the same over on GCE, AWS/EC2, etc.
Thanks for the info!
Old 2019-10-05, 15:05   #203
ric
Jul 2004
Milan, Ita
2·7·13 Posts

A few comments "from the road" after using Colab for about 3 weeks (and, as usual, YMMV):
  • chalsall's notebook, integrated with gpu72, rocks
  • Colab itself has huge potential, and not only for TF/PM1 crunching, as mentioned in some comments above
  • during these weeks I've been served almost exclusively instances based on the Tesla K80; however, I was able to run two or three instances in parallel (same IP, same MAC address) using different Gmail IDs (all of them non-professional ones)
  • Colab's instances are easily available on this side of the pond (= Europe) during nighttime (after 20:00 UTC), less so during daytime: there's likely some profiling and prioritizing of requests
  • once a GPU is assigned, it mostly remains available for the full 12-hour period. In rare cases instances were terminated earlier, but this happened mostly during daytime
  • FWIW, in this period I got some 10k GHzDays worth of TF: thanks, Google!
  • notebooks are better suited to high bit levels: I've run some TF work around 70-72 bits, in both Firefox and Chrome, but memory consumption, driven by the progress lines (despite PrintMode=1 being set in mfaktc.ini), grows fast, to the order of multiple GBs, which leads to the local machine swapping and becoming unresponsive
  • instances remain allocated and running even if the browser is minimized or moved to another workspace; when the tab is closed, the instance is usually terminated within 15 minutes or so
  • Colab is effective even for CUDAPM1 work, even though the GHzDays yield is around two orders of magnitude lower than for TF. FWIW, a P-1 run on a 52M exponent takes about the same time as on a Xeon Silver CPU (with the same B1/B2 bounds)

These comments are off the top of my head: I'll update them should something else relevant come up.
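One possible way to limit the output growth mentioned in the list above is to forward only every n-th progress line to the notebook. This is just a sketch with a made-up `every_nth` helper; it is not something the gpu72 notebook is known to do.

```python
from typing import Iterable, Iterator


def every_nth(lines: Iterable[str], n: int) -> Iterator[str]:
    """Yield the first line and then every n-th line after it."""
    for index, line in enumerate(lines):
        if index % n == 0:
            yield line


# Stand-in data; in practice the lines would come from the program's stdout.
sample = [f"progress line {i}" for i in range(10)]
for line in every_nth(sample, 4):
    print(line)
```

A real filter would also have to pass through any result lines untouched; their format is not assumed here.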
Old 2019-10-05, 16:21   #204
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
2636₁₆ Posts
So close, but no cigar...

Quote:
Originally Posted by chalsall
Actually, a thought just came to me... Although incoming (to the VM) TCP connections are firewalled, outgoing is fully allowed. And, thus, it should be possible to launch a "reverse tunnel" SSH connection to a server somewhere, and then be able to SSH into the VM and have a fully interactive shell.
I just ***had*** to try this. Got most of the way, but...
Code:
%cd ~
!mkdir .ssh
!chmod go-rwx .ssh
!ssh-keygen -t rsa -f .ssh/id_rsa -N ""
!ls -lah
!echo
!ls -lah .ssh
!echo
!cat .ssh/id_rsa.pub
Run, and then copy-and-paste the public key over to a machine with a public-facing IP (and SSH port). I use unprivileged accounts for my reverse tunnels, with non-standard ports. Be sure to get the permissions correct for the "authorized_keys" file(s).
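The exact permission rules sshd applies are not spelled out in this thread. The sketch below assumes the common convention that "authorized_keys" (and the ".ssh" directory) must not be writable by group or others; the helper name is made up.

```python
import os
import stat


def is_private_enough(path: str) -> bool:
    """True if neither group nor others can write to `path`."""
    mode = os.stat(path).st_mode
    # Assumption: only the owner should hold write permission.
    return (mode & (stat.S_IWGRP | stat.S_IWOTH)) == 0


# Example: check the current directory.
print(is_private_enough("."))
```

On the server this would be run against the tunnel account's ".ssh" directory and its "authorized_keys" file.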

Then, run this (with ports, user and server changed to your particular configuration):
Code:
!ssh -N -T -p [SSH_PORT] -R[TUNNEL_PORT]:localhost:22 tunnel_kaggle@gpu72.com -o StrictHostKeyChecking=no -i .ssh/id_rsa
...and a reverse tunnel should be brought up.
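If the command has to be generated for several setups, it can be assembled in Python first. This is only a sketch: the ports, user and server below are placeholders, and `reverse_tunnel_command` is a made-up helper that mirrors the options used above.

```python
import shlex


def reverse_tunnel_command(ssh_port: int, tunnel_port: int, user: str,
                           server: str, key_path: str) -> list:
    """Build the argument list for a reverse tunnel back to local port 22."""
    return [
        "ssh", "-N", "-T",                      # no remote command, no TTY
        "-p", str(ssh_port),                    # SSH port on the server
        "-R", f"{tunnel_port}:localhost:22",    # server port -> VM port 22
        "-o", "StrictHostKeyChecking=no",
        "-i", key_path,
        f"{user}@{server}",
    ]


# Placeholder values only; substitute your own configuration.
cmd = reverse_tunnel_command(2222, 9000, "tunnel_user", "server.example",
                             ".ssh/id_rsa")
print(shlex.join(cmd))
```

The printed line can be pasted into a notebook cell after a leading "!".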

From an SSH session on GPU72, I then tried a "ssh -p [TUNNEL_PORT] root@localhost" and the request failed, and the tunnel dropped.

An SSH server isn't running on the instance...
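A quick way to confirm this from inside the instance, sketched with a made-up `port_is_open` helper: try a TCP connection to local port 22 and see whether anything accepts it.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out or unreachable: nothing usable is listening.
        return False


print(port_is_open("127.0.0.1", 22))
```

If this prints False, the tunnel has nothing to forward connections to.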

Tried installing and then starting the SSH server, and got a warning about "Failed to connect to bus: No such file or directory".

Turns out because these instances are running in Docker (or some similar) VMs, the syslog functionality isn't available and so the server won't launch.

I don't have time right now to work this further, but it should be possible to bring in as a "payload" a custom-compiled OpenSSH package which will be able to launch, and listen on (local) port 22. Remote clients would then be able to attach through the reverse tunnel (which does already work).

Having a "real" interactive shell into these VMs would make payload development a whole lot easier!!!
Old 2019-10-05, 18:24   #205
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
2·67·73 Posts
OH MY GOD!!!

Quote:
Originally Posted by chalsall
Having a "real" interactive shell into these VMs would make payload development a whole lot easier!!!
I actually got this to work!!!

I'll definitely be writing this one up! Probably easiest to do a Notebook on Github -- there are a few steps...

But this is what you see when you log into a Colab instance as root:

Code:
[chalsall@gpu72 ~]$ ssh -p [TUNNEL_PORT] root@localhost
root@localhost's password: 
debug1: permanently_set_uid: 0/0
Environment:
  USER=root
  LOGNAME=root
  HOME=/root
  PATH=/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin
  MAIL=/var/mail/root
  SHELL=/bin/bash
  TERM=xterm-256color
  SSH_CLIENT=127.0.0.1 44332 22
  SSH_CONNECTION=127.0.0.1 44332 127.0.0.1 22
  SSH_TTY=/dev/pts/1
root@61ae30b3834b:~# who
root     pts/1        Oct  5 18:19 (127.0.0.1)
root@61ae30b3834b:~# uptime
 18:19:12 up 30 min,  1 user,  load average: 0.02, 0.06, 0.09
root@61ae30b3834b:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
overlay          49G   25G   22G  54% /
tmpfs            64M     0   64M   0% /dev
tmpfs           6.4G     0  6.4G   0% /sys/fs/cgroup
tmpfs           6.4G  8.0K  6.4G   1% /var/colab
/dev/sda1        55G   27G   29G  48% /etc/hosts
shm             6.0G  4.0K  6.0G   1% /dev/shm
tmpfs           6.4G     0  6.4G   0% /proc/acpi
tmpfs           6.4G     0  6.4G   0% /proc/scsi
tmpfs           6.4G     0  6.4G   0% /sys/firmware
root@61ae30b3834b:~#
Edit: I almost feel guilty letting the CPU sit at 0% usage...

Code:
top - 18:32:34 up 44 min,  1 user,  load average: 0.00, 0.00, 0.02
Tasks:   9 total,   1 running,   8 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.5 us,  0.3 sy,  0.0 ni, 99.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 13335192 total, 10824260 free,   535504 used,  1975428 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 12525424 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                         
    119 root      20   0  635844 154160  61004 S   0.7  1.2   0:11.39 python3                         
      1 root      20   0   39196   6476   4984 S   0.0  0.0   0:00.04 run.sh                          
      9 root      20   0  686740  55104  24888 S   0.0  0.4   0:04.09 node                            
     24 root      20   0  403932  99240  25816 S   0.0  0.7   0:07.31 jupyter-noteboo                 
    110 root      20   0   35888   4760   3652 S   0.0  0.0   0:00.20 tail                            
  10710 root      20   0   56296   8776   6924 S   0.0  0.1   0:00.03 ssh                             
  10885 root      20   0   54784   8932   6984 S   0.0  0.1   0:00.03 sshd                            
  10893 root      20   0   18508   3524   3080 S   0.0  0.0   0:00.00 bash                            
  11107 root      20   0   38720   3264   2800 R   0.0  0.0   0:00.03 top
Happy happy dance!!!

Last fiddled with by chalsall on 2019-10-05 at 18:34
Old 2019-10-06, 03:04   #206
axn
Jun 2003
11737₈ Posts

Another question:

How do you connect your Google Drive in Kaggle?

Last fiddled with by axn on 2019-10-06 at 03:07
Old 2019-10-06, 03:15   #207
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
19·397 Posts

Thanks for the gpu72 colab scripts -- works like a champ.

Would it be possible to also install mprime and run ECM jobs from Primenet?
Old 2019-10-06, 04:46   #208
Dylan14
"Dylan"
Mar 2017
2²×5×29 Posts

Quote:
Originally Posted by Prime95
Thanks for the gpu72 colab scripts -- works like a champ.

Would it be possible to also install mprime and run ECM jobs from Primenet?

Indeed it is possible:


Code:
#Notebook to run mprime on a Colab thing
import os.path
from google.colab import drive
#mount Google Drive only if it isn't mounted already
if not os.path.exists('/content/drive/My Drive'):
  drive.mount('/content/drive')

#use absolute paths so the cells work from any current directory
%cd '/content/drive/My Drive/'
if not os.path.exists('/content/drive/My Drive/mprime/'):
  !mkdir mprime
%cd '/content/drive/My Drive/mprime/'
#fetch mprime executable if we don't have it
if not os.path.exists('mprime'):
  !wget http://www.mersenne.org/ftp_root/gimps/p95v298b6.linux64.tar.gz
  !tar -zxvf p95v298b6.linux64.tar.gz

#!ls
#run mprime
#first, create local.txt and prime.txt if they don't already exist
if not os.path.exists('prime.txt'):
  !echo V24OptionsConverted=1 > prime.txt
  !echo WGUID_version=2 >> prime.txt
  !echo StressTester=0 >> prime.txt
  !echo UsePrimenet=1 >> prime.txt
  !echo DialUp=0 >> prime.txt
  #change the user ID to your own or use ANONYMOUS to work anonymously
  !echo V5UserID=Dylan14 >> prime.txt
  !echo Priority=1 >> prime.txt
  #Since Drive is persistent, can set DaysOfWork as desired:
  !echo DaysOfWork=1 >> prime.txt
  #This comes from undoc.txt.
  !echo MaxExponents=1 >> prime.txt
  !echo RunOnBattery=1 >> prime.txt
  #This sets the work preference. In this case it's set to 5, ECM on Mersennes with no known factors
  !echo WorkPreference=5 >> prime.txt
  !echo [PrimeNet] >> prime.txt
  !echo Debug=0 >> prime.txt
  !echo ProxyHost= >> prime.txt
if not os.path.exists('local.txt'):
  !echo WorkerThreads=1 >> local.txt
  !echo CoresPerTest=2 >> local.txt
  !echo ComputerID=colab >> local.txt
  !echo Memory=8192 during 7:30-23:30 else 8192 >> local.txt
#now run
!chmod +x mprime
!cat prime.txt
!cat local.txt
!./mprime
Of course, change the UserID and the work preference as desired. Creating these files up front also lets us skip the welcome text and start computing right away.
And I can confirm that it works: I have an ECM assignment assigned to me, on a computer called colab.
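As an alternative to the long run of echo lines, the settings could be kept in a dictionary and rendered in one go. This is only a sketch: `render_settings` is a made-up helper, and just a few of the keys from the notebook above are shown.

```python
def render_settings(settings: dict, sections: dict = None) -> str:
    """Render key=value lines, followed by optional [section] blocks."""
    lines = [f"{key}={value}" for key, value in settings.items()]
    for name, body in (sections or {}).items():
        lines.append(f"[{name}]")
        lines.extend(f"{key}={value}" for key, value in body.items())
    return "\n".join(lines) + "\n"


# A subset of the prime.txt settings used in the notebook above.
text = render_settings({"UsePrimenet": 1, "WorkPreference": 5},
                       {"PrimeNet": {"Debug": 0}})
print(text)
```

Writing `text` to prime.txt with a normal file write would then replace the echo lines.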
Old 2019-10-06, 09:41   #209
pinhodecarlos
"Carlos Pinho"
Oct 2011
Milton Keynes, UK
3·17·97 Posts

Any way of not having to babysit it every 12 hours?

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.