mersenneforum.org

Google Diet Colab Notebook (https://www.mersenneforum.org/showthread.php?t=24646)

Chuck 2019-10-16 02:28

[QUOTE=mnd9;528046]Sorry to repost about this issue -- but my main google account is having serious issues using Colab at this point. I can no longer get a session longer than a few hours, and as of this AM, I keep getting repeatedly disconnected after just a couple minutes. It's not saying no GPU is available, it's allowing me to reconnect then simply dropping me after 2-3 minutes.

Is there anything to do regarding this? All I've been doing is running mfaktc.

I have a secondary account that is running for the past 4-5 days reconnecting each 12 hours continuously with no issues.[/QUOTE]

I continue to have success using two Google accounts and alternating between them each 12 hours (only one running at a time). Would you want to consider additional accounts if you want to run more than one at a time?

Dylan14 2019-10-16 02:28

[QUOTE=Dylan14;527931]Actually, what I stated here was wrong, as using the newest available version from [URL]http://ftp.us.debian.org/debian/dists/testing/[/URL] (7.16.1) did not rectify the issue. So, to see what's going on, I switched back to the old code (shown [URL="https://mersenneforum.org/showpost.php?p=525527&postcount=39"]here[/URL]), which shows all the output. By doing so, the issue appears to be with the Internet connection:


[CODE]13-Oct-2019 18:31:57 [---] Project communication failed: attempting access to reference site
13-Oct-2019 18:32:00 [---] BOINC can't access Internet - check network connection or proxy configuration.[/CODE]
But it is clear that I am able to connect to the Internet and ping on Kaggle:

[CODE]--- 8.8.8.8 ping statistics ---
84 packets transmitted, 84 received, 0% packet loss, time 84956ms
rtt min/avg/max/mdev = 0.294/0.423/0.989/0.127 ms[/CODE]
so this would suggest Kaggle is using a proxy. I have no idea what the settings would be though.[/QUOTE]


Returning to this: there is no proxy on Kaggle. The solution was to switch the project URL from https:// to http://. I still find it odd that PrimeGrid and NFS@Home use plain http for connecting the BOINC client to the server.
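
For anyone scripting the same workaround, here is a minimal sketch of the scheme switch in Python. The helper name force_http and the example.org URL are placeholders for illustration only; they are not part of BOINC or of the script discussed above.

```python
from urllib.parse import urlsplit

def force_http(url: str) -> str:
    """Return url with an https scheme replaced by http.

    URLs that do not use https are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme.lower() != "https":
        return url
    # SplitResult is a named tuple, so _replace builds a modified copy.
    return parts._replace(scheme="http").geturl()

# Placeholder URL, not a real project address.
print(force_http("https://example.org/project/"))
```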

chalsall 2019-10-16 13:54

[QUOTE=Dylan14;528116]So to correct this, I used the getpass command for the account key, which will hide the key:[/QUOTE]

Very nice! Thank you!

This also works on Kaggle, so I can use it in my Tunnel scripts, to hide the root password.
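
A minimal sketch of the getpass approach mentioned above, for readers who have not used it. getpass.getpass is the standard-library call; the wrapper name read_secret, the prompt text, and the reader parameter are my own additions so the function can be exercised without a terminal.

```python
import getpass

def read_secret(prompt="Account key: ", reader=getpass.getpass):
    """Read a secret without echoing it to the notebook output.

    reader defaults to getpass.getpass; it is a parameter only so that
    the function can be tested non-interactively.
    """
    secret = reader(prompt).strip()
    if not secret:
        raise ValueError("no secret entered")
    return secret

# In a notebook cell (prompts for input, nothing is echoed or stored in the cell):
# account_key = read_secret()
```

The secret then lives only in the Python variable, so it does not show up in the saved notebook or in a shared script.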

ATH 2019-10-16 20:26

Seems to work fine running both mprime and mfaktc at the same time on Colab.

This assumes you have an "mfaktc" folder containing mfaktc_colab.exe on Google Drive, and an "mprime" folder with all the correct settings in prime.txt and local.txt.

If you do not want output files, you can change
!./mprime -d >> outputmprime.txt 2>&1 &
/usr/local/bin/mfaktc_colab.exe >> outputmfaktc.txt

to

!./mprime -d > /dev/null 2>&1 &
/usr/local/bin/mfaktc_colab.exe > /dev/null


It seems to be only 1 CPU core, probably with 2 threads. I am getting 2.88 ms/iter on a 9.65M PRP CF, so just under 8 hours for that. It might be better to do PRP CF DC or ECM curves on it.
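
As a quick check of that estimate, here is the arithmetic in Python. It assumes roughly one iteration per bit of the exponent, so about 9.65 million iterations; that iteration count is my assumption and is not taken from the mprime output.

```python
# Assumption: a PRP test on a ~9.65M exponent needs ~9.65 million iterations.
iterations = 9_650_000
ms_per_iter = 2.88  # timing reported above

total_seconds = iterations * ms_per_iter / 1000
total_hours = total_seconds / 3600

print(f"{total_seconds:.0f} s, about {total_hours:.2f} hours")
```

That comes to about 7.7 hours, consistent with "just under 8 hours".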


[CODE]#@title
import os.path
from google.colab import drive

if not os.path.exists('/content/drive/My Drive'):
    drive.mount('/content/drive')

%cd '/content/drive/My Drive'
!chmod 700 mprime
%cd '/content/drive/My Drive/mprime'
!chmod 700 mprime
!./mprime -d >> outputmprime.txt 2>&1 &

%cd '/content/drive/My Drive/mfaktc/'

!cp 'mfaktc_colab.exe' /usr/local/bin/
!chmod 755 '/usr/local/bin/mfaktc_colab.exe'

!cd '.' && LD_LIBRARY_PATH="lib:${LD_LIBRARY_PATH}" /usr/local/bin/mfaktc_colab.exe >> outputmfaktc.txt

!cat 'results.txt'[/CODE]

ATH 2019-10-17 02:03

I made a simple script running mfaktc on Kaggle, but how do you get the results out?

I planned to run mfaktc as a background process with:
!./mfaktc > /dev/null 2>&1 &

and then run a loop that uploads the results every 15 or 30 minutes, but Kaggle does not allow background processes.
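
The plan described above (start the program in the background, then do something periodically while it runs) can be sketched in plain Python with subprocess instead of the shell "&". This is only a sketch under the assumption that the environment lets a cell spawn child processes; the function name run_with_periodic_task is hypothetical, and the demo command is a short sleep standing in for mfaktc.

```python
import subprocess
import sys
import time

def run_with_periodic_task(cmd, task, interval_s):
    """Start cmd in the background and call task() every interval_s seconds.

    Output is discarded, like "> /dev/null 2>&1". task() is also called
    once after the process exits. Returns the process exit code.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT
    )
    while True:
        try:
            code = proc.wait(timeout=interval_s)
        except subprocess.TimeoutExpired:
            task()  # still running: e.g. upload results.txt here
            continue
        task()  # final pass after the process has finished
        return code

# Demo with a stand-in command; replace with the real program and
# an interval of 900 or 1800 seconds for 15 or 30 minutes.
calls = []
exit_code = run_with_periodic_task(
    [sys.executable, "-c", "import time; time.sleep(0.5)"],
    task=lambda: calls.append(time.time()),
    interval_s=0.2,
)
print(exit_code, len(calls))
```

Because the loop runs in the foreground of the cell, the cell stays busy until the program exits, which may matter on services that stop idle sessions.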

Dylan14 2019-10-17 02:59

2 Attachment(s)
After a bit of trial and error (plus some work in automating the checking of the logs for errors), I have a script that can be used to compile and run Mlucas, without needing to tunnel in. The script is below:


[CODE]#code to compile and run Ernst Mayer's mlucas
import os
from google.colab import drive
if not os.path.exists('/content/drive/My Drive'):
    drive.mount('/content/drive')

!apt-get install -y gcc-8
%cd '/content/drive/My Drive/'
if not os.path.exists('/content/drive/My Drive/mlucas/'):
    !mkdir mlucas
%cd '/content/drive/My Drive/mlucas/'
if not os.path.exists('mlucas_v18.txz'):
    !wget https://www.mersenneforum.org/mayer/src/C/mlucas_v18.txz
    !tar -xJf mlucas_v18.txz

#check that we have both executables (one for avx2, and one for avx512)
#after building they are moved to mlucas_v18/, one level above src
if not os.path.exists('/content/drive/My Drive/mlucas/mlucas_v18/Mlucasavx512') or not os.path.exists('/content/drive/My Drive/mlucas/mlucas_v18/Mlucasavx2'):
    #switch to the mlucas source directory
    %cd '/content/drive/My Drive/mlucas/mlucas_v18/src'
    #we build mlucas twice. Once with avx2, and once with avx512.
    #first, the avx512 build:
    !gcc-8 -c -O3 -DUSE_AVX512 -march=skylake-avx512 -DUSE_THREADS *.c > build1.log 2>&1
    !grep error build1.log > erroravx512.log
    if os.stat("erroravx512.log").st_size == 0: #grep came up empty
        !gcc-8 -o Mlucasavx512 *.o -lm -lpthread -lrt
    else: #something went wrong
        !echo "Error in compilation. Check build1.log and tell either Dylan14 (if you think Dylan made a mistake) or ewmayer."
        exit()
    #move Mlucasavx512 up a directory and clean up the src directory
    !mv Mlucasavx512 ..
    !rm *.o
    #now build the avx2 executable
    !gcc-8 -c -O3 -DUSE_AVX2 -mavx2 -DUSE_THREADS *.c > build2.log 2>&1
    !grep error build2.log > erroravx2.log
    if os.stat("erroravx2.log").st_size == 0: #grep came up empty
        !gcc-8 -o Mlucasavx2 *.o -lm -lpthread -lrt
    else: #something went wrong
        !echo "Error in compilation. Check build2.log and tell either Dylan14 (if you think Dylan made a mistake) or ewmayer."
        exit()
    #move Mlucasavx2 up a directory and clean up the src directory
    !mv Mlucasavx2 ..
    !rm *.o

#now we check the processor we have
!echo "Checking processor so we can choose the right executable..."
%cd '/content/drive/My Drive/mlucas/mlucas_v18/'
#by default the permissions are not correct to run Mlucas
!chmod 755 Mlucasavx512
!chmod 755 Mlucasavx2
!grep avx512 /proc/cpuinfo > avx512.txt
!grep avx2 /proc/cpuinfo > avx2.txt
if os.stat("avx512.txt").st_size != 0: #avx512 is available
    !echo "AVX512 detected..."
    #test executable
    !./Mlucasavx512 -fftlen 192 -iters 100 -radset 0
    #performance tune with 2 threads (takes about 10 minutes)
    !./Mlucasavx512 -s m -cpu 0:1 > selftest.log 2>&1
    #to do: add code for managing worktodo.txt
    #then run Mlucas
    #!./Mlucasavx512
elif os.stat("avx2.txt").st_size != 0: #avx2 is available
    !echo "AVX2 detected..."
    #test executable
    !./Mlucasavx2 -fftlen 192 -iters 100 -radset 0
    #performance tune with 2 threads (takes about 10 minutes)
    !./Mlucasavx2 -s m -cpu 0:1 > selftest.log 2>&1
    #to do: add code for managing worktodo.txt
    #then run Mlucas
    #!./Mlucasavx2
else: #we have some other processor, which I think is fairly unlikely
    !echo "Strange. We don't have avx2 or avx512."
    exit()

[/CODE]
Attached also is what the selftest.log and mlucas.cfg files should look like. At the current first-time wavefront (length 5120 kdoubles, i.e. an FFT length of 5M), one iteration takes about 50 ms on Colab with AVX512 instructions.
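
The processor check in the script greps /proc/cpuinfo into temporary text files and then tests their size. An alternative sketch that parses the flags line directly in Python follows. The function names pick_build and read_cpuinfo are hypothetical; this is not what the script above does, only one way the same decision could be made without the intermediate files.

```python
def pick_build(cpuinfo_text):
    """Return 'avx512', 'avx2', or None from the text of /proc/cpuinfo.

    Any flag starting with 'avx512' (avx512f, avx512dq, ...) selects the
    AVX512 build, mirroring what "grep avx512" would match.
    """
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags.update(line.split(":", 1)[1].split())
    if any(flag.startswith("avx512") for flag in flags):
        return "avx512"
    if "avx2" in flags:
        return "avx2"
    return None

def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo file (Linux only)."""
    with open(path) as fh:
        return fh.read()

# On Colab or Kaggle: build = pick_build(read_cpuinfo())
sample = "processor\t: 0\nflags\t\t: fpu sse2 avx avx2 avx512f\n"
print(pick_build(sample))
```

Matching whole flags rather than substrings means a CPU that only reports "avx" is not mistaken for one with AVX2.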

chalsall 2019-10-17 03:00

[QUOTE=ATH;528191]...but kaggle does not allow background processes.[/QUOTE]

Sure it does. Heck, I've had cron jobs running on it (hint: place an appropriately created crontab file in /etc/cron.d/ after installing and starting cron). Something like:
[CODE]
root@Colab_MAH:~/prime# ls -la /etc/cron.d/
total 20
drwxr-xr-x 2 root root 4096 Oct 17 02:58 .
drwxr-xr-x 1 root root 4096 Oct 16 22:45 ..
-rw-r--r-- 1 root root 102 Nov 16 2017 .placeholder
-rw-r--r-- 1 root root 112 Oct 17 02:58 iroot

root@Colab_MAH:~/prime# cat /etc/cron.d/iroot
# Run system stats collection script every minute.
* * * * * root /root/bin/telemetry.pl >/dev/null 2>/dev/null
[/CODE]
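
For those who would rather create such a file from a notebook cell, here is a minimal Python sketch that builds the same kind of entry. The helper names cron_d_entry and install_entry are hypothetical. Writing into /etc/cron.d needs root, and, as noted above, cron has to be installed and started first.

```python
from pathlib import Path

def cron_d_entry(schedule, user, command, comment=""):
    """Build the text of an /etc/cron.d file.

    Unlike a per-user crontab, entries in /etc/cron.d carry a user field
    between the five schedule fields and the command.
    """
    if len(schedule.split()) != 5:
        raise ValueError("schedule must have five fields")
    lines = []
    if comment:
        lines.append(f"# {comment}")
    lines.append(f"{schedule} {user} {command}")
    return "\n".join(lines) + "\n"

def install_entry(text, name, cron_dir="/etc/cron.d"):
    """Write the entry to cron_dir/name with mode 644 (needs root)."""
    path = Path(cron_dir) / name
    path.write_text(text)
    path.chmod(0o644)
    return path

entry = cron_d_entry(
    "* * * * *",
    "root",
    "/root/bin/telemetry.pl >/dev/null 2>/dev/null",
    comment="Run system stats collection script every minute.",
)
print(entry, end="")
# As root: install_entry(entry, "iroot")
```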

axn 2019-10-17 03:24

[QUOTE=ATH;528191]I made a simple script running mfaktc on kaggle, but how do you get the results out?

I planned to run mfaktc as a background process with:
!./mfaktc > /dev/null 2>&1 &

and then run a loop which uploaded the results every 15min or 30min, but kaggle does not allow background processes.[/QUOTE]

When you commit, it will run the job as a batch, and there will be a Version. Later on, you can come back and look at the version, and you'll get an Output link, which lists all the files in the Kaggle base directory, including results.txt.

ATH 2019-10-17 03:31

Ok, thanks. I came up with a loop that uses the "head" command to create a worktodo.txt containing the first x lines of another file; after finishing those, it uploads the results and creates another worktodo.txt with "head".
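
A rough Python equivalent of that batching step, for anyone who prefers it to "head". One difference is an assumption on my part: this version also removes the taken lines from the source file, so the next call picks up where the last one stopped. The file names and the function name next_batch are placeholders.

```python
import tempfile
from pathlib import Path

def next_batch(queue_path, worktodo_path, n):
    """Move the first n lines of queue_path into worktodo_path.

    Returns the number of lines moved (0 when the queue is empty, in
    which case worktodo_path is left untouched).
    """
    queue = Path(queue_path)
    lines = queue.read_text().splitlines(keepends=True)
    batch, rest = lines[:n], lines[n:]
    if batch:
        Path(worktodo_path).write_text("".join(batch))
        queue.write_text("".join(rest))
    return len(batch)

# Demo in a temporary directory with placeholder assignment lines.
with tempfile.TemporaryDirectory() as tmp:
    queue_file = Path(tmp) / "queue.txt"
    work_file = Path(tmp) / "worktodo.txt"
    queue_file.write_text("assignment 1\nassignment 2\nassignment 3\n")
    moved = next_batch(queue_file, work_file, 2)
    first_batch = work_file.read_text()
    remaining = queue_file.read_text()

print(moved, repr(first_batch), repr(remaining))
```

The outer loop would then be: call next_batch, run the program until worktodo.txt is done, upload the results, and repeat until next_batch returns 0.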

I will look into the reverse SSH tomorrow maybe.

chalsall 2019-10-17 17:46

[QUOTE=Dylan14;528193]The script is below:[/QUOTE]

If I may please say, excellent work sir! :tu:

If I may please also say, however, that I'm always amused when people say that Perl is difficult to read... :wink:

kriesel 2019-10-17 18:52

[QUOTE=chalsall;528238]If I may please say, excellent work sir! :tu:

If I may please also say, however, that I'm always amused when people say that Perl is difficult to read... :wink:[/QUOTE]
1) Yes!
2) Depends on the author / style. Except pattern matches can be um, puzzling sometimes, in my opinion.

And the rapid progress of this collaborative effort has been something to see.


All times are UTC.