mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > Cloud Computing

Old 2020-03-24, 20:56   #936
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5²×163 Posts

Quote:
Originally Posted by petrw1 View Post
https://download.mersenne.ca/CUDAPm1/

Can anyone tell me which one will work at colab?
Gpuowl P-1 works on Colab on all 4 GPU models, though a different -maxAlloc value is advisable for each. In my attempts, CUDAPm1 did not work. https://www.mersenneforum.org/showthread.php?t=24839
If you get CUDAPm1 to work, please let me know how.
Old 2020-03-24, 22:00   #937
PhilF
 
 
Feb 2005
Colorado

7×67 Posts

Quote:
Originally Posted by Uncwilly View Post
Anyone else think that Colab and the rest of Google, Amazon, M$, etc. should throw all their spare cycles at folding@home?
Nah, I'm having fun playing with / abusing a few of those cycles...
Old 2020-03-25, 23:44   #938
petrw1
1976 Toyota Corona years forever!
 
 
"Wayne"
Nov 2006
Saskatchewan, Canada

4320₁₀ Posts
Can someone explain this and help me out...

I can't see, and therefore can't execute in Colab, the files I uploaded (i.e. gpuowl.exe) from my Desktop (that I saved from mersenneforum).

"My Drive" that I see in a Browser is different than "My Drive" in colab.

Thx

Quote:
Originally Posted by petrw1 View Post
If I go to Google Drive and open directory My Drive.
It shows up on the URL bar as: https://drive.google.com/drive/my-drive
In the folder list as: My Drive
My Drive contains gpuowl.exe.
It has subfolders of
- Colab Notebooks with a file with the source code for each Notebook session I am running and Copy of gpuowl.exe
- ColabWork (A new folder I created) with another copy of gpuowl.exe

HOWEVER in my Python window when I run this code:
Code:
import os.path
from google.colab import drive

if not os.path.exists('/content/drive/My Drive'):
  drive.mount('/content/drive')

%cd '/content/drive/My Drive'
for x in os.listdir('.'):
    print (x)
I see only these 2 directories:

Code:
/content/drive/My Drive
cudapm1
gpuowl
These were created from notebooks I ran recently to install CUDAPm1 and GpuOwl.
It does NOT see gpuowl.exe; nor does it see my ColabWork directory that I set up.
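
One way to narrow this down is to walk the whole mounted tree and report every place a given file name appears, rather than listing only the top level. A minimal sketch, assuming the Drive is mounted at /content/drive as in the code above; `find_by_name` is a hypothetical helper, not part of google.colab:

```python
import os

def find_by_name(root, filename):
    """Return sorted full paths of every file called `filename` under `root`."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            matches.append(os.path.join(dirpath, filename))
    return sorted(matches)

# In a Colab cell, after drive.mount('/content/drive'):
# for path in find_by_name('/content/drive', 'gpuowl.exe'):
#     print(path)
```

An empty result might mean the upload went to a different Google account than the one mounted in the notebook, or has not synced yet; that is a guess, not something established in this thread.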
Old 2020-04-02, 16:56   #939
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5²×163 Posts

Quote:
Originally Posted by petrw1 View Post
I can't see, and therefore can't execute in Colab, the files I uploaded (i.e. gpuowl.exe) from my Desktop (that I saved from mersenneforum).

"My Drive" that I see in a Browser is different than "My Drive" in colab.

Thx

It does NOT see gpuowl.exe; nor does it see my ColabWork directory that I set up.
Been there, early on. As I recall, chmod was the solution.

Programming a substitute for ls -l seems unnecessary to me. Try some or all of the following. The "top" is there to keep the main process busy so that the session is not terminated. I use this in a script that runs both a CPU task and a GPU task; if either GIMPS task terminates, for example by running out of work, the other can continue for the rest of the allowed session length. The 3-minute interval is a compromise that allows a full 12-hour session length; increase the interval to ~6 minutes if running Colab Pro for 24 hours, or the early part of the output will be lost from the finite buffer.
Code:
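import os
# Make the gpuowl folder accessible, then work inside it.
!chmod 777 '/content/drive/My Drive/gpuowl'
%cd '/content/drive/My Drive/gpuowl/'
!echo ls -l ./
!ls -l ./
# A nearly empty worktodo.txt means gpuowl has little or nothing to do.
statinfo = os.stat('./worktodo.txt')
if statinfo.st_size < 50:
  print('WARNING, small file size indicates little or no gpuowl work to do')
!chmod 777 gpuowl.exe worktodo.txt
# Each "!" line runs in its own shell, so LD_LIBRARY_PATH is set on the launch line itself.
!LD_LIBRARY_PATH="lib:${LD_LIBRARY_PATH}" ./gpuowl.exe >>gpuowllog.txt 2>&1 &
print('gpuowl launched in background')
# Refresh every 180 s to keep the main process busy for the whole session.
!top -d 180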
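
The arithmetic behind the 180-second figure can be sketched as follows. It assumes the output buffer holds a roughly fixed number of top refreshes, so the interval has to scale with the session length; `top_interval_seconds` is a hypothetical helper, and the 240-refresh budget is simply 12 hours divided by 3 minutes.

```python
def top_interval_seconds(session_hours, max_refreshes=240):
    """Seconds between `top` refreshes so the session fits in max_refreshes updates."""
    total_seconds = session_hours * 3600
    # Ceiling division, so the refresh count never exceeds the budget.
    return -(-total_seconds // max_refreshes)

print(top_interval_seconds(12))  # 180, the 3-minute interval for a 12-hour session
print(top_interval_seconds(24))  # 360, i.e. 6 minutes for a 24-hour session
```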
There are assorted code examples at https://www.mersenneforum.org/showthread.php?t=24839
including #16 for branching to GPU-model-specific folders with different gpuowl config.txt files, worktodo files, etc. Different -maxAlloc values are advisable for P-1 factoring.
Good luck.
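
Branching to GPU-model-specific folders could look roughly like the sketch below. It is not the code from post #16 of that thread; the folder names and the `folder_for_gpu` helper are hypothetical, and the model list (K80, P4, P100, T4) is an assumption about which four GPUs Colab offered at the time.

```python
# Hypothetical layout: one subfolder per GPU model, each holding its own
# config.txt and worktodo.txt.
GPU_FOLDERS = {
    'K80': 'gpuowl-k80',
    'P4': 'gpuowl-p4',
    'P100': 'gpuowl-p100',
    'T4': 'gpuowl-t4',
}

def folder_for_gpu(gpu_name, base='/content/drive/My Drive'):
    """Map a GPU name such as 'Tesla T4' to its working folder, or None."""
    # Match whole tokens so that 'P4' does not match inside another model name.
    tokens = gpu_name.replace('-', ' ').split()
    for model, folder in GPU_FOLDERS.items():
        if model in tokens:
            return base + '/' + folder
    return None

print(folder_for_gpu('Tesla T4'))
print(folder_for_gpu('Tesla P100-PCIE-16GB'))
```

Each folder's config.txt would then carry whatever -maxAlloc value suits that model; no specific values are suggested here.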

Last fiddled with by kriesel on 2020-04-02 at 17:09
Old 2020-04-03, 03:35   #940
petrw1
1976 Toyota Corona years forever!
 
 
"Wayne"
Nov 2006
Saskatchewan, Canada

2⁵×3³×5 Posts
gpuowl.exe not found...

Running this as you requested:

Code:
import os.path
from google.colab import drive
import sys
if not os.path.exists('/content/drive/My Drive'):
  drive.mount('/content/drive')
%cd '/content/drive/My Drive//'

!chmod 777 '/content/drive/My Drive/gpuowl'
%cd '/content/drive/My Drive/gpuowl//'
!echo ls -l ./
!ls -l ./
%cd gpuowl/
!echo ls -l ./
!ls -l ./

statinfo = os.stat('./worktodo.txt')
if statinfo.st_size < 50:
  print ('WARNING, small file size indicates little or no gpuowl work to do')
!LD_LIBRARY_PATH="lib:${LD_LIBRARY_PATH}" && chmod 777 gpuowl.exe && chmod 777 worktodo.txt
!./gpuowl.exe  >>gpuowllog.txt 2>&1 &
print('gpuowl launched in background')
!top -d 180
The run first lists the entire directory.
Note there are 2 gpuowl directories; one within the other.
The inner seems to have everything except the executable.

Code:
/content/drive/My Drive
/content/drive/My Drive/gpuowl
/content/drive/My Drive/gpuowl/gpuowl
/content/drive/My Drive/gpuowl
ls -l ./
total 5
drwx------ 7 root root 4096 Apr  3 03:15 gpuowl
-rw------- 1 root root   51 Apr  3 03:22 gpuowllog.txt
-rwx------ 1 root root   47 Mar 12 21:46 worktodo.txt
/content/drive/My Drive/gpuowl/gpuowl
ls -l ./
total 459
-rw------- 1 root root    192 Mar 12 21:32 AllocTrac.cpp
-rw------- 1 root root   1613 Mar 12 21:32 AllocTrac.h
-rw------- 1 root root   7647 Mar 12 21:32 Args.cpp
-rw------- 1 root root    960 Mar 12 21:32 Args.h
-rw------- 1 root root    386 Mar 12 21:32 Background.h
-rw------- 1 root root   3945 Mar 12 21:32 Blake2.h
-rw------- 1 root root   3344 Mar 12 21:32 Buffer.h
-rw------- 1 root root   5376 Mar 12 21:32 checkpoint.cpp
-rw------- 1 root root   2787 Mar 12 21:32 checkpoint.h
-rw------- 1 root root  13674 Mar 12 21:32 clwrap.cpp
-rw------- 1 root root   3289 Mar 12 21:32 clwrap.h
-rw------- 1 root root    864 Mar 12 21:32 CMakeLists.txt
-rw------- 1 root root    466 Mar 12 21:32 codestyle.md
-rw------- 1 root root   1504 Mar 12 21:32 common.cpp
-rw------- 1 root root    635 Mar 12 21:32 common.h
-rw------- 1 root root    747 Mar 12 21:32 Context.h
-rw------- 1 root root   4990 Mar 12 21:32 conv.py
-rw------- 1 root root   2200 Mar 12 21:32 FFTConfig.cpp
-rw------- 1 root root    560 Mar 12 21:32 FFTConfig.h
-rw------- 1 root root   3897 Mar 12 21:32 File.h
-rw------- 1 root root   1496 Mar 12 21:32 GmpUtil.cpp
-rw------- 1 root root    448 Mar 12 21:32 GmpUtil.h
-rw------- 1 root root  10648 Mar 12 21:33 GmpUtil.o
-rw------- 1 root root  33980 Mar 12 21:32 Gpu.cpp
-rw------- 1 root root   5758 Mar 12 21:32 Gpu.h
-rw------- 1 root root 106232 Mar 12 21:32 gpuowl.cl
-rw------- 1 root root 106281 Mar 12 21:33 gpuowl-wrap.cpp
-rw------- 1 root root     37 Mar 12 21:32 head.txt
-rw------- 1 root root   1632 Mar 12 21:32 kernel.h
-rw------- 1 root root  35141 Mar 12 21:32 LICENSE
-rw------- 1 root root   2491 Mar 12 21:32 main.cpp
-rw------- 1 root root   1358 Mar 12 21:32 Makefile
-rw------- 1 root root   1101 Mar 12 21:32 Makefile-old
-rw------- 1 root root   4231 Mar 12 21:32 Pm1Plan.cpp
-rw------- 1 root root    821 Mar 12 21:32 Pm1Plan.h
-rw------- 1 root root  29672 Mar 12 21:33 Pm1Plan.o
-rw------- 1 root root   2438 Mar 12 21:32 ProofSet.h
-rw------- 1 root root   2726 Mar 12 21:32 Queue.h
-rw------- 1 root root   7971 Mar 12 21:32 README.md
-rw------- 1 root root   1019 Mar 12 21:32 SConstruct
-rw------- 1 root root    203 Mar 12 21:32 shared.h
-rw------- 1 root root    554 Mar 12 21:32 Signal.cpp
-rw------- 1 root root    160 Mar 12 21:32 Signal.h
-rw------- 1 root root   2911 Mar 12 21:32 state.cpp
-rw------- 1 root root    554 Mar 12 21:32 state.h
-rw------- 1 root root     12 Mar 12 21:32 tail.txt
-rw------- 1 root root   5486 Mar 12 21:32 Task.cpp
-rw------- 1 root root   1224 Mar 12 21:32 Task.h
drwx------ 2 root root   4096 Mar 12 21:32 test-pm1
-rw------- 1 root root    445 Mar 12 21:32 timeutil.cpp
-rw------- 1 root root    776 Mar 12 21:32 timeutil.h
-rw------- 1 root root  14181 Mar 12 21:32 tinycl.h
drwx------ 2 root root   4096 Mar 12 21:32 tools
-rw------- 1 root root    257 Mar 12 21:32 typeName.h
-rw------- 1 root root    131 Mar 12 21:32 version.h
-rw------- 1 root root     27 Mar 12 21:33 version.inc
-rw------- 1 root root     27 Apr  3 03:15 version.new
-rw------- 1 root root   4299 Mar 12 21:32 Worktodo.cpp
-rw------- 1 root root    577 Mar 12 21:32 Worktodo.h
-rw------- 1 root root     95 Mar 12 21:39 worktodo.txt
chmod: cannot access 'gpuowl.exe': No such file or directory
gpuowl launched in background
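
The chmod error above is the first sign that the launch cannot succeed, yet the script still prints 'gpuowl launched in background'. A pre-launch check along these lines would surface the problem earlier; `missing_files` is a hypothetical helper, and the file names are the ones used in the script above.

```python
import os

def missing_files(folder, names=('gpuowl.exe', 'worktodo.txt')):
    """Return the names from `names` that are not regular files in `folder`."""
    return [n for n in names if not os.path.isfile(os.path.join(folder, n))]

# In the notebook, before the launch line:
# missing = missing_files('.')
# if missing:
#     raise FileNotFoundError('cannot launch, missing: ' + ', '.join(missing))
```

In the listing above, the inner gpuowl directory holds the sources and two .o object files but no gpuowl.exe, which suggests the build did not finish; that reading is an inference from the listing, not something confirmed in the thread.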
Old 2020-04-14, 13:59   #941
lycorn
 
 
Sep 2002
Oeiras, Portugal

564₁₆ Posts
The Google definition of "idle"

I have been running several Colab notebooks (5 users in total) with no major issues.
For the moment I am not using the GPU_72 notebook, as I am doing some sort of "custom" work. For each user I fire up separate GPU and CPU instances. For the GPUs, I usually leave a break of at least 48 hours between the end of one session and the beginning of the next.
The CPU instances always run smoothly for 12 hours, and as soon as I reconnect they resume work for another 12. Whenever I mix in GPU instances, though, things get a bit stranger. The GPU running times vary from ~5-6 hours to a maximum of 10, well in line with most users here. Now, the interesting point is that very frequently, when the GPU instances disconnect before the full 10 hours have elapsed, I get a message stating that the disconnection was due to inactivity. IMHO, if there is a program that doesn't even give the VM a chance to be idle, it is mfaktc. More recently I have noticed that when the GPU disconnects the same happens to the CPU, and very often I also get the same message in the CPU instance window. Again, running mprime doesn't fit the definition of "idle", does it?
This behaviour isn't totally consistent, though. Sometimes the GPU disconnects and the CPU keeps running (both instances were started at the same time).
I'm quite sure someone else must have already noticed this behaviour. Will someone care to comment on it, hinting at the probable cause for this "inactivity"?
Thanks in advance.

Last fiddled with by lycorn on 2020-04-14 at 14:01
Old 2020-04-14, 14:07   #942
kuratkull
 
 
Mar 2007
Estonia

2·67 Posts

Indeed, it randomly announces the end of the 10-12 hours by saying my session was inactive, though of course it wasn't. I think it's just a mistake on their part. I have an automated setup that tries to restart sessions, and I am left with all the error screens it produced in the meantime: I see "no backend available", "session timed out", and "too many instances", all while trying to get an instance.
Maybe you haven't abused the notebooks long enough yet? I got back-to-back instances for my first days (less than a week IIRC); after that it imposed much stricter limits.
Old 2020-04-14, 14:16   #943
chalsall
If I May
 
 
"Chris Halsall"
Sep 2002
Barbados

2·4,513 Posts

Quote:
Originally Posted by lycorn View Post
Will someone care to comment on it, hinting at the probable cause for this "inactivity"?
Not sure if this is directly related to your workflow/use-case... But I've found it's good to click on the "Connect to hosted runtime" from the drop-down menu in the upper-right-hand side, to the immediate right of the CPU / Memory scale graphics. The Notebook then continues for the ~7 or so hours (for a GPU instance), with the graphic replaced with a "... Busy" message.

I've found that if I don't do this, the Notebook will disconnect after almost exactly an hour. If I happen to notice this within five minutes or so it will reconnect and continue on (again showing the Busy message). If I miss this window the Notebook will have been killed, and I'll have to rerun my Sections again.

I hope that helps.
Old 2020-04-14, 14:37   #944
lycorn
 
 
Sep 2002
Oeiras, Portugal

2²·3·5·23 Posts

Thank you both for the answers.

@kuratkull: Indeed, whenever I try to restart a session after receiving the "inactivity" message, I get "no backend available" and an indication that the usage limits were exceeded. Now the thing is, I am doing things in such a way as not to abuse the GPUs. As I said, I am giving a > 48 hours break between sessions. Oh well, Google gods got grumpy. Let's hope it's not the SARS-CoV-2...

@chalsall: You mean clicking there when I get disconnected, as a way to resume, or to start the session? I sometimes start the session by clicking there, and it doesn't change things, but AFAIR I have never done it to try to resume a disconnected session; I'll give it a go next time. One more thing: does it ever happen that only your GPU instance gets disconnected? And have you ever received this "inactivity" message?
Old 2020-04-14, 14:52   #945
chalsall
If I May
 
 
"Chris Halsall"
Sep 2002
Barbados

2·4,513 Posts

Quote:
Originally Posted by lycorn View Post
You mean clicking there when I get disconnected, as a way to resume, or to start the session?
No... What I do in the morning is go to each of my accounts' UIs and press Ctrl-F9. This runs all the Sections of the Notebook in order (I often have a reverse SSH Tunnel section before the GPU72_TF section, for development/debugging purposes, or just to record telemetry statistics).

If I get a GPU, I then do a few "Factory Reset" cycles until I get a T4 or a P100, and I then do the "Connect to hosted runtime" thing, which basically tells Colab "This is going to run for quite a while; don't kill the session just because a human doesn't interact".

For those sessions where I only get a CPU, I do the same Connect thing to secure the CPU session. I then come back at the expected GPU availability window opening time, and do the Connect thing again. If the GPU isn't yet available, it will say "No GPU backend available", so the CPU just continues. If the GPU is available the session will reconnect without the warning, and I do the coveted GPU hunt.

Rinse and repeat...

Edit: No, I've never received the inactivity message. Although I think Kracker once reported his getting a pop-up saying that the GPU wasn't being used, even though he was running the GPU72_TF Notebook, and the GPU was at 100%.
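
Checking which GPU a session was given does not need the UI. A minimal sketch, assuming `nvidia-smi` is on the PATH of the runtime (it will be missing on CPU-only sessions); `assigned_gpu` and `worth_keeping` are hypothetical helpers, and the preferred list mirrors the T4/P100 preference described above.

```python
import shutil
import subprocess

PREFERRED = ('T4', 'P100')

def assigned_gpu():
    """Return the GPU name reported by nvidia-smi, or None if unavailable."""
    if shutil.which('nvidia-smi') is None:
        return None
    result = subprocess.run(
        ['nvidia-smi', '--query-gpu=name', '--format=csv,noheader'],
        capture_output=True, text=True)
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        return None
    return lines[0].strip()

def worth_keeping(gpu_name):
    """True if the name contains one of the preferred model tokens."""
    if not gpu_name:
        return False
    tokens = gpu_name.replace('-', ' ').split()
    return any(model in tokens for model in PREFERRED)

name = assigned_gpu()
print(name, worth_keeping(name))
```

The reset itself still has to be done by hand in the Colab UI; the sketch only reports what the current session received.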

Last fiddled with by chalsall on 2020-04-14 at 14:54
Old 2020-04-14, 15:16   #946
lycorn
 
 
Sep 2002
Oeiras, Portugal

1380₁₀ Posts

Ok, I see.
Thanks again for your time. Very useful info.

All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.