#936
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2²·5·271 Posts
Quote:
If you get CUDAPm1 to work, please let me know how.

#937
Feb 2005
Colorado
5×131 Posts

#938
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
13×19² Posts
I can't see, and therefore can't execute in Colab, the files I uploaded (i.e. gpuowl.exe) from my Desktop (which I had saved from mersenneforum).
The "My Drive" that I see in a browser is different from the "My Drive" in Colab. Thx

#939
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2²·5·271 Posts
Quote:
Programming a substitute for ls -l seems unnecessary to me. Try some or all of the following. The "top" is there to keep the main process busy so the session is not terminated. I use this in a script to run both a CPU task and a GPU task; if either GIMPS task terminates, such as by running out of work, the other can continue for the rest of the allowed session length. The 3-minute interval is a compromise that allows a full 12-hour session length; increase the interval to ~6 minutes if running Colab Pro for 24 hours, or the early part of the output will be lost from the finite buffer. Code:
```
import os
!chmod 777 '/content/drive/My Drive/gpuowl'
%cd '/content/drive/My Drive/gpuowl//'
!echo ls -l ./
!ls -l ./
statinfo = os.stat('./worktodo.txt')
if statinfo.st_size < 50:
    print('WARNING, small file size indicates little or no gpuowl work to do')
!chmod 777 gpuowl.exe worktodo.txt
# Each ! line runs in its own shell, so LD_LIBRARY_PATH must be set on the launch line itself
!LD_LIBRARY_PATH="lib:${LD_LIBRARY_PATH}" ./gpuowl.exe >>gpuowllog.txt 2>&1 &
print('gpuowl launched in background')
# Refresh every 180 s to keep the main process busy (use ~360 s for a 24-hour Colab Pro session)
!top -d 180
```
including #16 for branching to GPU-model-specific folders with different gpuowl config.txt files, worktodo files, etc. Different -maxAlloc values are advisable for doing P-1 factoring. Good luck. Last fiddled with by kriesel on 2020-04-02 at 17:09
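A plain-Python sketch of the same launch steps may make the shell details easier to follow. Because each `!` line in a notebook runs in its own shell, a variable set on one `!` line is gone by the next; passing the environment explicitly to the child process sidesteps that. The sketch assumes the working directory already holds gpuowl.exe, a lib/ folder and worktodo.txt, as in the cell above. The helper names are hypothetical, and the 240-refresh figure is only inferred from the two intervals quoted above (12 h at 180 s and 24 h at 360 s both come to 240 top screens); it is not a documented Colab limit.

```python
import os
import subprocess

# Hypothetical figure inferred from the post: 12 h / 180 s = 24 h / 360 s = 240
# top refreshes, so assume the output buffer keeps roughly that many screens.
REFRESHES_KEPT = 240


def top_delay_seconds(session_hours: float, refreshes_kept: int = REFRESHES_KEPT) -> int:
    """Shortest whole-second `top -d` interval whose output still fits the buffer."""
    total_seconds = round(session_hours * 3600)
    return -(-total_seconds // refreshes_kept)  # ceiling division


def worktodo_looks_empty(path: str = "worktodo.txt", min_bytes: int = 50) -> bool:
    """Same heuristic as the cell above: a tiny worktodo.txt means little or no work."""
    return os.path.getsize(path) < min_bytes


def with_lib_path(base_env: dict, lib_dir: str = "lib") -> dict:
    """Copy of base_env with lib_dir prepended to LD_LIBRARY_PATH."""
    env = dict(base_env)
    old = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = f"{lib_dir}:{old}" if old else lib_dir
    return env


def launch_gpuowl(exe: str = "./gpuowl.exe", log: str = "gpuowllog.txt") -> subprocess.Popen:
    """Start gpuowl in the background, appending stdout and stderr to the log."""
    with open(log, "ab") as log_file:
        # The child keeps its own copy of the log file descriptor after this block closes ours.
        return subprocess.Popen(
            [exe],
            stdout=log_file,
            stderr=subprocess.STDOUT,
            env=with_lib_path(dict(os.environ)),
        )
```

In a cell, one would check `worktodo_looks_empty()`, call `launch_gpuowl()`, and then keep the main process busy with `!top -d {top_delay_seconds(12)}`, which IPython expands to 180 before handing the command to the shell.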

#940
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
13×19² Posts
Running this as you requested:
Code:
```
import os.path
from google.colab import drive
import sys
if not os.path.exists('/content/drive/My Drive'):
    drive.mount('/content/drive')
%cd '/content/drive/My Drive//'
!chmod 777 '/content/drive/My Drive/gpuowl'
%cd '/content/drive/My Drive/gpuowl//'
!echo ls -l ./
!ls -l ./
%cd gpuowl/
!echo ls -l ./
!ls -l ./
statinfo = os.stat('./worktodo.txt')
if statinfo.st_size < 50:
    print('WARNING, small file size indicates little or no gpuowl work to do')
!LD_LIBRARY_PATH="lib:${LD_LIBRARY_PATH}" && chmod 777 gpuowl.exe && chmod 777 worktodo.txt
!./gpuowl.exe >>gpuowllog.txt 2>&1 &
print('gpuowl launched in background')
!top -d 180
```
Note that there are 2 gpuowl directories, one within the other. The inner one seems to have everything except the executable. Code:
```
/content/drive/My Drive
/content/drive/My Drive/gpuowl
/content/drive/My Drive/gpuowl/gpuowl
/content/drive/My Drive/gpuowl
ls -l ./
total 5
drwx------ 7 root root 4096 Apr 3 03:15 gpuowl
-rw------- 1 root root 51 Apr 3 03:22 gpuowllog.txt
-rwx------ 1 root root 47 Mar 12 21:46 worktodo.txt
/content/drive/My Drive/gpuowl/gpuowl
ls -l ./
total 459
-rw------- 1 root root 192 Mar 12 21:32 AllocTrac.cpp
-rw------- 1 root root 1613 Mar 12 21:32 AllocTrac.h
-rw------- 1 root root 7647 Mar 12 21:32 Args.cpp
-rw------- 1 root root 960 Mar 12 21:32 Args.h
-rw------- 1 root root 386 Mar 12 21:32 Background.h
-rw------- 1 root root 3945 Mar 12 21:32 Blake2.h
-rw------- 1 root root 3344 Mar 12 21:32 Buffer.h
-rw------- 1 root root 5376 Mar 12 21:32 checkpoint.cpp
-rw------- 1 root root 2787 Mar 12 21:32 checkpoint.h
-rw------- 1 root root 13674 Mar 12 21:32 clwrap.cpp
-rw------- 1 root root 3289 Mar 12 21:32 clwrap.h
-rw------- 1 root root 864 Mar 12 21:32 CMakeLists.txt
-rw------- 1 root root 466 Mar 12 21:32 codestyle.md
-rw------- 1 root root 1504 Mar 12 21:32 common.cpp
-rw------- 1 root root 635 Mar 12 21:32 common.h
-rw------- 1 root root 747 Mar 12 21:32 Context.h
-rw------- 1 root root 4990 Mar 12 21:32 conv.py
-rw------- 1 root root 2200 Mar 12 21:32 FFTConfig.cpp
-rw------- 1 root root 560 Mar 12 21:32 FFTConfig.h
-rw------- 1 root root 3897 Mar 12 21:32 File.h
-rw------- 1 root root 1496 Mar 12 21:32 GmpUtil.cpp
-rw------- 1 root root 448 Mar 12 21:32 GmpUtil.h
-rw------- 1 root root 10648 Mar 12 21:33 GmpUtil.o
-rw------- 1 root root 33980 Mar 12 21:32 Gpu.cpp
-rw------- 1 root root 5758 Mar 12 21:32 Gpu.h
-rw------- 1 root root 106232 Mar 12 21:32 gpuowl.cl
-rw------- 1 root root 106281 Mar 12 21:33 gpuowl-wrap.cpp
-rw------- 1 root root 37 Mar 12 21:32 head.txt
-rw------- 1 root root 1632 Mar 12 21:32 kernel.h
-rw------- 1 root root 35141 Mar 12 21:32 LICENSE
-rw------- 1 root root 2491 Mar 12 21:32 main.cpp
-rw------- 1 root root 1358 Mar 12 21:32 Makefile
-rw------- 1 root root 1101 Mar 12 21:32 Makefile-old
-rw------- 1 root root 4231 Mar 12 21:32 Pm1Plan.cpp
-rw------- 1 root root 821 Mar 12 21:32 Pm1Plan.h
-rw------- 1 root root 29672 Mar 12 21:33 Pm1Plan.o
-rw------- 1 root root 2438 Mar 12 21:32 ProofSet.h
-rw------- 1 root root 2726 Mar 12 21:32 Queue.h
-rw------- 1 root root 7971 Mar 12 21:32 README.md
-rw------- 1 root root 1019 Mar 12 21:32 SConstruct
-rw------- 1 root root 203 Mar 12 21:32 shared.h
-rw------- 1 root root 554 Mar 12 21:32 Signal.cpp
-rw------- 1 root root 160 Mar 12 21:32 Signal.h
-rw------- 1 root root 2911 Mar 12 21:32 state.cpp
-rw------- 1 root root 554 Mar 12 21:32 state.h
-rw------- 1 root root 12 Mar 12 21:32 tail.txt
-rw------- 1 root root 5486 Mar 12 21:32 Task.cpp
-rw------- 1 root root 1224 Mar 12 21:32 Task.h
drwx------ 2 root root 4096 Mar 12 21:32 test-pm1
-rw------- 1 root root 445 Mar 12 21:32 timeutil.cpp
-rw------- 1 root root 776 Mar 12 21:32 timeutil.h
-rw------- 1 root root 14181 Mar 12 21:32 tinycl.h
drwx------ 2 root root 4096 Mar 12 21:32 tools
-rw------- 1 root root 257 Mar 12 21:32 typeName.h
-rw------- 1 root root 131 Mar 12 21:32 version.h
-rw------- 1 root root 27 Mar 12 21:33 version.inc
-rw------- 1 root root 27 Apr 3 03:15 version.new
-rw------- 1 root root 4299 Mar 12 21:32 Worktodo.cpp
-rw------- 1 root root 577 Mar 12 21:32 Worktodo.h
-rw------- 1 root root 95 Mar 12 21:39 worktodo.txt
chmod: cannot access 'gpuowl.exe': No such file or directory
gpuowl launched in background
```
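With one gpuowl folder nested inside another, a quick way to check where the uploaded files actually landed on Drive is to walk the tree and report every directory that holds gpuowl.exe or worktodo.txt. This is a minimal sketch using only `os.walk`; the root path is the one from the cell above, and `find_files` and `report` are hypothetical helpers, not part of gpuowl or Colab.

```python
import os


def find_files(root: str, names=("gpuowl.exe", "worktodo.txt")) -> dict:
    """Map each wanted file name to the list of directories under root that contain it."""
    found = {name: [] for name in names}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in names:
            if name in filenames:
                found[name].append(dirpath)
    return found


def report(root: str = "/content/drive/My Drive/gpuowl") -> None:
    """Print where each file was found, or NOT FOUND if it is nowhere under root."""
    if not os.path.isdir(root):
        print(f"{root} is not a directory; is Drive mounted?")
        return
    for name, dirs in find_files(root).items():
        print(f"{name}: {', '.join(dirs) if dirs else 'NOT FOUND'}")
```

In the two listings above, worktodo.txt shows up in both gpuowl folders while gpuowl.exe appears in neither, which is consistent with the chmod error.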

#941
"GIMFS"
Sep 2002
Oeiras, Portugal
5C1₁₆ Posts
I have been running several Colab notebooks (5 users in total) with no major issues.
For the moment I am not using the GPU_72 notebook, as I am doing some sort of "custom" work. For each user I fire up separate GPU and CPU instances. For the GPUs, I usually allow a break of at least 48 hours between the end of one session and the beginning of the next. The CPU instances always run smoothly for 12 hours, and as soon as I reconnect they resume work for another 12. Whenever GPU instances are in the mix, though, things get a bit funnier. The GPU running times vary from ~5-6 hours to a maximum of 10, well in line with most users here. Now, the interesting point is that very frequently, when the GPU instances disconnect before the full 10 hours have elapsed, I get a message stating that the disconnection was due to inactivity. IMHO, if there is a program that doesn't even give the VM a chance to be idle, it is mfaktc.
More recently I have noticed that when the GPU disconnects the same happens to the CPU, and very often I also get the same message in the CPU instance window. Again, running mprime doesn't fit the definition of "idle", does it? This behaviour isn't totally consistent, though. Sometimes the GPU disconnects and the CPU keeps running (both instances were started at the same time). I'm quite sure someone else must have already noticed this behaviour. Would someone care to comment on it, hinting at the probable cause of this "inactivity"? Thanks in advance. Last fiddled with by lycorn on 2020-04-14 at 14:01

#942
Mar 2007
Estonia
2·71 Posts
Indeed, it randomly reports the expiry of the 10-12 hours by saying my session was inactive, though of course it wasn't. I think it's just a mistake on their part. I have an automated setup that tries to restart sessions, and I am left with all the error screens it produced in the meantime: I see "no backend available", "session timed out", "too many instances", all while trying to get an instance.
Maybe you haven't abused the notebooks long enough yet? I got back-to-back instances for my first days (less than a week IIRC); after that it imposed much stricter limits.

#943
If I May
"Chris Halsall"
Sep 2002
Barbados
9,767 Posts
Quote:
I've found that if I don't do this, the Notebook will disconnect after almost exactly an hour. If I happen to notice this within five minutes or so, it will reconnect and continue on (again showing the Busy message). If I miss this window, the Notebook will have been killed, and I'll have to rerun my Sections. I hope that helps.

#944
"GIMFS"
Sep 2002
Oeiras, Portugal
2701₈ Posts
Thank you both for the answers.
@kuratkull: Indeed, whenever I try to restart a session after receiving the "inactivity" message, I get "no backend available" and an indication that the usage limits were exceeded. Now the thing is, I am doing things in a way that does not abuse the GPUs. As I said, I am giving a > 48 hour break between sessions. Oh well, the Google gods got grumpy. Let's hope it's not SARS-CoV-2...
@chalsall: You mean clicking there when I get disconnected, as a way to resume, or to start the session? I sometimes start the session by clicking there, and it doesn't change things, but AFAIR I have never done it to try to resume a disconnected session; I'll give it a go next time. One more thing: does it ever happen to you that only the GPU instance gets disconnected? And have you ever received this "inactivity" message?

#945
If I May
"Chris Halsall"
Sep 2002
Barbados
9,767 Posts
Quote:
If I get a GPU, I then do a few "Factory Reset" cycles until I get a T4 or a P100, and I then do the "Connect to hosted runtime" thing, which basically tells Colab "This is going to run for quite a while; don't kill the session just because a human doesn't interact". For those sessions where I only get a CPU, I do the same Connect thing to secure the CPU session. I then come back at the expected GPU availability window opening time and do the Connect thing again. If the GPU isn't yet available, it will say "No GPU backend available", so the CPU just continues. If the GPU is available, the session will reconnect without the warning, and I do the coveted GPU hunt. Rinse and repeat...
Edit: No, I've never received the inactivity message. Although I think Kracker once reported getting a pop-up saying that the GPU wasn't being used, even though he was running the GPU72_TF Notebook and the GPU was at 100%. Last fiddled with by chalsall on 2020-04-14 at 14:54
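The T4/P100 hunt above comes down to checking which GPU each fresh session received. One way to do that from a cell is to ask nvidia-smi for the model name. This sketch assumes nvidia-smi is on the PATH (as it normally is on a Colab GPU runtime); `gpu_name` and `is_preferred` are hypothetical helpers. It only reports the model; the Factory Reset itself is still done by hand.

```python
import shutil
import subprocess
from typing import Optional

# The two models the post says are worth keeping.
PREFERRED_MODELS = ("T4", "P100")


def gpu_name() -> Optional[str]:
    """Name of the first GPU reported by nvidia-smi, or None if there is no usable GPU."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        return None
    return lines[0].strip()


def is_preferred(name: Optional[str], preferred=PREFERRED_MODELS) -> bool:
    """True if a model token such as 'T4' or 'P100' appears in the GPU name."""
    if not name:
        return False
    # Split "Tesla P100-PCIE-16GB" into tokens so "T4" cannot match inside another word.
    tokens = name.replace("-", " ").split()
    return any(model in tokens for model in preferred)
```

Typical use in a cell: `name = gpu_name(); print(name, "keep" if is_preferred(name) else "factory reset and try again")`. Matching whole tokens rather than substrings keeps the check from accepting an unrelated model whose name merely contains the same characters.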

#946
"GIMFS"
Sep 2002
Oeiras, Portugal
3×491 Posts
Ok, I see.
Thanks again for your time. Very useful info.