
2020-03-24, 20:56   #936
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5²×163 Posts

Quote:
 Originally Posted by petrw1 https://download.mersenne.ca/CUDAPm1/ Can anyone tell me which one will work at colab?
Gpuowl P-1 works on Colab, all 4 gpu models, with differing -maxAlloc values advisable. In my attempts, CUDAPm1 did not work. https://www.mersenneforum.org/showthread.php?t=24839
If you get CUDAPm1 to work, please let me know how.

2020-03-24, 22:00   #937
PhilF

Feb 2005

7×67 Posts

Quote:
 Originally Posted by Uncwilly Anyone else think that Colab and the rest of google and amazon, M$, etc should throw all their spare cycles at folding@home.
Nah, I'm having fun playing with / abusing a few of those cycles...

2020-03-25, 23:44   #938
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006

4320₁₀ Posts

Can someone explain this and help me out...

I can't see, and therefore can't execute in Colab, the files I uploaded (i.e. gpuowl.exe) from my Desktop (that I saved from mersenneforum).
"My Drive" that I see in a browser is different from "My Drive" in Colab.

Thx

Quote:
 Originally Posted by petrw1 If I go to Google Drive and open directory My Drive:
It shows up on the URL bar as: https://drive.google.com/drive/my-drive
In the folder list as: My Drive

My Drive contains gpuowl.exe.
It has subfolders of
- Colab Notebooks, with a file with the source code for each Notebook session I am running, and a copy of gpuowl.exe
- ColabWork (a new folder I created), with another copy of gpuowl.exe

HOWEVER, in my Python window when I run this code:

Code:
import os.path
from google.colab import drive
if not os.path.exists('/content/drive/My Drive'):
  drive.mount('/content/drive')
%cd '/content/drive/My Drive'
for x in os.listdir('.'):
  print (x)

I see only these 2 directories:

Code:
/content/drive/My Drive
cudapm1
gpuowl

These were created from Notebooks I ran recently to install CUDAPm1 and GpuOwl.
It does NOT see gpuowl.exe; nor does it see my ColabWork directory that I set up.

2020-04-02, 16:56   #939
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5²×163 Posts

Quote:
 Originally Posted by petrw1 I can't see, and therefore, can't execute in colab, the files I uploaded (i.e. gpuowl.exe)from my Desktop (that I saved from mersenneforum). "My Drive" that I see in a Browser is different than "My Drive" in colab. Thx It does NOT see gpuowl.exe; nor does it see my ColabWork directory that I set up.
Been there, early on. As I recall chmod was the solution.
Programming a substitute for ls -l seems unnecessary to me. Try some or all of the following. The "top" is there to keep the main process busy so the session is not terminated. I use this in a script to run both a cpu task and a gpu task; if either GIMPS task terminates, such as by running out of work, the other can continue for the rest of the allowed session length. The 3-minute interval is a compromise that allows a full 12-hour session length; increase the interval to ~6 minutes if running Colab Pro for 24 hours, or the early part of the output will be lost from the finite buffer.

Code:
import os  # needed for os.stat below
!chmod 777 '/content/drive/My Drive/gpuowl'
%cd '/content/drive/My Drive/gpuowl//'
!echo ls -l ./
!ls -l ./
statinfo = os.stat('./worktodo.txt')
if statinfo.st_size < 50:
  print ('WARNING, small file size indicates little or no gpuowl work to do')
!LD_LIBRARY_PATH="lib:${LD_LIBRARY_PATH}" && chmod 777 gpuowl.exe && chmod 777 worktodo.txt
!./gpuowl.exe  >>gpuowllog.txt 2>&1 &
print('gpuowl launched in background')
!top -d 180

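The interval advice above is simple arithmetic: 12 hours at one refresh every 3 minutes is 240 refreshes, and 24 hours at 6 minutes is again 240. A minimal sketch of that calculation follows. The function name and the 240-refresh budget are assumptions derived from those two figures; the actual size of Colab's output buffer is not stated in this thread.

```python
import math


def top_interval_seconds(session_hours: float, max_refreshes: int = 240) -> int:
    """Return the smallest whole-second `top -d` interval that keeps the
    number of refreshes at or below max_refreshes for the whole session.

    max_refreshes=240 is an assumption inferred from "3 minutes for 12 hours".
    """
    if session_hours <= 0 or max_refreshes <= 0:
        raise ValueError("session_hours and max_refreshes must be positive")
    return math.ceil(session_hours * 3600 / max_refreshes)


# 12-hour free session and 24-hour Colab Pro session, as discussed above.
for hours in (12, 24):
    print(f"{hours} h session: top -d {top_interval_seconds(hours)}")
```

The result can be substituted for the 180 in the last line of the cell, for example `!top -d 360` for a 24-hour session.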
There are assorted code examples at https://www.mersenneforum.org/showthread.php?t=24839
including #16 for branching to gpu-model-specific folders with different gpuowl config.txt files, worktodo, etc. Different -maxAlloc are advisable for doing P-1 factoring.
Good luck.
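
For readers who do not want to dig through the linked thread, here is a rough sketch of what branching to a gpu-model-specific folder could look like. It is not the code from post #16 there. The folder names are hypothetical, and only the T4 and P100 are named in this thread; K80 and P4 are assumptions about the other two models.

```python
import subprocess

# Hypothetical folder names, one per GPU model. Only T4 and P100 are named
# in this thread; K80 and P4 are assumed to be the other two Colab models.
GPU_FOLDERS = {
    "T4": "gpuowl-t4",
    "P100": "gpuowl-p100",
    "K80": "gpuowl-k80",
    "P4": "gpuowl-p4",
}


def folder_for_gpu(gpu_name, default="gpuowl"):
    """Map an nvidia-smi GPU name such as 'Tesla P100-PCIE-16GB' to a folder."""
    # Split on spaces and hyphens so that 'P4' never matches inside 'P100'.
    tokens = gpu_name.upper().replace("-", " ").split()
    for model, folder in GPU_FOLDERS.items():
        if model in tokens:
            return folder
    return default


def detect_gpu_name():
    """Return the first GPU name reported by nvidia-smi, or None if unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None  # CPU-only session, or the driver is not reachable
    lines = out.strip().splitlines()
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    name = detect_gpu_name()
    if name is None:
        print("No GPU detected")
    else:
        print(name, "->", folder_for_gpu(name))
```

Each folder would then hold its own config.txt (with a -maxAlloc value suited to that model) and worktodo.txt, and the notebook would `%cd` into the returned folder before launching gpuowl.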

Last fiddled with by kriesel on 2020-04-02 at 17:09

2020-04-03, 03:35   #940
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006

2⁵×3³×5 Posts

gpuowl.exe not found...

Running this as you requested:

Code:
import os.path
from google.colab import drive
import sys
if not os.path.exists('/content/drive/My Drive'):
  drive.mount('/content/drive')
%cd '/content/drive/My Drive//'
!chmod 777 '/content/drive/My Drive/gpuowl'
%cd '/content/drive/My Drive/gpuowl//'
!echo ls -l ./
!ls -l ./
%cd gpuowl/
!echo ls -l ./
!ls -l ./
statinfo = os.stat('./worktodo.txt')
if statinfo.st_size < 50:
  print ('WARNING, small file size indicates little or no gpuowl work to do')
!LD_LIBRARY_PATH="lib:\${LD_LIBRARY_PATH}" && chmod 777 gpuowl.exe && chmod 777 worktodo.txt
!./gpuowl.exe >>gpuowllog.txt 2>&1 &
print('gpuowl launched in background')
!top -d 180

The run first lists the entire directory.
Note there are 2 gpuowl directories; one within the other.
The inner seems to have everything except the executable.

Code:
/content/drive/My Drive
/content/drive/My Drive/gpuowl
/content/drive/My Drive/gpuowl/gpuowl
/content/drive/My Drive/gpuowl
ls -l ./
total 5
drwx------ 7 root root 4096 Apr 3 03:15 gpuowl
-rw------- 1 root root 51 Apr 3 03:22 gpuowllog.txt
-rwx------ 1 root root 47 Mar 12 21:46 worktodo.txt
/content/drive/My Drive/gpuowl/gpuowl
ls -l ./
total 459
-rw------- 1 root root 192 Mar 12 21:32 AllocTrac.cpp
-rw------- 1 root root 1613 Mar 12 21:32 AllocTrac.h
-rw------- 1 root root 7647 Mar 12 21:32 Args.cpp
-rw------- 1 root root 960 Mar 12 21:32 Args.h
-rw------- 1 root root 386 Mar 12 21:32 Background.h
-rw------- 1 root root 3945 Mar 12 21:32 Blake2.h
-rw------- 1 root root 3344 Mar 12 21:32 Buffer.h
-rw------- 1 root root 5376 Mar 12 21:32 checkpoint.cpp
-rw------- 1 root root 2787 Mar 12 21:32 checkpoint.h
-rw------- 1 root root 13674 Mar 12 21:32 clwrap.cpp
-rw------- 1 root root 3289 Mar 12 21:32 clwrap.h
-rw------- 1 root root 864 Mar 12 21:32 CMakeLists.txt
-rw------- 1 root root 466 Mar 12 21:32 codestyle.md
-rw------- 1 root root 1504 Mar 12 21:32 common.cpp
-rw------- 1 root root 635 Mar 12 21:32 common.h
-rw------- 1 root root 747 Mar 12 21:32 Context.h
-rw------- 1 root root 4990 Mar 12 21:32 conv.py
-rw------- 1 root root 2200 Mar 12 21:32 FFTConfig.cpp
-rw------- 1 root root 560 Mar 12 21:32 FFTConfig.h
-rw------- 1 root root 3897 Mar 12 21:32 File.h
-rw------- 1 root root 1496 Mar 12 21:32 GmpUtil.cpp
-rw------- 1 root root 448 Mar 12 21:32 GmpUtil.h
-rw------- 1 root root 10648 Mar 12 21:33 GmpUtil.o
-rw------- 1 root root 33980 Mar 12 21:32 Gpu.cpp
-rw------- 1 root root 5758 Mar 12 21:32 Gpu.h
-rw------- 1 root root 106232 Mar 12 21:32 gpuowl.cl
-rw------- 1 root root 106281 Mar 12 21:33 gpuowl-wrap.cpp
-rw------- 1 root root 37 Mar 12 21:32 head.txt
-rw------- 1 root root 1632 Mar 12 21:32 kernel.h
-rw------- 1 root root 35141 Mar 12 21:32 LICENSE
-rw------- 1 root root 2491 Mar 12 21:32 main.cpp
-rw------- 1 root root 1358 Mar 12 21:32 Makefile
-rw------- 1 root root 1101 Mar 12 21:32 Makefile-old
-rw------- 1 root root 4231 Mar 12 21:32 Pm1Plan.cpp
-rw------- 1 root root 821 Mar 12 21:32 Pm1Plan.h
-rw------- 1 root root 29672 Mar 12 21:33 Pm1Plan.o
-rw------- 1 root root 2438 Mar 12 21:32 ProofSet.h
-rw------- 1 root root 2726 Mar 12 21:32 Queue.h
-rw------- 1 root root 7971 Mar 12 21:32 README.md
-rw------- 1 root root 1019 Mar 12 21:32 SConstruct
-rw------- 1 root root 203 Mar 12 21:32 shared.h
-rw------- 1 root root 554 Mar 12 21:32 Signal.cpp
-rw------- 1 root root 160 Mar 12 21:32 Signal.h
-rw------- 1 root root 2911 Mar 12 21:32 state.cpp
-rw------- 1 root root 554 Mar 12 21:32 state.h
-rw------- 1 root root 12 Mar 12 21:32 tail.txt
-rw------- 1 root root 5486 Mar 12 21:32 Task.cpp
-rw------- 1 root root 1224 Mar 12 21:32 Task.h
drwx------ 2 root root 4096 Mar 12 21:32 test-pm1
-rw------- 1 root root 445 Mar 12 21:32 timeutil.cpp
-rw------- 1 root root 776 Mar 12 21:32 timeutil.h
-rw------- 1 root root 14181 Mar 12 21:32 tinycl.h
drwx------ 2 root root 4096 Mar 12 21:32 tools
-rw------- 1 root root 257 Mar 12 21:32 typeName.h
-rw------- 1 root root 131 Mar 12 21:32 version.h
-rw------- 1 root root 27 Mar 12 21:33 version.inc
-rw------- 1 root root 27 Apr 3 03:15 version.new
-rw------- 1 root root 4299 Mar 12 21:32 Worktodo.cpp
-rw------- 1 root root 577 Mar 12 21:32 Worktodo.h
-rw------- 1 root root 95 Mar 12 21:39 worktodo.txt
chmod: cannot access 'gpuowl.exe': No such file or directory
gpuowl launched in background

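The listing above contains sources and object files but no gpuowl.exe in either directory, so chmod has nothing to act on. One way to check whether the executable exists anywhere under the mounted Drive is to walk the tree from Python. This is only a sketch: the helper name `find_files` is made up for illustration, and the mount point is the one used in the posts above.

```python
import os
import stat


def find_files(root, filename):
    """Return a list of (path, is_executable) for each file named `filename`
    found anywhere under `root`. A missing root simply yields an empty list."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            path = os.path.join(dirpath, filename)
            mode = os.stat(path).st_mode
            # Check the owner-execute bit, which is what chmod 777 would set.
            hits.append((path, bool(mode & stat.S_IXUSR)))
    return hits


# Mount point as used in the posts above; run this after drive.mount().
drive_root = "/content/drive/My Drive"
matches = find_files(drive_root, "gpuowl.exe")
if not matches:
    print("gpuowl.exe not found under", drive_root)
for path, executable in matches:
    print(path, "(executable)" if executable else "(NOT executable, needs chmod)")
```

If nothing is found, the file was never uploaded to (or built in) the account that Colab mounted, and no amount of chmod will help; if it is found elsewhere, the `%cd` path needs adjusting.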
2020-04-14, 13:59   #941
lycorn

Sep 2002
Oeiras, Portugal

564₁₆ Posts

The Google definition of "idle"

I have been running several Colab notebooks (5 users in total) with no major issues. For the moment I am not using the GPU_72 notebook, as I am doing some sort of "custom" work.
For each user I fire up separate GPU and CPU instances. For the GPUs, I usually leave a break of at least 48 hours between the end of a session and the beginning of the next one. The CPU instances always run smoothly for 12 hours, and as soon as I reconnect they resume work for another 12. Whenever I mix in GPU instances, though, things get a bit more funny. The GPU running times vary from ~5-6 hours to a maximum of 10, well in line with most users here.
Now, the interesting point is that very frequently, when the GPU instances disconnect before the full 10 hours have elapsed, I get a message stating that the disconnection was due to inactivity. IMHO, if there is a program that doesn't even give the VM a chance to be idle, it is mfaktc. More recently I have noticed that when the GPU disconnects the same happens to the CPU, and very often I also get the same message on the CPU instance window. Again, running mprime doesn't fit the definition of "idle", does it?
This behaviour isn't totally consistent, though. Sometimes the GPU disconnects and the CPU keeps running (both instances were started at the same time).
I'm quite sure someone else must have already noticed this behaviour. Will someone care to comment on it, hinting at the probable cause for this "inactivity"?
Thanks in advance.

Last fiddled with by lycorn on 2020-04-14 at 14:01

2020-04-14, 14:07   #942
kuratkull

Mar 2007
Estonia

2·67 Posts

Indeed, it randomly announces the end of the 10-12 hours by saying my session was inactive, though of course it wasn't. I think it's just a mistake on their part. I have an automated setup that tries to restart sessions, and I am left with all the error screens it produced in the meantime: I see "no backend available", "session timed out", "too many instances", all while trying to get an instance.
Maybe you haven't abused the notebooks long enough yet? I got back-to-back instances for my first days (less than a week IIRC); after that it imposed much stricter limits.

2020-04-14, 14:16   #943
chalsall
If I May

"Chris Halsall"
Sep 2002

2·4,513 Posts

Quote:
 Originally Posted by lycorn Will someone care to comment on it, hinting at the probable cause for this "inactivity"?
Not sure if this is directly related to your workflow/use-case... But I've found it's good to click on the "Connect to hosted runtime" from the drop-down menu in the upper-right-hand side, to the immediate right of the CPU / Memory scale graphics. The Notebook then continues for the ~7 or so hours (for a GPU instance), with the graphic replaced with a "... Busy" message.

I've found that if I don't do this, the Notebook will disconnect after almost exactly an hour. If I happen to notice this within five minutes or so it will reconnect and continue on (again showing the Busy message). If I miss this window the Notebook will have been killed, and I'll have to rerun my Sections again.

I hope that helps.

2020-04-14, 14:37   #944
lycorn

Sep 2002
Oeiras, Portugal

2²·3·5·23 Posts

Thank you both for the answers.

@kuratkull: Indeed, whenever I try to restart a session after receiving the "inactivity" message, I get "no backend available" and an indication that the usage limits were exceeded. Now the thing is, I am doing things in such a way as not to abuse the GPUs. As I said, I am leaving a > 48 hour break between sessions. Oh well, the Google gods got grumpy. Let's hope it's not the SARS-CoV-2...

@chalsall: You mean clicking there when I get disconnected, as a way to resume, or to start the session? I sometimes start the session by clicking there, and it doesn't change things, but AFAIR I have never done it to try to resume a disconnected session; I'll give it a go next time. One more thing: do you ever get only the GPU instance disconnected? And have you ever received this "inactivity" message?

2020-04-14, 14:52   #945
chalsall
If I May

"Chris Halsall"
Sep 2002

2·4,513 Posts

Quote:
 Originally Posted by lycorn You mean clicking there when I get disconnected, as a way to resume, or to start the session?
No... What I do in the morning is go to each of my accounts' UIs and press Ctrl-F9. This runs all the Sections of the Notebook in order (I often have a reverse SSH Tunnel section before the GPU72_TF section, for development/debugging purposes, or just to record telemetry statistics).

If I get a GPU, I then do a few "Factory Reset" cycles until I get a T4 or a P100, and I then do the "Connect to hosted runtime" thing, which basically tells Colab "This is going to run for quite a while; don't kill the session just because a human doesn't interact".

For those sessions where I only get a CPU, I do the same Connect thing to secure the CPU session. I then come back at the expected GPU availability window opening time, and do the Connect thing again. If the GPU isn't yet available, it will say "No GPU backend available", so the CPU just continues. If the GPU is available the session will reconnect without the warning, and I do the coveted GPU hunt.

Rinse and repeat...

Edit: No, I've never received the inactivity message. Although I think Kracker once reported his getting a pop-up saying that the GPU wasn't being used, even though he was running the GPU72_TF Notebook, and the GPU was at 100%.

Last fiddled with by chalsall on 2020-04-14 at 14:54

2020-04-14, 15:16   #946
lycorn

Sep 2002
Oeiras, Portugal

1380₁₀ Posts

Ok, I see. Thanks again for your time. Very useful info.


All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.