[QUOTE=petrw1;540772][URL]https://download.mersenne.ca/CUDAPm1/[/URL]
Can anyone tell me which one will work at colab?[/QUOTE] Gpuowl P-1 works on Colab on all 4 GPU models, with different -maxAlloc values advisable for each. In my attempts, CUDAPm1 did not work: [url]https://www.mersenneforum.org/showthread.php?t=24839[/url]. If you get CUDAPm1 to work, please let me know how.
[QUOTE=Uncwilly;540748]Anyone else think that Colab and the rest of google and amazon, M$, etc should throw all their spare cycles at folding@home?[/QUOTE]
Nah, I'm having fun playing with / abusing a few of those cycles...
Can someone explain this and help me out...
I can't see, and therefore can't execute in Colab, the files I uploaded (e.g. gpuowl.exe) from my Desktop (which I saved from mersenneforum).
"My Drive" that I see in a browser is different from "My Drive" in Colab. Thx [QUOTE=petrw1;540084]If I go to Google Drive and open directory My Drive, it shows up in the URL bar as [url]https://drive.google.com/drive/my-drive[/url] and in the folder list as My Drive. My Drive contains gpuowl.exe. It has subfolders of Colab Notebooks (with a file holding the source code for each Notebook session I am running, and Copy of gpuowl.exe) and ColabWork (a new folder I created, with another copy of gpuowl.exe). HOWEVER, in my Python window when I run this code:
[CODE]import os.path
from google.colab import drive
if not os.path.exists('/content/drive/My Drive'):
    drive.mount('/content/drive')
%cd '/content/drive/My Drive'
for x in os.listdir('.'):
    print(x)[/CODE]
I see only these 2 directories:
[CODE]/content/drive/My Drive
cudapm1
gpuowl[/CODE]
These were created from Workbooks I ran recently to install CUDAPm1 and GpuOwl.[/QUOTE] It does NOT see gpuowl.exe, nor does it see my ColabWork directory that I set up.
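When the browser view of Drive and the Colab mount seem to disagree, one quick diagnostic is to walk the mounted tree from the notebook and list every copy of the file the VM can actually see. This is only a sketch: the helper name `find_file` is mine, and on Colab the root would be '/content/drive/My Drive' (the demo below builds a temporary tree instead so it runs anywhere).

```python
import os
import tempfile

def find_file(root, name):
    """Walk a directory tree and return every path whose basename is `name`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            hits.append(os.path.join(dirpath, name))
    return hits

# On Colab you would pass root='/content/drive/My Drive' (assumption);
# here we demo on a throwaway tree mimicking the nested gpuowl folders.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'gpuowl', 'gpuowl'))
open(os.path.join(root, 'gpuowl', 'worktodo.txt'), 'w').close()
open(os.path.join(root, 'gpuowl', 'gpuowl', 'gpuowl.exe'), 'w').close()

print(find_file(root, 'gpuowl.exe'))   # one hit, inside the nested gpuowl folder
print(find_file(root, 'missing.bin'))  # []
```

If the walk finds no copy of gpuowl.exe at all, the file was probably uploaded to a different Google account's Drive than the one the notebook mounted.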
[QUOTE=petrw1;540898]I can't see, and therefore, can't execute in colab, the files I uploaded (i.e. gpuowl.exe)from my Desktop (that I saved from mersenneforum).
"My Drive" that I see in a Browser is different than "My Drive" in colab. Thx It does NOT see gpuowl.exe; nor does it see my ColabWork directory that I set up.[/QUOTE]Been there, early on. As I recall, chmod was the solution. Programming a substitute for ls -l seems unnecessary to me. Try some or all of the following. The "top" is there to keep the main process busy so the session is not terminated. I use this in a script to run both a cpu task and a gpu task, so that if either GIMPS task terminates, such as by running out of work, the other can continue for the rest of the allowed session length. The 3-minute interval is a compromise that allows a full 12-hour session length; increase the interval to ~6 minutes if running Colab Pro for 24 hours, or the early part of the output will be lost from the finite buffer.
[CODE]!chmod 777 '/content/drive/My Drive/gpuowl'
%cd '/content/drive/My Drive/gpuowl//'
!echo ls -l ./
!ls -l ./
statinfo = os.stat('./worktodo.txt')
if statinfo.st_size < 50:
    print('WARNING, small file size indicates little or no gpuowl work to do')
!LD_LIBRARY_PATH="lib:${LD_LIBRARY_PATH}" && chmod 777 gpuowl.exe && chmod 777 worktodo.txt
!./gpuowl.exe >>gpuowllog.txt 2>&1 &
print('gpuowl launched in background')
!top -d 180[/CODE]There are assorted code examples at [URL]https://www.mersenneforum.org/showthread.php?t=24839[/URL], including #16 for branching to gpu-model-specific folders with different gpuowl config.txt files, worktodo, etc. Different -maxAlloc values are advisable when doing P-1 factoring. Good luck.
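The cell above uses a raw file-size threshold (< 50 bytes) as a proxy for "little or no work queued". A slightly sturdier check is to count the lines that actually look like assignments. This is just a sketch: the helper name `pending_assignments` is mine, the assignment line below is made up for illustration, and I'm assuming comment lines start with '#' or ';'.

```python
import os
import tempfile

def pending_assignments(path):
    """Count worktodo lines that look like real assignments:
    non-blank and not comments (assumed to start with '#' or ';')."""
    try:
        with open(path) as f:
            return sum(1 for line in f
                       if line.strip()
                       and not line.lstrip().startswith(('#', ';')))
    except FileNotFoundError:
        return 0

# Demo on a throwaway file; on Colab the path would be something like
# '/content/drive/My Drive/gpuowl/worktodo.txt' (assumption).
demo = os.path.join(tempfile.mkdtemp(), 'worktodo.txt')
with open(demo, 'w') as f:
    f.write('# a comment line\n')
    f.write('PFactor=...,dummy,assignment,for,illustration\n')
    f.write('\n')

print(pending_assignments(demo))  # 1
```

Unlike the size test, this returns 0 (rather than crashing in os.stat) when worktodo.txt is missing entirely, which is the more common failure on a fresh Drive folder.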
gpuowl.exe not found...
Running this as you requested:
[CODE]import os.path
from google.colab import drive
import sys
if not os.path.exists('/content/drive/My Drive'):
    drive.mount('/content/drive')
%cd '/content/drive/My Drive//'
!chmod 777 '/content/drive/My Drive/gpuowl'
%cd '/content/drive/My Drive/gpuowl//'
!echo ls -l ./
!ls -l ./
%cd gpuowl/
!echo ls -l ./
!ls -l ./
statinfo = os.stat('./worktodo.txt')
if statinfo.st_size < 50:
    print('WARNING, small file size indicates little or no gpuowl work to do')
!LD_LIBRARY_PATH="lib:${LD_LIBRARY_PATH}" && chmod 777 gpuowl.exe && chmod 777 worktodo.txt
!./gpuowl.exe >>gpuowllog.txt 2>&1 &
print('gpuowl launched in background')
!top -d 180[/CODE] The run first lists the entire directory. Note there are 2 gpuowl directories, one within the other. The inner one seems to have everything except the executable.
[CODE]/content/drive/My Drive
/content/drive/My Drive/gpuowl
/content/drive/My Drive/gpuowl/gpuowl
/content/drive/My Drive/gpuowl
ls -l ./
total 5
drwx------ 7 root root   4096 Apr  3 03:15 gpuowl
-rw------- 1 root root     51 Apr  3 03:22 gpuowllog.txt
-rwx------ 1 root root     47 Mar 12 21:46 worktodo.txt
/content/drive/My Drive/gpuowl/gpuowl
ls -l ./
total 459
-rw------- 1 root root    192 Mar 12 21:32 AllocTrac.cpp
-rw------- 1 root root   1613 Mar 12 21:32 AllocTrac.h
-rw------- 1 root root   7647 Mar 12 21:32 Args.cpp
-rw------- 1 root root    960 Mar 12 21:32 Args.h
-rw------- 1 root root    386 Mar 12 21:32 Background.h
-rw------- 1 root root   3945 Mar 12 21:32 Blake2.h
-rw------- 1 root root   3344 Mar 12 21:32 Buffer.h
-rw------- 1 root root   5376 Mar 12 21:32 checkpoint.cpp
-rw------- 1 root root   2787 Mar 12 21:32 checkpoint.h
-rw------- 1 root root  13674 Mar 12 21:32 clwrap.cpp
-rw------- 1 root root   3289 Mar 12 21:32 clwrap.h
-rw------- 1 root root    864 Mar 12 21:32 CMakeLists.txt
-rw------- 1 root root    466 Mar 12 21:32 codestyle.md
-rw------- 1 root root   1504 Mar 12 21:32 common.cpp
-rw------- 1 root root    635 Mar 12 21:32 common.h
-rw------- 1 root root    747 Mar 12 21:32 Context.h
-rw------- 1 root root   4990 Mar 12 21:32 conv.py
-rw------- 1 root root   2200 Mar 12 21:32 FFTConfig.cpp
-rw------- 1 root root    560 Mar 12 21:32 FFTConfig.h
-rw------- 1 root root   3897 Mar 12 21:32 File.h
-rw------- 1 root root   1496 Mar 12 21:32 GmpUtil.cpp
-rw------- 1 root root    448 Mar 12 21:32 GmpUtil.h
-rw------- 1 root root  10648 Mar 12 21:33 GmpUtil.o
-rw------- 1 root root  33980 Mar 12 21:32 Gpu.cpp
-rw------- 1 root root   5758 Mar 12 21:32 Gpu.h
-rw------- 1 root root 106232 Mar 12 21:32 gpuowl.cl
-rw------- 1 root root 106281 Mar 12 21:33 gpuowl-wrap.cpp
-rw------- 1 root root     37 Mar 12 21:32 head.txt
-rw------- 1 root root   1632 Mar 12 21:32 kernel.h
-rw------- 1 root root  35141 Mar 12 21:32 LICENSE
-rw------- 1 root root   2491 Mar 12 21:32 main.cpp
-rw------- 1 root root   1358 Mar 12 21:32 Makefile
-rw------- 1 root root   1101 Mar 12 21:32 Makefile-old
-rw------- 1 root root   4231 Mar 12 21:32 Pm1Plan.cpp
-rw------- 1 root root    821 Mar 12 21:32 Pm1Plan.h
-rw------- 1 root root  29672 Mar 12 21:33 Pm1Plan.o
-rw------- 1 root root   2438 Mar 12 21:32 ProofSet.h
-rw------- 1 root root   2726 Mar 12 21:32 Queue.h
-rw------- 1 root root   7971 Mar 12 21:32 README.md
-rw------- 1 root root   1019 Mar 12 21:32 SConstruct
-rw------- 1 root root    203 Mar 12 21:32 shared.h
-rw------- 1 root root    554 Mar 12 21:32 Signal.cpp
-rw------- 1 root root    160 Mar 12 21:32 Signal.h
-rw------- 1 root root   2911 Mar 12 21:32 state.cpp
-rw------- 1 root root    554 Mar 12 21:32 state.h
-rw------- 1 root root     12 Mar 12 21:32 tail.txt
-rw------- 1 root root   5486 Mar 12 21:32 Task.cpp
-rw------- 1 root root   1224 Mar 12 21:32 Task.h
drwx------ 2 root root   4096 Mar 12 21:32 test-pm1
-rw------- 1 root root    445 Mar 12 21:32 timeutil.cpp
-rw------- 1 root root    776 Mar 12 21:32 timeutil.h
-rw------- 1 root root  14181 Mar 12 21:32 tinycl.h
drwx------ 2 root root   4096 Mar 12 21:32 tools
-rw------- 1 root root    257 Mar 12 21:32 typeName.h
-rw------- 1 root root    131 Mar 12 21:32 version.h
-rw------- 1 root root     27 Mar 12 21:33 version.inc
-rw------- 1 root root     27 Apr  3 03:15 version.new
-rw------- 1 root root   4299 Mar 12 21:32 Worktodo.cpp
-rw------- 1 root root    577 Mar 12 21:32 Worktodo.h
-rw------- 1 root root     95 Mar 12 21:39 worktodo.txt
chmod: cannot access 'gpuowl.exe': No such file or directory
gpuowl launched in background[/CODE]
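The listing above shows only source files (*.cpp, Makefile, etc.) in the inner directory, with no gpuowl.exe anywhere, so the launch line fails silently after the chmod error. A cheap guard is to verify the binary exists and is executable before launching, instead of discovering the problem in the log afterwards. This is just a sketch: the helper name `binary_status` is mine, and the Colab path in the comment is an assumption.

```python
import os
import stat
import tempfile

def binary_status(path):
    """Report whether an executable at `path` is present and runnable."""
    if not os.path.exists(path):
        return 'missing'
    if not os.stat(path).st_mode & stat.S_IXUSR:
        return 'not executable (a chmod +x would be needed)'
    return 'ok'

# Demo with a scratch file; on Colab the path would be something like
# '/content/drive/My Drive/gpuowl/gpuowl.exe' (assumption).
scratch = os.path.join(tempfile.mkdtemp(), 'gpuowl.exe')
print(binary_status(scratch))   # 'missing'
open(scratch, 'w').close()
os.chmod(scratch, 0o644)
print(binary_status(scratch))   # present but not executable
os.chmod(scratch, 0o755)
print(binary_status(scratch))   # 'ok'
```

Here the check would report 'missing': the exe was never uploaded into (or built in) that folder, so the fix is to copy a working gpuowl.exe there or build it from the sources that are present.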
The Google definition of "idle"
I have been running several Colab notebooks (5 users in total) with no major issues.
For the moment I am not using the GPU_72 notebook, as I am doing some sort of "custom" work. For each user I fire up separate GPU and CPU instances. For the GPUs, I give a break of at least 48 hours between the end of a session and the beginning of the next one. The CPU instances always run smoothly for 12 hours, and as soon as I reconnect they resume work for another 12. Whenever I mix in GPU instances, though, things get a bit funnier. The GPU running times vary from ~5-6 hours to a maximum of 10, well in line with most users here.

Now, the interesting point is that very frequently, when the GPU instances disconnect before the full 10 hours have elapsed, I get a message stating that the disconnection was due to inactivity. IMHO, if there is one program that doesn't even give the VM a chance to be idle, it is mfaktc. More recently I have noticed that when the GPU disconnects the same happens to the CPU, and very often I also get the same message on the CPU instance window. Again, running mprime doesn't fit the definition of "idle", does it? This behaviour isn't totally consistent, though. Sometimes the GPU disconnects and the CPU keeps running (both instances having been started at the same time).

I'm quite sure someone else must have already noticed this behaviour. Would someone care to comment on it, hinting at the probable cause for this "inactivity"? Thanks in advance.
Indeed, it randomly flags the lapse of the 10-12 hours by saying my session was inactive, though of course it wasn't. I think it's just a mistake on their part. I have an automated setup that tries to restart sessions, and I am left with all the error screens it produced in the meantime: I see "no backend available", "session timed out", "too many instances", all while trying to get an instance.
Maybe you haven't abused the notebooks long enough yet? I got back-to-back instances for my first days (less than a week IIRC); after that it put on much stricter limits.
[QUOTE=lycorn;542626]Will someone care to comment on it, hinting at the probable cause for this "inactivity"?[/QUOTE]
Not sure if this is directly related to your workflow/use-case... But I've found it's good to click on "Connect to hosted runtime" in the drop-down menu on the upper-right-hand side, to the immediate right of the CPU / Memory scale graphics. The Notebook then continues for the ~7 or so hours (for a GPU instance), with the graphic replaced by a "... Busy" message. I've found that if I don't do this, the Notebook will disconnect after almost exactly an hour. If I happen to notice this within five minutes or so, it will reconnect and continue on (again showing the Busy message). If I miss this window the Notebook will have been killed, and I'll have to rerun my Sections again. I hope that helps.
Thank you both for the answers.
@kuratkull: Indeed, whenever I try to restart a session after receiving the "inactivity" message, I get "no backend available" and an indication that the usage limits were exceeded. Now, the thing is I am doing things so as not to abuse the GPUs. As I said, I am giving a > 48 hour break between sessions. Oh well, the Google gods got grumpy. Let's hope it's not the SARS-CoV-2...:smile:

@chalsall: You mean clicking there when I get disconnected, as a way to resume, or to start the session? I sometimes start the session by clicking there, and it doesn't change things, but AFAIR I have never done it to try and resume a disconnected session; I'll give it a go next time. One more thing: does it ever happen to you that only the GPU instance gets disconnected? And have you ever received this "inactivity" message?
[QUOTE=lycorn;542632]You mean clicking there when I get disconnected, as a way to resume, or to start the session?[/QUOTE]
No... What I do in the morning is go to each of my accounts' UIs and press Ctrl-F9. This runs all the Sections of the Notebook in order (I often have a reverse SSH tunnel section before the GPU72_TF section, for development/debugging purposes, or just to record telemetry statistics). If I get a GPU, I then do a few "Factory Reset" cycles until I get a T4 or a P100, and then do the "Connect to hosted runtime" thing, which basically tells Colab "this is going to run for quite a while; don't kill the session just because a human doesn't interact". For those sessions where I only get a CPU, I do the same Connect thing to secure the CPU session. I then come back when the GPU availability window is expected to open and do the Connect thing again. If the GPU isn't yet available, it will say "No GPU backend available", so the CPU just continues. If the GPU is available the session will reconnect without the warning, and I do the coveted GPU hunt. Rinse and repeat... :smile:

Edit: No, I've never received the inactivity message. Although I think Kracker once reported getting a pop-up saying that the GPU wasn't being used, even though he was running the GPU72_TF Notebook and the GPU was at 100%.
Ok, I see.
Thanks again for your time. Very useful info.