mersenneforum.org  

Old 2019-10-21, 19:39   #408
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

2627₁₆ Posts

Quote:
Originally Posted by kriesel
!top -d n sorta does. Here n=120.

I stand corrected!

Thanks. You just taught me something.
Old 2019-10-21, 20:37   #409
mnd9
 
Jun 2019
Boston, MA

3×13 Posts

I'm still a bit confused about using Kaggle. If I leave a job running in the edit window, it seems to time out and power off after a short while (well before the 6-hour max), and I lose everything. I tried committing, but then I can't work out how to re-enter the session and see the output from my code cells, or how to download any output files from the committed session. When I click on the committed session I see the "code" page, which I can't really make sense of, and if I click "edit" it seems to just open a new draft session...

Can someone give me some basic pointers on this?
Old 2019-10-21, 20:55   #410
EdH
 
"Ed Hall"
Dec 2009
Adirondack Mtns

2·3³·71 Posts

Quote:
Originally Posted by Dylan14
@EdH: issuing locate cuda.h yields the following results:
. . .
so maybe try adding /usr/include/linux/ to your makefile (where you define where the cuda library is located).

The "--with-cuda=" option worked for cuda.h, but the troubles aren't over:
Code:
checking cuda.h usability... yes 
checking cuda.h presence... yes 
checking for cuda.h... yes 
checking that CUDA Toolkit version is at least 3.0... no 
configure: error: a newer version of the CUDA Toolkit is needed
Code:
cuda-toolkit-10-0/unknown,now 10.0.130-1 amd64 [installed,automatic]
  CUDA Toolkit 10.0 meta-package
cuda-toolkit-10-1/unknown,now 10.1.243-1 amd64 [installed]
  CUDA Toolkit 10.1 meta-package
Now 10 is somehow older than 3? (Maybe the check compares 1 against 3, instead of 10.) I guess I'm going to have to dig into the configure code, make a change and see where that goes.
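
For what it's worth, one way a check can conclude that 10 is older than 3 is by comparing version strings character by character instead of as numbers. The Python sketch below only illustrates that failure mode. It is an assumption about the cause, not a reading of the actual configure script, and parse_version and at_least are made-up helper names.

```python
def parse_version(text):
    """Turn a dotted version string such as '10.1.243' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def at_least(installed, required):
    """Compare two dotted version strings numerically, field by field."""
    return parse_version(installed) >= parse_version(required)


installed = "10.1"  # toolkit version from the package listing in the post
required = "3.0"    # minimum version named in the configure message

# Plain string comparison looks at '1' versus '3' first, so 10.1 sorts before 3.0.
print("string comparison passes: ", installed >= required)

# Comparing integer tuples treats 10 as a number, so the check passes.
print("numeric comparison passes:", at_least(installed, required))
```

If the configure test does something like the first comparison, patching it to compare the fields numerically would be the kind of change to look for.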

More later. Thanks all!
Old 2019-10-21, 22:32   #411
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

9,767 Posts

All I can offer you is my own empirical observations. They might be of some use.

I have found the Kaggle Browser-based User Interface (UI) somewhat confusing.

Right now I have a GPU attached instance up and running, working away. But the UI tells me that "The kernel is powered off. Click this banner to turn it back on."

Two things:

1. I'm logged into the instance "tail -f"'ing logs. I know it's still running.

2. My "GPU Quota" continues to count down. I once wasted ~8.5 hours of a 9-hour instance "happening" this way...

I've found that clicking on the banner causes a restart of the instance -- any SSH connections immediately drop, and the instance which becomes available by way of the UI is "virgin" (although its "uptime" might be several hours).

With regard to "Committed" jobs, my understanding is that there is somewhere within the VM's FS where you can place data for later harvesting. I haven't investigated that myself; I believe others here have.

As an aside, it's a good thing I "Hash". Last Saturday was ~12 km up and down steep hills. And interacting with humans in "meat-space".

If it wasn't for that weekly event, I might never get off my sorry little ass...
Old 2019-10-21, 22:40   #412
PhilF
 
Feb 2005
Colorado

5·131 Posts

After getting your code all ready to run, click on commit. Then you should get a screen like the one I have included here. At the top is a link to your committed run. You can use that link to get to that session, or, once the code has completed, your session will be listed over on the right side of the screen (where you can see my previous V1 through V6 commits; V7 will appear there once the code completes and exits).

Please note that files are not saved, only screen output is retained. What I have done in the code you see here is run some ECM curves (via GMP-ECM), which is configured to save its output to a file called ecm-out.txt. Then, in my code, after the line that invokes ecm, I use !cat ecm-out.txt to send it to the (virtual) screen.

After the run is complete, I can call up that commit (V7 in this case), and simply copy/paste the output into a local text file.
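
The pattern described above (send the program's output to a file, then echo the file so it lands in the notebook's cell output) can be sketched in Python as follows. This is a minimal sketch, not the actual notebook from the screenshot: run_and_echo is a hypothetical helper, and the command in the usage example is a stand-in, since the real ecm command line isn't shown in the post.

```python
import subprocess
import sys
from pathlib import Path


def run_and_echo(command, log_path):
    """Run command with its output captured in log_path, then print that file."""
    log = Path(log_path)
    with log.open("w") as handle:
        # stdout and stderr both go to the log file, as a redirect would do.
        subprocess.run(command, stdout=handle, stderr=subprocess.STDOUT, check=False)
    text = log.read_text()
    # Equivalent of "!cat ecm-out.txt": copy the file to the cell output.
    print(text, end="")
    return text


if __name__ == "__main__":
    # Stand-in command; replace with the real program invocation.
    run_and_echo([sys.executable, "-c", "print('stand-in for ecm output')"],
                 "ecm-out.txt")
```

The echo step only runs after the program has exited, so this fits programs that terminate on their own.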

Hope this helps...
Attached thumbnail: ScreenShot01.jpg
Old 2019-10-21, 23:20   #413
mnd9
 
Jun 2019
Boston, MA

3·13 Posts

Thanks, Chris, for your insight. Today my kernels kept powering off, with all cells grayed out, so it seemed my only option was to click the power-on banner, which loses everything, as you noted.

How are you able to see things are still running with power “off” without access to the code cells? It seems I’m missing something...
Old 2019-10-21, 23:29   #414
PhilF
 
Feb 2005
Colorado

655₁₀ Posts

I've noticed that too. If a kernel powers off, don't click on the banner to power it back on. Instead, close the window. Then open a new Kaggle window, go to your Notebooks, and choose the powered off notebook from there. Going at it that way, I have found the files are still there.
Old 2019-10-21, 23:33   #415
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

9,767 Posts

Quote:
Originally Posted by mnd9
How are you able to see things are still running with power “off” without access to the code cells? It seems I’m missing something...

It's a bit "Geeky", but please see this.

These instances are full-blown Ubuntu (read: Linux) environments. Although all incoming network traffic is firewalled, all outgoing (and "established") traffic is allowed.

Thanks to the GPL, it is relatively trivial to get shell access into things like this.
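
As a small illustration of the point about outgoing traffic, the sketch below tests whether an outbound TCP connection can be opened from inside an instance. can_connect_out is a hypothetical helper, and the host and port in the usage comment are placeholders, not anything taken from the linked post.

```python
import socket


def can_connect_out(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened, else False."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Example with a placeholder host:
# can_connect_out("example.com", 443)
```

A True result only shows that the instance can reach out; it says nothing about inbound connections, which the post describes as firewalled.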

Last fiddled with by chalsall on 2019-10-21 at 23:42 Reason: Error in URL.
Old 2019-10-21, 23:35   #416
mnd9
 
Jun 2019
Boston, MA

47₈ Posts

Quote:
Originally Posted by PhilF
After getting your code all ready to run, click on commit. Then you should get a screen like the one I have included here. At the top is a link to your committed run. You can use that link to get to that session, or, once the code has completed, your session will be listed over on the right side of the screen (where you can see my previous V1 through V6 commits; V7 will appear there once the code completes and exits).

Please note that files are not saved, only screen output is retained. What I have done in the code you see here is run some ECM curves (via GMP-ECM), which is configured to save its output to a file called ecm-out.txt. Then, in my code, after the line that invokes ecm, I use !cat ecm-out.txt to send it to the (virtual) screen.

After the run is complete, I can call up that commit (V7 in this case), and simply copy/paste the output into a local text file.

Hope this helps...

This is helpful! What if I wanted to work on a single exponent over several sessions (e.g., a wavefront exponent using cudalucas or gpuowl)? Is there a way to cat the checkpoint file and snag it? And when you say the run “completes”, you mean it reaches the time limit and is killed, right?
Old 2019-10-21, 23:39   #417
PhilF
 
Feb 2005
Colorado

5·131 Posts

No, I mean the executable exits to the shell, like GMP-ECM does when its work is complete. Unfortunately, mprime does not do that.

I don't know about the other executables people are running. If they won't exit, I assume the job is killed after 9 hours, but then the cat statement(s) would never get a chance to execute.

However, if the executable can send its output data to the screen as it is running, then you probably could go into the job after 9 hours and retrieve that screen output.
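
One possible way around the cat statement never getting to run is to have the notebook itself stop the program shortly before the session limit, so that the cells after it still execute. The sketch below illustrates that idea under stated assumptions; it is not something anyone in the thread reports using. run_with_deadline is a hypothetical helper, and whether a given program saves its state when it receives the termination signal needs to be checked for that program.

```python
import subprocess
import sys


def run_with_deadline(command, seconds, grace=10):
    """Run command; stop it if still running after `seconds`.

    Returns True if the program exited on its own, False if it was stopped.
    """
    proc = subprocess.Popen(command)
    try:
        proc.wait(timeout=seconds)
        return True
    except subprocess.TimeoutExpired:
        proc.terminate()  # polite request first (SIGTERM on POSIX)
        try:
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            proc.kill()  # force it if the program ignores the request
            proc.wait()
        return False


if __name__ == "__main__":
    # Stand-in for a long-running job: sleeps 60 s, stopped after 2 s.
    finished = run_with_deadline(
        [sys.executable, "-c", "import time; time.sleep(60)"], 2)
    print("exited on its own:", finished)
```

In a real notebook the deadline would be set somewhat below the session limit, leaving time for the cells that display or save the results.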

Last fiddled with by PhilF on 2019-10-21 at 23:42
Old 2019-10-22, 02:45   #418
axn
 
Jun 2003

2·3·7·11² Posts

Quote:
Originally Posted by mnd9
I'm still a bit confused about using Kaggle. If I leave a job running in the edit window, it seems to time out and power off after a short while (well before the 6 hour max) making me lose everything. I tried committing, but then I'm confused as to how to re-enter the session and see the output from my code cells or how to download any output files from the committed session.

Once your committed job finishes, you will be able to see all your files in the Output tab.

Quote:
Originally Posted by PhilF
Please note that files are not saved, only screen output is retained.

That's exactly upside down. Only files are retained in the Output tab. Screen output is lost (unless you redirect it to a file).

Quote:
Originally Posted by mnd9
Is there a way to cat the checkpoint file and snag it?

Yes. In the Output tab, all the files in the kaggle folder will be available once the session completes (either the code has run to completion, or the session was killed after 9 hours (CPU) / 6 hours (GPU)).
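
Given that files are what survive, a notebook can copy whatever it wants to keep into the output folder before the session ends. A minimal sketch, assuming the folder referred to above is /kaggle/working (that path is an assumption here, as are the helper name stash_files and the directory and file pattern in the usage comment):

```python
import shutil
from pathlib import Path


def stash_files(source_dir, pattern, output_dir):
    """Copy files in source_dir matching pattern into output_dir.

    Returns the names of the files that were copied.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(source_dir).glob(pattern)):
        if path.is_file():
            shutil.copy2(path, out / path.name)  # keeps timestamps as well
            copied.append(path.name)
    return copied


# Example with hypothetical names:
# stash_files("my_run_dir", "*.ckpt", "/kaggle/working")
```

Restoring the same files at the start of the next session would be the other half of carrying one exponent across several runs.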

EDIT:- Key to the kingdom: https://www.kaggle.com/<yourid>/<yourkernel>/

Last fiddled with by axn on 2019-10-22 at 02:54

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.