mersenneforum.org

Google Diet Colab Notebook (https://www.mersenneforum.org/showthread.php?t=24646)

PhilF 2019-10-13 20:04

I get that same error, "Transport endpoint is not connected", after clicking the stop button to interrupt mprime. :confused:

After that, I just close the window, open a new one, go back to Colab, and start a new session. It is like you get one shot at executing, so no interrupts allowed!

There might be a better way to get reconnected, I don't know. I just avoid the stop button. :smile:

In the upper right it shows the green checkmark, so that means you are (supposedly) connected to a VM.

kriesel 2019-10-13 20:40

[QUOTE=PhilF;527937]I get that same error, "Transport endpoint is not connected", after clicking the stop button to interrupt mprime. :confused:

After that, I just close the window, open a new one, go back to Colab, and start a new session. It is like you get one shot at executing, so no interrupts allowed!

There might be a better way to get reconnected, I don't know. I just avoid the stop button. :smile:

In the upper right it shows the green checkmark, so that means you are (supposedly) connected to a VM.[/QUOTE]
Just now, even after I close the whole browser, restart it, and open the Colaboratory welcome page, I get the same error. I'm signed in and have the green check mark next to the RAM and disk bar graphs; hovering the mouse over it says I'm connected to Python3 something or other. Then +CODE, !pwd, same error. The stop button was not used during that browser session. Maybe I should go do some useful home maintenance while it sorts itself out.

PhilF 2019-10-13 20:57

[QUOTE=kriesel;527939]Just now, even after I close the whole browser, restart it, and open the Colaboratory welcome page, I get the same error. I'm signed in and have the green check mark next to the RAM and disk bar graphs; hovering the mouse over it says I'm connected to Python3 something or other. Then +CODE, !pwd, same error. The stop button was not used during that browser session. Maybe I should go do some useful home maintenance while it sorts itself out.[/QUOTE]

Well, that is interesting. I just interrupted my mprime execution, but before I closed the window I clicked on Runtime --> Manage Sessions, then terminated the session.

Then I opened a new window (Chrome, running on Windows 10), went to [URL="https://colab.research.google.com/notebooks/welcome.ipynb"]https://colab.research.google.com/notebooks/welcome.ipynb[/URL], clicked on Connect which got me a VM connection, clicked on +Code, and in the blank line typed !pwd then pressed control-Enter. I got:

/content

So, I don't know what is different on your end, other than that after closing and re-opening your browser, maybe you are getting your old session back?

chalsall 2019-10-13 22:04

[QUOTE=xx005fs;527927]During the process of installation, it seems to try to change something in the kernel and is rebuilding it, but I am not sure about the exact process. But after installation the OpenCL device detection works immediately and the driver version shown by nvidia-smi is still unchanged, so idk if it actually changed anything within the driver stack to be considered a crack attempt or changed anything in kernels.[/QUOTE]

Hmmm... What you're describing sounds like a perfectly reasonable series of actions to attempt to get a driver installed and running.

It /sounds/ like what the "make" was doing was building a version of _itself_ to be compatible with the running kernel. The only modifications to the kernel going on would be the "insmod" attaching the driver to it. Done all the time.

I hope you regain access to Kaggle soon, so you can continue your experiments. Based on heuristics of email communications with Kaggle, however, I might suggest you set up under new credentials, rather than waiting for a human to see your plea...

kriesel 2019-10-13 22:38

[QUOTE=PhilF;527940]Well, that is interesting. I just interrupted my mprime execution, but before I closed the window I clicked on Runtime --> Manage Sessions, then terminated the session.

Then I opened a new window (Chrome, running on Windows 10), went to [URL]https://colab.research.google.com/notebooks/welcome.ipynb[/URL], clicked on Connect which got me a VM connection, clicked on +Code, and in the blank line typed !pwd then pressed control-Enter. I got:

/content

So, I don't know what is different on your end, other than that after closing and re-opening your browser, maybe you are getting your old session back?[/QUOTE]
Thanks. I futzed with Manage Sessions as you described, and now I'm back [URL]https://www.youtube.com/watch?v=dBN86y30Ufc[/URL]
:spinner:

chalsall 2019-10-13 22:42

[QUOTE=kriesel;527935]...or just !pwd fails.[/QUOTE]

This is just a shot in the dark, but...

Do other Linux console commands fail? Maybe try something like "ps -aux" (which doesn't touch the filesystem).

If this works, try "%cd /" and then "!ls -lah".

My theory is /maybe/ something is resetting the filesystem and/or context underneath your shell, and you need to reestablish where your "Working Directory" actually is.
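
For anyone who would rather test that theory from a Python cell than from shell escapes, here is a minimal sketch. It assumes the failure shows up as an OSError when the notebook's working directory is queried or listed; the fallback directory "/" and the function name are only illustrative.

```python
import os


def check_working_directory(fallback="/"):
    """Return a usable working directory, switching to `fallback` if needed."""
    try:
        cwd = os.getcwd()   # raises OSError if the directory has vanished
        os.listdir(cwd)     # raises OSError if the mount under it is broken
        return cwd
    except OSError as exc:
        # Assumption: re-establishing the working directory is enough to recover.
        print(f"Working directory unusable ({exc}); switching to {fallback}")
        os.chdir(fallback)
        return os.getcwd()


print(check_working_directory())
```

If this prints the fallback path, the shell really had lost its working directory. If it still raises, the problem is probably deeper than the current directory and terminating the session is likely the only way out.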

PhilF 2019-10-13 23:01

[QUOTE=kriesel;527944]Thanks. I futzed with Manage Sessions as you described, and now I'm back [URL]https://www.youtube.com/watch?v=dBN86y30Ufc[/URL]
:spinner:[/QUOTE]

You're welcome. But your youtube link sent me off on a half-hour youtube tangent... :snapoutofit:

xx005fs 2019-10-14 00:59

[QUOTE=chalsall;527941]Hmmm... What you're describing sounds like a perfectly reasonable series of actions to attempt to get a driver installed and running.

It /sounds/ like what the "make" was doing was building a version of _itself_ to be compatible with the running kernel. The only modifications to the kernel going on would be the "insmod" attaching the driver to it. Done all the time.

I hope you regain access to Kaggle soon, so you can continue your experiments. Based on heuristics of email communications with Kaggle, however, I might suggest you set up under new credentials, rather than waiting for a human to see your plea...[/QUOTE]

I already set up another account using my secondary email and another phone number, so it is crunching away already. As soon as my first account works again I'll play around more with GPUs on it, and I'll definitely remove any extra info like profile pictures (which are known to cause issues on Kaggle, as a lot of people have reported on the forum). Currently I am testing the speed of mprime on Kaggle (using 2 CPU cores for PRP), but so far the results are disappointing and I might stop in favor of using a GPU.

The CPUs on Kaggle are genuinely not impressive: they run at a low clock (about 2.5GHz compared to the newest Cascade Lake's 2.7 or higher) and don't have AVX-512 support. It takes a significant amount of time to finish one exponent with just one instance (about 30 days or so running 24/7), and it's really hard to manage 10 datasets for 10 different instances despite the ability to run 10 commits in parallel. I guess I'm just giving up on the CPU instances completely, as they are kind of a waste of time for these large workloads.

Prime95 2019-10-14 03:47

I recommend all scripts that launch mprime use the -d command line argument.
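
A minimal sketch of what such a launch script could look like from a Python cell. The binary path "./mprime" is a placeholder for wherever mprime was unpacked, and the helper names are made up for illustration; only the -d flag itself comes from the recommendation above.

```python
import subprocess


def build_mprime_command(binary="./mprime", detailed=True):
    """Build the argument list for launching mprime.

    `binary` is a hypothetical path; adjust it to the actual install location.
    """
    command = [binary]
    if detailed:
        command.append("-d")  # the flag recommended above
    return command


def stream_mprime(command):
    """Start mprime and echo its output line by line until it exits."""
    process = subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    for line in process.stdout:
        print(line, end="")
    return process.wait()


if __name__ == "__main__":
    # Only prints the command; call stream_mprime(...) once the path is right.
    print(build_mprime_command())
```

Keeping the command in one helper means every script that launches mprime picks up the flag without it having to be remembered each time.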

xx005fs 2019-10-14 04:35

[QUOTE=Prime95;527952]I recommend all scripts that launch mprime use the -d command line argument.[/QUOTE]
Thx a lot, I didn't know this option existed and now I am able to see all the stats instead of relying on primenet's progress report and I can see exactly how many ms/it it's doing :)


Update: The speed is actually not half bad. I suppose it's around the same as a quad-core FX or dual-core Sandy Bridge CPU. With 10 instances running (Kaggle's upper CPU instance limit) the combined power will be significant. Approximately 14.577 GHz-days/day per instance according to my calculation, which means 10 of them would equal a tweaked Vega64 :tu:

[CODE]
[Work thread Oct 14 04:34] Iteration: 1725671 / 93280081 [1.84%].
[Work thread Oct 14 04:35] Iteration: 1730000 / 93280081 [1.85%], ms/iter: 21.797, ETA: 23d 02:18
[Work thread Oct 14 04:39] Iteration: 1740000 / 93280081 [1.86%], ms/iter: 21.759, ETA: 23d 01:16
[Work thread Oct 14 04:43] Iteration: 1750000 / 93280081 [1.87%], ms/iter: 21.469, ETA: 22d 17:51[/CODE]
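
As a rough cross-check of the ETA column, the remaining time can be recomputed from the iteration counts and the ms/iter figure. This is only a sketch: the regular expression is written against the log lines shown above, and mprime's own ETA may be rounded or averaged differently, so later lines can differ by about a minute.

```python
import re

# Matches progress lines that carry an ms/iter figure, as in the log above.
LINE_PATTERN = re.compile(
    r"Iteration: (?P<done>\d+) / (?P<total>\d+) \[[\d.]+%\], ms/iter: (?P<ms>[\d.]+)"
)


def eta_seconds(log_line):
    """Estimate remaining seconds from one progress line, or None if no ms/iter."""
    match = LINE_PATTERN.search(log_line)
    if match is None:
        return None
    remaining = int(match["total"]) - int(match["done"])
    return remaining * float(match["ms"]) / 1000.0


def format_eta(seconds):
    """Format seconds as 'Nd HH:MM', truncating to whole minutes."""
    minutes = int(seconds // 60)
    days, rest = divmod(minutes, 24 * 60)
    hours, mins = divmod(rest, 60)
    return f"{days}d {hours:02d}:{mins:02d}"


line = (
    "[Work thread Oct 14 04:35] Iteration: 1730000 / 93280081 [1.85%], "
    "ms/iter: 21.797, ETA: 23d 02:18"
)
print(format_eta(eta_seconds(line)))  # prints "23d 02:18"
```

For the 04:35 line this gives (93280081 - 1730000) iterations at 21.797 ms each, which is about 1,995,517 seconds, matching the reported 23d 02:18.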

xx005fs 2019-10-14 04:42

Dual Colab Instances on 1 Account
 
Has anyone attempted to run 2 Colab instances on one account without one interrupting the other? That is, running one with a CPU workload and another with a GPU workload. If it's indeed possible, then I could always attempt to run some mprime alongside GPUOWL.


All times are UTC.