mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Cloud Computing (https://www.mersenneforum.org/forumdisplay.php?f=134)
-   -   Google Diet Colab Notebook (https://www.mersenneforum.org/showthread.php?t=24646)

PhilF 2019-12-21 01:49

[QUOTE=kriesel;533306]My first P100 gpuowl session on Google Colab hit the same issue after 8 hours today. I've had many (dozens) of K80 gpuowl P-1 sessions without ever hitting it.[CODE]2019-12-20 23:29:54 colab/TeslaP100 Exception NSt12experimental10filesystem2v17__cxx1116filesystem_errorE: filesystem error: cannot get current path: Transport endpoint is not connected
2019-12-20 23:29:54 colab/TeslaP100 waiting for background GCDs..
2019-12-20 23:29:54 colab/TeslaP100 Bye
shell-init: error retrieving current directory: getcwd: cannot access parent directories: Transport endpoint is not connected
shell-init: error retrieving current directory: getcwd: cannot access parent directories: Transport endpoint is not connected [/CODE][/QUOTE]

I don't think this is a GPU-related issue. I have had similar disconnects in the last few days on CPU-only instances. It is interesting to note that the processing continues, so it appears it is only the backend connection to Google Drive that is failing (and not us purposely getting booted).
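
If it really is just the Drive mount going stale, a watchdog cell along these lines could catch it. This is only a sketch: the path and the idea of polling from a separate cell are assumptions, and only drive.mount() with force_remount is the standard Colab API.
[CODE]# sketch only: detect a stale Drive mount ("Transport endpoint is not
# connected") and try remounting it
import os
from google.colab import drive

def drive_alive(path="/content/drive/My Drive"):
    try:
        os.listdir(path)   # raises OSError if the FUSE mount has dropped
        return True
    except OSError:
        return False

if not drive_alive():
    drive.mount("/content/drive", force_remount=True)
[/CODE]
Whether the remount succeeds without re-authenticating is a separate question; this only covers the detection side.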

kriesel 2019-12-21 02:18

[QUOTE=PhilF;533307]I don't think this is a GPU-related issue. I have had similar disconnects in the last few days on CPU-only instances. It is interesting to note that the processing continues, so it appears it is only the backend connection to Google Drive that is failing (and not us purposely getting booted).[/QUOTE]I also think it is generic. We now have at least CUDAPm1, gpuowl, and your CPU instance demonstrating it. In my case, the GPU task halted. That is proven by the !top -d 120 command still running; it follows the GPU task in the script to ensure the CPU background task runs for the full allowed length of the session. It is also proven by the program itself saying "Bye", as in goodbye, which is gpuowl's exit notification.
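
In other words, the cell layout is roughly the following; only the trailing !top -d 120 is from the description above, while the working directory and the GPU invocation are placeholders.
[CODE]%cd "/content/drive/My Drive/gpuowl"   # placeholder working directory
!./gpuowl                              # placeholder GPU task
# If the GPU task exits early (e.g. on the Drive error above), the next
# line still runs, keeping the VM busy so the CPU background task can
# use the full allowed session length.
!top -d 120
[/CODE]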

Fan Ming 2019-12-21 08:00

1 Attachment(s)
I succeeded in compiling GMP-ECM 7.0.5-dev (svn3068) with GPU support enabled, using CUDA 10.1, in Colab instances. Attached are the compiled binary and the Makefile I generated.
It seems the make check command failed to work, but I know almost nothing about Linux.
I tested some numbers, for example:
[CODE]!echo "520442584599824825523685710600326050921751" | ./ecm -gpu -v -c 2 5e4 5e4[/CODE]
Here is its output:
[CODE]/content/drive/My Drive/gpu-ecm
GMP-ECM 7.0.5-dev [configured with GMP 6.1.2, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Tuned for x86_64/k8/params.h
Running on cb388feefd42
Input number is 520442584599824825523685710600326050921751 (42 digits)
Using MODMULN [mulredc:0, sqrredc:1]
Computing batch product (of 72115 bits) of primes up to B1=50000 took 1ms
GPU: will use device 0: Tesla P100-PCIE-16GB, compute capability 6.0, 56 MPs.
GPU: maxSharedPerBlock = 49152 maxThreadsPerBlock = 1024 maxRegsPerBlock = 65536
GPU: Using device code targeted for architecture compile_60
GPU: Ptx version is 60
GPU: maxThreadsPerBlock = 1024
GPU: numRegsPerThread = 32 sharedMemPerBlock = 24576 bytes
GPU: Selection and initialization of the device took 17ms
Using B1=50000, B2=50004, sigma=3:1608589586-3:1608593169 (3584 curves)
dF=2, k=1, d=12, d2=1, i0=4166
Expected number of curves to find a factor of n digits (assuming one exists):
35 40 45 50 55 60 65 70 75 80
1920155 6.5e+07 2.4e+09 4.4e+11 Inf Inf Inf Inf Inf Inf
GPU: Block: 32x32x1 Grid: 112x1x1 (3584 parallel curves)
GPU: factor 94291866932171243501 found in Step 1 with curve 39 (-sigma 3:1608589625)
GPU: factor 5519485418336288303251 found in Step 1 with curve 304 (-sigma 3:1608589890)
GPU: factor 94291866932171243501 found in Step 1 with curve 474 (-sigma 3:1608590060)
GPU: factor 94291866932171243501 found in Step 1 with curve 495 (-sigma 3:1608590081)
GPU: factor 94291866932171243501 found in Step 1 with curve 664 (-sigma 3:1608590250)
GPU: factor 94291866932171243501 found in Step 1 with curve 771 (-sigma 3:1608590357)
GPU: factor 94291866932171243501 found in Step 1 with curve 1169 (-sigma 3:1608590755)
GPU: factor 94291866932171243501 found in Step 1 with curve 1237 (-sigma 3:1608590823)
GPU: factor 5519485418336288303251 found in Step 1 with curve 1743 (-sigma 3:1608591329)
GPU: factor 94291866932171243501 found in Step 1 with curve 2215 (-sigma 3:1608591801)
GPU: factor 94291866932171243501 found in Step 1 with curve 2919 (-sigma 3:1608592505)
GPU: factor 94291866932171243501 found in Step 1 with curve 3252 (-sigma 3:1608592838)
GPU: factor 5519485418336288303251 found in Step 1 with curve 3386 (-sigma 3:1608592972)
Computing 3584 Step 1 took 346ms of CPU time / 19502ms of GPU time
Throughput: 183.778 curves per second (on average 5.44ms per Step 1)
********** Factor found in step 1: 5519485418336288303251
Found prime factor of 22 digits: 5519485418336288303251
Prime cofactor 94291866932171243501 has 20 digits
********** Factor found in step 1: 94291866932171243501
Found input number N
Peak memory usage: 13182MB
[/CODE]
It seems that factors can be found correctly. Can anybody test this compiled binary to check whether it works correctly?
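
For anyone who would rather rebuild than trust the attachment, here is a rough sketch of a Colab cell reproducing the configure flags shown in the banner above. The source location, the autoreconf step, and the package list are assumptions rather than details from the post, so check ./configure --help before relying on them.
[CODE]# assumes the GMP-ECM svn source tree is already checked out in ./ecm-svn
# and that Colab's CUDA 10.1 toolkit is on the default path
!apt-get -qq install -y libgmp-dev autoconf automake libtool m4
%cd ecm-svn
!autoreconf -i
# configure flags taken from the banner in the output above
!./configure --enable-asm-redc --enable-gpu --enable-assert
!make -j2
!make check   # reportedly fails on Colab; the factors found above still look right
# quick sanity test, mirroring the example above
!echo "520442584599824825523685710600326050921751" | ./ecm -gpu -v -c 2 5e4 5e4
[/CODE]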

bayanne 2019-12-22 15:31

[url]https://www.mersenneforum.org/showpost.php?p=533367&postcount=4518[/url]

bayanne 2019-12-30 18:14

I have been unable to establish a connection for quite some time. Is anyone else having the same problem?

chalsall 2019-12-30 18:17

[QUOTE=bayanne;533770]I have been unable to establish a connection for quite some time. Is anyone else having the same problem?[/QUOTE]

Yup. I haven't been able to get a single instance in the past 24 hours -- previously I was running up to five (three in a single browser via different tabs / Google accounts, and the other two SOCKS-tunneled to appear to be in the States).

The GPU72 admin report is also showing that only two people are currently running; normally this is over a dozen or so.

I'm /hoping/ this is temporary, but it's certainly possible that the jig is up... :sad:

PhilF 2019-12-30 18:35

[QUOTE=chalsall;533772]Yup. I haven't been able to get a single instance in the past 24 hours -- previously I was running up to five (three in a single browser via different tabs / Google accounts, and the other two SOCKS-tunneled to appear to be in the States).

The GPU72 admin report is also showing that only two people are currently running; normally this is over a dozen or so.

I'm /hoping/ this is temporary, but it's certainly possible that the jig is up... :sad:[/QUOTE]

Are you guys referring to CPU connections, or GPU instances? I was able to get 3 CPU instances this morning.

chalsall 2019-12-30 18:44

[QUOTE=PhilF;533773]Are you guys referring to CPU connections, or GPU instances? I was able to get 3 CPU instances this morning.[/QUOTE]

GPU. I haven't actually accepted the offer, but I think I /could/ get a CPU-only instance if I wanted it.

De Wandelaar 2019-12-30 18:48

I have had the same problem with GPU connections since yesterday.
CPU instances are (so far) OK.
Game over :sad: ?

Uncwilly 2019-12-30 19:00

Same here. 2 different devices and different accounts.

chalsall 2019-12-30 19:05

[QUOTE=De Wandelaar;533777]Game over :sad: ?[/QUOTE]

Hmmm... I just got two out of three -- both P100s. One of the tunneled instances (an RPi) was denied. Hmmm...

