mersenneforum.org  

Old 2019-12-21, 01:49   #727
PhilF
 
Feb 2005
Colorado

2·7·47 Posts

Quote:
Originally Posted by kriesel View Post
My first P100 gpuowl session on Google Colab hit the same issue after 8 hours today. I've had many (dozens of) K80 gpuowl P-1 sessions without ever hitting that.
Code:
2019-12-20 23:29:54 colab/TeslaP100 Exception NSt12experimental10filesystem2v17__cxx1116filesystem_errorE: filesystem error: cannot get current path: Transport endpoint is not connected 
2019-12-20 23:29:54 colab/TeslaP100 waiting for background GCDs.. 
2019-12-20 23:29:54 colab/TeslaP100 Bye 
shell-init: error retrieving current directory: getcwd: cannot access parent directories: Transport endpoint is not connected 
shell-init: error retrieving current directory: getcwd: cannot access parent directories: Transport endpoint is not connected
I don't think this is a GPU-related issue. I have had similar disconnects in the last few days on CPU-only instances. It is interesting to note the processing continues, so it appears it is only the backend connection to Google Drive that is failing (and not us purposely getting booted).
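A minimal sketch of how one might check for this from a notebook cell, assuming the failure shows up as an OSError on the Drive mount (the log above reports "Transport endpoint is not connected", which is errno ENOTCONN on Linux). The helper name mount_is_healthy is hypothetical, and the commented-out remount call assumes the standard google.colab drive module is available; it has not been tested against this particular failure.

```python
import errno
import os


def mount_is_healthy(path):
    """Return True if `path` can still be stat'ed and listed.

    A dropped FUSE mount typically raises OSError with errno ENOTCONN
    ("Transport endpoint is not connected"). Any OSError counts as unhealthy.
    """
    try:
        os.stat(path)
        os.listdir(path)
        return True
    except OSError as exc:
        if exc.errno == errno.ENOTCONN:
            print(f"{path}: transport endpoint is not connected")
        else:
            print(f"{path}: {exc}")
        return False


# Hypothetical usage inside a Colab cell (path taken from the logs in this thread):
# if not mount_is_healthy("/content/drive/My Drive"):
#     from google.colab import drive
#     drive.mount("/content/drive", force_remount=True)
```

Note that a process whose working directory sits on the dropped mount may still fail on getcwd, as in the log above, so checking the mount before each checkpoint write is only a partial safeguard.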
Old 2019-12-21, 02:18   #728
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,437 Posts

Quote:
Originally Posted by PhilF View Post
I don't think this is a GPU-related issue. I have had similar disconnects in the last few days on CPU-only instances. It is interesting to note the processing continues, so it appears it is only the backend connection to Google Drive that is failing (and not us purposely getting booted).
I also think it is generic. We now have at least CUDAPm1, gpuowl, and your CPU instance demonstrating it. In my case, the GPU task halted. That is proven by the !top -d 120 command running; it follows the GPU task in the script, to ensure the CPU background task runs for the full allowed length of the session. The program itself also said "Bye" (as in goodbye), which is gpuowl's exit notification.
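The sequencing argument can be illustrated with a small Python sketch. It assumes the notebook runs the GPU task and the keepalive one after the other, so the keepalive can only start once the task has exited. run_then_keepalive and the stand-in commands are hypothetical; the real script would launch gpuowl and top -d 120 instead.

```python
import subprocess
import sys


def run_then_keepalive(task_cmd, keepalive_cmd):
    """Run task_cmd to completion, report its exit code, then run keepalive_cmd."""
    task = subprocess.run(task_cmd)
    print(f"task exited with code {task.returncode}; starting keepalive")
    keepalive = subprocess.run(keepalive_cmd)
    return task.returncode, keepalive.returncode


if __name__ == "__main__":
    # Stand-ins so the sketch runs anywhere; in the notebook these would be
    # the gpuowl invocation and `top -d 120` respectively.
    failing_task = [sys.executable, "-c", "import sys; print('Bye'); sys.exit(1)"]
    short_keepalive = [sys.executable, "-c", "print('keepalive running')"]
    print(run_then_keepalive(failing_task, short_keepalive))
```

Seeing the keepalive's output at all therefore implies the first command has already terminated, which is the inference drawn above.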
Old 2019-12-21, 08:00   #729
Fan Ming
 
Oct 2019

137₈ Posts

I succeeded in compiling GMP-ECM 7.0.5-dev (svn 3068) with GPU support enabled, using CUDA 10.1 in a Colab instance. Attached are the compiled binary and the Makefile I generated.
The make check command seems to fail, but I know almost nothing about Linux.
I tested some numbers, for example:
Code:
!echo "520442584599824825523685710600326050921751" | ./ecm -gpu -v -c 2 5e4 5e4
Here is its output:
Code:
/content/drive/My Drive/gpu-ecm
		GMP-ECM 7.0.5-dev [configured with GMP 6.1.2, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
		Tuned for x86_64/k8/params.h
		Running on cb388feefd42
		Input number is 520442584599824825523685710600326050921751 (42 digits)
		Using MODMULN [mulredc:0, sqrredc:1]
		Computing batch product (of 72115 bits) of primes up to B1=50000 took 1ms
		GPU: will use device 0: Tesla P100-PCIE-16GB, compute capability 6.0, 56 MPs.
		GPU: maxSharedPerBlock = 49152 maxThreadsPerBlock = 1024 maxRegsPerBlock = 65536
		GPU: Using device code targeted for architecture compile_60
		GPU: Ptx version is 60
		GPU: maxThreadsPerBlock = 1024
		GPU: numRegsPerThread = 32 sharedMemPerBlock = 24576 bytes
		GPU: Selection and initialization of the device took 17ms
		Using B1=50000, B2=50004, sigma=3:1608589586-3:1608593169 (3584 curves)
		dF=2, k=1, d=12, d2=1, i0=4166
		Expected number of curves to find a factor of n digits (assuming one exists):
		35	40	45	50	55	60	65	70	75	80
		1920155	6.5e+07	2.4e+09	4.4e+11	Inf	Inf	Inf	Inf	Inf	Inf
		GPU: Block: 32x32x1 Grid: 112x1x1 (3584 parallel curves)
		GPU: factor 94291866932171243501 found in Step 1 with curve 39 (-sigma 3:1608589625)
		GPU: factor 5519485418336288303251 found in Step 1 with curve 304 (-sigma 3:1608589890)
		GPU: factor 94291866932171243501 found in Step 1 with curve 474 (-sigma 3:1608590060)
		GPU: factor 94291866932171243501 found in Step 1 with curve 495 (-sigma 3:1608590081)
		GPU: factor 94291866932171243501 found in Step 1 with curve 664 (-sigma 3:1608590250)
		GPU: factor 94291866932171243501 found in Step 1 with curve 771 (-sigma 3:1608590357)
		GPU: factor 94291866932171243501 found in Step 1 with curve 1169 (-sigma 3:1608590755)
		GPU: factor 94291866932171243501 found in Step 1 with curve 1237 (-sigma 3:1608590823)
		GPU: factor 5519485418336288303251 found in Step 1 with curve 1743 (-sigma 3:1608591329)
		GPU: factor 94291866932171243501 found in Step 1 with curve 2215 (-sigma 3:1608591801)
		GPU: factor 94291866932171243501 found in Step 1 with curve 2919 (-sigma 3:1608592505)
		GPU: factor 94291866932171243501 found in Step 1 with curve 3252 (-sigma 3:1608592838)
		GPU: factor 5519485418336288303251 found in Step 1 with curve 3386 (-sigma 3:1608592972)
		Computing 3584 Step 1 took 346ms of CPU time / 19502ms of GPU time
		Throughput: 183.778 curves per second (on average 5.44ms per Step 1)
		********** Factor found in step 1: 5519485418336288303251
		Found prime factor of 22 digits: 5519485418336288303251
		Prime cofactor 94291866932171243501 has 20 digits
		********** Factor found in step 1: 94291866932171243501
		Found input number N
		Peak memory usage: 13182MB
It seems that factors are found correctly. Can anybody test this compiled binary to check that it works correctly?
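As a quick sanity check of the log above, the reported factors can be multiplied back together in Python. This is only a sketch: the helper names are made up for illustration, and the sigma arithmetic assumes the GPU assigns consecutive sigmas starting from the first value in the reported range, which is consistent with the curve numbers in the log.

```python
# Values copied from the GMP-ECM log above.
N = 520442584599824825523685710600326050921751
p = 5519485418336288303251  # reported 22-digit prime factor
q = 94291866932171243501    # reported 20-digit prime cofactor


def check_factorization(n, factors):
    """Return True if the product of `factors` equals n."""
    product = 1
    for f in factors:
        product *= f
    return product == n


def sigma_for_curve(first_sigma, curve_index):
    """Sigma of a given curve, assuming consecutive sigmas from first_sigma."""
    return first_sigma + curve_index


curves = 3584   # "3584 parallel curves" in the log
gpu_ms = 19502  # "19502ms of GPU time" in the log

print("product matches N:", check_factorization(N, [p, q]))
print("digit counts:", len(str(N)), len(str(p)), len(str(q)))
print("sigma of curve 39:", sigma_for_curve(1608589586, 39))
print(f"throughput: {curves / (gpu_ms / 1000):.1f} curves/s, "
      f"{gpu_ms / curves:.2f} ms per curve")
```

This only confirms that the two reported numbers multiply to the input. It does not test primality, and it says nothing about whether make check failing points to a problem elsewhere in the build.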
Attached Files
File Type: zip ecm-7.0.5dev-linux-with-gpu.zip (1.36 MB, 69 views)
Old 2019-12-22, 15:31   #730
bayanne
 
"Tony Gott"
Aug 2002
Yell, Shetland, UK

2²·83 Posts

https://www.mersenneforum.org/showpo...postcount=4518
Old 2019-12-30, 18:14   #731
bayanne
 
"Tony Gott"
Aug 2002
Yell, Shetland, UK

2²·83 Posts

I have been unable to establish a connection for quite some time. Is anyone else having the same problem?
Old 2019-12-30, 18:17   #732
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

9782₁₀ Posts

Quote:
Originally Posted by bayanne View Post
I have been unable to establish a connection for quite some time. Is anyone else having the same problem?
Yup. I haven't been able to get a single instance in the past 24 hours -- previously I was running up to five (three in a single browser (different tabs / Google accounts), the other two SOCKS tunneled to appear to be in the States).

The GPU72 admin report is also showing that only two people are currently running; normally this is over a dozen or so.

I'm /hoping/ this is temporary, but it's certainly possible that the gig is up...
Old 2019-12-30, 18:35   #733
PhilF
 
Feb 2005
Colorado

2·7·47 Posts

Quote:
Originally Posted by chalsall View Post
Yup. I haven't been able to get a single instance in the past 24 hours -- previously I was running up to five (three in a single browser (different tabs / Google accounts), the other two SOCKS tunneled to appear to be in the States).

The GPU72 admin report is also showing that only two people are currently running; normally this is over a dozen or so.

I'm /hoping/ this is temporary, but it's certainly possible that the gig is up...
Are you guys referring to CPU connections, or GPU instances? I was able to get 3 CPU instances this morning.
Old 2019-12-30, 18:44   #734
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

23066₈ Posts

Quote:
Originally Posted by PhilF View Post
Are you guys referring to CPU connections, or GPU instances? I was able to get 3 CPU instances this morning.
GPU. I haven't actually accepted the offer, but I think I /could/ get a CPU-only instance if I wanted it.
Old 2019-12-30, 18:48   #735
De Wandelaar
 
"Yves"
Jul 2017
Belgium

5×13 Posts

I have had the same problem with GPU connections since yesterday.
CPU instances are (so far) OK.
Game over?
Old 2019-12-30, 19:00   #736
Uncwilly
6809 > 6502
 
"""""""""""""""""""
Aug 2003
101×103 Posts

2×7×19×37 Posts

Same here. 2 different devices and different accounts.
Old 2019-12-30, 19:05   #737
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

2×67×73 Posts

Quote:
Originally Posted by De Wandelaar View Post
Game over?
Hmmm... I just got two out of three -- both P100s. One of the tunneled instances (a RPi) was denied. Hmmm...

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.