mersenneforum.org  

Old 2019-12-16, 19:00   #705
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,437 Posts

Quote:
Originally Posted by kriesel
Now I'm repeatedly getting no GPU at all. On the account from which I made many unsuccessful tries to get a K80:
Code:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Meanwhile the other account I'm using has no trouble getting a P100.
Today:
the one where I need a K80 to finish benchmarking gets a P100;
the one where I don't care if it issues a P100, can't get any gpu. (I had started setting that account up for P100 since it was reliably getting gpus, usually P100, and the other one had been getting nothing)

cpu-only colab now for me.
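
For anyone who wants to check what a session was handed before starting work, here is a minimal sketch. It assumes a Linux VM like the Colab ones, where `nvidia-smi` is on the PATH when a GPU is attached, and it assumes that a driver failure like the one quoted above shows up as a non-zero exit code. The function names `detect_gpu` and `is_wanted` are made up for illustration.

```python
import subprocess


def detect_gpu():
    """Return the first GPU model name reported by nvidia-smi, or None.

    None covers every "no usable GPU" case: nvidia-smi missing,
    nvidia-smi timing out, or nvidia-smi exiting with an error
    (assumed here to be what happens when it cannot reach the driver).
    """
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=30,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if lines else None


def is_wanted(gpu_name, wanted="K80"):
    """True if the reported model name contains the wanted model string."""
    return gpu_name is not None and wanted.lower() in gpu_name.lower()


if __name__ == "__main__":
    gpu = detect_gpu()
    if gpu is None:
        print("No usable GPU in this session; CPU-only work it is.")
    elif is_wanted(gpu):
        print("Got the wanted GPU:", gpu)
    else:
        print("Got a GPU, but not the wanted model:", gpu)
```

Running this at the top of a notebook shows quickly whether the session is worth keeping for a benchmark that needs one specific GPU model.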
Old 2019-12-16, 19:12   #706
chalsall
If I May
 
 
"Chris Halsall"
Sep 2002
Barbados

2·67·73 Posts

Quote:
Originally Posted by kriesel
cpu-only colab now for me.
Hmmm... This morning I spun up three instances. Got two P100s and a T4 (at ~50% throughput).

It seems Google, like many gods, works in mysterious ways...
Old 2019-12-16, 19:33   #707
EdH
 
 
"Ed Hall"
Dec 2009
Adirondack Mtns

2·19·101 Posts

And, I got a K80 right away. . .
Old 2019-12-16, 19:37   #708
PhilF
 
 
Feb 2005
Colorado

2×7×47 Posts

On Kaggle, all of my CPU instances for the last few days have been Skylake Xeons. Up until then those were rare, with Haswell Xeons being the norm.
Old 2019-12-16, 19:48   #709
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,437 Posts

Quote:
Originally Posted by chalsall
It seems Google, like many gods, works in mysterious ways...
If at all. Like a bureaucracy.

Last fiddled with by kriesel on 2019-12-16 at 19:48
Old 2019-12-16, 21:25   #710
lycorn
 
 
"GIMFS"
Sep 2002
Oeiras, Portugal

2·11·67 Posts

Quote:
Originally Posted by kriesel

cpu-only colab now for me.
I hear you!

I started playing with colab using chalsall's GPU72 TF notebook. Everything went well for some days, then I started getting occasional "No GPU backend available" messages. Later I set up the notebook to run mprime and began running ECM, and quit using GPUs. Over time I added several instances, using some gmail accounts I have access to, and for several weeks I have been running 4 colab instances (CPU only, mprime doing ECM on small exponents). Always getting 12 hours, no disconnects whatsoever, always a TPU available upon any session restart. A couple of days ago, I threw some GPU instances into the mix (more precisely, on two of the accounts I am using I added a GPU instance). It wasn't very long until I started getting the "No GPU backend available" message again.
Today, on top of that and for the very first time, I am also getting "No TPU available" on the accounts where I was concurrently running a GPU instance. The other two accounts, which have only run TPUs, don't suffer from that problem.
So yes, looks like our friends at Google are becoming fed up with us overusing their precious GPUs...
I will return to CPU-only instances and see what happens (if and when I regain access to them, that is). Oh, well, we shouldn't complain, should we?

Last fiddled with by lycorn on 2019-12-16 at 22:06
Old 2019-12-16, 22:55   #711
Dylan14
 
 
"Dylan"
Mar 2017

2^2·5·29 Posts

Introducing tf1G.py v0.11:
The major change to the script this time is the addition of the min and max exponent variables, which are allowable parameters on mersenne.ca. In addition, some sanity checks were added.
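
As a rough idea of what sanity checks on a min/max exponent pair can look like, here is a minimal sketch. It is not taken from the attached script: the function name `check_exponent_range` and the optional `lowest_allowed` / `highest_allowed` limits are hypothetical, and the real script may check different things.

```python
def check_exponent_range(min_exp, max_exp, lowest_allowed=None, highest_allowed=None):
    """Validate a (min_exp, max_exp) pair and return it unchanged.

    Raises ValueError if either bound is not a plain integer >= 2,
    if the bounds are in the wrong order, or if they fall outside the
    optional limits (hypothetical parameters, not from the script).
    """
    for name, value in (("min_exp", min_exp), ("max_exp", max_exp)):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer, got {value!r}")
        if value < 2:
            raise ValueError(f"{name} must be at least 2, got {value}")
    if min_exp > max_exp:
        raise ValueError(f"min_exp ({min_exp}) is larger than max_exp ({max_exp})")
    if lowest_allowed is not None and min_exp < lowest_allowed:
        raise ValueError(f"min_exp ({min_exp}) is below the allowed minimum ({lowest_allowed})")
    if highest_allowed is not None and max_exp > highest_allowed:
        raise ValueError(f"max_exp ({max_exp}) is above the allowed maximum ({highest_allowed})")
    return min_exp, max_exp


if __name__ == "__main__":
    print(check_exponent_range(1000000007, 1000100000))
```

Failing early with a clear message is cheaper than discovering a swapped or mistyped bound after a request for work has already gone out.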
Attached Files
File Type: txt tf1G.py.txt (9.2 KB, 64 views)
Old 2019-12-17, 09:18   #712
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,437 Posts

Quote:
Originally Posted by lycorn
I hear you!
...
Today, on top of that and for the very first time, I am also getting "No TPU available" on the accounts where I was concurrently running a GPU instance. The other two accounts, which have only run TPUs, don't suffer from that problem.
So yes, looks like our friends at Google are becoming fed up with us overusing their precious GPUs...
I will return to CPU-only instances and see what happens (if and when I regain access to them, that is). Oh, well, we shouldn't complain, should we?
I remind myself that it's for free, and to be grateful it's ever available.
What do you run on the TPUs? I'm not aware of any GIMPS use for them.
If running mprime only, I select "NONE", to leave the TPUs available for others to use.
The good news is that about an hour ago, I got K80s on both accounts. So they do still exist! It will take a few more such sessions on one of the accounts to finish out the current exponent's benchmark in P-1.
Old 2019-12-17, 13:27   #713
lycorn
 
 
"GIMFS"
Sep 2002
Oeiras, Portugal

2·11·67 Posts

On the TPUs (which are actually CPUs) I run ECM factoring on exponents < 1M.

Today I got them available again, so I'm back to 4 instances, CPU-only. I'm pretty sure I won't have any more problems (unless, of course, the Google gods/goddesses decide it's time to cut the resources altogether).
Old 2019-12-17, 20:55   #714
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5,437 Posts

Quote:
Originally Posted by lycorn
On the TPUs (which are actually CPUs) I run ECM factoring on exponents < 1M.

Today I got them available again, so I'm back to 4 instances, CPU-only. I'm pretty sure I won't have any more problems (unless, of course, the Google gods/goddesses decide it's time to cut the resources altogether).
Google says that TPUs are ASICs.
https://cloud.google.com/tpu/docs/tpus "Tensor Processing Units (TPUs) are Google’s custom-developed application-specific integrated circuits (ASICs) used to accelerate machine learning workloads. TPUs are designed from the ground up with the benefit of Google’s deep experience and leadership in machine learning."
Old 2019-12-17, 22:50   #715
lycorn
 
 
"GIMFS"
Sep 2002
Oeiras, Portugal

1,474 Posts

Quote:
Originally Posted by kriesel
Google says that TPUs are ASICs.
https://cloud.google.com/tpu/docs/tpus "Tensor Processing Units (TPUs) are Google’s custom-developed application-specific integrated circuits (ASICs) used to accelerate machine learning workloads. TPUs are designed from the ground up with the benefit of Google’s deep experience and leadership in machine learning."
That is true, but the CPUs of all my colab instances are identified as Intel Xeon @ 2.30GHz Linux64 under the My Account->CPUs option of the mersenne.org menu. Next time I use it I'll select "no accelerator" and see what happens. It probably won't change anything, meaning the "Intel Xeon @ 2.30GHz" reported is simply the VM's CPU made available on each session.
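
One way to see which CPU the VM itself reports, independent of the accelerator setting, is to read `/proc/cpuinfo`. This is a minimal sketch, assuming a Linux VM on x86 where that file has a "model name" field; the function names `cpu_model_from_text` and `cpu_model` are made up for illustration.

```python
def cpu_model_from_text(cpuinfo_text):
    """Return the first 'model name' value found in /proc/cpuinfo-style text, or None."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            return value.strip()
    return None


def cpu_model(path="/proc/cpuinfo"):
    """Return the CPU model string of this machine, or None if it cannot be read."""
    try:
        with open(path) as f:
            return cpu_model_from_text(f.read())
    except OSError:
        return None


if __name__ == "__main__":
    print("CPU model:", cpu_model())
```

Comparing the output between a session with an accelerator selected and one with none would show whether the reported Xeon changes with that setting.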

All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.