mersenneforum.org  

2016-01-15, 21:36   #1
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England

5×31×41 Posts
Keeping cuda working over Ubuntu upgrades

After a few automatic upgrades and a reboot, nvidia-smi is telling me

Code:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
and indeed 'lsmod | grep nv' gives no output.

Presumably I need to cause the driver to get rebuilt against the new kernel version, but I can't see how you do that.
2016-01-15, 21:46   #2
Batalov
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

2^2·2,281 Posts

I've had the same problem for years while I was using my home workstation for both work and cuda. (I don't anymore; easier to run cuda computations on the cloud and keep home computer lightly loaded... and in Windows so the kids can use it)

What I've gathered is that NVIDIA makes this non-automatable deliberately. I still have to type 'accept' every time I install
Code:
# ssh into a new EC2 node
sudo yum -y update
sudo yum -y install tcsh wget bc perl unzip gcc gcc-c++ openssh-clients diffutils gmp-devel kernel-devel-`uname -r`

wget http://us.download.nvidia.com/XFree86/Linux-x86_64/358.16/NVIDIA-Linux-x86_64-358.16.run
sudo sh ./NVIDIA-Linux-x86_64-358.16.run
2016-01-15, 22:15   #3
frmky
Jul 2003
So Cal

2,039 Posts

Quote:
Originally Posted by Batalov
I still have to type 'accept' every time I install
Code:
sudo sh ./NVIDIA-Linux-x86_64-358.16.run -s
Add -s (short for --silent) as above, so the installer runs without prompting.
2016-01-15, 22:29   #4
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013

3^2×317 Posts

Are you using dkms? It's used to recompile modules on kernel upgrades.
2016-01-15, 23:29   #5
xilman
Bamboozled!
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

2·3^2·569 Posts

Quote:
Originally Posted by fivemack
Presumably I need to cause the driver to get rebuilt against the new kernel version, but I can't see how you do that.
Move to Gentoo. It all "just works"

Well, it does for me anyway.

Paul
2016-01-16, 11:58   #6
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England

18D3_16 Posts

Quote:
Originally Posted by Mark Rose
Are you using dkms? It's used to recompile modules on kernel upgrades.
I believe I'm using dkms, but all I see it doing is deleting old versions of the module when I do apt-get autoremove to clean up the huge pile of old kernels filling my unreasonably-small /boot partition.
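
One way to check whether dkms has actually built the nvidia module for the running kernel is sketched below. It assumes that each line of 'dkms status' output names the module, the kernel release and the word 'installed', which may not hold for every dkms version; the helper name nvidia_built_for is made up for this example, and the kernel match is a loose substring match.

```shell
#!/bin/sh
# Succeeds if the dkms status text on stdin lists an installed nvidia
# module for the kernel release given as $1 (hypothetical helper, loose match).
nvidia_built_for() {
    grep -i 'nvidia' | grep -F -- "$1" | grep -q 'installed'
}

kernel="$(uname -r)"
if command -v dkms >/dev/null 2>&1; then
    if dkms status | nvidia_built_for "$kernel"; then
        echo "dkms: nvidia module is built for $kernel"
    else
        # Only a suggestion is printed; nothing is rebuilt automatically.
        echo "dkms: no nvidia module for $kernel; try: sudo dkms autoinstall -k $kernel"
    fi
else
    echo "dkms is not installed"
fi
```

If the module turns out to be missing, 'dkms autoinstall' is the usual way to ask dkms to build every registered module for a given kernel, though whether it succeeds depends on the matching kernel headers being installed.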
2016-01-16, 12:11   #7
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England

1100011010011_2 Posts

Quote:
Originally Posted by Batalov
I've had the same problem for years while I was using my home workstation for both work and cuda. (I don't anymore; easier to run cuda computations on the cloud and keep home computer lightly loaded... and in Windows so the kids can use it)
Don't you find running CUDA computations on the cloud expensive? I'm paying probably £200/year for electricity for the GTX580, though I suppose a g2.2xlarge at spot price is 5p/hour so that's only a factor of two.
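
The rough factor can be checked with a little integer arithmetic. This sketch assumes the instance runs around the clock for a 365-day year at the quoted 5p/hour, and takes the £200/year electricity figure at face value; the variable names are only illustrative.

```shell
#!/bin/sh
pence_per_hour=5                      # quoted spot price
hours_per_year=$((24 * 365))          # running continuously
cloud_pence=$((pence_per_hour * hours_per_year))
cloud_pounds=$((cloud_pence / 100))
home_pounds=200                       # estimated electricity bill
# Ratio scaled by 100 so that plain integer arithmetic keeps two decimals.
ratio_x100=$((cloud_pence / home_pounds))
printf 'cloud: GBP %d/year, home: GBP %d/year, ratio: %d.%02d\n' \
    "$cloud_pounds" "$home_pounds" "$((ratio_x100 / 100))" "$((ratio_x100 % 100))"
```

Under those assumptions the cloud comes to about £438/year, a ratio of roughly 2.2.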
2016-01-16, 12:54   #8
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England

6355_10 Posts

Installing on a fresh machine is basically fine.

But I'm now in a situation where nvidia-smi can't find the device, and downloading the .deb and doing 'sudo apt-get install cuda' just tells me 'cuda is already the newest version'.

sudo apt-get purge cuda; sudo apt-get install cuda also does very little

Code:
sudo apt-get remove cuda
sudo apt-get install cuda
re-downloads Java and half of X11, and still leaves me in a situation where nvidia-smi can't find the device.

I'll try again using the run-file that nvidia ship; after a reboot (I really would prefer a solution with no reboots - this is a compute node, I aim to have twelve gnfs-lasieve jobs running 24/365) I get a new exciting unhelpful message

Code:
pumpkin@pumpkin:~$ nvidia-smi
Failed to initialize NVML: GPU access blocked by the operating system
2016-01-16, 14:08   #9
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England

6355_10 Posts

On further examination, the card has fallen off the PCIe bus entirely: lspci | grep -i nv returns nothing. Maybe Ubuntu is not to blame.
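
The checks used in this thread can be strung together in the order that would have found this sooner: bus first, kernel module second. The following is only a sketch; the diagnose helper and its messages are made up for illustration, and a missing lspci or lsmod is simply treated as a "no".

```shell
#!/bin/sh
# Turn two yes/no facts into a rough diagnosis (hypothetical helper).
diagnose() {
    on_bus="$1"
    module_loaded="$2"
    if [ "$on_bus" != yes ]; then
        echo "card not visible on the PCIe bus: suspect hardware, seating or power"
    elif [ "$module_loaded" != yes ]; then
        echo "card present but nvidia module not loaded: suspect the driver build"
    else
        echo "card present and module loaded: look at the nvidia-smi output next"
    fi
}

on_bus=no
module_loaded=no
# Same checks as in the posts above, matched case-insensitively on the bus.
if command -v lspci >/dev/null 2>&1 && lspci | grep -qi 'nvidia'; then
    on_bus=yes
fi
if command -v lsmod >/dev/null 2>&1 && lsmod | grep -q '^nvidia'; then
    module_loaded=yes
fi
diagnose "$on_bus" "$module_loaded"
```

Checking the bus before the driver avoids reinstalling packages for what turns out to be a hardware fault.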
2016-01-16, 16:37   #10
chris2be8
Sep 2009

2·3^3·5·7 Posts

Quote:
Originally Posted by Batalov
What I've gathered is that NVIDIA makes this non-automatable deliberately. I still have to type 'accept' every time I install
Code:
# ssh into a new EC2 node
sudo yum -y update
sudo yum -y install tcsh wget bc perl unzip gcc gcc-c++ openssh-clients diffutils gmp-devel kernel-devel-`uname -r`

wget http://us.download.nvidia.com/XFree86/Linux-x86_64/358.16/NVIDIA-Linux-x86_64-358.16.run
sudo sh ./NVIDIA-Linux-x86_64-358.16.run
Read up on Expect. I used the perl expect module to automate setting a new user's password. There's also a program called expect which is easier to call from a shell script (other parts of the user setup script were already in perl so that was the obvious choice in my case).

You only need to install expect on the system you are connecting to the new node from; it can automate responses within an SSH session.

Chris

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.