mersenneforum.org

GPU Computing forum: CUDA driver disappeared after patch installation + kernel update (https://www.mersenneforum.org/showthread.php?t=17579)

Graff 2012-12-18 17:54

CUDA driver disappeared after patch installation + kernel update
 
Unbelievable. I had a running CUDA installation on one of my machines
for about two days. Then I installed some patches and a new Linux kernel.
After rebooting, mfaktc didn't run. After digging around a bit, I tried
nvidia-smi -a and got:

[CODE]FATAL: Module nvidia not found.
NVIDIA: failed to load the NVIDIA kernel module.
NVIDIA-SMI has failed because it couldn't communicate with NVIDIA driver. Make sure that latest NVIDIA driver is installed and running.[/CODE]

!!!! It was there before I installed the new kernel.

[CODE]# dir /dev/nvidiactl
dir: cannot access /dev/nvidiactl: No such file or directory[/CODE]

I assume I need to reinstall CUDA. Has anyone else experienced this?

Gareth

ckdo 2012-12-18 19:11

Happens with every kernel update on Ubuntu. :bangheadonwall:

Dubslow 2012-12-18 19:14

For Ubuntu at least, there's a repository you can add that will auto-update drivers via the package manager. I'll try and find it.

Edit: [url]http://www.ubuntuupdates.org/ppa/ubuntu-x-swat[/url]
[code]sudo add-apt-repository ppa:ubuntu-x-swat/x-updates
sudo apt-get update
sudo apt-get install nvidia-current[/code]

Graff 2012-12-18 19:54

[QUOTE=Dubslow;322004]For Ubuntu at least, there's a repository you can add that will auto-update drivers via the package manager. I'll try and find it.

Edit: [url]http://www.ubuntuupdates.org/ppa/ubuntu-x-swat[/url]
[code]sudo add-apt-repository ppa:ubuntu-x-swat/x-updates
sudo apt-get update
sudo apt-get install nvidia-current[/code][/QUOTE]

Thanks. That worked. My GPU is functioning again. The installed driver
is now 304.64, which is older than the 310.19 I was running earlier. But
I guess that isn't a problem.
I assume I'll need to do all three commands after each kernel "upgrade"?

Gareth

henryzz 2012-12-18 20:24

[QUOTE=Graff;322007]Thanks. That worked. My GPU is functioning again. The installed driver
is now 304.64, which is older than the 310.19 I was running earlier. But
I guess that isn't a problem.
I assume I'll need to do all three commands after each kernel "upgrade"?

Gareth[/QUOTE]

Just the last one should do.
The first command added the repository, and the second downloaded the package lists from that repository.

Dubslow 2012-12-19 00:25

[QUOTE=Graff;322007]Thanks. That worked. My GPU is functioning again. The installed driver
is now 304.64, which is older than the 310.19 I was running earlier. But
I guess that isn't a problem.
I assume I'll need to do all three commands after each kernel "upgrade"?

Gareth[/QUOTE]

Not quite.

`sudo apt-get upgrade` tells your package manager to upgrade all installed packages to the newest versions in its package lists (run `sudo apt-get update` first so those lists are current).

If you're in Ubuntu (I'm guessing you are) then the graphical interface is called "Update Manager". Any updates to the kernel, drivers, or anything else will be handled by the Update Manager. Since the drivers are now managed by the package system, just like the kernel, you shouldn't need to do anything at all after a kernel upgrade. Any updates to the driver itself will appear in the list of packages that need updating whenever the Update Manager pops up.

In other (simpler) words, the drivers are now a part of the same system that updates the kernel, and all other installed packages on your system. It will make sure that all packages work after any updates.
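As far as I know, the mechanism behind this on Ubuntu is DKMS: the nvidia-current package registers its kernel module with DKMS, which rebuilds it whenever a new kernel is installed. A minimal sketch for confirming that the rebuild actually happened for a given kernel, assuming the older comma-separated `dkms status` line format shown in the comments (newer dkms releases print a different layout); `dkms_built_for` is a hypothetical helper, not part of dkms:

```shell
#!/bin/bash
# Sketch: confirm DKMS reports an NVIDIA module as installed for a kernel.
# ASSUMPTION: older comma-separated `dkms status` output, one line per
# module/version/kernel, e.g.
#   nvidia-current, 304.64, 3.2.0-35-generic, x86_64: installed
# Newer dkms versions use a different layout; adjust the match if needed.

# dkms_built_for KERNEL: reads `dkms status` output on stdin and succeeds
# only if a module whose name starts with "nvidia" is installed for KERNEL.
dkms_built_for() {
    local kernel="$1"
    awk -F', ' -v k="$kernel" '
        $1 ~ /^nvidia/ && $3 == k && $0 ~ /: installed/ { found = 1 }
        END { exit !found }'
}

# Typical use after a kernel update:
#   sudo apt-get update && sudo apt-get upgrade
#   dkms status | dkms_built_for "$(uname -r)" && echo "nvidia module built"
```

If the check fails for the kernel you are running, the driver will be missing in exactly the way the first post describes.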

Graff 2012-12-27 21:52

[QUOTE=Dubslow;322025]If you're in Ubuntu (I'm guessing you are) then the [/QUOTE]

Yes, Ubuntu 12.04 LTS.

[QUOTE=Dubslow;322025]
In other (simpler) words, the drivers are now a part of the same system that updates the kernel, and all other installed packages on your system. It will make sure that all packages work after any updates.[/QUOTE]

Thanks for that info; I hope that is the case.

I've just had to restart both of my GPU-equipped machines and upon reboot
mfaktc fails to run on both machines:

[CODE]./mfaktc.exe
mfaktc v0.19 (64bit built)
...
CUDA version info
binary compiled for CUDA 4.20
CUDA runtime version 0.0
CUDA driver version 4350.57
ERROR: CUDA runtime version must match the CUDA toolkit version used during compile!
[/CODE]

nvidia-smi -a tells me:

[CODE]NVIDIA: could not open the device file /dev/nvidiactl (No such file or directory).
NVIDIA-SMI has failed because it couldn't communicate with NVIDIA driver. Make sure that latest NVIDIA driver is installed and running.[/CODE]

Attempting to reinstall the driver tells me I already have the latest
driver. lshw indicates that the GPU card is using the nvidia driver:

[CODE]configuration: driver=nvidia latency=0[/CODE]

printenv | grep cuda shows the correct entries in LD_LIBRARY_PATH
and PATH.

So what am I missing? Why isn't my CUDA setup being maintained
across reboots/power cycles? Is this really normal behavior????

Gareth
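The symptoms above (`/dev/nvidiactl` missing while lshw still reports `driver=nvidia`) can come from either of two separate gaps: the kernel module not being loaded, or the module being loaded while its device nodes were never created. A minimal sketch that checks each gap on its own; it assumes the module appears in /proc/modules under a name starting with `nvidia` (for example `nvidia` or `nvidia_current`), and `check_nvidia` is a hypothetical name:

```shell
#!/bin/bash
# Sketch: report which part of the NVIDIA setup is missing.
# The two optional arguments exist only so the checks can be pointed at
# test fixtures; by default they read /proc/modules and /dev.
# ASSUMPTION: the module name is "nvidia" or "nvidia_<something>".

check_nvidia() {
    local proc_modules="${1:-/proc/modules}"
    local dev_dir="${2:-/dev}"
    local status=0

    # /proc/modules lists one loaded module per line, name first.
    # The pattern excludes the unrelated nvidiafb framebuffer driver.
    if grep -Eq '^nvidia( |_)' "$proc_modules"; then
        echo "module: loaded"
    else
        echo "module: NOT loaded (try: sudo modprobe nvidia)"
        status=1
    fi

    # CUDA programs open /dev/nvidiactl plus /dev/nvidia0, /dev/nvidia1, ...
    if [ -e "$dev_dir/nvidiactl" ]; then
        echo "device node: $dev_dir/nvidiactl present"
    else
        echo "device node: $dev_dir/nvidiactl missing"
        status=1
    fi
    return "$status"
}
```

Called as `check_nvidia` with no arguments it inspects the live system; a non-zero return means at least one check failed.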

Graff 2012-12-27 21:58

[QUOTE=Graff;322855]So what am I missing? Why isn't my CUDA setup being maintained
across reboots/power cycles? Is this really normal behavior????
[/QUOTE]

Just tried another reboot. This time I ran the nvidia-smi -a command with sudo.
Normal output resulted! I was able to get mfaktc running.
Will now try this on the other machine.

Same thing, no joy until I ran sudo nvidia-smi -a.

Gareth

chalsall 2012-12-27 22:08

[QUOTE=Graff;322855]So what am I missing? Why isn't my CUDA setup being maintained
across reboots/power cycles? Is this really normal behavior????[/QUOTE]

Sadly, yes.

I very recently had a similar situation. I upgraded the kernel on one of my CentOS-64 installations, and suddenly mfaktc failed.

Trying to access [URL="http://www.nvidia.com/object/unix.html"]the nVidia Unix drivers[/URL] via lynx and wget failed.

Thankfully I had another system I could use, and a flash drive, on site, so I was able to download the latest driver and run the installation script, which compiled the driver against the newly installed kernel.

This would be funny if it wasn't so sad....

Dubslow 2012-12-27 22:10

Huh, yes, that is really bizarre behavior. Another thing to try is `sudo apt-get upgrade`, though I'm not sure that would help.

I have no idea why the drivers seem to disappear, or why `sudo nvidia-smi -a` would fix it (but not without the sudo).
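A likely explanation for the sudo behavior, though nothing in the thread confirms it: on a machine where nothing else (such as X) loads the driver at boot, the nvidia module may not be loaded and /dev/nvidia* may not exist. Run as root, nvidia-smi appears to be able to load the module and create those device files; run as an ordinary user it cannot, hence the /dev/nvidiactl error. NVIDIA's CUDA installation guide for Linux has suggested a boot-time script that loads the module and creates the nodes with mknod (character major 195; minor N for /dev/nvidiaN, minor 255 for /dev/nvidiactl). The sketch below follows that approach; the function names are hypothetical, and the module name `nvidia` is an assumption (a packaged driver may use a different name or an alias):

```shell
#!/bin/bash
# Sketch modeled on the startup script in NVIDIA's CUDA Linux install guide.
# Must run as root (for example from /etc/rc.local).
# ASSUMPTION: the kernel module can be loaded as "nvidia".

# count_nvidia_gpus: count NVIDIA VGA/3D controllers in `lspci` output on stdin
# (NVIDIA audio functions on the same card are not counted).
count_nvidia_gpus() {
    grep -i nvidia | grep -cE 'VGA compatible controller|3D controller'
}

# make_nvidia_nodes N: load the module, then create /dev/nvidia0..N-1 and
# /dev/nvidiactl with major 195, skipping nodes that already exist.
make_nvidia_nodes() {
    local n="$1" i
    /sbin/modprobe nvidia || return 1
    for ((i = 0; i < n; i++)); do
        [ -e "/dev/nvidia$i" ] || mknod -m 666 "/dev/nvidia$i" c 195 "$i"
    done
    [ -e /dev/nvidiactl ] || mknod -m 666 /dev/nvidiactl c 195 255
}

# Example (as root):
#   make_nvidia_nodes "$(lspci | count_nvidia_gpus)"
```

Run at boot, something along these lines would make the devices available without anyone having to run nvidia-smi as root first.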

chalsall 2012-12-28 03:01

[QUOTE=Dubslow;322859]Huh, yes, that is really bizarre behavior. Another thing to try is `sudo apt-get upgrade`, though I'm not sure that would help.[/QUOTE]

Actually, that's not really bizarre.

The nVidia drivers are proprietary code. So you are supposed to download them yourself each and every time, then run the script to recompile the driver against your current kernel.

Welcome to freedom... Even though you paid for all the hardware, you still have to jump through hoops to run said hardware using free software....
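Concretely, "recompile against your current kernel" matters because kernel modules are stored per kernel version under /lib/modules/<kernel version>/. After a kernel update there is no nvidia module for the new kernel until one is built, which matches the "FATAL: Module nvidia not found" message at the start of the thread. A minimal sketch that lists each installed kernel directory and whether it contains an NVIDIA module file; the file names it looks for (nvidia.ko or nvidia_*.ko, possibly compressed) are assumptions based on common installs, and `nvidia_module_report` is a hypothetical name:

```shell
#!/bin/bash
# Sketch: for each kernel directory under ROOT (default /lib/modules),
# report whether an NVIDIA kernel module file is present.
# ASSUMPTION: module files are named nvidia.ko* or nvidia_*.ko*; the
# unrelated in-kernel nvidiafb.ko framebuffer driver is deliberately ignored.

nvidia_module_report() {
    local root="${1:-/lib/modules}" dir kver
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue          # no kernel directories at all
        kver=$(basename "$dir")
        if find "$dir" \( -name 'nvidia.ko*' -o -name 'nvidia_*.ko*' \) \
                -print -quit | grep -q .; then
            echo "$kver: nvidia module present"
        else
            echo "$kver: MISSING (rebuild the driver for this kernel)"
        fi
    done
}

# Example: nvidia_module_report, then compare the output with `uname -r`.
```

Any kernel reported as MISSING needs the driver rebuilt or reinstalled before it is booted.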


All times are UTC.