ubuntu-15.10 nvidia-355 driver fails to detect two dissimilar GPUs
[code]
$ nvidia-smi
Fri Nov 13 20:08:05 2015
+------------------------------------------------------+
| NVIDIA-SMI 355.11     Driver Version: 355.11         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 970     Off  | 0000:02:00.0     Off |                  N/A |
|  0%   27C    P0    43W / 160W |     15MiB /  4093MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
[/code]
[code]
$ lspci | grep -i nvi
02:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1)
02:00.1 Audio device: NVIDIA Corporation GM204 High Definition Audio Controller (rev a1)
03:00.0 VGA compatible controller: NVIDIA Corporation GF110 [GeForce GTX 580] (rev a1)
03:00.1 Audio device: NVIDIA Corporation GF110 High Definition Audio Controller (rev a1)
[/code]
[code]
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2014 NVIDIA Corporation
Built on Thu_Jul_17_21:41:27_CDT_2014
Cuda compilation tools, release 6.5, V6.5.12
[/code]
This nvcc won't compile for compute_10, so I removed the references to that from msieve/b40c/Makefile, but still I get
[code]
pumpkin@pumpkin:~/msieve-cuda/trunk/X$ time ../msieve -g 0 -np1 "stage1_norm=1e25 0,1000"
error (line 71): CUDA_ERROR_NO_DEVICE
[/code]
Moreover:
[code]
pumpkin@pumpkin:~/msieve-cuda/trunk/X$ dpkg -l "*nvi*"
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                              Version               Architecture          Description
+++-=================================-=====================-=====================-=======================================================================
un  libgl1-nvidia-alternatives        <none>                <none>                (no description available)
rc  nvidia-352                        352.41-0ubuntu1       amd64                 NVIDIA binary driver - version 352.41
ii  nvidia-355                        355.11-0ubuntu0~gpu15 amd64                 NVIDIA binary driver - version 355.11
un  nvidia-common                     <none>                <none>                (no description available)
un  nvidia-compute-profiler           <none>                <none>                (no description available)
un  nvidia-cuda-debugger              <none>                <none>                (no description available)
ii  nvidia-cuda-dev                   6.5.14-2              amd64                 NVIDIA CUDA development files
ii  nvidia-cuda-doc                   6.5.14-2              all                   NVIDIA CUDA and OpenCL documentation
ii  nvidia-cuda-gdb                   6.5.14-2              amd64                 NVIDIA CUDA Debugger (GDB)
un  nvidia-cuda-profiler              <none>                <none>                (no description available)
ii  nvidia-cuda-toolkit               6.5.14-2              amd64                 NVIDIA CUDA development toolkit
un  nvidia-driver-binary              <none>                <none>                (no description available)
un  nvidia-libopencl1                 <none>                <none>                (no description available)
un  nvidia-libopencl1-352             <none>                <none>                (no description available)
un  nvidia-libopencl1-352-updates     <none>                <none>                (no description available)
un  nvidia-libopencl1-dev             <none>                <none>                (no description available)
ii  nvidia-opencl-dev:amd64           6.5.14-2              amd64                 NVIDIA OpenCL development files
un  nvidia-opencl-icd                 <none>                <none>                (no description available)
ii  nvidia-opencl-icd-352             352.55-0ubuntu0~gpu15 amd64                 NVIDIA OpenCL ICD
rc  nvidia-opencl-icd-352-updates     352.41-0ubuntu1       amd64                 NVIDIA OpenCL ICD
un  nvidia-opencl-icd-355             <none>                <none>                (no description available)
un  nvidia-opencl-profiler            <none>                <none>                (no description available)
un  nvidia-persistenced               <none>                <none>                (no description available)
ii  nvidia-prime                      0.8.1                 amd64                 Tools to enable NVIDIA's Prime
ii  nvidia-profiler                   6.5.14-2              amd64                 NVIDIA Profiler for CUDA and OpenCL
ii  nvidia-settings                   358.09-0ubuntu0~gpu15 amd64                 Tool for configuring the NVIDIA graphics driver
un  nvidia-settings-binary            <none>                <none>                (no description available)
un  nvidia-vdpau-driver               <none>                <none>                (no description available)
ii  nvidia-visual-profiler            6.5.14-2              amd64                 NVIDIA Visual Profiler for CUDA and OpenCL
[/code] |
If I swap the cards round so the 580 is in the top slot, nvidia-smi picks up only the 580, and msieve still gives me
[code]
pumpkin@pumpkin:~/msieve-cuda/trunk/X$ time ../msieve -g 0 -np1 "stage1_norm=1e25 0,1000"
error (line 71): CUDA_ERROR_NO_DEVICE
[/code] |
You may wish to add msieve to the title.
|
This sounds like a hardware problem. But here are a few things to check in case it's a config problem.
Does lspci show them? What driver does lspci -v show for them? Do they both work with anything else, e.g. gmp-ecm? Does dmesg or syslog show anything interesting? Does either card work if it's the only card in the system? Is the PSU able to feed both cards? Does the motherboard manual say both slots are suitable for a GPU?
Chris |
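The first two checks can be scripted. This is only a rough sketch: it assumes `lspci` (pciutils) and `nvidia-smi -L` are available on the live system, and the function names are made up for illustration. It counts the NVIDIA display controllers on the PCI bus and compares that with the number of GPUs the driver lists.

```shell
#!/bin/sh
# Count NVIDIA display controllers in `lspci` text read from stdin.
# Audio functions of the same cards are deliberately not counted.
count_nvidia_gpus() {
    grep -Ec '(VGA compatible|3D) controller: NVIDIA' || true
}

# Count GPUs in `nvidia-smi -L` text read from stdin
# (the driver prints one "GPU N: ..." line per device it has bound).
count_driver_gpus() {
    grep -Ec '^GPU [0-9]+:' || true
}

# Hypothetical helper: compare the two counts and report the result.
compare_counts() {
    if [ "$1" -eq "$2" ]; then
        echo "OK: driver sees all $1 NVIDIA GPU(s)"
    else
        echo "MISMATCH: PCI bus has $1 NVIDIA GPU(s), driver reports $2"
    fi
}

# Usage on a live system:
#   compare_counts "$(lspci | count_nvidia_gpus)" "$(nvidia-smi -L | count_driver_gpus)"
```

A mismatch points at the driver or its configuration rather than at the application, since the hardware is visible on the bus but not bound.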
It's a 1kW PSU; the system has in the recent past worked successfully with two GTX580 cards running simultaneously. However, since then I have reinstalled the OS (previously it was Ubuntu 13.10, now it is 15.10) and replaced one of the GTX580s with a GTX970 (in the same slot). I was not able to get GPGPU to work on the new OS before adding the new card. I suspect this is just a config or driver problem, but I don't know how to attack it.
Output of 'sudo lspci -v' looks OK to me:
[code]
02:00.0 VGA compatible controller: NVIDIA Corporation GF110 [GeForce GTX 580] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: CardExpert Technology Device 0401
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
        Memory at b8000000 (64-bit, prefetchable) [size=128M]
        Memory at c0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at e000 [size=128]
        [virtual] Expansion ROM at fb000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [b4] Vendor Specific Information: Len=14 <?>
        Capabilities: [100] Virtual Channel
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Kernel driver in use: nvidia

02:00.1 Audio device: NVIDIA Corporation GF110 High Definition Audio Controller (rev a1)
        Subsystem: CardExpert Technology Device 0401
        Flags: bus master, fast devsel, latency 0, IRQ 17
        Memory at fb080000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Kernel driver in use: snd_hda_intel

03:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Gigabyte Technology Co., Ltd Device 36bc
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at f8000000 (32-bit, non-prefetchable) [size=16M]
        Memory at a0000000 (64-bit, prefetchable) [size=256M]
        Memory at b0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at d000 [size=128]
        Expansion ROM at f9000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Legacy Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [250] Latency Tolerance Reporting
        Capabilities: [258] L1 PM Substates
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] #19
        Kernel driver in use: nvidia
[/code] |
I'm running my GTX 970 on openSUSE 13.2, so I can't help with driver issues on Ubuntu.
The lspci output suggests both cards are OK from a hardware viewpoint. Here's what lspci shows about my card (in case it helps):
[code]
4core:~ # lspci -v -s 01:00
01:00.0 VGA compatible controller: NVIDIA Corporation Device 13c2 (rev a1) (prog-if 00 [VGA controller])
        Subsystem: eVga.com. Corp. Device 3978
        Flags: bus master, fast devsel, latency 0, IRQ 48
        Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Memory at f0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at e000 [size=128]
        [virtual] Expansion ROM at f7000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Legacy Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [250] Latency Tolerance Reporting
        Capabilities: [258] L1 PM Substates
        Capabilities: [128] Power Budgeting
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024
        Capabilities: [900] #19
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia

01:00.1 Audio device: NVIDIA Corporation Device 0fbb (rev a1)
        Subsystem: eVga.com. Corp. Device 3978
        Flags: bus master, fast devsel, latency 0, IRQ 17
        Memory at f7080000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel
[/code]
Check the output from dmesg and syslog. There might be a clue there. If not, all I can suggest is removing one card and trying to get the other working on its own. Then swap cards and repeat. Then hope you can get them both going. Sorry I can't help any more.
Chris |
[QUOTE=fivemack;416105]If I swap the cards round so the 580 is in the top slot, nvidia-smi only picks up the 580, and still gives me
[code]
pumpkin@pumpkin:~/msieve-cuda/trunk/X$ time ../msieve -g 0 -np1 "stage1_norm=1e25 0,1000"
error (line 71): CUDA_ERROR_NO_DEVICE
[/code][/QUOTE]
I missed this earlier. This confirms the problem is not in the application itself, but in the driver or hardware. Have you confirmed that SLI is disabled in the BIOS, if there is a setting for it? In addition to /var/log/syslog, I would also check /var/log/Xorg.0.log and see whether X detects both cards. Also, what is the output of ls -l /dev/nv* ? It's possible the driver isn't making the device node for the second card. Since I use the onboard graphics in one system, I have to make these manually with a script. |
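For reference, a script of that kind might look roughly like the sketch below. It is not the poster's actual script. It assumes the nvidia kernel module is already loaded and uses the character-device numbers NVIDIA documents for its Linux driver (major 195, minor N for /dev/nvidiaN, minor 255 for /dev/nvidiactl). The function name is hypothetical, and it only prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Print (do not run) the commands that create the NVIDIA device nodes
# for N GPUs plus the control node. Existing nodes are left alone.
# Assumption: major 195, minor 255 for nvidiactl, as in NVIDIA's docs.
nvidia_mknod_cmds() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "[ -e /dev/nvidia$i ] || mknod -m 666 /dev/nvidia$i c 195 $i"
        i=$((i + 1))
    done
    echo "[ -e /dev/nvidiactl ] || mknod -m 666 /dev/nvidiactl c 195 255"
}

# Usage: review the output first, then pipe it to a root shell:
#   nvidia_mknod_cmds 2
#   nvidia_mknod_cmds 2 | sudo sh
```

If `ls -l /dev/nv*` shows /dev/nvidia0 and /dev/nvidiactl but no /dev/nvidia1, missing nodes are a plausible cause. If the driver never bound the second card at all, creating the node will not help.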
To try another app, I downloaded the CUDALucas source and built it:
[code]
pumpkin@pumpkin:~/cudalucas-build/cudalucas-code$ ./CUDALucas -d 0 -threadbench 1 16 5 0
device_number >= device_count ... exiting
(This is probably a driver problem)
[/code] |
I ended up installing Ubuntu 14.04 and then the driver .deb distributed by NVIDIA; that appears to work.
|
Maybe 16.04 will work. For something like CUDA support I would guess that the LTS versions are more likely to work.
|