mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing

2015-11-13, 20:18   #1
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England

13·491 Posts
ubuntu-15.10 nvidia-355 driver fails to detect two dissimilar GPUs

Code:
$ nvidia-smi
Fri Nov 13 20:08:05 2015       
+------------------------------------------------------+                       
| NVIDIA-SMI 355.11     Driver Version: 355.11         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 970     Off  | 0000:02:00.0     Off |                  N/A |
|  0%   27C    P0    43W / 160W |     15MiB /  4093MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Code:
$ lspci | grep -i nvi
02:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1)
02:00.1 Audio device: NVIDIA Corporation GM204 High Definition Audio Controller (rev a1)
03:00.0 VGA compatible controller: NVIDIA Corporation GF110 [GeForce GTX 580] (rev a1)
03:00.1 Audio device: NVIDIA Corporation GF110 High Definition Audio Controller (rev a1)
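Lining the two listings up by PCI bus ID makes the gap explicit: lspci shows NVIDIA GPUs at 02:00.0 and 03:00.0, while the nvidia-smi table only has 0000:02:00.0. A minimal Python sketch of that comparison, assuming plain `lspci` output (no `-D`, so no PCI domain) and the one-ID-per-line output of `nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader`. The `SMI` string below is the bus ID copied from the table above, not a captured query result.

```python
import re

# Matches plain `lspci` lines (no -D, so no PCI domain) for NVIDIA display controllers.
LSPCI_GPU = re.compile(
    r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(?:VGA compatible|3D) controller: NVIDIA",
    re.IGNORECASE,
)

def lspci_gpus(lspci_text):
    """Bus IDs such as '02:00.0' for every NVIDIA VGA/3D controller lspci lists."""
    return [m.group(1).lower()
            for m in map(LSPCI_GPU.match, lspci_text.splitlines()) if m]

def smi_gpus(smi_text):
    """Bus IDs from `nvidia-smi --query-gpu=pci.bus_id --format=csv,noheader`,
    with the PCI domain prefix dropped so '0000:02:00.0' becomes '02:00.0'."""
    return [line.strip().split(":", 1)[1].lower()
            for line in smi_text.splitlines() if line.strip()]

def unclaimed_gpus(lspci_text, smi_text):
    """GPUs present on the PCI bus that the NVIDIA driver did not report."""
    seen = set(smi_gpus(smi_text))
    return [bus for bus in lspci_gpus(lspci_text) if bus not in seen]

# Data from the first post: lspci lists two GPUs, nvidia-smi shows only 0000:02:00.0.
LSPCI = """\
02:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1)
02:00.1 Audio device: NVIDIA Corporation GM204 High Definition Audio Controller (rev a1)
03:00.0 VGA compatible controller: NVIDIA Corporation GF110 [GeForce GTX 580] (rev a1)
03:00.1 Audio device: NVIDIA Corporation GF110 High Definition Audio Controller (rev a1)
"""
SMI = "0000:02:00.0\n"

print(unclaimed_gpus(LSPCI, SMI))  # ['03:00.0']
```

An empty list would mean the driver reports every GPU that lspci can see.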
Code:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2014 NVIDIA Corporation
Built on Thu_Jul_17_21:41:27_CDT_2014
Cuda compilation tools, release 6.5, V6.5.12
This nvcc won't compile for compute_10, so I removed the references to it from msieve/b40c/Makefile, but I still get:

Code:
pumpkin@pumpkin:~/msieve-cuda/trunk/X$ time ../msieve -g 0 -np1 "stage1_norm=1e25 0,1000"
error (line 71): CUDA_ERROR_NO_DEVICE
Moreover:
Code:
pumpkin@pumpkin:~/msieve-cuda/trunk/X$ dpkg -l "*nvi*"
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                              Version               Architecture          Description
+++-=================================-=====================-=====================-=======================================================================
un  libgl1-nvidia-alternatives        <none>                <none>                (no description available)
rc  nvidia-352                        352.41-0ubuntu1       amd64                 NVIDIA binary driver - version 352.41
ii  nvidia-355                        355.11-0ubuntu0~gpu15 amd64                 NVIDIA binary driver - version 355.11
un  nvidia-common                     <none>                <none>                (no description available)
un  nvidia-compute-profiler           <none>                <none>                (no description available)
un  nvidia-cuda-debugger              <none>                <none>                (no description available)
ii  nvidia-cuda-dev                   6.5.14-2              amd64                 NVIDIA CUDA development files
ii  nvidia-cuda-doc                   6.5.14-2              all                   NVIDIA CUDA and OpenCL documentation
ii  nvidia-cuda-gdb                   6.5.14-2              amd64                 NVIDIA CUDA Debugger (GDB)
un  nvidia-cuda-profiler              <none>                <none>                (no description available)
ii  nvidia-cuda-toolkit               6.5.14-2              amd64                 NVIDIA CUDA development toolkit
un  nvidia-driver-binary              <none>                <none>                (no description available)
un  nvidia-libopencl1                 <none>                <none>                (no description available)
un  nvidia-libopencl1-352             <none>                <none>                (no description available)
un  nvidia-libopencl1-352-updates     <none>                <none>                (no description available)
un  nvidia-libopencl1-dev             <none>                <none>                (no description available)
ii  nvidia-opencl-dev:amd64           6.5.14-2              amd64                 NVIDIA OpenCL development files
un  nvidia-opencl-icd                 <none>                <none>                (no description available)
ii  nvidia-opencl-icd-352             352.55-0ubuntu0~gpu15 amd64                 NVIDIA OpenCL ICD
rc  nvidia-opencl-icd-352-updates     352.41-0ubuntu1       amd64                 NVIDIA OpenCL ICD
un  nvidia-opencl-icd-355             <none>                <none>                (no description available)
un  nvidia-opencl-profiler            <none>                <none>                (no description available)
un  nvidia-persistenced               <none>                <none>                (no description available)
ii  nvidia-prime                      0.8.1                 amd64                 Tools to enable NVIDIA's Prime
ii  nvidia-profiler                   6.5.14-2              amd64                 NVIDIA Profiler for CUDA and OpenCL
ii  nvidia-settings                   358.09-0ubuntu0~gpu15 amd64                 Tool for configuring the NVIDIA graphics driver
un  nvidia-settings-binary            <none>                <none>                (no description available)
un  nvidia-vdpau-driver               <none>                <none>                (no description available)
ii  nvidia-visual-profiler            6.5.14-2              amd64                 NVIDIA Visual Profiler for CUDA and OpenCL
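One thing this listing shows is that the installed (`ii`) NVIDIA packages come from three driver branches: nvidia-355 (355.11), nvidia-opencl-icd-352 (352.55) and nvidia-settings (358.09). The thread does not establish whether that mix matters for CUDA. Here is a small Python sketch that groups installed packages by driver branch. It assumes the standard `dpkg -l` column layout and treats any version starting with a three-digit number (e.g. `352.55-0ubuntu0~gpu15`) as a driver build. The sample rows are copied from the output above.

```python
import re
from collections import defaultdict

# A driver-branch version looks like "355.11-0ubuntu0~gpu15"; CUDA toolkit packages
# ("6.5.14-2") and tools such as nvidia-prime ("0.8.1") do not match.
DRIVER_VERSION = re.compile(r"(\d{3})\.\d+")

def driver_branches(dpkg_text):
    """Map driver branch (e.g. '355') -> names of installed packages built for it."""
    branches = defaultdict(list)
    for line in dpkg_text.splitlines():
        fields = line.split()
        # Only 'ii' rows are installed; 'rc' rows are removed packages whose config remains.
        if len(fields) < 3 or fields[0] != "ii":
            continue
        name, version = fields[1], fields[2]
        m = DRIVER_VERSION.match(version)
        if m:
            branches[m.group(1)].append(name)
    return dict(branches)

SAMPLE = """\
rc  nvidia-352                        352.41-0ubuntu1       amd64  NVIDIA binary driver - version 352.41
ii  nvidia-355                        355.11-0ubuntu0~gpu15 amd64  NVIDIA binary driver - version 355.11
ii  nvidia-cuda-toolkit               6.5.14-2              amd64  NVIDIA CUDA development toolkit
ii  nvidia-opencl-icd-352             352.55-0ubuntu0~gpu15 amd64  NVIDIA OpenCL ICD
ii  nvidia-prime                      0.8.1                 amd64  Tools to enable NVIDIA's Prime
ii  nvidia-settings                   358.09-0ubuntu0~gpu15 amd64  Tool for configuring the NVIDIA graphics driver
"""

branches = driver_branches(SAMPLE)
if len(branches) > 1:
    print("packages from more than one driver branch:", branches)
```

On a live system the text would come from `dpkg -l "*nvi*"` as in the post.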

Last fiddled with by fivemack on 2015-11-13 at 20:20
2015-11-13, 20:52   #2
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England

13·491 Posts

If I swap the cards round so the 580 is in the top slot, nvidia-smi only picks up the 580, and still gives me
Code:
pumpkin@pumpkin:~/msieve-cuda/trunk/X$ time ../msieve -g 0 -np1 "stage1_norm=1e25 0,1000"
error (line 71): CUDA_ERROR_NO_DEVICE
2015-11-13, 21:04   #3
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013

29×101 Posts

You may wish to add msieve to the title.
2015-11-13, 22:44   #4
chris2be8
Sep 2009

2006₁₀ Posts

This sounds like a hardware problem. But here are a few things to check in case it's a config problem.

Does lspci show them? What driver does lspci -v show for them?
Do they both work with anything else, eg gmp-ecm?
Does dmesg or syslog show anything interesting?
Does either card work if it's the only card in the system?

Is the PSU able to feed both cards?
Does the motherboard manual say both slots are suitable for a GPU?
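For the dmesg/syslog item, the NVIDIA kernel module tags its messages with `NVRM:`, so filtering on that (and on "nvidia" generally) is a quick first pass. A minimal Python sketch; the function names are illustrative, and the broad "nvidia" match is a deliberate assumption that will also pick up unrelated mentions.

```python
import re

# The NVIDIA kernel module prefixes its log messages with "NVRM:"; lines that merely
# mention "nvidia" (other NVIDIA modules, udev, Xorg) are kept too, as a broad first pass.
NVIDIA_LOG = re.compile(r"NVRM|nvidia", re.IGNORECASE)

def nvidia_log_lines(lines):
    """Yield the log lines that appear to come from, or refer to, the NVIDIA driver."""
    for line in lines:
        if NVIDIA_LOG.search(line):
            yield line.rstrip("\n")

def scan_log(path):
    """Return matching lines from a log file such as /var/log/syslog ([] if unreadable)."""
    try:
        with open(path, errors="replace") as log:
            return list(nvidia_log_lines(log))
    except OSError:
        return []
```

For example, `for line in scan_log("/var/log/syslog"): print(line)`, or on Python 3.7+ pass `subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines()` to `nvidia_log_lines`.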

Chris
2015-11-14, 07:57   #5
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England

13×491 Posts

It's a 1kW PSU; the system has in the recent past worked successfully with two GTX580 cards running simultaneously. However, since then I have reinstalled the OS (previously it was Ubuntu 13.10, now it is 15.10) and replaced one of the GTX580s with a GTX970 (in the same slot). I was not able to get GPGPU to work on the new OS before adding the new card. I suspect this is just a config or driver problem, but I don't know how to attack it.

Output of 'sudo lspci -v' looks OK to me:

Code:
02:00.0 VGA compatible controller: NVIDIA Corporation GF110 [GeForce GTX 580] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: CardExpert Technology Device 0401
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
        Memory at b8000000 (64-bit, prefetchable) [size=128M]
        Memory at c0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at e000 [size=128]
        [virtual] Expansion ROM at fb000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [b4] Vendor Specific Information: Len=14 <?>
        Capabilities: [100] Virtual Channel
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Kernel driver in use: nvidia

02:00.1 Audio device: NVIDIA Corporation GF110 High Definition Audio Controller (rev a1)
        Subsystem: CardExpert Technology Device 0401
        Flags: bus master, fast devsel, latency 0, IRQ 17
        Memory at fb080000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Kernel driver in use: snd_hda_intel

03:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Gigabyte Technology Co., Ltd Device 36bc
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at f8000000 (32-bit, non-prefetchable) [size=16M]
        Memory at a0000000 (64-bit, prefetchable) [size=256M]
        Memory at b0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at d000 [size=128]
        Expansion ROM at f9000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Legacy Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [250] Latency Tolerance Reporting
        Capabilities: [258] L1 PM Substates
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] #19
        Kernel driver in use: nvidia
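The "what driver does lspci -v show" question can be answered mechanically by splitting the output into per-device stanzas (separated by blank lines) and reading each NVIDIA controller's `Kernel driver in use:` line. A minimal Python sketch, assuming the stanza layout shown above; the sample is abbreviated from that output.

```python
import re

# First line of a stanza for an NVIDIA display controller, e.g. "03:00.0 VGA compatible controller: NVIDIA ..."
GPU_HEAD = re.compile(
    r"([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) (?:VGA compatible|3D) controller: NVIDIA",
    re.IGNORECASE,
)

def kernel_drivers(lspci_v_text):
    """Map each NVIDIA VGA/3D controller's bus ID to its 'Kernel driver in use' value."""
    result = {}
    for stanza in re.split(r"\n\s*\n", lspci_v_text.strip()):
        lines = stanza.splitlines()
        if not lines:
            continue
        head = GPU_HEAD.match(lines[0])
        if not head:
            continue  # audio functions and non-NVIDIA devices are skipped
        driver = None  # stays None if no kernel driver has bound to the device
        for line in lines[1:]:
            if line.strip().startswith("Kernel driver in use:"):
                driver = line.split(":", 1)[1].strip()
        result[head.group(1)] = driver
    return result

SAMPLE = """\
02:00.0 VGA compatible controller: NVIDIA Corporation GF110 [GeForce GTX 580] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: CardExpert Technology Device 0401
        Kernel driver in use: nvidia

02:00.1 Audio device: NVIDIA Corporation GF110 High Definition Audio Controller (rev a1)
        Kernel driver in use: snd_hda_intel

03:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Gigabyte Technology Co., Ltd Device 36bc
        Kernel driver in use: nvidia
"""
print(kernel_drivers(SAMPLE))  # {'02:00.0': 'nvidia', '03:00.0': 'nvidia'}
```

Here both GPUs report `nvidia`; a `None` would mean no kernel driver had bound to that device. On a live system the text can come from `subprocess.run(["lspci", "-v"], capture_output=True, text=True).stdout`.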

Last fiddled with by fivemack on 2015-11-14 at 08:14
2015-11-14, 16:48   #6
chris2be8
Sep 2009

2·17·59 Posts

I'm running my GTX 970 on openSUSE 13.2 so can't help with driver issues on Ubuntu.

The lspci output suggests both cards are OK from a hardware viewpoint.

Here's what lspci shows about my card (in case it helps):
Code:
4core:~ # lspci -v -s 01:00
01:00.0 VGA compatible controller: NVIDIA Corporation Device 13c2 (rev a1) (prog-if 00 [VGA controller])
        Subsystem: eVga.com. Corp. Device 3978
        Flags: bus master, fast devsel, latency 0, IRQ 48
        Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Memory at f0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at e000 [size=128]
        [virtual] Expansion ROM at f7000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Legacy Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [250] Latency Tolerance Reporting
        Capabilities: [258] L1 PM Substates
        Capabilities: [128] Power Budgeting 
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 
        Capabilities: [900] #19
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia

01:00.1 Audio device: NVIDIA Corporation Device 0fbb (rev a1)
        Subsystem: eVga.com. Corp. Device 3978
        Flags: bus master, fast devsel, latency 0, IRQ 17
        Memory at f7080000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel
Check the output from dmesg and syslog. There might be a clue there. If not, all I can suggest is removing one card and trying to get the other working on its own. Then swap cards and repeat. Then hope you can get them both going.

Sorry I can't help any more.

Chris
2015-11-14, 18:31   #7
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013

29×101 Posts

Quote:
Originally Posted by fivemack
If I swap the cards round so the 580 is in the top slot, nvidia-smi only picks up the 580, and still gives me
Code:
pumpkin@pumpkin:~/msieve-cuda/trunk/X$ time ../msieve -g 0 -np1 "stage1_norm=1e25 0,1000"
error (line 71): CUDA_ERROR_NO_DEVICE
I missed this earlier. This confirms it's not related to the actual application, but to the driver or hardware. Have you confirmed that SLI is disabled in the BIOS, if there is a setting for it? In addition to /var/log/syslog I would also check /var/log/Xorg.0.log and see if X detects both cards.

Also, what is the output of ls -l /dev/nv* ?

It's possible the driver isn't making the device node for the second card. Since I use the onboard graphics in one system I have to manually make these with a script.
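The device nodes in question are character devices with major number 195: `/dev/nvidiaN` uses minor N and `/dev/nvidiactl` uses minor 255, which is how the startup script in NVIDIA's CUDA installation guide for Linux creates them with `mknod`. Here is a minimal Python sketch that only reports which nodes are missing for a given GPU count and prints the corresponding commands. It creates nothing, and it leaves out `/dev/nvidia-uvm`, whose major number is assigned dynamically.

```python
import os

NVIDIA_MAJOR = 195  # character-device major used by the NVIDIA kernel module
CTL_MINOR = 255     # minor number of /dev/nvidiactl

def missing_nodes(gpu_count, dev_dir="/dev"):
    """Return (path, minor) pairs for the /dev/nvidia* nodes that should exist
    for `gpu_count` GPUs but do not."""
    wanted = [(f"nvidia{i}", i) for i in range(gpu_count)] + [("nvidiactl", CTL_MINOR)]
    return [(os.path.join(dev_dir, name), minor)
            for name, minor in wanted
            if not os.path.exists(os.path.join(dev_dir, name))]

def mknod_commands(missing):
    """Shell commands (to be run as root) that would create the missing nodes."""
    return [f"mknod -m 666 {path} c {NVIDIA_MAJOR} {minor}" for path, minor in missing]

if __name__ == "__main__":
    # Two NVIDIA GPUs show up in lspci in this thread, so two per-GPU nodes are expected.
    for cmd in mknod_commands(missing_nodes(2)):
        print(cmd)
```

The printed commands would need to be run as root, and only after the nvidia module has loaded.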
2015-11-14, 19:46   #8
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England

13·491 Posts

To try another app, I downloaded the CUDALucas source and built it:

Code:
pumpkin@pumpkin:~/cudalucas-build/cudalucas-code$ ./CUDALucas -d 0 -threadbench 1 16 5 0

device_number >=  device_count ... exiting
(This is probably a driver problem)
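Both msieve (CUDA_ERROR_NO_DEVICE) and CUDALucas (device_number >= device_count) fail when asking the CUDA driver for devices, so the same question can be put to the driver directly, with neither program involved. A minimal Python sketch using ctypes; it assumes the driver library is loadable as `libcuda.so.1` (the usual name on Linux). `cuInit` and `cuDeviceGetCount` are CUDA driver API calls returning a CUresult, where 0 is CUDA_SUCCESS and 100 is CUDA_ERROR_NO_DEVICE.

```python
import ctypes

CUDA_SUCCESS = 0
CUDA_ERROR_NO_DEVICE = 100

def cuda_device_count(libname="libcuda.so.1"):
    """Ask the CUDA driver API how many devices it sees.

    Returns an int, or None if the driver library cannot be loaded at all."""
    try:
        cuda = ctypes.CDLL(libname)
    except OSError:
        return None  # libcuda missing or not on the loader path
    rc = cuda.cuInit(0)
    if rc == CUDA_ERROR_NO_DEVICE:
        return 0  # the same condition msieve reported
    if rc != CUDA_SUCCESS:
        raise RuntimeError(f"cuInit failed with CUresult {rc}")
    count = ctypes.c_int(0)
    rc = cuda.cuDeviceGetCount(ctypes.byref(count))
    if rc != CUDA_SUCCESS:
        raise RuntimeError(f"cuDeviceGetCount failed with CUresult {rc}")
    return count.value

if __name__ == "__main__":
    try:
        print("CUDA driver sees:", cuda_device_count())
    except RuntimeError as err:
        print(err)
```

A count of 0 here, on a machine where nvidia-smi still lists a GPU, would suggest the problem lies in the driver stack below the applications.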
2015-11-17, 21:40   #9
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England

13·491 Posts

I ended up installing Ubuntu 14.04 and then the driver .deb distributed by NVIDIA; that appears to work.
2015-11-18, 12:55   #10
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)

2·41·71 Posts

Maybe 16.04 will work. For something like CUDA support I would guess that the LTS versions are more likely to work.
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.