mersenneforum.org  

2014-02-14, 21:25   #12
Xyzzy
"Mike"
Aug 2002
2⁵×257 Posts

This is weird, but we think the order in which the GPUs are installed causes this issue.



(Perhaps Linux deals with card order improperly? Windows worked flawlessly.)

We have three GPUs in the system now.

We carefully monitor temperatures and never allow the GPUs to exceed 70°C.

We will report back in a few days with our results.

2014-02-14, 21:48   #13
EdH
"Ed Hall"
Dec 2009
Adirondack Mtns
11·347 Posts

Quote:
Originally Posted by Xyzzy
This is weird, but we think the order in which the GPUs are installed causes this issue.



(Perhaps Linux deals with card order improperly? Windows worked flawlessly.)

We have three GPUs in the system now.

We carefully monitor temperatures and never allow the GPUs to exceed 70°C.

We will report back in a few days with our results.

I have had issues with Linux not assigning USB devices in the same order across reboots. Is this a similar issue?

2014-02-15, 22:33   #14
Xyzzy
"Mike"
Aug 2002
2⁵·257 Posts

We figured out that (for some reason) the 660 "owns" the "gpu 0" index no matter which physical slot we put it in, even if we do not use it as the primary display, or as a display at all.

(The 660 and TITAN are double-width cards so we can only use them in the top two slots. We have a 430 that is a single-width card that we put into the third slot.)

Since we moved the 660 to the slot closest to the CPU, everything has worked perfectly.

The 660 was the card that was stalling, and it has not done so since we tried the current order, so we will probably keep things just the way they are.
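
For anyone trying to work out which physical card sits behind "gpu 0": a minimal sketch, assuming a driver whose nvidia-smi supports the `--query-gpu` option (older releases may not). Since the index does not seem to follow the slot, the PCI bus ID is the more useful thing to compare. The helper name `gpu_by_bus` is made up for illustration.

```shell
#!/bin/sh
# Sketch: reorder "index, name, bus id" CSV lines by PCI bus ID, so the
# listing follows bus order rather than the driver's GPU index.
# gpu_by_bus is a hypothetical helper name, not part of any NVIDIA tool.
gpu_by_bus() {
    # Field 3 is the bus ID; sort on it, then print "bus -> gpu index (name)".
    sort -t, -k3,3 | awk -F', *' '{ printf "%s -> gpu %s (%s)\n", $3, $1, $2 }'
}

# Real use (assumes nvidia-smi supports --query-gpu on your driver):
#   nvidia-smi --query-gpu=index,name,pci.bus_id --format=csv,noheader | gpu_by_bus
```

If the card in a given bus position reports an unexpected index, that would at least confirm the numbering is chosen by the driver and not by slot position.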

2014-02-15, 22:39   #15
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
37·263 Posts

Quote:
Originally Posted by Xyzzy
The 660 was the card that was stalling, and it has not done so since we tried the current order, so we will probably keep things just the way they are.

Keep observing...

Just because you change something (or reboot), and it appears to fix the problem, it doesn't necessarily mean the problem is fixed. You need many samples before you can be confident (not sure, but confident).
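
To put a rough number on "many samples": a back-of-envelope sketch, assuming stalls arrive independently at a constant rate λ (a Poisson model, which is an assumption and not something measured in this thread). With zero stalls observed over a time T:

```latex
P(\text{no stall in time } T) = e^{-\lambda T}
\quad\Longrightarrow\quad
\lambda_{95\%} = \frac{-\ln 0.05}{T} \approx \frac{3}{T}
```

So after T stall-free days, the most that can be claimed at 95% confidence is that the stall rate is below about 3/T per day. If a card used to stall roughly once a day (a hypothetical figure), it takes about three clean days before the old rate can be ruled out at that level.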

2014-02-16, 22:00   #16
blip
Jan 2014
2·73 Posts

I happen to have the same issue with a 590: One or the other of the two cores just hangs after a while. Both do CUDALucas. (Ubuntu 13.10).

2014-02-16, 23:00   #17
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
10011000000011₂ Posts

Quote:
Originally Posted by blip
I happen to have the same issue with a 590: One or the other of the two cores just hangs after a while. Both do CUDALucas. (Ubuntu 13.10).

This suggests strongly that the drivers are at fault, rather than the programs.

No big surprise, really....

2014-02-16, 23:15   #18
Xyzzy
"Mike"
Aug 2002
2020₁₆ Posts

No hangs yet from the system. Our fingers are crossed!

Code:
$ w
 17:14:27 up 2 days,  2:32,  6 users,  load average: 2.45, 2.82, 3.21

2014-02-16, 23:40   #19
blip
Jan 2014
2×73 Posts

Quote:
Originally Posted by chalsall
This suggests strongly that the drivers are at fault, rather than the programs.

No big surprise, really....

Code:
$ nvidia-smi
Mon Feb 17 00:34:53 2014       
+------------------------------------------------------+                       
| NVIDIA-SMI 5.319.32   Driver Version: 319.32         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 590     Off  | 0000:03:00.0     N/A |                  N/A |
|  0%   91C  N/A     N/A /  N/A |      153MB /  1535MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 590     Off  | 0000:04:00.0     N/A |                  N/A |
| 88%   89C  N/A     N/A /  N/A |      153MB /  1535MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 650     Off  | 0000:09:00.0     N/A |                  N/A |
| 18%   46C  N/A     N/A /  N/A |       46MB /  2047MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
|    1            Not Supported                                               |
|    2            Not Supported                                               |
+-----------------------------------------------------------------------------+
(quite hot, I know...)

(Where) Is there a newer/better driver for Ubuntu 13.10? (331.??)
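
With both 590 cores at 89-91°C, it may be worth keeping an eye on temperature while chasing the hangs. A rough sketch of a check that flags any GPU above a limit, assuming nvidia-smi accepts `--query-gpu=index,temperature.gpu` on your driver. The name `hot_gpus` and the default limit of 85 are illustrative only, not NVIDIA's thresholds.

```shell
#!/bin/sh
# Sketch: print a warning for each GPU whose temperature exceeds a limit.
# Input: lines of "index, temperature" (CSV, no header, no units).
# hot_gpus and its default limit of 85 are assumptions for illustration.
hot_gpus() {
    limit="${1:-85}"
    awk -F', *' -v limit="$limit" \
        '$2 + 0 > limit + 0 { printf "gpu %s: %sC exceeds %sC\n", $1, $2, limit }'
}

# Real use (assumes your nvidia-smi supports these query fields):
#   nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits | hot_gpus 85
```

Run from cron or a loop with a timestamp, the output would also show whether a hang lines up with a temperature spike.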

2014-02-17, 01:57   #20
kracker
"Mr. Meeseeks"
Jan 2012
California, USA
23×271 Posts

Um... Get it from nvidia's website?

2014-02-17, 08:20   #21
blip
Jan 2014
2·73 Posts

Quote:
Originally Posted by kracker
Um... Get it from nvidia's website?

In my case:
Code:
sudo add-apt-repository ppa:xorg-edgers/ppa
sudo apt-get update
sudo apt-get install nvidia-331
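
After the install and a reboot, it is worth confirming that the kernel module actually loaded is the new one. A small sketch, assuming /proc/driver/nvidia/version has its usual first line of the form "NVRM version: ... Kernel Module  <version>  <date>". The helper name `nvidia_module_version` is made up.

```shell
#!/bin/sh
# Sketch: pull the driver version out of the "NVRM version" line.
# Assumes the version is the field that follows the words "Kernel Module".
nvidia_module_version() {
    awk '/^NVRM version:/ {
        for (i = 1; i < NF; i++)
            if ($i == "Kernel" && $(i + 1) == "Module") { print $(i + 2); exit }
    }'
}

# Real use:
#   nvidia_module_version < /proc/driver/nvidia/version
```

If this still prints a 319.x version after installing nvidia-331, the old module is probably still the one in use.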

Last fiddled with by blip on 2014-02-17 at 08:21

2014-02-19, 17:47   #22
Xyzzy
"Mike"
Aug 2002
2⁵×257 Posts

We need to reboot for a kernel update. So far no problems!

Code:
$ w
 11:47:12 up 4 days, 21:05,  6 users,  load average: 4.10, 4.18, 4.15

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.