mersenneforum.org  

Old 2013-05-30, 18:51   #1
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

2537₁₆ Posts
New GPU; new issues...

Hey All. OK, some more guidance requested...

In a very kind and generous gesture, Jerry (AKA flashj) donated to me a brand new MSi Twin Frozr III 580GTX which he had "in inventory".

As I currently only have access to a single machine which can support such kit (a Dell T7500), I swapped out my GTX560 and installed this.

Unfortunately, it did not pass the CUDALucas self test, nor Carl's CUDAmemtest. So I started using Carl's technique of lowering the clocks. First the memory, then the core. Still failing.

Having to leave the site at the end of the day, I gave up and figured I would simply let the card do TF work until I was next in front of the machine.

But... I then discovered that when running mfaktc the machine will quickly hard reset. Like instantly; no kernel panic. Just suddenly a blank screen, and then a reboot (the machine is configured to auto-power up upon power loss-then-restore).

The strange thing is this never has happened with CUDALucas running (for hours), nor Carl's memtest (again, for hours).

When I'm next in front of the machine I'm going to continue to fiddle with the BIOS clock settings, and possibly the voltages (but only slightly). I'm also planning on connecting a USB cable to the UPS which feeds this machine so I can monitor and graph the power consumption. I still think the (1100W) power supply is good -- this behavior occurs regardless of whether the CPU is at 100% load or 0% -- but swapping in another is scheduled.

Also, I will shortly have access to six Dell Power Edge R720s, so I'll have additional machines to test this card (and the GTX 560) in -- I realize that testing with only a single host machine is not ideal.

Any additional thoughts or advice from anyone which might help in this analysis?
Old 2013-05-30, 19:40   #2
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

3²×241 Posts

It sounds like a PSU issue. Exactly what PSU do you have? A no-name one or a good one?

Also, if it is an older one, remember that they lose efficiency and output gradually.

Last fiddled with by kracker on 2013-05-30 at 19:40
Old 2013-05-30, 19:49   #3
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

7·1,361 Posts

Quote:
Originally Posted by kracker View Post
It sounds like a PSU issue. Exactly what PSU do you have? A no-name one or a good one?

Also, if it is an older one, remember that they lose efficiency and output gradually.
Dell 1100W. And I agree it may be the 12V "rail". That's why I have a new 1400W PSU on order for this machine.

But it strikes me as weird that CUDALucas and Carl's memtest run for hours without such an issue, while mfaktc won't run for more than five minutes without this (very hard) reset.

As discussed before, different software loads the hardware differently. I'm simply documenting my personal experience, and asking others to point out if they think I've overlooked anything obvious.
Old 2013-05-30, 19:51   #4
kracker
 
"Mr. Meeseeks"
Jan 2012
California, USA

3²·241 Posts

Quote:
Originally Posted by chalsall View Post
Dell 1100W. And I agree it may be the 12V "rail". That's why I have a new 1400W PSU on order for this machine.

But it strikes me as weird that CUDALucas and Carl's memtest run for hours without such an issue, while mfaktc won't run for more than five minutes without this (very hard) reset.

As discussed before, different software loads the hardware differently. I'm simply documenting my personal experience, and asking others to point out if they think I've overlooked anything obvious.
Have you tested it and compared how much power they use?

Also, if a 560 worked, and you swapped it for a higher-end model that doesn't work, it's almost 100% the PSU (unless the card is bad).

EDIT: A free card? you lucky b**.... Anyways the power company will want more money from you. Yay!

Last fiddled with by kracker on 2013-05-30 at 19:53
Old 2013-05-30, 20:05   #5
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

10010100110111₂ Posts

Quote:
Originally Posted by kracker View Post
Have you tested it and compared how much power they use?
Still working on that. The power consumption delta needs to be inferred; it can't be measured directly since we're working with DC and one of the three feeds is from the bus. That's the reason for the UPS data-feed.
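
For what it's worth, a rough sketch of how that inference could go once the UPS feed is logging: average the reported load with the GPU idle and again with it busy, take the difference, and scale by the PSU's efficiency to estimate the DC-side change. The sample readings, the 0.85 efficiency figure, and the function name are all made-up placeholders, not measurements from this machine.

```python
from statistics import mean


def inferred_dc_delta_watts(idle_ac_watts, busy_ac_watts, psu_efficiency=0.85):
    """Estimate the extra DC power drawn under load from AC-side UPS readings.

    psu_efficiency is an assumed placeholder; real efficiency varies with load.
    """
    ac_delta = mean(busy_ac_watts) - mean(idle_ac_watts)
    return ac_delta * psu_efficiency


# Hypothetical UPS samples in watts (not real measurements).
idle = [300.0, 310.0, 290.0]
busy = [500.0, 510.0, 490.0]
print(f"Estimated DC-side delta: {inferred_dc_delta_watts(idle, busy):.0f} W")
```

Running it once per workload (CUDALucas, memtest, mfaktc) against the same idle baseline would give comparable deltas, though a UPS load figure is usually coarse, so small differences should not be trusted.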

Quote:
Originally Posted by kracker View Post
Also, if a 560 worked, and you swapped it for a higher-end model that doesn't work, it's almost 100% the PSU (unless the card is bad).
Agreed. Trying to figure out which of the two most likely possibilities is the truth.

Quote:
Originally Posted by kracker View Post
EDIT: A free card? you lucky b**....
Indeed. Jerry is a very kind gentleman.

Quote:
Originally Posted by kracker View Post
Anyways the power company will want more money from you. Yay!
You have no idea... Barbados has some of the most expensive electricity in the world....

Last fiddled with by chalsall on 2013-05-30 at 20:10 Reason: s/gentlemen/gentleman/
Old 2013-05-30, 20:42   #6
sdbardwick
 
Aug 2002
North San Diego County

2×11×31 Posts

Try using different PCI-E power connectors. That PSU has (AFAICT) 6 virtual 12V (12VA - 12VF) rails, each with an 18A over-current trip point.
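
To put numbers on that: 12 V at 18 A is 216 W per rail. A minimal sketch of the arithmetic follows. The 75 W (6-pin) and 150 W (8-pin) figures are the PCI-E connector spec maximums; the assumption that this card uses one of each, and the function names, are illustrative only and should be checked against the actual card.

```python
RAIL_VOLTS = 12.0
RAIL_TRIP_AMPS = 18.0    # per-rail over-current trip point quoted above

SIX_PIN_MAX_W = 75.0     # PCI-E spec maximum for a 6-pin connector
EIGHT_PIN_MAX_W = 150.0  # PCI-E spec maximum for an 8-pin connector


def rail_limit_watts(volts=RAIL_VOLTS, amps=RAIL_TRIP_AMPS):
    """Power a single virtual rail can deliver before its trip point."""
    return volts * amps


def rail_overloaded(connector_watts, limit=None):
    """True if the connectors hung off one rail could exceed its limit."""
    limit = rail_limit_watts() if limit is None else limit
    return sum(connector_watts) > limit


# Assumed layout: one 6-pin plus one 8-pin connector (hypothetical for this card).
shared_rail = rail_overloaded([SIX_PIN_MAX_W, EIGHT_PIN_MAX_W])
split_rails = [rail_overloaded([SIX_PIN_MAX_W]), rail_overloaded([EIGHT_PIN_MAX_W])]
print(rail_limit_watts(), shared_rail, split_rails)
```

These are worst-case spec ceilings rather than measured draw, so treat the result as a reason to try connectors on separate rails, not as proof of the cause.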
Old 2013-05-30, 20:50   #7
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

7×1,361 Posts

Quote:
Originally Posted by sdbardwick View Post
Try using different PCI-E power connectors. That PSU has (AFAICT) 6 virtual 12V (12VA - 12VF) rails, each with an 18A over-current trip point.
Thanks!

Useful.
Old 2013-05-30, 21:08   #8
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

7×1,361 Posts

Code:
Iteration 9990000 M( 9999973 )C, 0xbc0245fae77c5faf, n = 576K, CUDALucas v2.05 Alpha err = 0.01660 (0:10 real, 1.0172 ms/iter, ETA 0:00)
M( 9999973 )C, 0x7da6ccf13a866e7f, n = 576K, CUDALucas v2.05 Alpha, estimated total time = 2:48:49
Bad card! Bad!
Old 2013-05-30, 21:50   #9
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

10010100110111₂ Posts

Quote:
Originally Posted by sdbardwick View Post
Try using different PCI-E power connectors. That PSU has (AFAICT) 6 virtual 12V (12VA - 12VF) rails, each with an 18A over-current trip point.
Actually, thinking about this a bit more...

The MSi card's box talks about how each PCI-E connector powers a different subsystem.

This could explain why CUDALucas runs for hours (even with errors), while mfaktc doesn't for more than a few minutes.

Thanks again!!!
Old 2013-05-31, 18:35   #10
TheMawn
 
May 2013
East. Always East.

11×157 Posts
Dell = Bad

This is a Dell box? Did the card you replaced come with the box? Note that Dell was notorious for using proprietary fans, headers, plugs, etc.

That's why I replaced my passable Dell machine with a completely new computer I built myself. I wanted a new video card, but I was worried the PSU wouldn't be happy with it (most manufacturers skimp on power supplies, since the PSU is the component that's hardest to find and has the least impact on performance). I was also worried about replacing the PSU because of the number of things it connects to that it could fry.

They have been known to do things like swap around the 12V, 5V and ground pins which means it immediately fries the entire motherboard if you plug a new PSU into it, or the old PSU into a new board.


Apparently they do less of that now, but it's entirely possible the power supply and video card were in a passionately romantic relationship and one refuses to function without the other. The PCI-E slot itself can supply a small amount of power, up to 75 W (AMD's HD 7750 is the fastest card that runs on slot power alone: it doesn't plug into a PSU), so it's possible that a certain component of the card is not receiving power from the PSU but is getting just enough from the slot to barely function.


Else I have no clue. Just stay away from Dell like I learned to :P
Old 2013-06-01, 03:41   #11
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

7·1,361 Posts

Quote:
Originally Posted by chalsall View Post
The MSi card's box talks about how each PCI-E connector powers a different subsystem. This could explain why CUDALucas runs for hours (even with errors), while mfaktc doesn't for more than a few minutes.
Thanks (for a third time) for this idea -- I had stupidly assumed the PSU had a single 12V rail, but that was clearly wrong; the rail layout is actually printed on the PSU itself, in very fine print, as I found once I looked at it with a magnifying glass. And, indeed, changing which PCI-E power connectors were used fixed the hard crashes with mfaktc.

I still had to bring the memory clock down from 2100 MHz to 2000 MHz (thanks Carl!), but I'm happy to report that today the card survived 10 CUDALucas self tests, six hours of Carl's CUDAmemtest, and (so far) three hours of mfaktc TFing.

I'll still want to run a few CUDALucas DCs to have 99.999% confidence, but right now I'm one very happy camper!!!

Thanks to everyone for the advice (and the testing tools), and to Jerry for his generosity!
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.