mersenneforum.org

New GPU; new issues... (https://www.mersenneforum.org/showthread.php?t=18251)

chalsall 2013-05-30 18:51

New GPU; new issues...
 
Hey All. OK, some more guidance requested...

In a very kind and generous gesture, Jerry (AKA flashj) donated to me a brand new MSi Twin Frozr III 580GTX which he had "in inventory". :fusion:

As I currently only have access to a single machine which can support such kit (a Dell T7500), I swapped out my GTX560 and installed this.

Unfortunately, it did not pass the CUDALucas self test, nor Carl's CUDAmemtest. So I started using Carl's technique of lowering the clocks. First the memory, then the core. Still failing.

Having to leave the site at the end of the day, I gave up and figured I would simply let the card do TF work until I was next in front of the machine.

But... I then discovered that when running mfaktc the machine will quickly hard reset. Like instantly; no kernel panic. Just suddenly a blank screen, and then a reboot (the machine is configured to auto-power up upon power loss-then-restore).

[B]The strange thing is this never has happened with CUDALucas running (for hours), nor Carl's memtest (again, for hours).[/B]

When I'm next in front of the machine I'm going to continue to fiddle with the BIOS clock settings, and possibly the voltages (but only slightly). I'm also planning on connecting a USB cable to the UPS which feeds this machine so I can monitor and graph the power consumption. I still think the (1100W) power supply is good -- this behavior occurs regardless of whether the CPU is at 100% load or 0% -- but swapping in another is scheduled.

Also, I will shortly have access to six Dell Power Edge R720s, so I'll have additional machines to test this card (and the GTX 560) in -- I realize that testing with only a single host machine is not ideal.

Any additional thoughts or advice from anyone which might help in this analysis?

kracker 2013-05-30 19:40

It sounds like a PSU issue. Exactly what PSU do you have? A no-name one or a good one?

Also, if it is an older one, remember that they lose efficiency and output gradually.

chalsall 2013-05-30 19:49

[QUOTE=kracker;342011]It sounds like a PSU issue. Exactly what PSU do you have? A no-name one or a good one?

Also, if it is an older one, remember that they lose efficiency and output gradually.[/QUOTE]

Dell 1100W. And I agree it may be the 12V "rail". That's why I have a new 1400W PSU on order for this machine.

But it strikes me as weird that CUDALucas and Carl's memtest run for hours without such an issue, while mfaktc won't run for more than five minutes without this (very hard) reset.

As discussed before, different software loads the hardware differently. I'm simply documenting my personal experience, and asking others to point out if they think I've overlooked anything obvious.

kracker 2013-05-30 19:51

[QUOTE=chalsall;342013]Dell 1100W. And I agree it may be the 12V "rail". That's why I have a new 1400W PSU on order for this machine.

But it strikes me as weird that CUDALucas and Carl's memtest run for hours without such an issue, while mfaktc won't run for more than five minutes without this (very hard) reset.

As discussed before, different software loads the hardware differently. I'm simply documenting my personal experience, and asking others to point out if they think I've overlooked anything obvious.[/QUOTE]

Have you tested it and compared how much power they use?

Also, if a 560 worked, and you swapped in a higher-end model, and it doesn't work, it's almost 100% the PSU (unless the card is bad).

EDIT: A free card? You lucky b**.... Anyways the power company will want more money from you. Yay!

chalsall 2013-05-30 20:05

[QUOTE=kracker;342014]Have you tested it and compared how much power they use?[/QUOTE]

Still working on that. The power consumption delta needs to be inferred; it can't be measured directly since we're working with DC and one of the three feeds is from the bus. That's the reason for the UPS data-feed.
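
A minimal sketch of how that inference could work, assuming the UPS reports wall-side (AC) watts and that the PSU converts at a roughly constant efficiency. The function name, the 0.85 efficiency figure, and the two sample readings are all hypothetical placeholders, not measurements from this machine.

```python
def inferred_dc_delta(idle_wall_w, load_wall_w, psu_efficiency=0.85):
    """Estimate the extra DC-side watts from two wall-side (UPS) readings.

    psu_efficiency is an assumed constant; real PSU efficiency varies
    with load, so treat the result as a rough estimate only.
    """
    if not 0.0 < psu_efficiency <= 1.0:
        raise ValueError("psu_efficiency must be in (0, 1]")
    return (load_wall_w - idle_wall_w) * psu_efficiency


# Hypothetical UPS readings in watts: GPU idle vs. GPU under load.
idle_w, load_w = 200.0, 400.0
delta_w = inferred_dc_delta(idle_w, load_w)
print(f"Estimated extra DC draw under load: {delta_w:.0f} W")
```

Taking one such reading pair per program (CUDALucas, the memtest, mfaktc) would give a like-for-like comparison of how hard each one loads the card, even though no single DC feed is measured directly.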

[QUOTE=kracker;342014]Also, if a 560 worked, and you swapped in a higher-end model, and it doesn't work, it's almost 100% the PSU (unless the card is bad).[/QUOTE]

Agreed. Trying to figure out which of the two most likely possibilities is the truth.

[QUOTE=kracker;342014]EDIT: A free card? You lucky b**....[/QUOTE]

Indeed. Jerry is a very kind gentleman.

[QUOTE=kracker;342014]Anyways the power company will want more money from you. Yay![/QUOTE]

You have no idea... Barbados has some of the most expensive electricity in the world....

sdbardwick 2013-05-30 20:42

Try using different PCI-E power connectors. That PSU has (AFAICT) 6 virtual 12V (12VA - 12VF) rails, each with an 18A over-current trip point.
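
A rough sketch of the arithmetic behind this suggestion. The 18A trip point is the figure quoted above; the 75 W and 150 W values are the PCI Express specification ceilings for 6-pin and 8-pin connectors, not measured draw. That the card uses one 6-pin plus one 8-pin connector, and which connectors share a rail on this particular PSU, are assumptions made purely for illustration.

```python
RAIL_VOLTS = 12.0
RAIL_TRIP_AMPS = 18.0   # per-rail over-current trip point quoted above

# PCI Express spec ceilings per auxiliary connector (not measured draw).
SIX_PIN_W = 75.0
EIGHT_PIN_W = 150.0


def rail_limit_watts(volts=RAIL_VOLTS, amps=RAIL_TRIP_AMPS):
    """Power a single virtual rail can deliver before OCP trips."""
    return volts * amps


def trips(connector_watts, limit=None):
    """True if the summed connector load on one rail exceeds its limit."""
    if limit is None:
        limit = rail_limit_watts()
    return sum(connector_watts) > limit


limit = rail_limit_watts()
# Assumed layout 1: both connectors fed from the same rail.
same_rail = trips([SIX_PIN_W, EIGHT_PIN_W])
# Assumed layout 2: each connector fed from its own rail.
split_rails = trips([SIX_PIN_W]) or trips([EIGHT_PIN_W])

print(f"Per-rail limit: {limit:.0f} W")
print(f"Both connectors on one rail can trip OCP: {same_rail}")
print(f"Connectors on separate rails can trip OCP: {split_rails}")
```

At the spec ceilings, two connectors on one rail could ask for 225 W against a 216 W limit, while either connector alone stays well under it. That is consistent with a workload that pulls harder through the auxiliary connectors tripping protection while a lighter one does not.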

chalsall 2013-05-30 20:50

[QUOTE=sdbardwick;342019]Try using different PCI-E power connectors. That PSU has (AFAICT) 6 virtual 12V (12VA - 12VF) rails, each with an 18A over-current trip point.[/QUOTE]

Thanks!

Useful.

chalsall 2013-05-30 21:08

[CODE]Iteration 9990000 M( 9999973 )C, 0xbc0245fae77c5faf, n = 576K, CUDALucas v2.05 Alpha err = 0.01660 (0:10 real, 1.0172 ms/iter, ETA 0:00)
M( 9999973 )C, 0x7da6ccf13a866e7f, n = 576K, CUDALucas v2.05 Alpha, estimated total time = 2:48:49[/CODE]

Bad card! Bad!

chalsall 2013-05-30 21:50

[QUOTE=sdbardwick;342019]Try using different PCI-E power connectors. That PSU has (AFAICT) 6 virtual 12V (12VA - 12VF) rails, each with an 18A over-current trip point.[/QUOTE]

Actually, thinking about this a bit more...

The MSi card's box talks about how each PCI-E connector powers a different subsystem.

This could explain why CUDALucas runs for hours (even with errors), while mfaktc doesn't for more than a few minutes.

Thanks again!!!

TheMawn 2013-05-31 18:35

Dell = Bad
 
This is a Dell box? Did the card you replaced come with the box? Note that Dell was notorious for using proprietary fans, headers, plugs, etc.

That's why I replaced my passable Dell machine with a completely new computer I built myself. I wanted a new video card, but I was worried the PSU wouldn't be happy with it (most manufacturers skimp on power supplies, since the PSU is the hardest component to find and the one with the least impact on performance). I was worried about replacing the PSU because of the number of things it connects to that it could fry.

They have been known to do things like swap around the 12V, 5V and ground pins which means it immediately fries the entire motherboard if you plug a new PSU into it, or the old PSU into a new board.


Apparently they do less of that now, but it's entirely possible the power supply and video card were in a passionately romantic relationship and one refuses to function without the other. The PCI-E slot actually can supply a small amount of power (AMD's HD 7750 is the fastest card that uses PCI-E slot power only: it doesn't plug into a PSU) so it's possible that a certain component of the card is not receiving power from the PSU but is getting just enough from the slot to barely function.


Else I have no clue. Just stay away from Dell like I learned to :P

chalsall 2013-06-01 03:41

[QUOTE=chalsall;342029]The MSi card's box talks about how each PCI-E connector powers a different subsystem. This could explain why CUDALucas runs for hours (even with errors), while mfaktc doesn't for more than a few minutes.[/QUOTE]

Thanks (for a third time) for this idea -- I had stupidly assumed the PSU had a single 12V rail, but that was clearly wrong (and actually printed (in [I]very[/I] fine print) on the PSU itself once I looked at it with a magnifying glass). And, indeed, changing which PCI-E power connectors were used fixed the hard crashes with mfaktc.

I still had to bring the memory clock down from 2100 MHz to 2000 MHz (thanks Carl!), but I'm happy to report that today the card survived 10 CUDALucas self tests, six hours of Carl's CUDAmemtest, and (so far) three hours of mfaktc TFing.

I'll still want to run a few CUDALucas DCs to have 99.999% confidence, but right now I'm one very happy camper!!! :fusion:

Thanks to everyone for the advice (and the testing tools), and to Jerry for his generosity! :smile:

chalsall 2013-06-01 03:44

[QUOTE=TheMawn;342146]Just stay away from Dell like I learned to :P[/QUOTE]

Actually, because of my situation, I almost always recommend Dell -- at least here in Bimshire.

I agree their lower end stuff isn't great, but their higher end stuff is pretty good. Also, they are the only supplier here which offers NBD "on-site" support of their kit.

sdbardwick 2013-06-01 04:56

NP, happy to help!

At least we know that Dell implements effective OCP, unlike some other high-wattage "multi-rail" PSUs. The functionality makes sense, as Dell products that start server room fires would not impress corporate customers.

TheMawn 2013-06-02 05:30

I don't know what exactly you mean by high end. If you're talking about servers and things like that, then Dell would probably be high up on my list. They're probably the best looking company with business solutions. I just wasn't happy with the $1600 Desktop I got from them.

For that I got a case that had hardly any air flow, the crap stock heat sink, a SERIOUSLY overvolted processor (i7-920 at 1.520V) that hit 90C before I started running the computer with the side panel off, and a video card which at the time was pretty much the mid-level of graphics. RAM was okay. PSU looked pretty skimpy.

Just not happy.


Anyway, happy to hear your issues are solved. I have a GTX 670 so I don't know what I would do if I had to choose between it and a 580. I'd probably take the old computer out of the basement, drop the GTX 580 into it and get it working.

kladner 2013-06-02 11:29

Congratulations, Chris! Glad you got it worked out. :tu:

flashjh 2013-06-12 13:21

[QUOTE=chalsall;342201]Thanks to everyone for the advice (and the testing tools), and to Jerry for his generosity! :smile:[/QUOTE]
Glad you got it working! I didn't even know this thread was here. That's what I get for moving ;) You're welcome.

kladner 2013-06-12 13:58

[QUOTE=flashjh;343140]Glad you got it working! I didn't even know this thread was here. That's what I get for moving ;) You're welcome.[/QUOTE]

Hey Jerry! I just want to add my thanks to you for your generosity to Chris. It makes me feel good about the community. It is also really fitting that something come back to Chris for all that he has, and does, contribute to the group.

Kudos to the both of you!

chalsall 2013-06-12 19:23

[QUOTE=kladner;343144]Kudos to the both of you![/QUOTE]

Thank you for your kind words, kladner. :smile:

flashjh 2013-06-12 19:28

Yes, thank you!


All times are UTC.
