mersenneforum.org  

Old 2015-01-31, 19:07   #287
kladner
 
"Kieren"
Jul 2011
In My Own Galaxy!

2×3×1,693 Posts

Quote:
Originally Posted by LaurV
Take the fan out, open all the screws carefully, see where the label is attached (sticker), take it out to access the fan axis' bearing, put a drop of oil into the hole (it may have a rubber cap, if you are lucky; keep that, is very good, most fans don't have it). Use singer oil for sewing machines, or for fine mechanics, you can buy it everywhere, and if not, then a drop of clear engine oil (the one you use in your car) it is ok. Play a little bit with the fan till the oil is sucked inside. Put back the rubber cap and the label, put back the screws. It does wonders.....
That fan did free up after running for a while.

However, to insert another happy note in this sad thread, besides the cheap 580 I put online recently, yesterday I attacked the mothballed 570. It is the one which had bad contact in the PCIe power connectors. Also on LaurV's advice, I was able to remove the nylon shells from the terminals on the card. This involved a bit of force, so afterwards, I reflowed a couple of cracked solder joints, realigned the pins, and snipped off one or two which ended up sticking out further. With a bit of care, I got the cables connected and they seemed to be very snug when pushed on as far as possible. It is now in the place of the 460. It is a Super OC model, so it comes with an 844 MHz factory OC, compared to the 580's 797 MHz FOC. Consequently, depending on how hard I run each one, the 570 comes within 20-30 GHz-d/d of the 580. Since it is on top, the 580 runs hot anyway: 82-84 C. I wish I could put it on the bottom, but it is too thick, at least in a mid tower. It would be interesting to see how the 3-fan Gigabyte cooler would do in the hot seat.
Old 2015-03-04, 16:36   #288
kladner
 
"Kieren"
Jul 2011
In My Own Galaxy!

2×3×1,693 Posts

To continue describing the eventual results of a failed GPU: I have now gotten the fans on the EVGA/Zalman-cooled 580 card to run directly from one of the Opt fan/sensor ports on this Asus board. The trick for getting it to function in a reasonable temperature range was finding the right accessible spot to tuck the sensor into. This turned out to be against the cooler base plate where one of the heat pipes emerges. I don't care about the actual temperature, just that the location is hot enough to work in one of the available max temp ranges on the motherboard. The sensor is reporting 55 C under load. Setting the control for 60 C max ran the fans faster than necessary, but the match is just about perfect with the 70 C max range. The card holds right at 72 C, which is ideal.
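As a rough illustration of why the same 55 C sensor reading spins the fans harder with a 60 C max setting than with a 70 C max setting, here is a minimal sketch of a linear fan curve. The 40 C floor, the 30% minimum duty, and the function name are assumptions made up for the example; the actual curve this Asus board uses is not given in the post.

```python
def fan_duty(sensor_c, max_c, floor_c=40.0, min_duty=30.0):
    """Hypothetical linear fan curve: min_duty at floor_c, 100% at max_c.

    floor_c and min_duty are illustrative assumptions, not board settings.
    """
    if max_c <= floor_c:
        raise ValueError("max_c must be above floor_c")
    # Fraction of the way from the floor to the configured max temperature.
    frac = (sensor_c - floor_c) / (max_c - floor_c)
    frac = min(1.0, max(0.0, frac))  # clamp to the 0..1 range
    return min_duty + (100.0 - min_duty) * frac


if __name__ == "__main__":
    for max_c in (60.0, 70.0):
        print(f"max {max_c:.0f} C -> {fan_duty(55.0, max_c):.1f}% duty at 55 C")
```

Under these assumed numbers the 60 C setting asks for a noticeably higher duty at a 55 C reading than the 70 C setting does, which matches the direction of the behaviour described above.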

EDIT: I had been having erratic performance from the second 580. My first thought was of PCIe power connections, but tightening the contacts on the cables made no real difference. However, wiping the contacts which plug into the PCIe slot with contact cleaner seemed to make a temporary bit of difference. I was still suspecting the card itself (MSI GTX 580), until I plugged my faithful GTX 460 into the same slot and got exactly the same error. At this point, suspicion moved to the slot itself, and I sprayed it with contact cleaner. The 580 has been running without problems ever since.

Last fiddled with by kladner on 2015-03-04 at 17:03
Old 2015-03-05, 00:14   #289
TheMawn
 
May 2013
East. Always East.

6BF₁₆ Posts

If cleaning the contacts temporarily fixes the problem, I am not surprised that cleaning the slot fixed it permanently. I've heard of dust gaining a greasy gooey texture in electronics so that might have been it. Glad it's fixed!
Old 2015-04-30, 21:37   #290
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

7·13·47 Posts

About 2 months ago I bought an extra GTX 570 since I had a spare PCI-E slot... And today, not 6 hours after I finish my project, it decides it's done and wants to let out the magic smoke. At 100% GPU it gets real loud and hot and smelly... so for now it's running at 25% duty cycle using CPU sieving (does mfaktc have a "throttle" option I don't know about?).
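For what it's worth, one generic way to impose a duty cycle from outside a program, independent of whatever options mfaktc itself offers, is to suspend and resume the process on a timer. The sketch below assumes a POSIX system (SIGSTOP/SIGCONT) and a worker PID supplied by the user. It has not been tested against mfaktc or any GPU driver, so treat it as an illustration of the idea only.

```python
import os
import signal
import time


def duty_intervals(duty, period_s):
    """Split one period into (run_seconds, pause_seconds) for a duty in (0, 1]."""
    if not 0.0 < duty <= 1.0:
        raise ValueError("duty must be in (0, 1]")
    run_s = duty * period_s
    return run_s, period_s - run_s


def throttle(pid, duty=0.25, period_s=4.0, cycles=None):
    """Alternately resume and suspend process `pid` (POSIX only).

    cycles=None loops until interrupted; the process is resumed on exit.
    """
    run_s, pause_s = duty_intervals(duty, period_s)
    done = 0
    try:
        while cycles is None or done < cycles:
            os.kill(pid, signal.SIGCONT)      # let the worker run
            time.sleep(run_s)
            if pause_s > 0:
                os.kill(pid, signal.SIGSTOP)  # suspend for the rest of the period
                time.sleep(pause_s)
            done += 1
    finally:
        os.kill(pid, signal.SIGCONT)          # never leave the worker suspended
```

With duty=0.25 and period_s=4.0 the worker runs for 1 second and sleeps for 3 in each cycle. A longer period means fewer suspend/resume transitions but larger temperature swings.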
Old 2015-04-30, 21:49   #291
Batalov
 
"Serge"
Mar 2008
San Diego, Calif.

3²×7×163 Posts

Quote:
Originally Posted by James Heinrich
About 2 months ago I bought an extra GTX 570 since I had a spare PCI-E slot... And today, not 6 hours after I finish my project, it decides it's done and wants to let out the magic smoke.
Read Hamlet's monologue to it. Maybe it will reconsider.

...who would fardels bear,
To grunt and sweat under a weary life,
But that the dread of something after death,
The undiscover’d country from whose bourn
No traveller returns, puzzles the will
And makes us rather bear those ills we have
Than fly to others that we know not of?
Thus conscience does make cowards of us all.
Old 2015-05-01, 00:05   #292
only_human
 
"Gang aft agley"
Sep 2002

3754₁₀ Posts

Or read Macbeth's soliloquy to express the futility of it all. It is one of my favorites:
Quote:
She should have died hereafter;
There would have been a time for such a word.
To-morrow, and to-morrow, and to-morrow,
Creeps in this petty pace from day to day,
To the last syllable of recorded time;
And all our yesterdays have lighted fools
The way to dusty death. Out, out, brief candle!
Life's but a walking shadow, a poor player
That struts and frets his hour upon the stage
And then is heard no more. It is a tale
Told by an idiot, full of sound and fury
Signifying nothing.
— Macbeth (Act 5, Scene 5, lines 17-28)
As a last recourse, the Galaxy Song may provoke the board to donate its components to science.

Last fiddled with by only_human on 2015-05-01 at 00:06 Reason: deuglify link
Old 2015-05-01, 00:15   #293
Batalov
 
"Serge"
Mar 2008
San Diego, Calif.

10100000011101₂ Posts
Thumbs up

Most people who didn't like Birdman don't know that this soliloquy is recited in its entirety in the largest theater of them all - in the street, near the end of the movie, by a street bum.
Old 2015-05-06, 02:33   #294
kladner
 
"Kieren"
Jul 2011
In My Own Galaxy!

2×3×1,693 Posts
Not Unhappy, but Weird

So right now I'm running my mid-tower on its side. Why? To keep it short, I found that putting the system upright directly caused my second GPU to crash mfaktc. The second card is an MSI GTX 580 Lightning. One of its features which has caused problems is that it has a very rigid frame bracing the card. Unfortunately, of all the cards I have run in this system, this one is a bit short. I have tried loosening all the MB screws to see if I could gain a bit of slack. I loosened, and then removed the top two screws holding the rear bracket on, and tried to gently flex the bracket closer to the PCI slots.

At some point, I realized that I could get both GPUs running as long as the case was lying down. Apparently, setting it upright puts a different twist on things and pulls on the second card in its slot.

I'd really like to look into a full height, high air flow tower, anyway.

Last fiddled with by kladner on 2015-05-06 at 02:34 Reason: height , ,
Old 2018-01-19, 06:10   #295
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1111010010000₂ Posts
Quadro lost its magic smoke?

I have, one by one, brought up a small fleet of GPUs, many of them Quadro 2000s. To be thorough on the latest one, I set out to test _all possible_ GPU RAM, as part of my usual startup batch job, launched to run overnight. It held a steady read rate of about 26 GB/s in CUDALucas's memory test, until unexpectedly dropping to less than 1/10 of that speed in the last testable 75 MB. The speeds for positions 0 through 34 were actually more stable than usual. No errors were found.

Code:
Position 34, Data Type 1, Iteration 34400000, Errors: 0, completed 90.53%, Read 25.99GB/s, Write 8.66GB/s, ETA 16:55)
Position 34, Data Type 2, Iteration 34500000, Errors: 0, completed 90.79%, Read 25.96GB/s, Write 8.65GB/s, ETA 16:27)
Position 34, Data Type 2, Iteration 34600000, Errors: 0, completed 91.05%, Read 25.98GB/s, Write 8.66GB/s, ETA 15:59)
Position 34, Data Type 3, Iteration 34700000, Errors: 0, completed 91.32%, Read 25.96GB/s, Write 8.65GB/s, ETA 15:31)
Position 34, Data Type 3, Iteration 34800000, Errors: 0, completed 91.58%, Read 25.98GB/s, Write 8.66GB/s, ETA 15:02)
Position 34, Data Type 4, Iteration 34900000, Errors: 0, completed 91.84%, Read 25.98GB/s, Write 8.66GB/s, ETA 14:34)
Position 34, Data Type 4, Iteration 35000000, Errors: 0, completed 92.11%, Read 25.97GB/s, Write 8.66GB/s, ETA 14:06)
Position 35, Data Type 0, Iteration 35100000, Errors: 0, completed 92.37%, Read 2.27GB/s, Write 0.76GB/s, ETA 14:02)
Position 35, Data Type 0, Iteration 35200000, Errors: 0, completed 92.63%, Read 2.27GB/s, Write 0.76GB/s, ETA 13:56)
Position 35, Data Type 1, Iteration 35300000, Errors: 0, completed 92.89%, Read 2.27GB/s, Write 0.76GB/s, ETA 13:49)
Position 35, Data Type 1, Iteration 35400000, Errors: 0, completed 93.16%, Read 2.27GB/s, Write 0.76GB/s, ETA 13:40)
Position 35, Data Type 2, Iteration 35500000, Errors: 0, completed 93.42%, Read 2.27GB/s, Write 0.76GB/s, ETA 13:29)
Position 35, Data Type 2, Iteration 35600000, Errors: 0, completed 93.68%, Read 2.27GB/s, Write 0.76GB/s, ETA 13:16)
Position 35, Data Type 3, Iteration 35700000, Errors: 0, completed 93.95%, Read 2.27GB/s, Write 0.76GB/s, ETA 13:01)
Position 35, Data Type 3, Iteration 35800000, Errors: 0, completed 94.21%, Read 2.27GB/s, Write 0.76GB/s, ETA 12:45)
Position 35, Data Type 4, Iteration 35900000, Errors: 0, completed 94.47%, Read 2.27GB/s, Write 0.76GB/s, ETA 12:27)
Position 35, Data Type 4, Iteration 36000000, Errors: 0, completed 94.74%, Read 2.27GB/s, Write 0.76GB/s, ETA 12:08)
Position 36, Data Type 0, Iteration 36100000, Errors: 0, completed 95.00%, Read 1.42GB/s, Write 0.47GB/s, ETA 11:56)
Position 36, Data Type 0, Iteration 36200000, Errors: 0, completed 95.26%, Read 1.42GB/s, Write 0.47GB/s, ETA 11:42)
Position 36, Data Type 1, Iteration 36300000, Errors: 0, completed 95.53%, Read 1.42GB/s, Write 0.47GB/s, ETA 11:26)
Position 36, Data Type 1, Iteration 36400000, Errors: 0, completed 95.79%, Read 1.42GB/s, Write 0.47GB/s, ETA 11:06)
Position 36, Data Type 2, Iteration 36500000, Errors: 0, completed 96.05%, Read 1.42GB/s, Write 0.47GB/s, ETA 10:44)
Position 36, Data Type 2, Iteration 36600000, Errors: 0, completed 96.32%, Read 1.42GB/s, Write 0.47GB/s, ETA 10:19)
Position 36, Data Type 3, Iteration 36700000, Errors: 0, completed 96.58%, Read 1.42GB/s, Write 0.47GB/s, ETA 9:52)
Position 36, Data Type 3, Iteration 36800000, Errors: 0, completed 96.84%, Read 1.42GB/s, Write 0.47GB/s, ETA 9:22)
Position 36, Data Type 4, Iteration 36900000, Errors: 0, completed 97.11%, Read 1.42GB/s, Write 0.47GB/s, ETA 8:49)
Position 36, Data Type 4, Iteration 37000000, Errors: 0, completed 97.37%, Read 1.42GB/s, Write 0.47GB/s, ETA 8:13)
Position 37, Data Type 0, Iteration 37100000, Errors: 0, completed 97.63%, Read 2.21GB/s, Write 0.74GB/s, ETA 7:31)
Position 37, Data Type 0, Iteration 37200000, Errors: 0, completed 97.89%, Read 2.21GB/s, Write 0.74GB/s, ETA 6:47)
Position 37, Data Type 1, Iteration 37300000, Errors: 0, completed 98.16%, Read 2.21GB/s, Write 0.74GB/s, ETA 6:01)
Position 37, Data Type 1, Iteration 37400000, Errors: 0, completed 98.42%, Read 2.21GB/s, Write 0.74GB/s, ETA 5:14)
Position 37, Data Type 2, Iteration 37500000, Errors: 0, completed 98.68%, Read 2.21GB/s, Write 0.74GB/s, ETA 4:25)
Position 37, Data Type 2, Iteration 37600000, Errors: 0, completed 98.95%, Read 2.21GB/s, Write 0.74GB/s, ETA 3:35)
Position 37, Data Type 3, Iteration 37700000, Errors: 0, completed 99.21%, Read 2.21GB/s, Write 0.74GB/s, ETA 2:43)
Position 37, Data Type 3, Iteration 37800000, Errors: 0, completed 99.47%, Read 2.21GB/s, Write 0.74GB/s, ETA 1:50)
Position 37, Data Type 4, Iteration 37900000, Errors: 0, completed 99.74%, Read 2.21GB/s, Write 0.74GB/s, ETA 0:56)
Position 37, Data Type 4, Iteration 38000000, Errors: 0, completed 100.00%, Read 2.21GB/s, Write 0.74GB/s, ETA 0:00)
Test complete. Total errors: 0.
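For anyone wanting to spot this kind of drop-off without reading the whole log by eye, here is a minimal sketch that averages the read rate per position and flags the slow ones. The regular expression is inferred only from the lines pasted above, and the function names and the 50% threshold are assumptions chosen for the example, not anything CUDALucas provides.

```python
import re
from collections import defaultdict

# Field layout inferred from the log excerpt above; other versions may differ.
LINE_RE = re.compile(
    r"Position (\d+), Data Type \d+, Iteration \d+, Errors: \d+, "
    r"completed [\d.]+%, Read ([\d.]+)GB/s, Write ([\d.]+)GB/s"
)


def mean_read_by_position(log_text):
    """Return {position: mean read rate in GB/s} over all parsed lines."""
    rates = defaultdict(list)
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m:
            rates[int(m.group(1))].append(float(m.group(2)))
    return {pos: sum(v) / len(v) for pos, v in sorted(rates.items())}


def slow_positions(means, fraction=0.5):
    """Positions whose mean read rate is below `fraction` of the fastest one."""
    if not means:
        return []
    best = max(means.values())
    return [pos for pos, rate in means.items() if rate < fraction * best]


SAMPLE = """\
Position 34, Data Type 4, Iteration 35000000, Errors: 0, completed 92.11%, Read 25.97GB/s, Write 8.66GB/s, ETA 14:06)
Position 35, Data Type 0, Iteration 35100000, Errors: 0, completed 92.37%, Read 2.27GB/s, Write 0.76GB/s, ETA 14:02)
Position 36, Data Type 0, Iteration 36100000, Errors: 0, completed 95.00%, Read 1.42GB/s, Write 0.47GB/s, ETA 11:56)
Position 37, Data Type 0, Iteration 37100000, Errors: 0, completed 97.63%, Read 2.21GB/s, Write 0.74GB/s, ETA 7:31)
"""

if __name__ == "__main__":
    means = mean_read_by_position(SAMPLE)
    for pos, rate in means.items():
        print(f"position {pos}: {rate:.2f} GB/s")
    print("slow positions:", slow_positions(means))
```

Run against a saved log from each card, this would make it easy to compare whether the other Quadro 2000s show any slow tail at all.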
Next up was a CUDALucas -r test of residues up to 8192k, passing all. (I may move this to later in the sequence in the future, to test the actual as-tuned setup.)

Then CUDALucas ran a full fft benchmark from 1k to 38880k and produced an fft file.

Then CUDALucas ran part of an exhaustive threadbench, with the last logged entry the beginning of 13230K fft length.

The last startup test would have been a retest of the prime 6972593. It didn't get that far.

Not only did the card apparently fail in mid threadbench in the wee hours, it did it spectacularly enough that it downed the box. Apparently it shorted robustly, such that power cycling the box with the gpu still in it generated a beep and blink code for power supply failed! Case fans made only a fraction of a revolution during power up attempts. Resurrecting the box required removing the Quadro and installing a different video card.

I've used this same process to start up the previous several Quadro 2000s and other GPU models, without incident, and without the drop-off in memory speed seen on this card.

Have you seen anything like that memory speed drop, or a box-stopping GPU failure?

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
