#1 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2²·5·271 Posts |
I tested this GPU on 10 memory blocks of 25 MB each with no errors when I installed it. Later it seemed unreliable, so I retested it across the full memory width and found about 37 errors in 10 passes. On multiple runs, all errors were clumped in the span from block 23 to block 40 or 42. I demoted it to factoring, and that seemed to work fine for several months. Recently it started showing unspecified launch failure errors and illegal memory access errors, and mfaktc repeatedly reported the same inappropriate factor. I quickly reinstalled CUDALucas so I could retest the memory, and a 90-minute test of 3 passes was very revealing: an error rate ~300,000 times higher per pass than before. Note that the sharp peaks from block 23 to 40 or 42 are still there. Hardly a block was free of errors today; out of 56 blocks, only 5 were error free. Total error count in 3 passes: about 3.5 MILLION. Yikes. Uninstall, recycle, RIP.
This leaves me wondering whether hardware should be formally tested quarterly or whether semiannually would be enough, and maybe more frequently if a prior test shows any errors at all. Testing its well-behaved 702 MHz twin installed in the same system again produced a total of zero errors. The total test activity per block per pass works out to a hundred trillion bits written and read (25 MB * 8 bits * 5 patterns * 100,000 iterations). So 3.5 million errors in 56 blocks and 3 passes is an unacceptably high bit error rate of about 2E-10 per write/read cycle. https://en.wikipedia.org/wiki/Dynami...and_correction
Last fiddled with by kriesel on 2018-06-26 at 00:37
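For anyone who wants to check the arithmetic, here is a rough Python sketch using only the figures quoted in this post. It assumes MB means 10^6 bytes and that each bit written and read once counts as one cycle; the function and variable names are made up for illustration and do not come from CUDALucas.

```python
# Figures quoted in the post; names here are illustrative only.
BLOCK_BYTES = 25 * 10**6   # 25 MB per block, assuming MB = 10^6 bytes
PATTERNS = 5               # test patterns per iteration
ITERATIONS = 100_000       # iterations per block per pass
BLOCKS = 56                # blocks covering the full memory width
PASSES = 3                 # passes in the 90-minute retest
TOTAL_ERRORS = 3.5e6       # "about 3.5 MILLION"


def bits_per_block_pass(block_bytes, patterns, iterations):
    """Bits written and read for one block in one pass."""
    return block_bytes * 8 * patterns * iterations


def bit_error_rate(errors, blocks, passes, bits_each):
    """Errors divided by the total number of bits exercised."""
    return errors / (blocks * passes * bits_each)


bits = bits_per_block_pass(BLOCK_BYTES, PATTERNS, ITERATIONS)
ber = bit_error_rate(TOTAL_ERRORS, BLOCKS, PASSES, bits)

# Per-pass comparison with the earlier retest (about 37 errors in 10 passes).
earlier = 37 / 10
now = TOTAL_ERRORS / PASSES

print(f"bits per block per pass: {bits:.1e}")
print(f"bit error rate: {ber:.1e}")
print(f"errors per pass grew by a factor of about {now / earlier:,.0f}")
```

If MB is read as 2^20 bytes instead, the bit count rises by about 5% and the error rate drops by the same proportion, which does not change the conclusion.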
#2 |
|
Aug 2002
North San Diego County
5×137 Posts |
Before trashing that GPU, I'd see if it becomes reliable with lower memory speeds. I'd start by reducing the memory speed to 50% of stock, then increase it in 10% increments until it becomes unstable again. Then fine-tune the memory speed from there.
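The procedure above could be sketched roughly as follows in Python. This is only an illustration: `is_stable` is a hypothetical callback standing in for "set the memory clock to this percentage of stock with your tuning tool, run a memory test, and report whether it came back clean", and the 1% fine-tuning step is an assumption, since the post does not say how fine to go.

```python
def find_stable_percent(is_stable, start=50, coarse=10, fine=1, limit=100):
    """Return the highest tested percentage of stock memory clock that
    passed, or None if even the starting point failed.

    is_stable(pct) is a hypothetical user-supplied check, not a real API.
    """
    last_good = None
    first_bad = None

    # Coarse sweep: start low and step up until the first failure.
    for pct in range(start, limit + 1, coarse):
        if is_stable(pct):
            last_good = pct
        else:
            first_bad = pct
            break

    if last_good is None:
        return None        # unstable even at the starting clock
    if first_bad is None:
        return last_good   # stable all the way up to the limit

    # Fine sweep between the last good and first bad coarse steps.
    for pct in range(last_good + fine, first_bad, fine):
        if is_stable(pct):
            last_good = pct
        else:
            break
    return last_good


# Simulated card that only passes at 73% of stock or below.
print(find_stable_percent(lambda pct: pct <= 73))
```

Each call to `is_stable` stands for a full memory test, so in practice the coarse sweep saves a lot of time compared with stepping 1% at a time from the start.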
#3 |
|
"Kieren"
Jul 2011
In My Own Galaxy!
10011110101110₂ Posts |
Look for blown capacitors, too. They are usually pretty obvious when they spew their guts.
#4 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
152C₁₆ Posts |
Thanks. But you're making it sound like I shouldn't replace it with a GTX1080 that is 3 times as fast, with 3/4 the power draw and 16/3 the memory. Or the Quadro 5000 I already have and want to test out. And as I recall, I tested it with a ~15% clock reduction back in the few-dozen-errors days and it had no effect. We'll see.
Last fiddled with by kriesel on 2018-06-26 at 02:55 |
#5 |
|
"Marv"
May 2009
near the Tannhäuser Gate
2×7×47 Posts |
Is it summer where you live?
I ALWAYS have to increase the fan speeds as it gets warmer and warmer until it sounds like I live on an airport runway. I use MSI Afterburner to control my GPUs. It's free! Try 80-85% fan speed. |
#6 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5420₁₀ Posts |
With EVGA Precision XOC, I reduced the core clock and memory clock as far as it allowed; core went from 725.6 to 423.4 MHz, memory from 950 to 699 MHz, as confirmed by GPU-Z. To speed up the test, since the previous run had plenty of errors, I reduced it from 3 passes to 1. GPU temperature dropped from the high 90s to 79C. Average bit error rate went up, from 2.1E-10 to 3.3E-10. The low and high portions of the memory range had zero errors, while the middle worsened. Blocks 0-22 and 44-55 had zero errors, plus a couple near the middle (31, 32).
GTX480s seem to run their fans full out regardless of load. This is not its first Wisconsin summer, and my air conditioning works.
Last fiddled with by kriesel on 2018-06-26 at 04:40
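As a quick cross-check of those numbers, here is a small Python sketch. It reuses the bits-per-block-per-pass figure from post #1 (with MB taken as 10^6 bytes, an assumption) and derives the clock reductions, the failing block ranges, and the error count implied by the 3.3E-10 rate. The variable names are illustrative only.

```python
# Bits exercised per block per pass, from post #1 (MB assumed to be 10^6 bytes).
BITS_PER_BLOCK_PASS = 25 * 10**6 * 8 * 5 * 100_000
BLOCKS = 56
PASSES = 1

# Clock settings reported by GPU-Z, as fractions of the original values.
core_fraction = 423.4 / 725.6
memory_fraction = 699 / 950

# Blocks reported as error free: 0-22, 44-55, plus 31 and 32.
error_free = set(range(0, 23)) | set(range(44, 56)) | {31, 32}
failing = sorted(set(range(BLOCKS)) - error_free)

# Error count implied by the quoted average bit error rate over all 56 blocks.
implied_errors = 3.3e-10 * BLOCKS * PASSES * BITS_PER_BLOCK_PASS

print(f"core at {core_fraction:.0%} and memory at {memory_fraction:.0%} of the original clocks")
print(f"{len(error_free)} of {BLOCKS} blocks error free; failing blocks: {failing}")
print(f"implied errors in this pass: about {implied_errors:,.0f}")
```

The implied count is averaged over all 56 blocks, so the density within the failing middle blocks alone would be correspondingly higher.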
#7 |
|
Feb 2016
UK
2²×109 Posts |
Quite an old card? Those temps are a bit higher than I'd like. Maybe worth giving it a clean and refreshing the paste/thermal pads? Although only the core temp is reported, other parts may be getting warm too.
#8 |
|
"Victor de Hollander"
Aug 2011
the Netherlands
2³×3×7² Posts |
Yikes!
On the other hand, a good excuse to buy a shiny new GTX1080! |
#9 | |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
152C₁₆ Posts |
Quote:
The temperature ratings vary a lot. The GTX480's is the highest I've seen, at 105C. Quite different from the GTX10xx. It has had the cover off for a good cleaning. And the errors were at least as frequent at 79C as at 99C. |
#10 |
|
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2²×5×271 Posts |
I posted a table of temperature limit data for several GPUs at
http://www.mersenneforum.org/showpos...11&postcount=2
Last fiddled with by kriesel on 2018-06-26 at 15:33
#11 | |
|
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
16FE₁₆ Posts |
Quote: