#45

Dec 2014
3·5·17 Posts

I moved the RX 480 card to a newer, stable machine (it has DDR4, so it can't be too old) but got similarly unhappy results. mfakto 0.14 does well on the long self-test but almost always fails the short self-test. Compiling the source with SDK 3.0 gives 70 failures out of 30,000 long self-tests. This time I also tried compiling with SDK 2.9, but the resulting mfakto ran 30 times slower. I assume this is a software problem and the hardware is OK.
#46

"David"
Jul 2015
Ohio
517₁₀ Posts

Quote:
#47

Dec 2014
3·5·17 Posts

I will try that version. I forgot to mention that the first machine had Windows 7 and the second machine Ubuntu 16.04 LTS with the new AMD driver.
#48

"Victor de Hollander"
Aug 2011
the Netherlands
49B₁₆ Posts

AMD has released new drivers that 'fix' the power consumption:
http://www.anandtech.com/show/10477/...umption-issues

The driver version number is 16.7.1. Quotes from the AnandTech article:

Quote:
#49

Aug 2002
21D2₁₆ Posts

We are running the 16.7.2 driver:
Non-WHQL-64Bit-Radeon-Software-Crimson-16.7.2-Win10-Win8.1-Win7-July9.exe

Code:
Radeon Settings Version  - 2016.0708.1511.25486
Driver Packaging Version - 16.20.1035.1001-160708a-304447E
Provider                 - Advanced Micro Devices, Inc.
2D Driver Version        - 8.1.1.1558
Direct3D® Version        - 9.14.10.1197
OpenGL® Version          - 6.14.10.13441
OpenCL™ Version          - 2.0.6.0
AMD Mantle Version       - 9.1.10.123
AMD Mantle API Version   - 98309
AMD Audio Driver Version - 10.0.0.3
Vulkan Driver Version    - 1.2.0
Vulkan API Version       - 1.0.17
#50

Dec 2014
3·5·17 Posts

Any update, "Rocky"? I went looking for mfakto 0.15pre5 but only found it for Windows, and my card is currently in a Linux box.
#51

"David"
Jul 2015
Ohio
11×47 Posts

Quote:

Even with 0.14 you should be able to force the vector size and set it to GCN.
#52

"David"
Jul 2015
Ohio
517₁₀ Posts

I finally got my Linux system up and running with the RX 480. Notes:

1. For testing this I am running the latest Debian Sid with the 4.7.0 kernel release, configured with the built-in AMDGPU driver and the latest linux-firmware Polaris firmware, which is a bit ahead of the DKMS driver you'll get from the AMDGPU-PRO package. I was hoping this would mean more stability, but alas, that does not seem to be the case.

2. I'm using the mfakto-pre5 branch with my local-memory GPU patch. This is the same build that works flawlessly on my fglrx systems.

The OpenCL device info interestingly seems to be off; note that clinfo shows the same for the Fury Nano:

Code:
name                       Ellesmere (Advanced Micro Devices, Inc.)
device (driver) version    OpenCL 1.2 AMD-APP (2117.7) (2117.7 (VM))
maximum threads per block  256
maximum threads per grid   16777216
number of multiprocessors  14 (896 compute elements)
clock rate                 555MHz

mfakto self-tests are failing 64-68 tests on both the RX 480 and the Fury Nano in this system; I suspect a problem in the AMDGPU driver. That said, for a 74-bit factor I'm seeing 476.34 GHz-days/day vs. ~600 for the Fury Nano (~20% slower).

clLucas tests show the card is pretty decent, performing just 15% slower than a Fury Nano and 30% slower than a Fury X, and kicking out a 4096K FFT result in less than 5 days.

Code:
RX480
Iteration 10000 M( 74207281 )C, 0xaa08c91f2f626775, n = 4096K, clLucas v1.04 err = 0.1416 (0:56 real, 5.6221 ms/iter, ETA 115:51:42)
Iteration 20000 M( 74207281 )C, 0xa216434787875d0f, n = 4096K, clLucas v1.04 err = 0.1416 (0:57 real, 5.6945 ms/iter, ETA 117:20:16)
Iteration 30000 M( 74207281 )C, 0x35b1ad9d5eba82cb, n = 4096K, clLucas v1.04 err = 0.1416 (0:57 real, 5.6536 ms/iter, ETA 116:28:49)

Fury Nano (same drivers)
Iteration 10000 M( 74207281 )C, 0xaa08c91f2f626775, n = 4096K, clLucas v1.04 err = 0.1416 (0:48 real, 4.8029 ms/iter, ETA 98:58:48)
Iteration 20000 M( 74207281 )C, 0xa216434787875d0f, n = 4096K, clLucas v1.04 err = 0.1416 (0:49 real, 4.8482 ms/iter, ETA 99:53:59)
Iteration 30000 M( 74207281 )C, 0x35b1ad9d5eba82cb, n = 4096K, clLucas v1.04 err = 0.1416 (0:48 real, 4.8154 ms/iter, ETA 99:12:38)

Fury X (old fglrx drivers)
Iteration 10000 M( 74207281 )C, 0xaa08c91f2f626775, n = 4096K, clLucas v1.04 err = 0.1416 (0:40 real, 3.9611 ms/iter, ETA 81:37:57)
Iteration 20000 M( 74207281 )C, 0xa216434787875d0f, n = 4096K, clLucas v1.04 err = 0.1416 (0:39 real, 3.9297 ms/iter, ETA 80:58:24)
Iteration 30000 M( 74207281 )C, 0x35b1ad9d5eba82cb, n = 4096K, clLucas v1.04 err = 0.1416 (0:40 real, 3.9166 ms/iter, ETA 80:41:37)

clLucas seems to work without any mistakes (residues matched for all of the first DC test I did).
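As a sanity check on the quoted percentages, the relative speeds follow directly from the ms/iter figures in the clLucas logs (a quick sketch; the averages are just the three logged checkpoints per card):

```python
# Average ms/iter, transcribed from the clLucas logs above.
rx480  = (5.6221 + 5.6945 + 5.6536) / 3
nano   = (4.8029 + 4.8482 + 4.8154) / 3
fury_x = (3.9611 + 3.9297 + 3.9166) / 3

def pct_slower(card_ms, ref_ms):
    """Throughput deficit of `card` vs. `ref`, as a percentage.

    Throughput is iterations/second, i.e. proportional to 1/ms_per_iter,
    so the deficit is 1 - ref_ms/card_ms.
    """
    return (1 - ref_ms / card_ms) * 100

print(f"RX480 vs Fury Nano: {pct_slower(rx480, nano):.0f}% slower")   # ~15%
print(f"RX480 vs Fury X:    {pct_slower(rx480, fury_x):.0f}% slower") # ~30%
```

This matches the "15% slower than a Fury Nano / 30% slower than a Fury X" figures above; 5.66 ms/iter over 74,207,281 iterations also works out to roughly 117 hours, consistent with the logged ETAs and the "less than 5 days" claim.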
#53

"David"
Jul 2015
Ohio
1005₈ Posts

Sorry to reply to myself, but in case bdot stumbles in here...

So far I've established that there is some kind of corner case in the AMDGPU driver (which is required for the RX 480 but also used by other cards) that causes some of the self-test cases to fail for any card, even cards that work fine with the fglrx driver. The trick is that any given self-test does not fail every time; however, not every test seems eligible to fail. For example, M53017183 seems prone to failing but does not fail every time. I am currently running 100 self-tests overnight so I can compare which tests tend to fail; then I should be able to run some kernel traces to figure out what magical thread-scheduling or timing glitch is causing the failures, and likely file a bug with AMD.
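Comparing which exponents fail across repeated runs is a simple log-aggregation job. A minimal sketch of the tallying step, assuming a hypothetical log line format ("selftest for M<exponent> failed" -- mfakto's actual wording may differ):

```python
import re
from collections import Counter

def tally_failures(log_text):
    """Count per-exponent self-test failures across concatenated run logs.

    The matched line format is an assumption for illustration, not
    mfakto's actual output.
    """
    fails = Counter()
    for line in log_text.splitlines():
        m = re.search(r"selftest for M(\d+) failed", line)
        if m:
            fails["M" + m.group(1)] += 1
    return fails

# Hypothetical excerpt from three overnight runs:
demo_log = """\
selftest for M300050761 failed
selftest for M300050761 failed
selftest for M300050761 failed
selftest for M53017183 failed
selftest for M53017183 failed
"""
runs = 3
for exp, n in tally_failures(demo_log).most_common():
    print(f"{exp}: failed {n}/{runs} runs")
```

Sorting by failure count immediately separates the always-failing exponents from the intermittent ones.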
#54

"David"
Jul 2015
Ohio
11×47 Posts

More information: out of 300 iterations of the 32k tests there are only 152 exponents that ever fail, and a handful of exponents fail 100% of the time:

M300050761 M30568231 M45448679 M45588523 M49346867 M52031087 M599501681 M67094119 M71065531 M71115521 M72067427 M74697017

The rest fail some percentage of the time between 1 and 100, with an even distribution.

I was able to isolate from the trace of M300050761 that the failure appears to be in the OpenCL barriers in the GPU sieve: the failing AMDGPU driver shows that sieving is still happening while the TF is going on, and the bit count of sieved candidates varies per execution instead of showing the correct value from a sieve that properly completed before the TF step. Enabling trace logging at level 5 on the sieve kernel introduces enough of a delay that all the self-tests pass. I'm looking into the sieve barriers to see if there is an easy workaround.

Last fiddled with by airsquirrels on 2016-07-31 at 22:27
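The failure mode described above -- a consumer stage reading the sieve's output before the sieve has actually finished -- can be sketched as a CPU-side analogue. This is a toy model in Python threads, assuming nothing about mfakto's actual kernels; the `threading.Event` plays the role the OpenCL barrier should be playing:

```python
import threading

N = 100_000
bitmap = bytearray(N)          # the "sieved candidates" bitmap
sieve_done = threading.Event()

def sieve():
    # Mark every even position as a surviving candidate (toy sieve).
    for i in range(0, N, 2):
        bitmap[i] = 1
    sieve_done.set()           # publish completion -- the "barrier"

def tf_count(wait):
    # The "TF" stage counts sieved candidates. If it does NOT wait for
    # the sieve to finish, it can observe a partially filled bitmap and
    # the count varies per execution -- the symptom seen in the trace.
    if wait:
        sieve_done.wait()
    return sum(bitmap)

t = threading.Thread(target=sieve)
t.start()
count = tf_count(wait=True)    # with synchronization: always 50000
t.join()
print(count)
```

With `wait=False` the printed count is nondeterministic (anywhere from 0 to 50000), which mirrors why adding the delay of level-5 trace logging "fixes" the self-tests: it hides the race rather than closing it.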