#1255 | |
|
"Mr. Meeseeks"
Jan 2012
California, USA
4170₈ Posts |
Quote:
Code:
m-gs-fulltest m-gs-96-24 m-gs-128-16 m-gs-128-32 |
|
|
|
|
|
|
#1256 |
|
Nov 2010
Germany
597₁₀ Posts |
OK, it turns out that my guess was wrong - Bonaire still has the slow (4-cycle) int32 multiplications. I reverted the code base accordingly. Thanks for this test.
It's sad that AMD's versioning of GCN 1.0/1.1/1.2 does not seem to mean anything at all (at least nothing that I could use). |
|
|
|
|
|
#1257 | |
|
Nov 2010
Germany
3·199 Posts |
Quote:
Do I understand correctly that each instance, when running alone, would give ~725GHz, but when starting the other instance the speed drops to ~450GHz per card? In this case I'd say this is AMD's PowerTune technology in action.
Maybe you can try to use Catalyst Control Center to set the power target some percent higher and watch whether the GHz-d/d output increases accordingly? But be careful if you do that over a longer period of time: the additional heat generation can be significant.
I don't have a good explanation for why the speed would drop below the 0.14 level, though. Maybe the 15-bit kernels have a bigger share of simple instructions that do not generate so much heat?
Edit: which settings did you use for this parallel TF test? Your files suggest you should be using something similar to m-gs-128-32.ini or m-gs-fulltest.ini for maximum performance.
Last fiddled with by Bdot on 2014-11-05 at 21:47 |
|
|
|
|
|
|
#1258 | |
|
Nov 2010
Germany
3×199 Posts |
Quote:
To ease the perftest, you can add a number on the mfakto command line for these tests:
Code:
mfakto -i m-gs-128-32.ini --perftest 2 > testresults/m-gs-128-32.log
This number is roughly the number of iterations for each test. The default is 10, so 2 would be ~5 times faster.
Edit: Does the driver survive running a TF test using m-gs-128-32.ini?
Last fiddled with by Bdot on 2014-11-05 at 21:45 |
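To repeat that shortened perftest for every ini file named earlier in the thread, something like the following Python sketch could work. It is only an illustration: it assumes `mfakto` is on the PATH, that the ini files sit in the current directory, and that logs go to a `testresults/` folder as in the command above. The helper names `perftest_command` and `run_all` are made up for this example, and by default the script only prints the commands instead of running them.

```python
import subprocess
from pathlib import Path

# Config names taken from the list quoted earlier in the thread.
INI_NAMES = ["m-gs-fulltest", "m-gs-96-24", "m-gs-128-16", "m-gs-128-32"]


def perftest_command(ini_name, iterations=2, exe="mfakto"):
    """Return (argv, log_path) for one perftest run.

    The executable name and the testresults/ layout are assumptions that
    mirror the command line shown in the post.
    """
    argv = [exe, "-i", f"{ini_name}.ini", "--perftest", str(iterations)]
    log_path = Path("testresults") / f"{ini_name}.log"
    return argv, log_path


def run_all(ini_names=INI_NAMES, iterations=2, dry_run=True):
    """Run one perftest per ini file, or only print the commands if dry_run."""
    for name in ini_names:
        argv, log_path = perftest_command(name, iterations)
        if dry_run:
            print(" ".join(argv), ">", log_path.as_posix())
            continue
        log_path.parent.mkdir(parents=True, exist_ok=True)
        with open(log_path, "w") as log:
            # check=False: a crashing perftest should not stop the remaining runs.
            subprocess.run(argv, stdout=log, check=False)


if __name__ == "__main__":
    run_all()
```

Calling `run_all(dry_run=False)` would actually start the tests; keeping the dry run as the default makes it easy to check the generated command lines first.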
|
|
|
|
|
|
#1259 |
|
"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts |
I'll try that. Little cosmetic bug?
|
|
|
|
|
|
#1260 | |
|
Apr 2014
528 Posts |
Quote:
|
|
|
|
|
|
|
#1261 |
|
Nov 2010
Germany
3×199 Posts |
|
|
|
|
|
|
#1262 | |
|
Nov 2010
Germany
3·199 Posts |
Quote:
I noticed on my system that the GPU throughput was lower when I allowed the CPU to go idle and spin down. Keeping one prime95 thread running gives the best results for me ... Probably the GPU results are served faster in this mode.
Finally, another test would be to run 2 mfakto instances per GPU: between the kernel invocations there is always a bit of delay because the CPU has to set up the next one. On a faster card, these gaps may play a bigger role. So you could check if 4 instances give you a total of ~1400GHz ... If not, then check the power thing. |
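The comparison suggested here is simple arithmetic, and a small Python sketch may make it easier to repeat. It assumes two cards, each of which reached ~725 when run alone (so ~1450 combined, close to the ~1400 mentioned above), and takes the per-instance readings as plain numbers typed in by hand. The function names are hypothetical and nothing here reads mfakto output.

```python
def total_throughput(per_instance):
    """Sum the throughput readings of all running instances."""
    return sum(per_instance)


def shortfall(total, target):
    """Fraction by which `total` falls short of `target` (negative if above)."""
    if target <= 0:
        raise ValueError("target must be positive")
    return 1.0 - total / target


# Figures reported in the thread: ~725 per card alone, ~450 per card in parallel.
target = 2 * 725
observed = total_throughput([450, 450])
print(f"combined: {observed}, shortfall: {shortfall(observed, target):.1%}")
```

For the 4-instance test, the list would hold four readings instead of two; a shortfall near zero would point at the kernel launch gaps, a large one at the power limit.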
|
|
|
|
|
|
#1263 |
|
Feb 2012
the Netherlands
2·29 Posts |
Here are the results for the 280X; the CPU is a Pentium S775 dual-core (4 GB RAM).
|
|
|
|
|
|
#1264 | |
|
"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts |
Quote:
That's the max width I have. Also, I ran "regular" TF with the same ini for an hour, no crashes. Running --perftest crashes, "2" works... Last fiddled with by kracker on 2014-11-06 at 00:16 |
|
|
|
|
|
|
#1265 | |
|
Nov 2010
Germany
3·199 Posts |
Quote:
Please also check out the Catalyst Control Center's overclocking section (Graphics Overdrive). The safe method is to downclock the core, say from 1070MHz (is that your clock speed?) to 1000MHz, and see what single tests (like "mfakto -i m-cpu-GCN2.ini --perftest") return. If the results are almost unchanged, then you had reached the power limit. If they drop by ~6%, then you had not. The other way to test it would be to increase the power target to +10% and see if the results are better.
In case it is not the power target but the GPU temperature, you can try to manually set the fan speed to 80% and check if that yields an improvement. If so, then probably the cooler is not correctly seated (or poor thermal paste does not transfer the heat quickly enough).
Another indication of throttling is the fulltest's GPU sieve results. The very first test with the smallest sieve comes out fastest. Normally, much bigger sieve sizes are needed for best performance. It's just that the first test ran at full speed for a longer time.
On the other hand, by simple scaling of my HD7950's clock and number of compute elements, you should come out 11% ahead of me. In the GCN and GCN2 tests, this can be observed for the more difficult tasks, like 4000M. For the 2M TF, my card is 20% ahead. It almost looks like your card refuses to go faster than 500GHz-days/day - no matter which kernel. For 2M, they all have the same speed! |
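The downclocking test boils down to comparing the measured drop with the drop expected from the clock change alone. Here is a rough Python sketch of that comparison, assuming throughput scales linearly with core clock when nothing is throttling. The function names and the 0.5 cut-off are arbitrary choices for illustration, and the "after" readings in the example are made-up numbers.

```python
def expected_drop(old_mhz, new_mhz):
    """Relative throughput drop expected if speed scales with core clock."""
    return 1.0 - new_mhz / old_mhz


def looks_power_limited(before, after, old_mhz, new_mhz, tolerance=0.5):
    """Guess whether the card was throttled at the old clock.

    True if the measured drop is less than `tolerance` times the
    clock-proportional drop. The 0.5 default is an assumption, not a
    value from the thread.
    """
    if before <= 0:
        raise ValueError("before must be positive")
    measured = 1.0 - after / before
    return measured < tolerance * expected_drop(old_mhz, new_mhz)


# 1070MHz -> 1000MHz is the example from the post: about a 6.5% clock reduction.
print(f"expected drop: {expected_drop(1070, 1000):.1%}")
# Hypothetical readings: almost unchanged vs. dropping in line with the clock.
print(looks_power_limited(500, 498, 1070, 1000))
print(looks_power_limited(500, 468, 1070, 1000))
```

A result that lands between the two cases would not be conclusive, so repeating the test with a raised power target, as described above, remains the useful cross-check.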
|
|
|
|