1 Attachment(s)
[QUOTE=kracker;386968]Also, I ran "regular" TF with the same ini for an hour, no crashes. Running --perftest crashes, "2" works...[/QUOTE]
Can you try regular TF with the time per class over 10 seconds (DC range, or higher bit levels)? If "2" works, would "5" or "8" work as well? And can you please post the results of such a run with the critical tests reduced to a working number? Would [URL="http://msdn.microsoft.com/en-us/library/windows/hardware/ff569918%28v=vs.85%29.aspx"]increasing TdrDelay[/URL] help? The perftest dumps the whole test onto the GPU at once - something that the "normal" TF does not do.

[QUOTE=kracker;386968]:razz: That's the max width I have.[/QUOTE]

Sometimes, size matters.
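For reference, TdrDelay is a REG_DWORD under the GraphicsDrivers key (see the linked MSDN page). A .reg file along these lines raises the GPU timeout from the default 2 seconds to 10 seconds - the value 10 is only an example, and a reboot is required for it to take effect:

```reg
Windows Registry Editor Version 5.00

; Example: raise the TDR timeout from 2 s (default) to 10 s
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:0000000a
```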
[QUOTE=Bdot;386950]Before I saw the test results of your x16 vs. x8 PCIex cards I would have said that the bus etc. does not have any influence on mfakto. But seeing the x16 card consistently a few percent ahead of the x8 counterpart suggests it does make a difference.
Do I understand it right that each instance, when running alone, would give ~725GHz, but when starting the other instance the speed drops to ~450GHz per card? In this case I'd say this is AMD's Powertune technology in action. Maybe you can try to use Catalyst Control Center and set the power target to some percent higher and watch if the GHz-d/d output increases accordingly? But careful if you do that over a longer period of time: the additional heat generation can be significant. I don't have a good explanation why the speed would drop below the 0.14 level, though. Maybe the 15-bit kernels have a bigger share of simple instructions that do not generate so much heat?

Edit: which settings did you use for this parallel TF test? Your files suggest you should be using something similar to m-gs-128-32.ini or m-gs-fulltest.ini for maximum performance.[/QUOTE]

Ok, I removed the GTX 690 which was in slot 2 and which Windows had disabled due to a driver error... bringing both cards back to x16 at the same temps. And that fixed the overall performance issues...

72bit - 128/32 GCN - 600 each concurrently, 625 individual.
72bit - 128/32 GCN3 - 730 each concurrently, 740 individual (some black square artifacts on screen).
74bit - 128/32 GCN3 - full TF run concurrently:

[CODE]no factor for M70384891 from 2^73 to 2^74 [mfakto 0.15pre5-Win cl_barrett32_76_gs_2]
tf(): total time spent:  53m 11.196s (735.87 GHz-days / day)
no factor for M70384891 from 2^73 to 2^74 [mfakto 0.15pre5-Win cl_barrett32_76_gs_2]
tf(): total time spent:  53m 15.578s (734.86 GHz-days / day)[/CODE]

Though with GCN3 I have some black square artifacts flickering on screen; I'll try updating Catalyst from 14.3 to 14.9 and see if they still appear...

Edit: the artifacts go away with 14.9, concurrent processing stabilizes, and windows/screen response is as if the CPU is busy (or it could be closing GPU-Z) and then set to idle...
[CODE]Date   Time  | class  Pct  | time  ETA    | GHz-d/day Sieve Wait
Nov 06 16:51 |  464   9.8% | 3.775 54m29s |  647.99  80181 0.00%
Nov 06 16:51 |  465   9.9% | 3.859 55m38s |  633.88  80181 0.00%  IDLE
Nov 06 16:52 |  468  10.0% | 3.415 49m11s |  716.30  80181 0.00%
Nov 06 16:52 |  473  10.1% | 3.384 48m40s |  722.86  80181 0.00%
Nov 06 16:52 |  476  10.2% | 3.392 48m44s |  721.15  80181 0.00%
Nov 06 16:51 |  564  12.2% | 3.805 53m28s |  642.88  80181 0.00%
Nov 06 16:52 |  569  12.3% | 3.592 50m24s |  681.00  80181 0.00%  IDLE
Nov 06 16:52 |  576  12.4% | 3.384 47m26s |  722.86  80181 0.00%
Nov 06 16:52 |  581  12.5% | 3.393 47m30s |  720.94  80181 0.00%
Nov 06 16:52 |  585  12.6% | 3.379 47m15s |  723.93  80181 0.00%[/CODE]

Edit2: Yep, definitely faster with the CPU idle (or at least not running P-1)..., a 10% overclock brings it up to 810 GHz-d/day each...

Edit3: Yep, it was Prime95 running P-1 on 4 cores that reduced performance; after switching to Boinc - Primeboinca with 8 cores @ 100%, the GPUs still have max performance...
[QUOTE=Bdot;387043]Thank you. These results also show signs of reaching the power limit. In the GCN2 test, cl_barrett15_82 is even faster than cl_barrett15_73/74 - this can only happen because cl_barrett15_82 was run before the others and was executed at higher clock speed.
Please also check out the Catalyst Control Center's overclocking section (Graphics Overdrive). The safe method is to downclock the core, say from 1070MHz (is that your clock speed?) to 1000MHz, and see what single tests (like "mfakto -i m-cpu-GCN2.ini --perftest") return. If the results are almost unchanged, then you had reached the power limit. If they drop by ~6%, then you had not. The other way to test it would be to increase the power target to +10% and see if the results are better. In case it is not the power target but the GPU temperature, you can try to manually set the fan speed to 80% and check if that yields an improvement. If so, then probably the cooler is not correctly seated (or poor thermal paste does not transfer the heat quickly enough).

Another indication of throttling is the fulltest's GPU sieve results. The very first test with the smallest sieve comes out fastest. Normally, way bigger sieve sizes are needed for best performance. It's just that the first test ran at full speed for a longer time.

On the other hand, by simple scaling of my HD7950's clock and number of compute elements, you should come out 11% ahead of me. In the GCN and GCN2 tests, this can be observed for the more difficult tasks, like 4000M. For the 2M TF, my card is 20% ahead. It almost looks like your card refuses to go faster than 500 GHz-days/day - no matter which kernel. For 2M, they all have the same speed![/QUOTE]

Thanks for analysing so closely. I'm kinda confused right now. First of all: the cards run in a 15C ambient room. One GPU is at 79C, the other at 66C. The hotter one runs at 1070MHz (top), the other at 1000MHz (bottom). MSI Afterburner shows no sign of automatic downclocking - though I'm not sure it would detect it. With TF 71-72 on 70M I can get as high as 430 GHz-d/day. CPU bottleneck? I'm using the latest official drivers.
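The downclock test described above boils down to simple arithmetic: if throughput scales with the clock, the card was running at full speed; if throughput barely moves when you drop the clock, it was already throttling. A minimal sketch (the numbers below are hypothetical, not measured values from this thread):

```python
def clock_scaling_check(clk_before, clk_after, ghzd_before, ghzd_after):
    """Compare the observed throughput ratio to the clock ratio.

    Returns (observed, expected). If observed is close to 1.0 while
    expected is clearly below it, the card was power limited before the
    downclock; if observed is close to expected, it was not throttling.
    """
    expected = clk_after / clk_before    # e.g. 1000/1070 ~ 0.935 (-6.5%)
    observed = ghzd_after / ghzd_before
    return observed, expected

# Hypothetical run: downclock 1070 -> 1000 MHz, throughput 500 -> 495 GHz-d/day
obs, exp = clock_scaling_check(1070, 1000, 500.0, 495.0)
# obs ~ 0.99 vs exp ~ 0.93: throughput dropped far less than the clock,
# so the card was already power limited
```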
[QUOTE=Bdot;387044]Would [URL="http://msdn.microsoft.com/en-us/library/windows/hardware/ff569918%28v=vs.85%29.aspx"]increasing TdrDelay[/URL] help? The perftest dumps the whole test onto the GPU at once - something that the "normal" TF does not do.[/QUOTE]

Thank you, that seems to work! :smile:

[QUOTE=Bdot;387044]Sometimes, size matters.[/QUOTE]

Thank you, that seems to work! :smile:

Running tests now... FYI, they look quite similar to the 260X... :redface:
[QUOTE=Stef42;387047]One GPU is 79C, the other 66C. The hotter one runs at 1070MHz (top), the other at 1000MHz (bottom).
[/QUOTE] The top/bottom difference may be more important than the clock difference. Try changing clocks or swapping the cards if you can, and see what is going on. OTOH, 80C is not "hot" at all for an HD card (my HD7970, air cooled, is always at this temperature during the day, when the ambient rises to around 28C, and it has been like that for years - still working).
2 Attachment(s)
:smile:
[QUOTE=Stef42;387047]
MSI Afterburner shows no sign of automatic downclocking. I'm not sure if it will detect it. [/QUOTE] That is the important question: who would detect it? For my 7950, when I set the power target too low, all tools still show the nominal clock speed. However, mfakto slows down. That is why I wrote the hocus-pocus about checking out different power limits in CCC.
[QUOTE=kracker;387116]:smile:[/QUOTE]
These results, again, change a few things - thank you so much. I'll start with the APU results as they are a bit more consistent. GPU sieving on the VLIW architecture is inefficient - CPU sieving (SievePrimes=25000) is 25% faster than GPU sieving (80181). The GPU sieve test suggests the optimal GPUSievePrimes value should be around 45000, but see below.

The most important lesson here is that I can dump the stand-alone GPU-sieve-speed test. While it may be interesting in some ways, the "sweet spot" of GPUSieveSize vs. SievePrimes that it tries to find is of no value. I assume it is because of a relatively small cache (or rather the cache being shared with the CPU) that running only the sieve kernel is more efficient than the interlocking of sieving and TF in real tests. Therefore "real" tests prefer a smaller GPUSieveSize - 64 MBit seems to be the optimum. After all, who cares how fast the GPU sieving alone is - the combined speed is what counts. I'll create a "default" perftest that only measures the combined speed and skips all the other stuff.

The R285 results are a bit confusing, especially when comparing them to your R260x results. First of all, the R285 has the slow int32 and slow DP multiplication, so I have to move Tonga into the GCN category - same as Bonaire. However, compared to Bonaire, it seems to have bigger/faster/more efficient caches, as the maximum speed is achieved when maxing out GPUSieveSize (128) and GPUSieveProcessSize (32). Bonaire definitely needs GPUSieveProcessSize=24 along with GPUSieveSize=126. The optimal kernel selection also differs sometimes ...

Just comparing compute units and clock speed, R285 to R260x should be a factor of 1.82. For small exponents we see 1.9 - 1.95, for larger exponents even 2.4 - 2.55. This may be related to the improved memory interface of the 285. I have not seen real 15GB/s transfer speed from CPU to GPU before ...
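For reference, the per-card tuning described above would translate into mfakto.ini entries roughly like this (a sketch based on the values quoted in this post; check the comments in the mfakto.ini shipped with your build for the exact option names and valid ranges):

```ini
; Tonga (R285): bigger/faster caches, max out both settings
SieveOnGPU=1
GPUSievePrimes=80181
GPUSieveSize=128
GPUSieveProcessSize=32

; Bonaire (R260x) would instead want:
;   GPUSieveSize=126
;   GPUSieveProcessSize=24
```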
[QUOTE=Bdot;387119]... The optimal kernel selection also differs sometimes ...[/QUOTE]
Maybe "personal" kernel ordering might be best at this point? E.g., do a quick test of all the kernels on first launch, write the ranking to a file and read it back on later launches? Just an idea that I have... :razz:
1 Attachment(s)
I changed my computer and my results are much better.
My new platform is an Asus Z97-AR with an i7 4790K and 2x8GB 2400MHz DDR3. My old mainboard+CPU limited my 290 card so much...
One question: I noticed that the barrett32 kernels are faster than the barrett15 ones. How do I change the mfakto settings to use the "32" kernels?