#1266 | |
|
Nov 2010
Germany
597₁₀ Posts |
Quote:
If "2" works, would "5" or "8" work as well? And can you pls post the results of such a run with the critical tests reduced to a working number? Would increasing TdrDelay help? The perftest dumps the whole test onto the GPU at once - something that the "normal" TF does not do. Sometimes, size matters. Last fiddled with by Bdot on 2014-11-06 at 22:22 |
#1267 | |
|
Apr 2014
101010₂ Posts |
Quote:
72bit - 128/32 GCN - 600 each concurrently, 625 individual.
72bit - 128/32 GCN3 - 730 each concurrently, 740 individual. Some black square artifacts on screen ...
74bit - 128/32 GCN3 - full TF run concurrently
no factor for M70384891 from 2^73 to 2^74 [mfakto 0.15pre5-Win cl_barrett32_76_gs_2]
tf(): total time spent: 53m 11.196s (735.87 GHz-days / day)
no factor for M70384891 from 2^73 to 2^74 [mfakto 0.15pre5-Win cl_barrett32_76_gs_2]
tf(): total time spent: 53m 15.578s (734.86 GHz-days / day)
Though with GCN3 I have some black square artifacts flickering on screen. I'll try updating Catalyst from 14.3 to 14.9 and see if they still appear...
Edit: artifacts go away with 14.9, and concurrent processing stabilizes and windows/screen response is if the CPU is busy (or it could be closing gpuz) and then set to idle..
Nov 06 16:51 | 464 9.8% | 3.775 54m29s | 647.99 80181 0.00%
Nov 06 16:51 | 465 9.9% | 3.859 55m38s | 633.88 80181 0.00%
IDLE
Nov 06 16:52 | 468 10.0% | 3.415 49m11s | 716.30 80181 0.00%
Nov 06 16:52 | 473 10.1% | 3.384 48m40s | 722.86 80181 0.00%
Nov 06 16:52 | 476 10.2% | 3.392 48m44s | 721.15 80181 0.00%
Nov 06 16:51 | 564 12.2% | 3.805 53m28s | 642.88 80181 0.00%
Nov 06 16:52 | 569 12.3% | 3.592 50m24s | 681.00 80181 0.00%
IDLE
Nov 06 16:52 | 576 12.4% | 3.384 47m26s | 722.86 80181 0.00%
Nov 06 16:52 | 581 12.5% | 3.393 47m30s | 720.94 80181 0.00%
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Nov 06 16:52 | 585 12.6% | 3.379 47m15s | 723.93 80181 0.00%
Edit2: Yep, definitely faster with the CPU idle (or at least not running P1)... A 10% overclock brings it up to 810 GHz-d/day each...
Edit3: Yep, it was Prime95 running P1 on 4 cores that reduced performance. After switching to Boinc - Primeboinca with 8 cores @ 100%, the GPU still has max performance...
Last fiddled with by NickOfTime on 2014-11-06 at 23:19 |
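The busy-versus-idle difference in the progress lines above can be put into numbers with a short script. This is only a sketch in Python, not part of mfakto: the function names are made up for illustration, and the column layout is assumed from the header line in the post (GHz-d/day being the first field of the last `|`-separated column). The second function checks that the two concurrent 2^73 to 2^74 runs report the same amount of work.

```python
from statistics import mean

# Progress lines copied from the first instance in the post above.
BUSY = [
    "Nov 06 16:51 | 464 9.8% | 3.775 54m29s | 647.99 80181 0.00%",
    "Nov 06 16:51 | 465 9.9% | 3.859 55m38s | 633.88 80181 0.00%",
]
IDLE = [
    "Nov 06 16:52 | 468 10.0% | 3.415 49m11s | 716.30 80181 0.00%",
    "Nov 06 16:52 | 473 10.1% | 3.384 48m40s | 722.86 80181 0.00%",
    "Nov 06 16:52 | 476 10.2% | 3.392 48m44s | 721.15 80181 0.00%",
]


def ghz_days_per_day(line):
    """Return the GHz-d/day value of one progress line (assumed layout)."""
    columns = [c.strip() for c in line.split("|")]
    return float(columns[3].split()[0])


def work_done(rate, minutes, seconds):
    """GHz-days completed at an average rate (GHz-d/day) over the run time."""
    return rate * (minutes * 60 + seconds) / 86400


busy_rate = mean(ghz_days_per_day(line) for line in BUSY)
idle_rate = mean(ghz_days_per_day(line) for line in IDLE)
print(f"busy: {busy_rate:.2f}  idle: {idle_rate:.2f}  "
      f"gain: {idle_rate / busy_rate - 1:.1%}")

# The two full runs from the post: rate and total time spent.
run_a = work_done(735.87, 53, 11.196)
run_b = work_done(734.86, 53, 15.578)
print(f"run A: {run_a:.3f} GHz-days, run B: {run_b:.3f} GHz-days")
```

Both runs should come out at the same number of GHz-days, since they cover the same exponent and bit range; the slightly lower rate of the second run is matched by its slightly longer time.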
#1268 | |
|
Feb 2012
the Netherlands
2×29 Posts |
Quote:
One GPU is at 79C, the other at 66C. The hotter one runs at 1070 MHz (top), the other at 1000 MHz (bottom). MSI Afterburner shows no sign of automatic downclocking. I'm not sure if it will detect it. With TF 71-72 on 70M I can get as high as 430 GHz-days/day. CPU bottleneck? I'm using the latest official drivers. |
#1269 | |
|
"Mr. Meeseeks"
Jan 2012
California, USA
2168₁₀ Posts |
Quote:
Thank you, that seems to work! Running tests now.... FYI they look quite similar to the 260X....
#1270 |
|
Romulan Interpreter
Jun 2011
Thailand
3²×29×37 Posts |
The top/bottom difference may be more important than the clock difference. Try changing the clocks or swapping the cards, if you can, and see what happens. OTOH, 80C is not "hot" at all for a HD card (my HD7970, air cooled, is always at this temperature during the day, when the ambient temperature rises to around 28C, and it has been like that for years, still working).
#1271 |
|
"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts |
#1272 |
|
Nov 2010
Germany
3·199 Posts |
That is an important question: who would detect it? For my 7950, when I set the power target too low, all tools still show the nominal clock speed. However, mfakto slows down. That is why I wrote the hocus-pocus about checking out different power limits in CCC.
#1273 |
|
Nov 2010
Germany
3×199 Posts |
These results, again, change a few things - thank you so much. I'll start with the APU results as they are a bit more consistent.
GPU sieving on VLIW architecture is inefficient - CPU sieving (SievePrimes=25000) is 25% faster than GPU sieving (GPUSievePrimes=80181). The GPU sieve test suggests the optimal GPUSievePrimes value should be around 45000, but see below.
The most important lesson here is that I can dump the stand-alone GPU-sieve-speed test. While it may be interesting in some ways, the "sweet spot" of GPUSieveSize vs. SievePrimes that it tries to find is of no value. I assume that, because of a relatively small cache (or rather the cache being shared with the CPU), running only the sieve kernel is more efficient than the interlocking of sieving and TF in real tests. Therefore "real" tests prefer a smaller GPUSieveSize - 64 MBit seems to be the optimum. After all, who cares how fast the GPU sieving alone is - the combined speed is what counts. I'll create a "default" perftest that only measures the combined speed and skips all the other stuff.
The R285 results are a bit confusing, especially when comparing them to your R260x results. First of all, R285 has the slow int32 and slow DP multiplication, so I have to move Tonga into the GCN category - same as Bonaire. However, compared to Bonaire, it seems to have bigger/faster/more efficient caches, as the maximum speed is achieved when maxing out GPUSieveSize (128) and GPUSieveProcessSize (32). Bonaire definitely needs GPUSieveProcessSize=24 along with GPUSieveSize=126. The optimal kernel selection also differs sometimes ...
Just comparing compute units and clock speed, R285 to R260x should be a factor of 1.82. For small exponents we see 1.9 - 1.95, for larger exponents even 2.4 - 2.55. This may be related to the improved memory interface of the 285. I have not seen real 15GB/s transfer speed from CPU to GPU before ... |
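The comparison in the last paragraph is simple arithmetic and can be sketched as follows. This is an illustrative Python snippet, not code from mfakto: the function names are hypothetical, the expected factor of 1.82 and the observed ranges are taken from the post, and the compute-unit counts and clocks in the first call are placeholders rather than the specifications of any real card.

```python
def expected_factor(cus_a, mhz_a, cus_b, mhz_b):
    """Naive speed ratio of card A over card B: compute units times clock."""
    return (cus_a * mhz_a) / (cus_b * mhz_b)


def excess_over_expected(observed, expected):
    """Fraction by which an observed speed ratio exceeds the naive one."""
    return observed / expected - 1.0


# Placeholder inputs (not real card specs): twice the compute units at the
# same clock gives a naive factor of 2.0.
print(expected_factor(2, 1000, 1, 1000))

EXPECTED = 1.82  # naive R285 / R260x factor quoted in the post
OBSERVED = {
    "small exponents": (1.90, 1.95),
    "larger exponents": (2.40, 2.55),
}
for label, (low, high) in OBSERVED.items():
    print(f"{label}: {excess_over_expected(low, EXPECTED):.1%} to "
          f"{excess_over_expected(high, EXPECTED):.1%} above the naive factor")
```

With the figures from the post, small exponents land about 4% to 7% above the naive factor, while larger exponents land about 32% to 40% above it, which fits the suggestion that something other than raw compute is helping the 285.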
#1274 |
|
"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts |
Maybe "personal" kernel ordering might be best at this time? E.g., do a quick test of all the kernels on first launch, write the results to a file, and read from it at later launches? Just an idea that I have...
Last fiddled with by kracker on 2014-11-07 at 21:52 |
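The idea above could be sketched roughly like this. It is a hypothetical Python illustration, not how mfakto works: the cache file name, the `benchmark` callable and the kernel names are all made up, and a real implementation would live in mfakto's own code.

```python
import json
import os
import tempfile


def load_or_build_kernel_order(kernels, benchmark, cache_path):
    """Return kernels sorted fastest first, reusing a cached order if present.

    `benchmark` is a caller-supplied (hypothetical) function that returns a
    speed score for a kernel name; higher means faster.
    """
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cached = json.load(f)
        # Only trust the cache if it covers exactly the kernels we know about.
        if sorted(cached) == sorted(kernels):
            return cached
    # First launch (or stale cache): test every kernel once, remember the order.
    ordered = sorted(kernels, key=benchmark, reverse=True)
    with open(cache_path, "w") as f:
        json.dump(ordered, f)
    return ordered


# Made-up kernel names and scores, standing in for a real quick test.
fake_scores = {"kernel_a": 610.0, "kernel_b": 735.0, "kernel_c": 480.0}
calls = []


def fake_benchmark(name):
    calls.append(name)
    return fake_scores[name]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "kernel_order.json")
    first = load_or_build_kernel_order(list(fake_scores), fake_benchmark, path)
    calls_after_first = len(calls)
    second = load_or_build_kernel_order(list(fake_scores), fake_benchmark, path)
    calls_after_second = len(calls)

print(first, calls_after_first, calls_after_second)
```

The second call reads the stored order and runs no tests. A real version would also need to redo the test when the driver or the hardware changes, which this sketch does not attempt.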
#1275 |
|
Sep 2014
19 Posts |
I changed my computer and my results are much better.
My new platform is an Asus Z97-AR with an i7 4790k and 2x8GB 2400MHz DDR3. My old mb+cpu limited my 290 card so much... |
#1276 |
|
Sep 2014
19 Posts |
One question: I noticed that kernel barrett32 is faster than barrett15. How do I change the mfakto settings to use "32"?