|
|
#837 | |
|
Jun 2010
Pennsylvania
2×467 Posts |
Quote:
I'll also play around with the values that kracker suggested on the previous page, see if and how they affect the screen lag. I'd still like to wring out as much throughput as I can (while keeping the system within acceptable usability). Rodrigo |
|
|
|
|
|
|
#838 | |
|
Nov 2010
Germany
1001010101₂ Posts |
Oh boy, it seems I was away from this thread for a year, I'll try to read up ...
Quote:
As Jayder correctly mentioned, lowering GPUSieveSize can help a lot, as this will schedule fewer kernels in advance. However, a 7770 is right at the border of where it makes sense to use the GPU sieve. That means you can set SieveOnGPU=0 and return to CPU sieving. You should then use two or three instances, just as you did with 0.12, and you will still have the advantage of the new and faster kernels. Everything that was possible with 0.12 is also possible with 0.13, just faster.
On my 5770 I also stick to CPU sieving, because it is much faster on VLIW5 GPUs (135 GHz GPU sieve, 180 GHz CPU sieve on 3 cores), and the screen is more responsive. For GCN-based GPUs such as yours, there will not be such a big speedup from going to CPU sieving, but ~5-10% should be possible with two or three CPU cores. |
|
|
|
|
|
|
#839 | |
|
Nov 2010
Germany
3·199 Posts |
Quote:
Thanks for helping out in the AMD world, and sorry it does not work out the way it should ... In order to investigate the failed selftest, could you please send me (or post) the mfakto.ini file you're using? I did test different values, but certainly not all.
For your card, these values should provide a performance within 2-3% of the optimum:
VectorSize=2
GPUSievePrimes=110000
GPUSieveSize=128
GPUSieveProcessSize=24
They also work on similar GPUs. Could you please give this a try?
As for the --perftest: as of now, this only tests the CPU sieve and the host-to-device memory copy, nothing that would be used when GPU sieving. Coming up next ...
GPUSievePrimes from the ini file will just be used as a basis for the number actually used. There are quite a few requirements to meet, but this is done automatically: GPUSievePrimes will be adjusted up or down a bit in order to find a suitable number. Therefore I'm really curious what errors you discovered. A software bug is the most likely cause here, though I need to find out in which software. Unfortunately the driver is also among the suspects. mfakto reports a line like
device (driver) version OpenCL 1.2 AMD-APP (1124.2) (1124.2 (VM))
What does it list for you? Or best, post here the whole header that mfakto writes; this will include the important ini variables as well.
BTW, AMD also has all the old drivers available, like here for Win7/64.
Last fiddled with by Bdot on 2013-06-27 at 02:20 |
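To compare a local configuration against the four values suggested above, a small script can help. This is only a minimal sketch: it assumes the ini file holds one key=value pair per line with '#' starting a comment, and the helper names parse_settings and diff_settings are hypothetical, not part of mfakto. The "current" example uses the default GPUSieveSize of 64 mentioned elsewhere in this thread.

```python
# Hypothetical helper, not part of mfakto: compare key=value settings.
# Assumption: one key=value per line, '#' starts a comment.

SUGGESTED = """
VectorSize=2
GPUSievePrimes=110000
GPUSieveSize=128
GPUSieveProcessSize=24
"""


def parse_settings(text):
    """Parse key=value lines into a dict of strings, skipping blanks and comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def diff_settings(current, suggested):
    """Return {key: (current_value_or_None, suggested_value)} for keys that differ."""
    return {
        key: (current.get(key), value)
        for key, value in suggested.items()
        if current.get(key) != value
    }


if __name__ == "__main__":
    suggested = parse_settings(SUGGESTED)
    # Example "current" settings; in practice, read the text of your own ini file.
    current = parse_settings("GPUSieveSize=64\nVectorSize=2\n")
    for key, (old, new) in diff_settings(current, suggested).items():
        print(f"{key}: {old} -> {new}")
```

Keys missing from the current settings show up as None, which makes it easy to spot values that are still at whatever the program uses by default.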
|
|
|
|
|
|
#840 |
|
Romulan Interpreter
Jun 2011
Thailand
2·3·1,609 Posts |
"Cleaning the house" (that was a very good tool! Thanks a lot!) solved my problem with bad tests. Maybe some nVidia driver remnants kicked the card in the butt the first time. If I find a reproducible situation again, I will post it, for sure (I mean, not random things which are most probably related to heat and OC).
After the cleaning, I installed 12.10 and then 13.6 (the beta one from the AMD site), and things are much better. By comparison, 13.6 uses more CPU (it still has the same "bug" or "feature" as the stable version 13.4, I am not convinced which it is, and therefore gives a speed penalty when Prime95 is running), but overall it is a bit faster than 12.10, and the speed difference can be seen when "scrypting" (like 650KH/s instead of 560). For mfakto, the card gets good performance. It still stays around 400GHzD/D, and the computer is still usable. By experimenting I reached almost the same values as the ones you posted, with the difference that my SievePrimes went lower, not higher, than the default (I only experimented on the lower side, because my impression was that a lower value gives a higher speed; it seemed to me at the time that your implementation of exponentiation is much better than your implementation of the sieving). But I will give your values a try, and post a result later today.
Last fiddled with by LaurV on 2013-06-28 at 09:48 Reason: s/then/than (grrr! again!) |
|
|
|
|
|
#841 |
|
Jul 2012
Saarland / Germany
1000100₂ Posts |
hi,
I use the same settings (VectorSize=2, GPUSievePrimes=110000, GPUSieveSize=128, GPUSieveProcessSize=24) with driver version 13.1, and there is no CPU bug. |
|
|
|
|
|
#842 |
|
Romulan Interpreter
Jun 2011
Thailand
2·3·1,609 Posts |
It is not a "bug" (hence the quotes). If you followed the discussion: the latest Catalyst drivers (13.x) use the CPU a bit more, so there is a speed difference between the case when Prime95 is running in the background (and therefore competing for the CPU) and the case when the CPU is free. This difference can go as high as 5-10%, or higher if mfakto is also running at low priority. P95 runs at low priority, so if you launch mfakto at normal priority you don't see the speed difference in mfakto, but you see it in P95: your time per iteration goes from 22ms to 26ms or so, because mfakto is stealing CPU clocks from P95. Otoh, if they both have low priority, they share the CPU clocks, and as P95 uses the core most of the time, you don't see decreasing performance in P95, but you do see it in mfakto. This is quite normal, it is not a bug, but older versions of the drivers do not exhibit this behavior, or not so much, as people here say. (I did not test, I am a beginner in the AMD GPU world.)
Given a CPU core, you can occupy it with a P95 worker or with an instance of mfakto. You cannot have both running at full speed on the same core. For driver versions up to and including 12.10, the speed difference is smaller. Also, for some cards 12.10 is faster. Not for my card. Example, with fictitious numbers (no time to look for real numbers, but that is the idea): Code:
drv:        12.10 or 13.1                   |  13.4 or 13.6
P95:        running   not running  average  |  running   not running  average
some card:  399       401          400      |  395       403          399
my card:    398       400          399      |  392       408          400
Last fiddled with by LaurV on 2013-06-28 at 10:20 Reason: grr, again formatting tables! |
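The averages and the size of the Prime95 penalty in the table above can be recomputed with a few lines. This is a sketch only: the numbers are the fictitious ones from the post (the unit is presumably GHz-days/day), and the function name summarize is made up for illustration.

```python
# Fictitious numbers copied from the post's table; they are not measurements.
# Each entry: (throughput with P95 running, throughput with P95 not running).
TABLE = {
    ("some card", "12.10 or 13.1"): (399, 401),
    ("some card", "13.4 or 13.6"): (395, 403),
    ("my card", "12.10 or 13.1"): (398, 400),
    ("my card", "13.4 or 13.6"): (392, 408),
}


def summarize(running, idle):
    """Return (average, penalty in percent of the idle value) for one table cell pair."""
    average = (running + idle) / 2
    penalty_pct = (idle - running) / idle * 100
    return average, penalty_pct


if __name__ == "__main__":
    for (card, driver), (running, idle) in TABLE.items():
        average, penalty = summarize(running, idle)
        print(f"{card:10s} {driver:14s} average={average:.1f} penalty={penalty:.2f}%")
```

With these made-up values the averages barely move between driver versions, while the gap between "running" and "not running" widens on the newer drivers, which is the point the post is making.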
|
|
|
|
|
#843 |
|
"Mr. Meeseeks"
Jan 2012
California, USA
2168₁₀ Posts |
Maybe it's just me, but the versions with the CPU "bug"* flood one core per session completely, while the older ones that don't have the "bug" usually use 0-1%.
|
|
|
|
|
|
#844 | |
|
Jun 2010
Pennsylvania
1646₈ Posts |
Quote:
Nor is Prime95 affected. Throughput for TF is at 145 GHz-days/day, which is actually a wee bit higher than when I started on 0.13, but not as high as what kracker is reporting (160). I haven't tried any of the other suggested adjustments to the settings. Rodrigo Last fiddled with by Rodrigo on 2013-06-28 at 22:12 Reason: add'l info |
|
|
|
|
|
|
#845 | |
|
Jun 2010
Pennsylvania
2×467 Posts |
Quote:
Now that the settings kracker offered are apparently working well, should I also experiment with (concurrently) changing the settings that you and Jayder suggested? Rodrigo |
|
|
|
|
|
|
#846 |
|
Jun 2010
Pennsylvania
1110100110₂ Posts |
Lowered GPUSieveSize from the default 64 to 48, then tested the same exponent.
This test (70 --> 71) took 0:35:55 for an average 142.42 GHz-days/day, compared to 0:35:19 and 144.89 before changing the GPUSieveSize value. Probably (?) not a significant difference. Left the adjusted (not default) settings for GPUSievePrimes and GPUSieveProcessSize as indicated above. Both tests done on mfakto 0.13 x32. Rodrigo Last fiddled with by Rodrigo on 2013-06-28 at 23:08 Reason: typo |
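To put the two runs above side by side, the relative change in run time and throughput can be worked out directly from the reported figures. This is just an arithmetic sketch using the numbers from the post; the helper names are made up, and a single pair of runs cannot say whether the difference is more than run-to-run noise.

```python
def to_seconds(hms):
    """Convert an 'H:MM:SS' string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def pct_change(before, after):
    """Relative change from 'before' to 'after', in percent."""
    return (after - before) / before * 100


if __name__ == "__main__":
    # Figures as reported in the post: GPUSieveSize=64 (before) vs 48 (after).
    time_before = to_seconds("0:35:19")
    time_after = to_seconds("0:35:55")
    rate_before = 144.89  # GHz-days/day
    rate_after = 142.42   # GHz-days/day

    print(f"run time:   {pct_change(time_before, time_after):+.2f}%")
    print(f"throughput: {pct_change(rate_before, rate_after):+.2f}%")
```

Both figures come out at roughly 1.7% in magnitude, so the two ways of measuring agree with each other; repeating each setting a few times would show whether a change of that size is reproducible.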
|
|
|
|
|
#847 |
|
"Mr. Meeseeks"
Jan 2012
California, USA
2³·271 Posts |
Why not?
|
|
|
|