mersenneforum.org  

Old 2013-06-27, 00:16   #837
Rodrigo
Jun 2010
Pennsylvania
2·467 Posts

Quote:
Originally Posted by Jayder View Post
Did you try lowering GPUSieveSize in mfakto.ini? That was what made the big difference for me.

Thank you, I'll try that as soon as the current TF run on mfakto 0.12 is finished.

I'll also play around with the values that kracker suggested on the previous page, to see whether and how they affect the screen lag. I'd still like to wring out as much throughput as I can (while keeping the system within acceptable usability).

Rodrigo
Old 2013-06-27, 01:52   #838
Bdot
Nov 2010
Germany
3·199 Posts

Oh boy, it seems I was away from this thread for a year, I'll try to read up ...
Quote:
Originally Posted by Rodrigo View Post
More observations on version 0.13:

Often (not all the time), I've noticed a distinct lag when repositioning windows, opening websites, or even moving the mouse cursor around the screen -- especially just after sending a file to the printer. The mouse cursor lag is particularly pronounced as it seems to just sit there after I move the mouse, and then the cursor pops up way off somewhere else, making it difficult to select or click on things on the screen.

None of this happens with version 0.12.

The shine is coming off a bit from 0.13. As this is a production machine, I'm going back to 0.12 for the time being, even though Prime95 takes a hit and the TF yield is lower even with three mfakto instances running.

Anybody else getting this lag? If I can get rid of it, it'll make 0.13 viable again for me.

Rodrigo

Yes, I do see a similar screen lag; it is even more pronounced on older GPUs (like the HD5xxx). The reason is that the GPU sieve schedules kernels to run very tightly, and these kernels each take a number of milliseconds to complete. When the OS schedules something like "reposition and redraw the mouse pointer", it gets enqueued, but the already-enqueued mfakto kernels will complete first.

As Jayder correctly mentioned, lowering GPUSieveSize can help a lot as this will schedule fewer kernels in advance.
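
To make the queueing argument concrete, here is a minimal sketch of a toy model. It assumes the GPU processes queued work strictly in order, which is a simplification, and the kernel count and the 5 ms duration are hypothetical numbers, not measurements from mfakto.

```python
# Toy model: the GPU runs queued work strictly in order (an assumption
# made for illustration; real drivers and GPUs are more complicated).

def ui_wait_ms(kernels_queued: int, kernel_ms: float) -> float:
    """Time a newly enqueued UI task waits behind already-queued kernels."""
    return kernels_queued * kernel_ms

# Hypothetical numbers, not measurements from mfakto.
for queued in (32, 16, 8):
    wait = ui_wait_ms(queued, 5.0)
    print(f"{queued:2d} kernels x 5 ms -> UI task waits {wait:.0f} ms")
```

In this model the delay seen by the mouse pointer scales with the number of kernels already in the queue, which is why scheduling fewer kernels in advance can reduce the lag.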

However, a 7770 is about at the border of where it makes sense to use the GPU sieve. That means you can set SieveOnGPU=0 and return to CPU sieving. You should then use two or three instances, just as you did with 0.12, and you will still have the advantage of the new and faster kernels. Everything that was possible with 0.12 is also possible with 0.13, just faster.

On my 5770 I also stick to CPU sieving, because it is much faster on VLIW5 GPUs (135 GHz GPU-sieve, 180 GHz CPU sieve on 3 cores), and the screen is more responsive. For GCN-based GPUs such as yours, there will not be such a big speedup by going to CPU sieve, but ~5-10% should be possible with two or three CPU cores.
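
If you switch settings back and forth often, a small helper can save some typing. This is only a sketch: it assumes mfakto.ini is a flat list of `Key=value` lines with `#` comments, which was not checked against the real file, and `set_ini_values` is a hypothetical helper name, not part of mfakto.

```python
import re

def set_ini_values(text: str, updates: dict[str, str]) -> str:
    """Return ini text with `Key=value` lines replaced for the given keys.

    Assumes a flat `Key=value` layout, one setting per line (an assumption
    about mfakto.ini). Commented-out lines are left untouched, and keys
    that are not present are appended at the end.
    """
    seen = set()
    out = []
    for line in text.splitlines():
        match = re.match(r"\s*([A-Za-z_]\w*)\s*=", line)
        if match and match.group(1) in updates:
            key = match.group(1)
            out.append(f"{key}={updates[key]}")
            seen.add(key)
        else:
            out.append(line)
    out.extend(f"{k}={v}" for k, v in updates.items() if k not in seen)
    return "\n".join(out) + "\n"

# Hypothetical excerpt, not the real default file.
sample = "# hypothetical excerpt\nSieveOnGPU=1\nGPUSieveSize=64\n"
print(set_ini_values(sample, {"SieveOnGPU": "0"}))
```

Keep a backup of the original ini before writing the result back, so it is easy to return to a known-good configuration.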
Old 2013-06-27, 02:18   #839
Bdot
Nov 2010
Germany
3·199 Posts

Quote:
Originally Posted by LaurV View Post
That is the first "Bingo!" related to my problem. Thanks. I am glad I started to read the topic viceversa! (from the newest posts to the oldest). The discussion is not necessarily connected, but it can provide a faster solution, hehe.

What drivers are recommended? (Win7 64 bits)

Still reading.

Hi LaurV,

thanks for helping out in the AMD world, and sorry it does not work out the way it should ...

In order to investigate the failed selftest, could you please send me (or post) the mfakto.ini file you're using? I did test different values, but certainly not all.

For your card, these values should give performance within 2-3% of the optimum:

VectorSize=2
GPUSievePrimes=110000
GPUSieveSize=128
GPUSieveProcessSize=24

And they also work on similar GPUs. Could you please give this a try?

As for the --perftest: as of now, this is only testing the CPU sieve and the host-to-device memory copy. Nothing that would be used when GPU sieving. Coming up next ...

GPUSievePrimes from the ini file is only used as a starting point for the number actually used. There are quite a few requirements to meet, but this is done automatically: GPUSievePrimes is adjusted up or down a bit in order to find a suitable number. Therefore I'm really curious what errors you discovered. A software bug is the most likely cause here, though I need to find out in which software. Unfortunately, the driver is also among the suspects.

mfakto reports a line like
device (driver) version OpenCL 1.2 AMD-APP (1124.2) (1124.2 (VM))
What does it list for you? Or better, post the whole header that mfakto writes here; this will include the important ini variables as well.
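
When comparing headers from several machines, the version line can be picked apart with a regular expression. The following is a rough sketch that only assumes the layout of the single example line quoted in this post; other drivers may print something different, and `parse_version_line` is a hypothetical helper name.

```python
import re

# Pattern modelled on the one example line in this post (an assumption;
# other drivers may format the line differently).
VERSION_RE = re.compile(r"OpenCL\s+(\d+\.\d+)\s+(\S+)\s+\(([\d.]+)\)")

def parse_version_line(line: str):
    """Return (opencl_version, platform, driver_build), or None if no match."""
    match = VERSION_RE.search(line)
    return match.groups() if match else None

line = "device (driver) version OpenCL 1.2 AMD-APP (1124.2) (1124.2 (VM))"
print(parse_version_line(line))
```

A result of None simply means the line did not follow the assumed layout, in which case posting the raw header is still the most useful thing to do.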

BTW, AMD also has available all the old drivers, like here for Win7/64.

Last fiddled with by Bdot on 2013-06-27 at 02:20
Old 2013-06-28, 09:14   #840
LaurV
Romulan Interpreter
Jun 2011
Thailand
2×3×1,609 Posts

"Cleaning the house" (that was a very good tool! Thanks a lot!) solved my problem with bad tests. Maybe some nVidia remnants of the drivers kicked the card in the butt first time. If I find again a reproducible situation, I will post it, for sure (I mean, not random things which are most probably related to heat and OC).

After the cleaning, I installed 12.10 and then 13.6 (the beta one from the AMD site), and things are much better. By comparison, 13.6 uses more CPU (it still has the same "bug" or "feature", I am not convinced which, as the stable version 13.4, and therefore incurs a speed penalty when Prime95 is running), but overall it is a bit faster than 12.10, and the speed difference can be seen when "scrypting" (like 650 KH/s instead of 560).

For mfakto, the card gets good performance. It still stays around 400 GHzD/D, and the computer is still usable. By experimenting, I reached almost the same values as you posted, with the difference that my SievePrimes went lower, not higher, than the default. (I only experimented on the lower side, because my impression was that a lower value gives a higher speed; it seemed to me at the time that your implementation of exponentiation is much better than your implementation of the sieving.) But I will give your values a try and post a result later today.

Last fiddled with by LaurV on 2013-06-28 at 09:48 Reason: s/then/than (grrr! again!)
Old 2013-06-28, 09:39   #841
NormanRKN
Jul 2012
Saarland / Germany
1048 Posts

hi,

I use the same settings:

VectorSize=2
GPUSievePrimes=110000
GPUSieveSize=128
GPUSieveProcessSize=24

with driver version 13.1, and there is no CPU bug.
Old 2013-06-28, 10:06   #842
LaurV
Romulan Interpreter
Jun 2011
Thailand
2·3·1,609 Posts

It is not a "bug" (hence the quotes). If you followed the discussion: the latest Catalyst drivers (13.x) use the CPU a bit more, so there is a speed difference between the case when Prime95 is running in the background (and therefore requesting more CPU) and the case when the CPU is free. This difference can go as high as 5-10%, or higher if mfakto is also running with low priority.

P95 runs at low priority. If you launch mfakto with normal priority, you don't see the speed difference in mfakto, but you see it in P95: your time per iteration goes from 22ms to 26ms or so, because mfakto is stealing CPU clocks from P95. OTOH, if they both have low priority, they share the CPU clocks, and as P95 uses the core most of the time, you don't see decreasing performance in P95, but you do see it in mfakto.

This is quite normal and not a bug, but older versions of the drivers do not exhibit this behavior, or not so much, as people here say. I did not test this myself; I am a beginner in the AMD GPU world.

Given a CPU core, you can occupy it with a worker of P95 or with an instance of mfakto. You cannot have both running at full speed on the same core.

For driver versions up to and including 12.10, the speed difference is smaller. Also, for some cards 12.10 is faster. Not for my card.

Example, with made-up numbers (no time to look for real numbers, but that is the idea):
Code:
drv:             12.10 or 13.1               |               13.4 or 13.6
P95:        running    not running  average  |   running     not running  average
some card:    399          401         400   |    395           403         399
my card:      398          400         399   |    392           408         400
edit2: So, if you do not run P95 on that computer (or other applications demanding a lot of CPU power), then it may be better (or not) to go for the latest drivers to get more mfakto performance, but if you do run P95, older drivers may be better (or not)...
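
The trade-off in the table can be summarized with a few lines of arithmetic. The sketch below reuses the made-up figures from the table as they are; the "loss" column is just the relative drop in mfakto speed while Prime95 is running, and no unit is assumed for the numbers.

```python
# Made-up figures copied from the table above (not real measurements).
table = {
    ("some card", "12.10 or 13.1"): (399, 401),
    ("some card", "13.4 or 13.6"): (395, 403),
    ("my card", "12.10 or 13.1"): (398, 400),
    ("my card", "13.4 or 13.6"): (392, 408),
}

def summarize(running: float, idle: float) -> tuple[float, float]:
    """Return (average, percent lost while Prime95 is running)."""
    average = (running + idle) / 2
    loss_pct = (idle - running) / idle * 100
    return average, loss_pct

for (card, drivers), (running, idle) in table.items():
    avg, loss = summarize(running, idle)
    print(f"{card:9s} {drivers:13s} avg={avg:.0f} loss with P95={loss:.1f}%")
```

With these illustrative numbers the averages are nearly identical, so the choice of driver mostly depends on whether Prime95 shares the machine.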

Last fiddled with by LaurV on 2013-06-28 at 10:20 Reason: grr, again formatting tables!
Old 2013-06-28, 16:29   #843
kracker
"Mr. Meeseeks"
Jan 2012
California, USA
23×271 Posts

Maybe it's just me, but the versions with the CPU "bug"* flood one core per session completely, while older ones that don't have the "bug" usually use 0-1%.
Old 2013-06-28, 22:09   #844
Rodrigo
Jun 2010
Pennsylvania
3A6₁₆ Posts

Quote:
Originally Posted by kracker View Post
I pushed GPUSievePrimes around 10k up from default, GPUSieveProcessSize to 24 from 16.

I'm trying these settings for 0.13 right now on the same TF exponent I'd tested before, and am not getting the screen lag (so far). Nor is Prime95 affected.

Throughput for TF is at 145 GHz-days/day, which is actually a wee bit higher than when I started on 0.13. But not as high as what kracker is reporting (160).

I haven't tried any of the other suggested adjustments to the settings.

Rodrigo

Last fiddled with by Rodrigo on 2013-06-28 at 22:12 Reason: add'l info
Old 2013-06-28, 22:17   #845
Rodrigo
Jun 2010
Pennsylvania
1110100110₂ Posts

Quote:
Originally Posted by Bdot View Post
Oh boy, it seems I was away from this thread for a year, I'll try to read up ...


Yes, I do see a similar screen lag; it is even more pronounced on older GPUs (like the HD5xxx). The reason is that the GPU sieve schedules kernels to run very tightly, and these kernels each take a number of milliseconds to complete. When the OS schedules something like "reposition and redraw the mouse pointer", it gets enqueued, but the already-enqueued mfakto kernels will complete first.

As Jayder correctly mentioned, lowering GPUSieveSize can help a lot as this will schedule fewer kernels in advance.

However, a 7770 is about at the border of where it makes sense to use the GPU sieve. That means you can set SieveOnGPU=0 and return to CPU sieving. You should then use two or three instances, just as you did with 0.12, and you will still have the advantage of the new and faster kernels. Everything that was possible with 0.12 is also possible with 0.13, just faster.

On my 5770 I also stick to CPU sieving, because it is much faster on VLIW5 GPUs (135 GHz GPU-sieve, 180 GHz CPU sieve on 3 cores), and the screen is more responsive. For GCN-based GPUs such as yours, there will not be such a big speedup by going to CPU sieve, but ~5-10% should be possible with two or three CPU cores.

Yeah -- as you can see, you can't leave the thread unattended even for a couple of days!

Now that the settings kracker offered are apparently working well, should I also experiment with (concurrently) changing the settings that you and Jayder suggested?

Rodrigo
Old 2013-06-28, 23:05   #846
Rodrigo
Jun 2010
Pennsylvania
2·467 Posts

Lowered GPUSieveSize from the default 64 to 48, then tested the same exponent.

This test (70 --> 71) took 0:35:55 for an average 142.42 GHz-days/day, compared to 0:35:19 and 144.89 before changing the GPUSieveSize value. Probably (?) not a significant difference.

Left the adjusted (not default) settings for GPUSievePrimes and GPUSieveProcessSize as indicated above.

Both tests done on mfakto 0.13 x32.
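
To put the two runs side by side, the relative change can be computed directly from the figures above. This is only a worked calculation; it says nothing about run-to-run variation, which would need repeated runs of the same setting to estimate.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def pct_change(before: float, after: float) -> float:
    """Percent change going from `before` to `after`."""
    return (after - before) / before * 100

# Figures from this post: GPUSieveSize=64 (before) vs 48 (after).
time_change = pct_change(to_seconds("0:35:19"), to_seconds("0:35:55"))
rate_change = pct_change(144.89, 142.42)
print(f"run time: {time_change:+.1f}%, throughput: {rate_change:+.1f}%")
```

Both numbers come out at roughly 1.7%, so the run time and the reported GHz-days/day agree with each other.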

Rodrigo

Last fiddled with by Rodrigo on 2013-06-28 at 23:08 Reason: typo
Old 2013-06-28, 23:22   #847
kracker
"Mr. Meeseeks"
Jan 2012
California, USA
23×271 Posts

Quote:
Originally Posted by Rodrigo View Post
Yeah -- as you can see, you can't leave the thread unattended even for a couple of days!

Now that the settings kracker offered are apparently working well, should I also experiment with (concurrently) changing the settings that you and Jayder suggested?

Rodrigo

Why not?
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.