mersenneforum.org

mfakto: an OpenCL program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=15646)

kracker 2012-01-11 15:42

[QUOTE=Bdot;285871]Thanks for this info! Could you please also post the OpenCL device info part as mfakto reports it? If I can easily figure out we're running on Llano, then I can enable a zero-memory-copy optimization, that should increase GPU utilisation by ~10% when only a single instance is running (and by a small amount for multi-instance).[/QUOTE]

Ok, here it is.
(btw, it only uses about 65-80% of my gpu when I run 1 instance... that might be normal ?)

OpenCL device info
name BeaverCreek (Advanced Micro Devices, Inc.)
device (driver) version OpenCL 1.1 AMD-APP (851.4) (CAL 1.4.1646 (VM))
maximum threads per block 256
maximum threads per grid 16777216
number of multiprocessors 5 (400 compute elements (estimate for ATI GPUs))
clock rate 600MHz

Dubslow 2012-01-11 15:52

[QUOTE=kracker;285896]
(btw, it only uses about 65-80% of my gpu when I run 1 instance... that might be normal ?)[/QUOTE]
That depends on what CPU you're using. I'll try and link you to an old post that I have no intention of rewriting from scratch.



[offtopic]Welcome to the GPU to 72 team! Except it seems you haven't actually gotten work from the [url=gpu.mersenne.info]tool[/url]. You can find more info in the GPU to 72 subforum somewhere around here. Happy crunching![/offtopic]

KyleAskine 2012-01-11 16:10

[QUOTE=kracker;285896]
(btw, it only uses about 65-80% of my gpu when I run 1 instance... that might be normal ?)
[/QUOTE]

It is probably the issue that Bdot mentioned above. You should be at around 90% or so with one instance in my opinion, since it is obvious that your CPU can sieve way faster than your GPU can process.

kracker 2012-01-11 16:47

Ah, ok thanks :)
Oh, and also I was wondering is there any way to reduce the priority of it? I have to pause it every time I do a gpu-intensive program or game, Thanks :)

(P.S.: Is there a way to automatically pull assignments? Right now I realized I'll have to manually get more once it gets done. :whistle:)

Dubslow 2012-01-11 21:07

[QUOTE=kracker;285903]Ah, ok thanks :)
Oh, and also I was wondering is there any way to reduce the priority of it? I have to pause it every time I do a gpu-intensive program or game, Thanks :)
[/QUOTE]

Try using a batch file; you can set CPU affinity and priority in the command to start mfakto. Or you can change it via Task Manager after it's already running. (Sadly I have yet to find a decent post on the GPU usage thing.)
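As an illustration of starting the program at reduced CPU priority, here is a minimal Python sketch. It is not part of mfakto; the helper names are hypothetical, and `mfakto.exe` in the usage comment is a placeholder path. It only lowers CPU priority (it does not set affinity), and the niceness value of 10 on non-Windows systems is an arbitrary choice.

```python
import os
import subprocess
import sys


def low_priority_kwargs():
    """Return Popen keyword arguments that lower the child's CPU priority."""
    if sys.platform == "win32":
        # Windows: request the "below normal" priority class at creation time.
        return {"creationflags": subprocess.BELOW_NORMAL_PRIORITY_CLASS}
    # POSIX: raise the niceness in the child just before it execs.
    return {"preexec_fn": lambda: os.nice(10)}


def run_low_priority(command):
    """Run `command` (a list of strings) at reduced CPU priority and wait for it."""
    return subprocess.run(command, **low_priority_kwargs()).returncode


# Usage (placeholder path, adjust to your installation):
#   run_low_priority(["mfakto.exe"])
```

Note that this affects only how the operating system schedules the CPU side of the program.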
Edit: As a holdover: [QUOTE=Dubslow;274571]The CPU wait indicates how long the CPU is waiting for work. If it's greater than 1000, then the CPU is waiting a lot, which means the GPU is overwhelmed. Sieve Primes controls how much work is done on the CPU; that's why the program auto-adjusted that up to 200,000 (the default is 25,000, and 5,000 is the minimum).[/QUOTE]
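To make the feedback idea in that quote concrete, here is a toy Python sketch of such an adjustment rule. It is an assumption-laden illustration, not mfakto's or mfaktc's actual algorithm: the 1000 threshold, the 25,000 default and the 5,000 minimum come from the quote, while treating 200,000 as a hard cap, the lower threshold and the 10% step are hypothetical.

```python
MIN_SIEVE_PRIMES = 5_000       # minimum quoted in the post
DEFAULT_SIEVE_PRIMES = 25_000  # default quoted in the post
MAX_SIEVE_PRIMES = 200_000     # value reached in the post; assumed here to be a cap
HIGH_WAIT = 1_000              # "greater than 1000" threshold from the post
LOW_WAIT = 200                 # hypothetical lower threshold


def adjust_sieve_primes(sieve_primes, cpu_wait):
    """Nudge SievePrimes by a hypothetical 10% step based on the CPU wait."""
    if cpu_wait > HIGH_WAIT:
        # CPU is idle a lot: give it more sieving work.
        sieve_primes = sieve_primes * 11 // 10
    elif cpu_wait < LOW_WAIT:
        # CPU is barely waiting: sieve less so the GPU queue stays fed.
        sieve_primes = sieve_primes * 10 // 11
    # Keep the value inside the assumed valid range.
    return max(MIN_SIEVE_PRIMES, min(MAX_SIEVE_PRIMES, sieve_primes))
```

Starting from the default, a CPU wait of 1500 would move the value from 25,000 to 27,500 under these assumed parameters.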

[QUOTE=kracker;285903]
(P.S.: Is there a way to automatically pull assignments? Right now I realized I'll have to manually get more once it gets done. :whistle:)[/QUOTE]
That is being worked on at the moment, unfortunately not ready yet.

Bdot 2012-01-11 22:32

Thanks for the device info, I'll put that on the wishlist ;-)

[QUOTE=kracker;285903]Ah, ok thanks :)
Oh, and also I was wondering is there any way to reduce the priority of it? I have to pause it every time I do a gpu-intensive program or game, Thanks :)
[/QUOTE]

Adding to Dubslow's comment:

While you can lower the priority as mentioned (there is no built-in option for it), it may not give you what you want. The reason is that the priority setting applies to the CPU part only. On the GPU there is no such thing as priorities: it's all round-robin on the same level. mfakto tries to keep 5 blocks (tasks) in the GPU queue, which can make the UI laggy if, for instance, window movements have to wait until these 5 tasks are processed.

You can try two settings in [B]mfakto.ini[/B]:
[B]GridSize[/B] defines how many factor candidates one block consists of. Lowering this value should already increase responsiveness a lot, at the expense of a little more CPU overhead.
[B]NumStreams[/B] is the number of blocks being scheduled. Lowering to 3 or 2 causes other tasks to be served quicker, but mfakto will have a smaller buffer to cover fluctuations in available CPU power.

BTW, the relatively low GPU utilization can also occur if the CPU cores are rather busy. Sometimes the auto-adjusting of the SievePrimes value gets confused if there is no CPU available to serve the GPU queue: the time it took to get the required CPU power is then wrongly interpreted as CPU idle time spent waiting for the GPU to finish. Try setting [B]SievePrimesAdjust=0[/B] and [B]SievePrimes=100000[/B] (which value works well still needs to be tested). Alternatively, set up two copies of mfakto to run in parallel. Then they can cover each other's gaps in GPU utilisation.
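For anyone who prefers to script these changes, here is a small Python sketch that rewrites `Key=value` lines in the text of an ini file. It assumes mfakto.ini uses plain `Key=value` lines with `#` comments; the function name is hypothetical and the sample contents below are made up for illustration, not copied from a real mfakto.ini.

```python
import re


def set_ini_values(text, updates):
    """Return `text` with `Key=value` lines replaced per `updates`.

    Keys not already present are appended at the end. Commented-out lines
    (starting with '#') are left untouched.
    """
    lines = text.splitlines()
    remaining = dict(updates)
    for i, line in enumerate(lines):
        match = re.match(r"\s*([A-Za-z]+)\s*=", line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            lines[i] = f"{key}={remaining.pop(key)}"
    lines.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(lines) + "\n"


# Made-up sample contents, for illustration only.
sample = "# sample settings\nSievePrimes=25000\nSievePrimesAdjust=1\nNumStreams=5\n"
updated = set_ini_values(
    sample,
    {"SievePrimesAdjust": 0, "SievePrimes": 100000, "NumStreams": 3},
)
print(updated)
```

GridSize can be passed the same way; check the comments in your own mfakto.ini for its valid values before picking one.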

James Heinrich 2012-01-11 23:30

Can I ask any [i]mfakto[/i] users to help me out with some benchmark data? I want to update my [url=http://mersenne-aries.sili.net/mfaktc.php]mfaktc table[/url] to include AMD GPUs as well, but I need some data to base it on. Please send me benchmarks on a wide variety of GPUs (very old to very new, and very slow to very fast) so that I can get as accurate a picture as possible of how GFLOPS scales into GHz-days/day performance across the various products. For each GPU, I need the following 4 bits of data for [b]a single running instance[/b] (even if you normally run multiple instances, please just run one for this test):[list=1][*]GPU model (including clockspeed if overclocked)[*]assignment (exponent, from-bits, to-bits)[*]wall clock runtime[*]average GPU usage[/list]If you want to include the CPU model/speed and SievePrimes values as well, that's interesting but not required.

Please PM or email me the results as opposed to posting in this thread. I'll post back when I have enough data to make a reasonable chart.

KyleAskine 2012-01-12 03:44

[QUOTE=James Heinrich;285993]Can I ask any [i]mfakto[/i] users to help me out with some benchmark data? I want to update my [url=http://mersenne-aries.sili.net/mfaktc.php]mfaktc table[/url] to include AMD GPUs as well, but I need some data to base it on. Please send me benchmarks on a wide variety of GPUs (very old to very new, and very slow to very fast) so that I can get as accurate a picture as possible of how GFLOPS scales into GHz-days/day performance across the various products. For each GPU, I need the following 4 bits of data for [b]a single running instance[/b] (even if you normally run multiple instances, please just run one for this test):[list=1][*]GPU model (including clockspeed if overclocked)[*]assignment (exponent, from-bits, to-bits)[*]wall clock runtime[*]average GPU usage[/list]If you want to include the CPU model/speed and SievePrimes values as well, that's interesting but not required.

Please PM or email me the results as opposed to posting in this thread. I'll post back when I have enough data to make a reasonable chart.[/QUOTE]

Why isn't SievePrimes required data? If the GPU is the bottleneck, a different SievePrimes number will affect wall clock runtime, but not the other three variables.

Put another way: I can always lower SievePrimes and destroy my wall clock performance by increasing the number of candidates, but your metrics would miss the reason for the bad performance, since the GPU usage would be the same.

I will try to get you some values tomorrow. I have 6950s modded with shaders unlocked, and a 6570 that I can hopefully get you numbers for tomorrow!

James Heinrich 2012-01-12 13:38

[QUOTE=KyleAskine;286016]Why isn't SievePrimes required data?[/QUOTE]It's nice to have so I can see if I expect any given benchmark to be on the high or low side, but I don't need it per se for the calculations. There's enough (too much!) variance in the data (based on what I've seen of mfaktc data) that it doesn't really make much difference overall, my chart will just provide [i]rough[/i] guidelines, +/-10% at best.

KyleAskine 2012-01-12 14:00

[QUOTE=James Heinrich;286044]It's nice to have so I can see if I expect any given benchmark to be on the high or low side, but I don't need it per se for the calculations. There's enough (too much!) variance in the data (based on what I've seen of mfaktc data) that it doesn't really make much difference overall, my chart will just provide [i]rough[/i] guidelines, +/-10% at best.[/QUOTE]

I have no idea, and this could be way off target, but why don't you just get the M/s value and the GPU usage? Wouldn't that be the easiest benchmark to get? Or does that not account for everything?

James Heinrich 2012-01-12 15:27

[QUOTE=James Heinrich;285993]I want to update my [url=http://mersenne-aries.sili.net/mfaktc.php]mfaktc table[/url] to include AMD GPUs as well[/QUOTE]I have some rough data now, enough to at least put up the chart. I may refine it slightly as I get more data, but it should be reasonably close.

You'll notice that the Radeon+mfakto combination is considerably less efficient at turning theoretical GFLOPS into GHz-days/day TF results than GeForce+mfaktc. Right now I'm using a divider of 18 (for mfaktc, I'm using 14 for older v1.x GPUs, 5 for v2.0 and 7.5 for v2.1). So that's why you see a Radeon 6990 and a GeForce GTX 570 both expecting ~282GHz-days/day, even though the 6990 has 5100 GFLOPS to the 570's 1400.
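The arithmetic behind that comparison can be sketched in a few lines of Python. This is only an illustration of the "GFLOPS divided by a divider" estimate described above; the function and dictionary names are hypothetical, and placing the GTX 570 in the v2.0 class is an assumption made here so that the numbers line up with the post.

```python
# Dividers quoted in the post, keyed by the program/GPU class they apply to.
DIVIDERS = {
    "mfakto (Radeon)": 18,
    "mfaktc v1.x": 14,
    "mfaktc v2.0": 5,
    "mfaktc v2.1": 7.5,
}


def estimate_ghz_days_per_day(gflops, divider):
    """Rough TF throughput estimate: theoretical GFLOPS divided by the divider."""
    return gflops / divider


# Radeon 6990, about 5100 GFLOPS
radeon_6990 = estimate_ghz_days_per_day(5100, DIVIDERS["mfakto (Radeon)"])
# GeForce GTX 570, about 1400 GFLOPS (assumed to be in the v2.0 class)
gtx_570 = estimate_ghz_days_per_day(1400, DIVIDERS["mfaktc v2.0"])

print(round(radeon_6990, 1), round(gtx_570, 1))  # 283.3 280.0
```

With the rounded GFLOPS figures from the post, both estimates land close to the ~282 GHz-days/day mentioned there.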

More benchmark data is still welcome, especially from older/slower GPUs.


All times are UTC.