mersenneforum.org  

2012-01-11, 15:42   #309
kracker
"Mr. Meeseeks"
Jan 2012
California, USA
23×271 Posts

Quote:
Originally Posted by Bdot
Thanks for this info! Could you please also post the OpenCL device info part as mfakto reports it? If I can easily figure out we're running on Llano, then I can enable a zero-memory-copy optimization that should increase GPU utilisation by ~10% when only a single instance is running (and by a small amount for multi-instance).

Ok, here it is.
(btw, it only uses about 65-80% of my GPU when I run 1 instance... that might be normal?)

OpenCL device info
  name                       BeaverCreek (Advanced Micro Devices, Inc.)
  device (driver) version    OpenCL 1.1 AMD-APP (851.4) (CAL 1.4.1646 (VM))
  maximum threads per block  256
  maximum threads per grid   16777216
  number of multiprocessors  5 (400 compute elements (estimate for ATI GPUs))
  clock rate                 600MHz

2012-01-11, 15:52   #310
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3×29×83 Posts

Quote:
Originally Posted by kracker
(btw, it only uses about 65-80% of my GPU when I run 1 instance... that might be normal?)
That depends on what CPU you're using. I'll try and link you to an old post that I have no intention of rewriting from scratch.



[offtopic]Welcome to the GPU to 72 team! Except it seems you haven't actually gotten work from the tool. You can find more info in the GPU to 72 subforum somewhere around here. Happy crunching![/offtopic]

2012-01-11, 16:10   #311
KyleAskine
Oct 2011
Maryland
290₁₀ Posts

Quote:
Originally Posted by kracker
(btw, it only uses about 65-80% of my GPU when I run 1 instance... that might be normal?)
It is probably the issue that Bdot mentioned above. You should be at around 90% or so with one instance in my opinion, since it is obvious that your CPU can sieve way faster than your GPU can process.

2012-01-11, 16:47   #312
kracker
"Mr. Meeseeks"
Jan 2012
California, USA
23×271 Posts

Ah, ok thanks :)
Oh, and also, I was wondering: is there any way to reduce its priority? I have to pause it every time I run a GPU-intensive program or game. Thanks :)

(P.S.: Is there a way to automatically pull assignments? I just realized I'll have to manually get more once it's done.)

2012-01-11, 21:07   #313
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
16065₈ Posts

Quote:
Originally Posted by kracker
Ah, ok thanks :)
Oh, and also, I was wondering: is there any way to reduce its priority? I have to pause it every time I run a GPU-intensive program or game. Thanks :)
Try using a batch file; you can set CPU affinity and priority in the command to start mfakto. Or you can change it via Task Manager after it's already running. (Sadly I have yet to find a decent post on the GPU usage thing.)
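A rough sketch of what such a launch command could look like, assuming Windows and an executable named mfakto.exe (the file name and the helper function are illustrative, not taken from mfakto's documentation). It only builds the command string for the built-in Windows `start` command, which accepts a priority switch such as /low and an /affinity switch with a hexadecimal CPU mask:

```python
# Hypothetical helper: builds a Windows `start` command line for a batch file.
# "mfakto.exe" is an assumed executable name; adjust it to your install.
ALLOWED_PRIORITIES = {"low", "belownormal", "normal", "abovenormal", "high", "realtime"}


def build_start_command(exe="mfakto.exe", priority="low", affinity_mask=None):
    """Return a `start` command that launches exe at the given priority.

    affinity_mask is an integer bit mask of allowed CPU cores
    (bit 0 = core 0, bit 1 = core 1, ...); `start` expects it in hex.
    """
    if priority not in ALLOWED_PRIORITIES:
        raise ValueError(f"unknown priority: {priority}")
    # The empty "" is the window title, which `start` takes as its first quoted argument.
    parts = ["start", '""', "/" + priority]
    if affinity_mask is not None:
        parts += ["/affinity", format(affinity_mask, "x")]
    parts.append(exe)
    return " ".join(parts)


if __name__ == "__main__":
    # Low priority, pinned to the second core only (mask 0b10).
    print(build_start_command(priority="low", affinity_mask=0b10))
```

Paste the printed line into a .bat file next to the executable. As noted later in the thread, this only affects the CPU side of the program.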
Edit: As a holdover:
Quote:
Originally Posted by Dubslow
The CPU wait indicates how long the CPU is waiting for work. If it's greater than 1000, then the CPU is waiting a lot, which means the GPU is overwhelmed. SievePrimes controls how much work is done on the CPU; that's why the program auto-adjusted that up to 200,000 (the default is 25,000, and 5,000 is the minimum).
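To make the idea concrete, here is a toy sketch of that kind of adjustment. This is not mfakto's actual algorithm: the step size is invented, 200,000 is assumed to be the upper limit only because the quote says the value was raised that far, and the sketch never lowers the value. Only the 1000 threshold, the 25,000 default and the 5,000 minimum come from the quote.

```python
# Toy model of SievePrimes auto-adjustment; NOT mfakto's real logic.
SIEVE_PRIMES_MIN = 5_000        # minimum, from the quoted post
SIEVE_PRIMES_DEFAULT = 25_000   # default, from the quoted post
SIEVE_PRIMES_MAX = 200_000      # assumed upper limit (value reached in the quote)
CPU_WAIT_THRESHOLD = 1_000      # "greater than 1000" means the CPU waits a lot


def adjust_sieve_primes(current, cpu_wait, step_fraction=0.125):
    """Raise SievePrimes when the CPU spends a long time waiting for the GPU.

    step_fraction is a made-up tuning knob for this sketch.
    """
    if cpu_wait > CPU_WAIT_THRESHOLD:
        # The CPU has spare time, so let it sieve more and send fewer candidates.
        proposed = current + max(1, int(current * step_fraction))
    else:
        proposed = current
    # Keep the value inside the allowed range.
    return max(SIEVE_PRIMES_MIN, min(SIEVE_PRIMES_MAX, proposed))


if __name__ == "__main__":
    value = SIEVE_PRIMES_DEFAULT
    for wait in (1500, 1800, 900):
        value = adjust_sieve_primes(value, wait)
        print(wait, value)
```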
Quote:
Originally Posted by kracker
(P.S.: Is there a way to automatically pull assignments? I just realized I'll have to manually get more once it's done.)
That is being worked on at the moment; unfortunately it's not ready yet.

Last fiddled with by Dubslow on 2012-01-11 at 21:57

2012-01-11, 22:32   #314
Bdot
Nov 2010
Germany
1001010101₂ Posts

Thanks for the device info, I'll put that on the wishlist ;-)

Quote:
Originally Posted by kracker
Ah, ok thanks :)
Oh, and also, I was wondering: is there any way to reduce its priority? I have to pause it every time I run a GPU-intensive program or game. Thanks :)
Adding to Dubslow's comment:

While you can lower the priority as mentioned (it is not built in), it may not give you what you want. The reason is that the priority setting applies to the CPU part only. On the GPU there is no such thing as priorities; it's all round-robin on the same level. mfakto tries to keep 5 blocks (tasks) in the GPU queue, which can make the UI laggy if, for instance, window movements have to wait until these 5 tasks are processed.

You can try two settings in mfakto.ini:
GridSize defines how many factor candidates one block consists of. Lowering this value should already increase responsiveness a lot, at the expense of a little more CPU overhead.
NumStreams is the number of blocks being scheduled. Lowering it to 3 or 2 lets other tasks be served more quickly, but mfakto will have a smaller buffer to cover fluctuations in available CPU power.

BTW, the relatively low GPU utilization can also occur if the CPU cores are rather busy. Sometimes the auto-adjusting of the SievePrimes value gets confused if there is no CPU available to serve the GPU queue: the time it took to get the required CPU power is then wrongly interpreted as CPU idle time spent waiting for the GPU to finish. Try setting SievePrimesAdjust=0 and SievePrimes=100000 (you will need to test which value works well). Alternatively, set up two copies of mfakto to run in parallel. Then they can cover each other's gaps in GPU utilisation.
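If you would rather script these changes than edit the file by hand, a small sketch follows. It assumes mfakto.ini consists of plain Key=value lines with # comments, which is an assumption about the file format, and the helper name is made up. GridSize is left out because no concrete value is suggested above.

```python
# Hypothetical helper for changing Key=value settings in an ini-style text.
# Assumes one "Key=value" per line and "#" for comments (not verified against mfakto).
def apply_settings(ini_text, settings):
    """Return ini_text with the given keys set; keys not present are appended."""
    remaining = dict(settings)
    out = []
    for line in ini_text.splitlines():
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip() if "=" in stripped else None
        if key in remaining and not stripped.startswith("#"):
            # Replace the existing assignment with the new value.
            out.append(f"{key}={remaining.pop(key)}")
        else:
            # Comments, blank lines and unrelated settings pass through unchanged.
            out.append(line)
    for key, value in remaining.items():
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Made-up sample content, only for demonstration.
    sample = "# sample\nSievePrimes=25000\nSievePrimesAdjust=1\nNumStreams=5\n"
    wanted = {"SievePrimesAdjust": 0, "SievePrimes": 100000, "NumStreams": 3}
    print(apply_settings(sample, wanted), end="")
```

Keep a backup of the original file and restart mfakto after changing it.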

2012-01-11, 23:30   #315
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario
23×149 Posts

Could any mfakto users help me out with some benchmark data? I want to update my mfaktc table to include AMD GPUs as well, but I need some data to base it on. Please send me benchmarks on a wide variety of GPUs (very old to very new, and very slow to very fast) so that I can get as accurate a picture as possible of how GFLOPS scales into GHz-days/day performance across the various products. For each GPU, I need the following 4 bits of data for a single running instance (even if you normally run multiple instances, please just run one for this test):
  1. GPU model (including clockspeed if overclocked)
  2. assignment (exponent, from-bits, to-bits)
  3. wall clock runtime
  4. average GPU usage
If you want to include the CPU model/speed and SievePrimes values as well that's interesting, but not required.

Please PM or email me the results as opposed to posting in this thread. I'll post back when I have enough data to make a reasonable chart.

2012-01-12, 03:44   #316
KyleAskine
Oct 2011
Maryland
2·5·29 Posts

Quote:
Originally Posted by James Heinrich
Could any mfakto users help me out with some benchmark data? I want to update my mfaktc table to include AMD GPUs as well, but I need some data to base it on. Please send me benchmarks on a wide variety of GPUs (very old to very new, and very slow to very fast) so that I can get as accurate a picture as possible of how GFLOPS scales into GHz-days/day performance across the various products. For each GPU, I need the following 4 bits of data for a single running instance (even if you normally run multiple instances, please just run one for this test):
  1. GPU model (including clockspeed if overclocked)
  2. assignment (exponent, from-bits, to-bits)
  3. wall clock runtime
  4. average GPU usage
If you want to include the CPU model/speed and SievePrimes values as well that's interesting, but not required.

Please PM or email me the results as opposed to posting in this thread. I'll post back when I have enough data to make a reasonable chart.
Why isn't SievePrimes required data? If the GPU is the bottleneck, a different SievePrimes number will affect wall clock runtime, but not the other three variables.

Put another way: I can always lower SievePrimes and destroy my wall clock performance by increasing the number of candidates, but your metrics would miss the reason the performance is bad, since the GPU usage would be the same.

I will try to get you some values tomorrow. I have 6950s modded with shaders unlocked, and a 6570 that I can hopefully get you numbers for as well!

2012-01-12, 13:38   #317
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario
23·149 Posts

Quote:
Originally Posted by KyleAskine
Why isn't SievePrimes required data?
It's nice to have so I can see whether to expect any given benchmark to be on the high or low side, but I don't need it per se for the calculations. There's enough (too much!) variance in the data (based on what I've seen of mfaktc data) that it doesn't really make much difference overall; my chart will just provide rough guidelines, +/-10% at best.

2012-01-12, 14:00   #318
KyleAskine
Oct 2011
Maryland
2×5×29 Posts

Quote:
Originally Posted by James Heinrich
It's nice to have so I can see whether to expect any given benchmark to be on the high or low side, but I don't need it per se for the calculations. There's enough (too much!) variance in the data (based on what I've seen of mfaktc data) that it doesn't really make much difference overall; my chart will just provide rough guidelines, +/-10% at best.
I have no idea, and this could be way off target, but why don't you just get the M/s value and the GPU usage? Wouldn't that be the easiest benchmark to get? Or does that not account for everything?

2012-01-12, 15:27   #319
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario
23×149 Posts

Quote:
Originally Posted by James Heinrich
I want to update my mfaktc table to include AMD GPUs as well
I have some rough data now, enough to at least put up the chart. I may refine it slightly as I get more data, but it should be reasonably close.

You'll notice that the Radeon+mfakto combination is considerably less efficient at turning theoretical GFLOPS into GHz-days/day TF results than GeForce+mfaktc. Right now I'm using a divider of 18 (for mfaktc, I'm using 14 for older v1.x GPUs, 5 for v2.0 and 7.5 for v2.1). That's why you see a Radeon 6990 and a GeForce GTX 570 both expecting ~282 GHz-days/day, even though the 6990 has 5100 GFLOPS to the 570's 1400.
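As a quick check of the arithmetic, the estimate is simply theoretical GFLOPS divided by the divider for that program and GPU generation. The sketch below uses the dividers and GFLOPS figures quoted above; the dictionary keys and function name are made up, and treating the GTX 570 as a "v2.0" part is an assumption on my side.

```python
# Dividers quoted in the post; the key names are illustrative labels only.
DIVIDERS = {
    "mfakto": 18.0,       # Radeon + mfakto
    "mfaktc_v1x": 14.0,   # older v1.x GeForce GPUs
    "mfaktc_v20": 5.0,    # v2.0 GeForce GPUs
    "mfaktc_v21": 7.5,    # v2.1 GeForce GPUs
}


def estimate_ghz_days_per_day(gflops, kind):
    """Rough TF throughput estimate: theoretical GFLOPS divided by a fitted divider."""
    return gflops / DIVIDERS[kind]


if __name__ == "__main__":
    # Radeon 6990: 5100 GFLOPS with mfakto.
    print(round(estimate_ghz_days_per_day(5100, "mfakto"), 1))
    # GeForce GTX 570: 1400 GFLOPS with mfaktc (assumed to be a v2.0 GPU).
    print(round(estimate_ghz_days_per_day(1400, "mfaktc_v20"), 1))
```

Both results land within about 1% of the ~282 GHz-days/day quoted above, well inside the stated +/-10% accuracy of the chart.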

More benchmark data is still welcome, especially from older/slower GPUs.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.