mersenneforum.org  

Great Internet Mersenne Prime Search > Hardware > GPU Computing

2013-01-09, 08:37   #2058
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
16065₈ Posts

Quote:
Originally Posted by LaurV
For me, this (letting the CPU free) is the manna from the heaven! (as everybody knows, my systems are all CPU-bottle-necked). Now I can run P-1, or LL, or DC or aliquots, with the CPU, which before I could not. THIS IS THE BIG ADVANTAGE. For which I bow again to the people who made this possible.

I just stopped running mfaktc for months so I could do the other things.

By the way, how can I make it more responsive? I set GPUSieveProcessSize=8 and GPUSieveSize=16, and now it's usable, but still somewhat laggy. I don't really want to reduce GPUSieveSize further, so would something like reducing the sieve primes help? Or would that cause an even bigger performance hit than reducing GPUSieveSize further?
2013-01-09, 11:03   #2059
lycorn
"GIMFS"
Sep 2002
Oeiras, Portugal
3×491 Posts
mfaktc 0.20 and the small exponents

I've been running 0.20 on 60-61M exponents and am quite pleased with it. Faster, and "CPU-free", which is a plus, definitely.
Nevertheless, I was surprised when trying to run it on small exponents (by small I mean 2.6M exponents, from 62 to 64 bits). The GHz-d/d went down from ~258 to 40-45, and this even using the "LessClasses" version. It is way slower than 0.19 for this type of work.
Is there any setting I should look into, or is it just the way it is?
2013-01-09, 12:31   #2060
ET_
Banned
"Luigi"
Aug 2002
Team Italia
1001011010010₂ Posts

Thank you, Oliver and George.

Now, my question.

I am currently running one instance of mfaktc 0.19 and one of CUDALucas (DC) on my GTX 275, and two instances of mfaktc 0.19 and one of CUDALucas (DC+LL) on my GTX 580.

If I run mfaktc 0.20, how much of the GPU will be left for CUDALucas?

Luigi
2013-01-09, 13:00   #2061
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario
3421₁₀ Posts

Quote:
Originally Posted by lycorn
I was surprised when trying to run it on small exponents (by small I mean 2.6M exponents, from 62 to 64 bits). The GHz-d/d went down from ~258 to 40-45, and this even using the "LessClasses" version. It is way slower than 0.19 for this type of work.
Is there any setting I should look into, or is it just the way it is?

GPU sieving is not enabled below 2^64. Not only that, but below 2^64 mfaktc uses older, less-optimized kernels that are inherently slower. I have been doing a lot of TF below 2^64 on exponents above 1000M, and the best I can get is about 140 GHd/d from my GTX 570, and that's using 6 CPU cores to boot. In comparison, running a single GPU-sieving instance in a normal range I get about 420 GHd/d. So yes, it's pretty inefficient.
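As a rough illustration of the rule described above, here is a minimal Python sketch. It assumes the only thing that matters is the lower bit level of the TF assignment (the real kernel selection inside mfaktc may also depend on the exponent and the full bit range), and `sieve_location` is a hypothetical helper, not an mfaktc function. It also works out the throughput ratio from the two GTX 570 figures quoted.

```python
def sieve_location(bit_min: int) -> str:
    """Hypothetical helper: where candidates get sieved for TF from 2^bit_min upward.

    Assumption: mfaktc 0.20 enables GPU sieving only for factors of at least 2^64;
    lower ranges fall back to the older kernels with CPU sieving.
    """
    return "GPU" if bit_min >= 64 else "CPU"


# Throughput figures quoted in the post (GTX 570), in GHz-days/day.
cpu_sieved_rate = 140.0  # TF below 2^64, fed by 6 CPU cores
gpu_sieved_rate = 420.0  # one GPU-sieving instance in a normal range

if __name__ == "__main__":
    print(sieve_location(62))  # lycorn's 62-to-64-bit work -> CPU
    print(sieve_location(72))  # a range above 2^64 -> GPU
    # Fraction of the GPU-sieving throughput that remains below 2^64 -> 0.33
    print(f"{cpu_sieved_rate / gpu_sieved_rate:.2f}")
```

In other words, TF below 2^64 on that card runs at roughly a third of the GPU-sieving rate, even with six CPU cores doing the sieving.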

Experiment with GridSize (my exponents run in 10 seconds or less, so GridSize=0 made a big improvement for me), and use the v0.20-LESS_CLASSES 64-bit build (with CPU sieving the 64-bit build is faster; with GPU sieving the 32-bit build is faster).

If you look at the mfaktc v0.20 .plan, there's now a line for improved support below 2^64, but from talking to Oliver it's apparently non-trivial, so don't hold your breath.
2013-01-09, 13:20   #2062
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario
11×311 Posts

Quote:
Originally Posted by ET_
If I run mfaktc 0.20, how much GPU can be used for Cudalucas?

On your GTX 275: you can't -- GPU sieving isn't supported below CC 2.0 (that GPU is CC 1.3).

On any supported GPU: try and see. There's no controllable load-sharing; it's just a competition for GPU resources, whether it's mfaktc+CUDALucas or multiple instances of mfaktc. You'll likely get somewhere around a 50:50 balance, or it may be biased towards one program or the other, depending on how the code flows. The easiest way to find out is to try it and see.
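A minimal Python sketch of the compute-capability rule above. The `major`/`minor` numbers are assumed to come from something like CUDA's `cudaGetDeviceProperties()` or the `deviceQuery` sample; `supports_gpu_sieving` itself is a hypothetical helper, not part of mfaktc.

```python
def supports_gpu_sieving(major: int, minor: int) -> bool:
    """Hypothetical check: GPU sieving in mfaktc 0.20 needs compute capability >= 2.0."""
    # Python compares tuples element by element, so (1, 3) < (2, 0) < (2, 1).
    return (major, minor) >= (2, 0)


if __name__ == "__main__":
    print(supports_gpu_sieving(1, 3))  # GTX 275 (CC 1.3): False
    print(supports_gpu_sieving(2, 0))  # GTX 580 (CC 2.0): True
```

Comparing `(major, minor)` as a tuple avoids the classic mistake of comparing a float such as `1.3` against `2.0`, which breaks for versions like 2.10.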
2013-01-09, 13:30   #2063
ET_
Banned
"Luigi"
Aug 2002
Team Italia
2×3×11×73 Posts

Quote:
Originally Posted by James Heinrich
On your GTX 275: you can't -- GPU sieving isn't supported below CC 2.0 (that GPU is CC 1.3).

On any supported GPU: try and see. There's no controllable load-sharing, it's just a competition for GPU resources, whether it's mfaktc+CUDALucas, or multiple instances of mfaktc. You'll likely get somewhere around 50:50 balance, or it may be biased towards one program or the other, depending on how the code flows. Easiest way to answer is try and see.

I did it with mmff, and noticed that one instance of mmff nearly blocks every other program running on the GPU... I was wondering if the same behavior may be expected from the new mfaktc 0.20.

Luigi
2013-01-09, 13:39   #2064
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
1110000110101₂ Posts

Quote:
Originally Posted by ET_
I did it with mmff, and noticed that one instance of mmff nearly blocks every other program running on the GPU... I was wondering if the same behavior may be expected from the new mfaktc 0.20

Luigi

Presumably. AFAIK, the code is very similar -- TheJudger took Prime95's sieve code (in turn built on work by rcv, bsquared, and axn IIRC), and Prime95 took TheJudger's TF code.
2013-01-09, 16:57   #2065
Aramis Wyler
"Bill Staffen"
Jan 2013
Pittsburgh, PA, USA
23·53 Posts
A couple reference points.

I will put up benchmarks for my GTX 480 when I get home, but I'll need to set the sieve back to its default first: I had gained an extra 1.3 GHz-days/day by setting it down to 70000. When I get home from work I'll restore the default, run 5 numbers and post the results to your form.

It's currently doing 374.85 GHz-days/day. That's about 25 GHz-days/day faster than when I was running 4 instances of 0.19 (one per CPU core), because my CPU couldn't keep up with the GPU.
2013-01-09, 18:18   #2066
sonjohan
May 2003
Belgium
278₁₀ Posts

Is there a way to suppress the newline-posting after every 5 seconds?
For the benchmark (which asks for wall clock time), it would be quite useful to see only the first and last lines of the output.
It was possible in the previous version, but I don't know whether it's still possible.
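Independently of mfaktc's own options, the wall-clock time and the first and last lines of output can also be captured from outside. Below is a minimal Python sketch, assuming you launch mfaktc from a script. The command in the example is a stand-in (a tiny Python one-liner), so substitute your own mfaktc invocation; `run_and_summarize` is a hypothetical helper name.

```python
import subprocess
import sys
import time


def run_and_summarize(cmd):
    """Run cmd; return (wall-clock seconds, first output line, last output line)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, capture_output=True, text=True)
    wall = time.perf_counter() - start
    # splitlines() also splits on '\r', so overwritten status lines are separated too.
    lines = [ln for ln in result.stdout.splitlines() if ln.strip()]
    first = lines[0] if lines else ""
    last = lines[-1] if lines else ""
    return wall, first, last


if __name__ == "__main__":
    # Stand-in command (hypothetical); replace with your actual mfaktc command line.
    demo = [sys.executable, "-c", "for i in range(5): print('class', i)"]
    wall, first, last = run_and_summarize(demo)
    print(f"wall clock: {wall:.2f} s")
    print(first)
    print(last)
```

Measuring around the whole process includes startup time, which is what a wall-clock benchmark usually wants.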
2013-01-09, 18:23   #2067
kladner
"Kieren"
Jul 2011
In My Own Galaxy!
2×3×1,693 Posts

Quote:
Originally Posted by sonjohan
Is there a way to suppress the newline-posting after every 5 seconds?
For the benchmark (which asks for wall clock time), it would be quite useful to only see 1st and last line of the output.
It was possible in the previous version, but I don't know whether it's still possible.

Would this be what you're looking for? (From mfaktc.ini)
Code:
# possible values for PrintMode:
# 0: print a new line for each finished class
# 1: overwrite the current line (more compact output)
#
# Default: PrintMode=0

PrintMode=0
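Judging from the fragment above, mfaktc.ini is a flat list of `Key=Value` lines with `#` comments and no `[section]` headers, so Python's `configparser` won't read it as-is. A minimal hand-rolled reader is enough to confirm which `PrintMode` is actually set; per the comments above, `PrintMode=1` overwrites the current line instead of printing a new one per class. `parse_mfaktc_ini` is a hypothetical helper, not part of mfaktc.

```python
def parse_mfaktc_ini(text: str) -> dict:
    """Collect Key=Value settings, ignoring blank lines and '#' comments.

    If a key appears twice, the last occurrence wins in this sketch
    (an assumption; mfaktc's own parser may behave differently).
    """
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # skip lines without '='
            settings[key.strip()] = value.strip()
    return settings


SAMPLE = """\
# possible values for PrintMode:
# 0: print a new line for each finished class
# 1: overwrite the current line (more compact output)
#
# Default: PrintMode=0

PrintMode=1
"""

if __name__ == "__main__":
    print(parse_mfaktc_ini(SAMPLE))  # {'PrintMode': '1'}
```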
2013-01-09, 21:11   #2068
TheJudger
"Oliver"
Mar 2005
Germany
11×101 Posts

Quote:
Originally Posted by lycorn
I've been running 0.20 on 60-61M exponents and am quite pleased with it. Faster, and "CPU-free", which is a plus, definitely.
Nevertheless, I was surprised when trying to run it on small exponents (by small I mean 2.6M exponents, from 62 to 64 bits). The GHz-d/d went down from ~258 to 40-45, and this even using the "LessClasses" version. It is way slower than 0.19 for this type of work.
Is there any setting I should look into, or is it just the way it is?

Well, below 2^64 mfaktc 0.20 should perform very similarly to 0.19. I didn't touch the (old) kernels that handle those numbers.

Quote:
Originally Posted by ET_
I did it with mmff, and noticed that one instance of mmff nearly blocks every other program running on the GPU... I was wondering if the same behavior may be expected from the new mfaktc 0.20

Luigi

All current GPUs can run only one application at a time (timesharing). GPUs with CC 2.0 or newer can run multiple kernels launched from exactly one application (process) at the same time; kernels coming from different applications are serialized. CC 3.5 GPUs can run kernels from different applications concurrently; Nvidia calls this "Hyper-Q". Currently only the GK110 chip is CC 3.5, and Nvidia sells it as the Tesla K20 at a high price.
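The scheduling rules described above can be written down as a small Python model. It encodes only what this post says (it is not taken from Nvidia's documentation), and `kernels_can_overlap` is a hypothetical name.

```python
def kernels_can_overlap(cc: tuple, same_process: bool) -> bool:
    """Model of the post: can two kernels run on the GPU at the same moment?"""
    if cc >= (3, 5):
        return True  # "Hyper-Q": kernels from different applications too
    if cc >= (2, 0):
        return same_process  # concurrent kernels only within one process
    return False  # older GPUs: one kernel at a time, pure timesharing


if __name__ == "__main__":
    # mfaktc and CUDALucas are separate processes:
    print(kernels_can_overlap((2, 0), same_process=False))  # GTX 580: False
    print(kernels_can_overlap((3, 5), same_process=False))  # Tesla K20: True
```

So for mfaktc plus CUDALucas on a CC 2.x card, the model says the two programs never run kernels simultaneously; they only take turns.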

If you want to mix mfaktc and CUDALucas, you can run CUDALucas for half of the time and mfaktc for the remaining time.
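Under pure timesharing, a rough way to estimate what each program keeps is to scale its standalone throughput by its share of GPU time. Below is a minimal Python sketch. The linear scaling is an assumption (switching between programs may cost something, so real numbers can be somewhat lower), the 420 GHz-days/day figure is James Heinrich's GPU-sieving rate from his GTX 570 earlier in the thread, and CUDALucas is expressed as a fraction of its own standalone speed. `timeshared_rates` is a hypothetical helper.

```python
def timeshared_rates(standalone: dict, shares: dict) -> dict:
    """Estimate each program's throughput when the GPU is time-sliced.

    Assumes throughput scales linearly with the share of GPU time.
    """
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("GPU time shares must add up to 1")
    return {name: standalone[name] * share for name, share in shares.items()}


if __name__ == "__main__":
    standalone = {"mfaktc": 420.0,    # GHz-days/day, GTX 570 figure from the thread
                  "CUDALucas": 1.0}   # 1.0 = its own standalone speed
    print(timeshared_rates(standalone, {"mfaktc": 0.5, "CUDALucas": 0.5}))
    # {'mfaktc': 210.0, 'CUDALucas': 0.5}
```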

Oliver

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.