[QUOTE=LaurV;324114]
For me, this (letting the CPU free) is the [URL="http://en.wikipedia.org/wiki/Manna"]manna from the heaven[/URL]! (as everybody knows, my systems are all CPU-bottle-necked). Now I can run P-1, or LL, or DC or aliquots, with the CPU, which before I could not. THIS IS THE BIG ADVANTAGE. For which I bow again to the people who made this possible. :bow:[/QUOTE] I just stopped running mfaktc for months so I could do the other things :smile: By the way, how can I make it more responsive? I set GPUSieveProcessSize=8 and GPUSieveSize=16, and now it's usable, but still somewhat laggy. I don't really want to reduce GPUSieveSize further, so would something like reducing the sieve primes help? Or would that cause an even bigger performance hit than reducing GPUSieveSize further? |
mfaktc 0.20 and the small exponents
I've been running 0.20 on 60-61M exponents and am quite pleased with it. Faster, and "CPU-free", which is a plus, definitely.
Nevertheless, I was surprised when trying to run it on small exponents (by small I mean 2.6M exponents, from 62 to 64 bits). The GHz-d/d went down from ~258 to 40-45, even when using the "LessClasses" version. It is way slower than 0.19 for this type of work. Is there any setting I should look into, or is it just the way it is? |
Thank you Oliver and George :smile:
Now, my question. I am currently running 1 mfaktc 0.19 and 1 CUDALucas (DC) on my GTX 275, and 2 mfaktc 0.19 and 1 CUDALucas (DC+LL) on my GTX 580. If I run mfaktc 0.20, how much GPU can be used for CUDALucas? Luigi |
[QUOTE=lycorn;324124]I was surprised when trying to run it on small exponents (by small I mean 2.6M exponents, from 62 to 64 bits). The GHz-d/d went down from ~258 to 40-45, even when using the "LessClasses" version. It is way slower than 0.19 for this type of work.
Is there any setting I should look into, or is it just the way it is?[/QUOTE]GPU sieving is not enabled below 2[sup]64[/sup]. Not only that, but that range uses older, less-optimized kernels that are inherently slower. I have been doing a lot of <2[sup]64[/sup] TF above 1000M and the best I can come up with is about 140 GHz-d/d from my GTX 570, and that's using 6 CPU cores to boot. In comparison, running a single GPU-sieving instance in a normal range I can get about 420 GHz-d/d. So yes, it's pretty inefficient. Experiment with GridSize (my exponents run in 10 seconds or less, so GridSize=0 made a big improvement for me), and use v0.20-LESS_CLASSES 64-bit (in CPU-sieving cases, 64-bit is faster; for GPU-sieving 32-bit is faster). If you look at the mfaktc v0.20 .plan there's now a line for improved support below 2[sup]64[/sup], but from talking to Oliver it's apparently non-trivial, so don't hold your breath. |
[QUOTE=ET_;324132]If I run mfaktc 0.20, how much GPU can be used for Cudalucas?[/QUOTE]On your GTX 275: you can't -- GPU sieving isn't supported below CC 2.0 (that GPU is CC 1.3).
On any supported GPU: try and see. There's no controllable load-sharing, it's just a competition for GPU resources, whether it's mfaktc+CUDALucas, or multiple instances of mfaktc. You'll likely get somewhere around a 50:50 balance, or it may be biased towards one program or the other, depending on how the code flows. |
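The two conditions mentioned so far in this thread (compute capability 2.0 or newer, and factoring at or above 2[sup]64[/sup]) can be written down as a small check. This is only a sketch of what posters here have reported, not mfaktc's actual logic; the function name and arguments are hypothetical.

```python
def gpu_sieve_available(compute_capability, bit_min):
    """Rough check based on claims in this thread (hypothetical helper).

    compute_capability: (major, minor) tuple, e.g. (1, 3) for a GTX 275.
    bit_min: lower bit level of the TF assignment, e.g. 62 for "62 to 64 bits".
    """
    cc_ok = tuple(compute_capability) >= (2, 0)   # GPU sieving needs CC 2.0+
    range_ok = bit_min >= 64                      # not enabled below 2^64
    return cc_ok and range_ok
```

For example, `gpu_sieve_available((1, 3), 70)` is false because of the GPU, and `gpu_sieve_available((2, 0), 62)` is false because of the bit range.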
[QUOTE=James Heinrich;324137]On your GTX 275: you can't -- GPU sieving isn't supported below CC 2.0 (that GPU is CC 1.3).
On any supported GPU: try and see. There's no controllable load-sharing, it's just a competition for GPU resources, whether it's mfaktc+CUDALucas, or multiple instances of mfaktc. You'll likely get somewhere around 50:50 balance, or it may be biased towards one program or the other, depending on how the code flows. Easiest way to answer is try and see.[/QUOTE] I did it with mmff, and noticed that one instance of mmff nearly blocks every other program running on the GPU... I was wondering if the same behavior may be expected from the new mfaktc 0.20 Luigi |
[QUOTE=ET_;324139]I did it with mmff, and noticed that one instance of mmff nearly blocks every other program running on the GPU... I was wondering if the same behavior may be expected from the new mfaktc 0.20
Luigi[/QUOTE] Presumably. AFAIK, the code is very similar -- TheJudger took Prime95's sieve code (in turn built on work by rcv, bsquared, and axn IIRC), and Prime95 took TheJudger's TF code. :smile: |
A couple reference points.
I will put up benchmarks for my GTX 480 when I get home, but I'll need to set the sieve back to default. I had gotten an extra 1.3 GHz-days/day by setting it down to 70000. When I get home from work I'll reset the default, run 5 numbers and post the results to your form.
It's currently doing 374.85 GHz-days/day. That is about 25 GHz-days/day faster than when I was running 4 instances of 0.19 (one per CPU core), because my CPU couldn't keep up with the GPU. |
Is there a way to suppress the newline-posting after every 5 seconds?
For the benchmark (which asks for wall clock time), it would be quite useful to only see 1st and last line of the output. It was possible in the previous version, but I don't know whether it's still possible. |
[QUOTE=sonjohan;324177]Is there a way to suppress the newline-posting after every 5 seconds?
For the benchmark (which asks for wall clock time), it would be quite useful to only see 1st and last line of the output. It was possible in the previous version, but I don't know whether it's still possible.[/QUOTE] Would this be what you're looking for? (From mfaktc.ini)
[CODE]# possible values for PrintMode:
#  0: print a new line for each finished class
#  1: overwrite the current line (more compact output)
#
# Default: PrintMode=0

PrintMode=0
[/CODE] |
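Going by the comments in that excerpt, PrintMode=1 (overwrite the current line) looks like the closest match to the request. If you want to flip it from a script before a benchmark run, something like the following could work. It is a minimal sketch: `set_ini_value` is a hypothetical helper, not part of mfaktc, and it assumes mfaktc.ini uses plain `Key=Value` lines with `#` comments, as in the excerpt above.

```python
import re


def set_ini_value(ini_text, key, value):
    """Return ini_text with the first uncommented 'key=...' line set to value.

    Comment lines (starting with '#') are left alone. If the key is not
    present, a new 'key=value' line is appended at the end.
    """
    pattern = re.compile(
        r"^([ \t]*)" + re.escape(key) + r"[ \t]*=.*$", re.MULTILINE
    )
    new_text, count = pattern.subn(
        lambda m: "{}{}={}".format(m.group(1), key, value), ini_text, count=1
    )
    if count == 0:
        new_text = ini_text.rstrip("\n") + "\n{}={}\n".format(key, value)
    return new_text
```

Usage would be to read mfaktc.ini, call `set_ini_value(text, "PrintMode", "1")`, and write the result back (keeping a backup of the original file).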
[QUOTE=lycorn;324124]I've been running 0.20 on 60-61M exponents and am quite pleased with it. Faster, and "CPU-free", which is a plus, definitely.
Nevertheless, I was surprised when trying to run it on small exponents (by small I mean 2.6M exponents, from 62 to 64 bits). The GHz-d/d went down from ~258 to 40-45, even when using the "LessClasses" version. It is way slower than 0.19 for this type of work. Is there any setting I should look into, or is it just the way it is?[/QUOTE] Well, below 2[SUP]64[/SUP] mfaktc 0.20 should perform very similarly to 0.19. I didn't touch the (old) kernels which can handle those numbers.
[QUOTE=ET_;324139]I did it with mmff, and noticed that one instance of mmff nearly blocks every other program running on the GPU... I was wondering if the same behavior may be expected from the new mfaktc 0.20 Luigi[/QUOTE] All current GPUs can run only one application at a time (timesharing). CC 2.0 or newer GPUs can handle multiple kernels at the same time, but only when they are started from exactly one application (process). When they come from different applications they will be serialized. Only CC 3.5 can run kernels from different applications concurrently; Nvidia calls this "Hyper-Q". Currently only the GK110 chip is CC 3.5, and it is sold as the Tesla K20 for a high price. If you want to mix mfaktc and CUDALucas, you can run CUDALucas half of the time and mfaktc the rest. Oliver |
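The "half of the time each" suggestion can be automated with a small wrapper that gives each program the GPU to itself for a fixed time slice. This is a minimal sketch under loud assumptions: both programs must be able to resume from their own checkpoint files after being stopped, and `terminate()` must be an acceptable way to stop them (a program may need a gentler signal to write a final checkpoint). The function name is hypothetical.

```python
import subprocess


def alternate(commands, slice_seconds, rounds):
    """Run each command in turn for slice_seconds, repeated `rounds` times.

    Only one program runs at any moment, so they never compete for the GPU.
    Returns the commands in the order they were started.
    """
    started = []
    for _ in range(rounds):
        for cmd in commands:
            proc = subprocess.Popen(cmd)
            started.append(cmd)
            try:
                # Returns early if the program exits on its own.
                proc.wait(timeout=slice_seconds)
            except subprocess.TimeoutExpired:
                # ASSUMPTION: the program tolerates being stopped this way
                # and resumes from its checkpoint on the next start.
                proc.terminate()
                proc.wait()  # reap it before starting the next program
    return started
```

A call might look like `alternate([["./mfaktc.exe"], ["./CUDALucas"]], slice_seconds=3600, rounds=12)`; the executable names and paths here are placeholders for whatever your installation uses.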