mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

bcp19 2012-01-06 16:13

[QUOTE=flashjh;285081]Your CPU does some sieving, so that's normal. Everything from your picture seems good. You could try running two instances to see if you get more throughput.[/QUOTE]

Doubtful. The sieveprimes is not down to 5000 which means the GPU is not outperforming the CPU. From his M/s I'd guess that he'd only see a 10-15% increase with 2 instances, which is kind of a waste of a second core.

Yes, you'd need 2 folders to run 2 instances.

kladner 2012-01-06 16:14

[QUOTE=xtreme2k;285078]I am completely new to GPU TFing and have some stats to share with you guys.

System
2600K/GTX460 stock

During TFing on the GPU it seems like the CPU is rather busy as well, which in turn uses a fair bit of power, 26.8 watts in this case. Is this normal? Is there any way to run this more efficiently? The same CPU idling uses about 5-7 W of power.

The GTX seems to churn out a 70-71 bit TF in about 1 hour and a 71-72 in about 2 hours, which is fairly fast (I think) compared to pure CPU TFing.[/QUOTE]

When mfaktc is running the CPU prepares data to feed to the GPU. It is normal for the CPU usage to go up. Many people here (though not all, I think) assign affinity for an instance of mfaktc to a particular core. There is considerable discussion in this thread on ways to do this when mfaktc is started. The previous page has some of this discussion.

The GPU is a lot faster than a CPU for TF. The GTX 460 is a good worker in this, though of course not the latest and greatest. My 460 does turn out lots of TF work.

EDIT: The consensus now seems to be that it is a waste of CPU time to do TF there. Fortunately, there are other tasks at which an i7 will shine, such as LL or P-1.
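For readers who want to set affinity at launch rather than through Task Manager, here is a minimal sketch of one way to build the command. It assumes the Windows `cmd` built-in `start /affinity`, which takes a hexadecimal bitmask with bit 0 standing for the first logical core. The executable name and the helper names are placeholders, not taken from this thread.

```python
def affinity_mask(cores):
    """Return a bitmask with one bit set per 0-based logical core."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core indices must be non-negative")
        mask |= 1 << core
    return mask


def start_command(exe, cores):
    """Build a cmd.exe line that launches exe pinned to the given cores.

    'start /affinity' expects the mask as a hexadecimal number.
    """
    return f"start /affinity {affinity_mask(cores):X} {exe}"


# Hypothetical example: pin a program to the last logical core of an
# 8-thread CPU (0-based core 7).
print(start_command("mfaktc.exe", [7]))
```

The printed line is meant to be pasted into a batch file. Note that the mask counts cores from 0, while several posts in this thread count from 1.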

James Heinrich 2012-01-06 16:18

[QUOTE=xtreme2k;285082]Would you also run Prime95 on 3 CPU cores (TF or other units) at the same time?[/QUOTE]

Absolutely run Prime95 on the other 3 cores (as said above, it is best to lock affinity to individual cores for both mfaktc and Prime95). But do [i]not[/i] run TF on the CPU cores: that's a waste of time, since your GPU is much, much faster. Use your CPU for P-1 or L-L instead.

xtreme2k 2012-01-06 16:27

Thanks for all the responses. I have used Task Manager to set the affinity for the thread for now. It's getting late for me now in Sydney, so I will come back to this tomorrow!

How much quicker does a 570/580GTX TF?

flashjh 2012-01-06 16:44

[QUOTE=bcp19;285083]Doubtful. The sieveprimes is not down to 5000 which means the GPU is not outperforming the CPU. From his M/s I'd guess that he'd only see a 10-15% increase with 2 instances, which is kind of a waste of a second core.

Yes, you'd need 2 folders to run 2 instances.[/QUOTE]

You're right. I wouldn't put more than one core per instance, but I think his GPU could handle more.

[QUOTE=kladner;285084]When mfaktc is running the CPU prepares data to feed to the GPU. It is normal for the CPU usage to go up. Many people here (though not all, I think) assign affinity for an instance of mfaktc to a particular core. There is considerable discussion in this thread on ways to do this when mfaktc is started. The previous page has some of this discussion.

The GPU is a lot faster than a CPU for TF. The GTX 460 is a good worker in this, though of course not the latest and greatest. My 460 does turn out lots of TF work.

EDIT: The consensus now seems to be that it is a waste of CPU time to do TF there. Fortunately, there are other tasks at which an i7 will shine, such as LL or P-1.[/QUOTE]

This is how I run my dual GTX 580s on a Phenom 1055T:

- Five instances each running TF on one 580.
- CUDA running on the other 580
- Prime95 running 1 P-1

Instances 1-5 each have their affinity set to one core (cores 1-5, respectively).
CUDA also uses core 5.
Prime95 uses core 6.

Instances 1-4 run First time TFs. Instance 5 runs DC TF.

This balance gives me a SievePrimes value of about 10,000, with wait times ranging from 2.3 to 16 depending on the work being done. CUDA takes 9 ms per iteration, and Prime95 gets the time left over by the OS on core 6. The system gets a little laggy during stage 2 of P-1 using 16 GB of RAM, but otherwise it all runs well.

Without setting affinity I never get a good balance, and SievePrimes always bottoms out at 5000.

[QUOTE=xtreme2k;285089]Thanks for all the responses. I have used Task Manager to set the affinity for the thread for now. It's getting late for me now in Sydney, so I will come back to this tomorrow!

How much quicker does a 570/580GTX TF?[/QUOTE]

Check out James' site [URL="http://mersenne-aries.sili.net/mfaktc.php"]here[/URL]. It will give you a good idea of what performance you'd get on different GPUs.

Dubslow 2012-01-06 23:10

As a matter of fact, for all those discussing whether or not he should run another instance: I have the exact same setup, a 2600K hosting a 460. Both are slightly overclocked, but if his are at stock his experience should be pretty much the same. I run with hyper-threading on, and Prime95 with 3 workers at two threads per worker on logical cores 1-6, which correspond to physical cores 1-3 in Windows. That is, Worker 1 has one thread on core 1 and one on core 2, Worker 2 has one thread on core 3 and one on core 4, etc. I pin mfaktc to core 8 (I've found little difference between running it on just one of the pair of logical cores or on both) with SievePrimes=5000 (and AutoAdjust=0) and get, depending on frequencies, 80-95% load (sometimes up to 100%). For me, the gain from running two instances is less than the LL/P-1 work that the core can do for Prime95.
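The core layout described above can be sketched as follows. This assumes, as the post implies, that logical cores are numbered from 1 and that logical cores 2n-1 and 2n share physical core n. The function names are illustrative only.

```python
def physical_core(logical):
    """Map a 1-based logical core number to its 1-based physical core."""
    if logical < 1:
        raise ValueError("logical core numbers start at 1")
    return (logical + 1) // 2


def worker_threads(worker):
    """Return the two logical cores used by a 1-based Prime95 worker."""
    if worker < 1:
        raise ValueError("worker numbers start at 1")
    return (2 * worker - 1, 2 * worker)


# Three Prime95 workers on logical cores 1-6, mfaktc pinned to core 8.
layout = {f"Prime95 worker {w}": worker_threads(w) for w in (1, 2, 3)}
layout["mfaktc"] = (8,)

for name, cores in layout.items():
    phys = sorted({physical_core(c) for c in cores})
    print(f"{name}: logical {cores} -> physical {phys}")
```

Under this numbering each Prime95 worker keeps both of its threads on a single physical core, and mfaktc sits alone on the fourth.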

xtreme2k 2012-01-06 23:56

Would you guys recommend running 4-6 threads on 1 LL test? (due to HT)

James Heinrich 2012-01-07 00:37

[QUOTE=xtreme2k;285152]Would you guys recommend running 4-6 threads on 1 LL test? (due to HT)[/QUOTE]

No, best performance is one L-L test per (real) core. You can assign 2 threads per test to make use of hyperthreading if you like, but it may or may not make any actual performance difference.

Dubslow 2012-01-07 00:42

As I said above, on my 2600K, I run two threads per test, where each thread is assigned to half of the same physical core.

nucleon 2012-01-07 00:46

I've finally migrated all mfaktc instances over to version 0.18.

The last 6-day average across the entire farm, with all results from v0.18, is 1662 GHz-days/day.

The previous average was 1400-ish GHz-days/day. To confuse things, I've added about 100 GHz-days/day of capacity in there as well by swapping out a CPU. :)
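As a rough worked example of what those figures imply, the sketch below separates the added capacity from the remaining gain. It treats "1400-ish" and "about 100" as exact, which they are not, so the result is only a ballpark.

```python
previous_avg = 1400    # GHz-days/day before the migration (approximate)
added_capacity = 100   # GHz-days/day from the CPU swap (approximate)
current_avg = 1662     # GHz-days/day, 6-day average on v0.18

# Throughput expected from the hardware change alone.
baseline = previous_avg + added_capacity

# What is left over once the hardware change is accounted for.
gain = current_avg - baseline
gain_pct = 100 * gain / baseline

print(f"baseline {baseline}, gain {gain} GHz-days/day ({gain_pct:.1f}%)")
```

On these rounded inputs, the part of the increase not explained by the CPU swap comes to roughly a tenth of the baseline.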

What I've noticed is that v0.18 needs a little more CPU grunt to max out GPUs.

-- Craig

Chuck 2012-01-07 02:08

[QUOTE=xtreme2k;285089]Thanks for all the responses. I have used Task Manager to set the affinity for the thread for now. It's getting late for me now in Sydney, so I will come back to this tomorrow!

How much quicker does a 570/580GTX TF?[/QUOTE]

I have two instances of mfaktc running. Each does TF 71->72 in a little under two hours on a GTX 580 and a Core i7-970 @ 3.8 GHz.

