mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing

2012-01-06, 16:13   #1530
bcp19
Oct 2011

7×97 Posts

Quote:
Originally Posted by flashjh View Post
Your CPU does some sieving, so that's normal. Everything in your picture seems good. You could try running two instances to see if you get more throughput.
Doubtful. SievePrimes is not down to 5000, which means the GPU is not outperforming the CPU. From his M/s I'd guess he'd see only a 10-15% increase with 2 instances, which is something of a waste of a second core.

Yes, you'd need 2 folders to run 2 instances.

Last fiddled with by bcp19 on 2012-01-06 at 16:14
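bcp19's reasoning can be captured in a small decision helper. This is a hypothetical sketch, not code from mfaktc, and the function names are made up for illustration. It assumes, as the posts in this thread describe, that SievePrimes auto-adjustment drifts down to the 5000 floor when one CPU core cannot keep the GPU fed. The 10-15% gain is bcp19's estimate, not a measured figure.

```python
# Hypothetical helpers encoding bcp19's rule of thumb; not part of mfaktc.
SIEVE_PRIMES_FLOOR = 5000  # the SievePrimes floor cited in this thread


def cpu_is_bottleneck(sieve_primes: int, floor: int = SIEVE_PRIMES_FLOOR) -> bool:
    """Return True if auto-adjust has pushed SievePrimes down to the floor.

    Assumption: SievePrimes sitting at the floor means one CPU core cannot
    sieve fast enough to keep the GPU busy, so a second instance may help.
    A value above the floor suggests the GPU is the limiting side.
    """
    return sieve_primes <= floor


def projected_rate(one_instance_rate: float, gain_fraction: float) -> float:
    """Combined throughput if a second instance adds gain_fraction (0.10 = 10%)."""
    if gain_fraction < 0:
        raise ValueError("gain_fraction must be non-negative")
    return one_instance_rate * (1.0 + gain_fraction)
```

Suppose SievePrimes holds at 20000 (a hypothetical value). Then `cpu_is_bottleneck(20000)` is False. A 10-15% gain turns a hypothetical 100 M/s into 110-115 M/s, and the cost is a whole core that could be running P-1 or LL instead.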
2012-01-06, 16:14   #1531
kladner
"Kieren"
Jul 2011
In My Own Galaxy!

2×3×1,693 Posts

Quote:
Originally Posted by xtreme2k View Post
I am completely new to GPU TFing and have some stats to share with you guys.

System
2600K/GTX460 stock

While the GPU is doing TF, the CPU seems rather busy as well, which in turn uses a fair bit of power: 26.8 W in this case. Is this normal? Is there any way to run this more efficiently? The same CPU uses about 5-7 W when idle.

The GTX seems to churn out a 70-71 bit TF in about 1 hour and a 71-72 in about 2 hours, which is fairly fast (I think) compared to pure CPU TFing.
When mfaktc is running the CPU prepares data to feed to the GPU. It is normal for the CPU usage to go up. Many people here (though not all, I think) assign affinity for an instance of mfaktc to a particular core. There is considerable discussion in this thread on ways to do this when mfaktc is started. The previous page has some of this discussion.

The GPU is a lot faster than a CPU for TF. The GTX 460 is a good worker in this, though of course not the latest and greatest. My 460 does turn out lots of TF work.

EDIT: The consensus now seems to be that it is a waste of CPU time to do TF there. Fortunately, there are other tasks at which an i7 will shine, such as LL or P-1.

Last fiddled with by kladner on 2012-01-06 at 16:18
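On Windows, you can set affinity at launch instead of using Task Manager afterwards. The `start /affinity <hexmask>` command does this, where bit n of the mask selects logical CPU n, counting from 0. The sketch below only builds that command string. The executable name `mfaktc-win-64.exe` is a placeholder assumption, so substitute whatever your binary is called.

```python
def affinity_mask(cores: list[int]) -> int:
    """Build a CPU affinity bitmask from 0-based logical CPU numbers."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError(f"core numbers start at 0, got {core}")
        mask |= 1 << core  # bit n selects logical CPU n
    return mask


def start_command(exe: str, cores: list[int]) -> str:
    """Windows `start /affinity` line pinning exe to the given cores (hex mask)."""
    return f"start /affinity {affinity_mask(cores):X} {exe}"
```

For instance, `start_command("mfaktc-win-64.exe", [7])` yields `start /affinity 80 mfaktc-win-64.exe`. That pins the program to the last logical CPU of an 8-thread 2600K. On Linux, `taskset -c` does the same job.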
2012-01-06, 16:18   #1532
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario

11×311 Posts

Quote:
Originally Posted by xtreme2k View Post
Would you also run prime95 on 3 CPU cores (TF or other units) at the same time?
Absolutely run Prime95 on the other 3 cores (as said above, it's best to lock affinity to individual cores for both mfaktc and Prime95). But do not run TF on the CPU cores; that's a waste of time (your GPU is much, much faster). Use your CPU for P-1 or L-L instead.
2012-01-06, 16:27   #1533
xtreme2k
Aug 2002

2×3×29 Posts

Thanks for all the responses. I have used Task Manager to set the affinity for the thread for now. It's getting late here in Sydney, so I will come back to this tomorrow!

How much quicker is a GTX 570/580 at TF?
2012-01-06, 16:44   #1534
flashjh
"Jerry"
Nov 2011
Vancouver, WA

1,123 Posts

Quote:
Originally Posted by bcp19 View Post
Doubtful. SievePrimes is not down to 5000, which means the GPU is not outperforming the CPU. From his M/s I'd guess he'd see only a 10-15% increase with 2 instances, which is something of a waste of a second core.

Yes, you'd need 2 folders to run 2 instances.
You're right. I wouldn't put more than one core per instance, but I think his GPU could handle more.

Quote:
Originally Posted by kladner View Post
When mfaktc is running the CPU prepares data to feed to the GPU. It is normal for the CPU usage to go up. Many people here (though not all, I think) assign affinity for an instance of mfaktc to a particular core. There is considerable discussion in this thread on ways to do this when mfaktc is started. The previous page has some of this discussion.

The GPU is a lot faster than a CPU for TF. The GTX 460 is a good worker in this, though of course not the latest and greatest. My 460 does turn out lots of TF work.

EDIT: The consensus now seems to be that it is a waste of CPU time to do TF there. Fortunately, there are other tasks at which an i7 will shine, such as LL or P-1.
This is how I run my dual GTX 580s on a Phenom 1055T:

- Five mfaktc instances running TF on one 580
- CUDA running on the other 580
- Prime95 running 1 P-1

Instances 1-5 each have affinity set to their own core (cores 1-5, respectively).
CUDA also uses core 5.
Prime95 uses core 6.

Instances 1-4 run first-time TF. Instance 5 runs DC TF.

This balance gives me SievePrimes of about 10,000, with wait times ranging from 2.3 to 16 depending on the work being done. CUDA runs at 9 ms per iteration, and Prime95 gets whatever time on core 6 the OS doesn't use. The system gets a little laggy during stage 2 of P-1 using 16 GB of RAM, but otherwise it all runs well.

Without setting affinity, I never get a good balance, and SievePrimes always bottoms out at 5000.

Quote:
Originally Posted by xtreme2k View Post
Thanks for all the responses. I have used Task Manager to set the affinity for the thread for now. It's getting late here in Sydney, so I will come back to this tomorrow!

How much quicker is a GTX 570/580 at TF?
Check out James' site here. It will give you a good idea of the performance you'd get on different GPUs.

Last fiddled with by flashjh on 2012-01-06 at 16:45
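flashjh's layout can be written down as a table of process-to-core assignments. This is an illustrative sketch, and the process labels are hypothetical. Cores are numbered 1-6 as in the post, so core n corresponds to mask bit n-1. The sketch assumes "respectively" means instance n runs on core n. Counting assignments per core shows the deliberate doubling-up on core 5.

```python
from collections import Counter

# Process -> 1-based cores, following flashjh's description (labels are illustrative).
LAYOUT: dict[str, list[int]] = {
    "mfaktc #1 (first-time TF)": [1],
    "mfaktc #2 (first-time TF)": [2],
    "mfaktc #3 (first-time TF)": [3],
    "mfaktc #4 (first-time TF)": [4],
    "mfaktc #5 (DC TF)": [5],
    "CUDA (other GTX 580)": [5],
    "Prime95 (P-1)": [6],
}


def mask_from_one_based(cores: list[int]) -> int:
    """Affinity bitmask for 1-based core numbers (core 1 -> bit 0)."""
    mask = 0
    for core in cores:
        if core < 1:
            raise ValueError(f"1-based core numbers start at 1, got {core}")
        mask |= 1 << (core - 1)
    return mask


def processes_per_core(layout: dict[str, list[int]]) -> Counter:
    """Count how many processes are pinned to each core."""
    return Counter(core for cores in layout.values() for core in cores)
```

`processes_per_core(LAYOUT)` reports two processes on core 5 and one on every other core. `mask_from_one_based([6])` is `0x20`, the mask for Prime95's core.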
2012-01-06, 23:10   #1535
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·29·83 Posts

As a matter of fact, for all those talking about whether or not he should run another instance: I have the exact same setup, a 2600K hosting a 460. Both are slightly overclocked, but if his are both at stock, his experience should be pretty much the same. I run with hyper-threading on, with Prime95 running 3 workers of two threads each on cores 1-6, which correspond to physical cores 1-3 in Windows. That is, Worker 1 has one thread on 1 and one thread on 2, Worker 2 has one thread on 3 and one thread on 4, and so on. I pin mfaktc to core 8 (I've found little difference between running it on just one logical core of the pair or on both) with SievePrimes=5000 (and AutoAdjust=0). Depending on frequencies, I get 80-95% load (sometimes up to 100%), so for me the gain from running two instances is less than the LL/P-1 work that the core can do for Prime95.
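Dubslow describes a core numbering in which logical CPUs 1 and 2 share physical core 1, 3 and 4 share physical core 2, and so on. That numbering can be expressed as simple arithmetic. This sketch assumes hyper-threaded siblings are numbered adjacently, as on his Windows machine. Other systems may number siblings differently, so check before relying on it.

```python
def physical_core(logical: int, threads_per_core: int = 2) -> int:
    """Physical core (1-based) hosting a 1-based logical CPU, assuming adjacent siblings."""
    if logical < 1:
        raise ValueError("logical CPUs are numbered from 1 here")
    return (logical - 1) // threads_per_core + 1


def worker_logical_cpus(worker: int, threads_per_core: int = 2) -> list[int]:
    """Logical CPUs for a Prime95 worker that gets both siblings of one physical core."""
    if worker < 1:
        raise ValueError("workers are numbered from 1 here")
    first = (worker - 1) * threads_per_core + 1
    return list(range(first, first + threads_per_core))
```

`worker_logical_cpus(1)` gives `[1, 2]` and `worker_logical_cpus(3)` gives `[5, 6]`, matching the three Prime95 workers. `physical_core(8)` is 4, the physical core left for mfaktc.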
2012-01-06, 23:56   #1536
xtreme2k
Aug 2002

2·3·29 Posts

Would you guys recommend running 4-6 threads on 1 LL test? (due to HT)
2012-01-07, 00:37   #1537
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario

3421₁₀ Posts

Quote:
Originally Posted by xtreme2k View Post
Would you guys recommend running 4-6 threads on 1 LL test? (due to HT)
No, the best performance is one L-L test per (real) core. You can assign 2 threads per test to make use of hyper-threading if you like; it may or may not make any real performance difference.
2012-01-07, 00:42   #1538
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

1110000110101₂ Posts

As I said above, on my 2600K I run two threads per test, with the two threads assigned to the two logical cores of the same physical core.
2012-01-07, 00:46   #1539
nucleon
Mar 2003
Melbourne

5×103 Posts

I've finally migrated all mfaktc instances over to version 0.18.

The 6-day average across the entire farm, with all results from v0.18, is 1662 GHz-days/day.

The previous average was 1400-ish GHz-days/day. To confuse things, I've also added about 100 GHz-days/day of capacity by swapping out a CPU. :)

What I've noticed is that v0.18 needs a little more CPU grunt to max out the GPUs.

-- Craig
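Craig's figures allow a rough estimate of how much of the increase comes from v0.18 itself. The sketch simply subtracts the added CPU capacity. Both the old average ("1400-ish") and the added capacity ("about 100") are approximate, and the sketch assumes nothing else about the farm changed. The result is therefore only a ballpark figure.

```python
def attributable_gain(new_avg: float, old_avg: float, added_capacity: float) -> tuple[float, float]:
    """Return (absolute gain, fractional gain vs. old_avg) after removing added capacity.

    All arguments share one unit, e.g. GHz-days/day.
    """
    gain = new_avg - old_avg - added_capacity
    return gain, gain / old_avg
```

`attributable_gain(1662, 1400, 100)` gives a gain of 162 GHz-days/day, roughly 11.6% over the old average.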
2012-01-07, 02:08   #1540
Chuck
May 2011
Orange Park, FL

375₁₆ Posts

Quote:
Originally Posted by xtreme2k View Post
Thanks for all the responses. I have used Task Manager to set the affinity for the thread for now. It's getting late here in Sydney, so I will come back to this tomorrow!

How much quicker is a GTX 570/580 at TF?
I have two instances of mfaktc running. Each does a 71→72 TF in a little under two hours on a GTX 580 with a Core i7-970 @ 3.8 GHz.
Attached thumbnail: mfaktc.jpg
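The timings in this thread follow from the form of Mersenne factors. On xtreme2k's GTX 460, 70→71 bits took about 1 hour and 71→72 about 2 hours. Any factor q of M_p satisfies q = 2kp + 1, so the number of k values that put q in [2^b, 2^(b+1)) roughly doubles with each bit level. The sketch below counts those k values. It ignores the extra filtering and sieving mfaktc applies, which to a first approximation removes the same fraction at every level. It also converts per-assignment times into assignments per day. The exponent is a hypothetical example, and the day-rate comparison assumes comparable exponents on both machines.

```python
def candidates_in_bit_level(p: int, b: int) -> int:
    """Count k >= 1 with 2^b <= 2*k*p + 1 < 2^(b+1)."""
    two_p = 2 * p
    # Smallest k with 2kp + 1 >= 2^b, i.e. ceil((2^b - 1) / 2p), but at least 1.
    lo = max(1, -(-(2**b - 1) // two_p))
    # Smallest k with 2kp + 1 >= 2^(b+1); valid k run from lo up to hi - 1.
    hi = max(1, -(-(2**(b + 1) - 1) // two_p))
    return max(0, hi - lo)


def assignments_per_day(hours_each: float, instances: int = 1) -> float:
    """Completed assignments per day for `instances` parallel instances."""
    return instances * 24.0 / hours_each
```

Take a hypothetical exponent p = 50,000,017. The ratio `candidates_in_bit_level(p, 72) / candidates_in_bit_level(p, 71)` is 2 to within rounding. For throughput, one instance at 2 hours per 71→72 assignment gives `assignments_per_day(2)` = 12 per day. Chuck's two instances, each a little under 2 hours, give at least `assignments_per_day(2, 2)` = 24 per day.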

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.