mersenneforum.org  

Old 2012-03-28, 20:43   #1717
chalsall
If I May
 

Quote:
Originally Posted by James Heinrich View Post
My interpretation is that TF should be done to the 200 mark, or a little bit higher. Since "200" will rarely fall exactly on an integer bitlevel (the actual breakeven point for "100" is in the rightmost column), TFing to the bitlevel rounded up from that is appropriate when 2 L-Ls would be saved. If only 1 L-L would be saved, then TF to 1 bitlevel less (half the TF effort).
Thanks again for putting this together, James. It has helped bring some solid answers to what was hotly debated (read: just how far should we GPU TF?).

Would it be possible to have another right-most column for "2 L-L"? I'm guessing it isn't exactly "1 L-L" + 1.0.
Old 2012-03-28, 21:48   #1718
James Heinrich
 

Quote:
Originally Posted by chalsall View Post
It has helped bring some solid answers to what was hotly debated (read: just how far should we GPU TF?).
Remember this is only a partial answer. This is based on the assumption that CPUs do no useful work, and is purely between GPU-TF and GPU-LL. It completely sidesteps the whole debate of balance between GPU-TF and CPU-LL, and totally ignores P-1 (which is currently CPU-only, but that too may change soon).

Quote:
Originally Posted by chalsall View Post
Would it be possible to have another right-most column for "2 L-L"? I'm guessing it isn't exactly "1 L-L" + 1.0.
I could (and have), but it would be (is) exactly +1.0.
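The reason it comes out to a clean +1.0: TF cost roughly doubles per bitlevel, so doubling the potential LL savings (2 tests instead of 1) pushes the break-even up by almost exactly one bit. A toy sketch of that arithmetic, where the costs are made-up placeholders rather than the chart's measured GHz-day figures:

```python
import math

def breakeven_bit(ll_cost, tf_cost_at_60, tests_saved):
    """Fractional bitlevel b where TF cost equals expected LL savings,
    using the rough heuristics that TF cost doubles per bit and the
    chance of a factor in [2^b, 2^(b+1)] is ~1/b.  Solves
        tf_cost_at_60 * 2**(b - 60) == (1/b) * tests_saved * ll_cost
    by fixed-point iteration (the map is strongly contracting)."""
    b = 60.0
    for _ in range(50):
        b = 60 + math.log2(tests_saved * ll_cost / (b * tf_cost_at_60))
    return b

# Made-up illustrative costs: one LL = 30 GHz-days, TF of bit 60 = 0.01
one_ll = breakeven_bit(30.0, 0.01, tests_saved=1)
two_ll = breakeven_bit(30.0, 0.01, tests_saved=2)
print(two_ll - one_ll)  # just under 1.0 (the 1/b factor nudges it down)
```

Holding the factor probability fixed across the one-bit shift makes the offset exactly 1.0; letting it vary as 1/b leaves it within a few hundredths of that.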
Old 2012-03-28, 21:51   #1719
Dubslow
Basketry That Evening!
 

Quote:
Originally Posted by James Heinrich View Post
Remember this is only a partial answer. This is based on the assumption that CPUs do no useful work, and is purely between GPU-TF and GPU-LL. It completely sidesteps the whole debate of balance between GPU-TF and CPU-LL, and totally ignores P-1 (which is currently CPU-only, but that too may change soon).
The hand-wavy method to deal with those is just to take one bit off, and then you wind up with more-or-less what we're doing now.
Old 2012-03-28, 22:21   #1720
chalsall
If I May
 

Quote:
Originally Posted by Dubslow View Post
The hand-wavy method to deal with those is just to take one bit off, and then you wind up with more-or-less what we're doing now.
Is it? Or is it the other way around (add one bit)?

Given that there are many more CPUs than GPUs participating in GIMPS, I think the argument can be made that TFing past the break-even point for each GPU makes sense for GIMPS, even if not for the individual participant.

But, as always, people are encouraged to do whatever they want (other than poaching or hoarding). It's their hardware, time and electricity.
Old 2012-03-28, 22:50   #1721
flashjh
 

Quote:
Originally Posted by chalsall View Post
Is it? Or it is the other way around (add one bit)?

Given that there are many more CPUs than GPUs participating in GIMPS, I think the argument can be made that TFing past the break-even point for each GPU makes sense for GIMPS, even if not for the individual participant.

But, as always, people are encouraged to do whatever they want (other than poaching or hoarding). It's their hardware, time and electricity.
Based on the charts, it looks like we should GPU TF almost all exponents one more level.

Old 2012-03-28, 23:03   #1722
Prime95
P90 years forever!
 

Quote:
Originally Posted by chalsall View Post
Given that there are many more CPUs than GPUs participating in GIMPS, I think the argument can be made that doing TFing past the break-even point for each GPU makes sense for GIMPS, even if not for the individual participant.
No, going past the breakeven point makes no sense. The GPU will clear more exponents by switching to CUDALucas rather than TFing past the breakeven. The question the GPU owner faces is: do I TF a range that hasn't reached the breakeven, or do I switch to CUDALucas?

How should we modify the chart to take into account the loss of CPU cores? You need to know how much CPU power is lost to keep mfaktc busy. For example, if it takes 2 i7-860 cores, then you'd compare mfaktc's factor-found rate to CUDALucas + 2 i7-860 cores LL rate. Has anyone tried to gather this kind of data?
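The shape of that comparison can be sketched as follows; every rate below is a hypothetical placeholder to show the bookkeeping, not measured data:

```python
# Option A feeds the GPU with mfaktc (the helper CPU cores do no LL
# work); option B runs CUDALucas on the GPU and lets those cores run
# Prime95 LL instead.  Benefits are counted in "LL tests avoided or
# completed per day".  All numbers are made up for illustration.

def tf_benefit(candidates_per_day, factor_chance, tests_per_exponent=2):
    # Each factor found removes all future LL tests for that exponent.
    return candidates_per_day * factor_chance * tests_per_exponent

def ll_benefit(gpu_ll_per_day, cpu_ll_per_day):
    return gpu_ll_per_day + cpu_ll_per_day

# Hypothetical machine: the GPU can TF 25 candidates/day at a ~1/70
# chance of a factor per bitlevel, or LL 0.5 exponents/day; the two
# freed CPU cores could LL 0.15 exponents/day between them.
option_a = tf_benefit(25, 1 / 70)   # expected tests avoided per day
option_b = ll_benefit(0.5, 0.15)    # tests completed per day
print(option_a, option_b)
```

The point of the exercise: the freed CPU cores' LL rate belongs on option B's side of the ledger, which moves the break-even bitlevel down.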

Old 2012-03-28, 23:21   #1723
James Heinrich
 

Quote:
Originally Posted by Prime95 View Post
...then you'd compare mfaktc's factor-found rate to CUDALucas + 2 i7-860 cores LL rate.
GPUs are ridiculously good at TF, even after factoring in the "lost" CPU cores. GPUs are still much faster than CPUs for LL, but less distinctly so. To throw some numbers out, with a GTX 570 and 2 cores of an i7-3930K @ 4.5GHz, I can:
* mfaktc = 281GHz-days/day (TF)
* CUDALucas + Prime95 = 31 + 15 = 46GHz-days/day (LL + LL or P-1)
P-1 still needs doing, and has no GPU option at the moment.
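In ratio terms, using the numbers above:

```python
# Measured throughputs quoted above (GTX 570 + two i7-3930K cores):
tf_rate = 281.0        # GHz-days/day of TF credit via mfaktc
ll_rate = 31.0 + 15.0  # GHz-days/day of LL via CUDALucas + Prime95
print(tf_rate / ll_rate)  # ~6.1x
```

The caveat is that a GHz-day of TF credit only translates into exponents actually cleared while the work stays below the break-even bitlevel.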
Old 2012-03-28, 23:41   #1724
axn
 

Quote:
Originally Posted by Prime95 View Post
No, going past the breakeven point makes no sense. The GPU will clear more exponents by switching to CUDALucas rather than TFing past the breakeven.
It is worse than that. The break-even point is calculated vis-à-vis two LL tests, but the second of those tests is something we wouldn't get to until many years later. If the GPU instead focuses on LL, it'd clear twice the number of exponents (compared to TF), thus speeding up the main LL wave even further.
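In numbers, for a hypothetical GPU sitting exactly at the 2-LL break-even:

```python
# At the 2-LL break-even, a day of TF and a day of LL are worth the
# same TOTAL work by definition, but TF's payoff is split between a
# first-time test (now) and a double-check (years out).  Only the
# first-time test advances the current LL wave.
total_value_tf_day = 1.0                 # in "LL tests avoided"
near_term_tf = total_value_tf_day / 2    # only the first-time LL counts now
near_term_ll = 1.0                       # an LL day advances the wave directly
print(near_term_ll / near_term_tf)       # 2.0
```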

Old 2012-03-28, 23:59   #1725
bcp19
 

Quote:
Originally Posted by Prime95 View Post
No, going past the breakeven point makes no sense. The GPU will clear more exponents by switching to CUDALucas rather than TFing past the breakeven. The question the GPU owner faces is: do I TF a range that hasn't reached the breakeven, or do I switch to CUDALucas?

How should we modify the chart to take into account the loss of CPU cores? You need to know how much CPU power is lost to keep mfaktc busy. For example, if it takes 2 i7-860 cores, then you'd compare mfaktc's factor-found rate to CUDALucas + 2 i7-860 cores LL rate. Has anyone tried to gather this kind of data?
I worked on that a bit using my systems. If I were to devote every core and GPU to DC in the 26M range, I could clear, on average, around 5.9 per day. My current active cores total 1.7 DC per day, so I am 'losing' 4.2 DC/day in exchange for ~800 GHzD of TF. Since very few 26-28M exponents still needing an extra bit or two of TF have been showing up lately, most of my work goes towards 29-30M exponents. These exponents are 'worth' 26-30% more credit than the 26M ones, so you might say I am only 'losing' 3.3-3.6 DC/day while still generating the 800 GHzD of TF.

CPU/GPU combinations and # of instances are also a factor. I find the Core2 Quads running 3 instances on mid-high end GPUs (450/550/460/560) are 30%-50% more 'efficient' than the AMD x4 and i5/i7. At the 26M level, the quads are easily efficient to 2^69 even adding in the 'loss' of 3 cores, while the others would 'break even' about 1/3 of the way between 2^68 and 2^69. Using 2 instances is actually 10-15% less efficient than 3.
Old 2012-03-29, 00:34   #1726
Prime95
P90 years forever!
 

Quote:
Originally Posted by James Heinrich View Post
GPUs are ridiculously good at TF, even after factoring in the "lost" CPU cores.
I'm not denying that. My point is that the lost CPU cores affect your calculation of the breakeven point.

Old 2012-03-29, 01:22   #1727
kladner
 

Quote:
Originally Posted by bcp19 View Post
CPU/GPU combinations and # of instances are also a factor. I find the Core2 Quads running 3 instances on mid-high end GPUs (450/550/460/560) are 30%-50% more 'efficient' than the AMD x4 and i5/i7.
I have certainly observed on an x6 AMD that running mfaktc by itself, even a single instance with Priority assigned to a core, is faster than when it is competing with P95-64 running 4x P-1 and 1x LL (all with Priority assigned). Starting P95-64 has more effect on mfaktc than making it compete with CUDALucas for the GPU.