[QUOTE=James Heinrich;294540]My interpretation is that TF should be done to the 200 mark, or a little bit higher. Since "200" will rarely fall exactly on an integer bitlevel (actual breakeven point for "100" is in the rightmost column), TF to the rounded-up-from-that is appropriate when 2 L-Ls would be saved. If only 1 L-L would be saved, then TF to 1 bitlevel less (half the TF effort).[/QUOTE]
Thanks again for putting this together James. It has helped bring some solid answers to what was hotly debated (read: just how far should we GPU TF). Would it be possible to have another right-most column for "2 L-L"? I'm guessing it isn't exactly "1 L-L" + 1.0.
[QUOTE=chalsall;294543]It has helped bring some solid answers to what was hotly debated (read: just how far should we GPU TF).[/quote]Remember this is only a partial answer. This is based on the assumption that CPUs do no useful work, and is purely between GPU-TF and GPU-LL. It completely sidesteps the whole debate of balance between GPU-TF and CPU-LL, and totally ignores P-1 (which is currently CPU-only, but that too may change soon).
[QUOTE=chalsall;294543]Would it be possible to have another right-most column for "2 L-L"? I'm guessing it isn't exactly "1 L-L" + 1.0.[/QUOTE]I could (and have), but it would be (is) exactly +1.0
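A quick way to see why the "2 L-L" column is exactly the "1 L-L" column + 1.0 bit level. This is a sketch under assumptions not stated in the thread (TF cost doubles with each bit level, and the chance of finding a factor in the next bit is treated as a constant p); the numeric inputs below are placeholders, not James's actual figures:

```python
import math

def breakeven_bit(ll_cost, tf_cost_at_bit0, p_factor, tests_saved):
    # TF one more bit is worthwhile while
    #   2**b * tf_cost_at_bit0 < p_factor * tests_saved * ll_cost,
    # so the breakeven bit level has a closed form:
    return math.log2(p_factor * tests_saved * ll_cost / tf_cost_at_bit0)

# Placeholder inputs, chosen only to illustrate the shift:
b1 = breakeven_bit(ll_cost=300.0, tf_cost_at_bit0=1e-18, p_factor=1/70, tests_saved=1)
b2 = breakeven_bit(ll_cost=300.0, tf_cost_at_bit0=1e-18, p_factor=1/70, tests_saved=2)
print(round(b2 - b1, 6))  # 1.0 -- doubling the LL tests saved adds exactly one bit level
```

Doubling `tests_saved` doubles the argument of the log, which adds log2(2) = 1.0 regardless of the other inputs.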
[QUOTE=James Heinrich;294548]Remember this is only a partial answer. This is based on the assumption that CPUs do no useful work, and is purely between GPU-TF and GPU-LL. It completely sidesteps the whole debate of balance between GPU-TF and CPU-LL, and totally ignores P-1 (which is currently CPU-only, but that too may change soon).[/QUOTE]
The hand-wavy method to deal with those is just to take one bit off, and then you wind up with more-or-less what we're doing now.
[QUOTE=Dubslow;294549]The hand-wavy method to deal with those is just to take one bit off, and then you wind up with more-or-less what we're doing now.[/QUOTE]
Is it? Or is it the other way around (add one bit)? Given that there are [I][U]many[/U][/I] more CPUs than GPUs participating in GIMPS, I think the argument can be made that doing TFing past the break-even point for each GPU makes sense for [I][U]GIMPS[/U][/I], even if not for the individual participant. But, as always, people are encouraged to do whatever they want (other than poaching or hoarding). It's their hardware, time and electricity.
[QUOTE=chalsall;294553]Is it? Or it is the other way around (add one bit)?
Given that there are [I][U]many[/U][/I] more CPUs than GPUs participating in GIMPS, I think the argument can be made that doing TFing past the break-even point for each GPU makes sense for [I][U]GIMPS[/U][/I], even if not for the individual participant. But, as always, people are encouraged to do whatever they want (other than poaching or hoarding). It's their hardware, time and electricity.[/QUOTE] Based on the charts it looks like we should GPU TF almost all exponents one more level.
[QUOTE=chalsall;294553]Given that there are [I][U]many[/U][/I] more CPUs than GPUs participating in GIMPS, I think the argument can be made that doing TFing past the break-even point for each GPU makes sense for [I][U]GIMPS[/U][/I], even if not for the individual participant.[/QUOTE]
No, going past the breakeven point makes no sense. The GPU will clear more exponents by switching to CUDALucas rather than TFing past the breakeven. The question the GPU owner faces is: do I TF a range that hasn't reached the breakeven, or do I switch to CUDALucas?

How should we modify the chart to take into account the loss of CPU cores? You need to know how much CPU power is lost to keep mfaktc busy. For example, if it takes 2 i7-860 cores, then you'd compare mfaktc's factor-found rate to the CUDALucas + 2 i7-860 cores LL rate. Has anyone tried to gather this kind of data?
[QUOTE=Prime95;294558]...then you'd compare mfaktc's factor-found rate to the CUDALucas + 2 i7-860 cores LL rate.[/QUOTE]GPUs are ridiculously good at TF, even after factoring in the "lost" CPU cores. GPUs are still much faster than CPUs for LL, but less distinctly so. To throw some numbers out, with a GTX 570 and 2 cores of an i7-3930K @ 4.5GHz, I can:
* mfaktc = 281 GHz-days/day (TF)
* CUDALucas + Prime95 = 31 + 15 = 46 GHz-days/day (LL + LL or P-1)

P-1 still needs doing, and has no GPU option at the moment.
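The gap between the two figures above reduces to one division; a quick check using the rates from this post (GTX 570 + 2 cores of an i7-3930K):

```python
# Rates are taken from the post above; this just divides them.
tf_rate = 281.0        # GHz-days/day of TF credit with mfaktc
ll_rate = 31.0 + 15.0  # GHz-days/day of LL credit (CUDALucas + Prime95)
print(round(tf_rate / ll_rate, 1))  # 6.1 -- TF credit accrues ~6x faster than LL
```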
[QUOTE=Prime95;294558]No, going past the breakeven point makes no sense. The GPU will clear more exponents by switching to CUDALucas rather than TFing past the breakeven. [/QUOTE]
It is worse than that. Break-even point is calculated vis-a-vis two LL tests. The second of the LL test is something we'd get to many many years later. If the GPU instead focuses on LL, it'd clear twice the number of exponents (compared to TF) thus speeding up the main LL wave even further. |
[QUOTE=Prime95;294558]No, going past the breakeven point makes no sense. The GPU will clear more exponents by switching to CUDALucas rather than TFing past the breakeven. The question the GPU owner faces is: do I TF a range that hasn't reached the breakeven, or do I switch to CUDALucas?
How should we modify the chart to take into account the loss of CPU cores? You need to know how much CPU power is lost to keep mfaktc busy. For example, if it takes 2 i7-860 cores, then you'd compare mfaktc's factor-found rate to the CUDALucas + 2 i7-860 cores LL rate. Has anyone tried to gather this kind of data?[/QUOTE] I worked on that a bit using my systems. If I were to devote every core and GPU to DC in the 26M range, I could clear, on average, around 5.9 per day. My current active cores total 1.7 DC per day, so I am 'losing' 4.2 DC/day in exchange for ~800 GHzD of TF.

Since very few 26-28M exponents still needing an extra bit or 2 of TF are showing up lately, most of my work goes towards 29-30M exponents. These exponents are 'worth' 26-30% more credit than the 26M ones, so you might say I am only 'losing' 3.3-3.6 DC/day, while still generating the 800 GHzD of TF.

CPU/GPU combinations and # of instances are also a factor. I find the Core2 Quads running 3 instances on mid-high-end GPUs (450/550/460/560) are 30%-50% more 'efficient' than the AMD x4 and i5/i7. At the 26M level, the quads are easily efficient to ^69 even adding in the 'loss' of 3 cores, while the others would 'break even' about 1/3 of the way between ^68 and ^69. Using 2 instances is actually 10-15% less efficient than 3.
[QUOTE=James Heinrich;294561]GPUs are ridiculously good at TF, even after factoring in the "lost" CPU cores.[/QUOTE]
I'm not denying that. My point is the lost CPU cores affect your calculation of the breakeven point. |
[QUOTE]CPU/GPU combinations and # of instances are also a factor. I find the Core2 Quads running 3 instances on mid-high-end GPUs (450/550/460/560) are 30%-50% more 'efficient' than the AMD x4 and i5/i7.[/QUOTE]
I have certainly been observing on an x6 AMD that running mfaktc by itself, even a single instance with Priority assigned to a core, is faster than when it is competing with P95-64 running 4x P-1 and 1x LL (all with Priority assigned). Starting P95-64 has more effect on mfaktc than making it compete with CUDALucas for the GPU.