#1728
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3·29·83 Posts
Quote:
Last fiddled with by Dubslow on 2012-03-29 at 02:35 Reason: really need to get better at quoting rather than just quick reply
#1729

Romulan Interpreter
Jun 2011
Thailand
226658 Posts

Quote:
#1730

Oct 2011
7·97 Posts

Quote:
2500/480 - 2 cores mfaktc - 149.26 GHzD/DC 'lost'
2400/560Ti - 3 cores mfaktc - 159.53
X4 645/460 - 3 cores mfaktc - 172.78
Q6600/560 - 3 cores mfaktc - 204.85
Q8200/550Ti - 3 cores mfaktc - 228.15

Hard to believe, but the older system is actually 50% more efficient than my 'speed demon'. The 'shared memory' in the quads that messes up running all cores as DC/LL has no such effect on mfaktc, which is what makes the quad so highly efficient compared to newer systems. Also, as I mentioned before, if the above systems are trimmed down from 3 instances to 2, they produce 10-15% less GHzD per 'lost' DC.
#1731

kjaget
Jun 2005
3·43 Posts
Let me run through an example with my system - a 560 Ti 448 (basically a 570) and an overclocked i5-750. I'll use M55000000 as the example. I don't have exact measurements, but this is more or less correct, I think. On the other hand, I'm rushing through this on my lunch break, so any of the math could be wrong. On the third hand, at least I get the same rough answer that's been shown previously, so that's good (or confirmation bias).
TF on my system uses 3 CPU cores to saturate the GPU. The instances settle down to about 7.85 sec/class, or 7540 seconds per exponent (TF from 71 to 72 bits). Since there are 3 of them running, I get 3 results every 7540 seconds, i.e. 1 result every ~2510 seconds = 0.7 hours per exponent.

Switching that to LL testing, I get the results of the GPU plus 3 CPU cores. Here I'm using data from mersennaries since I don't have my own in front of me, but it should be a reasonable guess. The GPU gives a result every 95.1 hours; each CPU core gives a result every 675 hours (~28 days per exponent). Since the GPU and CPU rates aren't the same, you have to compute 1/(1/GPU_time + cores/CPU_time) to get the average, which works out to 66.9 hours per exponent.

Assuming we're finding factors 1.12% of the time, as shown in the GPUto72 stats, TF takes about 62 hours to find a factor while an LL test takes 67 hours. But since each factor found saves 2 LL tests, and each extra bit level doubles the run time, I should be TFing to one more bit level to make the time for 1 factor roughly the same as the time for 2 LL tests. This is the same 73-bit optimal depth we've seen calculated by ignoring the CPUs entirely.

Some problems: mfaktc run time scales with exponent size, while P95 scales differently (n·log(n)?), so the decision may differ along the exponent range. CPUs don't scale linearly when adding more cores to LL testing. Different CPUs are relatively more or less effective at mfaktc sieving versus LL tests (AVX, etc.). The calculation is sensitive to the TF found-factor rate: 1% vs 1.1% isn't many extra successes, but it is 10% more of them. On the plus side, since TF scales by 2x each time you increase the bit level, a few 10-20% hits on either side don't change the conclusion. My gut is telling me that we could test some of the extremes (CPUs really good and really bad at LL vs mfaktc) and see if we get the same answer.
Since 2x per bit level is large, my guess is there's a good chance the answer is yes, which would really simplify life. Since most of the data I have here is from the mersennaries page, we may be able to plot this stuff out over a wider range of CPU and GPU types. The big piece missing is how many CPU cores it takes to feed various GPUs. But again, going back to the idea that 10-20% each way doesn't matter compared to the 2x hit for each bit level, that might not be as bad as I imagine.

For a quick test, going from 2 -> 3 -> 4 CPU cores takes the LL time from 74 to 67 to 61 hours per exponent. That last one looks like it would move the break-even point back to 72 bits of factoring (just barely), but adding the 4th CPU core to TF work gives me a ~10% better TF rate as well, so the overall conclusion doesn't change much. No matter how many CPU cores I use, it doesn't move the results enough to justify a 2x jump in TF time one way or the other. I have no gut feel for whether this holds for faster CPUs. On the one hand, they influence the LL rate a lot more. On the other hand, you need fewer of them to saturate a GPU, so there's less to be gained from moving CPUs from TF to LL.

Last fiddled with by kjaget on 2012-03-29 at 16:38
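For concreteness, the two rate calculations in this post can be sketched in a few lines of Python. The figures are the ones quoted above; the ~960 classes per bit level is inferred from the post's own numbers (7540 s / 7.85 s per class), matching the number of candidate classes mfaktc actually runs after sieving out the rest. Function names are mine, not mfaktc's:

```python
CANDIDATE_CLASSES = 960  # classes mfaktc actually computes per bit level

def tf_hours_per_exponent(sec_per_class, instances):
    """Hours per TF result when several mfaktc instances share one GPU:
    each instance needs sec_per_class * CANDIDATE_CLASSES seconds, and
    the instances deliver finished exponents round-robin."""
    return sec_per_class * CANDIDATE_CLASSES / instances / 3600.0

def ll_hours_per_exponent(gpu_hours, cpu_hours, cpu_cores):
    """Combined hours per LL result for one GPU plus several CPU cores:
    rates (results/hour) add, so invert the sum of the rates."""
    return 1.0 / (1.0 / gpu_hours + cpu_cores / cpu_hours)

# TF from 71 to 72 bits: 7.85 s/class, 3 instances -> ~0.70 h/exponent
print(round(tf_hours_per_exponent(7.85, 3), 2))

# LL: GPU at 95.1 h/result, each core at 675 h/result, for 2/3/4 cores
for cores in (2, 3, 4):
    print(cores, round(ll_hours_per_exponent(95.1, 675.0, cores), 1))
```

This reproduces the 74/67/61-hour progression quoted in the post (up to rounding).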
#1732

bcp19
Oct 2011
7·97 Posts

Quote:
A 'balanced' GPU also makes a difference within the same system. The Q6600/560 used to be a Q6600/450, which with 2 cores was at 185.56 and with 3 cores at 184.52. The switch to a 560 was a little over 10% more 'efficient' at 204.85. The 450 was 'too little' GPU, but too much GPU is as bad or worse: I initially put the 480 into the quad, but all 4 cores could not max it out. The 4 cores and the 480 could do 1.43 DC/day, but I calculated I'd only get ~200 GHzD with 4 instances, or 139.8 GHzD per DC 'lost'.

Last fiddled with by bcp19 on 2012-03-29 at 18:05
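The 'GHzD per DC lost' efficiency metric used throughout these posts is just TF credit produced per day divided by the double-check throughput given up by the CPU cores feeding the GPU. A minimal sketch (the function name is mine; inputs are the 480-in-the-quad figures quoted above):

```python
def ghzd_per_dc_lost(tf_ghzd_per_day, dc_per_day_given_up):
    """TF credit produced per double-check sacrificed: higher means
    the CPU cores are better spent feeding the GPU with mfaktc work."""
    return tf_ghzd_per_day / dc_per_day_given_up

# ~200 GHzD/day at the cost of 1.43 DC/day -> roughly 140 GHzD/DC 'lost'
print(round(ghzd_per_dc_lost(200.0, 1.43), 1))
```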
#1733

Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
7,537 Posts

Quote:
Kjaget's calculations are the way to go. You compare how much LL a system can do to the amount of TF it can do, coupled with the chance of TF finding a factor (I'm sure P-1 changes the calculation slightly, but I'd bet it can safely be ignored). For GPUto72, we should then set the "official" breakeven point by taking an average of reported breakeven points. I'm guessing we're presently doing too much TF at 45M, just right in the low-to-mid 50M range, and too little at 55M+. But we need more data! James has gotten us closer, but his estimates are a little high because of the unaccounted-for CPU cores used by mfaktc.

Last fiddled with by Prime95 on 2012-03-29 at 18:28
#1734

"James Heinrich"
May 2004
ex-Northern Ontario
11×311 Posts
One piece of data that might be relevant (or at least interesting) is the ratio of the potential GHz-days/day of your GPU vs. the potential GHd/d of the CPU cores required to power it. I don't want to pollute this thread with trivial responses, so if everyone could PM or email me with:
* what GPU you have
* what CPU you have
* how many instances of mfaktc you run (and thereby how many CPU cores are used)
* what average GPU usage you get by doing so (should typically be close to 100%)

e.g.: "GTX 570; Core i7-3930K; 2 instances; 98% GPU". This gives me a GPU:CPU GHd/d ratio of almost 19:1 (281/(2×7.4)). My theory is that this ratio should be roughly constant and could serve as a basis for including CPU usage into the equation. Once I get a reasonable sample of data I'll post back with what I find.
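The proposed ratio can be computed directly from the reported numbers; a small sketch using the GTX 570 / i7-3930K example from this post (function and parameter names are mine):

```python
def gpu_cpu_ghd_ratio(gpu_ghd_per_day, cpu_ghd_per_core_per_day, cores_used):
    """Ratio of the GPU's potential GHz-days/day to that of the CPU
    cores tied up feeding it mfaktc work."""
    return gpu_ghd_per_day / (cores_used * cpu_ghd_per_core_per_day)

# 281 GHd/d GPU vs 2 cores at 7.4 GHd/d each -> almost 19:1
print(round(gpu_cpu_ghd_ratio(281.0, 7.4, 2), 1))
```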
#1735

Oct 2011
7×97 Posts

Quote:
Quote:
Using this, any Sandy Bridge is likely to be 1-2 bits lower than a non-SB i3/i5/i7, which in turn would probably be 1/2 to 1 bit lower than the quads. AMD quads seem to fall between the i3/i5/i7 bracket and the C2Qs, but I only have 1 data point to fall back on.
#1736

"James Heinrich"
May 2004
ex-Northern Ontario
11×311 Posts

Quote:
* CPU usage per core (assuming AllowSleep=1)
* CPU speed (whether overclocked or not)
* SievePrimes value
#1737

Oct 2011
7·97 Posts

Quote:
So, I just finished timings on 45M exponents on all my CPUs and GPUs. The credit for a 45M exponent is around 72.22 and a 26M is 22.208, so a 45M exponent takes about 3.25 times more effort than a 26M. Using the timings at 45M the same way I did for 26M, I ended up with an increase between 2.9 and 3.05 times, which is fairly close. If I use 3, I get 448 GHzD on my worst system and 684 GHzD on my most efficient. Double that for the 2 LL tests saved and you get 896 to 1368. This tells me all of my machines are efficient doing 45M exponents to 71 bits, while one could maybe get away with going to 72 bits, seeing that 12 factors have been found in 1708 runs, which is kind of a small pool to use for an estimate.
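As a quick sanity check on the arithmetic in this post (all inputs are the figures quoted above; with these rounded credits the effort ratio works out to about 3.25):

```python
# Credit ratio between a 45M and a 26M exponent, from the quoted credits.
credit_45m, credit_26m = 72.22, 22.208
effort_ratio = credit_45m / credit_26m  # ~3.25x more effort per 45M test

# Observed factor rate at this depth: 12 factors in 1708 attempts.
factor_rate = 12 / 1708                 # ~0.7% of TF runs find a factor

print(round(effort_ratio, 2), round(factor_rate * 100, 2))
```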
#1738

Romulan Interpreter
Jun 2011
Thailand
7²·197 Posts
Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post |
| mfakto: an OpenCL program for Mersenne prefactoring | Bdot | GPU Computing | 1676 | 2021-06-30 21:23 |
| The P-1 factoring CUDA program | firejuggler | GPU Computing | 753 | 2020-12-12 18:07 |
| gr-mfaktc: a CUDA program for generalized repunits prefactoring | MrRepunit | GPU Computing | 32 | 2020-11-11 19:56 |
| mfaktc 0.21 - CUDA runtime wrong | keisentraut | Software | 2 | 2020-08-18 07:03 |
| World's second-dumbest CUDA program | fivemack | Programming | 112 | 2015-02-12 22:51 |