#1651

Romulan Interpreter
Jun 2011
Thailand
3²·29·37 Posts
Why can't you? Is the ETA not reliable, or what? (Especially for the instance that has already done ~800 classes.) For me the estimate seems quite reliable.
Last fiddled with by LaurV on 2012-03-09 at 06:20
#1652

Oct 2011
7×97 Posts
#1653

Romulan Interpreter
Jun 2011
Thailand
3²·29·37 Posts
This happens if your computer does something else that steals clock cycles from mfaktX, so some classes get more CPU/GPU time than others, the former then showing shorter ETAs than the latter. If the computer is balanced, all affinities are set correctly, and neither CPU nor GPU is "starving", then the ETAs are VERY stable and reliable. And if not, you can always estimate an EMA or SMA quite accurately from the sequence of class times you see on screen.
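A quick sketch of the SMA/EMA idea in Python (the timing values, window size, and smoothing factor below are made up for illustration; mfaktc doesn't expose such a function, this is just the averaging you'd do by eye from the on-screen class times):

```python
# Estimate a stabilized ETA from recent per-class timings using either a
# simple moving average (SMA) or an exponential moving average (EMA).

def sma_eta(class_times, classes_left, window=20):
    """Simple moving average over the last `window` class timings."""
    recent = class_times[-window:]
    return (sum(recent) / len(recent)) * classes_left

def ema_eta(class_times, classes_left, alpha=0.1):
    """Exponential moving average: recent classes weigh more than old ones."""
    avg = class_times[0]
    for t in class_times[1:]:
        avg = alpha * t + (1 - alpha) * avg
    return avg * classes_left

# Example: noisy ~20 s/class timings (one class starved by the OS),
# with 1000 classes still remaining.
times = [19.9, 21.3, 18.7, 25.0, 19.5, 20.1, 19.8]
print(round(sma_eta(times, 1000), 1), "s remaining (SMA)")
print(round(ema_eta(times, 1000), 1), "s remaining (EMA)")
```

The EMA discounts the one starved 25 s class faster than the SMA does, which is why it recovers a steadier ETA when the machine is only occasionally busy.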
#1654

"Kieren"
Jul 2011
In My Own Galaxy!
2·3·1,693 Posts
@LaurV - Yeah. It's difficult to impossible to prevent Windows from doing other stuff. And it's unpredictable which cores Windows will steal time from. I guess if I had completely killed P95 instead of just stopping workers on the cores I was giving to mfaktc, it would have left 2 cores idle and available for the system to mess with.
#1655

Jun 2005
811₁₆ Posts
But since the number of classes is constant, it's easier to compare time/class and avoid doing extra math. So the average time/class for your examples:

1 CPU = 19.9 sec/class
2 CPUs = 11.0 (80% speedup vs 1 CPU)
3 CPUs = 9.27 (19% speedup vs 2 CPUs)
4 CPUs = 8.82 (5.2% speedup vs 3 CPUs)

The 1-CPU figure is just grabbed from the timing. The tough part is that this timing jumps around a bit, so 3 significant digits might be pushing it; if you left the computer totally idle for half an hour I bet they'd stabilize, but what we have here is good enough. The N-CPU calculation is 1/(1/t1 + 1/t2 + 1/t3 + ... + 1/tN). Basically you're converting each instance to a classes/sec throughput rate, summing across instances, and then converting back to a time/class.

Using the formula posted a few pages back, each run of 46,166,291 from 68-72 bits gives 19.4 GHz-days. Figuring the throughput is just converting to exponents/day: seconds/class × 970 = run time per exponent in seconds; convert to hours by dividing by 3600 seconds/hour, then 24/(hours/exponent) = exponents per day.

1 CPU = 4.76 exp/day = 86.9 GHz-days/day
2 CPUs = 8.08 exp/day = 157 GHz-days/day
3 CPUs = 9.61 exp/day = 187 GHz-days/day
4 CPUs = 10.1 exp/day = 196 GHz-days/day

Since all of these are just scaled by a constant, the percentage differences between the values are the same as above. But it gives a more concrete picture of how many GHz-days/day you're giving up in exchange for doing something else with the extra CPUs.

Your biggest gain is going from 1 to 2 CPUs, since that's where you finally max out the GPU. You don't get quite a 2× scaling because it takes less than the full second CPU to max the GPU. Since there's then extra CPU power, SievePrimes increases to add extra load to the CPU and even it out with the work the GPU is doing. Moving to 3 is a smaller gain; here the increase comes strictly from higher SievePrimes reducing the candidates per class that the GPU has to test.

This increase may or may not be worth it: you're trading 30 GHz-days/day of TF for 5-8(?) of LL results. It depends on how much you value each type of result, along with lots of other factors. Same for the 3-to-4 CPU jump, except that the increase is less. If all you care about is max GHz-days/day from any source, it makes sense (barely), but that's not the only way to decide this. I have a similar situation (adding the 4th core gives me ~9% better throughput) but decided to leave at least one core to do LL/DCs since I want to give balanced results across all the work types.

ETA: my guess on classes/exponent might be off. Looking at the code, it might be 960 or 961. That changes the numbers here by ~1%, but all in the same direction, so the percentage changes are the same. Considering the noise in the timing data it's not a huge deal, but hopefully someone who understands the math better than me (i.e. most anyone :) ) can help.

Last fiddled with by kjaget on 2012-03-09 at 15:40
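The combination formula and the GHz-days conversion above can be sketched in Python. The 970 classes/exponent and 19.4 GHz-days per assignment come from the post (with the caveats it notes about 960/961), while the two 22 s/class instance timings are hypothetical inputs chosen to reproduce the 2-CPU line:

```python
# Combine per-instance time/class figures into an aggregate time/class via
# throughput (classes/sec), then convert to exponents/day and GHz-days/day.

CLASSES_PER_EXPONENT = 970    # the post's guess; may actually be 960 or 961
GHZ_DAYS_PER_EXPONENT = 19.4  # 46.1M exponent, TF from 68 to 72 bits

def combined_time_per_class(times):
    """1/(1/t1 + 1/t2 + ... + 1/tN): sum the per-instance throughputs
    (classes/sec) and convert back to a single time/class."""
    return 1.0 / sum(1.0 / t for t in times)

def ghz_days_per_day(time_per_class):
    secs_per_exponent = time_per_class * CLASSES_PER_EXPONENT
    exponents_per_day = 86400.0 / secs_per_exponent
    return exponents_per_day * GHZ_DAYS_PER_EXPONENT

# Two hypothetical instances at 22 s/class each combine to 11.0 s/class,
# matching the post's 2-CPU line (~157 GHz-days/day).
t = combined_time_per_class([22.0, 22.0])
print(round(t, 1), "s/class ->", round(ghz_days_per_day(t), 1), "GHz-days/day")
```

Note this combination is just the parallel-resistor (harmonic) formula: equal instances halve the time/class, while an unbalanced slow instance drags the aggregate toward its own rate.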
#1656

"Kieren"
Jul 2011
In My Own Galaxy!
23656₈ Posts
Thanks for the explanation. It was an interesting exercise. As mentioned, I can't really afford to make this machine into a P95/mfaktc slave. At some point, I will probably experiment with replacing the second mfaktc with CUDALucas, thus regaining most of a core for other uses. (I think?) It will be interesting to see how that affects GIMPS performance vs. overall usability.
I'm just watching the conversation on the progressive versions of CL to see when there is a general consensus on the reliability of results.
#1657

"James Heinrich"
May 2004
ex-Northern Ontario
3421₁₀ Posts
I just started experimenting with CUDAlucas yesterday. First impressions: it uses zero CPU, but the GPU usage is more aggressive than mfaktc's. Normal Windows usage is fine, but I can't watch even DVD-quality video smoothly with CUDAlucas, whereas with mfaktc it's only 1080p video I have to switch it off for. Most likely I'll go back to mfaktc, partly for usability, but also because the extra two cores don't scale so well with the new AVX cores in Prime95 (iteration times when running 6 workers are significantly slower than with 4 workers).
#1658

"Kieren"
Jul 2011
In My Own Galaxy!
2·3·1,693 Posts
Thanks for the info, James. It's good to know how it works out for others. It might not hit me as hard, since YouTube and low-res .wmv's are just about the only videos I watch on the computer.
#1659

"Mike"
Aug 2002
2×23×179 Posts
#1660

"Oliver"
Mar 2005
Germany
11×101 Posts
I guess I need to buy a GTX 6[78]0... ;)
#1661

Romulan Interpreter
Jun 2011
Thailand
3²·29·37 Posts
Similar Threads

| Thread | Thread Starter | Forum | Replies | Last Post |
| mfakto: an OpenCL program for Mersenne prefactoring | Bdot | GPU Computing | 1676 | 2021-06-30 21:23 |
| The P-1 factoring CUDA program | firejuggler | GPU Computing | 753 | 2020-12-12 18:07 |
| gr-mfaktc: a CUDA program for generalized repunits prefactoring | MrRepunit | GPU Computing | 32 | 2020-11-11 19:56 |
| mfaktc 0.21 - CUDA runtime wrong | keisentraut | Software | 2 | 2020-08-18 07:03 |
| World's second-dumbest CUDA program | fivemack | Programming | 112 | 2015-02-12 22:51 |