mersenneforum.org

mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

LaurV 2012-03-09 06:19

[QUOTE=bcp19;292404]FYI, without completing the run, you cannot tell how much of an increase in throughput you get with 4 running compared to 1, 2, or 3.[/QUOTE]
Why can't you? Is the ETA not reliable, or what? (Especially for the one instance that has already done ~800 classes.) For me the estimate seems to be quite reliable.

bcp19 2012-03-09 07:05

[QUOTE=LaurV;292405]Why can't you? Is the ETA not reliable, or what? (Especially for the one instance that has already done ~800 classes.) For me the estimate seems to be quite reliable.[/QUOTE]

Look at the 4 together: the upper left goes 9:11, 9:13, 9:14, 9:11 remaining; the bottom right goes 7:50, 7:46, 7:47. All of the timings for the last 3-4 lines show 34.9-35.9 seconds. Looking at the ETA may give you a ROUGH estimate, but you need to complete a full run to get an accurate one. As an example:
[quote]no factor for M29231441 from 2^68 to 2^69 [mfaktc 0.18 barrett79_mul32]
tf(): total time spent: 1h 0m 10.935s
no factor for M29231479 from 2^68 to 2^69 [mfaktc 0.18 barrett79_mul32]
tf(): total time spent: 1h 0m 4.711s
no factor for M29231617 from 2^68 to 2^69 [mfaktc 0.18 barrett79_mul32]
tf(): total time spent: 1h 0m 10.746s
no factor for M29231743 from 2^68 to 2^69 [mfaktc 0.18 barrett79_mul32]
tf(): total time spent: 1h 0m 12.207s
no factor for M29231773 from 2^68 to 2^69 [mfaktc 0.18 barrett79_mul32]
tf(): total time spent: 1h 0m 5.245s
[/quote]
These exponents print a new line every 3.7 seconds, but I can see the ETA jump by 10-60 seconds up or down. That's a pretty huge error rate.
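One way to put numbers on this is to compare the spread of the five completed runs quoted above with the 10-60 second ETA jumps. A small Python sketch; the regular expression assumes the exact "Xh Ym Z.ZZZs" format of these particular lines and may not match other mfaktc versions or run lengths.

```python
import re
import statistics

# The five "total time spent" lines quoted above (mfaktc 0.18, 2^68 to 2^69).
LOG = """\
tf(): total time spent: 1h 0m 10.935s
tf(): total time spent: 1h 0m 4.711s
tf(): total time spent: 1h 0m 10.746s
tf(): total time spent: 1h 0m 12.207s
tf(): total time spent: 1h 0m 5.245s
"""

# Assumed format: hours, minutes, fractional seconds.
PATTERN = re.compile(r"total time spent:\s+(\d+)h\s+(\d+)m\s+([\d.]+)s")


def run_times_seconds(log: str) -> list[float]:
    """Return each completed run's total time in seconds."""
    return [
        int(h) * 3600 + int(m) * 60 + float(s)
        for h, m, s in PATTERN.findall(log)
    ]


times = run_times_seconds(LOG)
mean = statistics.mean(times)
spread = max(times) - min(times)
print(f"mean {mean:.1f} s, spread {spread:.1f} s ({100 * spread / mean:.2f}% of mean)")
```

For these five runs the totals fall within about 7.5 s of each other, roughly 0.2% of a one-hour run, which is much tighter than the ETA jitter described above.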

LaurV 2012-03-09 07:31

[QUOTE=bcp19;292408]These exponents print a new line every 3.7 seconds, but I can see the ETA jump by 10-60 seconds up or down. That's a pretty huge error rate.[/QUOTE]
This happens if your computer is doing something else that steals clock cycles from mfaktx, so some classes get more CPU/GPU time than others, and the former show shorter ETAs than the latter. If the computer is balanced (all affinities set correctly, neither CPU nor GPU "starving"), the ETAs are VERY stable and reliable. And if not, you can always estimate a fairly accurate EMA or SMA from the sequence of classes you see on screen.
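A minimal sketch of that smoothing idea, assuming you copy the per-class times off the screen by hand. The timings, the 0.2 smoothing factor and the 100-class total are all hypothetical; the 41.0 entry stands in for a class that lost time to some other program. "Naive ETA" here means an ETA computed from the last class alone, which is an illustration, not a claim about how mfaktc computes its displayed ETA.

```python
def sma(values, window):
    """Simple moving average of the last `window` values."""
    recent = list(values)[-window:]
    return sum(recent) / len(recent)


def ema(values, alpha):
    """Exponential moving average; larger alpha weights recent classes more."""
    it = iter(values)
    avg = next(it)
    for v in it:
        avg = alpha * v + (1 - alpha) * avg
    return avg


def eta_seconds(avg_time_per_class, classes_left):
    """Remaining time if every remaining class takes the average time."""
    return avg_time_per_class * classes_left


# Hypothetical per-class times (seconds) and a hypothetical class total.
class_times = [35.2, 34.9, 35.9, 41.0, 35.1, 35.3]
TOTAL_CLASSES = 100

for i in range(1, len(class_times) + 1):
    seen = class_times[:i]
    left = TOTAL_CLASSES - i
    print(f"after class {i}: naive ETA {eta_seconds(seen[-1], left):6.0f} s, "
          f"EMA ETA {eta_seconds(ema(seen, 0.2), left):6.0f} s")
```

The naive ETA jumps when the slow class arrives, while the EMA moves only part of the way, which is the kind of smoothing described above.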

kladner 2012-03-09 14:22

[QUOTE=bcp19;292404]FYI, without completing the run, you cannot tell how much of an increase in throughput you get with 4 running compared to 1, 2, or 3.[/QUOTE]

Oops. I guess I didn't fully grasp what was significant information. I was mainly going for Time/Class readings once Sieve Primes had more or less stabilized.

@LaurV: Yeah. It's difficult to impossible to prevent Windows from doing other stuff, and it's unpredictable which cores Windows will steal time from. I guess if I had completely killed P95 instead of just stopping the workers for the cores I was giving to mfaktc, it would have left 2 cores idle and available for the system to mess with.

kjaget 2012-03-09 15:25

[QUOTE=kladner;292423]Oops. I guess I didn't fully grasp what was significant information. I was mainly going for Time/Class readings once Sieve Primes had more or less stabilized.[/QUOTE]

Cool, that's more than enough. There are 970 classes tested per exponent, so simple multiplication will give you the theoretical run time when nothing else is stealing CPU and GPU time from the system.

But since the number of classes is constant, it's easier to compare time/class and avoid doing extra math. So the average time/class for your examples is:

1 CPU = 19.9 sec/class
2 CPUs = 11.0 sec/class (80% speedup vs. 1 CPU)
3 CPUs = 9.27 sec/class (19% speedup vs. 2 CPUs)
4 CPUs = 8.82 sec/class (5.2% speedup vs. 3 CPUs)

The 1-CPU value is just taken from the timing output. The tough part is that this timing jumps around a bit, so 3 significant digits might be pushing it; if you left the computer totally idle for half an hour, I bet they'd stabilize, but what we have here is good enough.

The N-CPU calculation is 1/(1/t1 + 1/t2 + 1/t3 + ... + 1/tN). Basically you're converting each instance to a classes/sec throughput rate, adding those rates up across instances, and then converting back to a time/class.
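A minimal Python sketch of the N-instance formula and of the speedup percentages listed above. The per-instance times passed to `combined_time_per_class` are hypothetical (the individual instance timings aren't reproduced in this post); the speedups use the combined values from the list.

```python
def combined_time_per_class(times: list[float]) -> float:
    """Effective seconds/class for several instances running side by side.

    Instance i completes 1/t_i classes per second; the rates add,
    and the reciprocal of the total rate is the effective time per class.
    """
    return 1.0 / sum(1.0 / t for t in times)


def speedup_percent(t_old: float, t_new: float) -> float:
    """Throughput gain (%) when time/class drops from t_old to t_new."""
    return (t_old / t_new - 1.0) * 100.0


# Hypothetical: two instances at 22.0 s/class each act like one at 11.0 s/class.
print(f"{combined_time_per_class([22.0, 22.0]):.2f} s/class")

# Combined times from the list above (1, 2, 3, 4 CPUs).
table = [19.9, 11.0, 9.27, 8.82]
for n, (old, new) in enumerate(zip(table, table[1:]), start=2):
    print(f"{n - 1} -> {n} CPUs: {speedup_percent(old, new):.1f}% speedup")
```

With the rounded list values this gives about 80.9%, 18.7% and 5.1%; the post's 5.2% for the last step presumably reflects unrounded timings.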

Using the formula posted a few pages back, each run of 46,166,291 from 2^68 to 2^72 gives 19.4 GHz-days. Figuring the throughput is just a conversion to exponents/day: (seconds/class) * 970 = run time per exponent in seconds; divide by 3600 seconds/hour to get hours; then 24/(hours/exponent) = exponents per day.

1 CPU = 4.48 exp/day = 86.9 GHz-days / day
2 CPUs = 8.08 exp/day = 157 GHz-days / day
3 CPUs = 9.61 exp/day = 187 GHz-days / day
4 CPUs = 10.1 exp/day = 196 GHz-days / day
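The same conversion as a short Python sketch. It keeps the post's 970 classes/exponent and 19.4 GHz-days per exponent (for 46,166,291 from 2^68 to 2^72) as constants; both come from the post, and the class count is revisited further down.

```python
SECONDS_PER_DAY = 86400
CLASSES_PER_EXPONENT = 970      # value used in the post
GHZ_DAYS_PER_EXPONENT = 19.4    # 46,166,291 from 2^68 to 2^72, per the post


def exponents_per_day(sec_per_class: float,
                      classes: int = CLASSES_PER_EXPONENT) -> float:
    """Seconds/class -> hours per exponent -> exponents per day."""
    run_hours = sec_per_class * classes / 3600
    return 24 / run_hours


for cpus, t in [(1, 19.9), (2, 11.0), (3, 9.27), (4, 8.82)]:
    e = exponents_per_day(t)
    print(f"{cpus} CPU(s): {e:.2f} exp/day = "
          f"{e * GHZ_DAYS_PER_EXPONENT:.0f} GHz-days/day")
```

With the rounded sec/class values this gives 4.48, 8.10, 9.61 and 10.10 exp/day; the small differences from the list (8.08, 187) presumably come from rounding of the inputs.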

Since all of these are just scaled by a constant, the percentage differences between the values are the same as above. But it gives a more concrete picture of how many GHz-days/day you're giving up in exchange for doing something else with the extra CPUs.

Your biggest gain is going from 1 to 2 CPUs, since that's where you finally max out the GPU. You don't get quite a 2x scaling because it takes less than the full second CPU to max out the GPU. Since there's then spare CPU power, sieve primes increases, adding load on the CPU to balance it against the work the GPU is doing.

Moving to 3 is a smaller gain: here the increase comes strictly from higher sieve primes reducing the number of candidates per class that the GPU has to test. This increase may or may not be worth it; you're trading 30 GHz-days/day of TF for 5-8(?) GHz-days/day of LL results. It depends on how much you value each type of result, along with lots of other factors.
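To illustrate why raising sieve primes gives diminishing returns, here is a simplified model (an assumption, not mfaktc's actual sieve code). For each prime r that doesn't divide 2p, exactly one residue of k mod r makes r divide q = 2kp + 1, so sieving with r removes a fraction 1/r of the remaining candidates. Starting at 13 assumes 3, 5, 7 and 11 are already handled by the class split; mfaktc's real SievePrimes accounting may differ.

```python
import math


def primes_below(limit: int) -> list[int]:
    """Primes < limit via a sieve of Eratosthenes."""
    flags = bytearray([1]) * limit
    flags[0] = flags[1] = 0
    for i in range(2, math.isqrt(limit) + 1):
        if flags[i]:
            flags[i * i::i] = bytes(len(range(i * i, limit, i)))
    return [i for i in range(limit) if flags[i]]


PRIMES = primes_below(1_100_000)  # ~85,000 primes, enough for the loop below


def surviving_fraction(num_sieve_primes: int, first: int = 13) -> float:
    """Fraction of candidates left after sieving with the first
    `num_sieve_primes` primes >= `first` (model assumes none divides 2p)."""
    sieve = [r for r in PRIMES if r >= first][:num_sieve_primes]
    if len(sieve) < num_sieve_primes:
        raise ValueError("raise the primes_below() limit")
    frac = 1.0
    for r in sieve:
        frac *= 1.0 - 1.0 / r
    return frac


for n in (5000, 10000, 20000, 40000, 80000):
    print(f"{n:>6} sieve primes: {surviving_fraction(n):.4f} of candidates survive")
```

In this model each doubling of the sieve prime count removes only roughly 5-7% of the candidates that survived the previous level, which matches the shrinking gains from the 3rd and 4th CPU.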

Same for the 3-to-4 CPU jump, except that the increase is smaller. If all you care about is max GHz-days/day from any source, it makes sense (barely), but that's not the only way to decide this. I have a similar situation (adding the 4th core gives me ~9% better throughput) but decided to leave at least one core doing LL/DCs, since I want to contribute balanced results across all the work types.

ETA: my guess on classes/exponent might be off. Looking at the code, it might be 960 or 961. That changes the numbers here by ~1%, but all in the same direction, so the percentage changes stay the same. Considering the noise in the timing data, that's not a huge deal, but hopefully someone who understands the math better than me (i.e. almost anyone :) ) can help.

kladner 2012-03-09 15:39

Thanks for the explanation. It was an interesting exercise. As mentioned, I can't really afford to make this machine into a P95/mfaktc slave. At some point, I will probably experiment with replacing the second mfaktc with CUDALucas, thus regaining most of a core for other uses. (I think?) It will be interesting to see how that affects GIMPS performance vs. overall usability.

I'm just watching the conversation about the successive versions of CUDALucas (CL) to see when there is a general consensus on the reliability of results.

James Heinrich 2012-03-09 16:12

[QUOTE=kladner;292427]I will probably experiment with replacing the second mfaktc with CUDALucas, thus regaining most of a core for other uses. (I think?) It will be interesting to see how that affects GIMPS performance vs. overall usability.[/QUOTE]
I just started experimenting with CUDALucas yesterday. First impressions: it uses zero CPU, but its GPU usage is more aggressive than mfaktc's. Normal Windows usage is fine, but I can't watch even DVD-quality video smoothly with CUDALucas, whereas with mfaktc it's only 1080p video I have to switch it off for. Most likely I'll go back to mfaktc, partly for usability, but also because the extra two cores don't scale so well with the new AVX cores in Prime95 (iteration times when running 6 workers are significantly longer than with 4 workers).

kladner 2012-03-09 16:26

Thanks for the info, James. It's good to know how it works out for others. It might not hit me as hard, since YouTube and low-res .wmv files are just about the only videos I watch on the computer.

Xyzzy 2012-03-12 04:56

A cute post from a few years ago:

[url]http://www.mersenneforum.org/showpost.php?p=10266&postcount=12[/url]

TheJudger 2012-03-23 23:07

I guess I need to buy a GTX 6[78]0... ;)

LaurV 2012-03-24 05:33

[QUOTE=TheJudger;293953]I guess I need to buy a GTX 6[78]0... ;)[/QUOTE]
Do you mean 6(or 7, or 8)80? :smile:


All times are UTC.
