mersenneforum.org  

Old 2012-03-09, 06:19   #1651
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

3²·29·37 Posts

Quote:
Originally Posted by bcp19 View Post
FYI, without completing the run, you cannot tell how much increase you have in throughput with 4 running compared to 1, 2 or 3.
Why can't you? Is the ETA not reliable, or what? (Especially for the one instance that already did ~800 classes.) For me the estimation seems to be quite reliable.

Last fiddled with by LaurV on 2012-03-09 at 06:20
Old 2012-03-09, 07:05   #1652
bcp19
 
Oct 2011

7·97 Posts

Quote:
Originally Posted by LaurV View Post
Why can't you? Is the ETA not reliable, or what? (Especially for the one instance that already did ~800 classes.) For me the estimation seems to be quite reliable.
Look at the 4 together: upper left goes 9:11, 9:13, 9:14, 9:11 remaining. Bottom right goes 7:50, 7:46, 7:47. All of the timings for the last 3-4 lines show 34.9-35.9 seconds. Looking at the ETA may give you a ROUGH estimate, but you need to complete a full run to get an accurate one. As an example:
Quote:
no factor for M29231441 from 2^68 to 2^69 [mfaktc 0.18 barrett79_mul32]
tf(): total time spent: 1h 0m 10.935s
no factor for M29231479 from 2^68 to 2^69 [mfaktc 0.18 barrett79_mul32]
tf(): total time spent: 1h 0m 4.711s
no factor for M29231617 from 2^68 to 2^69 [mfaktc 0.18 barrett79_mul32]
tf(): total time spent: 1h 0m 10.746s
no factor for M29231743 from 2^68 to 2^69 [mfaktc 0.18 barrett79_mul32]
tf(): total time spent: 1h 0m 12.207s
no factor for M29231773 from 2^68 to 2^69 [mfaktc 0.18 barrett79_mul32]
tf(): total time spent: 1h 0m 5.245s
These exponents print a new line every 3.7 seconds, but I can see the ETA jump by 10-60 seconds up or down. That's a pretty huge error rate.
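For anyone who wants to put a number on the run-to-run variation, here is a minimal Python sketch that parses the five "total time spent" lines quoted above and reports their mean and spread. The regular expression and the `total_seconds` helper are my own illustration, not part of mfaktc, and they only handle the "Xh Ym Z.ZZZs" form seen in these lines.

```python
import re
from statistics import mean

# The five "total time spent" lines copied from the quoted mfaktc output.
LOG = """\
tf(): total time spent: 1h 0m 10.935s
tf(): total time spent: 1h 0m 4.711s
tf(): total time spent: 1h 0m 10.746s
tf(): total time spent: 1h 0m 12.207s
tf(): total time spent: 1h 0m 5.245s
"""

# Hypothetical pattern: matches only the "Xh Ym Z.ZZZs" form shown above.
PATTERN = re.compile(r"total time spent:\s*(\d+)h\s*(\d+)m\s*([\d.]+)s")


def total_seconds(log):
    """Return each run's total time in seconds."""
    return [int(h) * 3600 + int(m) * 60 + float(s)
            for h, m, s in PATTERN.findall(log)]


times = total_seconds(LOG)
avg = mean(times)
spread = max(times) - min(times)
print(f"runs: {len(times)}, mean: {avg:.1f} s, spread: {spread:.1f} s "
      f"({100 * spread / avg:.2f}% of mean)")
```

On these five runs the spread between the fastest and slowest comes out at a few seconds on roughly an hour, which is the kind of accuracy only completed runs can show.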
Old 2012-03-09, 07:31   #1653
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

3²×29×37 Posts

Quote:
Originally Posted by bcp19 View Post
These exponents print a new line every 3.7 seconds, but I can see the ETA jump by 10-60 seconds up or down. That's a pretty huge error rate.
This happens if your computer does something else that steals clocks from mfaktx, so some classes get more CPU/GPU time than others, and the former show shorter ETAs than the latter. If the computer is balanced, all affinities are set right, and neither CPU nor GPU is "starving", then the ETAs are VERY stable and reliable. And if not, you can always "guess" an EMA or SMA quite accurately from the sequence of classes you see on screen.
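To make the EMA/SMA idea concrete, here is a minimal Python sketch that smooths a sequence of per-class times and turns the smoothed value into an ETA. The timings, the number of classes left, and the smoothing factor are all hypothetical values chosen for illustration, and the comparison assumes an ETA extrapolated from a single class time; nothing here is taken from the mfaktc source.

```python
def sma(samples, window):
    """Simple moving average over the last `window` samples."""
    recent = samples[-window:]
    return sum(recent) / len(recent)


def ema(samples, alpha):
    """Exponential moving average; a higher alpha weights recent samples more."""
    value = samples[0]
    for x in samples[1:]:
        value = alpha * x + (1 - alpha) * value
    return value


def eta_seconds(time_per_class, classes_left):
    """Remaining time if every remaining class takes `time_per_class` seconds."""
    return time_per_class * classes_left


# Hypothetical per-class timings in seconds; the 5.9 is a class that was
# disturbed by some other process on the machine.
class_times = [3.7, 3.7, 3.8, 5.9, 3.7, 3.6, 3.7, 3.7]
classes_left = 500  # hypothetical

# An ETA extrapolated from the single disturbed class jumps a lot...
print(f"from one slow class: {eta_seconds(5.9, classes_left):.0f} s")
# ...while the smoothed estimates move much less.
print(f"from SMA(8):         {eta_seconds(sma(class_times, 8), classes_left):.0f} s")
print(f"from EMA(0.2):       {eta_seconds(ema(class_times, 0.2), classes_left):.0f} s")
```

The window length and alpha are a trade-off: a longer window or a smaller alpha hides short disturbances better but reacts more slowly when the timings genuinely change.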
Old 2012-03-09, 14:22   #1654
kladner
 
"Kieren"
Jul 2011
In My Own Galaxy!

2×3×1,693 Posts

Quote:
Originally Posted by bcp19 View Post
FYI, without completing the run, you cannot tell how much increase you have in throughput with 4 running compared to 1, 2 or 3.
Oops. I guess I didn't fully grasp what was significant information. I was mainly going for Time/Class readings once Sieve Primes had more or less stabilized.

@LaurV - Yeah. It's difficult to impossible to prevent Windows from doing other stuff. And it's unpredictable which cores Windows will steal time from. I guess if I had completely killed P95 instead of just stopping workers for the cores I was giving to mfaktc, it would have left 2 cores idle and available for the system to mess with.
Old 2012-03-09, 15:25   #1655
kjaget
 
Jun 2005

3·43 Posts

Quote:
Originally Posted by kladner View Post
Oops. I guess I didn't fully grasp what was significant information. I was mainly going for Time/Class readings once Sieve Primes had more or less stabilized.
Cool, that's more than enough. There are 970 classes tested per exponent, so simple multiplication will get you the theoretical run time when nothing else is stealing CPU & GPU time from the system.

But since the number of classes is constant, it's easier to compare time/class and avoid doing extra math. So the average time/class for your examples:

1 CPU  = 19.9 sec/class
2 CPUs = 11.0 sec/class (80% speedup vs 1 CPU)
3 CPUs = 9.27 sec/class (19% speedup vs 2 CPUs)
4 CPUs = 8.82 sec/class (5.2% speedup vs 3 CPUs)
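As a quick check of the percentages, here is a small Python sketch that recomputes the step-to-step speedups from the rounded sec/class figures in the list. The function name is made up for illustration. Because it starts from the rounded figures, the results can differ in the last digit from the percentages quoted, which were presumably worked out from unrounded timings.

```python
# Combined time/class figures quoted in the list above (seconds per class).
time_per_class = {1: 19.9, 2: 11.0, 3: 9.27, 4: 8.82}


def speedup_percent(t_before, t_after):
    """Throughput gain (%) when time/class drops from t_before to t_after."""
    return (t_before / t_after - 1) * 100


for n in (2, 3, 4):
    gain = speedup_percent(time_per_class[n - 1], time_per_class[n])
    print(f"{n - 1} -> {n} instances: {gain:.1f}% faster")
```

Note the speedup is the ratio of the times minus one, not the percentage drop in time/class: going from 19.9 to 11.0 sec/class is about 45% less time per class but about 80% more throughput.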

The 1 CPU figure is taken straight from the timing. The tough part is that this timing jumps around a bit, so 3 sig digits might be pushing it. If you left the computer totally idle for half an hour I bet the timings would stabilize, but what we have here is good enough.

The N CPU calculation is 1/(1/t1 + 1/t2 + 1/t3 + ... + 1/tN). Basically you're converting each instance's time/class to a classes/sec throughput rate, adding those up across instances, and then converting back to a time/class.
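Here is a minimal Python sketch of that calculation. The per-instance timings in the examples are hypothetical, since the thread only gives the combined figures, and the function name is made up for illustration.

```python
def combined_time_per_class(instance_times):
    """Effective sec/class of several instances running side by side.

    Each instance's sec/class is converted to a classes/sec rate, the rates
    are summed, and the total is converted back to sec/class.
    """
    total_rate = sum(1.0 / t for t in instance_times)  # classes per second
    return 1.0 / total_rate


# Hypothetical: two instances that each take 22.0 s/class.
print(f"{combined_time_per_class([22.0, 22.0]):.2f} sec/class")

# Hypothetical: four instances with slightly different timings.
print(f"{combined_time_per_class([34.9, 35.2, 35.5, 35.9]):.2f} sec/class")
```

With equal timings this reduces to t/N, so two instances at 22.0 s/class behave like one instance at 11.0 s/class. Averaging the times and dividing by N only approximates this when the instances run at different speeds, which is why the rates are summed instead.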

Using the formula posted a few pages back, each run of 46,166,291 from 68-72 gives 19.4 GHz-days. Figuring the throughput is just converting to exponents/day: seconds/class * 970 = run time per exponent in seconds, divide by 3600 seconds/hour to get hours per exponent, then 24/(hours/exponent) = exponents per day.

1 CPU = 4.48 exp/day = 86.9 GHz-days / day
2 CPUs = 8.08 exp/day = 157 GHz-days / day
3 CPUs = 9.61 exp/day = 187 GHz-days / day
4 CPUs = 10.1 exp/day = 196 GHz-days / day
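The conversion can be sketched in a few lines of Python. It uses the thread's working figures (970 classes per exponent and 19.4 GHz-days per exponent) and the rounded sec/class values, so the last digit may differ slightly from the table; the names are my own. For the 1 CPU case this arithmetic gives about 4.48 exp/day, which is the figure consistent with 86.9 GHz-days/day.

```python
CLASSES_PER_EXPONENT = 970     # working figure used in this post
GHZ_DAYS_PER_EXPONENT = 19.4   # credit for one exponent from 68 to 72 bits
SECONDS_PER_DAY = 24 * 3600

# Combined sec/class for 1 to 4 CPUs, as listed earlier in the post.
time_per_class = {1: 19.9, 2: 11.0, 3: 9.27, 4: 8.82}


def exponents_per_day(sec_per_class, classes=CLASSES_PER_EXPONENT):
    """Exponents finished per day at a given combined sec/class."""
    return SECONDS_PER_DAY / (sec_per_class * classes)


for cpus, t in time_per_class.items():
    exp_day = exponents_per_day(t)
    ghz_days = exp_day * GHZ_DAYS_PER_EXPONENT
    print(f"{cpus} CPU(s): {exp_day:.2f} exp/day = {ghz_days:.0f} GHz-days/day")
```

The class count is a parameter so the same code can be rerun with a different value; since it scales every row by the same factor, the relative differences between rows do not change.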

Since all of these are just scaling by a constant, the percentage difference between the values is the same as above. But it gives a more concrete example of how many GHz-days/day you're giving up in exchange for doing something else with the extra CPUs.

Your biggest gain is going from 1 to 2 CPUs, since that's where you finally max out the GPU. You don't get quite a 2x scaling because it takes less than the full second CPU to max the GPU. Since there's then spare CPU power, Sieve Primes increases to put extra load on the CPU and even it out with the work the GPU is doing.

Moving to 3 is a smaller gain. Here the increase comes strictly from the higher Sieve Primes reducing the candidates per class that the GPU has to test. This increase may or may not be worth it: you're trading 30 GHz-days/day of TF for 5-8(?) of LL results. It depends on how much you value each type of result, along with lots of other factors.

Same for the 3-4 CPU jump, except that the increase is smaller. If all you care about is max GHz-days/day from any source it makes sense (barely), but that's not the only way to decide this. I have a similar situation (adding the 4th core gives me ~9% better throughput) but decided to leave at least one core to do LL/DCs since I want to give balanced results from all the work types.

ETA - my guess on classes/exponent might be off. Looking at the code it might be 960 or 961. That changes the numbers here by ~1%, but all in the same way, so the percentage changes are the same. Considering the noise in the timing data that's not a huge deal, but hopefully someone who understands the math better than me (i.e. most anyone :) ) can help.

Last fiddled with by kjaget on 2012-03-09 at 15:40
Old 2012-03-09, 15:39   #1656
kladner
 
"Kieren"
Jul 2011
In My Own Galaxy!

27AE₁₆ Posts

Thanks for the explanation. It was an interesting exercise. As mentioned, I can't really afford to make this machine into a P95/mfaktc slave. At some point, I will probably experiment with replacing the second mfaktc with CUDALucas, thus regaining most of a core for other uses. (I think?) It will be interesting to see how that affects GIMPS performance vs. overall usability.

I'm just watching the conversation on the progressive versions of CL to see when there is a general consensus on reliability of results.
Old 2012-03-09, 16:12   #1657
James Heinrich
 
"James Heinrich"
May 2004
ex-Northern Ontario

11×311 Posts

Quote:
Originally Posted by kladner View Post
I will probably experiment with replacing the second mfaktc with CUDALucas, thus regaining most of a core for other uses. (I think?) It will be interesting to see how that affects GIMPS performance vs. overall usability.
I just started experimenting with CUDAlucas yesterday. First impressions: it uses zero CPU, but the GPU usage is more aggressive than mfaktc. Normal Windows usage is fine, but I can't watch even DVD-quality video smoothly with CUDAlucas, whereas with mfaktc it's only 1080p video that I have to switch it off for. Most likely I'll go back to mfaktc, partly for usability, but also because the extra two cores don't scale so well with the new AVX code in Prime95 (iteration times when running 6 workers are significantly slower than with 4 workers).
Old 2012-03-09, 16:26   #1658
kladner
 
"Kieren"
Jul 2011
In My Own Galaxy!

23656₈ Posts

Thanks for the info, James. It's good to know how it works out for others. It might not hit me as hard, since YouTube and low-res .wmv's are just about the only videos I watch on the computer.
Old 2012-03-12, 04:56   #1659
Xyzzy
 
"Mike"
Aug 2002

20052₈ Posts

A cute post from a few years ago:

http://www.mersenneforum.org/showpos...6&postcount=12
Old 2012-03-23, 23:07   #1660
TheJudger
 
"Oliver"
Mar 2005
Germany

11×101 Posts

I guess I need to buy a GTX 6[78]0... ;)
Old 2012-03-24, 05:33   #1661
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

3²·29·37 Posts

Quote:
Originally Posted by TheJudger View Post
I guess I need to buy a GTX 6[78]0... ;)
Do you mean 6(or 7, or 8)80?

All times are UTC. The time now is 11:57.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.