#1640
"Kieren"
Jul 2011
In My Own Galaxy!
2×3×1,693 Posts
#1641
Jun 2005
3·43 Posts
Quote:
I'm working from a small sample size (just my personal hardware at home), so I don't have enough different systems to say exactly where the break-even point is (and even that would only be a rough approximation). But I see lots of people locking sieve primes at 5000 to free up CPU time without measuring the TF GHz-day performance hit they take, just assuming that once the Mcandidates/sec is maxed out there's nothing more mfaktc can do. Of course, this exact tradeoff depends on a complex interaction between how efficient your CPU is at both LL testing and sieving and how fast the GPU is. That's why I keep coming back to the fact that it's hard to give specific recommendations without seeing mfaktc timings from a bunch of instances on any particular system. I'm genuinely interested to see them for various CPU and GPU combinations, since I have a limited set to test with here at home.

I also keep reiterating that GHz-days/day isn't the only way to measure this, so there can be other correct answers. For example, you might be willing to give up the absolute maximum GHz-days/day if you value your ranking in each category above absolute total throughput (so a TF GHz-day isn't equally valuable to an LL GHz-day, or whatever).

Can't argue with that approach, especially considering the GPU firepower.
#1642
"Mike"
Aug 2002
2·23·179 Posts
We have four cores per box sieving for our GTX570s. If we let mfaktc automatically choose the SievePrimes parameter, the cores go up to about 30,000 each and we net around 1250 GHz-days/day overall.

If we run Prime95 on each box as well, doing P-1 work with all four cores in one instance, and we set SievePrimes to 5000, we net around 1100 GHz-days/day overall. In the second example, we are able to complete (roughly) three P-1 tests every two days per box. Since each box has 8 GiB or more of memory, it kinda makes sense to do the P-1 work and lose the 150 GHz-days/day. It does, however, push our temperatures up about 5°C.

We are not sure what the optimal settings are, but we know P-1 testing needs to be done, so we think it is more helpful for the project. While it is fun to crank out GHz-days/day, doing optimal work for the project is probably best. We are currently running through a pile of 70-71 bit TF work, but once that is done we will go back to just taking whatever work we get to 72 bits, which is kinda the goal of the project, or something like that. (The fact that Craig is smoking us real bad makes it easier to make decisions like this.)
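The tradeoff above can be put in rough numbers. A quick sketch: the 1250 and 1100 GHz-days/day figures are from the post; since the number of boxes isn't stated, only the relative drop is computed, not a per-box cost.

```python
# Rough arithmetic for the TF-vs-P-1 tradeoff described above.
# Figures are taken from the post; the box count isn't stated, so only
# the relative throughput drop is computed here.
tf_only  = 1250.0  # GHz-days/day with all four cores sieving
with_pm1 = 1100.0  # GHz-days/day with the cores doing P-1, SievePrimes=5000

drop = (tf_only - with_pm1) / tf_only
print(f"TF credit given up: {tf_only - with_pm1:.0f} GHz-days/day ({drop:.0%})")
# -> TF credit given up: 150 GHz-days/day (12%)
```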
#1643
Oct 2011
7×97 Posts
Quote:
As a test, I just set the SP on the 2500 to 10k. M/s dropped by over 40, and it took over a minute longer to run the same exponent, which is approximately a 5% decrease in throughput.
#1644
Jun 2005
3×43 Posts
Quote:
Quote:
Try an experiment. Set SievePrimesAdjust to 1, run one instance, let it stabilize, and note how long a class takes. Then run two instances on the same exponent, again letting them stabilize, and keep track of the time. Repeat for 3 and 4 (up to however many cores you have). Post the results here and I'll show you what I'm talking about with respect to scaling.

Your throughput will increase rapidly with each additional core until you load the GPU 100%. Then you'll see smaller increases as the increased sieve primes make the GPU run quicker per class. The large increase is obviously worth it; the smaller one is closer to GHz-day parity with CPU power, so it takes more careful analysis to figure out whether it's worth it.
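The experiment above boils down to plotting a throughput curve. A minimal sketch of the analysis (the seconds-per-class timings below are made-up placeholders; plug in your own measurements):

```python
# Sketch of the scaling analysis described above: given measured
# seconds-per-class for 1..N concurrent mfaktc instances, compute total
# classes/hour and the marginal gain of each extra instance.
# The sample timings below are made up; replace them with your own.
measurements = {1: 120.0, 2: 125.0, 3: 140.0, 4: 170.0}  # instances -> sec/class

def classes_per_hour(instances, sec_per_class):
    # All instances run concurrently, so total throughput scales with count.
    return instances * 3600.0 / sec_per_class

prev = None
for n in sorted(measurements):
    t = classes_per_hour(n, measurements[n])
    extra = "" if prev is None else f"  ({(t / prev - 1):+.0%} vs {n - 1})"
    print(f"{n} instance(s): {t:5.1f} classes/hr{extra}")
    prev = t
```

With the placeholder numbers this shows the pattern described: a big jump to the second instance, then diminishing gains once the GPU is saturated.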
#1645
Oct 2011
7×97 Posts
Quote:
Quote:
Quote:
2 cores of a 2500K, adjust = 1, SP = ~25000, 31 min each = ~93 exp/day, CPU wait ~20% each, ~183 GHzD/day
*(Should clarify this: once SP climbed above 25K, the estimated time also increased, so the run was actually adjust = 0 and SP = 25K.)

1 core of a 2400, adjust = 1, SP = 5000, ~25 min = ~58 exp/day, <3% wait, ~114 GHzD/day
2 cores of a 2400, adjust = 0, SP = 5000, ~36 min each, ~80 exp/day, CPU wait ~20%, ~157 GHzD/day
2 cores, adjust = 1, SP = ~12000, ~33.75 min, ~85.3 exp/day, <3% CPU wait, ~168 GHzD/day

So, a 7% gain on the 2500 from 160% CPU usage. Barely worth using a 2nd core. With such results, I did not continue. The 2400 with 1 core is obvious: not enough CPU. Comparing 2 cores @ 5K vs 2 cores @ 12K: the same 7% gain, but CPU usage goes from ~160% to ~200%. I did not bother to continue.

It can be argued that since one has a 560 and one a 560 Ti you cannot adequately compare these, but it sure seems like the 2400 is more efficient.

Edit: I reran it after thinking about the slowdown; I must have had a process running, as the new run was 36K SP and ~25 min, giving a 32% increase. A fair improvement, but it takes a lot of resources. It's possible similar increases could be had on the 2400; I will have to test later.

Last fiddled with by bcp19 on 2012-03-08 at 22:12
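The exponents/day figures above follow directly from the per-exponent times. A quick check (timings are from the post; absolute GHzD/day also depends on exponent size and bit level, so only throughput in exponents/day and the relative gain are recomputed):

```python
# Check of the exp/day arithmetic in the post above. Timings are from
# the post; GHz-days per exponent depends on the exponent and bit level,
# so only exponents/day and the relative gain are recomputed here.
def exp_per_day(cores, minutes_per_exponent):
    # Each core runs its own instance, each taking minutes_per_exponent.
    return cores * 24 * 60 / minutes_per_exponent

assert round(exp_per_day(2, 31)) == 93       # 2500K, 2 cores, SP ~25000
assert round(exp_per_day(1, 25)) == 58       # 2400, 1 core, SP 5000
assert round(exp_per_day(2, 36)) == 80       # 2400, 2 cores, SP 5000
assert round(exp_per_day(2, 33.75), 1) == 85.3  # 2400, 2 cores, SP ~12000

# The "same 7% gain" quoted for SP 5000 -> ~12000 on the 2400:
gain = exp_per_day(2, 33.75) / exp_per_day(2, 36) - 1
print(f"{gain:.1%} gain")  # -> 6.7% gain
```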
#1646
"Kieren"
Jul 2011
In My Own Galaxy!
27AE₁₆ Posts
Quote:
On the other hand, I would be willing to produce some data on different numbers of mfaktc instances if that would be useful.

EDIT: At the moment, I'm trying out locking SievePrimes at 14000 for two instances. This was aimed at making other programs run a little better on the system. Clearly, there are trade-offs. If I left mfaktc to decide, it would be running SP at ~18-19K, depending on the exponents. The GPU fluctuates around 95%, and can be driven up with 3 instances. Thing is, I'm not sure I can live with the system under those circumstances. This is my only machine, and I need it to behave moderately well for general use.

Last fiddled with by kladner on 2012-03-08 at 21:46
#1647
"Kieren"
Jul 2011
In My Own Galaxy!
10011110101110₂ Posts
@kjaget I will have to put these test runs up in more than one post. I guess I should have redirected the outputs to text files. In any case, I ran from 1 to 4 instances of mfaktc, with affinities set to individual cores of the 1090T. For this test, I used the same exponent for all instances. I let the tests run until Sieve Primes had held steady for several classes.
These are the results for 2 and 3 instances.
Similar Threads

| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| mfakto: an OpenCL program for Mersenne prefactoring | Bdot | GPU Computing | 1676 | 2021-06-30 21:23 |
| The P-1 factoring CUDA program | firejuggler | GPU Computing | 753 | 2020-12-12 18:07 |
| gr-mfaktc: a CUDA program for generalized repunits prefactoring | MrRepunit | GPU Computing | 32 | 2020-11-11 19:56 |
| mfaktc 0.21 - CUDA runtime wrong | keisentraut | Software | 2 | 2020-08-18 07:03 |
| World's second-dumbest CUDA program | fivemack | Programming | 112 | 2015-02-12 22:51 |