#34
Romulan Interpreter
Jun 2011
Thailand
2×5×31² Posts
I saw the lot of reasonable arguments coming after my post of yesterday. I would like to remind you that the BIG GOAL of this project is to get rid of AS MANY exponents as possible, AS FAST as possible. In fact, I feel a bit guilty, as I was the guy who started all this discussion on the "GPU to 72 extension" parallel thread, with my wondering about TF-ing at the DC front.
So, if the goal is to get rid of the exponents (and NOT to find factors, though that is "cool" too), we REALLY NEED a "measure" of how fast we can get rid of exponents. I give no shit about "gigahertz per second" or whatever you want to call it. It is fun to say "I have more gigahertz than you" or "I can be faster than you", and I do it often as a joke too, but nothing (no progress) comes from it, except the fun of teasing some of you who, I imagine, can take a joke. What really matters is how fast we can clear the exponents, working all together. For that, we need to know what we can do in a delimited period of time (one day, for example), depending on the hardware/software we have and on the size of the exponents we are working with. You can call it gigahertz per day, you can call it crocodiles, whatever.

Up to now it is very clear that a good GPU should be doing DC-LL tests if its owner chooses to work at the DC front, and TF if its owner chooses to work at the LL front. Finding a factor at the DC front (i.e. clearing one exponent) needs 2 (two) days, but a DC-LL at the DC front (i.e. clearing one exponent by confirming its residue) needs 1 (one!) day only. Finding a factor at the LL front also needs 2-3 days, but that saves 2 LL tests which would otherwise take 10-14 days on the same GPU, plus an amount of P-1 on some CPU. We have already collected enough data to prove this. Moreover, if someone wants to do P-1, then he should not give up and change to another work type, because P-1 has proved its efficiency, as seen in the table: it cleared more exponents than any other method, for the time allocated.

As I said on the other thread in the beginning, we are not talking in gigahertz or crocodiles, or whatever. We are talking in TIME. Time is the king. Time is money! And here is where the discussion starts. One said: yes, BUT I don't have a "good" GPU "like yours", or "like his". OK. POST YOUR DATA.
And then we can see how they compare (see my post #10 above). Another one said we compare apples with oranges. YES. But imagine you have to carry them over the ocean and you want to take as many as possible without sinking your boat down to Davy Jones. What do you do? You have to COMPARE them. By scaling (weighing) them. No matter what's in the box, apples or oranges, you WILL weigh them. No matter if your high-end GPU is paired with an antediluvian CPU, or vice versa. What is important is how fast you can move with it through the sea of exponents. So, LET US SEE.

Another one said "the 100 should be higher and the 3 should be lower". I completely agree, at least that is my case, but we need you to POST YOUR DATA to prove it. Then we can see how much REAL TIME is necessary, wall clock, to TF, DC, LL, etc., and we can adjust.

A third one is worried that the new formula will affect their credit on Primenet. Well, it will not. We cannot touch that, unless George is reading this topic and gets some strange idea from all this discussion and starts chopping down the limbs of the Primenet server. We hope he will not. All this discussion and calculus is just for the few of us who really care about how fast we can clear the exponents, and how: by CPU, by GPU, by whatever, of course without wasting money on supercomputers and electricity bills. It will not affect the credit given by Primenet or anything else. We want to see how fast we can move, in REAL time, wall clock, and how much work we save, in REAL time, wall clock. Not in gigahertz. In a dog race, what matters is how much distance the dog covers in a unit of time, not how many billion times per second he wags his tail during the race.

For this we need reliable data. POST YOUR RESULTS. Then the formula can be adjusted, and we can decide (everyone for himself/herself) what is the best work to do after that. I am pretty sure chalsall will find the best way to show the results in his tables.
And chalsall, by the way, the file which has to be modified is called "prime.txt", not "primenet.txt" ("get assignments" pages).
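LaurV's back-of-the-envelope comparison above can be written down as a tiny script. All the day counts are the rough figures quoted in the post itself (about 2 days per TF factor at the DC front, 1 day per DC-LL, and ~2.5 days of LL-front TF saving 10-14 GPU-days of LL testing); the function and variable names here are mine, purely for illustration.

```python
# Exponents cleared per GPU-day for each strategy, using the rough
# timings quoted in the post above (figures from the post, not measured here).

def cleared_per_day(amount: float, days_spent: float) -> float:
    """Work cleared (or saved) per day of GPU time spent."""
    return amount / days_spent

# DC front, TF: a found factor clears 1 exponent in ~2 days --
# and only some candidates yield a factor, so this is an upper bound.
tf_dc = cleared_per_day(1, 2)            # 0.5 exponents/day at best

# DC front, DC-LL: one confirming test clears 1 exponent in ~1 day.
dc_ll = cleared_per_day(1, 1)            # 1.0 exponents/day

# LL front, TF: ~2.5 days of TF can remove the need for 2 LL tests
# that would take ~12 GPU-days, i.e. each TF day "buys" ~4.8 days of LL.
ll_tf_leverage = cleared_per_day(12, 2.5)

print(tf_dc, dc_ll, ll_tf_leverage)
```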
#35
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3·29·83 Posts
First let me say thank you for articulating what I couldn't. This isn't about credit, but about optimization.
Quote:
Last fiddled with by Dubslow on 2011-12-01 at 04:50
#36
"Kieren"
Jul 2011
In My Own Galaxy!
2×3×1,693 Posts
I would be happy to comply. What results are useful?

I have been thinking that my current allocation could be adjusted. Right now it is: 4 × P-1 (2 cores each on 2 machines). Perhaps I should peel off one of that last group and let one of those CPU cores do DC. I'm not sure about the efficiency of also adding CUDALucas to that mix.
#37
Apr 2010
Over the rainbow
2³×5²×13 Posts
Data?

Here is one data point: Factor=N/A,56313277,70,71 on a GTX 460, done in 1 h 34 min, granting 4.24638177 GHz-days, underfed by a Core 2 Duo E8300 @ 2833 MHz (one instance). That is ~65 GHz-days/day. With 2 instances the speed remains almost the same while almost doubling the output: that would make around 125 GHz-days/day.

Last fiddled with by firejuggler on 2011-12-01 at 05:06
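As a sanity check, the daily rate quoted above follows directly from the credit and the wall-clock time. This is just the arithmetic implied by the post; the function name is mine.

```python
def ghz_days_per_day(credit_ghz_days: float, hours: float, minutes: float = 0) -> float:
    """Daily credit rate implied by one result's credit and its wall-clock time."""
    elapsed_days = (hours + minutes / 60) / 24
    return credit_ghz_days / elapsed_days

# 70->71 bit TF on M56313277: 4.24638177 GHz-days in 1 h 34 min
rate = ghz_days_per_day(4.24638177, 1, 34)
print(f"{rate:.1f}")  # ~65 GHz-days/day, matching the figure in the post
```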
#38
Romulan Interpreter
Jun 2011
Thailand
2·5·31² Posts
The CPU results are already very well known from GIMPS. We would be interested in GPU results. And for mfaktX, since it takes CPU resources too, it would be better to see the GPU+CPU combination. So:

1. How long do you need to complete a TF at the DC front? (68 to 69 bit level, 25-29M exponent). What CPU+GPU combination?
2. How long do you need to complete a TF at the LL front? (71 to 72 bit level, ~50M exponent). What CPU+GPU combination?
3. How long do you need to complete a DC-front DC assignment? (CUDALucas, 25-29M exponent). What GPU?
4. How long do you need to complete an LL-front LL assignment? (CUDALucas, ~50M exponent). What GPU?

Wall clock. We do not need GHz-days; we have a well-known formula to compute them exactly as Primenet would, and we know how much CPU credit they are worth. See my post #10 above.
#39
Romulan Interpreter
Jun 2011
Thailand
22612₈ Posts
Quote:
Thank you.
#40
"Kieren"
Jul 2011
In My Own Galaxy!
2×3×1,693 Posts
Quote:
#41
Dec 2010
Monticello
703₁₆ Posts
@LaurV:

Result numbers are given here, and chalsall was surprised at just how effective the TF'ing has been: http://mersenneforum.org/showpost.ph...37&postcount=4

His result measure is LL checks avoided or completed. This is what we are trying to optimise. Obviously, the optimisation surface is complex: Should you run mfaktc? How many instances? CUDALucas? (But I have trouble keeping it going at full speed beyond the exponent it is DC'ing.) How much P-1 should I do? And running mfaktc at the moment costs either P-1 or LL time, too!

And GPU to 72 means that after we have our TF party next month, the cost of finding TF factors is going to start rising... no more 70-bit exponents left to TF! And when someone moves the sieving in mfaktc off the CPU, there will be yet another choice.

P.S. @LaurV: Don't feel guilty about the discussion -- I'm enjoying it, and certainly think it's worthwhile... I'll get you a benchmark soon.

Last fiddled with by Christenson on 2011-12-01 at 06:31
#42
Romulan Interpreter
Jun 2011
Thailand
9610₁₀ Posts
I am aware of the complexity of the matter, and I personally mentioned before that mfaktX is "sucking" juice from the CPU that could be used for P-1 (or DC/LL, though those are anyhow more effective on a GPU). Therefore it is not simple to evaluate all the benefits, I agree, and that is why we need this data. I have it (and posted it) for my system, with my GPU, and a couple of users (including chalsall's "mini"-statistics you mentioned) have posted theirs too. Therefore I think we are on the right path.
#43
"Kieren"
Jul 2011
In My Own Galaxy!
2×3×1,693 Posts
GTX 460, PII 1090T (@ 3.5 GHz), dedicated CPU core:

5201xxxx, 69-70, 57m 17s, 2.2987 GHz-days

EDIT: 3 instances running, GPU usage 98-99%.

EDIT2: 5200xxxx, 71-72, 3h 46m 33s, 9.1957 GHz-days

The above worker was allowed to run out of work. The 2 remaining workers changed as follows:
SievePrime 48980 to 14042, time/class ~6.8 to ~5.0
SievePrime 44257 to 11640, time/class ~13.7 to ~10.3
GPU usage climbed back to ~95%.

Results may be slightly skewed. Something in System/Kernel is using ~16% CPU time. This seems to be hitting mostly on the cores running mfaktc.

Last fiddled with by kladner on 2011-12-01 at 16:34. Reason: Added data for another run and changes w/2 wrkrs
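The time/class changes reported above translate into a per-worker throughput gain. A minimal sketch, using only the approximate values from the post (the function name is mine):

```python
def speedup(old_time_per_class: float, new_time_per_class: float) -> float:
    """Throughput gain when mfaktc's per-class time drops from old to new."""
    return old_time_per_class / new_time_per_class

# Worker 1: ~6.8 -> ~5.0 time/class; Worker 2: ~13.7 -> ~10.3 time/class
print(f"{speedup(6.8, 5.0):.2f}x, {speedup(13.7, 10.3):.2f}x")
```

So both remaining workers gained roughly a third in throughput once the third instance stopped competing for the GPU.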
#44
If I May
"Chris Halsall"
Sep 2002
Barbados
10011000000010₂ Posts
Quote:
So, in order to be able to (approximately) compare GPUs to CPUs, we either divide all the CPU GHz-days values by 3, or else multiply the GPU wall-clock time by 3 to bring it to (as you pointed out) "Normalized GHzDays". I have chosen the latter because GHz-days are a metric which PrimeNet has been using and people are comfortable with. To convert everything in the report to (approximate) wall-clock time, divide all the GHz-days values in the report by three.

And I know we'll never get an exact comparison transform coefficient, because a) it would require the statistics of every CPU and GPU submitting results to PrimeNet over time, and b) the values will be continuously changing as the hardware and software get faster.

I find it interesting that this entire discussion has come out of one small report which isn't really all that important....

Last fiddled with by chalsall on 2011-12-01 at 17:23. Reason: s/wall time/wall-clock time/
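The normalization chalsall describes is a single scale factor applied in one direction or the other. A minimal sketch, assuming only the factor of 3 stated in the post (the names are mine, purely illustrative):

```python
# Approximate GPU/CPU comparison factor from the post above.
NORMALIZATION_FACTOR = 3.0

def normalized_ghz_days(gpu_wall_clock_days: float) -> float:
    """Scale GPU wall-clock days up onto the report's GHz-days-style scale."""
    return gpu_wall_clock_days * NORMALIZATION_FACTOR

def approx_wall_clock_days(report_ghz_days: float) -> float:
    """Going the other way: report units back to approximate wall-clock days."""
    return report_ghz_days / NORMALIZATION_FACTOR

print(normalized_ghz_days(2.0))     # 2 GPU days -> 6.0 normalized GHz-days
print(approx_wall_clock_days(9.0))  # 9 GHz-days in the report -> 3.0 days
```

As the post notes, the factor of 3 is only a rough average over current hardware, not an exact transform.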
Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post |
| What percentage of CPUs/GPUs have done a double check? | Mark Rose | Data | 4 | 2016-06-17 14:38 |
| Anyone using GPUs to do DC, LL or P-1 work? | chalsall | GPU to 72 | 56 | 2014-04-24 02:36 |
| GPUs impact on TF | petrw1 | GPU Computing | 0 | 2013-01-06 03:23 |
| LMH Factoring on GPUs | Uncwilly | LMH > 100M | 60 | 2012-05-15 08:37 |
| Compare interim files with different start shifts? | zanmato | Software | 12 | 2012-04-18 14:56 |