mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > PrimeNet > GPU to 72
Old 2011-12-01, 04:39   #34
LaurV
Romulan Interpreter
 
 
Jun 2011
Thailand

2×5×31² Posts

I saw a lot of reasonable arguments come in after my post yesterday. I would like to remind you that the BIG GOAL of this project is to get rid of AS MANY exponents as possible, AS FAST as possible. In fact, I feel a bit guilty, as I was the guy who started this whole discussion on the parallel "extension to GPU-to-72" thread, with my wondering about TF-ing at the DC front.

So, if the goal is to get rid of the exponents (and NOT to find factors, though that is "cool" too), we REALLY NEED a "measure" of how fast we can get rid of exponents. I couldn't care less about gigahertz per second or whatever you want to call it. It is fun to say "I have more gigahertz than you" or "I can be faster than you", and I often do it as a joke too, but no progress comes from it, except the fun of teasing those of you who, I imagine, can take a joke. What really matters is how fast we can clear the exponents, working all together.

For that, we need to know what we can do in a delimited period of time (one day, for example), depending on the hardware/software we have and on the size of the exponents we are working with. You can call it gigahertz per day, you can call it crocodiles, whatever.

Up to now it is very clear that a good GPU should be doing DC-LL tests if its owner chooses to work at the DC front, and TF if its owner chooses to work at the LL front. Finding a factor at the DC front (i.e. clearing one exponent) needs 2 (two) days, but a DC-LL at the DC front (i.e. clearing one exponent by confirming its residue) needs only 1 (one!) day. Finding a factor at the LL front needs 2-3 days, but it saves 2 LL tests, which would otherwise take 10-14 days on the same GPU, plus an amount of P-1 on some CPU. We have already collected enough data to prove this. Moreover, if someone wants to do P-1, then he should not give up and change to another work type, because P-1 has proved its efficiency, as seen in the table: it cleared more exponents than any other method for the time allocated.
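[Editor's note: the day-counts above can be turned into a toy "exponents cleared per GPU-day" comparison. This is only a sketch using the rough figures quoted in the post (2 days, 1 day, 2-3 days, 10-14 days saved); none of these numbers are measured benchmarks.]

```python
# Toy model of the argument above, using the rough day-counts from the post.
# All figures are illustrative assumptions, not measurements.
days_to_clear = {
    "TF at DC front (factor found)": 2.0,
    "DC-LL at DC front (residue confirmed)": 1.0,
    "TF at LL front (factor found)": 2.5,   # midpoint of the "2-3 days" estimate
}
for task, days in days_to_clear.items():
    print(f"{task}: {1 / days:.2f} exponents cleared per GPU-day")

# A factor found at the LL front also saves two first-time LL tests,
# estimated at 10-14 GPU-days in the post; take the midpoint.
ll_days_saved = 12.0
net_cost = days_to_clear["TF at LL front (factor found)"] - ll_days_saved
print(f"LL-front TF net cost per factor: {net_cost:.1f} GPU-days (negative = net saving)")
```

On these assumed numbers, DC-LL clears exponents twice as fast as DC-front TF, and an LL-front factor is a large net saving of GPU time even though the TF run itself is slower.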

As I said at the beginning of the other thread, we are not talking in gigahertz or crocodiles or whatever. We are talking in TIME. Time is king. Time is money!

And here is where the discussion starts. One said: yes, BUT I don't have a "good" GPU "like yours" or "like his". OK. POST YOUR DATA. Then we can see how they compare (see my post #10 above).

Another one said we compare apples with oranges. YES. But imagine you have to carry them over the ocean and you want to take as many as possible without sinking your boat to Davy Jones' locker. What do you do? You have to COMPARE them, by scaling (weighting) them. No matter what's in the box, apples or oranges, you WILL scale them. No matter if your high-end GPU is paired with an antediluvian CPU, or vice versa. What is important is how fast you can move with it through the sea of exponents. So, LET US SEE.

Another one said "the 100 should be higher and the 3 should be lower". I completely agree, at least in my case, but we need you to POST YOUR DATA to prove it. Then we can see how much REAL TIME, wall clock, is necessary to TF, DC, LL, etc., and we can adjust.

A third one is worried that the new formula will affect their credit on PrimeNet. Well, it will not. We cannot touch that, unless George is reading this topic, gets some strange idea from all this discussion, and starts chopping down the limbs of the PrimeNet server. We hope he will not.

All this discussion and calculation is just for the few of us who really care about how fast we can clear the exponents, and how: by CPU, by GPU, by whatever, without wasting money on supercomputers and electricity bills. It will not affect the credit given by PrimeNet or anything else. We want to see how fast we can move, in REAL time, wall clock, and how much work we save, in REAL time, wall clock. Not in gigahertz. In a dog race, what matters is how much distance the dog covers per unit of time, not how many billion times per second it wags its tail during the race.

For this we need reliable data. POST YOUR RESULTS.

Then the formula could be adjusted, and everyone could decide for himself/herself what the best work to do is after that. I am pretty sure chalsall will find the best way to show the results in his tables.

And chalsall, by the way, the file which has to be modified is called "prime.txt", not "primenet.txt" (the "get assignments" pages).
Old 2011-12-01, 04:49   #35
Dubslow
Basketry That Evening!
 
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·29·83 Posts

First let me say thank you for articulating what I couldn't. This isn't about credit, but about optimization.
Quote:
Originally Posted by LaurV View Post
Up to now it is very clear that a good GPU should be doing DC-LL tests if its owner chooses to work at the DC front, and TF if its owner chooses to work at the LL front. Finding a factor at the DC front (i.e. clearing one exponent) needs 2 (two) days, but a DC-LL at the DC front (i.e. clearing one exponent by confirming its residue) needs only 1 (one!) day. Finding a factor at the LL front needs 2-3 days, but it saves 2 LL tests, which would otherwise take 10-14 days on the same GPU, plus an amount of P-1 on some CPU. We have already collected enough data to prove this. Moreover, if someone wants to do P-1, then he should not give up and change to another work type, because P-1 has proved its efficiency, as seen in the table: it cleared more exponents than any other method for the time allocated.

As I said at the beginning of the other thread, we are not talking in gigahertz or crocodiles or whatever. We are talking in TIME. Time is king. Time is money!
This is why I support doing GHz-Days/100 and NOT times 3 (NOT the "normalized credit"; the exact coefficients are not the important part), because then we get an idea of how much time finding the factor took, not how many FLOPs it took (which is much harder to interpret). I realize that introducing GPU-day and CPU-day units might be a bit much, but they are the easiest units for getting a sense of what the data means, and thus the easiest to optimize for.

Last fiddled with by Dubslow on 2011-12-01 at 04:50
Old 2011-12-01, 04:56   #36
kladner
 
 
"Kieren"
Jul 2011
In My Own Galaxy!

2×3×1,693 Posts

Quote:
Originally Posted by LaurV View Post
For this we need reliable data. POST YOUR RESULTS.
I would be happy to comply. What results are useful?

I have been thinking that my current allocation could be adjusted. Now it is:
4 - P-1 (2 cores each on 2 machines)
1 - LL
3 - TF (GPU + 3 CPU cores)
Perhaps I should peel off one of the last group and let one of those CPU cores do DC. I'm not sure about the efficiency of also adding CUDALucas to that mix.
Old 2011-12-01, 05:04   #37
firejuggler
 
 
Apr 2010
Over the rainbow

2³×5²×13 Posts

Data? Here is one point:
Factor=N/A,56313277,70,71

GTX 460, done in 1 h 34 min, granting 4.24638177 GHz-days, underfed by a Core 2 Duo E8300 @ 2833 MHz (one instance); that is 65 GHz-days/day.
With 2 instances the speed remains almost the same, almost doubling the output: that would make around 125 GHz-days/day.
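[Editor's note: the 65 GHz-days/day figure follows directly from the credit and the run time; a quick sanity check, using only the numbers quoted in this post.]

```python
# Sanity check of the data point above: 4.24638177 GHz-days of TF credit
# earned in 1 h 34 min of wall-clock time on the GTX 460.
credit = 4.24638177                       # GHz-days for the 70->71 assignment
wallclock_days = (1 * 60 + 34) / 1440.0   # 94 minutes as a fraction of a day
rate = credit / wallclock_days
print(f"one instance: ~{rate:.0f} GHz-days/day")   # ~65, matching the post
# Two instances "almost double" the output; the post rounds this down to ~125:
print(f"two instances: ~{2 * rate:.0f} GHz-days/day")
```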

Last fiddled with by firejuggler on 2011-12-01 at 05:06
Old 2011-12-01, 05:19   #38
LaurV
Romulan Interpreter
 
 
Jun 2011
Thailand

2×5×31² Posts

Quote:
Originally Posted by kladner View Post
I would be happy to comply. What results are useful?
The CPU results are already well known from GIMPS. We would be interested in GPU results. And for mfaktX, since it takes CPU resources too, it would be better to see the GPU+CPU combination. So:

1. How long do you need to complete a TF at the DC front (68 to 69 bit level, 25-29M exponent)? What CPU+GPU combination?
2. How long do you need to complete a TF at the LL front (71 to 72 bit level, ~50M exponent)? What CPU+GPU combination?
3. How long do you need to complete a DC-front DC assignment (CUDALucas, 25-29M exponent)? What GPU?
4. How long do you need to complete an LL-front LL assignment (CUDALucas, ~50M exponent)? What GPU?

Wall clock. We do not need GHz-days; we have a well-known formula to compute them exactly as PrimeNet would, and we know how much CPU credit they are worth. See my post #10 above.
Old 2011-12-01, 05:22   #39
LaurV
Romulan Interpreter
 
 
Jun 2011
Thailand

2×5×31² Posts

Quote:
Originally Posted by firejuggler View Post
Data? Here is one point:
Factor=N/A,56313277,70,71

GTX 460, done in 1 h 34 min, granting 4.24638177 GHz-days, underfed by a Core 2 Duo E8300 @ 2833 MHz (one instance); that is 65 GHz-days/day.
With 2 instances the speed remains almost the same, almost doubling the output: that would make around 125 GHz-days/day.
You posted in between (I did not see your post when I started writing mine), and you are right: the number of instances you can run would really make a difference. This should be reported too.
Thank you.
Old 2011-12-01, 05:34   #40
kladner
 
 
"Kieren"
Jul 2011
In My Own Galaxy!

2·3·1,693 Posts

Quote:
Originally Posted by LaurV View Post
1. How long do you need to complete a TF at the DC front (68 to 69 bit level, 25-29M exponent)? What CPU+GPU combination?
2. How long do you need to complete a TF at the LL front (71 to 72 bit level, ~50M exponent)? What CPU+GPU combination?
3. How long do you need to complete a DC-front DC assignment (CUDALucas, 25-29M exponent)? What GPU?
4. How long do you need to complete an LL-front LL assignment (CUDALucas, ~50M exponent)? What GPU?
Thanks. I'll try to keep track and post when I have data.
Old 2011-12-01, 06:24   #41
Christenson
 
 
Dec 2010
Monticello

5×359 Posts

@LaurV:

Result numbers are given here, and Chalsall was surprised at just how effective the TF'ing has been:

http://mersenneforum.org/showpost.ph...37&postcount=4

His result measure is LL checks avoided or completed. This is what we are trying to optimise.

Obviously, the optimisation surface is complex. Should you run mfaktc? How many instances? CUDALucas? (Though I have trouble keeping it going at full speed beyond the exponent it is DC'ing.) How much P-1 should I do? Running mfaktc at the moment costs either P-1 or LL time, too! And GPU to 72 means that after we have our TF party next month, the cost of finding TF factors is going to start rising... no more 70-bit exponents left to TF!

And when someone moves the sieving in mfaktc off the CPU, there will be yet another choice.

P.S. @LaurV: Don't feel guilty about the discussion. I'm enjoying it and certainly think it's worthwhile... I'll get you a benchmark soon.

Last fiddled with by Christenson on 2011-12-01 at 06:31
Old 2011-12-01, 07:55   #42
LaurV
Romulan Interpreter
 
 
Jun 2011
Thailand

2×5×31² Posts

I am aware of the complexity of the matter, and I personally mentioned before that mfaktX "sucks" juice from the CPU that could be used for P-1 (or DC/LL, though those are anyhow more effective on GPU). Therefore it is not simple to evaluate all the benefits, I agree, and that is why we need these data. I have them (and posted them) for my system, with my GPU, and a couple of users (including chalsall's "mini"-statistics you mentioned) have posted theirs too. So I think we are on the right path.
Old 2011-12-01, 15:38   #43
kladner
 
 
"Kieren"
Jul 2011
In My Own Galaxy!

2·3·1,693 Posts

An mfaktc result

GTX 460, PII 1090T (@ 3.5 GHz) dedicated CPU core

5201xxxx, 69-70, 57m 17s, 2.2987 GHz-Days

EDIT: 3 instances running
GPU usage 98-99%

EDIT2:
5200xxxx, 71-72, 3h 46m 33s, 9.1957 GHz-Days

The above worker was allowed to run out of work. The 2 remaining workers changed as follows:
SievePrime 48980 to 14042, time/class ~6.8 to ~5.0
SievePrime 44257 to 11640, time/class ~13.7 to ~10.3
GPU usage climbed back to ~95%

Results may be slightly skewed: something in System/Kernel is using ~16% CPU time, and it seems to hit mostly the cores running mfaktc.
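[Editor's note: the two timed runs above imply a credit rate in GHz-days per wall-clock day, the metric this thread keeps coming back to. A quick back-of-the-envelope check using only the figures reported in this post:]

```python
# Credit rate implied by the two mfaktc runs above (GTX 460 fed by a
# dedicated 1090T core), in GHz-days of credit per day of wall-clock time.
runs = [
    ("5201xxxx, 69-70", 2.2987, 57 * 60 + 17),               # run time in seconds
    ("5200xxxx, 71-72", 9.1957, 3 * 3600 + 46 * 60 + 33),
]
for name, ghz_days, seconds in runs:
    rate = ghz_days / (seconds / 86400)
    print(f"{name}: {rate:.1f} GHz-days/day")
# Both runs come out near ~58 GHz-days/day, suggesting the credit rate is
# roughly independent of bit level on this card.
```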

Last fiddled with by kladner on 2011-12-01 at 16:34 Reason: Added data for another run and changes w/2 wrkrs
Old 2011-12-01, 16:15   #44
chalsall
If I May
 
 
"Chris Halsall"
Sep 2002
Barbados

2·5·7·139 Posts

Quote:
Originally Posted by Dubslow View Post
This is why I support doing GHz-Days/100 and NOT times 3 (NOT the "normalized credit"; the exact coefficients are not the important part), because then we get an idea of how much time finding the factor took, not how many FLOPs it took (which is much harder to interpret). I realize that introducing GPU-day and CPU-day units might be a bit much, but they are the easiest units for getting a sense of what the data means, and thus the easiest to optimize for.
But as you pointed out, when the report was only doing GHzDays/100 for GPUs, that wasn't fair to the CPUs, because it converted the GPU metric to (approximate) wall-clock time while leaving CPUs at GHzDays.

So, in order to (approximately) compare GPUs to CPUs, we either divide all the CPU GHzDays values by 3, or multiply the GPU wall-clock time by 3 to bring it to (as you pointed out) "normalized GHzDays". I have chosen the latter because GHzDays is a metric PrimeNet has been using and people are comfortable with it.

To convert everything in the report to (approximate) wall-clock time, divide all the GHzDays values in the report by three.
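[Editor's note: a minimal sketch of this conversion. The factor of 3 is the approximate GPU-vs-CPU coefficient discussed in this thread, not an official PrimeNet constant, and the function names are illustrative.]

```python
# Approximate normalization described above: GPU wall-clock time is scaled
# by 3 so it is comparable with CPU GHzDays; dividing report values by 3
# recovers approximate wall-clock time.
NORMALIZATION = 3.0   # thread's rough GPU-vs-CPU coefficient (assumption)

def to_normalized_ghzdays(gpu_wallclock_days: float) -> float:
    """Scale GPU wall-clock days up to "normalized GHzDays"."""
    return gpu_wallclock_days * NORMALIZATION

def to_wallclock_days(report_ghzdays: float) -> float:
    """Convert a report's GHzDays value back to approximate wall-clock days."""
    return report_ghzdays / NORMALIZATION

print(to_wallclock_days(9.0))                         # 9 GHzDays -> 3.0 days
print(to_normalized_ghzdays(to_wallclock_days(9.0)))  # round-trips to 9.0
```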

And, I know we'll never get an exact comparison transform coefficient, because a) it would require the statistics of every CPU and GPU submitting results to PrimeNet over time, and b) the values will keep changing as the hardware and software get faster.

I find it interesting that this entire discussion has come out of one small report which isn't really all that important....

Last fiddled with by chalsall on 2011-12-01 at 17:23 Reason: s/wall time/wall-clock time/