mersenneforum.org

An obvious question (sorry). (https://www.mersenneforum.org/showthread.php?t=25896)

Aramis Wyler 2020-09-02 03:56

An obvious question (sorry).
 
So I'm pretty excited to have gotten Colab to compile and execute gpuowl, and I'm happily chugging along at about 25% of the way through my first PRP (on gpuowl). Now, the output says it will take 20 hours to run this (111M range), and so I'm like, ok, that sounds pretty awesome, how many GHzD/Day is that? Some back-of-the-napkin math tells me that it is about 600.


Now that card does TF about 6 times that fast. Is it really that much slower to do PRPs on GPUs, or is it more likely that I've done a bad job compiling or configuring gpuowl?


EDIT: For reference, the card is a Tesla V100-SXM2
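
The back-of-the-napkin figure above can be reproduced with a short sketch. The 500 GHz-days credit used here is an assumption, back-derived from the poster's own numbers (20 hours, about 600 GHzD/Day); it is not an official PrimeNet credit value, and the function name is made up for illustration.

```python
def ghz_days_per_day(credit_ghz_days: float, hours_per_test: float) -> float:
    """Throughput in GHz-days/day for a test that earns `credit_ghz_days`
    of credit and takes `hours_per_test` hours of wall clock time."""
    if hours_per_test <= 0:
        raise ValueError("hours_per_test must be positive")
    return credit_ghz_days * 24.0 / hours_per_test


# ASSUMPTION: credit for one PRP in the 111M range, inferred from the
# post's "20 hours" and "about 600" figures, not looked up on PrimeNet.
ASSUMED_PRP_CREDIT_GHZ_DAYS = 500.0

if __name__ == "__main__":
    rate = ghz_days_per_day(ASSUMED_PRP_CREDIT_GHZ_DAYS, 20.0)
    print(f"{rate:.0f} GHz-days/day")
```

With a different credit value the result scales linearly, so the real credit for a given exponent can be plugged in directly.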

Prime95 2020-09-02 04:41

You cannot use PrimeNet GHz-days to compare TF and PRP efficiency. The GHz-days formulas were set in stone based on how fast a 2008(?) Intel Core 2 CPU performed these calculations. That CPU was (relatively speaking) good at LL, bad at TF. Thus, when an architecture was developed that was good at TF, the GHz-days credited to that architecture were inflated (compared to actual wall clock time invested).

Hope that made sense :)

Aramis Wyler 2020-09-02 04:59

That does make sense, thank you - though now I feel like I'm abusing the Top Producers ranking every time I TF something.


Is there some other unit of work that is used to work out the value of TF bit depth vs P-1 vs the PRP itself? Flops?

Viliam Furik 2020-09-02 11:56

[QUOTE=Aramis Wyler;555740]That does make sense, thank you - though now I feel like I'm abusing the Top Producers ranking every time I TF something.


Is there some other unit of work that is used to work out the value of TF bit depth vs P-1 vs the PRP itself? Flops?[/QUOTE]

You can convert GHz-D/D to FLOPS (FLoating Point OPerations per Second) by applying a simple formula: 500 GHz-D/D = 1 TFLOPS (10[SUP]12[/SUP] FLOPS). So if you have a GPU with TF performance of, say, 2000 GHz-D/D, that's 4 TFLOPS in FP32 (single-precision floating-point operations).
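
A minimal sketch of that conversion, using only the 500 GHz-D/D = 1 TFLOPS ratio stated in this post. The function and constant names are made up for illustration.

```python
# Ratio quoted in the post: 500 GHz-days/day corresponds to 1 TFLOPS.
GHZ_DAYS_PER_DAY_PER_TFLOPS = 500.0


def ghz_days_per_day_to_tflops(ghz_days_per_day: float) -> float:
    """Convert a GHz-days/day throughput figure to TFLOPS."""
    return ghz_days_per_day / GHZ_DAYS_PER_DAY_PER_TFLOPS


def tflops_to_ghz_days_per_day(tflops: float) -> float:
    """Inverse conversion: TFLOPS back to GHz-days/day."""
    return tflops * GHZ_DAYS_PER_DAY_PER_TFLOPS


if __name__ == "__main__":
    # The TF example from the post: 2000 GHz-D/D.
    print(ghz_days_per_day_to_tflops(2000.0))
    # The roughly 600 GHz-D/D PRP figure from the first post.
    print(ghz_days_per_day_to_tflops(600.0))
```

Because the ratio is a fixed constant, the conversion inherits whatever distortion the GHz-days credit already carries for a given work type.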

Aramis Wyler 2020-09-02 14:35

Interesting. I think we've already established that GHzD/Day aren't comparable with TF vs PRP, so is there a different calculation converting PRP GHzD/Days to Flops?

Prime95 2020-09-02 15:57

[QUOTE=Aramis Wyler;555740]Is there some other unit of work that is used to work out the value of TF bit depth vs P-1 vs the PRP itself? Flops?[/QUOTE]

The unit is "wall clock time".

To decide the correct TF level we compare "how many exponents can this hardware eliminate per day by TFing to 2^N" to "how many exponents can this hardware eliminate per day by PRPing".

Since the above comparison is different for each piece of hardware we kind of guess as to the average piece of consumer hardware to determine our target TF levels.
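
The wall clock comparison can be sketched roughly as follows. The TF timing is a hypothetical placeholder, not a measurement of any card, and the factor probability uses the common GIMPS rule of thumb that the chance of a factor between 2^b and 2^(b+1) is about 1/b. Only the 20-hour PRP time comes from this thread.

```python
def tf_eliminations_per_day(tf_hours: float, lower_bit: int) -> float:
    """Expected exponents eliminated per day by trial factoring each
    candidate from 2^lower_bit to 2^(lower_bit + 1).

    Uses the heuristic P(factor in that range) ~ 1 / lower_bit.
    """
    attempts_per_day = 24.0 / tf_hours
    return attempts_per_day * (1.0 / lower_bit)


def prp_eliminations_per_day(prp_hours: float) -> float:
    """Exponents settled per day by running full PRP tests."""
    return 24.0 / prp_hours


if __name__ == "__main__":
    # HYPOTHETICAL: 0.2 hours to take one exponent from 2^76 to 2^77.
    tf_rate = tf_eliminations_per_day(tf_hours=0.2, lower_bit=76)
    # From the first post: about 20 hours per PRP in the 111M range.
    prp_rate = prp_eliminations_per_day(prp_hours=20.0)
    better = "TF" if tf_rate > prp_rate else "PRP"
    print(f"TF: {tf_rate:.2f}/day, PRP: {prp_rate:.2f}/day, better: {better}")
```

Both rates are measured on the same hardware in the same unit (exponents per day), which is why no GHz-days credit appears anywhere in the comparison.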

Viliam Furik 2020-09-02 16:07

[QUOTE=Aramis Wyler;555765]Interesting. I think we've already established that GHzD/Day aren't comparable with TF vs PRP, so is there a different calculation converting PRP GHzD/Days to Flops?[/QUOTE]

Well, it's all the same, at least that's how the server treats those numbers when displaying TFLOPS throughput ([URL="https://www.mersenne.org/primenet/"]here[/URL]).

chalsall 2020-09-02 16:25

[QUOTE=Prime95;555776]Since the above comparison is different for each piece of hardware we kind of guess as to the average piece of consumer hardware to determine our target TF levels.[/QUOTE]

Actually, it's a bit more than guesses... James stepped up many years ago to answer the question "Just where should we TF to?"

Before this we /were/ just guessing, with really no idea what was optimal. Please see the charts shown on each drill-down page from his [URL="https://www.mersenne.ca/cudalucas.php"]GPU Lucas-Lehmer performance comparison chart[/URL]. For example, [URL="https://www.mersenne.ca/cudalucas.php?model=715"]for a Tesla V100[/URL] it ***used*** to be economically optimal to go to 77 "bits" at 92M or so.

One of the exciting things about the project is that development is ongoing. So the economically optimal cross-over points have changed several times over the years.

Now that the proof mechanism has been introduced, DCs will soon (read: in a few years) be obsolete, so the cross-over analysis will once again have to be revisited.

We live in very interesting times!!! :smile:

P.S. Oh, also... Optimal is something to be strived for, but difficult to achieve. Further complicating the calculus is different people like to do different things. Their kit, time, and electrons...

P.P.S. Perfect is the enemy of good.

Uncwilly 2020-09-02 18:00

[QUOTE=chalsall;555783]P.S. Oh, also... Optimal is something to be strived for, but difficult to achieve. Further complicating the calculus is different people like to do different things. Their kit, time, and electrons...[/QUOTE]
And throw in the volume of available work force.

LaurV 2020-09-03 05:10

Yep. On James' graphic, the "PRP Line" has to be somewhere in the middle between "First LL Line" and "DC Line". The reason is that in the future, we will mostly do PRP+CERT, which is a bit more than a single LL, but less than two LLs. So, click on your hardware (GPU) and see where you are, and decide how high you have to go with TF with your hardware, to eliminate the exponents faster (wall clock time).
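
As a rough illustration of why the line moves, here is a sketch that finds how deep TF stays worthwhile when a found factor saves a given number of full tests. All timings are hypothetical, the 1/b factor probability and the doubling of TF time per bit level are standard rules of thumb, and the 1.05 figure for PRP+CERT is only an assumed stand-in for "a bit more than a single LL". This is not the method used for the charts on mersenne.ca.

```python
def max_worthwhile_bit(test_hours: float, tests_saved: float,
                       tf_hours_at_start: float, start_bit: int,
                       max_bit: int = 90) -> int:
    """Return the bit depth reached by taking every TF step whose cost is
    below its expected saving.

    tf_hours_at_start is the time to go from 2^start_bit to
    2^(start_bit + 1); each further bit level is assumed to cost twice
    as much. A factor is assumed to save `tests_saved` full tests.
    """
    bit = start_bit
    tf_hours = tf_hours_at_start
    while bit < max_bit and tf_hours < (1.0 / bit) * tests_saved * test_hours:
        bit += 1
        tf_hours *= 2.0
    return bit


if __name__ == "__main__":
    # HYPOTHETICAL hardware: 20 h per full test, 0.05 h for 2^72 -> 2^73.
    for label, saved in [("DC", 1.0), ("PRP+CERT (assumed)", 1.05),
                         ("First LL", 2.0)]:
        depth = max_worthwhile_bit(20.0, saved, 0.05, 72)
        print(f"{label}: TF to 2^{depth}")
```

Since each bit level doubles the TF cost, going from one test saved to two shifts the answer by at most about one bit, so with whole bit levels the PRP result lands on one of the two neighbouring lines.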



On the other hand, James, your filters are missing the newest cards (RTX30xx).


All times are UTC.
