mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > PrimeNet > GPU to 72

2014-05-21, 21:54   #2938
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario
3421₁₀ Posts

Quote:
Originally Posted by Bdot
There are 3 factors that influence mfakto (and mfaktc) performance
Quote:
Originally Posted by James Heinrich
What would be brilliant is if Oliver/Bertram could include a broad benchmark that runs a few classes for a range of exponents (every 1M, 5M, etc. across the specified range, e.g. 30M-80M), tests each at various bit ranges, and reports throughput at that exponent and bit level. That would provide consistent data to map the 3D performance variance for the various GPUs.
@Bdot: how hard would it be to implement a benchmark such as I suggest?
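A minimal sketch of the sweep described above, in Python, only to make the shape of the test matrix concrete. The function name `benchmark_grid`, the 5M step and the 64-76 bit range are illustrative assumptions; nothing in this thread says mfaktc or mfakto offers such a mode. A real run would also substitute a nearby prime exponent for each grid value and time a few classes at each point.

```python
from itertools import product


def benchmark_grid(exp_min=30_000_000, exp_max=80_000_000,
                   exp_step=5_000_000, bit_min=64, bit_max=76):
    """Return (exponent, bit_from, bit_to) points for the proposed sweep.

    All defaults are hypothetical; only the 30M-80M range and the
    1M/5M steps are mentioned in the post.
    """
    exponents = range(exp_min, exp_max + 1, exp_step)
    bit_levels = range(bit_min, bit_max)  # each point covers one bit level
    return [(p, b, b + 1) for p, b in product(exponents, bit_levels)]


grid = benchmark_grid()
print(len(grid), "test points, from", grid[0], "to", grid[-1])
```

Recording one throughput figure per point would give the exponent x bit level x GPU surface the post asks for.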

2014-05-22, 03:05   #2939
LaurV
Romulan Interpreter
Jun 2011
Thailand
2×3×1,609 Posts

Quote:
Originally Posted by manfred4
seems to be a lot smoother between the exponents and bitlevels.
Yes, it is. As explained a few posts above, mfaktc uses (almost?) the same kernel (barrett76?) for all of this. You will feel the "big drop" in performance only for very short assignments, or for bitlevels over 76, where a slower kernel is used. Also, as discussed, mfakto has "lower bitlevel" kernels, optimized to fit the AMD/OpenCL architecture (see Bdot's posts).

2014-05-22, 06:31   #2940
NickOfTime
Apr 2014
52₈ Posts

Hmm, or how much work to create a BARRETT76_MUL15 kernel?

2014-05-22, 14:01   #2941
kracker
"Mr. Meeseeks"
Jan 2012
California, USA
23×271 Posts

Quote:
Originally Posted by NickOfTime
Hmm, or how much work to create a BARRETT76_MUL15 kernel?
Well, I think the question is: How fast or efficient would it be?

2014-05-22, 16:41   #2942
Bdot
Nov 2010
Germany
3·199 Posts

Quote:
Originally Posted by kracker
Well, I think the question is: How fast or efficient would it be?

It would be exactly as fast as the 82_MUL15 kernel, because it would need to be implemented like it.

When using 32-bit chunks of data, all current kernels need 3 of them, giving 96 bits. Now, certain short-cuts are possible that reduce the available bits. It's basically using less exact intermediate values during the calculation that are cheaper to compute, like skipping evaluation of some carry flags. Different short-cuts have different costs, but adding them all in brings you down to 76 bits usable out of the 96.

Now, when using 15-bit chunks, you can use 5 of them for 75 "raw" bits, or 6 chunks for 90. Adding in all short-cuts results in 69 and 82 usable bits, respectively. 73 bits is the full implementation with 5 chunks (no short-cuts; there are always small rounding errors that eat one or two bits).

It might be worth checking again, if I can squeeze out 74 bits in 5 chunks - I currently don't remember why I did not succeed the last time ...
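
The bit budgets in Bdot's post can be tabulated as below. This is a bookkeeping sketch in Python, not kernel code: the usable-bit figures are copied from the post rather than derived, and the helper `min_chunks_15`, along with the reading of "usable bits" as the highest bit level a variant can serve, is an assumption made for illustration.

```python
# (chunk_bits, chunks, usable_bits, variant), figures as quoted in the post.
VARIANTS = [
    (32, 3, 76, "all short-cuts"),
    (15, 5, 69, "all short-cuts"),
    (15, 5, 73, "full, no short-cuts"),
    (15, 6, 82, "all short-cuts"),
]


def raw_bits(chunk_bits, chunks):
    """Raw width of the representation before any accuracy loss."""
    return chunk_bits * chunks


def min_chunks_15(bit_level):
    """Fewest 15-bit chunks whose listed usable width covers bit_level.

    Returns None when no listed 15-bit variant is wide enough.
    """
    fits = [chunks for cb, chunks, usable, _ in VARIANTS
            if cb == 15 and usable >= bit_level]
    return min(fits) if fits else None


for cb, chunks, usable, variant in VARIANTS:
    raw = raw_bits(cb, chunks)
    print(f"{chunks} x {cb}-bit ({variant}): "
          f"raw {raw}, usable {usable}, lost {raw - usable}")

# A 76-bit job on 15-bit chunks needs 6 chunks, the same layout as the
# 82-bit kernel, so it could not run any faster than that kernel.
print(min_chunks_15(73), min_chunks_15(74), min_chunks_15(76))
```

Under these figures a 74-bit job lands on the 6-chunk side; squeezing 74 bits into 5 chunks is what would move it back.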

2014-05-23, 18:56   #2943
NickOfTime
Apr 2014
2×3×7 Posts

Quote:
Originally Posted by Bdot

It would be exactly as fast as the 82_MUL15 kernel, because it would need to be implemented like it.

When using 32-bit chunks of data, all current kernels need 3 of them, giving 96 bits. Now, certain short-cuts are possible that reduce the available bits. It's basically using less exact intermediate values during the calculation that are cheaper to compute, like skipping evaluation of some carry flags. Different short-cuts have different costs, but adding them all in brings you down to 76 bits usable out of the 96.

Now, when using 15-bit chunks, you can use 5 of them for 75 "raw" bits, or 6 chunks for 90. Adding in all short-cuts results in 69 and 82 usable bits, respectively. 73 bits is the full implementation with 5 chunks (no short-cuts; there are always small rounding errors that eat one or two bits).

It might be worth checking again, if I can squeeze out 74 bits in 5 chunks - I currently don't remember why I did not succeed the last time ...
Hmm, I guess it depends on how many to 74 exponents are left and how long we will be processing them :-). I seem to be mostly processing 73-74's in the 65/66M range at the moment...

Last fiddled with by NickOfTime on 2014-05-23 at 19:13

2014-05-23, 20:11   #2944
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario
11·311 Posts

Quote:
Originally Posted by NickOfTime
Hmm, I guess it depends on how many to 74 exponents are left and how long we will be processing them :-)
Many, and a long time.
As of 01-May-2014, "many" was approximately 21,176,383 exponents above 65M (the point from which 2^74 is required) in the PrimeNet range (below 1000M) that are currently TF'd to less than 2^74 and will eventually need to be taken there. I didn't bother to calculate the THz-years required, but it'll be a bunch.

Small trivia: if we continue TF limits along the current curve, TF for the range between 1000M-4294M will require 1.5 EHz-days (exahertz-days, as in thousand-million GHz-days). That means 1000 TitanBlack/780ti GPUs running continuously for 1000 years.

Last fiddled with by James Heinrich on 2014-05-23 at 20:12

2014-05-23, 23:16   #2945
Bdot
Nov 2010
Germany
1001010101₂ Posts
Quote:
Originally Posted by James Heinrich
@Bdot: how hard would it be to implement a benchmark such as I suggest?
I have been extending the --perftest mode of mfakto over the last few versions, but so far it mainly tests the sieving performance, in order to find the best config values.

Doing the performance tests for each kernel is on the list ... Oliver and I discussed that a while ago, in order to have some comparable results. We need to revive that.


And my attempt for a 74_15 kernel comes in less than 1% ahead of the 82_15 kernel, but still misses some factors. I will need to use an even more accurate modulo function, which will slow down the kernel even more ...

2014-05-24, 21:10   #2946
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
9,767 Posts

Quote:
Originally Posted by Bdot
Doing the performance tests for each kernel is on the list ... Oliver and I discussed that a while ago, in order to have some comparable results. We need to revive that.
That would be really cool. What would be even more cool is if such results could then be submitted to PrimeNet, and then made available to those interested. Perhaps James could help with that.

Quote:
Originally Posted by Bdot
And my attempt for a 74_15 kernel comes in less than 1% ahead of the 82_15 kernel, but still misses some factors. I will need to use an even more accurate modulo function, which will slow down the kernel even more ...
Not meaning to blow inappropriate sunshine, but what you and Oliver (et al) do is (IMO) quite impressive.

2014-05-24, 21:59   #2947
James Heinrich
"James Heinrich"
May 2004
ex-Northern Ontario
11×311 Posts

Quote:
Originally Posted by chalsall
What would be even more cool is if such results could then be submitted to PrimeNet, and then made available to those interested. Perhaps James could help with that.
I don't know about on PrimeNet since I'm not all that comfortable with database interactions there, but I'd be happy to make such data available in raw and aggregated form on mersenne.ca

2014-05-24, 22:07   #2948
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
9,767 Posts

Quote:
Originally Posted by James Heinrich
I don't know about on PrimeNet since I'm not all that comfortable with database interactions there, but I'd be happy to make such data available in raw and aggregated form on mersenne.ca
LOL... If you could "make it so", it would be appreciated and useful.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.