mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > PrimeNet > GPU to 72

Old 2014-05-20, 19:25   #2927
chalsall
If I May
 
 
"Chris Halsall"
Sep 2002
Barbados

9,767 Posts

Quote:
Originally Posted by James Heinrich
That would provide consistent data to map the 3D performance variance for the various GPUs. More data than I currently want to analyze, but could be interesting.
I would be very interested in having access to that kind of data -- to analyze in a 3 or 4 dimensional space.

As you know, I don't have privileged access to Primenet. But I understand that Primenet records (or, at least, is told) what client did what work.

If this knowledge could be exposed to those interested, it could be quite valuable.
Old 2014-05-20, 19:28   #2928
NickOfTime
 
Apr 2014

2·3·7 Posts

Quote:
Originally Posted by kracker
60M on a HD 7770:

70-71: 153 GHz
71-72: 154 GHz
72-73: 154 GHz
73-74: 132 GHz

35M on same:

68-69: 188
69-70: 178
70-71: 160
71-72: 160

I'm curious if mfaktc is more "smooth".
Well, mfakto switches from barrett15_73_gs to barrett15_82_gs there, whereas mfaktc is still using barrett76_mul32.

Last fiddled with by NickOfTime on 2014-05-20 at 19:36
Old 2014-05-20, 19:40   #2929
kracker
 
 
"Mr. Meeseeks"
Jan 2012
California, USA

23×271 Posts

Quote:
Originally Posted by NickOfTime
Well, mfakto switches from barrett15_73_gs to barrett15_82_gs there, whereas mfaktc is still using barrett76_mul32.
If interested...
Old 2014-05-20, 20:03   #2930
NickOfTime
 
Apr 2014

2×3×7 Posts

Quote:
Originally Posted by kracker
Hmm, there is a BARRETT76_MUL32_GS. The only obvious difference is that it has a "stages 1" flag. I checked my ini, and stages=1 there, so maybe something about GCN is disabling it, or there is some other bug...

Nope, 76 is mul32 whereas 82 is mul15 in find_fastest_kernel:
Code:
/*  GPU_GCN  (7850@1050MHz, v=2) / (7770@1100MHz)*/       
BARRETT69_MUL15,  // "cl_barrett15_69" (393.88 M/s) / (259.96 M/s)       
BARRETT70_MUL15,  // "cl_barrett15_70" (393.47 M/s) / (259.69 M/s)       
BARRETT71_MUL15,  // "cl_barrett15_71" (365.89 M/s) / (241.50 M/s)       
BARRETT73_MUL15,  // "cl_barrett15_73" (322.45 M/s) / (212.96 M/s)       
BARRETT82_MUL15,  // "cl_barrett15_82" (285.47 M/s) / (188.74 M/s)       
BARRETT76_MUL32,  // "cl_barrett32_76" (282.95 M/s) / (186.72 M/s)       
BARRETT77_MUL32,  // "cl_barrett32_77" (274.09 M/s) / (180.93 M/s)

Last fiddled with by NickOfTime on 2014-05-20 at 20:23
Old 2014-05-20, 22:47   #2931
VictordeHolland
 
 
"Victor de Hollander"
Aug 2011
the Netherlands

23×3×7^2 Posts

My HD7950 @900MHz is also more 'efficient' in the DC TF range.

mfakto v0.14

35M
69-70 [cl_barrett15_71_gs_2] 420GHz-d
70-71 [cl_barrett15_73_gs_2] 380GHz-d

69M
71-72 [cl_barrett15_73_gs_2] 366GHz-d
72-73 [cl_barrett15_73_gs_2] 366GHz-d
73-74 [cl_barrett15_82_gs_2] 327GHz-d
Old 2014-05-21, 00:29   #2932
chalsall
If I May
 
 
"Chris Halsall"
Sep 2002
Barbados

23047_8 Posts

Quote:
Originally Posted by VictordeHolland
My HD7950 @900MHz is also more 'efficient' in the DC TF range.
Then do everything you can in the DC range to 70.

Others will finish the exponents and release them for DCing.
Old 2014-05-21, 03:30   #2933
LaurV
Romulan Interpreter
 
 
Jun 2011
Thailand

2·3·1,609 Posts

Quote:
Originally Posted by James Heinrich
That seems unexpectedly lower than the 212 GHz-d/d my chart predicts.
Not really. As we discussed before, mfakto (AMD/OpenCL (?!?)) is known for getting lazy at higher bit levels; see my earlier posts on the subject. Now I can show that it comes from the (barrett? monty?) kernels: the ones used at lower bit levels take better advantage of the architecture.

For example my 7970 crunches 630G at ~40M to 69, but it gets as low as 400G at ~65M to 74. The best use (optimum point) for this card is either TF to ~70/71 bits, or DC of a ~37M exponent (where a power of 2 FFT is used optimally).

Last fiddled with by LaurV on 2014-05-21 at 04:26
Old 2014-05-21, 09:30   #2934
Bdot
 
 
Nov 2010
Germany

597_10 Posts

There are 3 factors that influence mfakto (and mfaktc) performance:
  • most important: the kernel being used (selected only by target bitlevel)
    Different algorithms / data chunk sizes have different effects ... For mfaktc you can see the effect when going beyond 76 bits, then it will also switch kernels.
  • measurable: size of the exponent in bits
    For each bit, the exponentiation/modulo loop needs to be run once. The first ~6 bits are for free, and there is some one-time overhead, so the effect is not proportional, but that is why the same bit-level in the DC-range is faster than in the LL-range.
  • negligible: the number of '1's vs. '0's in the exponent (in binary)
    For every '1' a small step needs to be done in addition. I think this is only measurable if you have no other 'noise' impacting the speed.
On AMD hardware, the 32-bit kernels carry quite a penalty because 32-bit multiplications are executed by the DP unit, so they run at the same ratio to SP as DP does (1:16 on low- and mid-level hardware, 1:4 on high end). In addition, the carry flag is not accessible from OpenCL and has to be emulated with extra instructions. Therefore, the 15-bit kernels were my fastest implementation: they use the fast 24-bit multiplications and leave room for the carry.
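
To make the second and third points concrete, here is a minimal Python sketch of the textbook left-to-right binary exponentiation that tests whether q divides 2^p - 1. It is not the mfakto or mfaktc kernel code (those work on multi-word integers with Barrett reduction and, as noted above, get the first few bits for free); the function names `is_factor` and `step_counts` are made up for illustration.

```python
def is_factor(p: int, q: int) -> bool:
    """Return True if q divides 2**p - 1 (plain square-and-multiply)."""
    result = 1
    for bit in bin(p)[2:]:                # one loop pass per bit of the exponent
        result = (result * result) % q    # squaring + modulo on every pass
        if bit == "1":
            result = (result * 2) % q     # small extra step only for '1' bits
    return result == 1


def step_counts(p: int) -> tuple[int, int]:
    """Return (loop passes, extra doublings) that is_factor performs for p."""
    return p.bit_length(), bin(p).count("1")


if __name__ == "__main__":
    # 2**11 - 1 = 2047 = 23 * 89
    print(is_factor(11, 23), is_factor(11, 89), is_factor(11, 47))
    print(step_counts(11))
```

The loop count depends only on the bit length of the exponent, and the number of extra doublings only on how many '1' bits it has, which is why the latter effect is so much smaller.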

Last fiddled with by Bdot on 2014-05-21 at 10:18 Reason: s/linear/proportional/ (because it is linear)
Old 2014-05-21, 17:06   #2935
kracker
 
 
"Mr. Meeseeks"
Jan 2012
California, USA

23×271 Posts

Quote:
Originally Posted by LaurV
Not really. As we discussed before, mfakto (AMD/OpenCL (?!?)) is known for getting lazy at higher bit levels; see my earlier posts on the subject. Now I can show that it comes from the (barrett? monty?) kernels: the ones used at lower bit levels take better advantage of the architecture.

For example my 7970 crunches 630G at ~40M to 69, but it gets as low as 400G at ~65M to 74. The best use (optimum point) for this card is either TF to ~70/71 bits, or DC of a ~37M exponent (where a power of 2 FFT is used optimally).
I think anything below 73 bits is fine.
Old 2014-05-21, 17:17   #2936
chalsall
If I May
 
 
"Chris Halsall"
Sep 2002
Barbados

2627_16 Posts

Quote:
Originally Posted by kracker
I think anything below 73 bits is fine.
OK. Can we think about and discuss this?

The whole point of GPU72 is to optimize the available GPU firepower.

I have been using James' analysis as to where the cross-over points should be (read: where TF'ing Makes More Sense than LL'ing or DC'ing).

I'm more than happy to add additional "WMS" options for different card types.
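
For reference, the usual back-of-the-envelope reasoning behind such a cross-over point can be sketched as below. This is only an illustration, not James' actual analysis: it assumes the common GIMPS rule of thumb that the chance of finding a factor at bit level b is roughly 1/b, and the function name and all hour figures are hypothetical.

```python
def tf_makes_sense(bit_level: int, tf_hours: float,
                   test_hours: float, tests_saved: int) -> bool:
    """Compare the cost of one more TF bit level with the expected saving.

    bit_level   -- bit level the exponent would be taken to
    tf_hours    -- time this card needs for that bit level (hypothetical input)
    test_hours  -- time the same card would need for one LL/DC test
    tests_saved -- primality tests a found factor makes unnecessary
                   (assumed 2 before the first LL test, 1 for a DC candidate)
    """
    chance_of_factor = 1.0 / bit_level            # assumed rule of thumb
    expected_hours_saved = chance_of_factor * tests_saved * test_hours
    return expected_hours_saved > tf_hours


if __name__ == "__main__":
    # Made-up numbers, only to show the shape of the comparison.
    print(tf_makes_sense(72, tf_hours=1.0, test_hours=100.0, tests_saved=2))
    print(tf_makes_sense(72, tf_hours=5.0, test_hours=100.0, tests_saved=1))
```

A card whose TF throughput drops at higher bit levels has a larger tf_hours there, which moves its cross-over point down. That is the case for per-card "WMS" options.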
Old 2014-05-21, 21:38   #2937
manfred4
 
 
Mar 2014
Germany

23·3·5 Posts

If you are collecting these results now, I can contribute. I just checked the numbers for my cards on mfaktc 0.20:

GTX670@1176MHz:
Code:
Exp	toBit	GHz-d/d
66M	74	275.8
66M	73	275.9
66M	72	276.2
66M	71	276.2

35M	71	284.8
35M	70	284.7
35M	69	284.6
GTX460M@675MHz:

Code:
66M	74	96.5
66M	73	96.4
66M	72	96.4
66M	71	96.4

35M	71	100.2
35M	70	100.2
35M	69	100.1
Seems to be a lot smoother across exponents and bit levels.
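
To put a number on "smoother", a small Python sketch that compares the relative spread of throughput across bit levels, using only figures already posted in this thread (kracker's HD 7770 at 60M and the two cards above at 66M). The helper name `relative_spread` is made up; the exponents differ slightly between the cards, so treat the comparison as rough.

```python
def relative_spread(rates: list[float]) -> float:
    """Return (max - min) / max for a list of throughput figures."""
    return (max(rates) - min(rates)) / max(rates)


# Throughput per bit level, as posted in this thread.
hd7770_60m = [153, 154, 154, 132]             # mfakto, 70-71 .. 73-74
gtx670_66m = [276.2, 276.2, 275.9, 275.8]     # mfaktc, to 71 .. 74
gtx460m_66m = [96.4, 96.4, 96.4, 96.5]        # mfaktc, to 71 .. 74

if __name__ == "__main__":
    for name, rates in [("HD 7770", hd7770_60m),
                        ("GTX 670", gtx670_66m),
                        ("GTX 460M", gtx460m_66m)]:
        print(f"{name}: {relative_spread(rates):.2%}")
```

On these figures the HD 7770 varies by about 14% across bit levels, while both mfaktc cards stay within about 0.1 to 0.2%.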

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.