mersenneforum.org

GPU to 72 status... (https://www.mersenneforum.org/showthread.php?t=16263)

chalsall 2014-05-20 19:25

[QUOTE=James Heinrich;373890]That would provide consistent data to map the 3D performance variance for the various GPUs. More data than I currently want to analyze, but could be interesting.[/QUOTE]

I would be very interested in having access to that kind of data -- to analyze in a 3 or 4 dimensional space.

As you know, I don't have privileged access to Primenet. But I understand that Primenet records (or, at least, is told) what client did what work.

If this knowledge could be exposed to those interested, it could be quite valuable.

NickOfTime 2014-05-20 19:28

[QUOTE=kracker;373891]60M on a HD 7770:

70-71: 153 GHz
71-72: 154 GHz
72-73: 154 GHz
73-74: 132 GHz

35M on same:

68-69: 188
69-70: 178
70-71: 160
71-72: 160

I'm curious if mfaktc is more "smooth".[/QUOTE]

Well, at that point mfakto switches from barrett15_73_gs to barrett15_82_gs, whereas mfaktc is still using barrett76_mul32.

kracker 2014-05-20 19:40

[QUOTE=NickOfTime;373893]Well, with mfakto, it switches from barrett15_73_gs to barrett15_82_gs where mfaktc is using barrett76_mul32[/QUOTE]

[URL="https://github.com/Bdot42/mfakto/blob/master/src/mfakto.cpp"]If interested...[/URL]

NickOfTime 2014-05-20 20:03

[QUOTE=kracker;373894][URL="https://github.com/Bdot42/mfakto/blob/master/src/mfakto.cpp"]If interested...[/URL][/QUOTE]

Hmm, there is a BARRETT76_MUL32_GS. The only obvious difference is that it has the stages 1 flag. I checked my ini and it has stages=1, so maybe something about GCN is disabling it, or there is some other bug...

Nope: 76 is mul32 whereas 82 is mul15, and in find_fastest_kernel the 82-bit mul15 kernel is listed ahead of the 76-bit mul32 one for GCN:
[CODE]/* GPU_GCN (7850@1050MHz, v=2) / (7770@1100MHz)*/
BARRETT69_MUL15, // "cl_barrett15_69" (393.88 M/s) / (259.96 M/s)
BARRETT70_MUL15, // "cl_barrett15_70" (393.47 M/s) / (259.69 M/s)
BARRETT71_MUL15, // "cl_barrett15_71" (365.89 M/s) / (241.50 M/s)
BARRETT73_MUL15, // "cl_barrett15_73" (322.45 M/s) / (212.96 M/s)
BARRETT82_MUL15, // "cl_barrett15_82" (285.47 M/s) / (188.74 M/s)
BARRETT76_MUL32, // "cl_barrett32_76" (282.95 M/s) / (186.72 M/s)
BARRETT77_MUL32, // "cl_barrett32_77" (274.09 M/s) / (180.93 M/s)[/CODE]
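A rough sketch of how an ordered list like this could drive kernel selection. This is not the actual mfakto code: the function name pick_kernel is made up, and it assumes that the number in each kernel name is the largest factor bit level that kernel can handle and that the first adequate entry wins. The GPU-sieve (_gs) variants reported elsewhere in this thread do not always follow this simple rule, so treat it as an illustration only.

```python
# Kernel order copied from the GPU_GCN listing above: (name, assumed max factor bits).
# ASSUMPTION: the number in each name is the largest bit level the kernel handles.
GCN_KERNELS = [
    ("cl_barrett15_69", 69),
    ("cl_barrett15_70", 70),
    ("cl_barrett15_71", 71),
    ("cl_barrett15_73", 73),
    ("cl_barrett15_82", 82),
    ("cl_barrett32_76", 76),
    ("cl_barrett32_77", 77),
]


def pick_kernel(bit_max, kernels=GCN_KERNELS):
    """Return the first (fastest-listed) kernel whose assumed limit covers bit_max."""
    for name, limit in kernels:
        if limit >= bit_max:
            return name
    raise ValueError(f"no kernel covers {bit_max} bits")


for target in (71, 73, 74):
    print(target, pick_kernel(target))
```

Under these assumptions a 73-74 assignment lands on cl_barrett15_82, and cl_barrett32_76 is never reached because the 82-bit mul15 kernel sits ahead of it in the list.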

VictordeHolland 2014-05-20 22:47

My HD7950 @900MHz is also more 'efficient' in the DC TF range.

mfakto v0.14

35M
69-70 [cl_barrett15_71_gs_2] [B]420GHz-d[/B]
70-71 [cl_barrett15_73_gs_2] 380GHz-d

69M
71-72 [cl_barrett15_73_gs_2] 366GHz-d
72-73 [cl_barrett15_73_gs_2] 366GHz-d
73-74 [cl_barrett15_82_gs_2] 327GHz-d

chalsall 2014-05-21 00:29

[QUOTE=VictordeHolland;373902]My HD7950 @900MHz is also more 'efficient' in the DC TF range.[/QUOTE]

Then do everything you can in the DC range to 70.

Others will finish the exponents and release them for DCing.

LaurV 2014-05-21 03:30

[QUOTE=James Heinrich;373882]That seems unexpectedly lower than the 212 GHz-d/d [URL="http://www.mersenne.ca/mfaktc.php"]my chart[/URL] predicts.[/QUOTE]
Not really. As we discussed before, mfakto (AMD/OpenCL (?!?)) is known for getting lazy at higher bit levels. See my earlier posts on the subject. Now I can prove that it comes from the (barrett? monty?) kernels, which take better advantage of the architecture at lower bit levels.

For example my 7970 crunches 630G at ~40M to 69, but it gets as low as 400G at ~65M to 74. The best use (optimum point) for this card is either TF to ~70/71 bits, or DC of a ~37M exponent (where a power of 2 FFT is used optimally).

Bdot 2014-05-21 09:30

There are 3 factors that influence mfakto (and mfaktc) performance:
[LIST]
[*]most important: the kernel being used (selected only by the target bit level)
Different algorithms / data chunk sizes have different effects ... For mfaktc you can see the effect when going beyond 76 bits; then it will also switch kernels.
[*]measurable: the size of the exponent in bits
For each bit, the exponentiation/modulo loop needs to be run once. The first ~6 bits are free, and there is some one-time overhead, so the effect is not proportional, but that is why the same bit level is faster in the DC range than in the LL range.
[*]negligible: the number of '1's vs. '0's in the exponent (in binary)
For every '1', a small additional step needs to be done. I think this is only measurable if you have no other 'noise' impacting the speed.
[/LIST]
On AMD H/W, the 32-bit kernels carry quite a penalty because 32-bit muls are executed by the DP unit, so they are subject to the same DP:SP performance ratio (1:16 on low and mid-level H/W, 1:4 on high end). In addition, the carry flag is not usable in OpenCL and has to be emulated with extra instructions. Therefore, the 15-bit kernels were my fastest implementation, utilizing fast 24-bit multiplications and leaving room for the carry bit.
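The second and third points can be illustrated with a small sketch. This is plain left-to-right binary exponentiation for 2^p mod q, not the mfakto kernel code; the name powmod2 is made up for illustration, and the sketch does not model the "first ~6 bits are free" shortcut or the Barrett reduction that the real kernels use.

```python
def powmod2(p, q):
    """Compute 2**p mod q left to right.

    Returns (residue, squarings, doublings) so the per-bit work is visible.
    """
    result, squarings, doublings = 1, 0, 0
    for bit in bin(p)[2:]:                 # most significant bit first
        result = (result * result) % q     # one squaring per exponent bit
        squarings += 1
        if bit == "1":
            result = (result * 2) % q      # small extra step for each '1' bit
            doublings += 1
    return result, squarings, doublings


# q divides 2**p - 1 exactly when 2**p mod q == 1.
# Known case: 2**23 - 1 = 47 * 178481.
residue, squarings, doublings = powmod2(23, 47)
print(residue == 1, squarings, doublings)
```

The squaring count equals the bit length of the exponent, which is the "measurable" effect; the doubling count equals the number of '1' bits, which is the "negligible" one.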

kracker 2014-05-21 17:06

[QUOTE=LaurV;373916]Not really. As we discussed before, mfakto (AMD/OpenCL (?!?)) is known for getting lazy at higher bit levels. See my earlier posts on the subject. Now I can prove that it comes from the (barrett? monty?) kernels, which take better advantage of the architecture at lower bit levels.

For example my 7970 crunches 630G at ~40M to 69, but it gets as low as 400G at ~65M to 74. The best use (optimum point) for this card is either TF to ~70/71 bits, or DC of a ~37M exponent (where a power of 2 FFT is used optimally).[/QUOTE]

I think anything below 73 bits is fine.

chalsall 2014-05-21 17:17

[QUOTE=kracker;373938]I think anything below 73 bits is fine.[/QUOTE]

OK. Can we think about and discuss this?

The whole point of GPU72 is to optimize the available GPU firepower.

I have been using James' analysis as to where the cross-over points should be (read: where TF'ing Makes More Sense than LL'ing or DC'ing).

I'm more than happy to add additional "WMS" options for different card types.
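For discussion, here is a minimal sketch of how a per-card cross-over could be estimated. It is not James' analysis, which is not reproduced in this thread. It assumes the usual GIMPS rule of thumb that the chance of a factor between 2^b and 2^(b+1) is roughly 1/b, and every number in the example call is a made-up placeholder rather than a measurement.

```python
def tf_makes_sense(bit_lo, tf_ghzd, tf_rate, test_ghzd, test_rate, tests_saved):
    """True if TF from 2**bit_lo to 2**(bit_lo + 1) is expected to pay off.

    tf_ghzd      work for this TF bit level, in GHz-days (placeholder input)
    tf_rate      the card's TF throughput, in GHz-days per day
    test_ghzd    work for one LL or DC test, in GHz-days (placeholder input)
    test_rate    the card's LL/DC throughput, in GHz-days per day
    tests_saved  tests avoided if a factor is found (2 before LL, 1 before DC)
    """
    p_factor = 1.0 / bit_lo                 # heuristic chance of a factor
    tf_days = tf_ghzd / tf_rate             # cost of the TF run on this card
    test_days = test_ghzd / test_rate       # cost of one primality test on this card
    return p_factor * tests_saved * test_days > tf_days


# Same hypothetical assignment on two hypothetical cards that differ only in TF rate.
print(tf_makes_sense(72, 50.0, 150.0, 120.0, 20.0, 2))
print(tf_makes_sense(72, 50.0, 400.0, 120.0, 20.0, 2))
```

The point of the sketch is that the answer depends on the ratio of a card's TF throughput to its LL/DC throughput, which is why separate options per card type could make sense.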

manfred4 2014-05-21 21:38

If you are collecting these tests now, I can participate. I just checked the stats for my cards on mfaktc 0.20:

GTX670@1176MHz:
[CODE]Exp  toBit  GHz-d/d
66M  74     275.8
66M  73     275.9
66M  72     276.2
66M  71     276.2

35M  71     284.8
35M  70     284.7
35M  69     284.6[/CODE]

GTX460M@675MHz:
[CODE]Exp  toBit  GHz-d/d
66M  74     96.5
66M  73     96.4
66M  72     96.4
66M  71     96.4

35M  71     100.2
35M  70     100.2
35M  69     100.1[/CODE]

So mfaktc seems to be a lot smoother across exponents and bit levels.
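To put a number on "smoother", here is a quick sketch that compares the relative spread of the throughput figures quoted in this thread. The relative_spread helper is just an illustrative measure (fastest minus slowest, divided by fastest), and it lumps both exponent ranges together for each card.

```python
# Throughput figures as quoted earlier in this thread (GHz-d/day).
runs = {
    "HD 7770, mfakto (kracker)": [153, 154, 154, 132, 188, 178, 160, 160],
    "GTX 670, mfaktc (manfred4)": [275.8, 275.9, 276.2, 276.2, 284.8, 284.7, 284.6],
    "GTX 460M, mfaktc (manfred4)": [96.5, 96.4, 96.4, 96.4, 100.2, 100.2, 100.1],
}


def relative_spread(values):
    """Fraction by which the slowest run falls below the fastest one."""
    return (max(values) - min(values)) / max(values)


for label, values in runs.items():
    print(f"{label}: {relative_spread(values):.1%}")
```

On these figures the HD 7770 under mfakto varies by roughly 30%, while both NVIDIA cards under mfaktc stay within about 3 to 4%.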

