Honestly it sounds a bit odd to me too, but the speed difference is real. It may also be related to the fact that mfakto is using many different kernels for (almost) every bit, and the higher-bit kernels are less efficient (from the "barrett15" family, there are kernels for 69,71,73,74,76 bits, or so) when they do the inverses, but I still can't understand why the differences are so big. Doing a 74 bit for a 120M exponent still gets higher speed, so I considered that the difference is not so much related to the bitlevel (kernel) but to sieving (the amount of candidates for 120M to 72 is the same as 60M to 73).
|
I've always known that lower exponents and bitlevels on mfakto were much faster but I've never really thought about it.
|
Sieving does not really care about bit levels; the sieve block size can be configured independently. Using different kernels, however, makes the majority of the difference. As long as a 5x15-bit kernel can be used, it should still be quite fast, up to 73 bits. Below 73 bits there are multiple levels of optimization, based on George's improvements for mfaktc 0.20: the lower the bitlevel, the more relaxed mfakto's math can be, as there are more spare bits available to catch calculation errors.
However, I cannot explain why mfakto's speed should be different for the same bitlevel and different exponents. Every binary 1 in the exponent causes a bit more effort than a 0. Every doubling of the exponent causes one more round in the exponentiation algorithm, causing it to slow down as the exponent increases. I cannot see any reason for gaining speed on higher exponents. |
[QUOTE=LaurV;362809]Honestly it sounds a bit odd to me too, but the speed difference is real. It may also be related to the fact that mfakto is using many different kernels for (almost) every bit, and the higher-bit kernels are less efficient (from the "barrett15" family, there are kernels for 69,71,73,74,76 bits, or so) when they do the inverses, but I still can't understand why the differences are so big. Doing a 74 bit for a 120M exponent still gets higher speed, so I considered that the difference is not so much related to the bitlevel (kernel) but to sieving (the amount of candidates for 120M to 72 is the same as 60M to 73).[/QUOTE]
Here is an excerpt from mfakto source: [CODE]/* GPU_GCN (7850@1050MHz, v=2) / (7770@1100MHz)*/
  BARRETT69_MUL15,  // "cl_barrett15_69" (393.88 M/s) / (259.96 M/s)
  BARRETT70_MUL15,  // "cl_barrett15_70" (393.47 M/s) / (259.69 M/s)
  BARRETT71_MUL15,  // "cl_barrett15_71" (365.89 M/s) / (241.50 M/s)
  BARRETT73_MUL15,  // "cl_barrett15_73" (322.45 M/s) / (212.96 M/s)
  BARRETT82_MUL15,  // "cl_barrett15_82" (285.47 M/s) / (188.74 M/s)
  BARRETT76_MUL32,  // "cl_barrett32_76" (282.95 M/s) / (186.72 M/s)
  BARRETT77_MUL32,  // "cl_barrett32_77" (274.09 M/s) / (180.93 M/s)
  BARRETT83_MUL15,  // "cl_barrett15_83" (267.27 M/s) / (176.79 M/s)
  BARRETT87_MUL32,  // "cl_barrett32_87" (248.77 M/s) / (164.12 M/s)
  BARRETT79_MUL32,  // "cl_barrett32_79" (241.48 M/s) / (159.38 M/s)
  BARRETT88_MUL15,  // "cl_barrett15_88" (239.83 M/s) / (158.46 M/s)
  BARRETT88_MUL32,  // "cl_barrett32_88" (239.69 M/s) / (158.22 M/s)
  BARRETT70_MUL24,  // "cl_barrett24_70" (226.74 M/s) / (149.63 M/s)
  BARRETT92_MUL32,  // "cl_barrett32_92" (216.10 M/s) / (142.61 M/s)
[/CODE] This shows the fastest-to-slowest kernels for GCN-type GPUs (HD 7xxx is GCN). 69-70 will be handled by BARRETT70_MUL15, 70-71 by BARRETT71_MUL15, etc. The numbers in parentheses are raw candidates/second (these too come from the source code -- not my numbers:smile:). As you can see, there is a significant difference between the kernels, which is entirely sufficient to explain the differences between your columns. No idea how to explain the differences b/w rows :sad: [BTW, you have two "70-71" columns. What's up with that? :confused:] |
[QUOTE=axn;362926][BTW, you have two "70-71" columns. Whats up with that? :confused:][/QUOTE]
That is me trying to be more explanatory than necessary. The initial columns were 69, 70, 71, etc., and I tried to "add a clarification" during the manual alignment. :blush: So, the first two columns are labeled a bit wrong. |
A question regarding barrett15 kernels and their bitlevels. From the source:
[CODE]  { BARRETT69_MUL15, "cl_barrett15_69", 60, 69, 0, NULL},
  { BARRETT70_MUL15, "cl_barrett15_70", 60, 69, 0, NULL},
  { BARRETT71_MUL15, "cl_barrett15_71", 60, 70, 0, NULL},
[/CODE] From the naming convention, I'd have assumed that the 70 & 71 kernels would have a bit max of 70 & 71 respectively, instead of the 69 & 70 that is hardwired into the code. Is this intentional? If so, the 70 kernel is superfluous, as the 69 kernel handles the same range and is (almost?) always faster. |
The naming reflects what I originally planned the kernels for, but some did not live up to that. So yes, _70 could be removed. I just thought that some day I might find a tweak to really make it do the 70 bits.
|
BTW, for those hunting for factors in the <10M range: GPU-sieving on the fast 15-bit kernels is available down to 60 bits. Just no "LESS_CLASSES"-version.
For comparison, same card (HD7850@1050MHz):
2.3M 63->64: 315 GHz-days/day
2.3M 64->65: 325 GHz-days/day
2.3M 65->67: 330 GHz-days/day
65M 63->64: 112 GHz-days/day
65M 64->65: 180 GHz-days/day
65M 65->66: 234 GHz-days/day
65M 66->67: 274 GHz-days/day
65M 67->69: 276 GHz-days/day
65M 69->70: 261 GHz-days/day
65M 70->73: 234 GHz-days/day
65M 73->80: 213 GHz-days/day |
:beer:
|
Is this automatic? Or do I have to manually do something for those kernels?
Thanks! :smile: |
This thread should really be stickied, just like the mfaktc thread.
|