LL speed vs cores
Thinking about Kaby Lake or Coffee Lake: will 4 real cores be faster than 2 real cores? In LL testing, will performance be 50% faster?
Thank you! |
The LL test is not easily parallelizable.
So unless you are running multiple tests on multiple candidates or you have a very advanced algorithm, the number of cores would not make much of a difference, if any. :smile: |
[QUOTE=a1call;486404]LL test is not easily parallel-able.
So unless you are running multiple tests on multiple candidates or you have a very advanced algorithm, multiplicity of the cores would not make much of a difference, if any. :smile:[/QUOTE] :bs meter: LL has been parallelized. 4 real cores are better than 2 real cores. The bottleneck seems to be memory bandwidth -- so fast RAM of the order of 3200MHz is recommended. |
Do you have any reference sources for that?
Thanks in advance. |
[QUOTE=a1call;486409]Do you have any reference sources for that?
Thanks in advance.[/QUOTE] When I say that LL has been parallelized, I really mean the FFT has been, by George and Ernst, and in the extreme for GPUs. From what I hear from others benchmarking Prime95, I believe that 4 cores are better than 2, and fast memory is a must to get maximal performance. |
I had a thread here where I discussed this before, and no one claimed it had been done. As far as I can remember, you can compute the exponentiation in parallel only up to the candidate; the Mod result above that is unpredictable and is needed for the next computation.
If parallelizing beyond that is done, it means that you can predict the PowerMod result and assign it to a separate core, which I think is very unlikely. |
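For reference, the serial dependency described in this post can be seen in a minimal sketch of the LL recurrence. This is plain Python using built-in big integers, not Prime95's implementation; the function name `lucas_lehmer` is only an illustrative choice.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if 2**p - 1 is prime, for an odd prime exponent p."""
    m = (1 << p) - 1  # the Mersenne candidate
    s = 4
    for _ in range(p - 2):
        # Each step needs the previous value of s, so the iterations
        # form a serial chain: step k+1 cannot start before step k ends.
        s = (s * s - 2) % m
    return s == 0
```

The loop itself cannot be spread over cores, but the big squaring `s * s` inside each step is a separate piece of work that can be.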
The Fourier transform that does the multiplication is the thing that's split over multiple cores - it's not that different iterations can be done in parallel, it's that the iterations themselves are faster.
I got to under 24 hours for a 40M-range double-check on a 14-core Skylake-X machine. |
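To illustrate what "the Fourier transform that does the multiplication" means, here is a toy sketch of squaring a big integer with an FFT. It is an assumption-laden teaching example in pure Python (the names `fft` and `fft_square` and the 8-bit digit size are made up for this sketch), not the code Prime95 or any GPU program actually runs, and it does nothing in parallel itself. The comments mark the places where the work is independent and could, in principle, be handed to different cores.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    # These two half-size transforms do not depend on each other,
    # so they are a natural place to split work across cores.
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def fft_square(x: int, base_bits: int = 8) -> int:
    """Square a non-negative integer via FFT convolution of its digits."""
    mask = (1 << base_bits) - 1
    digits = []
    while x:
        digits.append(x & mask)
        x >>= base_bits
    if not digits:
        return 0
    n = 1
    while n < 2 * len(digits):  # room for the full product
        n *= 2
    padded = [complex(d) for d in digits] + [0j] * (n - len(digits))
    spectrum = fft(padded)
    # Pointwise squaring: every element is independent of the others.
    squared = [v * v for v in spectrum]
    conv = fft(squared, invert=True)
    result = 0
    for i, c in enumerate(conv):
        # Divide by n to finish the inverse transform, round off float noise.
        result += round(c.real / n) << (base_bits * i)
    return result
```

One LL iteration then amounts to `(fft_square(s) - 2) % m`: the iterations still run one after another, but each one finishes sooner when the transform work is shared out. Floating-point rounding limits how large the inputs to this toy version can safely be.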
Thank you,
I am glad that I posted and learned that. :smile: |
[QUOTE=paulunderwood;486410]When I say that LL has been parallelized really mean FFT has been[/QUOTE]
Indeed. [QUOTE=paulunderwood;486410]fast memory is a must to get maximal performance[/QUOTE] Do we know what the limits to parallelism are, or are we (as I suspect) not able to supply sufficient fast memory? |
[QUOTE=a1call;486409]Do you have any reference sources for that?
Thanks in advance.[/QUOTE] Benchmarks produced by Prime95 or mprime on multicore hardware at various FFT lengths. Actual experience. Total system throughput may or may not be higher with one core per exponent, depending on cache size limitations, data bandwidth between cores, and other variables including the exponent or FFT length. Total wall clock run time of one primality test is reduced, in some instances with up to 20 or more cores, per Madpoo on some dual-14-core system. On older dual-6-core hardware, 3 cores each is more efficient than 4 cores each, since then all the cores per instance can be in the same package and connected by greater bandwidth than between packages. Try it. I think the multi-core capability goes back to v25 or so. |
Acknowledged with many thanks.
|