mersenneforum.org


danmur 2018-04-27 20:08

LL speed vs cores
 
I'm thinking about Kaby Lake or Coffee Lake: will 4 real cores be faster than 2 real cores? In LL testing, will performance be 50% faster?

Thank you!

a1call 2018-04-27 20:22

The LL test is not easily parallelizable.
So unless you are running multiple tests on multiple candidates, or you have a very advanced algorithm, the number of cores would not make much of a difference, if any. :smile:

paulunderwood 2018-04-27 20:27

[QUOTE=a1call;486404]The LL test is not easily parallelizable.
So unless you are running multiple tests on multiple candidates, or you have a very advanced algorithm, the number of cores would not make much of a difference, if any. :smile:[/QUOTE]

:bs meter:

LL has been parallelized. 4 real cores are better than 2 real cores. The bottleneck seems to be memory bandwidth, so fast RAM on the order of 3200 MHz is recommended.

a1call 2018-04-27 20:33

Do you have any reference sources for that?

Thanks in advance.

paulunderwood 2018-04-27 20:38

[QUOTE=a1call;486409]Do you have any reference sources for that?

Thanks in advance.[/QUOTE]

When I say that LL has been parallelized, I really mean that the FFT has been, by George and Ernst, and in the extreme for GPUs. From what I hear from others benchmarking Prime95, I believe that 4 cores are better than 2, and fast memory is a must to get maximal performance.

a1call 2018-04-27 20:49

I had a thread here where I discussed this before, and no one claimed it had been done. As far as I can remember, you can compute the exponentiation in parallel only up to the candidate; the mod result above that is unpredictable and is necessary for the next computation.
If parallelizing beyond that has been done, it means that you can predict the PowerMod result and assign it to a separate core. I think that is very unlikely.
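
For readers following the dependency argument above, here is a minimal Python sketch of the textbook Lucas-Lehmer recurrence. It is not Prime95's code, and the function name `lucas_lehmer` is made up for illustration. It shows that each iteration consumes the full residue of the previous one, so the iterations themselves cannot be handed to separate cores; the one expensive operation inside each iteration is the squaring.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if 2**p - 1 is prime, for an odd prime exponent p."""
    m = (1 << p) - 1          # the Mersenne number being tested
    s = 4                     # standard starting value
    for _ in range(p - 2):
        # Each step needs the full residue from the previous step,
        # so the p - 2 iterations form a strictly serial chain.
        s = (s * s - 2) % m   # the big squaring is the expensive part
    return s == 0


print([p for p in (3, 5, 7, 11, 13, 17, 19, 23, 31) if lucas_lehmer(p)])
# → [3, 5, 7, 13, 17, 19, 31]
```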

fivemack 2018-04-27 20:55

The Fourier transform that does the multiplication is the thing that's split over multiple cores - it's not that different iterations can be done in parallel, it's that the iterations themselves are faster.

I got to under 24 hours for a 40M-range double-check on a 14-core Skylake-X machine.
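
To illustrate how a single squaring can be shared among workers, here is a hedged Python sketch. It deliberately swaps in plain schoolbook convolution for the FFT, because the point is only that every output coefficient depends on the input alone, never on another coefficient, so slices of the output can be computed independently. `BASE`, `to_limbs`, `square_slice`, and `parallel_square` are hypothetical names, and CPython threads will not actually run this faster; real programs use native threads on FFT passes.

```python
from concurrent.futures import ThreadPoolExecutor

BASE = 10**4  # hypothetical limb size, chosen only for readability


def to_limbs(n):
    """Split a non-negative integer into base-BASE limbs, least significant first."""
    limbs = []
    while n:
        n, r = divmod(n, BASE)
        limbs.append(r)
    return limbs or [0]


def square_slice(limbs, lo, hi):
    """Compute convolution coefficients lo..hi-1 of limbs * limbs.

    Each coefficient depends only on the input limbs, so different
    slices can be computed by different workers without coordination.
    """
    n = len(limbs)
    out = []
    for k in range(lo, hi):
        total = 0
        for i in range(max(0, k - n + 1), min(k, n - 1) + 1):
            total += limbs[i] * limbs[k - i]
        out.append(total)
    return out


def parallel_square(x, workers=4):
    """Square x by splitting the coefficient range across `workers` threads."""
    limbs = to_limbs(x)
    n_out = 2 * len(limbs) - 1
    step = -(-n_out // workers)  # ceiling division
    ranges = [(lo, min(lo + step, n_out)) for lo in range(0, n_out, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda r: square_slice(limbs, *r), ranges))
    coeffs = [c for part in parts for c in part]
    # Carry propagation is folded into this final evaluation step.
    return sum(c * BASE**k for k, c in enumerate(coeffs))


x = 2**89 - 1
assert parallel_square(x) == x * x
```

The iterations of the test still run one after another; only the work inside each squaring is divided.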

a1call 2018-04-27 20:58

Thank you,
I am glad that I posted to learn that. :smile:

CRGreathouse 2018-04-27 21:08

[QUOTE=paulunderwood;486410]When I say that LL has been parallelized, I really mean that the FFT has been[/QUOTE]

Indeed.

[QUOTE=paulunderwood;486410]fast memory is a must to get maximal performance[/QUOTE]

Do we know what the limits to parallelism are, or are we (as I suspect) simply not able to supply sufficiently fast memory?

kriesel 2018-04-27 21:49

[QUOTE=a1call;486409]Do you have any reference sources for that?

Thanks in advance.[/QUOTE]

Benchmarks produced by Prime95 or mprime on multicore hardware at various FFT lengths.
Actual experience.

Total system throughput may or may not be higher with one core per exponent, depending on cache size limitations, data bandwidth between cores, and other variables including the exponent or FFT length. Total wall-clock run time of one primality test is reduced, in some instances with up to 20 or more cores, per Madpoo on a dual-14-core system. On older dual-6-core hardware, 3 cores per instance is more efficient than 4 cores per instance, since then all the cores of an instance can be in the same package, connected by greater bandwidth than exists between packages. Try it. I think the multi-core capability goes back to v25 or so.
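
As a rough way to reason about the cache-size point above, the following Python sketch estimates how much FFT data a given number of simultaneous tests has to keep hot. It assumes 8-byte double-precision FFT elements and ignores lookup tables and scratch buffers, so it is a lower bound only. The function names, the 2048K FFT length, and the 8 MiB cache figure are hypothetical, not measurements from any system mentioned in this thread.

```python
BYTES_PER_DOUBLE = 8  # assumption: double-precision FFT elements


def fft_data_mib(fft_len_k: int, instances: int = 1) -> float:
    """Size in MiB of the FFT data for `instances` simultaneous tests.

    fft_len_k is the FFT length in units of 1024 elements (2048 means "2048K").
    Lookup tables and scratch buffers are ignored, so this is a lower bound.
    """
    return fft_len_k * 1024 * BYTES_PER_DOUBLE * instances / 2**20


def fits_in_cache(fft_len_k: int, instances: int, cache_mib: float) -> bool:
    """True if the FFT data alone would fit in a cache of cache_mib MiB."""
    return fft_data_mib(fft_len_k, instances) <= cache_mib


# Hypothetical example: a 2048K FFT and an 8 MiB shared last-level cache.
print(fft_data_mib(2048))            # 16.0
print(fits_in_cache(2048, 1, 8.0))   # False
print(fft_data_mib(2048, 4))         # 64.0
```

When the data does not fit, every iteration streams it through main memory, which is consistent with the posts above naming memory bandwidth as the bottleneck; actual behaviour should be checked with the program's own benchmark.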

a1call 2018-04-27 22:00

Acknowledged with many thanks.


All times are UTC.
