mersenneforum.org  

2018-04-27, 20:08   #1
danmur
 
Dec 2016

11 Posts
LL speed vs cores

I'm thinking about Kaby Lake or Coffee Lake. Will 4 real cores be faster than 2 real cores? In LL testing, will performance be 50% faster?

Thank you!
2018-04-27, 20:22   #2
a1call
 
"Rashid Naimi"
Oct 2015
Remote to Here/There

2×1,009 Posts

The LL test is not easily parallelizable.
So unless you are running multiple tests on multiple candidates, or you have a very advanced algorithm, having more cores would not make much of a difference, if any.

Last fiddled with by a1call on 2018-04-27 at 20:22
2018-04-27, 20:27   #3
paulunderwood
 
Sep 2002
Database er0rr

3,673 Posts

Quote:
Originally Posted by a1call
The LL test is not easily parallelizable.
So unless you are running multiple tests on multiple candidates, or you have a very advanced algorithm, having more cores would not make much of a difference, if any.


LL has been parallelized. 4 real cores are better than 2 real cores. The bottleneck seems to be memory bandwidth, so fast RAM on the order of 3200 MHz is recommended.
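
To put a rough number on the memory-bandwidth point, here is a back-of-the-envelope sketch. It assumes the 3200 figure refers to DDR4-3200 (3200 mega-transfers per second), a 64-bit (8-byte) channel, and a dual-channel desktop board. The function name and parameters are illustrative only, and the result is a theoretical peak, not anything measured with Prime95.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, channels, bus_bytes=8):
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes).

    Assumption: each channel moves bus_bytes per transfer (8 bytes for a
    64-bit DDR4 channel). Real sustained bandwidth is lower than this.
    """
    return mega_transfers_per_s * bus_bytes * channels / 1000


# Compare a few common DDR4 speed grades in a dual-channel setup.
for speed in (2133, 2666, 3200):
    print(f"DDR4-{speed}, dual channel: {peak_bandwidth_gb_s(speed, 2):.1f} GB/s")
```

All cores share this one figure, which is why extra cores tend to stop helping once the memory bus is saturated.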
2018-04-27, 20:33   #4
a1call
 
"Rashid Naimi"
Oct 2015
Remote to Here/There

11111100010₂ Posts

Do you have any reference sources for that?

Thanks in advance.
2018-04-27, 20:38   #5
paulunderwood
 
Sep 2002
Database er0rr

3,673 Posts

Quote:
Originally Posted by a1call
Do you have any reference sources for that?

Thanks in advance.
When I say that LL has been parallelized, I really mean the FFT has been, by George and Ernst, and in the extreme for GPUs. From what I hear from others benchmarking Prime95, I believe that 4 cores are better than 2, and fast memory is a must to get maximal performance.

Last fiddled with by paulunderwood on 2018-04-27 at 20:39
2018-04-27, 20:49   #6
a1call
 
"Rashid Naimi"
Oct 2015
Remote to Here/There

11111100010₂ Posts

I had a thread here where I discussed this before, and no one claimed it had been done. As far as I can remember, you can compute the exponentiation in parallel only up to the candidate; the Mod result above that is unpredictable and is needed for the next computation.
If parallelizing beyond that has been done, it means that you can predict the PowerMod result and assign it to a separate core. I think that is very unlikely.
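
For reference, a minimal sketch of the textbook Lucas-Lehmer loop, using plain Python integers (nothing like Prime95's optimized code; the function name is just illustrative). It shows the dependency in question: each squaring needs the previous residue, so separate iterations cannot simply be handed to separate cores.

```python
def lucas_lehmer(p):
    """Return True if the Mersenne number 2**p - 1 is prime.

    Assumes p is an odd prime. Textbook recurrence:
    s_0 = 4, s_{i+1} = (s_i**2 - 2) mod (2**p - 1), repeated p - 2 times.
    """
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        # This step cannot start until the previous value of s is known.
        s = (s * s - 2) % m
    return s == 0


# Prime exponents up to 31 whose Mersenne number passes the test.
print([p for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31) if lucas_lehmer(p)])
```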

Last fiddled with by a1call on 2018-04-27 at 20:50
2018-04-27, 20:55   #7
fivemack
(loop (#_fork))
 
Feb 2006
Cambridge, England

13×491 Posts

The Fourier transform that does the multiplication is the thing that's split over multiple cores. It's not that different iterations can be done in parallel; it's that the iterations themselves are faster.
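
A toy sketch of that idea, assuming a plain recursive radix-2 FFT over base-10 digits (Prime95 uses far more elaborate transforms; all names here are illustrative). Within one squaring, the two half-size transforms and the pointwise squares are independent of each other, which is the kind of work that can be spread over cores.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    # The two half-size transforms do not depend on each other.
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def square_via_fft(x, base=10):
    """Square a non-negative integer by convolving its digits with an FFT."""
    digits = []
    while x:
        x, d = divmod(x, base)
        digits.append(d)
    if not digits:
        return 0
    n = 1
    while n < 2 * len(digits):  # leave room for the full product
        n *= 2
    freq = fft(digits + [0] * (n - len(digits)))
    squared = [f * f for f in freq]  # pointwise, each element independent
    conv = fft(squared, invert=True)
    coeffs = [round(c.real / n) for c in conv]  # undo scaling, drop rounding noise
    # Python's big integers absorb the carries when the coefficients are summed.
    return sum(c * base**i for i, c in enumerate(coeffs))


print(square_via_fft(123456789) == 123456789**2)
```

Floating-point rounding limits how large the digits and transform length can get in a sketch like this, so it is only meant to show the structure of the computation.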

I got to under 24 hours for a 40M-range double-check on a 14-core Skylake-X machine.
2018-04-27, 20:58   #8
a1call
 
"Rashid Naimi"
Oct 2015
Remote to Here/There

2×1,009 Posts

Thank you,
I am glad that I posted to learn that.
2018-04-27, 21:08   #9
CRGreathouse
 
Aug 2006

1761₁₆ Posts

Quote:
Originally Posted by paulunderwood
When I say that LL has been parallelized, I really mean the FFT has been
Indeed.

Quote:
Originally Posted by paulunderwood
fast memory is a must to get maximal performance
Do we know what the limits to parallelism are, or are we (as I suspect) simply unable to supply sufficiently fast memory?
2018-04-27, 21:49   #10
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1001111111011₂ Posts

Quote:
Originally Posted by a1call
Do you have any reference sources for that?

Thanks in advance.
Benchmarks produced by Prime95 or mprime on multicore hardware at various FFT lengths.
Actual experience.

Total system throughput may or may not be higher with one core per exponent, depending on cache size limitations, data bandwidth between cores, and other variables including the exponent or FFT length. Total wall-clock run time of one primality test is reduced, in some instances with up to 20 or more cores, per Madpoo on a dual 14-core system. On older dual 6-core hardware, 3 cores per instance is more efficient than 4 cores per instance, since all the cores for an instance can then be in the same package, connected by greater bandwidth than exists between packages. Try it. I think the multi-core capability goes back to v25 or so.
2018-04-27, 22:00   #11
a1call
 
"Rashid Naimi"
Oct 2015
Remote to Here/There

2×1,009 Posts

Acknowledged with many thanks.
All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.