mersenneforum.org


hj47 2009-01-28 10:45

Most efficient way to LL
 
Hey,

Is it more efficient/productive to do two LLs on a dual core, one per core, or one LL shared between both cores? If the latter is better, how do I go about doing it?

Cheers :D

Mr. P-1 2009-01-28 12:13

It's more efficient in the long run to do two LLs - one on each core.

petrw1 2009-01-28 15:10

[QUOTE=Mr. P-1;160805]It's more efficient in the long run to do two LLs - one on each core.[/QUOTE]

There have been a couple of reports on this forum (not in this thread) of two cores processing one LL test in half the time of one core, or less, but it is unlikely.

I have a Core 2 Duo E6600 that appears to process a DC in slightly less than half the time with two cores. I haven't tried an LL test yet.

Also read this thread, especially posts by Phantomas that talk about his experience: [url]http://www.mersenneforum.org/showthread.php?t=11025[/url]

R.D. Silverman 2009-01-28 16:54

[QUOTE=petrw1;160825]There have been a couple of reports on this forum (not in this thread) of two cores processing one LL test in half the time of one core, or less, but it is unlikely.[/QUOTE]

I am curious as to why you might even think that it is possible.

If one cuts a piece of string in half and ties the pieces together,
is the result ever longer than the original??????

CRGreathouse 2009-01-28 17:39

[QUOTE=R.D. Silverman;160845]I am curious as to why you might even think that it is possible.

If one cuts a piece of string in half and ties the pieces together,
is the result ever longer than the original?[/QUOTE]

With cache locality it's possible for toy applications. But I don't think there are any real applications where this is a significant effect.

I do like your analogy, though.

ewmayer 2009-01-28 17:44

On my Core 2 Duo, running Prime95 2-threaded gives only a few percent less throughput than running 2 single-threaded jobs, so I prefer the 2-threaded setup because there's only one job to manage, restart after an interrupt, etc.

Bob, superlinear scaling is actually possible: my Mlucas code runs slightly faster on 2 and 4 Itanium cores in multithreaded mode than linear scaling would predict, likely due to cache effects. For example, each core in multithreaded mode deals with a working data set that fits in the L2 cache, whereas one job per core leads to a working set that exceeds the L2 cache size. I admit such occurrences appear to be the exception rather than the rule, but be aware that it is possible.
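
The cache argument above can be illustrated with a toy model. This is only a sketch under loud assumptions: the job size and the per-core L2 size below are hypothetical numbers chosen for illustration (they are not Mlucas or Itanium figures), and the model assumes the data splits evenly across threads and that each core has its own cache.

```python
MIB = 1024 * 1024

def per_core_working_set(job_bytes: int, threads: int) -> float:
    """Bytes each core touches if one job's data is split evenly across threads."""
    return job_bytes / threads

def fits_in_l2(job_bytes: int, threads: int, l2_bytes: int) -> bool:
    """True if each core's share of the job fits in its (assumed private) L2 cache."""
    return per_core_working_set(job_bytes, threads) <= l2_bytes

# Hypothetical sizes, for illustration only.
JOB_BYTES = 8 * MIB   # data for one job
L2_BYTES = 6 * MIB    # L2 cache available to each core

for threads in (1, 2, 4):
    share_mib = per_core_working_set(JOB_BYTES, threads) / MIB
    print(f"{threads} thread(s): {share_mib:.1f} MiB per core, "
          f"fits in L2: {fits_in_l2(JOB_BYTES, threads, L2_BYTES)}")
```

With these made-up sizes, one job per core spills out of L2 while the same job split over two or four cores does not, which is the kind of situation where better-than-linear scaling could show up.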

CRGreathouse 2009-01-28 17:53

ewmayer, do you know of anything other than memory hierarchy that would lead to superlinear scaling?

petrw1 2009-01-28 17:55

[QUOTE=R.D. Silverman;160845]I am curious as to why you might even think that it is possible.

If one cuts a piece of string in half and ties the pieces together,
is the result ever longer than the original??????[/QUOTE]

I agree it seems impossible, but in the thread I linked (post #7) Phantomas did report stats suggesting it is possible. He also suggested an explanation.

[QUOTE]This is the case in my test with my Q9450. With 4 independent LL tests (2560K), one iteration takes about 54.somewhat ms. With 2 LL tests on 2 cores each, it's about 26.somewhatelse ms. So it is in fact a little, tiny bit faster. And I assume that this is because it's using the L2 cache better. My RAM runs at 1200 MHz 6,6,6,15, and maybe the effect is bigger with 800 MHz RAM (hope so...)[/QUOTE]

My personal test was very brief ... on my E6600 I switched one 18M DC test from 1 CPU to 2 CPUs for only a couple of minutes and saw the per-iteration time drop from 0.028 to 0.014 and sometimes 0.013 ... granted, too short to draw any conclusions.
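
To make the comparison concrete, here is a rough sketch of the arithmetic behind the two reports. It is not a benchmark. The quoted figures "54.somewhat" and "26.somewhatelse" are truncated in the original, so the code assumes 54.0 and 26.0 ms purely for illustration, and the function names are made up for this example.

```python
def throughput(jobs: int, ms_per_iteration: float) -> float:
    """Total iterations per millisecond across all concurrently running tests."""
    return jobs / ms_per_iteration

def speedup(time_one_core: float, time_n_cores: float) -> float:
    """How many times faster one test runs when given more cores."""
    return time_one_core / time_n_cores

# Q9450 report: 4 single-core tests vs 2 two-core tests.
# ASSUMPTION: fractional parts are unknown; 54.0 and 26.0 are stand-ins.
four_single = throughput(4, 54.0)
two_dual = throughput(2, 26.0)
ratio = two_dual / four_single
print(f"2x2-core vs 4x1-core throughput ratio: {ratio:.3f}")

# E6600 report: per-iteration time, 1 CPU vs 2 CPUs (units as given in the post).
for t2 in (0.014, 0.013):
    s = speedup(0.028, t2)
    print(f"0.028 -> {t2}: speedup {s:.2f} on 2 cores "
          f"({'superlinear' if s > 2 else 'linear or below'})")
```

Whatever the missing decimals are, any time below 27 ms for the two-core runs against any time of 54 ms or more for the single-core runs leaves the two-core setup slightly ahead in total throughput, which matches the "little, tiny bit faster" description. A drop from 0.028 to 0.013 would be better than a 2x speedup, but the run was only a couple of minutes long.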

davieddy 2009-01-28 18:47

Before the last few responses, I nearly suggested that
RDS was being a bit naive re the vagaries of programming,
but I chickened out.

David

cheesehead 2009-01-28 19:25

[quote=davieddy;160871]Before the last few responses, I nearly suggested that RDS was being a bit naive re the vagaries of programming, but I chickened out.[/quote]

Last time I cut a piece of string in half and then tied the pieces together, the resulting span from one end to the other was shorter than the original ... but I may be mistaking whether the string length is intended to be analogous to elapsed time, or to speed, and thus whether hyperthreading (or split caches) is analogous to mathematical knots, or to nautical knots.

Perhaps we should use chalk, as in the famous joke about one-half piece ...

davieddy 2009-01-28 21:05

[quote=cheesehead;160884]
Perhaps we should use chalk, as in the famous joke about one-half piece ...[/quote]

I'm sure someone will tell me to Google this, but the
keyword(s) aren't clear. Enlightenment welcome :smile:
Meanwhile I shall attempt to reconstruct the joke.

Best thought so far - you are getting your own back for my
silly fish joke.


All times are UTC.