mersenneforum.org  

2012-04-11, 18:42   #12
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006
Saskatchewan, Canada

2²·7·167 Posts
My quads (part I)

On the less effective end:

Code:
Q9550
20,000,001 (Time... 25 iterations)
Cores  Per Iteration  %Improvement
1      14.51
2       9.46          +53%
3       8.32          +21%
4       6.96          +34%
Yes, the results for 3 vs. 4 cores are odd, but I ran the test a few times and got similar results.

Overall, 4 cores are just slightly more than twice as fast as 1.
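The %Improvement column appears to measure how much extra speedup (relative to one core) each added core contributes, in units of one core's worth of throughput. For the third core, 14.51/8.32 - 14.51/9.46 ≈ 0.21, i.e. +21%. A minimal Python sketch under that interpretation (the helper name `scaling_table` is hypothetical) reproduces the Q9550 column:

```python
def scaling_table(times):
    """Given per-iteration times for 1, 2, ... cores, return a list of
    (cores, speedup vs. 1 core, marginal gain contributed by the added core)."""
    base = times[0]
    rows = []
    prev_speedup = 1.0
    for cores, t in enumerate(times, start=1):
        speedup = base / t
        # Extra speedup this core added, as a fraction of one core's throughput
        marginal = speedup - prev_speedup if cores > 1 else None
        rows.append((cores, speedup, marginal))
        prev_speedup = speedup
    return rows


q9550 = [14.51, 9.46, 8.32, 6.96]  # per-iteration times from the table above
for cores, speedup, marginal in scaling_table(q9550):
    gain = "" if marginal is None else f"{marginal:+.0%}"
    print(f"{cores}  {speedup:.2f}x  {gain}")
```

Applied to the i5-750 timings in part II, the same calculation gives +87%, +83% and +53%, matching that table to within rounding.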

Last fiddled with by petrw1 on 2012-04-11 at 18:56
2012-04-11, 18:56   #13
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006
Saskatchewan, Canada

2²·7·167 Posts
My quads (part II)

On the more effective end:


Code:
i5-750 OC to 3.2 GHz
20,000,001 (Time... 25 iterations)
Cores  Per Iteration  %Improvement
1      13.05
2       6.99          +87%
3       4.83          +84%
4       4.04          +53%
Overall, 4 cores are about 3.25 times as fast as 1.
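Both "overall effectiveness" figures are simply the 1-core time divided by the 4-core time. A short sketch (the helper name is hypothetical) that also reports parallel efficiency, i.e. speedup divided by the number of cores, for the two quads in this thread:

```python
# Per-iteration times for 1..4 cores, copied from the two tables in this thread.
timings = {
    "Q9550": [14.51, 9.46, 8.32, 6.96],
    "i5-750": [13.05, 6.99, 4.83, 4.04],
}


def speedup_and_efficiency(times):
    """Return (speedup, efficiency) of the last entry relative to 1 core."""
    cores = len(times)
    speedup = times[0] / times[-1]
    return speedup, speedup / cores


for cpu, times in timings.items():
    s, e = speedup_and_efficiency(times)
    print(f"{cpu}: {s:.2f}x on {len(times)} cores, efficiency {e:.0%}")
```

By this measure the i5-750 keeps roughly 81% of ideal 4-core scaling, while the Q9550 keeps only about half.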

Last fiddled with by petrw1 on 2012-04-11 at 18:57
2012-04-12, 14:36   #14
Stef42

Feb 2012
the Netherlands

2·29 Posts

The Q9550 is not really a quad-core CPU, just like every other S775 quad-core CPU.
It is actually two dual-cores in one package, so I suppose this architecture has something to do with the test results.

Last fiddled with by Stef42 on 2012-04-12 at 14:36
2012-04-12, 15:32   #15
bcp19

Oct 2011

7·97 Posts

Quote:
Originally Posted by Stef42
The Q9550 is not really a quad-core CPU, just like every other S775 quad-core CPU.
It is actually two dual-cores in one package, so I suppose this architecture has something to do with the test results.
It does. A Q8200 I recently tested ran 18.9, 12.1, 20.7 and 9.2 per iteration on a 1024K FFT using 1, 2, 3 and 4 cores. The slowdown with 3 cores was a bit of a shock, but it scaled better at larger FFT sizes: 40.5, 24.7, 21.5 and 17.7 at 2048K. The memory I was using probably also had an effect, as I was using P95 to stress-test a motherboard I had been given and only had 3GB of PC2-5300 memory to put in it. I noticed one interesting thing while testing that motherboard, which had a GT 520 that I was also testing mfaktc on: I ran tests using all possible combinations of P95 and mfaktc.
Code:
Single-threaded workers, total throughput on 27.5M exponents:
Workers              %/day
1                    11.636
2 (C1&C2)            17.95
2 (C1&C3)            21.30
3                    22.89
4                    24.17
3 + mfaktc 5K SP     22.1
3 + mfaktc 200K SP   20.3

Double-threaded workers, total throughput on 27.5M exponents:
Workers              %/day
1                    18.48
2                    25.13

2 instances of P95 (one running single-threaded workers, one running double-threaded workers), total throughput on 27.5M exponents:
Workers              %/day
1 & 1                26.18
2 & 1                25.11
1 & 1 + mfaktc 5K    23.80
1 & 1 + mfaktc 200K  21.12
2 & 1 + mfaktc 5K    24.01
I hope to get some PC2-6400 and PC2-8500 memory and run the same tests to see how much memory speed affects the timings.
2012-04-12, 16:29   #16
zanmato

Apr 2012

2×5 Posts

So a multi-computer test isn't ruled out as potentially faster than a single processor, but a number of known factors limit it, and probably plenty of unknowns could rule it out completely. For a start, the RAM of the controlling computer has to be fast enough to feed all the processors (as I think the FFT and inverse FFT can't be split, and it's the multiplication stage that allows multi-core; so would the start and end of an iteration have to run on a single core?). If RAM is already the limiting factor, then that's that.

Thank you for the replies.
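For context on the stages mentioned above: a large-number multiplication is usually done by applying an FFT to the digits, multiplying the transformed values pointwise, and applying an inverse FFT. The toy Python sketch below squares an integer that way, assuming base-10 digits and a textbook recursive radix-2 FFT. It is only an illustration, not how Prime95 does it (Prime95 uses a heavily optimized irrational-base discrete weighted transform), and the helper names are hypothetical.

```python
import cmath


def fft(values, invert=False):
    """Textbook recursive radix-2 FFT; len(values) must be a power of two.
    With invert=True it computes the unscaled inverse transform."""
    n = len(values)
    if n == 1:
        return list(values)
    # The two half-size transforms do not depend on each other.
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + twiddle * odd[k]
        out[k + n // 2] = even[k] - twiddle * odd[k]
    return out


def square_via_fft(x):
    """Square a non-negative integer via FFT convolution of its decimal digits."""
    digits = [int(d) for d in reversed(str(x))]  # least significant digit first
    size = 1
    while size < 2 * len(digits):  # leave room for the full-length product
        size *= 2
    padded = [complex(d) for d in digits] + [0j] * (size - len(digits))
    spectrum = fft(padded)                    # 1. forward FFT
    spectrum = [v * v for v in spectrum]      # 2. pointwise squaring, element by element
    conv = fft(spectrum, invert=True)         # 3. inverse FFT
    coeffs = [round(v.real / size) for v in conv]  # undo scaling, round to integers
    # 4. carry propagation: re-summing with Python ints absorbs the carries
    return sum(c * 10**i for i, c in enumerate(coeffs))


print(square_via_fft(67))  # 4489
```

In this form the pointwise step works on each element independently, and at every level of the recursion the two half-size transforms are independent of each other as well.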
2012-04-12, 18:19   #17
R.D. Silverman

Nov 2003

2²·5·373 Posts

Quote:
Originally Posted by Unregistered
Could an LL test be split across multiple local computers, the goal being to speed up computation of a single LL test for large exponents?
Seeing this same question come up over and over and over again gets
tiring. And the answer remains the same.

The "goal" you suggest is distinctly sub-optimal.

It is much more efficient, given the availability of multiple cores, to simply run
multiple (separate) LL tests on different exponents. Throughput would be
substantially higher.
2012-04-12, 18:23   #18
bcp19

Oct 2011

7×97 Posts

Quote:
Originally Posted by zanmato
So a multi-computer test isn't ruled out as potentially faster than a single processor, but a number of known factors limit it, and probably plenty of unknowns could rule it out completely. For a start, the RAM of the controlling computer has to be fast enough to feed all the processors (as I think the FFT and inverse FFT can't be split, and it's the multiplication stage that allows multi-core; so would the start and end of an iteration have to run on a single core?). If RAM is already the limiting factor, then that's that.

Thank you for the replies.
I am not sure how you would get multiple computers tied together to work on it. From the GIMPS math page:

The Lucas-Lehmer primality test is remarkably simple. It states that for $P > 2$, $2^P - 1$ is prime if and only if $S_{P-2}$ is zero in this sequence: $S_0 = 4$, $S_N = (S_{N-1}^2 - 2) \bmod (2^P - 1)$. For example, to prove $2^7 - 1$ is prime:

$S_0 = 4$
$S_1 = (4 \cdot 4 - 2) \bmod 127 = 14$
$S_2 = (14 \cdot 14 - 2) \bmod 127 = 67$
$S_3 = (67 \cdot 67 - 2) \bmod 127 = 42$
$S_4 = (42 \cdot 42 - 2) \bmod 127 = 111$
$S_5 = (111 \cdot 111 - 2) \bmod 127 = 0$

$S_2$ cannot be calculated before $S_1$ is known, and the time needed to send the multiplication out to multiple computers and receive the results back would probably be longer than the time a single computer takes to complete it.
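The recurrence translates almost directly into code. A minimal Python sketch using plain big-integer arithmetic (real LL software squares with FFTs instead); the loop shows why the test cannot simply be split up, since each iteration needs the previous value before it can start:

```python
def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, for an odd prime p."""
    m = (1 << p) - 1          # the Mersenne number 2^p - 1
    s = 4                     # S_0
    for _ in range(p - 2):    # compute S_1 ... S_{p-2}
        s = (s * s - 2) % m   # S_N depends on S_{N-1}: strictly sequential
    return s == 0


print(lucas_lehmer(7))   # True: the S_5 = 0 example above
print(lucas_lehmer(11))  # False: 2047 = 23 * 89
```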

Quote:
Originally Posted by R.D. Silverman
It is much more efficient, given the availability of multiple cores, to simply run
multiple (separate) LL tests on different exponents. Throughput would be
substantially higher.
This is true, to an extent. The properties of certain processors make them more efficient under certain circumstances. My testing shows that a Core 2 Quad running one double-threaded and one single-threaded exponent can outperform four single-threaded exponents. Given the slowdown George has come across when running his AVX optimizations on 4 cores, a similar arrangement may also prove more efficient there.
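The claim can be checked against the Core 2 Quad measurements posted in #15. A small sketch, assuming %/day is the combined work of all workers (the dictionary labels are shorthand for the table rows):

```python
# Total throughput in %/day on 27.5M exponents, from the tables in post #15.
throughput = {
    "4 single-threaded workers": 24.17,
    "2 double-threaded workers": 25.13,
    "1 double-threaded + 1 single-threaded": 26.18,
}

best = max(throughput, key=throughput.get)
baseline = throughput["4 single-threaded workers"]
gain = throughput[best] / baseline - 1

print(f"Best: {best} ({throughput[best]} %/day)")
print(f"Gain over 4 single-threaded workers: {gain:+.1%}")
# Assuming %/day sums over workers, 100% of one test takes 100 / rate days
print(f"About {100 / throughput[best]:.1f} days per 27.5M test")
```

The mixed arrangement comes out roughly 8% ahead while occupying only three of the four cores.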

Last fiddled with by bcp19 on 2012-04-12 at 18:33
2012-04-12, 19:18   #19
Dubslow
Basketry That Evening!

"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

1C35₁₆ Posts

Well, okay: in the "typical" use case RDS is right, where "typical" excludes things like a dual-dual-core (two dual-cores in one package) and AVX. Even with AVX, though, it's still more efficient to run one test per core.
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.