mersenneforum.org  

2008-10-26, 07:54   #1
hj47

Oct 2008

40₁₆ Posts
Questions from a GIMPS newb

Hello, this is my first post, and I must say I feel overwhelmed by all the mathematical terminology that goes around in this place.

Anyhoo, I'm a regular folder, so I am in the know when it comes to distributed computing and PC hardware. I know that for folding, the general consensus is that the bigger the L2 cache, the quicker the CPU can process data.
Does this apply to GIMPS?

Also (don't quote me on this), I heard that AMDs are better for GIMPS as they have the integrated memory controller. Is this true? Is it better to have a lower-clocked, say, Athlon X3 than a higher-clocked Intel E5200? (These are the CPUs I'm thinking of getting.) What about Phenoms, are they any good for GIMPS?

And how much RAM should you dedicate to the application? (The default is 8MB.)

Finally, I've downloaded the latest client, and it shows on the main screen Worker #1 and Worker #2. I'm assuming these are the 2 instances of the work being divided on my dual core? (my cpu is running at 100% in task manager).

Sorry for the noob questions, I've tried the FAQs and what not, but there's still a lot I don't get.

Cheers

hj
2008-10-26, 13:22   #2
garo

Aug 2002
Termonfeckin, IE

3·919 Posts

No, AMD is NOT better than Intel for Prime95. The Phenoms are better, but I still don't think they reach parity with Intel at the same clock speed.

A bigger CPU cache helps, but only to a degree. Prime95 is coded in assembly and has been optimized for certain cache sizes. I believe the benefits are marginal after 1 or 2MB per core. The determining factor for Prime95 is memory bandwidth.

Unless you are doing P-1 factoring - which most people don't - 128MB should be more than enough. If you do trial factoring, 8MB is enough.

Yes, the two workers are for the two cores.
2008-10-26, 22:28   #3
hj47

Oct 2008

40₁₆ Posts

Hi, thanks for your reply.

So for clarification, would an AMD 9550 be better or worse off than an overclocked E5200?

Cheers
2008-10-27, 08:01   #4
cheesehead

"Richard B. Woods"
Aug 2002
Wisconsin USA

7692₁₀ Posts

Quote:
Originally Posted by hj47
I heard that AMDs are better for GIMPS as they have the integrated memory controller. Is this true? Is it better to have a lower-clocked, say, Athlon X3 than a higher-clocked Intel E5200? (These are the CPUs I'm thinking of getting.) What about Phenoms, are they any good for GIMPS?
As garo wrote, "The determining factor for Prime95 is memory bandwidth."

That is, on most current CPUs, prime95's main compute loops will execute about as fast as the memory controller can feed the caches. So, if you can determine that speed, that's the best single measure of potential prime95 speed.
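
As a rough illustration of what "determining that speed" could look like, here is a minimal Python sketch that times a large buffer copy. It is only a crude proxy: the function name, buffer size and repeat count are assumptions made for this example, it is not how Sandra or prime95 measure anything, and prime95's FFT access pattern is far from a straight copy.

```python
import time


def copy_bandwidth_gb_per_s(size_bytes=64 * 1024 * 1024, repeats=5):
    """Estimate copy bandwidth by timing a large buffer copy (hypothetical helper)."""
    src = bytearray(size_bytes)  # zero-filled source, ideally much larger than the caches
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # one full read of src plus one full write of dst
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)  # keep the fastest run to reduce noise
        del dst
    best = max(best, 1e-12)  # guard against a zero reading from a coarse timer
    # Each copy moves size_bytes in and size_bytes out.
    return 2 * size_bytes / best / 1e9


if __name__ == "__main__":
    print(f"approx. copy bandwidth: {copy_bandwidth_gb_per_s():.1f} GB/s")
```

Running several copies of this at once, one per core, would give a feel for how much each core's share drops when they compete for the same memory controller.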

Furthermore, if on a multi-core system you have prime95 running on more than one core, that limitation is true for each of the instances. If each core has its own dedicated memory controller, that will usually be faster than a system with only one memory controller shared by all cores, because usually a single memory controller cannot feed multiple caches all at their top speeds simultaneously.

Note that I wrote "caches"!

If all cores share a single cache, there may be contention between cores for cache space. A dual-core system with a single shared 1MB L2 cache may be slower than an otherwise-identical system that has a 512KB L2 cache dedicated to each core.

If you read some of the benchmark threads, you'll see examples where 2 simultaneous L-Ls will each have a slightly slower iteration speed than a single L-L running alone, and that's usually because of memory contention. Three or four simultaneous L-Ls on separate cores will each have significantly slower iterations than when fewer instances are simultaneous. This nonlinearity may be caused by either a shared memory controller, a shared cache (usually L2), or both.

One recommended way to get around this limitation is to assign trial factoring (TF) to one or two cores, and do L-L testing on the other(s), because TF uses less memory than any other function.

If you want to do four (or eight, or whatever) simultaneous L-Ls, you can go ahead and do so; it just won't give you four (or eight, or whatever) times the total throughput of a single L-L running alone. So from a GIMPS perspective (though not from a $150,000 prize-winning perspective), sharing a bit of TF with L-Ls is better.

Quote:
And how much RAM should you dedicate to the application? (The default is 8MB.)
Where prime95 asks for "Daytime available memory" and "Nighttime available memory", it's not asking you to specify how much it uses for most operations (TF, L-L, or stage 1 of either P-1 or ECM factoring). In each of those cases, prime95 knows exactly how much it has to have, and allocates that much.

"Daytime available memory" and "Nighttime available memory" are only for you to specify how much extra memory prime95 can use for special workareas during stage 2 of P-1 factoring or stage 2 of ECM factoring. (Yes, it ought to say so more prominently!) The default values of 8MB there are enough for prime95 to perform those stages, but if you specify higher amounts, it can search faster/farther for factors during P-1 stage 2 and ECM stage 2.

A few past threads where this stuff has been discussed:

http://mersenneforum.org/showthread.php?t=2157

http://mersenneforum.org/showthread.php?t=3828

http://mersenneforum.org/showthread.php?t=10198

Quote:
Originally Posted by garo
Unless you are doing P-1 factoring - which most people don't -
Yes, most people will not be performing explicit P-1 assignments.

Quote:
128MB should be more than enough.
Yes, I agree that a user who fills in "128" in the available memory fields will benefit from higher limits in case of either a P-1/ECM assignment or an implicit P-1 preliminary to an L-L assignment, and that is a Good Thing.

Last fiddled with by cheesehead on 2008-10-27 at 08:36 Reason: Revised responses to garo.
2008-10-27, 10:44   #5
ldesnogu

Jan 2008
France

3·181 Posts

Quote:
Originally Posted by cheesehead
As garo wrote, "The determining factor for Prime95 is memory bandwidth."

That is, on most current CPUs, prime95's main compute loops will execute about as fast as the memory controller can feed the caches. So, if you can determine that speed, that's the best single measure of potential prime95 speed.
I just want to point out that your argument has to be taken with great care since Phenom outperforms Core2 as far as raw memory bandwidth is concerned and by a big factor (see here for instance). For many people memory bandwidth is the speed between RAM and the CPU.

I know what you meant by the above statement, I just wanted to make it clearer in case some beginner reads too quickly
2008-10-27, 11:59   #6
cheesehead

"Richard B. Woods"
Aug 2002
Wisconsin USA

2²×3×641 Posts

Quote:
Originally Posted by ldesnogu
I just want to point out that your argument has to be taken with great care since Phenom outperforms Core2 as far as raw memory bandwidth is concerned and by a big factor (see here for instance). For many people memory bandwidth is the speed between RAM and the CPU.

I know what you meant by the above statement,
Don't be too sure about knowing that -- I'm not so hardware-knowledgeable as to reliably make the proper distinctions myself or always avoid fuzzy thinking.

I was thinking of memory bandwidth in terms of how fast contents of RAM could be transferred to (and from) L2 (and L1) cache. I was assuming that data transfer between L1 cache and any more-inner parts of a CPU would always be at least as fast as RAM->L2->L1. (So, does that put me in the "many people" category, or not?)

Can you explain more about memory bandwidth, and tighten-up any previous statements I made that need such?

Will you please interpret for us the meaning of the "Sandra XII SP1 Memory Bandwidth" chart at http://www.legitreviews.com/article/597/4 and explain what's shown there that is, or is not, relevant to prime95 performance?

Last fiddled with by cheesehead on 2008-10-27 at 12:06
2008-10-27, 13:58   #7
ldesnogu

Jan 2008
France

3×181 Posts

Quote:
Originally Posted by cheesehead
I was thinking of memory bandwidth in terms of how fast contents of RAM could be transferred to (and from) L2 (and L1) cache. I was assuming that data transfer between L1 cache and any more-inner parts of a CPU would always be at least as fast as RAM->L2->L1. (So, does that put me in the "many people" category, or not?)

Can you explain more about memory bandwidth, and tighten-up any previous statements I made that need such?

Will you please interpret for us the meaning of the "Sandra XII SP1 Memory Bandwidth" chart at http://www.legitreviews.com/article/597/4 and explain what's shown there that is, or is not, relevant to prime95 performance?
Basically the chart says that a Phenom 9900 can read/write memory at 10.3 GB/s while a QX9650 can only do so at 6.2 GB/s. (More info here).
The reason the Phenom is so much faster is the integrated memory controller. The soon-to-be-released Core i7 will also have one, and will surely fly (IIRC, it can reach 16 GB/s using 3 banks of DDR3).

The obvious conclusion is that bandwidth to main memory alone is not enough to predict GIMPS speed, given how Phenom and C2 compare :)

There are many other factors that come into play from the memory subsystem point of view (where "memory subsystem" is made of all the components that are between the RAM and the computation units); as examples:
- efficiency of preload instructions (how many can be in flight? do they block other parts of the processor?)
- efficiency of TLB (how many entries in the TLB? number of levels of TLB?)
- cache access latency and bandwidth.

This list can be very long.

As usual when comparing two things a single criterion is far from enough

As far as prime95 is concerned, I don't know the source code enough to tell you how all these factors play a role; what I can say is that:
- the TLB plays a very important role; a single entry usually maps 4 KB of data and you will very quickly run out of TLB entries when working with huge data sets; so more entries at level 0 is very important (I don't know the numbers for Phenom and C2)
- prefetching is also extremely important, and it looks like the C2 is much more efficient here
- C2 has a much higher bandwidth to L1 and L2 caches IIRC, and for prime95 it's very important.
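
To put the TLB point in numbers, here is a small Python sketch. It assumes 4 KB pages (as above), 8-byte double-precision FFT elements, and a 2560K FFT length picked as an example; the TLB entry counts are purely hypothetical and are not figures for Phenom or C2.

```python
PAGE_BYTES = 4 * 1024  # one TLB entry maps one 4 KB page


def pages_needed(fft_len_k, bytes_per_element=8):
    """Number of 4 KB pages spanned by an FFT of fft_len_k * 1024 elements."""
    data_bytes = fft_len_k * 1024 * bytes_per_element
    return -(-data_bytes // PAGE_BYTES)  # ceiling division


def tlb_reach_bytes(entries):
    """Amount of memory a TLB with the given entry count can map at once."""
    return entries * PAGE_BYTES


if __name__ == "__main__":
    pages = pages_needed(2560)
    print(f"2560K FFT spans {pages} pages")
    for entries in (64, 256, 512):  # hypothetical TLB sizes, for illustration only
        reach_kb = tlb_reach_bytes(entries) // 1024
        print(f"{entries} entries map {reach_kb} KB; data is {pages / entries:.0f}x that")
```

Under these assumptions the data spans 5120 pages, so even a 512-entry TLB would cover only a tenth of it at any one time.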

To sum up: memory subsystems are hugely complex beasts that can be more difficult to design than a CPU and also more difficult to use efficiently in a program

I'm afraid this was very technical...

Last fiddled with by ldesnogu on 2008-10-27 at 13:59
2008-10-27, 14:09   #8
S485122

Sep 2006
Brussels, Belgium

11001110011₂ Posts

I cannot compare AMD's products to Intel's, but I know from experience that memory chip speed is a first-order limiting factor. On one and the same processor (P4 or Core 2 Duo/Quad), Prime95 performance is directly proportional to memory speed. It is possible that memory controller speed kicks in as a limiting factor with very fast DDR3, but I have no experience with that.

Jacob
2008-10-28, 06:30   #9
hj47

Oct 2008

2⁶ Posts

Yeah, this is pretty technical. So basically Intel and AMD trade the lead with each other in some tests?

So is it possible to answer somewhat my original question? (whether an e5200 is better for prime95 than a phenom 9550?). Sorry for the [annoying] questions, I just want to get this cleared up :)

Cheers people ;)
2008-10-28, 07:22   #10
sdbardwick

Aug 2002
North San Diego County

2×11×31 Posts

The Phenom 9550 will do more work per unit of time, but it will use more power. My 9500 is equal to about 2 of my E7200 boxes (although the 7200s are oddly slow - probably due to the cheapo G31 boards they are on).
2008-10-28, 20:00   #11
Phantomas

Oct 2008
Germany, Hamburg

5×13 Posts

Hi, please take a look at
http://v5www.mersenne.org/report_benchmarks/
Here you can compare many CPU types and their performance directly. (No E5200 yet, but it may appear later...)
But keep in mind that the times listed are for one active core only, so they don't show the penalty when all cores are active and using the memory.

I own a Q9450 @3200 and was very lucky with my memory sticks. They are running at 1200MHz (original speed is 800MHz). So all 4 of my cores can run with nearly no penalty (2560K FFT) at 55ms/iteration (47ms with one core active, which is faster than the fastest listed Phenom :-)
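
The two timings quoted above can be turned into a quick throughput estimate. This Python sketch is only arithmetic on those numbers (47 ms with one core active, 55 ms with all four busy); the function names are made up for illustration.

```python
def iterations_per_second(workers, ms_per_iteration):
    """Combined iteration rate of `workers` tests, each taking ms_per_iteration."""
    return workers * 1000.0 / ms_per_iteration


def scaling_factor(workers, ms_loaded, ms_alone):
    """Total throughput with all workers busy, relative to one worker running alone."""
    return iterations_per_second(workers, ms_loaded) / iterations_per_second(1, ms_alone)


if __name__ == "__main__":
    factor = scaling_factor(4, ms_loaded=55.0, ms_alone=47.0)
    slowdown = 55.0 / 47.0 - 1.0  # extra time per iteration when all cores are busy
    print(f"total throughput: {factor:.2f}x a single worker")
    print(f"per-worker slowdown: {slowdown:.0%}")
```

With these figures, four workers deliver about 3.4 times the throughput of one worker running alone, with each iteration taking roughly 17% longer.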

The Phenom runs at 2.4 GHz and the Core 2 at 2.5 GHz, so no big difference there.
But when you think of overclocking, the Core 2 is much better. Some guys have managed a stable 4GHz (I wouldn't try that with Prime95).
It would also produce less heat and use less power than the Phenom.

With a dual core (Phenom or Intel), memory bandwidth shouldn't be much of a problem.

Last fiddled with by Phantomas on 2008-10-28 at 20:20

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.