How important is L2 cache size for LL work?
 2011-08-16, 17:32 #1 stars10250     Jul 2008 San Francisco, CA 3×67 Posts How important is L2 cache size for LL work? I've never quite understood how important the L2 cache size is for LL work. Since the LL calculation itself is too big to reside in cache, read/write to memory is constant and at high bandwidth. Would having a slightly larger L2 cache help reduce iteration times all that much? I'm asking because I'm considering the 6 core Sandy-E when it comes out. Preliminary pricing says the 12MB L2 version will cost around $583 and the 15MB L2 version $999 (I plan to overclock, so the fact that the stock speeds of these two CPUs differ by 0.1 GHz seems irrelevant to me). So would an additional 3MB of L2 spread across 6 physical cores really improve LL iteration times all that much? I sorta doubt it.
 Originally Posted by stars10250 I've never quite understood how important the L2 cache size is for LL work. Since the LL calculation itself is too big to reside in cache, read/write to memory is constant and at high bandwidth. Would having a slightly larger L2 cache help reduce iteration times all that much?
Once the data are in the L2 then optimal software would work every last computation out of it before evicting it and moving on to the next block of data.

For your Q you would need to see if P95 has been optimised for either or both of the cache sizes you are considering.
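
The point about working a block thoroughly before evicting it can be sketched as a loop-ordering choice. The Python below is a minimal illustration, not Prime95 code: `passes_unblocked`, `passes_blocked` and `block_len` are hypothetical names, and it assumes independent element-wise passes, which real FFT passes are not. Both functions return the same result; only the order in which memory is visited differs, and Python lists do not model real cache behaviour.

```python
def passes_unblocked(data, passes):
    """Apply each pass to the whole array in turn.

    The array is streamed through once per pass, so a large array
    would be re-fetched from main memory for every pass.
    """
    out = list(data)
    for f in passes:
        for i in range(len(out)):
            out[i] = f(out[i])
    return out


def passes_blocked(data, passes, block_len):
    """Apply all passes to one block before moving on to the next block.

    If block_len elements fit in cache, each block would only need to
    be fetched from main memory once.
    """
    out = list(data)
    for start in range(0, len(out), block_len):
        stop = min(start + block_len, len(out))
        for f in passes:
            for i in range(start, stop):
                out[i] = f(out[i])
    return out


# Hypothetical example passes: square, then subtract 2.
example_passes = [lambda x: x * x, lambda x: x - 2]
```

With a block length chosen so that one block fits in the cache level of interest, the blocked version touches each block once instead of once per pass.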

 2011-08-24, 16:36 #3 JohnFullspeed   May 2011 France 7×23 Posts Memory cache All the caches, L1, L2 and now L3, are important for the user. The goal of the cache is to reduce traffic on the memory bus, not to help the processor. The cache exists to solve the spatio-temporal locality problem of data. When you use a piece of data, often: 1. you use it 4 or 5 times; 2. you use the data around it. So the idea of the cache is to move a whole cache line, loading 64 bytes in one transfer. If you then need the same data, or other nearby data, the memory manager does not have to access RAM.

For the processor this is invisible: it asks for a piece of data and waits for the response, but the execution time of the instruction itself does not change. If a+1 takes one cycle, it makes no difference that a is in a cache.

Why is L1 smaller and L2 larger? Because L1 is made of transistors in the CPU and L2 is a separate chip, and transistors are the most expensive part of a processor.

Everywhere you see pipelines with X stages. A number of stages that is good for the processor can be bad for the user. For example, with a 20-stage pipeline the op-code arrives at the processor in very small pieces and the CPU needs, for example, 5; with only 5 stages the CPU needs, for example, 7. So the more stages you have, the faster the CPU. But then a branch arrives, and the pipeline stops while it waits for the result. This is called a bubble: the CPU stops after the test and waits for a new op-code. But the pipe is empty, and you have to wait until all the stages are full again. You wait to fill 20 stages, or only five. So depending on your code, a long pipeline can be a good or a bad thing, but for Intel and the others it is good (currently the pipeline in a Core 2 has 29 stages). Bubbles are very expensive.

John If all I said is not always true it is not also always false...
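
The two mechanisms in this post, cache lines and pipeline bubbles, can be put into a rough toy model. This is only a sketch under stated assumptions: a 64-byte cache line, addresses starting at a line boundary, one instruction completing per cycle, and a refill cost of one cycle per remaining stage after each mispredicted branch. The function names are hypothetical and no real CPU is this simple.

```python
def lines_touched(addresses, line_bytes=64):
    """Count the distinct cache lines covered by a sequence of byte addresses.

    Assumes line_bytes-sized lines aligned at address 0.
    """
    return len({addr // line_bytes for addr in addresses})


def cycles_with_flushes(instructions, mispredictions, stages):
    """Toy cycle count for a pipeline with the given number of stages.

    Assumes one instruction completes per cycle once the pipe is full,
    and that every mispredicted branch empties the pipe, costing
    (stages - 1) cycles to refill it.
    """
    fill = stages - 1
    return fill + instructions + mispredictions * fill


# 64 doubles (8 bytes each) read one after another, then every 8th double.
sequential = [i * 8 for i in range(64)]
strided = [i * 8 * 8 for i in range(64)]
```

In this model the sequential walk touches 8 lines while the strided walk touches 64, and the same 1000 instructions with 10 mispredicted branches take 1044 cycles on 5 stages but 1209 cycles on 20. The model leaves out the higher clock speed that a longer pipeline allows, which is the other half of the trade-off described above.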
 2011-09-12, 15:21 #4 stars10250     Jul 2008 San Francisco, CA 3·67 Posts I'm still not sure I understand how to even begin to gauge the difference in LL iteration time between two similarly clocked CPUs with different L2 cache, 15 MB vs. 12 MB. I'm speaking of the i7-3960X and i7-3930K (recall I plan to OC so one could easily force these to have the same clock speed). I'm not sure we will really know until the chip is out. In the meantime, we just got this: http://www.tomshardware.com/reviews/...ance,3026.html
 2011-09-12, 22:28 #5 Christenson     Dec 2010 Monticello 5×359 Posts Tom's Hardware gives that as L3 cache, not L2 cache... 15M vs 12M means that a somewhat larger FFT stays on the CPU before the main memory bus has to be used. P95 will have to tell us how much that increase is worth... I imagine 5%.
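
How much larger an FFT "stays on the CPU" can be estimated with simple arithmetic. The sketch below assumes the FFT data is one array of 8-byte doubles and that the whole L3 is available to it; it ignores sin/cos tables, other working data, and the fact that the L3 is shared by all cores, so the real limits would be lower. The names are hypothetical and say nothing about how P95 actually lays out its data.

```python
BYTES_PER_DOUBLE = 8
MIB = 1024 * 1024


def fft_working_set_bytes(fft_length):
    """Bytes needed for an FFT of fft_length doubles (data array only)."""
    return fft_length * BYTES_PER_DOUBLE


def largest_fft_length_in_cache(cache_bytes):
    """Largest number of doubles whose data array fits in cache_bytes."""
    return cache_bytes // BYTES_PER_DOUBLE


def fits_in_cache(fft_length, cache_bytes):
    """True if the FFT data array alone fits in the given cache size."""
    return fft_working_set_bytes(fft_length) <= cache_bytes


small_l3 = largest_fft_length_in_cache(12 * MIB)  # 1572864 doubles (1536K)
large_l3 = largest_fft_length_in_cache(15 * MIB)  # 1966080 doubles (1920K)
```

Under these assumptions the 15 MB part holds a 1920K-element array against 1536K for the 12 MB part, and a 2048K FFT (16 MiB of data) fits in neither.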
 2012-02-19, 18:32 #6 emily   Feb 2012 Athens, Greece 47 Posts SandyBridge-E also has quad-channel memory support, so this must help feed all the cores faster than the dual-channel support available in SandyBridge non-E. As for the cache, all SandyBridge and SandyBridge-E CPUs have:
L0: 6KB (must be per core I think)
L1: 32KB I + 32KB D per core
L2: 256KB per core
L3: 3MB for i3, 6MB for i5, 8MB for i7, 10MB-12MB-15MB for SandyBridge-E
IvyBridge-E server CPUs with 10 cores will feature up to 30MB L3 cache :) Last fiddled with by emily on 2012-02-19 at 18:34
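
The dual-channel versus quad-channel difference can be put in rough numbers. This is a minimal sketch of theoretical peak bandwidth only, assuming a 64-bit (8-byte) bus per channel and, purely for comparison, DDR3-1600 on both platforms; the memory speed and the function name are assumptions, and sustained bandwidth in practice is lower than the peak.

```python
def peak_bandwidth_gb_s(channels, megatransfers_per_s, bus_bytes=8):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1000 MB).

    channels * transfers per second * bytes per transfer.
    """
    return channels * megatransfers_per_s * bus_bytes / 1000


# Assumed DDR3-1600 (1600 MT/s) on both platforms for the comparison.
dual_channel = peak_bandwidth_gb_s(2, 1600)
quad_channel = peak_bandwidth_gb_s(4, 1600)

# Peak bandwidth per core if six cores all run memory-bound work.
quad_per_core = quad_channel / 6
```

With these assumptions quad-channel doubles the peak figure, but split across six busy cores each core still sees well under the bandwidth of a single channel.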

