mersenneforum.org Did anybody get an AMD Phenom II X6 1055T so far?

2010-05-08, 14:34   #34
mdettweiler
A Sunny Moo

Aug 2007
USA (GMT-5)

3×2,083 Posts

Quote:
 Originally Posted by S485122 First of all, some programs will list the specifications of your processor: Prime95, and programs like CPU-Z (from www.cpuid.com), will list some specs (mainly about size). CPUid had a program called latency that will give you the latencies of memory and cache. I do not know if it will still work on current CPUs (it works on a Q6700). I attach it as a zip file, use at your own risk ;-). According to Intel (Intel® Core™2 Extreme Processor X6800 and Intel® Core™2 Duo Desktop Processor E6000 and E4000 Sequence Features), the Core2 Duo E4500 has two 32 KiB 8-way associative L1 data caches (one per core) and one 2 MiB L2 cache shared by the two cores. The size AND latency increase with the cache level (a higher latency means slower). The size of Prime95 might be around 5 MiB, but it doesn't have to fit in cache as a whole; only the much smaller running routine will be in the L1 code cache. Jacob
Thanks! Here's what I got:
Code:
2 cache levels detected
Level 1         size = 32Kb     latency = 3 cycles
Level 2         size = 1024Kb   latency = 16 cycles
Without having any other data to compare the latency to, that information isn't particularly useful, but I do see this confirms the figure of 1024Kb L2 cache per core. Interesting that they used Kb (kilobit) rather than KB or KiB (kilobyte)...is this a typo on the part of the program designers, or is it really just 1024 kilobits?
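For anyone who wants a second opinion on figures like these, a pointer-chasing loop gives a rough estimate of load-to-use latency without any special tool. Everything below (the 512 KiB ring size, the hop count, the seed, using clock() as the timer) is my own illustrative choice, not how CPUid's latency program actually works:

```c
/* Rough pointer-chase latency estimate, in the spirit of CPUid's latency
   tool but NOT its actual method: sizes, hop count and timer are all
   illustrative choices. */
#include <stdlib.h>
#include <time.h>

#define RING_BYTES (512 * 1024)   /* bigger than a 32 KiB L1, within a 1 MiB L2 */
#define HOPS 10000000L

double ns_per_hop(void) {
    size_t n = RING_BYTES / sizeof(void *);
    void **ring = malloc(n * sizeof(void *));
    size_t *order = malloc(n * sizeof(size_t));
    if (!ring || !order) { free(ring); free(order); return -1.0; }

    /* Fisher-Yates shuffle: visiting the buffer in one big random cycle
       defeats the hardware prefetcher. */
    for (size_t i = 0; i < n; i++) order[i] = i;
    srand(42);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        size_t t = order[i]; order[i] = order[j]; order[j] = t;
    }
    for (size_t i = 0; i < n; i++)
        ring[order[i]] = &ring[order[(i + 1) % n]];

    /* Each load depends on the previous one, so total time ~ latency * HOPS. */
    void **p = &ring[order[0]];
    clock_t t0 = clock();
    for (long h = 0; h < HOPS; h++)
        p = (void **)*p;
    clock_t t1 = clock();

    double ns = (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)HOPS;
    free(order);
    free(ring);
    return p ? ns : -1.0;   /* use p so the chase isn't optimized away */
}
```

Shrinking the ring to under 32 KiB should drop the per-hop time to roughly the L1 figure, which is one way to see the latency jump between cache levels that Jacob mentioned.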

2010-05-08, 16:27   #35
S485122

Sep 2006
Brussels, Belgium

11010000110₂ Posts

Quote:
 Originally Posted by mdettweiler Interesting that they used Kb (kilobit) rather than KB or KiB (kilobyte)...is this a typo on the part of the program designers, or is it really just 1024 kilobits?
Just a typo, it should be KiB of course. Be careful, though: CPUid no longer advertises the latency program. Even though it is still downloadable from their site at www.cpuid.com/download/latency.zip, it might not be accurate for newer types of processors.

Jacob

Last fiddled with by S485122 on 2010-05-08 at 16:53

2010-05-08, 20:17   #36
cheesehead

"Richard B. Woods"
Aug 2002
Wisconsin USA

1111000001100₂ Posts

Quote:
 Originally Posted by mdettweiler Ah, I see. My copy of Prime95 25.11 is 4.89 MB; does that therefore mean that it needs 4.89 MB + (FFT expressed in megabytes) of cache for the whole thing to fit?
In addition to joblack's quite correct answer, let me introduce the concept of the working set.

The working set is the subset of a process's pages which, at any particular time, need to be resident in the fastest memory layer (e.g., L1 cache) in order for the program not to have to pause for fetching code or data from slower memory (e.g., L2 or L3 cache, or main RAM, or even pagefile on disk).

(There are other definitions, such as at http://en.wikipedia.org/wiki/Working_set, http://msdn.microsoft.com/en-us/libr...8VS.85%29.aspx, or http://msdn.microsoft.com/en-us/libr...8VS.85%29.aspx)

Like many programs, prime95 spends most of its time in a relatively small loop of code. It processes a certain chunk of data for a while before moving on to process the next chunk. So, most of the time prime95 is active, its working set consists of (1) the pages holding instructions for the inner loop plus (2) the pages holding the block of data that is being processed. That working set is what needs to fit into L1 cache in order to have fastest program execution.

Unlike many programs, prime95 knows exactly what data it's going to process next when it gets through with the current data block, so it can prefetch data, which essentially brings it into the working set along with the current data. Once it's through with a data block, prime95 no longer references it, so that data block drops out of the working set.

Most systems have separate L1 caches for instructions and for data, so the criterion for top speed would be that the working set instructions fit into the L1 instruction cache, and the working set data (current block plus what's being prefetched) fit into the L1 data cache.
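A toy version of that prefetch-the-next-block pattern, using the GCC/Clang __builtin_prefetch intrinsic; the block size and the trivial summing "work" here are made up for illustration and are not Prime95's actual code:

```c
/* Sketch of block processing with software prefetch, as described above.
   __builtin_prefetch is a GCC/Clang intrinsic; the block size and the
   summing "work" are illustrative stand-ins, not Prime95's real kernel. */
#include <stddef.h>

#define BLOCK 1024   /* elements per block, chosen to sit comfortably in L1 */

double process_blocks(const double *data, size_t n) {
    double sum = 0.0;
    for (size_t b = 0; b < n; b += BLOCK) {
        /* Start pulling the *next* block toward the caches while we work
           on the current one; finished blocks simply age out of the
           working set, exactly as described above. */
        if (b + BLOCK < n)
            for (size_t i = 0; i < BLOCK; i += 8)
                __builtin_prefetch(&data[b + BLOCK + i], 0, 0);

        size_t end = (b + BLOCK < n) ? b + BLOCK : n;
        for (size_t i = b; i < end; i++)
            sum += data[i];   /* placeholder for the real per-block work */
    }
    return sum;
}
```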

Last fiddled with by cheesehead on 2010-05-08 at 20:26

2010-05-08, 23:57   #37
joblack

Oct 2008
n00bville

725₁₀ Posts

Quote:
 Originally Posted by cheesehead Unlike many programs, prime95 knows exactly what data it's going to process next when it gets through with the current data block, so it can prefetch data, which essentially brings it into the working set along with the current data. Once it's through with a data block, prime95 no longer references it, so that data block drops out of the working set.
I haven't checked the source code in great detail, but I hope our GIMPS coding guru has included some clues for the CPU's branch prediction :D ...

2010-05-09, 02:35   #38
lfm

Jul 2006
Calgary

651₈ Posts

Quote:
 Originally Posted by joblack I haven't checked the source code in great detail but I hope our Gimps coding guru has included some clues for the cpu branch prediction :D ...
When you spend all your time in loops processing big arrays, branch prediction mostly takes care of itself, i.e. the branch back to the top of the loop is the right choice 99.99% of the time.
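A toy illustration of that point: in a loop like the one below, the backward branch is taken on every iteration but the last, so the predictor gets it right essentially for free. Only the data-dependent branch inside can hurt, and only if the data has no pattern. The function and names are mine, just for illustration:

```c
/* The loop's backward branch is taken n-1 times out of n, so any
   reasonable predictor nails it. The inner comparison is the only
   branch whose predictability depends on the data. */
#include <stddef.h>

long count_big(const int *a, size_t n, int threshold) {
    long hits = 0;
    for (size_t i = 0; i < n; i++) {   /* "branch back to the top" */
        if (a[i] > threshold)          /* predictable only if the data is */
            hits++;
    }
    return hits;
}
```

Running this on sorted versus shuffled data of the same size is the classic way to see the inner branch's misprediction cost, while the loop branch itself costs nearly nothing either way.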

2010-05-09, 18:09   #39
science_man_88

"Forget I exist"
Jul 2009
Dumbassville

2⁶×131 Posts

Quote:
 Originally Posted by lfm When you spend all your time in loops processing big arrays, branch prediction mostly takes care of itself, i.e. the branch back to the top of the loop is the right choice 99.99% of the time.
in asm would that be like:

Code:
loops:
cmp(bx,cx);
je skiploops;
inc(bx);
jmp loops;
skiploops:
?

Last fiddled with by science_man_88 on 2010-05-09 at 18:13

2010-05-09, 20:52   #40
lfm

Jul 2006
Calgary

6518 Posts

Quote:
 Originally Posted by science_man_88 in asm would that be like: Code: loops: cmp(bx,cx); je skiploops; inc(bx); jmp loops; skiploops: ?
Well, if you are writing slow assembler, ya, something like that.
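For what it's worth, one reading of the "slow" jab: the posted loop tests at the top (cmp/je) and then unconditionally jumps back, so it spends two branches per iteration. Compilers usually "rotate" such loops so a single bottom-tested branch does the work. A do-while makes that shape explicit in C; bx and cx are just names carried over from the asm sketch:

```c
/* Same counting loop, restructured so only ONE branch executes per
   iteration instead of the je + jmp pair in the top-tested version.
   bx/cx are illustrative names borrowed from the asm sketch above. */
unsigned count_up(unsigned bx, unsigned cx) {
    if (bx == cx)           /* guard once, like the initial cmp/je */
        return bx;
    do {
        bx++;               /* inc bx */
    } while (bx != cx);     /* single bottom-tested branch, taken until equal */
    return bx;
}
```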

2010-05-09, 21:02   #41
science_man_88

"Forget I exist"
Jul 2009
Dumbassville

8384₁₀ Posts

Quote:
 Originally Posted by lfm slow assembler
this is basically the same as: for( ; bx != cx; bx++ );
2010-05-09, 21:10   #42
science_man_88

"Forget I exist"
Jul 2009
Dumbassville

2⁶·131 Posts

x86 asm vs. HLA, I'm guessing? Really they can basically come back to the same thing, so if you compare them they should run at the same speed on the same CPU. BTW, the reason I was able to come up with that possible loop was from reading The Art of Assembly, second edition, copyright 2010.

Last fiddled with by science_man_88 on 2010-05-09 at 21:12
2010-05-09, 22:05   #43
science_man_88

"Forget I exist"
Jul 2009
Dumbassville

2⁶×131 Posts

Also, it depends on what you call slow. Let's say it takes 6 computations/checks to do one iteration of loops; then even if cut to 80%, my CPU should be able to bring bx up to the value of cx within one second, even when the original difference is 13,333,333, at the peak rate of the CPU. Oh wait, maybe 2.8 times that, actually (all without overclocking, as far as I can calculate).

Last fiddled with by science_man_88 on 2010-05-09 at 22:07
2010-05-09, 22:16   #44
science_man_88

"Forget I exist"
Jul 2009
Dumbassville

2⁶·131 Posts

At 80% of the CPU used for non-OS work, and assuming the amount of memory is sufficient, this newer CPU should be able to run about 2.8 billion of my loops a second. If this info is accurate, I see a way for AMD's competitors to use religion against them: 6 (cores) 6 (weeks) 6 (cores), sign of the devil, anyone? 666

Last fiddled with by science_man_88 on 2010-05-09 at 22:25

