mersenneforum.org  

Old 2010-05-08, 14:34   #34
mdettweiler
A Sunny Moo
 
Aug 2007
USA (GMT-5)

3×2,083 Posts

Quote:
Originally Posted by S485122 View Post
First of all, some programs will list the specifications of your processor. Prime95 and programs like CPU-Z (www.cpuid.com) will list some specs (mainly about size). CPUid had a program called latency that will give you the latencies of memory and cache. I do not know if it will still work on current CPUs (it works on a Q6700). I attach it as a zip file; use at your own risk ;-).

According to Intel's "Intel® Core™2 Extreme Processor X6800 and Intel® Core™2 Duo Desktop Processor E6000 and E4000 Sequence Features", the Core 2 Duo E4500 has two 32 KiB 8-way associative L1 data caches (one per core) and one 2 MiB L2 cache shared by the two cores.

The size AND latency increase with the cache level (a higher latency means slower).

The size of Prime95 might be around 5 MiB, but it doesn't have to fit in cache as a whole; only the much smaller running routine will be in the L1 code cache.

Jacob
Thanks! Here's what I got:
Code:
2 cache levels detected
Level 1         size = 32Kb     latency = 3 cycles
Level 2         size = 1024Kb   latency = 16 cycles
Without having any other data to compare the latency to, that information isn't particularly useful, but I do see this confirms the figure of 1024Kb L2 cache per core. Interesting that they used Kb (kilobit) rather than KB or KiB (kilobyte)...is this a typo on the part of the program designers, or is it really just 1024 kilobits?
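One way to give those cycle counts some meaning is to turn them into time. Here is a rough C sketch that does the conversion; the function name latency_ps is made up for illustration, and the 2200 MHz clock used in the note below is an assumption (the E4500's rated speed), so substitute your own CPU's clock.

```c
#include <assert.h>

/* Convert a latency in CPU cycles to picoseconds for a clock given in MHz.
   Hypothetical helper for illustration; the result is truncated, not rounded. */
unsigned long latency_ps(unsigned long cycles, unsigned long clock_mhz) {
    assert(clock_mhz > 0);
    /* One cycle lasts 1,000,000 / clock_mhz picoseconds. */
    return cycles * 1000000UL / clock_mhz;
}
```

With an assumed 2200 MHz clock, latency_ps(3, 2200) gives 1363 ps (about 1.4 ns) for L1 and latency_ps(16, 2200) gives 7272 ps (about 7.3 ns) for L2, so an L2 hit costs roughly five times as much as an L1 hit.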
Old 2010-05-08, 16:27   #35
S485122
 
Sep 2006
Brussels, Belgium

11010000110₂ Posts

Quote:
Originally Posted by mdettweiler View Post
Interesting that they used Kb (kilobit) rather than KB or KiB (kilobyte)...is this a typo on the part of the program designers, or is it really just 1024 kilobits?
Just a typo; it should be KiB, of course. Be careful, though: CPUid does not advertise the latency program any more. Even though it is still downloadable from their site at www.cpuid.com/download/latency.zip, it might not be accurate for newer types of processors.

Jacob

Last fiddled with by S485122 on 2010-05-08 at 16:53
Old 2010-05-08, 20:17   #36
cheesehead
 
"Richard B. Woods"
Aug 2002
Wisconsin USA

1111000001100₂ Posts

Quote:
Originally Posted by mdettweiler View Post
Ah, I see. My copy of Prime95 25.11 is 4.89 MB; does that therefore mean that it needs 4.89 MB + (FFT expressed in megabytes) of cache for the whole thing to fit?
In addition to joblack's quite correct answer, let me introduce the concept of the working set.

The working set is the subset of a process's pages which, at any particular time, need to be resident in the fastest memory layer (e.g., L1 cache) in order for the program not to have to pause for fetching code or data from slower memory (e.g., L2 or L3 cache, or main RAM, or even pagefile on disk).

(There are other definitions, such as at http://en.wikipedia.org/wiki/Working_set, http://msdn.microsoft.com/en-us/libr...8VS.85%29.aspx, or http://msdn.microsoft.com/en-us/libr...8VS.85%29.aspx)

Like many programs, prime95 spends most of its time in a relatively small loop of code. It processes a certain chunk of data for a while before moving on to process the next chunk. So, most of the time prime95 is active, its working set consists of (1) the pages holding instructions for the inner loop plus (2) the pages holding the block of data that is being processed. That working set is what needs to fit into L1 cache in order to have fastest program execution.

Unlike many programs, prime95 knows exactly what data it's going to process next when it gets through with the current data block, so it can prefetch data, which essentially brings it into the working set along with the current data. Once it's through with a data block, prime95 no longer references it, so that data block drops out of the working set.

Most systems have separate L1 caches for instructions and for data, so the criterion for top speed would be that the working set instructions fit into the L1 instruction cache, and the working set data (current block plus what's being prefetched) fit into the L1 data cache.
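To make the working-set idea concrete, here is a minimal C sketch of processing data one cache-sized block at a time. It is not prime95's actual code. The names block_elems and process_blocked are hypothetical, the doubling pass is just a stand-in for real work, and reserving half of L1 for the block being prefetched is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Elements per block so that the current block plus one prefetched block
   of the same size both fit in an L1 data cache of l1_bytes (assumed split). */
size_t block_elems(size_t l1_bytes) {
    assert(l1_bytes >= 2 * sizeof(double));
    return l1_bytes / (2 * sizeof(double));
}

/* Run every pass over one block before moving on to the next block, so the
   block being worked on stays in the working set while it is still needed.
   Each pass doubles the elements; that is placeholder work, nothing more. */
void process_blocked(double *data, size_t n, size_t block, int passes) {
    assert(block > 0);
    for (size_t start = 0; start < n; start += block) {
        size_t end = (n - start < block) ? n : start + block;
        for (int p = 0; p < passes; p++) {
            for (size_t i = start; i < end; i++) {
                data[i] *= 2.0;
            }
        }
    }
}
```

For the 32 KiB L1 data cache quoted earlier in the thread, block_elems(32 * 1024) comes to 2048 doubles on a system with 8-byte doubles. The alternative ordering (one pass over the whole array, then the next pass) does the same arithmetic but has to pull each block back into cache once per pass.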

Last fiddled with by cheesehead on 2010-05-08 at 20:26
Old 2010-05-08, 23:57   #37
joblack
 
Oct 2008
n00bville

725₁₀ Posts

Quote:
Originally Posted by cheesehead View Post
Unlike many programs, prime95 knows exactly what data it's going to process next when it gets through with the current data block, so it can prefetch data, which essentially brings it into the working set along with the current data. Once it's through with a data block, prime95 no longer references it, so that data block drops out of the working set.
I haven't checked the source code in great detail, but I hope our GIMPS coding guru has included some hints for the CPU's branch prediction :D ...
Old 2010-05-09, 02:35   #38
lfm
 
Jul 2006
Calgary

651₈ Posts

Quote:
Originally Posted by joblack View Post
I haven't checked the source code in great detail, but I hope our GIMPS coding guru has included some hints for the CPU's branch prediction :D ...
When you spend all your time in loops processing big arrays, branch prediction kind of takes care of itself. I.e., predicting a branch back to the top of the loop is the right choice 99.99% of the time.
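A toy model shows where a figure like 99.99% can come from. The C sketch below simulates a two-bit saturating counter, a textbook prediction scheme, on a loop's backward branch. It is an illustration only and not a claim about how any particular CPU predicts branches; the function name loop_mispredictions is hypothetical.

```c
#include <assert.h>

/* Count mispredictions of a two-bit saturating counter on a loop's backward
   branch. The branch is taken (iterations - 1) times, then falls through
   once; the whole loop is executed `runs` times in a row. */
unsigned long loop_mispredictions(unsigned long iterations, unsigned long runs) {
    assert(iterations > 0);
    int state = 2; /* 0,1 predict not taken; 2,3 predict taken; start weakly taken */
    unsigned long misses = 0;
    for (unsigned long r = 0; r < runs; r++) {
        for (unsigned long i = 0; i < iterations; i++) {
            int taken = (i + 1 < iterations); /* only the last pass falls through */
            int predicted = (state >= 2);
            if (predicted != taken) {
                misses++;
            }
            /* Move the counter toward the actual outcome, saturating at 0 and 3. */
            if (taken) {
                if (state < 3) state++;
            } else {
                if (state > 0) state--;
            }
        }
    }
    return misses;
}
```

In this model a loop of 10,000 passes is mispredicted exactly once, at the exit, so loop_mispredictions(10000, 1) returns 1: the prediction is right 99.99% of the time.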
Old 2010-05-09, 18:09   #39
science_man_88
 
"Forget I exist"
Jul 2009
Dumbassville

2⁶×131 Posts

Quote:
Originally Posted by lfm View Post
When you spend all your time in loops processing big arrays, branch prediction kind of takes care of itself. I.e., predicting a branch back to the top of the loop is the right choice 99.99% of the time.
In asm, would that be like:

Code:
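loops:
cmp(bx,cx);      // compare the counter bx with the limit cx
je skiploops;    // leave the loop once they are equal
inc(bx);         // otherwise advance the counter
jmp loops;       // and jump back to the top for another pass
skiploops: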
?

Last fiddled with by science_man_88 on 2010-05-09 at 18:13
Old 2010-05-09, 20:52   #40
lfm
 
Jul 2006
Calgary

651₈ Posts

Quote:
Originally Posted by science_man_88 View Post
In asm, would that be like:

Code:
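loops:
cmp(bx,cx);      // compare the counter bx with the limit cx
je skiploops;    // leave the loop once they are equal
inc(bx);         // otherwise advance the counter
jmp loops;       // and jump back to the top for another pass
skiploops: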
?
Well, if you are writing slow assembler, yeah, something like that.
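The post doesn't say what makes that loop slow. One common reading, offered here as an assumption, is that it executes two branch instructions per pass (the je at the top and the jmp at the bottom), where a bottom-tested loop needs only one. A C sketch of the two shapes, with hypothetical function names:

```c
#include <assert.h>

/* Same shape as the quoted loop: test at the top, unconditional jump back.
   A literal translation executes two branch instructions per pass. */
unsigned count_top_tested(unsigned bx, unsigned cx) {
    assert(bx <= cx); /* keep the count from having to wrap around */
    while (bx != cx) {
        bx++;
    }
    return bx;
}

/* Bottom-tested shape: one guard before the loop, then a single
   conditional branch per pass. */
unsigned count_bottom_tested(unsigned bx, unsigned cx) {
    assert(bx <= cx);
    if (bx != cx) {
        do {
            bx++;
        } while (bx != cx);
    }
    return bx;
}
```

Both functions return the same value, and an optimizing compiler may well emit identical machine code for them; the difference only matters when the assembler is written by hand.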
Old 2010-05-09, 21:02   #41
science_man_88
 
"Forget I exist"
Jul 2009
Dumbassville

8384₁₀ Posts

Slow assembler? This is basically the same as:

for (; bx < cx; bx++) {
}
if (bx == cx) {
    goto next;
}
next:

Last fiddled with by science_man_88 on 2010-05-09 at 21:06
Old 2010-05-09, 21:10   #42
science_man_88
 
"Forget I exist"
Jul 2009
Dumbassville

2⁶·131 Posts

x86 asm vs. HLA, I'm guessing? Really, they basically come down to the same thing, so if you compare them they should run at the same speed on the same CPU.

BTW, I was able to come up with that possible loop from reading The Art of Assembly Language, second edition (copyright 2010).

Last fiddled with by science_man_88 on 2010-05-09 at 21:12
Old 2010-05-09, 22:05   #43
science_man_88
 
"Forget I exist"
Jul 2009
Dumbassville

2⁶×131 Posts

Also, it depends on what you call slow. Let's say it takes 6 computations/checks to do one pass through loops; then even if cut to 80%, my CPU should be able to bring bx up to cx in one second even when the original difference is 13,333,333, at the CPU's peak rate. Oh wait, maybe 2.8 times that, actually (all without overclocking, as far as I can calculate).
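The arithmetic behind that estimate can be written out as a short C sketch. The figure of 13,333,333 is consistent with a CPU doing 100 million operations per second; that rate is inferred from the number and is not stated in the post, so treat it as an assumption. The function name loops_per_second is hypothetical.

```c
#include <assert.h>

/* Loop passes per second, given the CPU's operation rate, the percentage of
   it available to the program, and the operations needed per pass.
   Integer division truncates the result. */
unsigned long long loops_per_second(unsigned long long ops_per_second,
                                    unsigned percent_available,
                                    unsigned ops_per_loop) {
    assert(percent_available <= 100);
    assert(ops_per_loop > 0);
    return ops_per_second * percent_available / 100 / ops_per_loop;
}
```

Under that assumption, loops_per_second(100000000ULL, 80, 6) gives 13,333,333, and a rate 2.8 times higher, loops_per_second(280000000ULL, 80, 6), gives 37,333,333.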

Last fiddled with by science_man_88 on 2010-05-09 at 22:07
Old 2010-05-09, 22:16   #44
science_man_88
 
"Forget I exist"
Jul 2009
Dumbassville

2⁶·131 Posts

With 80% available to non-OS work, and assuming the amount of memory is sufficient, this newer CPU should be able to run about 2.8 billion of my loops a second.

If this is accurate info, I see a way for AMD's competitors to use religion against them: 6 (cores), 6 (weeks), 6 (cores). Sign of the devil, anyone? 666

Last fiddled with by science_man_88 on 2010-05-09 at 22:25

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.