mersenneforum.org  

Great Internet Mersenne Prime Search > Hardware

2010-05-07, 19:46   #23
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)

2×5×587 Posts

Quote:
Originally Posted by mdettweiler
Ah, that makes sense. So, I have a Core 2 Duo E4500 with 2 MB of L2 cache (which I presume is shared between the two cores). Therefore, any FFT less than about 256K should be able to fit in cache. Did I get that right?
For one of your cores, not two.

2010-05-07, 19:52   #24
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)

3×2,083 Posts

Quote:
Originally Posted by henryzz
For one of your cores, not two.
Ah, right, duh. So that would mean it can fit an FFT of up to 256K into cache, assuming the other core is idle; alternatively, both cores can concurrently fit 128K FFTs. Additionally, one core could (say) run a 64K FFT while the other runs a 192K one. Is that right?

Also, which cache (L1, L2, or L3) is the relevant one here?
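
The arithmetic above can be sketched in Python. This is only a rough model: it assumes, per the rule of thumb from post #21, that each FFT element is one 8-byte double, and it ignores code, lookup tables and any other working data, so it gives an optimistic upper bound. The function names are illustrative, not anything from Prime95.

```python
# Minimal sketch of the "FFT length x 8 bytes" rule of thumb from post #21.
# Assumption: one 8-byte double per FFT element; code, lookup tables and
# other working data are ignored, so this is an optimistic upper bound.
BYTES_PER_ELEMENT = 8
K = 1024

def fft_bytes(fft_len: int) -> int:
    """Approximate size of the FFT data for an FFT of fft_len elements."""
    return fft_len * BYTES_PER_ELEMENT

def fits(fft_lens: list[int], cache_bytes: int) -> bool:
    """True if the FFT data of all concurrently running tests fits in a
    shared cache of cache_bytes."""
    return sum(fft_bytes(n) for n in fft_lens) <= cache_bytes

L2 = 2 * K * K  # Core 2 Duo E4500: 2 MiB L2 shared by both cores

print(fft_bytes(256 * K) // K)        # 2048 (KiB)
print(fits([256 * K], L2))            # True: one 256K FFT, other core idle
print(fits([128 * K, 128 * K], L2))   # True
print(fits([64 * K, 192 * K], L2))    # True
print(fits([256 * K, 64 * K], L2))    # False
```

Under this assumption a 256K FFT needs exactly 2 MiB, so it only just fills the shared L2 with nothing else in it; the practical limit will be somewhat lower.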

2010-05-07, 22:41   #25
joblack
Oct 2008
n00bville

1011010101₂ Posts

Quote:
Originally Posted by mdettweiler
Ah, that makes sense. So, I have a Core 2 Duo E4500 with 2 MB of L2 cache (which I presume is shared between the two cores). Therefore, any FFT less than about 256K should be able to fit in cache. Did I get that right?
If you don't have a high-speed server CPU (with 12 MB+ of cache), you probably won't get all the data into the CPU cache.

2010-05-07, 23:47   #26
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)

3×2,083 Posts

Quote:
Originally Posted by joblack
If you don't have a high-speed server CPU (with 12 MB+ of cache), you probably won't get all the data into the CPU cache.
Hmm...I'm not sure what you mean. Per post #21, it would seem that an FFT will fit in the CPU's cache unless FFT size * 8 is greater than the cache size. It's possible I misinterpreted that, though.

2010-05-08, 01:22   #27
joblack
Oct 2008
n00bville

5²×29 Posts

Quote:
Originally Posted by mdettweiler
Hmm...I'm not sure what you mean. Per post #21, it would seem that an FFT will fit in the CPU's cache unless FFT size * 8 is greater than the cache size. It's possible I misinterpreted that, though.
To be effective, you must squeeze the mprime code into the cache as well. These are the rare circumstances where a bigger cache gets you a speed bump.

2010-05-08, 02:41   #28
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)

3×2,083 Posts

Quote:
Originally Posted by joblack
To be effective, you must squeeze the mprime code into the cache as well. These are the rare circumstances where a bigger cache gets you a speed bump.
Ah, I see. My copy of Prime95 25.11 is 4.89 MB; does that therefore mean that it needs 4.89 MB + (FFT expressed in megabytes) of cache for the whole thing to fit?

2010-05-08, 02:45   #29
axn
Jun 2003

11546₈ Posts

Quote:
Originally Posted by joblack
To be effective, you must squeeze the mprime code into the cache as well. These are the rare circumstances where a bigger cache gets you a speed bump.
Most modern processors have a split L1 cache, i.e. separate sections for instructions and data. In such cases, the code puts no pressure on the data cache. Besides, the actual working set of the code is very small.

However, _no_ processor today has an L2 cache big enough to hold the entire FFT data (at the sizes GIMPS currently uses), so a bigger L2 will definitely result in fewer memory accesses and therefore better performance.

PS: A bigger L2 is good, but a lower-latency L2 is better.
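
Why a bigger cache cuts memory traffic can be illustrated with a toy model: a fully associative cache with LRU replacement that counts how many accesses miss and would have to go to RAM. Real L2 caches are set-associative and hardware prefetching changes the picture, so this is only an illustration; the workload below is hypothetical.

```python
from collections import OrderedDict

def count_misses(trace, capacity_lines):
    """Count misses for a sequence of cache-line numbers in a fully
    associative cache of capacity_lines lines with LRU replacement."""
    cache = OrderedDict()
    misses = 0
    for line in trace:
        if line in cache:
            cache.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1                    # miss: fetch from the next level / RAM
            cache[line] = None
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict the least recently used line

    return misses

# Hypothetical workload: sweep a 100-line working set 10 times.
trace = list(range(100)) * 10
print(count_misses(trace, 128))  # 100: only the first sweep misses
print(count_misses(trace, 64))   # 1000: a cyclic sweep thrashes LRU
```

Once the working set fits, only the first pass misses; when it does not, this cyclic sweep misses on every access, so in this model the benefit of a bigger cache arrives as a cliff rather than gradually.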

2010-05-08, 03:00   #30
mdettweiler
A Sunny Moo
Aug 2007
USA (GMT-5)

1869₁₆ Posts

Quote:
Originally Posted by axn
Most modern processors have a split L1 cache, i.e. separate sections for instructions and data. In such cases, the code puts no pressure on the data cache. Besides, the actual working set of the code is very small.

However, _no_ processor today has an L2 cache big enough to hold the entire FFT data (at the sizes GIMPS currently uses), so a bigger L2 will definitely result in fewer memory accesses and therefore better performance.

PS: A bigger L2 is good, but a lower-latency L2 is better.
Most of the numbers I generally work on (k*b^n+-c LLR/PRP tests) are well below GIMPS's size range, and therefore many of them may well fit, if I'm understanding everything correctly. Since my CPU is relatively modern, I'm guessing it has a split L1 cache. If the FFT is therefore free to occupy the entire L1 cache, then my earlier calculation of 256K worth of FFTs (between the two cores) being able to fit in cache should still hold.

Intel's spec page for my CPU doesn't list the L1 cache size, just L2; are they by chance interrelated so that one can be determined from the other? I did some googling to try to ascertain the difference between L1, L2, and L3, and from what I gather they increase in size but decrease in latency as the number increases. Which ones does Prime95 use for LL tests (and other FFT-based worktypes)?

How do I find the latency of my CPU's L2 cache? It doesn't seem to be listed on the spec page; might that be inferrable from other data as well?

2010-05-08, 07:16   #31
S485122
Sep 2006
Brussels, Belgium

11010000110₂ Posts

Quote:
Originally Posted by mdettweiler
...
Since my CPU is relatively modern, I'm guessing it has a split L1 cache. If the FFT is therefore free to occupy the entire L1 cache, then my earlier calculation of 256K worth of FFTs (between the two cores) being able to fit in cache should still hold.
...
I did some googling to try to ascertain the difference between L1, L2, and L3, and from what I gather they increase in size but decrease in latency as the number increases. Which ones does Prime95 use for LL tests (and other FFT-based worktypes)?

How do I find the latency of my CPU's L2 cache? It doesn't seem to be listed on the spec page; might that be inferrable from other data as well?
First of all, some programs will list the specifications of your processor. Prime95 and programs like CPU-Z (www.cpuid.com) will list some specs (mainly sizes). CPUID had a program called latency that will give you the latencies of memory and cache. I do not know whether it still works on current CPUs (it works on a Q6700). I attach it as a zip file; use at your own risk ;-).
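
On Linux, the kernel exposes the cache sizes that CPU-Z shows under /sys/devices/system/cpu/cpuN/cache/indexN/, in files such as level, type and size. Below is a minimal sketch for reading them, assuming Linux with sysfs mounted; entries that are missing or empty on a given platform are skipped. It reports sizes only, not latency.

```python
from pathlib import Path

def parse_size(text: str) -> int:
    """Convert a sysfs size string such as '32K' or '2048K' to bytes."""
    text = text.strip()
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    if text and text[-1].upper() in units:
        return int(text[:-1]) * units[text[-1].upper()]
    return int(text)

def read_caches(cpu: int = 0) -> list[tuple[int, str, int]]:
    """Return (level, type, size_in_bytes) for each cache of one CPU, as
    reported by the Linux kernel. Returns [] where sysfs has no cache info."""
    base = Path(f"/sys/devices/system/cpu/cpu{cpu}/cache")
    if not base.is_dir():
        return []
    caches = []
    for index in sorted(base.glob("index*")):
        try:
            level = int((index / "level").read_text())
            kind = (index / "type").read_text().strip()  # Data, Instruction or Unified
            size = parse_size((index / "size").read_text())
        except (OSError, ValueError):
            continue  # some platforms omit or leave these files empty
        caches.append((level, kind, size))
    return caches

if __name__ == "__main__":
    for level, kind, size in read_caches():
        print(f"L{level} {kind}: {size // 1024} KiB")
```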

According to Intel (Intel® Core™2 Extreme Processor X6800 and Intel® Core™2 Duo Desktop Processor E6000 and E4000 Sequence Features), the Core 2 Duo E4500 has two 32 KiB 8-way associative L1 data caches (one per core) and one 2 MiB L2 cache shared by the two cores.
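
Those figures are enough to work out the L1 geometry. This sketch assumes 64-byte cache lines (not stated in Intel's summary above) and simple modulo set indexing. Under those assumptions, 32 KiB / (8 ways × 64 B) gives 64 sets. One core's L1D holds only 4096 doubles, so the earlier 256K estimate applies to the shared L2, not to L1.

```python
LINE = 64              # assumption: 64-byte cache lines
L1D_BYTES = 32 * 1024  # per core, from Intel's figures above
L1D_WAYS = 8

def num_sets(size_bytes: int, ways: int, line: int = LINE) -> int:
    """Sets in a set-associative cache: size / (ways * line size)."""
    return size_bytes // (ways * line)

def set_index(addr: int, sets: int, line: int = LINE) -> int:
    """Set that a byte address maps to, using simple modulo indexing
    (an illustration; real indexing details are ignored)."""
    return (addr // line) % sets

sets = num_sets(L1D_BYTES, L1D_WAYS)
print(sets)              # 64
print(L1D_BYTES // 8)    # 4096 doubles fit in one core's L1D
# Addresses 4 KiB apart all map to the same set; with 8 ways, a 9th such
# line forces an eviction even though the rest of the cache is empty.
print({set_index(a * 4096, sets) for a in range(9)})  # {0}
```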

The size AND latency increase with the cache level (a higher latency means slower access).

The size of Prime95 might be around 5 MiB, but it doesn't have to fit in cache as a whole: only the much smaller routine that is currently running will be in the L1 code cache.

Jacob
Attached file: latency.zip (20.9 KB)

Last fiddled with by S485122 on 2010-05-08 at 07:17 Reason: the program is CPU-Z

2010-05-08, 13:46   #32
lfm
Jul 2006
Calgary

110101001₂ Posts

Quote:
Originally Posted by S485122
First of all, some programs will list the specifications of your processor. Prime95 and programs like CPU-Z (www.cpuid.com) will list some specs (mainly sizes). CPUID had a program called latency that will give you the latencies of memory and cache. I do not know whether it still works on current CPUs (it works on a Q6700). I attach it as a zip file; use at your own risk ;-).

According to Intel (Intel® Core™2 Extreme Processor X6800 and Intel® Core™2 Duo Desktop Processor E6000 and E4000 Sequence Features), the Core 2 Duo E4500 has two 32 KiB 8-way associative L1 data caches (one per core) and one 2 MiB L2 cache shared by the two cores.

The size AND latency increase with the cache level (a higher latency means slower access).
For Prime95/mprime, I believe latency is largely overcome by prefetching. The access time would be more significant.
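
Prefetchers work by predicting upcoming addresses, which is easy for sequential sweeps. Latency tools therefore typically defeat them with a dependent pointer chase: each load's address comes from the previous load, in a random order that visits every slot once. How the attached latency program works internally is not known here, so this is an assumption about the usual technique. The sketch builds such a chain with Sattolo's algorithm. Actually timing it needs a compiled language, because Python's interpreter overhead is far larger than any cache latency.

```python
import random

def make_chase(n: int, seed: int = 1) -> list[int]:
    """Return nxt[] such that following i -> nxt[i] visits all n slots once
    before returning to the start (one random cycle, Sattolo's algorithm)."""
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i; excluding j == i forces a single cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt

def cycle_length(nxt: list[int], start: int = 0) -> int:
    """Count the dependent lookups needed to get back to start."""
    steps, i = 0, start
    while True:
        i = nxt[i]  # the next index depends on the result of this lookup
        steps += 1
        if i == start:
            return steps

chain = make_chase(1000)
print(cycle_length(chain))  # 1000: every slot is visited exactly once
```

Choosing n so that the chain (n × element size) fits in L1, in L2, or in neither selects which level such a test ends up measuring.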

2010-05-08, 14:06   #33
joblack
Oct 2008
n00bville

5²·29 Posts

Quote:
Originally Posted by mdettweiler
Ah, I see. My copy of Prime95 25.11 is 4.89 MB; does that therefore mean that it needs 4.89 MB + (FFT expressed in megabytes) of cache for the whole thing to fit?
No; there are parts of the program that aren't used all the time (menu options, ...).

The same concept can be found in virtual memory. You almost never use all of Microsoft Word's functions at once, so Windows can move the code responsible for the unused ones to the hard disk (swap it out).

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.