#540 | |
|
Serpentine Vermin Jar
Jul 2014
7·11·43 Posts |
Quote:
It *seems* to me that the L4 cache should help, but only if an LL test is memory-limited rather than CPU-limited. Has anyone run a benchmark and ramped up the number of threads in a single worker to see how it scales? The issue I always see on many-core systems is that the 2nd core might come close to halving iteration times, but beyond that you start to see smaller and smaller gains. My working assumption is that memory starts to be the bottleneck.

With a 128MB L4 cache and the higher bandwidth it offers, it seems like you could run 4 threads in a single worker and get closer to 4 times the performance of a single core?

I was trying to look up the specs on Crystalwell... 1600 MHz, 128-bit path, so it's not super fast or anything (not as fast as the L2/L3), but still decent bandwidth compared to main memory, I gather? The one thing that stuck out to me was that the eDRAM is shared with the GPU, so if you really expect to use *ALL* of it for the CPU, you need a discrete GPU installed so the L4 can be just for the CPU.

I was mostly just curious about the technology... I don't have plans to buy a Haswell with that feature, but I ran across something about it and it caught my eye.
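For anyone who wants to put rough numbers on the reasoning in that quote, here is a minimal sketch. It takes the quoted "1600 MHz, 128-bit path" at face value and assumes one transfer per clock, which may not match the real part. The scaling model (each thread needs a fixed amount of bandwidth, and speedup is capped by what the memory level can deliver) is only an illustration of the "memory becomes the bottleneck" idea; the function names and the 8 GB/s and 10 GB/s figures are hypothetical, not measurements.

```python
def peak_bandwidth_gb_s(transfers_per_sec: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return transfers_per_sec * (bus_width_bits / 8) / 1e9


def capped_speedup(threads: int, per_thread_gb_s: float, available_gb_s: float) -> float:
    """Speedup over one thread when every thread needs per_thread_gb_s
    and the memory level can supply at most available_gb_s in total."""
    return min(float(threads), available_gb_s / per_thread_gb_s)


# Quoted figures, assuming one transfer per clock (an assumption).
quoted_l4 = peak_bandwidth_gb_s(1600e6, 128)

# Hypothetical numbers, chosen only to show the shape of the curve.
PER_THREAD_DEMAND = 8.0   # GB/s each thread would like (hypothetical)
MAIN_MEMORY = 10.0        # GB/s from main memory (hypothetical)

for n in (1, 2, 3, 4):
    from_dram = capped_speedup(n, PER_THREAD_DEMAND, MAIN_MEMORY)
    from_l4 = capped_speedup(n, PER_THREAD_DEMAND, quoted_l4)
    print(f"{n} threads: {from_dram:.2f}x from main memory, {from_l4:.2f}x from L4")
```

In this toy model the gains flatten out as soon as total demand exceeds what the memory level supplies, which is the pattern described in the quote. Whether a real LL test behaves this way depends on how much of the working set actually stays in the L4.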
#541 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
2·53·71 Posts |
I have a Crystalwell, bandwidth benchmarks attached (running in Windows using VMware). Highlights:
L3 (3rd Level) Data/Unified Cache : 91.27GB/s
L4 (4th Level) Data/Unified Cache : 33GB/s
256MB Data Set : 10.88GB/s (8.7GB/s - 10.88GB/s) (DDR3-1600 memory)

Alas, the L4 cache did not substantially change prime95's performance. I still get more throughput running one worker per core than running 1 or 2 multithreaded workers. This may be because it is a Mac laptop. Apple has been known to do weird things with OS tuning, and users are not allowed to tamper with any BIOS configurations.

Also, does anyone know how to measure or estimate the delays introduced by maintaining cache coherence? I'm wondering if that could be one of the problems with prime95's multi-thread performance.

Last fiddled with by Prime95 on 2015-10-04 at 20:45
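To put the highlighted figures in proportion, here is a small sketch that only divides the numbers posted above. The dictionary keys and the helper name are made up for illustration, and since the benchmark ran inside VMware, the absolute values may not reflect bare-metal behavior.

```python
# Bandwidth highlights from the post above, in GB/s.
measured_gb_s = {
    "L3": 91.27,
    "L4": 33.0,
    "DRAM": 10.88,  # 256MB data set, best of the reported range
}


def bandwidth_ratio(faster: str, slower: str) -> float:
    """How many times more bandwidth one level delivers than another."""
    return measured_gb_s[faster] / measured_gb_s[slower]


print(f"L4 vs main memory: {bandwidth_ratio('L4', 'DRAM'):.2f}x")
print(f"L3 vs L4:          {bandwidth_ratio('L3', 'L4'):.2f}x")
```

By these figures the L4 offers roughly three times the bandwidth of main memory and a bit over a third of the L3's.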
#542 | |
|
"/X\(‘-‘)/X\"
Jan 2013
2²×733 Posts |
Quote:
#543 |
|
Romulan Interpreter
Jun 2011
Thailand
9611₁₀ Posts |
That is a brilliant document, sir! Thanks for sharing it!
edit: To contribute a bit, this paper (about floating-point numbers) is referred to a few times (and listed in the citations). I googled it and it is very interesting to read too. It explains what the floats are, what they do, why they do it, and if their parents know... It is one of the first links Google gives; I assume a deeper Google dig can find a better format (this one is a bit difficult to read because almost all the text is underlined, at least in my PDF viewer).
Last fiddled with by LaurV on 2015-10-05 at 02:33
#544 | |
|
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
1110000110101₂ Posts |
Quote: