#1 |
Bemusing Prompter
"Danny"
Dec 2002
California
2×1,171 Posts |
Does anyone else think that Intel's upcoming "Nehalem" processors will exceed 100 GFLOPS in performance?
One of the fastest processors currently available is the "Clovertown" 2.66 GHz X5355, which can perform up to 43 GFLOPS. A hypothetical 4 GHz version of this processor would run at about 65 GFLOPS. According to some sources, Intel's "Penryn" (45 nm) chips will be up to 45% faster than any 65 nm ones. It's also believed that "Penryn" processors will go up to 4 GHz. If this is true, quad-core "Penryn" processors could reach 100 GFLOPS. Since "Nehalem" is further down the road, there's not much information about it. However, there are rumors that the "Nehalem" series will be up to 40% faster than the "Penryn" series. In addition, the "Nehalem" series will no longer have a memory bottleneck. Plus, "Nehalem" processors will have integrated GPUs, from which we'll probably be able to squeeze a few GFLOPS of double precision. If all this is true, we should be really excited for next year. |
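For anyone who wants to redo the back-of-the-envelope numbers in this post, here is a rough Python sketch. It assumes, optimistically, that throughput scales linearly with clock speed and that the rumored "up to" percentages apply directly to GFLOPS. The function names are made up for illustration.

```python
def scale_by_clock(gflops: float, old_ghz: float, new_ghz: float) -> float:
    """Scale a GFLOPS figure linearly with clock speed (naive assumption)."""
    return gflops * new_ghz / old_ghz


def apply_speedup(gflops: float, percent: float) -> float:
    """Apply an 'up to N% faster' claim as a straight multiplier (naive assumption)."""
    return gflops * (1 + percent / 100)


# Figures quoted in the post: X5355 at 2.66 GHz, up to 43 GFLOPS
clovertown_4ghz = scale_by_clock(43.0, 2.66, 4.0)
# Rumored "up to 45% faster" for Penryn, then "up to 40% faster" for Nehalem
penryn_4ghz = apply_speedup(clovertown_4ghz, 45)
nehalem_4ghz = apply_speedup(penryn_4ghz, 40)

print(f"Clovertown @ 4 GHz: {clovertown_4ghz:.1f} GFLOPS")
print(f"Penryn (rumored):   {penryn_4ghz:.1f} GFLOPS")
print(f"Nehalem (rumored):  {nehalem_4ghz:.1f} GFLOPS")
```

Treat the results as an upper bound on the rumors, not as a prediction. The "up to" figures come from specific workloads and may not carry over to peak floating-point throughput.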
#2 |
Jun 2003
47·103 Posts |
I think you're being overoptimistic. A 4GHz, 4-core, 4-issue processor will do 4x4x4 = 64 GFLOPS theoretical peak. PERIOD! It doesn't matter whether it is called clovertown or penryn or nehalem or barcelona or whatever.
The "up to 40%" faster or whatever is NOT based on theoretical peak, but on actual workloads. Actual GFLOPS numbers are improved by architectural improvements like an integrated memory controller, bigger cache, etc. Theoretical peaks are improved by raw compute capabilities and clock speed (i.e. ips and GHz and # of cores). An 8-way, 4-GHz nehalem will indeed achieve 128 GFLOPS theoretical peak. I have no idea how the integrated GPU will fit in with this calculation. |
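The peak calculation above is just a product of three numbers, so it is easy to sketch in Python. This is a minimal illustration, assuming every core sustains the stated FLOPs per cycle on every cycle; `peak_gflops` is a made-up helper name.

```python
def peak_gflops(clock_ghz: float, cores: int, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS: clock (GHz) x cores x FLOPs per cycle per core."""
    return clock_ghz * cores * flops_per_cycle


# 4 GHz, 4 cores, 4 FLOPs/cycle
print(peak_gflops(4.0, 4, 4))   # 64.0
# 8-way (8 cores), 4 GHz, 4 FLOPs/cycle
print(peak_gflops(4.0, 8, 4))   # 128.0
# 2.66 GHz quad-core X5355 at 4 FLOPs/cycle
print(peak_gflops(2.66, 4, 4))  # 42.56
```

The last line matches the "up to 43 GFLOPS" quoted for the X5355 in the first post. That suggests the figure is a theoretical peak and not a measured result.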
#3 |
Bemusing Prompter
"Danny"
Dec 2002
California
4446₈ Posts |
Intel does have an 8-core processor in the works. Unfortunately, it won't be here until 2009. :(
#4 |
Oct 2005
2³×5 Posts |
-> A 4GHz, 4-core, 4-issue processor will do 4x4x4 = 64 GFLOPS theoretical peak. PERIOD!
You are assuming a processor is capable of only one floating point operation per clock cycle. This hasn't been true for over a decade. |
#5 |
Jun 2003
47·103 Posts |
I have used a 4 FLOP/cycle count in the calculation (see bold). This holds for Core 2 architecture onwards for Intel and Barcelona onwards for AMD. P3, P4, P-M, Athlon, Athlon 64, etc. were 2-issue processors.
#6 |
Oct 2005
28₁₆ Posts |
Gotcha - My mistake.
#7 |
Bemusing Prompter
"Danny"
Dec 2002
California
2×1,171 Posts |
I'm sorry for the dumb question, but are there any differences between n-way, n-issue, and n-thread processors? Can these descriptions be used interchangeably?
Thanks. |
#8 | |
Apr 2003
Berlin, Germany
19² Posts |
Quote:
The maximum number of threads that can run on one processor simultaneously matters more to other applications, like web servers and databases. HPC needs throughput and can usually come close to peak throughput without sharing the computational resources across multiple threads. SMT (simultaneous multithreading) is nice to have if you have a lot of random memory accesses with many stalls due to slow memory, or if you can't reach a high throughput with your code.
The (up to) "45% faster" argument regarding Penryn just refers to special cases (like video encoding or certain game engines). It doesn't mean that Penryn offers this advantage in general. More recent (p)reviews have shown an average 10-15% advantage clock-for-clock compared to current Core 2 processors. Part of this comes from the larger L2 cache and small enhancements to the core, like several optimized FP instructions (SSE shuffles, FP division and square roots, none of much use for Prime95). Finally, there is a new instruction set extension (SSE4.1), which offers a lot of additional performance if you can make use of it (e.g. in video encoding).
But back to the topic: Nehalem will bring more cores to the table, thus increasing the "n-way", which is useful for Prime95. It will offer a better memory subsystem and processor interconnect, the former more important for home users than the latter. Well, the memory bottleneck won't be gone, but it will be less severe: DRAM is still a lot slower than the caches. OTOH, the Core 2 prefetchers are already good and the caches rather big, so you won't see miracles in memory performance, especially in Prime95, since it already makes such good use of the caches that memory performance has become less of a factor.
Maybe Nehalem will have increased FP throughput (some guys at aceshardware.freeforums.org are already trying to analyze the Nehalem die photo in this regard), but it might be hard to utilize it without either a new SSE extension (256- or 512-bit variants are likely to come some day) or using SMT. Nehalem's SMT will again offer the possibility of doing LL testing + TF on the same CPU, but maybe with a smaller advantage vs. the single tests. On P4 with HT, the SSE2 units had a lower throughput than Core 2's, leaving a lot of the processor's other resources (especially the integer units) available for TF. Now this SSE2 throughput has doubled, while the overall instruction throughput capability has increased by only 33% (a bit more if there are jumps/loops). This has to be tested. But given that relatively more LL testing instances have to fight for both cache and memory bandwidth, chances are good that TF might get a nice share of the CPU's less utilized resources. |
#9 |
"Jason Goatcher"
Mar 2005
6661₈ Posts |
I think that explanation was a little much for the question. From the way he phrased it, he is probably unfamiliar with at least two of the words. And so am I.
I did a search on those three terms together, but nothing promising came up. Since the search was an attempted favor to the poster and I'm really not that interested in the terms, I won't attempt more specific searches. Oh, well. |
#10 |
Bemusing Prompter
"Danny"
Dec 2002
California
100100100110₂ Posts |
Thanks for the detailed explanation, Dresdenboy. :)
#11 |
Apr 2003
Berlin, Germany
101101001₂ Posts |
I extended the answer to his last question by what I would have answered to his first posting. So I was referring to more questions than I quoted ;)