mersenneforum.org  

2007-10-02, 10:25   #1
ixfd64
Bemusing Prompter
"Danny"
Dec 2002
California
100011100100₂ Posts
"Nehalem" quad-cores faster than 100 GFLOPS?

Does anyone else think that Intel's upcoming "Nehalem" processors will exceed 100 GFLOPS in performance?

One of the fastest processors currently available is the "Clovertown" 2.66 GHz X5355, which can perform up to 43 GFLOPS. A hypothetical 4 GHz version of this processor would run at about 64 GFLOPS. According to some sources, Intel's "Penryn" (45 nm) chips would be up to 45% faster than any 65 nm ones. It's also believed that "Penryn" processors will go up to 4 GHz. If this is true, quad-core "Penryn" processors could reach 100 GFLOPS.

Since "Nehalem" is further down the road, there's not much information about it. However, there are rumors that the "Nehalem" series will be up to 40% faster than the "Penryn" series. In addition to this, the "Nehalem" series will no longer have a memory bottleneck. Plus, "Nehalem" processors will have integrated GPUs, from which we'll probably be able to squeeze a few GFLOPS of double precision.

If all this is true, we should be really excited for next year.

2007-10-02, 12:35   #2
axn
Jun 2003
4631₁₀ Posts

I think you're being overoptimistic. A 4GHz, 4-core, 4-issue processor will do 4x4x4 = 64 GFLOPS theoretical peak. PERIOD! It doesn't matter whether it is called Clovertown or Penryn or Nehalem or Barcelona or whatever.

The "up to 40% faster" or whatever is NOT based on theoretical peak, but on actual workloads. Actual GFLOPS numbers are improved by architectural improvements like an integrated memory controller, bigger cache, etc. Theoretical peaks are improved by raw compute capabilities and clock speed (i.e. instructions per cycle, GHz and # of cores).

An 8-way, 4 GHz Nehalem will indeed achieve 128 GFLOPS theoretical peak. I have no idea how the integrated GPU will fit in with this calculation.
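
For anyone who wants to play with the numbers, here is a minimal sketch of the peak calculation used above: cores x clock in GHz x FLOPs per core per cycle. The function name `peak_gflops` and its parameters are made up for illustration, and the 4 FLOPs/cycle figure is the one assumed in this thread, not a measured value.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS: cores x clock (GHz) x FLOPs per core per cycle."""
    return cores * clock_ghz * flops_per_cycle


# 4-core, 4 GHz, 4 FLOPs/cycle: the 4x4x4 case from the post.
quad_4ghz = peak_gflops(cores=4, clock_ghz=4.0, flops_per_cycle=4)

# 8-way, 4 GHz, still assuming 4 FLOPs/cycle per core.
octo_4ghz = peak_gflops(cores=8, clock_ghz=4.0, flops_per_cycle=4)

# The 2.66 GHz quad-core X5355 mentioned in the first post.
x5355 = peak_gflops(cores=4, clock_ghz=2.66, flops_per_cycle=4)

print(quad_4ghz)        # 64.0
print(octo_4ghz)        # 128.0
print(round(x5355, 2))  # 42.56
```

Note that nothing in this formula knows about caches or memory controllers, which is why the "up to 40%" workload numbers cannot be plugged into it.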

2007-10-02, 20:04   #3
ixfd64
Bemusing Prompter
"Danny"
Dec 2002
California
2²×569 Posts

Intel does have an 8-core processor in the works. Unfortunately, it won't be here until 2009. :(

2007-10-03, 16:54   #4
Ethan Hansen
Oct 2005
101000₂ Posts

-> A 4GHz, 4-core, 4-issue processor will do 4x4x4 = 64 GFLOPS theoretical peak. PERIOD!

You are assuming a processor is capable of only one floating point operation per clock cycle. This hasn't been true for over a decade.

2007-10-03, 18:23   #5
axn
Jun 2003
11·421 Posts

Quote:
Originally Posted by Ethan Hansen
-> A 4GHz, 4-core, 4-issue processor will do 4x4x4 = 64 GFLOPS theoretical peak. PERIOD!

You are assuming a processor is capable of only one floating point operation per clock cycle. This hasn't been true for over a decade.

I have used a 4 FLOP/cycle count in the calculation (see bold). This holds for Core 2 architecture onwards for Intel and Barcelona onwards for AMD. P3, P4, P-M, Athlon, Athlon 64, etc were 2-issue processors.

2007-10-04, 19:00   #6
Ethan Hansen
Oct 2005
50₈ Posts

Gotcha - My mistake.

2007-10-05, 02:44   #7
ixfd64
Bemusing Prompter
"Danny"
Dec 2002
California
8E4₁₆ Posts

I'm sorry for the dumb question, but are there any differences between n-way, n-issue, and n-thread processors? Can these descriptions be used interchangeably?

Thanks.

2007-10-08, 13:19   #8
Dresdenboy
Apr 2003
Berlin, Germany
169₁₆ Posts

Quote:
Originally Posted by ixfd64
I'm sorry for the dumb question, but are there any differences between n-way, n-issue, and n-thread processors? Can these descriptions be used interchangeably?

This is a complex topic. Each of these factors brings its own benefits and problems. If we are into HPC (where Prime95 fits nicely), "n-way" and "n-issue" would be the most important metrics, but "n-issue" would be better replaced by the more specific FP throughput number, since n-issue just means the max. number of instructions decoded, issued or executed per cycle, which could also mean max. 4 integer ops/cycle, but only max. 2 FP ops/cycle.
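
To make the n-issue vs. FP throughput distinction concrete, here is a small sketch. The `Core` class, its field names and the 4-issue / 2-FP core are hypothetical, chosen only to match the example numbers above, and "n-way" is taken here to mean the number of cores.

```python
from dataclasses import dataclass


@dataclass
class Core:
    issue_width: int       # max instructions issued per cycle, of any type
    fp_ops_per_cycle: int  # max floating-point operations per cycle


def peak_gflops(n_way: int, clock_ghz: float, core: Core) -> float:
    """Peak FP rate depends on FP ops per cycle, not on the issue width."""
    return n_way * clock_ghz * core.fp_ops_per_cycle


# Hypothetical core: 4-issue overall, but only 2 FP ops per cycle.
hypothetical = Core(issue_width=4, fp_ops_per_cycle=2)

# Naive estimate that treats every issue slot as an FP operation.
naive = 4 * 3.0 * hypothetical.issue_width

# Estimate based on the FP throughput number (4 cores at 3 GHz).
actual = peak_gflops(n_way=4, clock_ghz=3.0, core=hypothetical)

print(naive)   # 48.0
print(actual)  # 24.0
```

The factor of two between the two results is the reason "n-issue" alone is a poor stand-in for floating-point capability.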

The max. number of threads that can run on one processor simultaneously is more important to other applications like web servers and databases. HPC needs throughput and is usually able to come close to the peak without needing to share the computational resources across multiple threads.

SMT (simultaneous multithreading) is nice to have if you have a lot of random memory accesses with many stalls due to slow memory, or if you can't reach a high throughput with your code.

The (up to) "45% faster" argument regarding Penryn applies only to special cases (like video encoding or certain game engines). It doesn't mean that Penryn offers this advantage in general. More recent (p)reviews have shown an average 10-15% advantage clock-for-clock compared to current Core 2 processors. Part of this comes from the larger L2 cache and small enhancements to the core, like several optimized FP instructions (SSE shuffles, FP division and square roots, none of much use for Prime95). Finally, there is a new instruction set extension (SSE4.1), which offers a lot of additional performance if you can make use of it (e.g. in video encoding).

But back to the topic:
Nehalem will bring more cores to the table, thus increasing the "n-way", which is useful for Prime95. It will offer a better memory subsystem and processor interconnect, the former more important for home users than the latter. Well, the memory bottleneck won't go away, but it will be less severe; DRAM is still a lot slower than the caches. OTOH the Core 2 prefetchers are already good and the caches rather big, so you won't see miracles in memory performance, especially in Prime95, since it already makes such good use of the caches that memory performance has become less of a factor.

Maybe Nehalem will have increased FP throughput (some guys at aceshardware.freeforums.org are already trying to analyze the Nehalem die photo in this regard), but it might be hard to utilize it without either a new SSE extension (256- or 512-bit variants are likely to come some day) or using SMT.

Nehalem's SMT will again offer the possibility of doing LL testing + TF on the same CPU, but maybe with a smaller advantage vs. running the tests separately. On P4 with HT the SSE2 units had a lower throughput than Core 2's, leaving a lot of the processor's other resources (especially the integer units) available for TF. Now this SSE2 throughput has doubled, while the overall instruction throughput capability has increased by only 33% (a bit more, if there are jumps/loops). This has to be tested. But given that relatively more LL testing instances have to fight for both cache and memory bandwidth, the chances are good that TF might get a nice share of the CPU's less utilized resources.
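
A back-of-the-envelope sketch of that last comparison, using only the two figures given above. It assumes that the 33% corresponds to a factor of 4/3 (for example, a 3-wide core becoming 4-wide); the variable names are just for illustration.

```python
from fractions import Fraction

# Figures from the post: SSE2 throughput doubled; overall throughput grew ~33%.
sse2_gain = Fraction(2, 1)
# Assumption: the 33% reflects a factor of 4/3 in overall issue capability.
overall_gain = Fraction(4, 3)

# How much faster the SSE2 units grew than the rest of the core.
sse2_vs_overall = sse2_gain / overall_gain

print(sse2_vs_overall)         # 3/2
print(float(sse2_vs_overall))  # 1.5
```

Under that assumption the SSE2 units grew about 1.5 times faster than the core's general issue capability, so an LL test can occupy a larger share of the core than it did on P4, leaving relatively less slack for TF unless the LL test stalls on cache or memory.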

2007-10-08, 14:20   #9
jasong
"Jason Goatcher"
Mar 2005
110110110001₂ Posts

I think that explanation was a little much for the question. From the way he phrased it, he is probably unfamiliar with at least two of the words. And so am I.

I did a search on those three terms together, but nothing promising came up. Since the search was an attempted favor to the poster and I'm really not that interested in the terms, I won't attempt more specific searches. Oh, well.

2007-10-08, 19:59   #10
ixfd64
Bemusing Prompter
"Danny"
Dec 2002
California
8E4₁₆ Posts

Thanks for the detailed explanation, Dresdenboy. :)

2007-10-10, 08:16   #11
Dresdenboy
Apr 2003
Berlin, Germany
19² Posts

Quote:
Originally Posted by jasong
I think that explanation was a little much for the question. From the way he phrased it, he is probably unfamiliar with at least two of the words. And so am I.

I extended the answer to his last question with what I would have answered to his first post. So I was referring to more questions than I quoted ;)

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.