mersenneforum.org


db597 2005-07-14 15:28

High IPC bad for GIMPS?
 
Intel is moving away from the P4/Netburst architecture and towards high-IPC designs. Pentium M-type chips are going to move into the desktop arena, and Netburst will soon disappear. Sites like Tom's Hardware are even applauding the demise of P4/Netburst and welcoming the much cooler Pentium M designs.

This seems to be bad news for GIMPS. So far, GIMPS has proven to run very fast on P4/Netburst, with much of the boost due to the high clock speeds. I wonder whether this move will mean that our GIMPS output won't increase much over the next couple of years. People will actually upgrade to machines that are SLOWER than their current ones, at least where GIMPS is concerned.

Of course, this is only on the Intel front.

garo 2005-07-14 15:34

Yes, I believe this is correct. We are not going to see faster iteration times, but hopefully, as dual-core processors come through, we will see twice the amount of work done in 1.5 times the time, effectively increasing throughput.

But certainly the glory days from the Willamette to the Northwood, with processor speeds doubling in 18 months, are gone.
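A quick sketch of the arithmetic behind these two points, in Python. The 1.5x time figure is garo's estimate for running two exponents at once on a dual core, not a measurement; the function names are hypothetical.

```python
def throughput_gain(work_multiplier: float, time_multiplier: float) -> float:
    """Relative throughput: how much more work is finished per unit of time."""
    return work_multiplier / time_multiplier


def growth_per_year(doubling_months: float) -> float:
    """Annual speed-up factor implied by doubling every `doubling_months` months."""
    return 2.0 ** (12.0 / doubling_months)


if __name__ == "__main__":
    # Dual core (assumed): 2 exponents finish in 1.5x the single-exponent time.
    # Each exponent takes longer, but total output per day still rises.
    print(f"dual-core throughput gain: {throughput_gain(2.0, 1.5):.2f}x")
    # Clock speed doubling every 18 months, as in the Willamette -> Northwood era.
    print(f"annual growth at 18-month doubling: {growth_per_year(18):.2f}x")
```

The point of separating the two: dual core improves throughput (about 1.33x here) while making each individual test slower, whereas clock scaling used to improve both at once.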

Dresdenboy 2005-07-15 15:17

Higher IPC (if this holds for the main instructions used by Prime95, like SSE2 or x87 ops) is not bad. What we are currently seeing is either:
- a roughly similar IPC (for FP code, because the throughput is the same, with some small differences in latencies) but at a lower clock (K7/K8 vs. P4), or
- a lower IPC (again for FP code) and a lower clock (P-M vs. P4).
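Both cases come down to performance being roughly IPC times clock. A minimal Python sketch, where the IPC and clock values in the usage are purely illustrative placeholders, not measurements of any real chip:

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    ipc: float        # average instructions retired per clock on this workload
    clock_ghz: float  # clock frequency in GHz

    def perf(self) -> float:
        """Billions of instructions per second: IPC times clock."""
        return self.ipc * self.clock_ghz


def relative_speed(a: Core, b: Core) -> float:
    """How many times faster core `a` runs the workload than core `b`."""
    return a.perf() / b.perf()


def break_even_ipc(reference: Core, clock_ghz: float) -> float:
    """IPC a core running at `clock_ghz` needs in order to match `reference`."""
    return reference.perf() / clock_ghz


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the two cases above.
    netburst = Core("Netburst", ipc=1.0, clock_ghz=3.6)
    same_ipc_lower_clock = Core("similar IPC", ipc=1.0, clock_ghz=2.4)
    lower_ipc_lower_clock = Core("lower IPC", ipc=0.9, clock_ghz=2.0)
    for c in (same_ipc_lower_clock, lower_ipc_lower_clock):
        print(f"{c.name}: {relative_speed(c, netburst):.2f}x of {netburst.name}")
    print(f"IPC needed at 2.4 GHz to break even: {break_even_ipc(netburst, 2.4):.2f}")
```

So a higher-IPC design only loses for GIMPS if its IPC gain on FP code is smaller than its clock deficit.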
Future P-M derivatives like Yonah will have somewhat improved FPU IPC. Some early benchmark results (Cinebench) have shown per-clock performance improvements, but in that case too many factors contributed to the difference in performance. Besides the core changes, the test system had a different cache design, FSB, chipset, and maybe even a different type of memory than the Dothan-based system used for comparison.

But at both Intel and AMD, CPU designs won't stop at the current throughput of 1 FP mul + 1 FP add per cycle per core. Besides adding execution units by increasing the number of cores, there will also be designs with higher throughput per core.
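The "1 mul + 1 add per cycle per core" figure gives a simple theoretical peak, and both routes mentioned here (more cores, more units per core) scale it the same way. A small Python sketch; the 2.0 GHz clock in the usage is an arbitrary example, and real code, Prime95 included, reaches only a fraction of peak:

```python
def peak_gflops(cores: int, clock_ghz: float, fp_results_per_cycle: int) -> float:
    """Theoretical peak double-precision GFLOPS.

    cores                -- number of cores
    clock_ghz            -- clock frequency in GHz
    fp_results_per_cycle -- 64-bit FP results each core can complete per cycle
                            (1 mul + 1 add per cycle -> 2)
    """
    return cores * clock_ghz * fp_results_per_cycle


if __name__ == "__main__":
    clock = 2.0  # hypothetical clock, GHz
    print("single core, 1 mul + 1 add:", peak_gflops(1, clock, 2))
    print("dual core, 1 mul + 1 add:  ", peak_gflops(2, clock, 2))
    print("single core, doubled units:", peak_gflops(1, clock, 4))
```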

An Intel CPU developer who recently switched to AMD wrote a dissertation about multiscalar CPUs. These (if eventually implemented in hardware) could execute in parallel parts of a single thread that would usually be executed serially, as long as their data dependencies and other conditions permit. This implies that such CPUs would have many more execution units than today's. Even different loop iterations could be executed in parallel. But even without such dramatically new architectures, we will at least see higher-throughput architectures.
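The limit on that kind of parallelism is the longest chain of dependent operations. Below is a Python sketch of an idealized dataflow machine (unlimited execution units, no branch or memory stalls), which is not a model of the multiscalar design itself. The op latencies (6 cycles for mul, 4 for add) and helper names are hypothetical.

```python
from __future__ import annotations


def critical_path(ops: dict[str, tuple[int, list[str]]]) -> int:
    """Cycles needed with unlimited execution units.

    `ops` maps an op name to (latency_in_cycles, names_of_ops_it_depends_on);
    dependencies must appear before the ops that use them. Each op starts as
    soon as all of its inputs are finished.
    """
    finish: dict[str, int] = {}
    for name, (latency, deps) in ops.items():
        start = max((finish[d] for d in deps), default=0)
        finish[name] = start + latency
    return max(finish.values(), default=0)


def serial_latency(ops: dict[str, tuple[int, list[str]]]) -> int:
    """Cycles if every op waited for the previous one to finish."""
    return sum(latency for latency, _ in ops.values())


def serial_sum(n: int, add_latency: int) -> dict[str, tuple[int, list[str]]]:
    """acc += x[i]: every add needs the previous add's result (loop-carried)."""
    ops: dict[str, tuple[int, list[str]]] = {}
    prev: list[str] = []
    for i in range(n):
        ops[f"add{i}"] = (add_latency, prev)
        prev = [f"add{i}"]
    return ops


def independent_iterations(n: int, mul_latency: int, add_latency: int):
    """y[i] = a * x[i] + b: iterations share no data, so each chain is 2 ops."""
    ops: dict[str, tuple[int, list[str]]] = {}
    for i in range(n):
        ops[f"mul{i}"] = (mul_latency, [])
        ops[f"add{i}"] = (add_latency, [f"mul{i}"])
    return ops


if __name__ == "__main__":
    for label, ops in (("loop-carried sum", serial_sum(8, 4)),
                       ("independent iterations", independent_iterations(8, 6, 4))):
        print(f"{label}: {critical_path(ops)} cycles ideal, "
              f"{serial_latency(ops)} cycles one-at-a-time")
```

Independent iterations collapse to the length of one iteration, while the loop-carried sum gains nothing from extra units; that gap is what a wider or multiscalar design would try to exploit.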

Just by moving to full-width 128-bit SSEn operation in hardware, we would get twice the throughput per clock that we have now. Currently P4, K8 and P-M execute their 128-bit SSEn operations as two 64-bit operations, which the units process sequentially. IIRC, only the latest VIA CPU (C7, or Esther) is capable of full 128-bit SSEn operations.
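A simplified issue-rate model of that halving, in Python: it assumes one pipelined FP unit that accepts one datapath-width chunk per cycle and ignores latency, scheduling and memory effects, so it only illustrates the 2x factor described above.

```python
import math


def packed_cycles(num_packed_ops: int, datapath_bits: int, operand_bits: int = 128) -> int:
    """Issue cycles for one pipelined FP unit to accept `num_packed_ops` packed ops.

    A 128-bit op on a 64-bit datapath is split into two 64-bit halves that
    enter the unit one after another; a full-width 128-bit datapath accepts
    the whole op in a single cycle.
    """
    passes = math.ceil(operand_bits / datapath_bits)
    return num_packed_ops * passes


if __name__ == "__main__":
    n = 1000  # e.g. 1000 packed-double multiplies (2000 FLOPs)
    half = packed_cycles(n, datapath_bits=64)
    full = packed_cycles(n, datapath_bits=128)
    print(f"64-bit datapath: {half} cycles, 128-bit datapath: {full} cycles, "
          f"speed-up {half / full:.1f}x")
```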

TheJudger 2005-07-15 20:21

Intel's Netburst has very high raw power (compared to AMD's K8 core and Intel's Pentium M core)... _BUT_ it is _VERY_ hard to use this power because of the very deep pipeline of the Netburst cores...
Typical desktop applications don't utilize the CPUs very efficiently, and the penalties for "bad code" are much smaller on K8 and Pentium M than on P4s.
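One common way to see why a deep pipeline punishes branchy code is the effective-CPI formula: every mispredicted branch costs roughly a pipeline refill. A Python sketch; all numbers in the usage (branch mix, mispredict rates, penalties) are illustrative assumptions, not measured values for P4, K8 or Pentium M.

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty_cycles: float) -> float:
    """Average cycles per instruction including branch-misprediction stalls.

    base_cpi        -- CPI with perfect branch prediction
    branch_fraction -- fraction of instructions that are branches
    mispredict_rate -- fraction of branches predicted wrongly
    penalty_cycles  -- cycles lost per misprediction (grows with pipeline depth)
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


if __name__ == "__main__":
    # Hypothetical workloads: branchy "desktop" code vs. a tight FP loop.
    workloads = {"branchy desktop code": (0.20, 0.05), "tight FP loop": (0.05, 0.01)}
    for label, (branches, miss) in workloads.items():
        deep = effective_cpi(1.0, branches, miss, penalty_cycles=20)
        short = effective_cpi(1.0, branches, miss, penalty_cycles=10)
        print(f"{label}: deep pipeline CPI {deep:.3f}, shorter pipeline CPI {short:.3f}")
```

With these made-up inputs the deep pipeline loses much more on branchy code than on a predictable FP loop, which is consistent with the point that Netburst's raw power mostly shows up in code like Prime95's.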

@Dresdenboy
I'm very sure that Netburst CPUs do 128-bit SSE operations... it would be
hard to imagine a P4 doing more than one FLOP per clock in the Linpack
benchmark (which solves a dense linear system using double-precision (64-bit) floats)
without 128-bit ;)

Dresdenboy 2005-07-15 20:47

[QUOTE=TheJudger]Intel's Netburst has very high raw power (compared to AMD's K8 core and Intel's Pentium M core)... _BUT_ it is _VERY_ hard to use this power because of the very deep pipeline of the Netburst cores...
Typical desktop applications don't utilize the CPUs very efficiently, and the penalties for "bad code" are much smaller on K8 and Pentium M than on P4s.[/QUOTE]The reason for Netburst's "raw power" is its clock frequency, which in turn is the reason for the long pipeline. It's more of a "throughput architecture", and as a result it doesn't like branches and tight dependency chains much. By "raw power" compared to the other two cores, I assume you mean Prime95 LLR testing performance, because the picture one gets from these cores running other applications is different. There, compared with both P-M and K8 (and even more so the K8 in 64-bit mode), Netburst's raw power doesn't show up. :wink:

[QUOTE=TheJudger]@Dresdenboy
I'm very sure that Netburst CPUs do 128-bit SSE operations... it would be
hard to imagine a P4 doing more than one FLOP per clock in the Linpack
benchmark (which solves a dense linear system using double-precision (64-bit) floats) without 128-bit ;)[/QUOTE]Don't forget that Linpack uses MUL and ADD, which can be executed in parallel on all three architectures mentioned here. So your proof does not count :smile: Especially because the math libraries used for Linpack benchmarking (like ATLAS, MKL, ACML) have very well optimized kernels and use cache blocking (which Prime95 also uses) so efficiently that they reach 90% or more efficiency, even for large arrays.
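A sketch of both points in one small Python example: a cache-blocked matrix multiply (the technique, not the actual ATLAS/MKL/ACML kernels, which are far more elaborate) whose inner step is exactly one multiply plus one add. A core that can issue one scalar 64-bit mul and one scalar 64-bit add per cycle can therefore retire 2 FLOPs per cycle without 128-bit execution. Pure Python will not show the cache effects; only the loop structure is illustrated, and the tile size of 32 is an arbitrary choice.

```python
from __future__ import annotations


def blocked_matmul(a: list[list[float]], b: list[list[float]], block: int = 32):
    """C = A @ B for square lists of lists, processed in block x block tiles.

    Tiling keeps the same pieces of A, B and C in use for many consecutive
    updates, so on real hardware they stay cache-resident. Returns (C, flops).
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    flops = 0
    for ii in range(0, n, block):
        i_end = min(ii + block, n)
        for kk in range(0, n, block):
            k_end = min(kk + block, n)
            for jj in range(0, n, block):
                j_end = min(jj + block, n)
                # One tile: rows ii..i_end of C updated from A[ii.., kk..] and B[kk.., jj..].
                for i in range(ii, i_end):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, k_end):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, j_end):
                            row_c[j] += aik * row_b[j]  # one multiply + one add
                        flops += 2 * (j_end - jj)
    return c, flops


if __name__ == "__main__":
    c, flops = blocked_matmul([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]], block=1)
    print(c, flops)  # flops is always 2 * n**3: one mul and one add per inner step
```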

For details on how a Netburst CPU executes 128-bit SSE2 instructions, see: [url]http://chip-architect.com/news/2003_09_21_Detailed_Architecture_of_AMDs_64bit_Core.html#1.5[/url]


All times are UTC.
