mersenneforum.org  

Old 2005-07-14, 15:28   #1
db597
 
 
Jan 2003

7·29 Posts
High IPC bad for GIMPS?

Intel is moving away from the P4/Netburst architecture and towards high IPC designs. Pentium M type chips are going to move into the desktop arena, and Netburst will soon disappear. Sites like Tomshardware are even applauding the demise of P4/Netburst and are very welcoming of the much cooler Pentium M designs.

This seems to be bad news for GIMPS. So far, it has proven to run very fast on P4/Netburst, with much of the boost due to the high clock speeds. I wonder if this move will mean that our GIMPS output won't increase much in the next couple of years. People will actually upgrade to machines that are SLOWER than their current ones - at least where GIMPS is concerned.

Of course, this is only on the Intel front.
Old 2005-07-14, 15:34   #2
garo
 
 
Aug 2002
Termonfeckin, IE

2×5×251 Posts

Yes, I believe this is correct. We are not going to see faster iteration times, but hopefully, as dual-core processors come through, we will see twice the amount of work being done in 1.5 times the time, effectively increasing throughput.

But certainly the glory days from the Willamette to the Northwood, with a doubling of processor speeds in 18 months, are gone.
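To put a number on that estimate, here is a minimal sketch of the arithmetic. The function name `relative_throughput` is made up for illustration, and the 2x work / 1.5x time figures are simply the ones from the sentence above, not measurements.

```python
from fractions import Fraction


def relative_throughput(work_factor, time_factor):
    """Throughput relative to a baseline that does 1 unit of work in 1 unit of time."""
    return Fraction(work_factor) / Fraction(time_factor)


# Assumption taken from the post: two tests complete in 1.5x the single-test time.
dual_core = relative_throughput(2, Fraction(3, 2))
print(dual_core, float(dual_core))
```

Under that assumption the dual-core machine delivers 4/3 of the single-core throughput, even though each individual test takes longer to finish.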
Old 2005-07-15, 15:17   #3
Dresdenboy
 
 
Apr 2003
Berlin, Germany

19² Posts

Higher IPC (if this is true for the main set of instructions used by Prime95, like SSE2 or x87 ops) is not bad. What we are currently seeing is either
  • a roughly similar IPC (for FP code, because the throughput is the same, although there are some small differences in latencies) - but at a lower clock (K7/K8 vs. P4), or
  • a lower IPC (for FP code again) and a lower clock (P-M vs. P4)
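The two cases above come down to sustained rate being roughly IPC times clock. A small sketch follows; the IPC and GHz values are invented placeholders chosen only to show the shape of the comparison, not benchmark data for any real K8, P-M or P4.

```python
def sustained_rate(ipc, clock_ghz):
    """Instructions retired per nanosecond: IPC multiplied by clock in GHz."""
    return ipc * clock_ghz


# Hypothetical, illustrative numbers only (not measurements of real CPUs).
cores = {
    "baseline, high clock": (1.0, 3.6),
    "same IPC, lower clock": (1.0, 2.4),
    "lower IPC, lower clock": (0.8, 2.0),
}

baseline = sustained_rate(*cores["baseline, high clock"])
relative = {
    name: sustained_rate(ipc, ghz) / baseline
    for name, (ipc, ghz) in cores.items()
}

for name, ratio in relative.items():
    print(f"{name}: {ratio:.2f}x of baseline")
```

With these made-up inputs, both alternatives land below the high-clock baseline, which is the situation described for FP-heavy code; a higher IPC would only help once it outweighs the clock deficit.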

Future P-M derivatives like Yonah will have somewhat improved FPU IPC. Some early benchmark results (Cinebench) have shown per-clock performance improvements, but in that case too many factors contributed to the difference in performance. Besides core changes, there was a different cache design, FSB, chipset and maybe even a different type of memory in this test than in the Dothan-based system used for comparison.

But at both Intel and AMD the CPU designs won't stop at the current throughput of 1 fp mul + 1 fp add per cycle per core. Besides increasing the number of execution units by increasing the number of cores, designs with increased throughput per core will also arrive.

An Intel CPU developer who recently switched to AMD wrote a dissertation about multiscalar CPUs. These (if eventually implemented in hardware) could execute parts of a single thread that would usually run serially in parallel, as long as their data dependencies and other conditions permit. This implies that such CPUs would have many more execution units than now. Even different loop iterations could be executed in parallel. But even without such dramatically new architectures, we will at least see higher-throughput architectures.

Just from going to full-width 128 bit SSEn operation in hardware, we would get twice the throughput per clock that we have now. Currently P4, K8 and P-M do their 128 bit SSEn operations as two 64 bit operations, which are executed sequentially by the units. IIRC, only the latest VIA CPU (C7 or Esther) is capable of doing full 128 bit SSEn operations.
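A rough way to see where the factor of two comes from is a toy peak-rate model. It assumes one multiply unit and one add unit per core (the "1 fp mul + 1 fp add per cycle" figure above) and that a unit narrower than the vector needs several passes per packed instruction. The function name and parameters are invented for this sketch, and real cores have further limits (decode, ports, latencies) that it ignores.

```python
def peak_dp_flops_per_cycle(datapath_bits, units=2, vector_bits=128, element_bits=64):
    """Toy peak double-precision FLOPs per cycle for packed SSE2-style operations.

    units         -- number of FP units issuing in parallel (assumed: 1 mul + 1 add)
    datapath_bits -- hardware width of each unit
    """
    # Passes a unit needs to push one full-width packed instruction through.
    passes = max(1, vector_bits // datapath_bits)
    # Double-precision elements carried by one packed instruction.
    elements = vector_bits // element_bits
    return units * elements / passes


split_width = peak_dp_flops_per_cycle(64)   # 128-bit ops done as two 64-bit halves
full_width = peak_dp_flops_per_cycle(128)   # full 128-bit units

print(split_width, full_width, full_width / split_width)
```

In this model the split design peaks at 2 FLOPs per cycle and the full-width design at 4, so the doubling follows directly from removing the second pass.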
Old 2005-07-15, 20:21   #4
TheJudger
 
 
"Oliver"
Mar 2005
Germany

3³×41 Posts

Intel's Netburst has very high raw power (compared to AMD's K8 core and Intel's Pentium-M core)... _BUT_ it is _VERY_ hard to use this power because of the very deep pipeline of the Netburst cores...
Typical "desktop applications" do not utilize the CPUs very efficiently; the penalties for "bad code" on K8 and Pentium-M are much smaller than on P4s.

@Dresdenboy
I'm very sure that Netburst CPUs do 128bit SSE operations... it would be
hard to imagine that a P4 does more than one FLOP per clock in the Linpack
benchmarks (solves a dense linear system using double-precision floats (64bit))
without 128bit ;)
Old 2005-07-15, 20:47   #5
Dresdenboy
 
 
Apr 2003
Berlin, Germany

169₁₆ Posts

Quote:
Originally Posted by TheJudger
Intel's Netburst has very high raw power (compared to AMD's K8 core and Intel's Pentium-M core)... _BUT_ it is _VERY_ hard to use this power because of the very deep pipeline of the Netburst cores...
Typical "desktop applications" do not utilize the CPUs very efficiently; the penalties for "bad code" on K8 and Pentium-M are much smaller than on P4s.

The reason for Netburst's "raw power" is its clock frequency, which is the reason for the long pipeline. It's more a "throughput architecture", which as a result doesn't much like branches and tight dependency chains. By "raw power" compared to the other 2 mentioned cores, I assume you mean the Prime95 LLR testing performance, because the picture one gets from the performance of these cores running other applications is different. Otherwise, against both P-M and K8 - and even more so the K8 in 64 bit mode - the raw power of the Netburst doesn't show up.

Quote:
Originally Posted by TheJudger
@Dresdenboy
I'm very sure that Netburst CPUs do 128bit SSE operations... it would be
hard to imagine that a P4 does more than one FLOP per clock in the Linpack
benchmarks (solves a dense linear system using double-precision floats (64bit)) without 128bit ;)

Don't forget that Linpack uses MUL and ADD, which can be executed in parallel on all 3 architectures mentioned here. So your proof does not count, especially because the math libraries used for Linpack benchmarking (like ATLAS, MKL, ACML) are so efficient in using cache blocking, and have such well-optimized kernels, that they perform at 90% or more efficiency - even for large arrays, thanks to that cache blocking (which Prime95 also uses).

For some details about how a Netburst CPU executes 128 bit SSE2 instructions, you can read here: http://chip-architect.com/news/2003_..._Core.html#1.5
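For readers unfamiliar with cache blocking, here is a minimal sketch of the idea in plain Python. It is not how ATLAS, MKL or ACML are implemented (their kernels are hand-tuned and far more elaborate); the function names and the `block` parameter are made up for illustration. The point is only that the work is tiled so each small block of the operands is reused while it is still in cache, and that every inner step is one multiply feeding one add.

```python
def matmul_naive(a, b):
    """Reference triple loop: C = A * B for lists of lists."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def matmul_blocked(a, b, block=2):
    """Same product, computed tile by tile (loop tiling / cache blocking)."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, block):          # tile rows of A and C
        for k0 in range(0, m, block):      # tile the shared dimension
            for j0 in range(0, p, block):  # tile columns of B and C
                for i in range(i0, min(i0 + block, n)):
                    for k in range(k0, min(k0 + block, m)):
                        aik = a[i][k]
                        for j in range(j0, min(j0 + block, p)):
                            c[i][j] += aik * b[k][j]  # one MUL plus one ADD
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(matmul_blocked(a, b) == matmul_naive(a, b))
```

In Python the tiling brings no speedup by itself; in compiled code the block size would be chosen so that the tiles being worked on fit in L1 or L2 cache.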

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.