mersenneforum.org  

2004-10-10, 02:17   #1
Xyzzy
Aug 2002

2²×2,089 Posts
Very interesting K8 paper...

http://www.cpuid.com/K8/index.php

After reading this, I can see why Prime95/mprime might be hard to optimize for the K8...

Pay close attention to the exclusive cache design, especially how on K8 an L1 miss involves writing the evicted L1 line to the L2... The P4's inclusive design is more efficient in this regard... Also look at the L1 latency... The P4 has a very small L1, but it has a very low latency... Finally, look at the section labeled "Floating points calculations: x87, SSE and SSE2"... I barely understand it, but it looks like Intel wins by a large margin in this category...
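
For anyone who wants to see where the extra L2 write traffic comes from, here is a toy Python model of an exclusive-style two-level cache, where a line lives in the L1 or the L2 but never both. It is only a sketch: the class name, the LRU replacement and the tiny sizes are all assumptions made for illustration, not anything taken from the paper or from real hardware.

```python
from collections import OrderedDict


class ExclusiveCache:
    """Toy two-level exclusive cache: a line is in L1 or L2, never both."""

    def __init__(self, l1_lines, l2_lines):
        self.l1 = OrderedDict()  # least recently used line comes first
        self.l2 = OrderedDict()
        self.l1_lines = l1_lines
        self.l2_lines = l2_lines
        self.l2_writes = 0  # victim lines copied from L1 down into L2

    def access(self, line):
        if line in self.l1:
            self.l1.move_to_end(line)  # L1 hit: the L2 is not touched
            return "L1 hit"
        if line in self.l2:
            del self.l2[line]  # the line moves up and leaves the L2
            result = "L2 hit"
        else:
            result = "miss"
        self._fill_l1(line)
        return result

    def _fill_l1(self, line):
        if len(self.l1) >= self.l1_lines:
            victim, _ = self.l1.popitem(last=False)  # evict the LRU L1 line
            if len(self.l2) >= self.l2_lines:
                self.l2.popitem(last=False)  # make room in the L2
            self.l2[victim] = True  # the victim is written to the L2
            self.l2_writes += 1
        self.l1[line] = True


cache = ExclusiveCache(l1_lines=2, l2_lines=4)
for line in (0, 1, 2, 0, 0):
    print(line, cache.access(line))
print("L2 writes:", cache.l2_writes)
```

In this model every L1 fill that displaces a line costs one L2 write, while an L1 hit costs none. The upside of the arrangement is that the two levels never hold duplicate lines, so their capacities add up.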

The second half of the paper shows great promise for the extra registers and stuff, but I don't know how those will affect our work...

I guess when you compare the K8 to the P4, you get two totally different ways of doing things, that eventually accomplish the same goal... It just looks like the P4 way happens to be more efficient for SIMD stuff, like Prime95...

I don't know if a new client, written from the ground up for the K8, would do better than the P4 client we have now... I'm thinking probably not... As it is, this paper explains why a P4 is faster in Prime95 than an equally clocked K8... Just making them equal at the same clock speed looks tough...

I do hope that the extra registers and 64-bit stuff will help general scientific computing in the future though...

One other thing I picked up from this article is if you are going to buy a K8, and you have a choice between a smaller cache model that is clocked higher and a larger cache model that is clocked lower, go for the higher clocked model... The cache part of that paper makes it very clear that the L2 size isn't that important in the overall design... (I learn this after I specifically bought a 1MB L2 3200+... Doh!)

Personally, I tend to agree more with the K8 design than the P4 design... Yes, the P4 is very fast for some tasks, but overall, it looks like the K7/K8 is a better "general purpose" CPU... And we all know that the P4 design is starting to show its limits, and we haven't even officially hit 4GHz yet... In fact, Intel plans to scrap the P4 design pretty soon...

Please read this paper and let us know your thoughts...

Edit: I mirrored the zip file they linked at the bottom of that article since it had a bad URL...
Attached Files
File Type: zip K8.zip (361.1 KB, 235 views)

2004-10-10, 12:54   #2
PrimeCruncher
Sep 2003
Borg HQ, Delta Quadrant

2×3³×13 Posts

Quote:
Originally Posted by Xyzzy
http://www.cpuid.com/K8/index.php
And we all know that the P4 design is starting to show its limits, and we haven't even officially hit 4GHz yet... In fact, Intel plans to scrap the P4 design pretty soon...
Indeed! And didn't Intel say the P4 architecture would extend to 6-10GHz?

2004-10-12, 11:37   #3
Jeff Gilchrist
Jun 2003
Ottawa, Canada

3×17×23 Posts

Very cool article.

2004-10-12, 20:07   #4
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL

17001₈ Posts

Quote:
Originally Posted by PrimeCruncher
didn't Intel say the P4 architecture would extend to 6-10GHz?
The scary Intel news out of trade rags is that they like the Pentium-M architecture and are looking into incorporating many of its features in future CPUs. The Pentium-M is horrible at prime95. I hope that in two years time we aren't looking back fondly on those great Northwood processors...

2004-10-15, 01:49   #5
Xyzzy
Aug 2002

2²×2,089 Posts

Quote:
Originally Posted by Prime95
The scary Intel news out of trade rags is that they like the Pentium-M architecture and are looking into incorporating many of its features in future CPUs. The Pentium-M is horrible at prime95. I hope that in two years time we aren't looking back fondly on those great Northwood processors...
http://arstechnica.com/news.ars/post/20041014-4311.html

2004-10-15, 04:45   #6
moo
Jul 2004
Nowhere

809 Posts

Interesting, lol. Poor Intel, but I bet you could OC those 3.8s up to maybe 4 to 4.2.
Also, Alienware seems to offer a 4 GHz system using the Intel P4, probably overclocked. The funniest part of it all is the fact that they water cool it, meaning those things are kicking out more heat than the older ones :)... probably?

2004-10-20, 03:12   #7
Xyzzy
Aug 2002

10000010100100₂ Posts

Quote:
Originally Posted by Prime95
The scary Intel news out of trade rags is that they like the Pentium-M architecture and are looking into incorporating many of its features in future CPUs. The Pentium-M is horrible at prime95. I hope that in two years time we aren't looking back fondly on those great Northwood processors...
http://www.theinquirer.net/?article=19105
http://www.theinquirer.net/?article=19110

2004-11-02, 15:28   #8
Xyzzy
Aug 2002

20244₈ Posts

http://developers.slashdot.org/devel...2/050232.shtml

2004-11-18, 20:24   #9
Joe O
Aug 2002

3×5²×7 Posts

The real reason the AMD 64 and the Pentium-M lag behind the Pentium-IV is that they both use 80 bit Wallace trees for Floating Point operations whereas the Pentium-IV uses 128 bit Wallace trees.

2004-11-19, 00:04   #10
lycorn
"GIMFS"
Sep 2002
Oeiras, Portugal

2×5×151 Posts

Absolutely...
That's why the SSE2 implementation in the P4s is superior to the one in the Athlon 64.
Let's hope that the forthcoming Win64 will give AMD a push.

2004-11-23, 08:24   #11
Dresdenboy
Apr 2003
Berlin, Germany

192 Posts

Quote:
Originally Posted by Joe O
The real reason the AMD 64 and the Pentium-M lag behind the Pentium-IV is that they both use 80 bit Wallace trees for Floating Point operations whereas the Pentium-IV uses 128 bit Wallace trees.
Definitely not.

The Pentium 4 has other advantages, but not a 128 bit Wallace tree. The same idea came up in another forum (sudhian?) but is plain wrong, since the P4 still does the 2 calculations in an SSE2 vector operation one after another. That's why the throughput for such ops is one every 2 cycles and not one every cycle (which it would be with a wider Wallace tree).

But there is hope:
  • Intel did not say that Netburst is dead (Pat Gelsinger wouldn't allow that), but will just go multicore.
  • AMD said in their optimization manual that we have to expect wider execution resources for vector operations in future CPUs. And they will go multicore too.
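
To put rough numbers on the throughput point above, here is a small Python sketch of the arithmetic. It assumes a packed SSE2 op holds 2 doubles and compares starting one such op every 2 cycles with starting one every cycle. The function names and the 3.0 GHz clock are made up for illustration, and only a single execution unit is modelled, so treat it as back-of-the-envelope figures and not a benchmark.

```python
def results_per_cycle(elements_per_op, cycles_between_ops):
    """Peak results per cycle for one unit that starts a packed op
    every `cycles_between_ops` cycles."""
    return elements_per_op / cycles_between_ops


def peak_gflops(clock_ghz, elements_per_op, cycles_between_ops):
    """Peak rate in billions of results per second for that one unit."""
    return clock_ghz * results_per_cycle(elements_per_op, cycles_between_ops)


CLOCK_GHZ = 3.0  # hypothetical clock, chosen only for illustration

# Two doubles handled one after another: one packed op every 2 cycles.
half_width = peak_gflops(CLOCK_GHZ, elements_per_op=2, cycles_between_ops=2)

# A hypothetical full-width unit: one packed op every cycle.
full_width = peak_gflops(CLOCK_GHZ, elements_per_op=2, cycles_between_ops=1)

print(half_width, full_width)
```

Under these assumptions a full-width unit would double the peak rate at the same clock, which is the gain wider execution resources would be expected to bring.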

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.