mersenneforum.org

Dresdenboy 2006-03-10 12:19

Upcoming Prime95 monsters (processors)
 
As far as processors go, this week was definitely Conroe's week. This CPU, due later this year, should boost Prime95 performance per clock thanks to full-width 128-bit SSE execution with a throughput of one instruction per cycle and a bigger, better cache subsystem.

It looks like AMD's next core with an improved FPU will not arrive before 2007.

For a start, here is a nice article on Realworldtech:
[url]http://www.realworldtech.com/page.cfm?ArticleID=RWT030906143144&p=1[/url]

dsouza123 2006-03-10 15:53

Will Merom/Conroe/Woodcrest (mobile/desktop/server)
have the extra SSE2 registers like the Athlon64/Opteron?

(Are the extra SSE2 registers (AMD) available in 64-bit mode only, or also under a 32-bit OS?)

Is the 128-bit SSE execution also for SSE2?

Is it only multiple data, i.e. 2 quad words (64-bit data) or 4 dwords (32-bit data) etc.,
or is there a 128-bit data type?

What are the L2 cache sizes?

Dresdenboy 2006-03-10 16:24

[QUOTE=dsouza123]Will Merom/Conroe/Woodcrest (mobile/desktop/server)
have the extra SSE2 registers like the Athlon64/Opteron?[/QUOTE]
Yes, they include the x64 stuff.

[QUOTE=dsouza123](Are the extra SSE2 registers (AMD) available in 64-bit mode only, or also under a 32-bit OS?)[/QUOTE]I assume Intel's implementation won't be an exception here.

[QUOTE=dsouza123]Is the 128-bit SSE execution also for SSE2?[/QUOTE]It applies to all of the SSEn instruction sets. Otherwise they would have wasted resources.

[QUOTE=dsouza123]Is it only multiple data, i.e. 2 quad words (64-bit data) or 4 dwords (32-bit data) etc.,
or is there a 128-bit data type?[/QUOTE]Maybe in SSE4. For now it will just be compatible with SSEn, with n up to 3, as these extensions are implemented on existing architectures.
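To illustrate the "multiple data" point: SSE2 treats a 128-bit XMM register as a set of packed lanes (for example 2 doubles, 4 single-precision floats, 4 dwords or 2 quad words) rather than as one 128-bit number. The sketch below is only an analogy in Python using the standard `struct` module, not SSE code; the variable names are made up for the example.

```python
import struct

# Pack two 64-bit doubles into 16 bytes, the size of one XMM register.
register = struct.pack("<2d", 1.5, -2.0)
print(len(register) * 8, "bits")

# The same 16 bytes can be reinterpreted as different packed lane layouts.
as_doubles = struct.unpack("<2d", register)  # 2 x 64-bit floating point
as_floats = struct.unpack("<4f", register)   # 4 x 32-bit floating point
as_dwords = struct.unpack("<4I", register)   # 4 x 32-bit unsigned integers
as_qwords = struct.unpack("<2Q", register)   # 2 x 64-bit unsigned integers

print("2 doubles:", as_doubles)
print("4 floats: ", as_floats)
print("4 dwords: ", as_dwords)
print("2 qwords: ", as_qwords)
```

Only the first view gives back the values that were stored; the others show the same bits split into different lanes, which is what the choice of packed instruction decides on real hardware.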

[QUOTE=dsouza123]What are the L2 cache sizes?[/QUOTE]Conroe has a shared 4 MB L2 cache. If a task on core 1 needs more cache than the task on core 2, the first task will be able to use a larger share of the L2 cache.

Also, the L1-to-L1 connections between the cores are better, and the 64-bit implementation will surely be better than on Prescott. This could also mean faster 64-bit TF (trial factoring) code.

nngs 2006-03-10 18:01

[QUOTE=Dresdenboy]As far as processors go, this week was definitely Conroe's week. This CPU, due later this year, should boost Prime95 performance per clock thanks to full-width 128-bit SSE execution with a throughput of one instruction per cycle and a bigger, better cache subsystem.

It looks like AMD's next core with an improved FPU will not arrive before 2007.

For a start, here is a nice article on Realworldtech:
[url]http://www.realworldtech.com/page.cfm?ArticleID=RWT030906143144&p=1[/url][/QUOTE]

Quoted from the article
[QUOTE]...However, the bottom line is that we expect the Core microarchitecture to provide a 20-40% performance boost over the prior generation products, and more in certain cases. At the same time, [COLOR="Red"]power consumption will drop dramatically for the desktop and server devices, in the range of 30-40% and possibly more[/COLOR]. As a result, the performance/watt will improve substantially for Intel...[/QUOTE]

Very attractive to GIMPS farmers :w00t:
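As a rough back-of-the-envelope check on the quoted figures, here is a small sketch of what they imply for performance per watt. It assumes the article's performance and power ranges can simply be combined at their endpoints, which the article does not state, and the figures are for general workloads rather than Prime95. The function name is made up for the example.

```python
def perf_per_watt_gain(perf_gain, power_drop):
    """Ratio of new to old performance per watt.

    perf_gain:  fractional performance increase (0.20 means +20%)
    power_drop: fractional power reduction (0.30 means -30%)
    """
    return (1.0 + perf_gain) / (1.0 - power_drop)

# Endpoints of the ranges quoted from the article.
low = perf_per_watt_gain(0.20, 0.30)
high = perf_per_watt_gain(0.40, 0.40)
print(f"performance/watt: {low:.2f}x to {high:.2f}x")
```

Under those assumptions the result is roughly 1.7x to 2.3x the performance per watt of the prior generation.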

Prime95 2006-03-10 20:33

[QUOTE=Dresdenboy]For a start a nice article on Realworldtech:
[url]http://www.realworldtech.com/page.cfm?ArticleID=RWT030906143144&p=1[/url][/QUOTE]

A nice article. While the full 128-bit SSEn implementation with FADD and FMUL on separate ports looks very, very promising, this will likely shift the GIMPS bottleneck to another part of the CPU. For example, if the latency on add/mul is high, then the bottleneck will become "register pressure" (not enough SSE2 registers to schedule independent floating point operations). If the add/mul latency is reasonable, then the bottleneck will move to how fast data can be stored and loaded -- L1 and L2 cache latency & bandwidth may be the bottleneck.

In any event, it will be interesting to read more and get some benchmarks in the coming months!
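The register pressure argument above can be put into numbers with a deliberately crude model: to keep a pipelined unit busy, roughly latency times throughput independent results have to be in flight, and each one needs a register to land in. The latencies below are hypothetical placeholders, not measured Conroe figures, and the model ignores register renaming, out-of-order execution and the registers needed for inputs and constants. The 8 and 16 XMM register counts are the architectural limits in 32-bit and 64-bit mode.

```python
import math

# Architectural XMM registers visible to the programmer.
XMM_REGISTERS = {"32-bit mode": 8, "64-bit mode": 16}

def results_in_flight(units):
    """Independent results needed to keep every unit busy.

    units: list of (latency_cycles, ops_per_cycle) pairs, one per execution unit.
    """
    return sum(math.ceil(latency * per_cycle) for latency, per_cycle in units)

# Hypothetical (latency, ops per cycle) for separate add and mul ports.
assumed_units = [(3, 1.0), (5, 1.0)]
needed = results_in_flight(assumed_units)

for mode, registers in XMM_REGISTERS.items():
    spare = registers - needed
    print(f"{mode}: {registers} registers, {needed} results in flight, {spare} spare")
```

With these assumed numbers all 8 registers of 32-bit mode are tied up by results in flight, while 64-bit mode leaves 8 spare. Higher latencies make the 32-bit case worse, which is the scenario described in the post.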

Jeff Gilchrist 2006-03-10 20:52

Yes, I saw them mention that SSE instructions would now take 1 clock cycle instead of the 2 cycles on average before. I figured that should give GIMPS a nice speed boost assuming that another bottleneck didn't get hit really fast.

It will be interesting to see.

ColdFury 2006-03-10 21:00

[QUOTE]For example, if the latency on add/mul is high, then the bottleneck will become "register pressure" (not enough SSE2 registers to schedule independent floating point operations).[/QUOTE]

All the more reason to write an AMD64/EM64T version!

Prime95 2006-03-10 22:28

[QUOTE=Jeff Gilchrist]I saw them mention that SSE instructions would now take 1 clock cycle instead of the 2 cycles on average before. [/QUOTE]

Just to clarify, the "1 clock cycle" figure is for maximum throughput in a pipelined architecture. Latency refers to how long a single add or mul operation takes. The doubling in maximum throughput is definitely good news but won't result in a doubling of Prime95 speed.

BTW, AMD has typically been a clock or two faster in latency, with the AMD64 and P4 equal in throughput.
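The throughput versus latency distinction can be sketched with a toy cycle count. The 4-cycle latency is an assumed placeholder, not a figure for any real CPU, and the function names are made up for the example. A chain of dependent operations pays the full latency every time, while independent operations can be issued back to back.

```python
def dependent_chain_cycles(n_ops, latency):
    """Cycles for n_ops where each operation needs the previous result."""
    return n_ops * latency

def pipelined_cycles(n_ops, latency, issue_interval):
    """Cycles for n_ops independent operations, one issued every issue_interval cycles."""
    if n_ops == 0:
        return 0
    # The first result arrives after the full latency; the rest follow at the issue rate.
    return latency + (n_ops - 1) * issue_interval

LATENCY = 4  # assumed latency in cycles, for illustration only

for issue_interval in (2, 1):
    dependent = dependent_chain_cycles(100, LATENCY)
    independent = pipelined_cycles(100, LATENCY, issue_interval)
    print(f"issue every {issue_interval} cycle(s): "
          f"dependent = {dependent} cycles, independent = {independent} cycles")
```

In this model, halving the issue interval nearly halves the time for independent work but leaves the dependent chain unchanged, so real code, which mixes both, gains less than a factor of two.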

Prime95 2006-03-10 22:34

[QUOTE=ColdFury]All the more reason to write an AMD64/EM64T version![/QUOTE]

Uh, raise your hand if you are running 64-bit Windows.... I don't see many hands raised :rant:

ColdFury 2006-03-10 23:06

[QUOTE=Prime95]Uh, raise your hand if you are running 64-bit Windows.... I don't see many hands raised :rant:[/QUOTE]

True, but lucky people could at least run mprime on Linux. :flex:

Prime95 2006-03-11 00:10

[QUOTE=ColdFury]True, but lucky people could at least run mprime on Linux. :flex:[/QUOTE]

Not until binutils is upgraded to support 64-bit COFF object files.

