mersenneforum.org  

2009-02-12, 07:17   #12
ldesnogu
Jan 2008
France
226₁₆ Posts

Quote:
Originally Posted by ewmayer
The fused floating-point mul/add in AMD's SSE5 could make a big difference, but given the sorry state of AMD's business these days I'm not holding my breath.

The 256-bit-wide Intel SIMD stuff that's coming in a few years ... that could be big, especially if they back it up with a quad-pumped-double-precision-capable chip.

I think there's one upcoming chip that will be more interesting: Larrabee, which (according to Intel's claims) will support IEEE SP and DP with 512-bit-wide registers and FMA. That should please the crowd that claims GPUs can help GIMPS.

2009-02-12, 14:52   #13
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England
1100100011000₂ Posts

Quote:
Originally Posted by Robert Holmes
For the record, the 256-bit wide instruction set (AVX) is the same extension that brings PCLMULQDQ and AES* instructions.

This is not the case. PCLMULQDQ and AES are coming on Westmere (32nm implementation of the microarchitecture currently shipping as Core i7) at the end of this year; AVX comes on Sandy Bridge (the next microarchitecture) at the end of next year.

2009-02-12, 14:54   #14
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England
2³×11×73 Posts

Quote:
Originally Posted by akruppa
Wouldn't PCLMULQDQ be handy for Matrix-Vector products in BL/BW, at least for the dense part of the matrix?

I don't see how; PCLMULQDQ is a carry-less multiply, which XORs together shifted copies of a single operand:

for i from 0 to 63
    if ((left>>i)&1) out ^= (right<<i);

but matrix-vector products want to XOR together a different, unshifted row for each set bit:

for i from 0 to 63
    if ((left>>i)&1) out ^= right[i];
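
To make the contrast concrete, here is a minimal Python sketch of the two loops. The function names `clmul64` and `gf2_matvec` are made up for illustration, and arbitrary-precision Python integers stand in for the 64-bit inputs and 128-bit output of the real instruction; this is not a model of how any BL/BW code is actually written.

```python
def clmul64(left, right):
    """Carry-less product of two 64-bit values (result fits in 127 bits)."""
    out = 0
    for i in range(64):
        if (left >> i) & 1:
            out ^= right << i   # the same operand each time, shifted by i
    return out


def gf2_matvec(left, rows):
    """XOR together the rows selected by the set bits of `left`."""
    out = 0
    for i in range(64):
        if (left >> i) & 1:
            out ^= rows[i]      # a different, unshifted row for each i
    return out


# With every row equal to `right`, the two loops still disagree,
# because the carry-less multiply shifts each copy before XORing.
right = 0b11
print(bin(clmul64(0b101, right)))            # 0b11 ^ 0b1100
print(bin(gf2_matvec(0b101, [right] * 64)))  # 0b11 ^ 0b11
```

The shift by `i` is the whole difference: there is no way to hand the instruction 64 independent rows.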
2009-02-12, 16:19   #15
jasonp
Tribal Bullet
Oct 2004
3543₁₀ Posts

Oops, Tom is right, the positional shift invalidates the idea.

Last fiddled with by jasonp on 2009-02-12 at 16:20
2009-02-12, 17:05   #16
ewmayer
2ω=0
Sep 2002
República de California
19×613 Posts

This is a mere niggling aside, but I find it curious that the PMULLD instruction is described as "Packed signed multiplication", since (at least in two's-complement arithmetic) a lower-half multiply doesn't care whether the inputs are treated as signed or unsigned.
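
A quick way to check this is to emulate a single 32-bit lane in Python. The helper names below are invented for the illustration and only mimic the arithmetic; they are not the instruction itself.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Reinterpret the low 32 bits of x as a two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mullo_unsigned(a, b):
    """Low 32 bits of the product, inputs read as unsigned."""
    return ((a & MASK32) * (b & MASK32)) & MASK32


def mullo_signed(a, b):
    """Low 32 bits of the product, inputs read as signed."""
    return (to_signed32(a) * to_signed32(b)) & MASK32


def mulhi_unsigned(a, b):
    """High 32 bits of the 64-bit product, inputs read as unsigned."""
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32


def mulhi_signed(a, b):
    """High 32 bits of the 64-bit product, inputs read as signed."""
    return ((to_signed32(a) * to_signed32(b)) >> 32) & MASK32


a, b = 0xFFFFFFFF, 0x80000001   # -1 and -2147483647 when read as signed
print(hex(mullo_unsigned(a, b)), hex(mullo_signed(a, b)))   # low halves agree
print(hex(mulhi_unsigned(a, a)), hex(mulhi_signed(a, a)))   # high halves differ
```

Only the high half depends on the interpretation, which is why a separate signed and unsigned form makes sense for high-half or widening multiplies but not for the low half.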
2009-02-13, 20:40   #17
TheJudger
"Oliver"
Mar 2005
Germany
10001010111₂ Posts

Quote:
Originally Posted by ewmayer
The 256-bit-wide Intel SIMD stuff that's coming in a few years ... that could be big, especially if they back it up with a quad-pumped-double-precision-capable chip.

What about a "two real*16 ops at the same time on the 256-bit registers" chip?

IIRC Intel couldn't do real*16 on the 8087(?) but wanted something better than real*8, so they built real*10...

2009-02-14, 01:50   #18
retina
Undefined
"The unspeakable one"
Jun 2006
My evil lair
2²·3²·173 Posts

Quote:
Originally Posted by TheJudger
What about a "two real*16 ops at the same time on the 256-bit registers" chip?

IIRC Intel couldn't do real*16 on the 8087(?) but wanted something better than real*8, so they built real*10...

Oh, do you mean quad precision? Mainstream FPU workloads do not seem to require QP, so the CPU makers don't build it: there is not enough demand.

Besides, I think it would use a lot of silicon area with such a large mantissa. Perhaps as much as four DP units to make one QP unit?

2009-02-14, 22:33   #19
ewmayer
2ω=0
Sep 2002
República de California
10110101111111₂ Posts

Quote:
Originally Posted by TheJudger
What about a "two real*16 ops at the same time on the 256-bit registers" chip?

IIRC Intel couldn't do real*16 on the 8087(?) but wanted something better than real*8, so they built real*10...

If the hardware can average half as many quad-float ops per cycle as double-float, that would be a clear win, since the extra precision (60 bits more in the significand: 113 versus 53) means the FFT length needed for a given big-mul input would be cut by a factor of more than 2, closer to 2.5x.

But as retina notes, not enough demand to justify the silicon cost.
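
For a rough feel of where a factor of about 2.5 can come from, here is a back-of-the-envelope Python sketch. The rule of thumb in it (each product term needs twice the word size, accumulated roundoff costs about half of log2 of the FFT length, plus an arbitrary 5-bit safety margin) is an assumption made for this illustration, not something stated in the thread or taken from any particular program; the function name and the chosen transform length are likewise hypothetical.

```python
import math


def bits_per_word(significand_bits, fft_len, safety_bits=5.0):
    """Rule-of-thumb estimate of the input bits each FFT word can carry.

    Assumed model (not from the thread): a product term needs 2*b bits,
    accumulated roundoff costs about 0.5*log2(fft_len) bits, and
    safety_bits is an arbitrary margin.
    """
    return (significand_bits - safety_bits - 0.5 * math.log2(fft_len)) / 2.0


DOUBLE_BITS = 53    # IEEE binary64 significand, including the implicit bit
QUAD_BITS = 113     # IEEE binary128 significand, including the implicit bit

fft_len = 2 ** 21   # illustrative transform length
dp = bits_per_word(DOUBLE_BITS, fft_len)
qp = bits_per_word(QUAD_BITS, fft_len)
ratio = qp / dp
print(f"double: {dp:.2f} bits/word, quad: {qp:.2f} bits/word, ratio: {ratio:.2f}")
```

Under these assumptions the ratio comes out at about 2.6. The fixed overhead (roundoff growth plus margin) eats a much larger share of 53 bits than of 113, which is why the gain exceeds the naive 113/53.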
2009-05-11, 23:35   #20
Robert Holmes
Oct 2007
2·53 Posts

It appears AMD has also jumped on the AVX bandwagon, ditching the initial SSE5 proposal; now it's AVX + XOP.

Highlights include 4-operand FMA (as opposed to 3-operand in Intel's AVX), integer multiply-and-add, variable- and fixed-count rotation, and decent integer compare.

These new extensions look nice for cryptographic and number-theoretic purposes.

Link:
http://forums.amd.com/devblog/blogpo...hreadid=112934

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.