mersenneforum.org

PCLMULQDQ SSE 4.2 enhances speed? (https://www.mersenneforum.org/showthread.php?t=11486)

ldesnogu 2009-02-12 07:17

[quote=ewmayer;162524]The fused floating-point mul/add in AMD's SSE5 could make a big difference, but given the sorry state of AMD's business these days I'm not holding my breath.

The 256-bit-wide Intel SIMD stuff that's coming in a few years ... that could be big, especially if they back it up with a quad-pumped-double-precision-capable chip.[/quote]
I think there's one upcoming chip that will be more interesting: Larrabee, which (according to Intel's claims) will support IEEE SP and DP with 512-bit-wide registers and FMA. That should please the crowd that claims GPUs can help GIMPS :smile:

fivemack 2009-02-12 14:52

[QUOTE=Robert Holmes;162538]For the record, the 256-bit wide instruction set (AVX) is the same extension that brings PCLMULQDQ and AES* instructions.[/QUOTE]

This is not the case. PCLMULQDQ and AES are coming on Westmere (32nm implementation of the microarchitecture currently shipping as Core i7) at the end of this year; AVX comes on Sandy Bridge (the next microarchitecture) at the end of next year.

fivemack 2009-02-12 14:54

[QUOTE=akruppa;162521]Wouldn't PCLMULQDQ be handy for Matrix-Vector products in BL/BW, at least for the dense part of the matrix?
[/QUOTE]

I don't see how; PCLMULQDQ is

for i from 0 to 63
    if ((left>>i)&1) out ^= (right<<i);

but matrix-vector products want

for i from 0 to 63
    if ((left>>i)&1) out ^= right[i];
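
A minimal Python sketch of the two loops above, to make the difference concrete. The function names `clmul64` and `gf2_row_combine` are made up for illustration, and plain Python integers stand in for the 64-bit and 128-bit registers; this is not how anyone would implement either operation for speed.

```python
def clmul64(left, right):
    """Carry-less 64x64 -> 128-bit product (the operation PCLMULQDQ performs)."""
    out = 0
    for i in range(64):
        if (left >> i) & 1:
            out ^= right << i  # always the same word, shifted by the bit position
    return out


def gf2_row_combine(left, rows):
    """XOR together rows[i] for every set bit i of left (a GF(2) vector-matrix product)."""
    out = 0
    for i in range(64):
        if (left >> i) & 1:
            out ^= rows[i]  # an independent word for each bit position
    return out


# The two agree only when the matrix rows happen to be shifted copies of one word.
right = 0b101
shifted_rows = [right << i for i in range(64)]
same = clmul64(0b11, right) == gf2_row_combine(0b11, shifted_rows)
```

For a general matrix the rows are unrelated words, so the carry-less multiply has nothing to offer: it can only combine shifted copies of a single operand.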

jasonp 2009-02-12 16:19

Oops, Tom is right: the positional shift invalidates the idea.

ewmayer 2009-02-12 17:05

This is a mere niggling aside, but I find it curious that the PMULLD instruction is described as "Packed signed multiplication", since (at least in two's-complement arithmetic) a lower-half multiply doesn't care whether the inputs are treated as signed or unsigned.
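
A quick way to check this, as a minimal Python sketch of one 32-bit lane (the helper names `to_signed32`, `mullo32` and `signedness_irrelevant` are invented for this example). Reinterpreting a 32-bit pattern as signed changes its value by a multiple of 2^32, so the product changes by a multiple of 2^32 as well and the low 32 bits are untouched.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(bits):
    """Reinterpret a 32-bit pattern as a two's-complement signed integer."""
    bits &= MASK32
    return bits - (1 << 32) if bits & 0x80000000 else bits


def mullo32(a, b):
    """Low 32 bits of the product, i.e. what a lower-half multiply keeps per lane."""
    return (a * b) & MASK32


def signedness_irrelevant(a_bits, b_bits):
    """True if the low half is the same for the signed and unsigned readings."""
    unsigned_result = mullo32(a_bits & MASK32, b_bits & MASK32)
    signed_result = mullo32(to_signed32(a_bits), to_signed32(b_bits))
    return unsigned_result == signed_result
```

For example, `signedness_irrelevant(0xFFFFFFFF, 0x80000001)` compares 4294967295 * 2147483649 against (-1) * (-2147483647); the full products differ, but their low 32 bits match.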

TheJudger 2009-02-13 20:40

[QUOTE=ewmayer;162524]The 256-bit-wide Intel SIMD stuff that's coming in a few years ... that could be big, especially if they back it up with a quad-pumped-double-precision-capable chip.[/QUOTE]

What about a "two real*16 ops at the same time on the 256-bit registers" chip?

IIRC Intel couldn't do real*16 on the 8087(?) but wanted something better than real*8, so they built real*10...

retina 2009-02-14 01:50

[QUOTE=TheJudger;162728]What about a "two real*16 ops at the same time on the 256-bit registers" chip?

IIRC Intel couldn't do real*16 on the 8087(?) but wanted something better than real*8, so they built real*10...[/QUOTE]

Oh, do you mean quad precision? Mainstream FPU use does not seem to require QP, so the CPU makers don't build it: there is not enough demand.

Besides, I think it would use a lot of silicon space with such a large mantissa. Perhaps as much as four DP units to make one QP unit?

ewmayer 2009-02-14 22:33

[QUOTE=TheJudger;162728]What about a "two real*16 ops at the same time on the 256-bit registers" chip?

IIRC Intel couldn't do real*16 on the 8087(?) but wanted something better than real*8, so they built real*10...[/QUOTE]

If the hardware can average half as many quad-float ops per cycle as double-float, that would be a clear win, since the extra precision (nearly 60 bits more in the significand) means the FFT length needed for a given big-mul input would be cut by a factor of more than 2, closer to 2.5x.

But as retina notes, not enough demand to justify the silicon cost.
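
A rough back-of-envelope version of the 2.5x estimate above, as a Python sketch. The significand widths (53 bits for IEEE double, 113 for IEEE quad) are standard; everything else is an assumption made for illustration: that a double-precision FFT multiply carries roughly 18 to 20 bits per input word at GIMPS-sized lengths, and that each extra significand bit is split evenly between the two inputs of the convolution, so 60 extra bits buy about 30 more bits per word. The function name `fft_length_ratio` is hypothetical.

```python
DOUBLE_SIGNIFICAND_BITS = 53   # IEEE 754 binary64
QUAD_SIGNIFICAND_BITS = 113    # IEEE 754 binary128


def fft_length_ratio(bits_per_word_double):
    """Estimated factor by which the FFT length shrinks when moving to quad floats.

    Assumes the roundoff headroom is otherwise unchanged, so the extra
    significand bits are shared equally by the two words being multiplied.
    """
    extra_bits = QUAD_SIGNIFICAND_BITS - DOUBLE_SIGNIFICAND_BITS  # 60
    bits_per_word_quad = bits_per_word_double + extra_bits / 2
    return bits_per_word_quad / bits_per_word_double


# Assumed range of bits per word for double precision (an illustration, not a measurement).
ratios = {b: fft_length_ratio(b) for b in (18, 19, 20)}
```

Under these assumptions the ratio comes out between 2.5 and about 2.7, consistent with "more than 2, closer to 2.5x". The sketch ignores the fact that a shorter FFT also accumulates slightly less roundoff error, which would push the ratio up a little further.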

Robert Holmes 2009-05-11 23:35

It appears AMD has also jumped on the AVX bandwagon, ditching the initial SSE5 proposal. Now it's AVX + XOP.

Highlights include 4-operand FMA (as opposed to 3-operand in Intel's AVX), integer multiply-and-add, variable- and fixed-count rotation, and decent integer compare.

These new extensions look nice for cryptographic and number-theoretic purposes.

Link:
[url]http://forums.amd.com/devblog/blogpost.cfm?catid=208&threadid=112934[/url]

