mersenneforum.org  

2009-02-11, 18:28   #1
Joshua2
Sep 2004
13·41 Posts
PCLMULQDQ SSE 4.2 enhances speed?

Intel just announced a new carryless multiply instruction and AES enhancements. It sounds like these are for multiplying large numbers, so it seems they have the potential for great improvements to GIMPS and mathematics projects in general. See http://anandtech.com/cpuchipsets/int...spx?i=3513&p=7

2009-02-11, 18:41   #2
jasonp
Tribal Bullet
Oct 2004
DD7₁₆ Posts

In this context, "carryless multiply" means multiplication of polynomials over the finite field with two elements, so basically an integer multiply where partial products are XOR'ed together rather than added. Elliptic curve crypto becomes enormously faster with this operation, but large-number arithmetic doesn't benefit at all.
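
To make the XOR-versus-add distinction concrete, here is a minimal Python sketch of a carry-less multiply. The name `clmul` is only an illustrative choice, and the code models the arithmetic on arbitrary non-negative integers, not the instruction's actual register layout or operand widths.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two non-negative integers.

    Works like schoolbook binary multiplication, except that the
    shifted partial products are combined with XOR instead of ADD.
    """
    result = 0
    shift = 0
    while b:
        if b & 1:
            # Partial product for this bit of b, folded in without carries.
            result ^= a << shift
        b >>= 1
        shift += 1
    return result


if __name__ == "__main__":
    print(clmul(3, 3), 3 * 3)
```

For example, `clmul(3, 3)` gives 5 where the ordinary product is 9, because the overlapping partial-product bits cancel instead of carrying.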

2009-02-11, 20:11   #3
ewmayer
2ω=0
Sep 2002
República de California
19·613 Posts

The one SSE4.2 op which might be useful to big-int arithmetic is PCMPGTQ - funny how they decided that 64-bit-int test-equality (PCMPEQQ) was worth having in 4.1 but only added the greater-than check (which alas comes only in signed form) later.
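
A common workaround for a signed-only compare (not something described in this thread, so treat it as an assumption) is to flip the top bit of both operands first, which turns the signed ordering into the unsigned one. A sketch in Python modelling a single 64-bit lane; the helper names are made up for illustration:

```python
MASK64 = (1 << 64) - 1
SIGN_BIT = 1 << 63


def to_signed64(u: int) -> int:
    """Reinterpret a 64-bit pattern as a two's-complement signed value."""
    u &= MASK64
    return u - (1 << 64) if u & SIGN_BIT else u


def signed_gt(a: int, b: int) -> bool:
    """Model of one lane of a signed 64-bit greater-than compare."""
    return to_signed64(a) > to_signed64(b)


def unsigned_gt(a: int, b: int) -> bool:
    """Unsigned compare built from the signed one by flipping the top bits."""
    return signed_gt(a ^ SIGN_BIT, b ^ SIGN_BIT)


if __name__ == "__main__":
    big, small = 1 << 63, 1
    print(signed_gt(big, small), unsigned_gt(big, small))
```

In multiword addition this kind of unsigned compare is one way to detect a carry out of a lane (the wrapped sum is smaller than an input), which is presumably why the missing unsigned form matters.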

The other ops in 4.1 that could be useful are the ROUND ops, as well as PMULLD (4-way 32x32-bit low-half-of-product mul). Too bad there is no 4-way upper-half analog.
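
As a rough illustration of the low-half versus upper-half point, the Python sketch below models four 32-bit lanes. `mul_lo32` keeps what a low-half multiply keeps; `mul_hi32` is the hypothetical unsigned upper-half counterpart that the post wishes existed. Both names are invented here.

```python
MASK32 = (1 << 32) - 1


def mul_lo32(a, b):
    """Per-lane low 32 bits of each 32x32-bit product."""
    return [(x * y) & MASK32 for x, y in zip(a, b)]


def mul_hi32(a, b):
    """Per-lane high 32 bits of each unsigned 32x32-bit product (hypothetical op)."""
    return [((x * y) >> 32) & MASK32 for x, y in zip(a, b)]


if __name__ == "__main__":
    a = [0xFFFFFFFF, 2, 0x10000, 3]
    b = [0xFFFFFFFF, 3, 0x10000, 0]
    print(mul_lo32(a, b))
    print(mul_hi32(a, b))
```

Big-int code needs both halves, since the high word of each product has to be carried into the next limb; with only the low half, inputs must be kept small enough that nothing is lost.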

2009-02-11, 21:17   #4
Joshua2
Sep 2004
13·41 Posts

What is the highest optimization in GIMPS or other mathematics software? I suppose ones that you compile yourself can use all the optimizations...

2009-02-11, 21:22   #5
ewmayer
2ω=0
Sep 2002
República de California
10110101111111₂ Posts

Quote:
Originally Posted by Joshua2
What is the highest optimization in GIMPS or other mathematics software? I suppose ones that you compile yourself can use all the optimizations...

I'm not sure I understand the question ... you mean compile-time optimization? Or within the code itself?

2009-02-11, 22:35   #6
ldesnogu
Jan 2008
France
2·52·11 Posts

Quote:
Originally Posted by Joshua2
What is the highest optimization in GIMPS or other mathematics software? I suppose ones that you compile yourself can use all the optimizations...

If you mean which instructions are used that have an impact on performance, then it's independent of the compiler, since the time-critical code is written in assembly language.

2009-02-11, 22:38   #7
akruppa
"Nancy"
Aug 2002
Alexandria
2467₁₀ Posts

Wouldn't PCLMULQDQ be handy for Matrix-Vector products in BL/BW, at least for the dense part of the matrix?

Alex

2009-02-11, 23:06   #8
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England
23×11×73 Posts

I guess Joshua's question is 'which instruction set does Prime95 use', to which I think the answer is SSE2 because nothing subsequently has helped all that much at double precision.

2009-02-11, 23:24   #9
ewmayer
2ω=0
Sep 2002
República de California
19·613 Posts

Quote:
Originally Posted by fivemack
I guess Joshua's question is 'which instruction set does Prime95 use', to which I think the answer is SSE2 because nothing subsequently has helped all that much at double precision.

Yeah, last time I talked to George about SSE-related stuff, he only mentioned the SSE4 ROUND instructions as being of interest - but I can't see that knocking more than 1-2% off an LL test timing.
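
For context on why a rounding instruction is relevant at all: FFT-based multiplication has to round its floating-point outputs back to integers. One widely used trick on hardware without a rounding instruction is to add and then subtract a large constant. Whether Prime95 does exactly this is an assumption here, not something stated in the thread. A Python sketch (Python floats are IEEE doubles; the default round-to-nearest-even mode is assumed, and the name `round_magic` is invented):

```python
# 1.5 * 2**52: at this magnitude a double has no fractional bits left,
# so adding it forces the hardware to round x to an integer.
MAGIC = 1.5 * 2.0**52


def round_magic(x: float) -> float:
    """Round x to the nearest integer (ties to even); valid for |x| < 2**51."""
    return (x + MAGIC) - MAGIC


if __name__ == "__main__":
    for x in (2.4, 2.6, 0.5, -1.5):
        print(x, round_magic(x))
```

The trick costs two dependent additions per value, so a single rounding instruction can only save a small amount of work, which fits the modest gain estimated above.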

The fused floating-point mul/add in AMD's SSE5 could make a big difference, but given the sorry state of AMD's business these days I'm not holding my breath.

The 256-bit-wide Intel SIMD stuff that's coming in a few years ... that could be big, especially if they back it up with a quad-pumped-double-precision-capable chip.

2009-02-12, 02:26   #10
Robert Holmes
Oct 2007
2·53 Posts

For the record, the 256-bit wide instruction set (AVX) is the same extension that brings PCLMULQDQ and AES* instructions.

2009-02-12, 03:20   #11
jasonp
Tribal Bullet
Oct 2004
3×1,181 Posts

Quote:
Originally Posted by akruppa
Wouldn't PCLMULQDQ be handy for Matrix-Vector products in BL/BW, at least for the dense part of the matrix?

PCLMULQDQ could drastically speed up that part, but IIRC dense multiplies only take about 10-15% of the time in a full sparse matrix multiply. That could be changed by rearranging matrix entries and packing some of the sparse part of the matrix into dense blocks too, but that's really tricky. Floating-point codes already do that, to take advantage of vendor-optimized level 3 BLAS.
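
For readers unfamiliar with the operation being discussed, here is a rough Python sketch of a GF(2) matrix times block-vector product of the kind used in block Lanczos style solvers. It assumes the layout where each vector entry is a machine word holding one bit from each of 64 vectors, so a row's result is the XOR of the words at its nonzero columns. The function names and data layout are illustrative only, and the sketch does not show how PCLMULQDQ itself would be applied.

```python
def matvec_sparse(rows, x):
    """Sparse part: rows[i] lists the column indices of the 1-entries in row i."""
    y = []
    for cols in rows:
        acc = 0
        for j in cols:
            acc ^= x[j]  # addition in GF(2) is XOR, done on 64 vectors at once
        y.append(acc)
    return y


def matvec_dense(masks, x):
    """Dense part: masks[i] is an int whose bit j is set when entry (i, j) is 1."""
    y = []
    for mask in masks:
        acc = 0
        j = 0
        while mask:
            if mask & 1:
                acc ^= x[j]
            mask >>= 1
            j += 1
        y.append(acc)
    return y


if __name__ == "__main__":
    x = [0b1010, 0b0110, 0b0001]
    print(matvec_sparse([[0, 2], [1], [0, 1, 2]], x))
    print(matvec_dense([0b101, 0b010, 0b111], x))
```

Both functions compute the same product; the dense form stores one bit per matrix entry, which only pays off where rows are heavily populated.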
All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.