#1
Sep 2004
215₁₆ Posts
Intel just announced a new carryless multiply and AES enhancements. It sounds like these are for multiplying large numbers. It seems that this has a potential for great improvements to GIMPS and mathematics projects in general. See http://anandtech.com/cpuchipsets/int...spx?i=3513&p=7
#2
Tribal Bullet
Oct 2004
3·1,181 Posts
In this context, "carryless multiply" means multiplication of polynomials over the finite field with two elements: basically an integer multiply where the partial products are XOR'ed together rather than added. Elliptic curve crypto becomes enormously faster with this operation, but large-number arithmetic doesn't benefit at all.
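A scalar sketch of what such an operation computes (a hypothetical helper, not the actual PCLMULQDQ instruction, which works on XMM-register operands):

```c
#include <stdint.h>

/* Carryless multiply: same shift-and-add structure as ordinary binary
 * long multiplication, but partial products are combined with XOR, so
 * no carries propagate between bit positions. Equivalently, this is
 * multiplication of polynomials over GF(2). Returns the low 64 bits. */
static uint64_t clmul64(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    while (b) {
        if (b & 1)
            r ^= a;          /* XOR in the shifted partial product */
        a <<= 1;
        b >>= 1;
    }
    return r;
}
```

For example, clmul64(3, 3) gives 5, whereas ordinary 3*3 = 9; the difference is exactly the carry that XOR discards.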
#3
∂²ω=0
Sep 2002
República de California
19·613 Posts
The one SSE4.2 op which might be useful for big-int arithmetic is PCMPGTQ. Funny how they decided that the 64-bit-int test-for-equality (PCMPEQQ) was worth having in 4.1, but only added the greater-than check (which, alas, comes only in signed form) later.
The other 4.1 ops that could be useful are the ROUND ops, as well as PMULLD (4-way 32×32-bit low-half-of-product multiply). Too bad there is no 4-way upper-half analog.
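Since the compare is signed-only while big-int limbs are naturally unsigned, the usual workaround is to bias both operands by 2^63 before comparing. A scalar model of that trick, per 64-bit lane (hypothetical helper names, not real intrinsics):

```c
#include <stdint.h>

/* Model of one 64-bit lane of a signed greater-than compare, as
 * PCMPGTQ provides it: result is an all-ones mask or zero. */
static uint64_t gt_signed_lane(int64_t a, int64_t b)
{
    return (a > b) ? ~(uint64_t)0 : 0;
}

/* Unsigned greater-than built from the signed compare: XOR-ing both
 * operands with 2^63 flips the sign bit, which maps unsigned order
 * onto signed order, so the signed result equals the unsigned one. */
static uint64_t gt_unsigned_via_signed(uint64_t a, uint64_t b)
{
    const uint64_t bias = (uint64_t)1 << 63;
    return gt_signed_lane((int64_t)(a ^ bias), (int64_t)(b ^ bias));
}
```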
#4
Sep 2004
13×41 Posts
What is the highest level of optimization used in GIMPS or other mathematics software? I suppose programs you compile yourself can use all the available optimizations...
#5
∂²ω=0
Sep 2002
República de California
19·613 Posts
#6
Jan 2008
France
2·5²·11 Posts
If you mean which instructions are used that have an impact on performance, then it's independent of the compiler, since the time-critical code is written in assembly language.
#7
"Nancy"
Aug 2002
Alexandria
2467₁₀ Posts
Wouldn't PCLMULQDQ be handy for matrix-vector products in BL/BW, at least for the dense part of the matrix?
Alex
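For reference, the kernel in question is a matrix-vector product over GF(2), where multiplication is AND and addition is XOR. A plain-C sketch of one 64x64 dense bit-block times a 64-bit vector slice (an assumed layout, one matrix row per word, not code from any of the packages discussed):

```c
#include <stdint.h>

/* y = M * x over GF(2): M is a 64x64 bit matrix stored as 64 row
 * words, x is a 64-bit column vector. Bit i of the result is the
 * GF(2) dot product (AND, then parity) of row i with x. */
static uint64_t gf2_matvec64(const uint64_t M[64], uint64_t x)
{
    uint64_t y = 0;
    for (int i = 0; i < 64; i++) {
        uint64_t t = M[i] & x;     /* select the matching entries */
        t ^= t >> 32;              /* fold down to the parity bit  */
        t ^= t >> 16;
        t ^= t >> 8;
        t ^= t >> 4;
        t ^= t >> 2;
        t ^= t >> 1;
        y |= (t & 1) << i;
    }
    return y;
}
```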
#8
(loop (#_fork))
Feb 2006
Cambridge, England
23×11×73 Posts
I guess Joshua's question is 'which instruction set does Prime95 use', to which I think the answer is SSE2 because nothing subsequently has helped all that much at double precision.
#9
∂²ω=0
Sep 2002
República de California
11647₁₀ Posts
Quote:
The fused floating-point mul/add in AMD's SSE5 could make a big difference, but given the sorry state of AMD's business these days I'm not holding my breath. The 256-bit-wide Intel SIMD stuff that's coming in a few years ... that could be big, especially if they back it up with a quad-pumped-double-precision-capable chip.
#10
Oct 2007
2·53 Posts
For the record, the 256-bit-wide instruction set (AVX) is the same extension that brings the PCLMULQDQ and AES* instructions.
#11
Tribal Bullet
Oct 2004
3×1,181 Posts
They could drastically speed up that part, but IIRC dense multiplies only take about 10-15% of the time in a full sparse matrix multiply. That could be changed by rearranging matrix entries and packing some of the sparse part of the matrix into dense blocks too, but that's really tricky. Floating point codes already do that, to take advantage of vendor-optimized level 3 BLAS.