mersenneforum.org  

2010-07-20, 23:04   #1
jasong
"Jason Goatcher"
Mar 2005
DB3₁₆ Posts
New SSE shtuff by Intel (anybody know anything?)

I don't know a lot about CPU instructions, but I know that a lot of SSE stuff is really helpful for prime-finding and for a lot of DC (distributed computing) projects in general.

Does anybody have any opinions about the new SSE instructions Intel will be releasing soon on its CPUs?

Edit: OMG, you can't edit the title of a thread even if you click edit 30 seconds after posting. I want smart comments.

NOOOOOOOOOOOO!!!

Last fiddled with by jasong on 2010-07-20 at 23:07 Reason: I'm an idiot, but only for part of the time each day
2010-08-05, 03:56   #2
Ken_g6
Jan 2005
Caught in a sieve
110001011₂ Posts

There haven't been any really major improvements for sieving since SSE2. Nothing that has come out since then seems worth the effort of supporting the few processors that might have it. (Of course, I'm not doing FP math in the SSE registers; I need the full 80 bits of precision, and I suspect most FFTs do as well.)

The next interesting change will be AVX in 2011: double-sized registers (for doing twice as much at once), and instructions that put their output in a third register.
2010-08-05, 06:48   #3
cheesehead
"Richard B. Woods"
Aug 2002
Wisconsin USA
1111000001100₂ Posts

Quote:
Originally Posted by jasong
Edit: OMG, you can't edit the title of a thread even if you click edit 30 seconds after posting. I want smart comments.
Would you settle for a friendly moderator? Xyzzy changes titles all the time just for fun.
2010-08-05, 14:28   #4
Primeinator
"Kyle"
Feb 2005
Somewhere near M52..
3·5·61 Posts

Quote:
Originally Posted by Ken_g6
The next interesting change will be AVX in 2011: double-sized registers (for doing twice as much at once), and instructions that put their output in a third register.
Would this lead to any significant improvement in the rate at which LL tests run? Pardon my ignorance, but I'm not sure where the current speed bottleneck is (other than thermal limits on clock speed). What other components of the processor would speed up these tests if modified?

Last fiddled with by Primeinator on 2010-08-05 at 14:28
2010-08-05, 16:09   #5
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)
16F8₁₆ Posts

Quote:
Originally Posted by Primeinator
Would this lead to any significant improvement in the rate at which LL tests run? Pardon my ignorance, but I'm not sure where the current speed bottleneck is (other than thermal limits on clock speed). What other components of the processor would speed up these tests if modified?
Considering that this is an extension of SSE2, which seriously speeds up LL tests, I would expect it to do even more, especially if the registers grow beyond 256 bits at some point. That has been mentioned several times, so it might happen.

@Prime95: how soon can we expect Prime95 to use these instructions? Is development before release out of the question?
2010-08-05, 23:57   #6
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
1110101100110₂ Posts

Quote:
Originally Posted by henryzz
Considering that this is an extension to SSE2 which seriously speeds up LL tests ...

@Prime95: how soon can we expect Prime95 to use these instructions? Is development before release out of the question?
AVX with 256-bit registers should double the FPU throughput (although Intel could implement it in a way that there is no increase). The 3-register instruction format reduces register pressure - another decent-sized win (especially on Intel chips which have half the load-to-FPU-register capability of AMD chips). AVX instructions are also more compact, though I doubt that will yield any performance benefit.

Finally, AVX has spec'ed a fused multiply-add instruction that will be very useful in the future. The first Intel chips will not support fused multiply-add. The AMD chips will emulate this instruction.

In short, AVX is a very well-thought-out extension of the x86 architecture. The instruction format is ready to support 512- and 1024-bit registers in the future.

I doubt I'll be able to work on an AVX version before Sandy Bridge comes out in the 4th quarter.
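
To make the register-width and fused multiply-add points above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the lane counts simply divide the register width by the 64 bits of a double, and `fma_exact` is a hypothetical helper that emulates the single rounding of a fused multiply-add with exact rational arithmetic. It is not an AVX instruction and not code from Prime95.

```python
from fractions import Fraction


def doubles_per_register(register_bits: int, element_bits: int = 64) -> int:
    """Number of double-precision values that fit in one SIMD register."""
    return register_bits // element_bits


def fma_exact(a: float, b: float, c: float) -> float:
    """Hypothetical helper: compute a*b + c exactly, then round once.

    This mimics the single rounding of a fused multiply-add. It is an
    emulation for illustration only, not a hardware instruction.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


sse2_lanes = doubles_per_register(128)  # SSE2: 128-bit registers
avx_lanes = doubles_per_register(256)   # AVX: 256-bit registers

# Inputs chosen so that the exact product a*b (= 1 - 2**-60) is not
# representable as a double and rounds to 1.0.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

unfused = a * b + c         # product is rounded first, then the add happens
fused = fma_exact(a, b, c)  # exact product is kept until the single final rounding

print(sse2_lanes, avx_lanes)
print(unfused, fused)
```

With these inputs the separate multiply and add lose the low-order term entirely and give 0.0, while the single-rounding version keeps it and gives -2**-60. That is the accuracy side of fused multiply-add; the throughput side is that one instruction does the work of two.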
2010-08-06, 03:15   #7
Primeinator
"Kyle"
Feb 2005
Somewhere near M52..
3×5×61 Posts

Quote:
Originally Posted by Prime95
AVX with 256-bit registers should double the FPU throughput (although Intel could implement it in a way that there is no increase).
How much of an increase in LL speed would be achieved by doubling the floating-point throughput? Surely it would not double?
2010-08-06, 06:30   #8
__HRB__
Dec 2008
Boycotting the Soapbox
2⁴·3²·5 Posts

At most AVX will speed up the search by 100%, but since it will take a while until a significant portion of users owns a processor that supports these new extensions...

Considering that a $200 entry-level ATI 5830 delivers around 450-900 giga-flops in double precision arithmetic (which would be roughly equivalent to a system with 16 AVX-capable cores clocked at 4 GHz), I seriously doubt that George will want to spend more than an afternoon contemplating the specifics.
2010-08-06, 07:25   #9
ldesnogu
Jan 2008
France
2×5²×11 Posts

Quote:
Originally Posted by __HRB__
Considering that a $200 entry-level ATI 5830 delivers around 450-900 giga-flops in double precision arithmetic
Huh? ATI themselves are quoting 358 DP GFLOP/sec. ref

Quote:
(which would be roughly equivalent to a system with 16 AVX-capable cores clocked at 4 GHz)
Using the 358 number above, that gives ~11 cores. And I bet it's easier to get closer to peak performance on a CPU than it is on a GPU.
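
The core-count comparison above can be reproduced with a few lines of arithmetic. This is a rough sketch, not a benchmark: it assumes each AVX core retires 8 double-precision flops per cycle (one 4-wide add plus one 4-wide multiply), which is an assumption chosen because it matches the 16-core and ~11-core figures in this thread. The function names are made up for illustration.

```python
# Assumption (not stated in the thread): one 4-wide add and one 4-wide
# multiply per cycle, i.e. 8 double-precision flops per cycle per core.
ASSUMED_AVX_FLOPS_PER_CYCLE = 8


def peak_dp_gflops(clock_ghz: float, flops_per_cycle: int, cores: int = 1) -> float:
    """Theoretical peak double-precision GFLOP/s for a CPU."""
    return clock_ghz * flops_per_cycle * cores


def equivalent_cores(gpu_gflops: float, per_core_gflops: float) -> float:
    """How many CPU cores match a given GPU peak, on paper."""
    return gpu_gflops / per_core_gflops


per_core = peak_dp_gflops(4.0, ASSUMED_AVX_FLOPS_PER_CYCLE)        # one core at 4 GHz
sixteen_cores = peak_dp_gflops(4.0, ASSUMED_AVX_FLOPS_PER_CYCLE, cores=16)
cores_for_358 = equivalent_cores(358.0, per_core)                  # ATI's quoted figure

print(per_core, sixteen_cores, cores_for_358)
```

Under that assumption one core peaks at 32 GFLOP/s, so 16 cores give 512 GFLOP/s (inside the 450-900 range quoted earlier) and 358 GFLOP/s corresponds to about 11.2 cores. These are peak numbers only; sustained performance on either kind of hardware will be lower.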
2010-08-06, 21:20   #10
Primeinator
"Kyle"
Feb 2005
Somewhere near M52..
3×5×61 Posts

Quote:
Originally Posted by __HRB__
At most AVX will speed up the search by 100%,
Again, a 100% increase in... factoring? P-1? LL? All? My knowledge of how CPUs actually work is limited; please accept my ignorance and my apologies.
2010-08-06, 22:41   #11
ewmayer
2ω=0
Sep 2002
República de California
10110101110111₂ Posts

Quote:
Originally Posted by __HRB__
At most AVX will speed up the search by 100%, but since it will take a while until a significant portion of users owns a processor that supports these new extensions...
Could quite possibly be > 100% ... doing 4-way double FPU instructions per cycle will double the throughput vs SSE2. But as George notes, the RISC-style 3-operand instructions will reduce register pressure, meaning fewer cycles needed for spill-and-fill and more available for computation. Not a massive speedup, but another 10-20% seems doable.

Quote:
Considering that a $200 entry-level ATI 5830 delivers around 450-900 giga-flops in double precision arithmetic (which would be roughly equivalent to a system with 16 AVX-capable cores clocked at 4 GHz), I seriously doubt that George will want to spend more than an afternoon contemplating the specifics.
Depends on the relative numbers of folks who will have AVX-supporting CPUs versus ones with fast-double-supporting GFX cards.
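
As a back-of-the-envelope check on the "> 100%" estimate above, the two effects can be combined as below. This sketch assumes the gains multiply (a 2x factor from the wider registers times a 1.10x to 1.20x factor from reduced register pressure); that is one reading of the post, not something it states, and the helper names are hypothetical.

```python
import math


def combined_speedup(*factors: float) -> float:
    """Overall speedup when independent speedup factors multiply (assumption)."""
    return math.prod(factors)


def percent_increase(speedup: float) -> float:
    """Convert a speedup factor (e.g. 2.0) into a percentage increase (100.0)."""
    return (speedup - 1.0) * 100.0


width_factor = 2.0  # 4 doubles per AVX register vs 2 per SSE2 register

low = combined_speedup(width_factor, 1.10)   # with a 10% register-pressure gain
high = combined_speedup(width_factor, 1.20)  # with a 20% register-pressure gain

print(f"{low:.2f}x to {high:.2f}x")
print(f"{percent_increase(low):.0f}% to {percent_increase(high):.0f}% increase")
```

On that reading the range is roughly 2.2x to 2.4x, i.e. a 120% to 140% increase in floating-point throughput. If the 10-20% were instead added to the 100%, the range would be 110% to 120%; either way it is a best case for the FPU-bound part of the work, not a prediction for a whole LL test.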
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.