mersenneforum.org > Great Internet Mersenne Prime Search > Hardware
2012-02-09, 19:51   #1
ixfd64 ("Danny", Dec 2002, California)

Intel announces multi-core enhancements for Haswell chips

I wonder if this will benefit Prime95 in any way?

http://www.extremetech.com/computing...e-enhancements

2012-02-09, 21:00   #2
Dubslow ("Bunslow the Bold", Jun 2011)

This could only ever be useful for multithreaded tests (of which there are admittedly a lot, especially for hyperthreaded users). Even then, it would only help if George hasn't already optimized the multithreaded FFTs to the point where each thread never conflicts with the others. The steps Prime95 executes to test numbers are much better defined than, say, modifying an Excel spreadsheet based on what's happening in real life. That is to say, with FFTs it is easier to know exactly when each thread will need to access what data, and for how long, than for almost any other program out there. If George has already put in the time to do this, then TSX will be of little use, I think. Even if he hasn't, TSX's usefulness would still be fairly limited. In general, though, I think this is really cool.

(Does this sound totally wrong to anybody? I wouldn't call this anything more than an educated guess.)

2012-02-09, 21:34   #3
bsquared ("Ben", Feb 2007)

Quote:
Originally Posted by Dubslow
(Does this sound totally wrong to anybody? I wouldn't call this anything more than an educated guess.)
I would say that the answer is complicated. It depends on how tightly coupled a parallel algorithm is (that is, how often threads need to share data) and on whether the data-sharing problem is actually bottlenecked by threading overhead rather than, for example, by raw data movement.

I think the more tightly coupled a parallel algorithm is, the more it could possibly benefit from this new approach, assuming the data to be shared is readily available to the processors.

That said, I think the multithreading of FFTs is bottlenecked by raw data movement, not by threading overhead, and I don't know whether any amount of "free" improvement in the granularity of data sharing will be a win. The processors will be waiting for the bus to deliver the data anyway. (Disclaimer: I have not written a multi-threaded FFT, so I may have missed something.)

For something like multithreaded SIQS or NFS, where the parallelism is very loosely coupled, this new approach would likely be useless.

Maybe things get interesting for parallel block Lanczos...

In any event, the usefulness is limited by the willingness of folks who have already invested a great deal of time developing parallel solutions with the available tools to reinvest that time in this new set of tools.
2012-02-10, 02:41   #4
Christenson (Dec 2010, Monticello)

Uhh, I think this one is squarely aimed at database access...with lots of sparse random writes of whole records, this is an excellent tool -- TSX makes the record locks and transactions a lot simpler.

Block Lanczos, possibly not... but sieving, now, there's something useful... it noticeably reduces average synchronisation overhead. Just set the bits in your sieve, be prepared to lose a write here and there, and re-do it.

History says that Jasonp and the like will find a way to take advantage.
2012-02-10, 03:08   #5
bsquared ("Ben", Feb 2007)

Quote:
Originally Posted by Christenson
Uhh, I think this one is squarely aimed at database access...with lots of sparse random writes of whole records, this is an excellent tool -- TSX makes the record locks and transactions a lot simpler.
Agreed, from what I've seen (which isn't much), it might help there.

Quote:
Originally Posted by Christenson
Block Lanczos, possibly not..but sieving, now, there's something useful...noticeably reduces average synchronisation overhead....just set the bits in your sieve, be prepared to lose a write here and there and re-do it.

History says that Jasonp and the like will find a way to take advantage.
Using multiple threads to write simultaneously to the same sieve block would be crazy. There is no earthly reason why you would need to share data to that resolution when sieving with multiple threads. I don't care how low overhead TSX is, it can't be less overhead than 0.
2012-02-10, 10:50   #6
ldesnogu (Jan 2008, France)

Quote:
Originally Posted by Christenson
but sieving, now, there's something useful...noticeably reduces average synchronisation overhead....just set the bits in your sieve, be prepared to lose a write here and there and re-do it.
As bsquared already pointed out, that would be a very poor use case. No matter what their technology does, you'd have to move cache lines from CPU to CPU, and you simply don't want that to happen.
2012-02-10, 19:32   #7
Christenson (Dec 2010, Monticello)

True, those cache lines have to move from time to time in a sieve... but this allows only the NECESSARY cache lines to move. It's no substitute for good hand-optimisation of a mathematical operation, but it's definitely useful for a database taking random hits.
2012-02-10, 19:42   #8
ldesnogu (Jan 2008, France)

Quote:
Originally Posted by Christenson
True, those cache lines have to move from time to time in a sieve...but this allows only the NECESSARY cache lines to move...
There's no need for transactional memory for that: the MESI (or MOESI) cache-coherence protocol already takes care of moving and sharing cache lines in a shared memory area, depending on what each CPU does. "Necessary" is a bit of a stretch in that case: the cache lines that move have to move, or correctness is lost.

As far as sieving goes, the only way I can think of to use multiple CPUs efficiently would be to sieve non-overlapping segments. Probably a lack of imagination on my part.
2012-02-10, 20:32   #9
firejuggler (Apr 2010, Over the rainbow)

Haswell is slated to launch March to June 2013.