New "Cell" Chips coming out?
ok guys check this article out......
[url]http://www.geek.com/news/geeknews/2004Nov/bch20041129028023.htm[/url] If I'm not mistaken, there are going to be chips coming out in a few years clocked at 6.4 GHz :banana: Let me know what you think. |
I'm not excited about it, because I won't be allowed to program for it. For all I care, it doesn't exist.
Same thing with the Emotion Engine. It's a great architecture, but I'm not allowed to program for it. |
Saw this interesting tidbit about the Cell in a recent NY Times article about Apple's decision to ditch the PowerPC processor (emphasis mine):
[quote]As it happens, Intel's was not the only alternative chip design that Apple had explored for the Mac. An executive close to Sony said that last year Mr. Jobs met in California with both Nobuyuki Idei, then the chairman and chief executive of the Japanese consumer electronics firm, and with Kenichi Kutaragi, the creator of the Sony PlayStation. Mr. Kutaragi tried to interest Mr. Jobs in adopting the Cell chip, which is being developed by I.B.M. for use in the coming PlayStation 3, in exchange for access to certain Sony technologies. [b]Mr. Jobs rejected the idea, telling Mr. Kutaragi that he was disappointed with the Cell design, which he believes will be even less effective than the PowerPC.[/b][/quote]
One is left to speculate what "less effective" means. Apparently the major reason Apple is saying bye-bye to the PPC is that it's been trending in the wrong direction in terms of performance per watt, which is the reason you won't see a G5 in a laptop. If that is Jobs' main metric for Apple's CPU roadmap, it would imply that the Cell delivers relatively poor performance per watt. |
Cell Broadband Engine documentation
IBM has just pre-published several documents (~ 520 pages) presenting the architecture of the Cell: [URL=http://www-128.ibm.com/developerworks/power/cell/]Cell Broadband Engine documentation[/URL].
Tony |
Cell Double Floating Instructions
There is a mistake in their page. The total is rather more than 750 pages.
There are 7 double-precision floating-point instructions: add, multiply, subtract, multiply and add, multiply and subtract, negative multiply and add, negative multiply and subtract. There are also 17 single-precision floating-point instructions. Tony |
Apparently the Cells do support IEEE double-precision for the most part, however not all operations obey the standard.
The small local memory is still a problem when it comes to GIMPS. I suppose one could stream parts of the FFT in and out of the units using DMA, but you'd have to find enough operations to cover the time the DMA transfers take. |
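The streaming idea above is usually called double buffering: while the unit computes on one chunk, the next chunk is already being fetched. Below is a minimal sketch of that pattern, not Cell code. The names `dma_fetch` and `process_streamed` are hypothetical, and a one-worker thread pool merely stands in for the DMA engine.

```python
from concurrent.futures import ThreadPoolExecutor


def dma_fetch(main_memory, start, chunk_size):
    """Stand-in for a DMA transfer: copy one chunk into the 'local store'."""
    return list(main_memory[start:start + chunk_size])


def process_streamed(main_memory, chunk_size, kernel):
    """Apply `kernel` chunk by chunk, prefetching chunk k+1 while chunk k is computed."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    starts = list(range(0, len(main_memory), chunk_size))
    results = []
    if not starts:
        return results
    # One worker plays the role of the DMA engine (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=1) as dma:
        in_flight = dma.submit(dma_fetch, main_memory, starts[0], chunk_size)
        for next_start in starts[1:] + [None]:
            current = in_flight.result()  # wait for the chunk needed right now
            if next_start is not None:
                # Start the next transfer before computing on the current chunk.
                in_flight = dma.submit(dma_fetch, main_memory, next_start, chunk_size)
            results.extend(kernel(current))  # compute overlaps the fetch in flight
    return results


if __name__ == "__main__":
    data = list(range(10))
    print(process_streamed(data, 4, lambda chunk: [2 * x for x in chunk]))
```

The transfer is hidden only if the kernel does at least as much work per chunk as the fetch takes, which is exactly the concern raised in the post.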
[QUOTE=ColdFury]Apparently the Cells do support IEEE double-precision for the most part, however not all operations obey the standard.[/quote]
The non-fully-IEEE-compliant part shouldn't be an issue for decently written FFT code. So what if underflows flush to zero? That sort of thing is more important in e.g. linear algebra, especially with ill-conditioned and near-singular matrices. The data that occur in an FFT-based big-integer MUL tend to be extremely well-conditioned, especially when balanced-digit representation is used for the whole-number input digits. I was mainly concerned about rounding mode, but in this respect the Cell SPU is actually better with respect to double than single precision - double-floats round to nearest by default (though chopping can also be invoked), whereas for single-floats only chopping is available. Similarly, the lack of sNaN and Infinity isn't a problem for carefully written big-FFT code.
[quote]The small local memory is still a problem when it comes to GIMPS. I suppose one could stream parts of the FFT in and out of the units using DMA, but you'd have to find enough operations to cover the time the DMA transfers take.[/QUOTE]
Also shouldn't be a problem in principle, though one will likely need to restructure one's FFT slightly to make it friendlier to small local data chunks. The fact that DMA transfers between the local stores of different SPUs are fast is also helpful - basically one will need the same kind of tricks for mitigating main-memory-access latencies that one uses on all the other currently deployed cache-based microprocessors, just with a view to dealing with multiple processors, each with a small local cache and fast communication with its neighbors, but very slow communication with main memory.
Decent profiling tools (in order to track down bottlenecks, rather than just poking around half-blind and having to guess what might be happening to slow down one's code, as one winds up doing with 90% of compiler/CPU combos, especially with freeware compilers and systems one only has remote access to) will be crucial for code development on the Cell.
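To make the balanced-digit point concrete, here is a small sketch of an FFT-based big-integer multiply. It is an illustration only, not code from any GIMPS client: the function names, the base of 256 and the textbook recursive FFT are all assumptions chosen for brevity. The digits are shifted into the range [-base/2, base/2), and the distance of each convolution output from the nearest integer is reported as the roundoff error.

```python
import cmath


def to_balanced(n, base=256):
    """Split a non-negative integer into digits in [-base/2, base/2), least significant first."""
    digits = []
    while n:
        d = n % base
        n //= base
        if d >= base // 2:  # push the excess into the next digit as a carry
            d -= base
            n += 1
        digits.append(d)
    return digits or [0]


def fft(a, invert=False):
    """Textbook recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def multiply(x, y, base=256):
    """Return (x * y, max roundoff error) via a double-precision FFT convolution."""
    a, b = to_balanced(x, base), to_balanced(y, base)
    size = 1
    while size < len(a) + len(b):  # zero-pad so the cyclic convolution does not wrap
        size *= 2
    fa = fft(a + [0] * (size - len(a)))
    fb = fft(b + [0] * (size - len(b)))
    conv = fft([u * v for u, v in zip(fa, fb)], invert=True)
    product, max_err = 0, 0.0
    for k, c in enumerate(conv):
        value = c.real / size  # normalize the inverse transform
        nearest = round(value)  # round to nearest integer
        max_err = max(max_err, abs(value - nearest))
        product += nearest * base ** k  # exact integer carry propagation
    return product, max_err


if __name__ == "__main__":
    x, y = 123456789123456789, 987654321987654321
    p, err = multiply(x, y)
    print(p == x * y, err)
```

The result is trustworthy only while the reported error stays well below 0.5. Balanced digits help because the signed terms of the convolution partially cancel, which keeps the intermediate values, and hence the roundoff, small.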
IBM's website has some links to soon-to-be-available [url=http://www.alphaworks.ibm.com/tech/systemsim970]Cell simulators[/url], but I'm always leery of simulators in terms of gauging how well code will run on the real hardware. The weird thing to me is, it seems that IBM developers are already building, profiling and running code on real Cell processors (e.g. this [url=http://www.power.org/news/events/barcelona/11_chow.pdf]Big-FFT paper[/url], linked by Matthias (a.k.a. Dresdenboy) in the [url=http://www.mersenneforum.org/showthread.php?t=3686]"Gimps Awaiting a Major Transition?"[/url] thread, certainly gives one that impression), so why not make actual Cell-based systems available for code development? |
mprime and future multi-core machines
Hi,
It seems that the future of PCs is multi-core machines. On a 2-core machine, mprime could run on one core while the end-user's applications run mainly on the other. With 4 or more cores, it could be interesting for mprime to use 2 or more of them. So, my question is: Is it possible/interesting to modify the architecture of mprime so that it can run 2 or more threads, like GLucas does ? Tony |
That paper claims a 100x speed-up on a 16 MB FFT, most impressive. I wonder if such a speed-up is achievable in practice.
|
[QUOTE=ColdFury]That paper claims a 100x speed-up on a 16 MB FFT, most impressive. I wonder if such a speed-up is achievable in practice.[/QUOTE]
The IBM engineers actually *did* achieve this speedup, i.e. in practice. Note however that this was for a single-precision FFT (Apple used to love these for showing off its AltiVec SIMD unit as well, since that plays to the SIMD hardware's strengths), which isn't useful for large-integer arithmetic. For double-precision FFTs (and other kinds of DP computations as well) the potential speedup is more modest, but still potentially in the 10x realm. |
[QUOTE=T.Rex]Is it possible/interesting to modify the architecture of mprime so that it can run 2 or more threads, like GLucas does ?[/QUOTE]
It is possible. I might try coding up such an FFT if I buy a Pentium D machine. However, it is probably the case that testing two different exponents will have more throughput. |
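The "two different exponents" option can be sketched as follows: each worker runs a complete, independent Lucas-Lehmer test, so the workers never need to exchange data. This is a toy illustration, not mprime code. The names `lucas_lehmer` and `run_exponents` are hypothetical, and plain Python integers replace the FFT arithmetic. Threads are used only to keep the sketch short; in CPython one would swap in `ProcessPoolExecutor` to actually occupy two cores.

```python
from concurrent.futures import ThreadPoolExecutor


def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, for an odd prime exponent p."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def run_exponents(exponents, workers=2):
    """Test each exponent in its own worker; no communication between tests."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(exponents, pool.map(lucas_lehmer, exponents)))


if __name__ == "__main__":
    print(run_exponents([13, 11, 17, 23]))
```

A multi-threaded FFT for a single exponent would instead have to synchronize the threads on every iteration, which is one plausible reason independent tests come out ahead on total throughput.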