Hey guys, just to let you all know: I might be getting an early shipment of the Cell chips once they are finalized for PC use, and I will put them up for auction on eBay. I will keep everyone posted on the situation.
PS: It's nice to know people in North America and Asia :showoff: |
[QUOTE=dsouza123]The Intel P4 can do 8 single precision ops using SSE2, (using two SSE2 registers), so the cell processor isn't ahead of it, yet.
As paulie mentioned GIMPS needs double precision so it doesn't help.[/QUOTE]That's 8 single precision ops per 2 cycles, or 4 SP ops per cycle. As with SSE2, each P4 FP unit (FP Add/Mul) handles only one 64-bit half per cycle (but in parallel across different pipeline stages).
[B]@all: [/B] Let's discuss whether buyers of PS3s (not those who'd buy them specifically for Prime95) would see some use in letting the machine do some calculations while they aren't using it. |
About this single precision stuff: couldn't we just increase the FFT length to improve accuracy?
Even if we wanted to do this, we would have to approach Sony for the DRM keys to unlock the PS3 hardware. I don't think Sony will do this, because they make all their money on the software. Sony would want over 10 USD for each copy of P95 for PS3. |
[QUOTE=Dresdenboy]Let's discuss, if the buyers of PS3s (not those, who'd buy them intentionally for Prime95) would see some use in letting the machine do some calculations while they aren't using it.[/QUOTE]
People don't normally leave their consoles running unless they are playing, so P95 wouldn't get much effective running time. Taking into account the difficulties involved, it is probably not worth investing the time to port P95 to these machines. That is, unless for some reason they start replacing Intel/AMD processors in general purpose PCs, which doesn't look very likely. Anyway, just my 2 cents... |
Not really. In the early era of the PS2 there were Linux distros for it; they're still available, but only the Japanese versions. Wait, here:
[url]http://www.us.playstation.com/peripherals.aspx?id=SCPH-97047[/url] [url]http://blackrhino.xrhino.com/main.php?page=home[/url] Oh, there is a free distro, I think. Interesting: net booting [url]http://playstation2-linux.com/projects/diskless[/url] |
[QUOTE=E_tron]about this single precision stuff; couldn't we just increase the FFT length to improve accuracy?
Even if we wanted to do this, we have to approach sony for the DRM keys to unlock the PS3 hardware. I don't think sony will do this, because they make all their money on the software. Sony would want over 10USD for each copy of P95 for PS3.[/QUOTE] Someone posted about this once: using single precision would bloat the FFT to totally unreasonable sizes. The FFTs can't fit in the Cell's memory anyway. People don't realize they don't have MMUs and use DMA to perform memory transfers. Imagine running Prime95 with 256 kB of RAM and paging everything to and from the hard drive. |
[QUOTE=ColdFury]Someone posted about this once and using single-precision would bloat the FFT to totally unreasonable sizes.
The FFTs can't fit in the Cell's memory anyways. People don't realize they don't have MMUs and uses DMA to perform memory transfer. Imagine running Prime95 with 256kb of RAM and paging everything to and from the hard drive.[/QUOTE]A Cell SPE (like the PPE) can do double precision math, but at a slower rate than SP (something like a factor of 10, IIRC). It is much better for the FFT to work with slow double precision and have enough mantissa bits for the calculation than to do a huge FFT that can use only a few bits per SP number.
The next thing is: why would the FFT have to fit into Cell's memory? It usually doesn't fit into the caches of a K7, K8, P4 or Pentium-M CPU. Instead, Cell has a dual channel XDR memory controller delivering 25 GB/s. That's a hell of a lot more than what we get with the MCT of a K8 (although it already is at ~98% of the max bandwidth of 6103 MiB/s for 2xDDR400 RAM) or with DDR2 on a newer P4 board.
FFTs can be calculated in parallel very well if the interconnect bandwidth is high enough. And the algorithms are very straightforward. While executing the first instruction, you could already say what will happen 1000 instructions later. An FFT algorithm for a given size has a fixed pattern of execution and of when and where it reads and stores data. A perfect job for an SPE on Cell.
Even the fact that the local memory is not a cache is not as bad as it may seem, since it has low latency (6 cycles, because it's SRAM like in a cache) and its behaviour is predictable (unlike a cache), since it does nothing on its own. It's like a cache without logic. Because of the access patterns mentioned above, you can easily load the data 6 cycles before it is used. And even the time spent exchanging the local memory's contents will be small thanks to the EIB. The SPEs can also access the L2 and external XDR memory. The 256 kB local SRAM should be good for possibly up to 14 levels of the Prime95 FFT (it also needs space for code and some tables).
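A rough sanity check of two numbers in this post, as a minimal Python sketch. The assumptions are mine, not from Prime95 itself: "14 levels" is read as a radix-2 pass structure over 2^14 values, each value is one 8-byte double, and the 32 kB reserve for code and tables is purely a hypothetical figure. The DDR400 part assumes 64-bit (8-byte) channels at 400 million transfers per second.

```python
LOCAL_STORE_BYTES = 256 * 1024  # 256 kB SPE local store, as stated in the post
BYTES_PER_DOUBLE = 8            # assumption: one double per FFT value


def fft_working_set_bytes(levels: int) -> int:
    """Bytes needed to hold 2**levels double precision values."""
    return (2 ** levels) * BYTES_PER_DOUBLE


def max_levels_in_local_store(reserve_bytes: int = 0) -> int:
    """Largest level count whose data fits after reserving space for code/tables."""
    available = LOCAL_STORE_BYTES - reserve_bytes
    levels = 0
    while fft_working_set_bytes(levels + 1) <= available:
        levels += 1
    return levels


def ddr_peak_mib_per_s(channels: int, mega_transfers_per_s: int,
                       bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in MiB/s (1 MiB = 2**20 bytes)."""
    bytes_per_s = channels * mega_transfers_per_s * 1_000_000 * bytes_per_transfer
    return bytes_per_s / 2 ** 20


if __name__ == "__main__":
    # Hypothetical 32 kB reserved for code and twiddle tables.
    print(max_levels_in_local_store(reserve_bytes=32 * 1024))
    print(fft_working_set_bytes(14) // 1024, "KiB for 14 levels")
    print(int(ddr_peak_mib_per_s(channels=2, mega_transfers_per_s=400)), "MiB/s")
```

Under these assumptions, 14 levels take half of the local store, while 15 levels would fill it completely and leave no room for code, which is consistent with the "up to 14 levels" estimate. The dual channel DDR400 figure also works out to the 6103 MiB/s quoted for the K8.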
Some links (although already mentioned in some threads): [URL=http://anandtech.com/cpuchipsets/showdoc.aspx?i=2379&p=1]Understanding the Cell Microprocessor[/URL] [URL=http://www.realworldtech.com/page.cfm?ArticleID=RWT021005084318]ISSCC 2005: The Cell Microprocessor[/URL] [URL=http://arstechnica.com/articles/paedia/cpu/cell-1.ars]Introducing the IBM/Sony/Toshiba Cell Processor — Part I[/URL] [URL=http://arstechnica.com/articles/paedia/cpu/cell-2.ars]Introducing the IBM/Sony/Toshiba Cell Processor — Part II[/URL] |
In addition to Dresdenboy's comments, I'd like to point you to the excellent AnandTech article [URL=http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2379]Understanding the Cell Microprocessor[/URL].
The article covers the implications of the Cell's cacheless in-order architecture. Tau |
An addition regarding DP capabilities:
David Wang wrote (2nd link in my earlier posting): "Given this estimate, the peak DP FP throughput of an 8 SPE CELL processor is approximately 25~30 GFlops when the DP FP capability of the PPE is also taken into consideration." Let's look at a Netburst CPU at 3.4 GHz as an example: 6.8 GFlops. What is left to say is: Cell (and similar MPUs) should currently give the best bang for the buck for LLR testing or even TF. No FPGA, GPU or general purpose CPU could currently deliver more, because of high price, lack of universality or insufficient FP throughput. |
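To put the peak figures quoted above side by side, here is a small Python sketch. It assumes the 6.8 GFlops Netburst number comes from 2 double precision flops per cycle at 3.4 GHz (that is simply what the quoted number implies), and it takes the 25~30 GFlops Cell estimate as given. These are theoretical peaks, not measured Prime95 or LLR performance.

```python
def peak_gflops(clock_ghz: float, flops_per_cycle: float) -> float:
    """Theoretical peak throughput in GFlops."""
    return clock_ghz * flops_per_cycle


# Assumption: 2 DP flops per cycle, which reproduces the quoted 6.8 GFlops.
netburst_gflops = peak_gflops(clock_ghz=3.4, flops_per_cycle=2)

# Estimated range for an 8 SPE Cell including the PPE, as quoted in the post.
cell_low_gflops, cell_high_gflops = 25.0, 30.0

ratio_low = cell_low_gflops / netburst_gflops
ratio_high = cell_high_gflops / netburst_gflops

if __name__ == "__main__":
    print(f"Netburst peak: {netburst_gflops:.1f} GFlops")
    print(f"Cell / Netburst peak ratio: {ratio_low:.1f}x to {ratio_high:.1f}x")
```

So on paper the Cell estimate is somewhere around four times the DP peak of that P4; how much of it a real FFT would reach depends on the memory traffic discussed earlier in the thread.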
I wonder what this will give...
A viral processor that builds itself. 50nm :geek: [url]http://www.spectrum.ieee.org/WEBONLY/publicfeature/nov03/1103bio.html[/url] |
There is a [URL=http://research.scea.com/research/html/CellGDC05/index.html]Cell Presentation from GDC 2005[/URL] online, which sheds further light on the capabilities of this class of MPUs.
IMO the [URL=http://research.scea.com/research/html/CellGDC05/16.html]Cell's SPE FP and other capabilities[/URL] look even more useful for algorithms like FFTs than before. |