|
|
#12 |
|
Oct 2004
232 Posts |
As is well known on the forum, LL testing is a sequential algorithm.
The FFT used in each iteration is, however, parallelisable. Leaving aside the single/double precision issue, the main PowerPC processor in the Cell chip could spread the FFT across the 8 subsidiary cells to get a fast answer to the FFT part of the math. The cells are designed to talk to each other and cooperate. The main CPU has 32K L1 and 512K L2. Each of the 8 surrounding cells has 256K of cache. There is an onboard memory/IO controller, namely Rambus XDR @ 3.2 GHz and FlexIO @ 6.4 GHz. Anyway, even if it's no good for LL testing, maybe it could be used to do trial factoring quickly? Or do the same limitations apply? |
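For anyone who has not seen it written out, here is a minimal sketch of the LL test in Python. It is only meant to show why the iterations are sequential: each step needs the previous value of `s`, so the only thing that can be parallelised is the big squaring inside the loop. The function name `lucas_lehmer` is made up for this example, and Python's built-in big integers stand in for the FFT multiply a real client would use.

```python
def lucas_lehmer(p: int) -> bool:
    """Return True if the Mersenne number 2**p - 1 is prime (p an odd prime)."""
    m = (1 << p) - 1          # the Mersenne number M_p
    s = 4                     # standard starting value
    for _ in range(p - 2):    # p - 2 iterations, each one needs the previous s
        s = (s * s - 2) % m   # the big squaring is where an FFT multiply would go
    return s == 0


# Exponents among the small primes that give a Mersenne prime
print([p for p in (3, 5, 7, 11, 13, 17, 19, 23, 31) if lucas_lehmer(p)])
# → [3, 5, 7, 13, 17, 19, 31]
```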
|
|
|
|
|
#13 |
|
Aug 2002
26×5 Posts |
A bigger problem is that each "cell" only has 256KB of local memory, and they do not share memory addresses.
Information transfer between units is accomplished using DMA, and would be a major bottleneck. The cells are designed to work on tasks that don't require any interprocessor communication. |
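By way of contrast, trial factoring looks like the kind of task that needs no communication between units, since each candidate factor can be checked on its own. A rough sketch, assuming the usual facts that any factor q of 2^p - 1 (p an odd prime) has the form 2kp + 1 and is congruent to 1 or 7 mod 8. The name `trial_factor` and the idea of handing each unit its own range of k are illustrative only; nothing here says how well an SPE would actually run it.

```python
def trial_factor(p: int, k_start: int, k_stop: int):
    """Search k in [k_start, k_stop) for a divisor q = 2*k*p + 1 of 2**p - 1."""
    for k in range(k_start, k_stop):
        q = 2 * k * p + 1
        if q % 8 not in (1, 7):    # factors of M_p must be +-1 mod 8
            continue
        if pow(2, p, q) == 1:      # q divides 2**p - 1 exactly when 2**p = 1 (mod q)
            return q
    return None                    # nothing found in this range


# Two hypothetical workers, each given its own independent range of k
print(trial_factor(29, 1, 3))   # → None
print(trial_factor(29, 3, 6))   # → 233
```

Each call touches only its own range, so the ranges could be handed out once and the results collected at the end.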
|
|
|
|
|
#14 | |
|
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
2·4,909 Posts |
Quote:
|
|
|
|
|
|
|
#15 | |
|
Aug 2002
26·5 Posts |
Quote:
|
|
|
|
|
|
|
#16 | |
|
Jun 2004
UK
2138 Posts |
Fwiw, another article from today talks about single/double precision abilities (http://www.realworldtech.com/page.cf...WT021005084318).
What sounds relevant to this discussion is on page 4 (http://www.realworldtech.com/page.cf...1005084318&p=4). Quote:
In the last week PrimeNet did an average of 1483 P90 years per day. To equal PrimeNet's output it'd take only 595 CELLs. It's highly likely I made some sort of error in that calculation, but if these processors are going to be so widespread as to be in our PlayStations, doesn't it seem possible that we might get a few in PrimeNet? If so, they could make quite a contribution. |
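As a sanity check on the two figures quoted above, this is the per-chip throughput they imply. It only divides the numbers from the post; it says nothing about whether a Cell could really deliver that. The variable names are made up for the example.

```python
primenet_p90_years_per_day = 1483   # figure quoted in the post
cells_claimed = 595                 # figure quoted in the post

# P90 years of work each Cell would have to do per day
per_cell = primenet_p90_years_per_day / cells_claimed

# The same thing expressed as a number of P90s running around the clock
p90_equivalents = per_cell * 365

print(f"{per_cell:.2f} P90 years per day per Cell")
print(f"roughly {p90_equivalents:.0f} P90s' worth of work per Cell")
```

So the estimate amounts to saying one Cell does the work of around 900 P90s.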
|
|
|
|
|
|
#17 |
|
Sep 2002
89 Posts |
It looks like within 2005, PCs with the Cell processor could be completing 10-million-digit numbers in 1-2 days instead of a month. If someone wants to make some real money, they need to put these things, complete with memory, on a PCI card so you can pop 5 or 6 in a PC.
Imagine being able to complete 25-120 10-million-digit numbers per PC per month... SIGN ME UP
|
|
|
|
|
|
#18 | |
|
Aug 2002
2×33 Posts |
Quote:
|
|
|
|
|
|
|
#19 |
|
"Mark"
Apr 2003
Between here and the
2·32·353 Posts |
Not to rain on your parade, but the CELL is a PowerPC CPU, not an x86 CPU. In other words, it will not run Prime95. It should run GLucas or MLucas. Also, according to the article, 'Moreover, these SP operations are not fully IEEE754 compliant in terms of rounding modes' and 'the SPE’s double precision unit is fully IEEE854 compliant'. Since IEEE854 is a generalization of IEEE754, DP FP might be IEEE754 compliant, but I don't know. I'm not an expert on FFTs, but I have to assume that the current versions of GLucas and MLucas assume IEEE754 compliance.
The current 2.5 GHz PowerPC 970 (aka G5) is around 19 GFLOPS for a single CPU, whereas the CELL (with 8 SPEs) is around 25-30 GFLOPS. That might sound impressive, but even on the G5, GLucas/MLucas run at about half the speed of Prime95 on a similarly clocked P4. There are a number of reasons for this. One is that GLucas and MLucas are not coded in assembler; they have some assembler macros, but not many. Another is that Prime95 can take advantage of SSE and SSE2 on x86, but AltiVec on PPC is useless since it only supports single precision. |
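To show why precision and rounding matter for this kind of program, here is a toy FFT squaring in plain Python. It is a sketch only: a textbook recursive complex FFT with 16-bit words, not the weighted transform that Prime95, GLucas or MLucas actually use, and the names `fft` and `fft_square` are made up for the example. The point is the last step: every output has to be rounded back to an integer, and the result is only trustworthy while the floating-point error stays well under 0.5.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def fft_square(x: int, bits_per_word: int = 16):
    """Square x with a floating-point FFT; return (result, max rounding error)."""
    base = 1 << bits_per_word
    words = []
    while x:                       # split x into small words, least significant first
        words.append(x % base)
        x //= base
    n = 1
    while n < 2 * len(words):      # zero-pad so the product does not wrap around
        n *= 2
    a = [complex(w) for w in words] + [0j] * (n - len(words))
    spectrum = fft(a)
    conv = fft([v * v for v in spectrum], invert=True)
    result, max_err = 0, 0.0
    for i, v in enumerate(conv):
        real = v.real / n          # the inverse transform needs the 1/n scaling
        nearest = round(real)      # each term should be (nearly) an integer
        max_err = max(max_err, abs(real - nearest))
        result += nearest << (bits_per_word * i)   # carries are handled by the big-int add
    return result, max_err


x = 3 ** 200
square, err = fft_square(x)
print(square == x * x, err)
```

With fewer mantissa bits, or with rounding that does not behave as expected, `max_err` grows and the rounded result eventually stops being correct, which is the worry with the single precision units.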
|
|
|
|
|
#20 |
|
Dec 2002
5·163 Posts |
If the Cell processor significantly benefits from a hand-optimized FFT routine, then all the better. Such a routine would have great benefits, not just for Mersenne, but for all math programs that make use of it. As quite a few TOP 1000 number crunchers are used to run FFT-dependent algorithms, such an optimized routine could win some fame.
|
|
|
|
|
|
#21 |
|
Aug 2002
5008 Posts |
The entire FFT algorithm's dataset would need to fit in the cell's 256K memory. These are very simple devices. There is no memory virtualization, and unlike normal co-processors, the cells do not share a common memory space. This means no swapping or any other tricks.
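To put a rough number on that, here is a back-of-the-envelope estimate of the FFT data size for a 10-million-digit candidate against one 256K local memory. The assumptions are mine, not from the article: about 18 bits of the number stored per double, a power-of-two FFT length, and 8 bytes per double. Real programs choose word sizes and lengths more carefully, so treat the result as an order of magnitude.

```python
LOCAL_STORE_BYTES = 256 * 1024   # per-cell memory quoted in the thread
BYTES_PER_DOUBLE = 8


def fft_footprint(exponent_bits: int, bits_per_word: int = 18):
    """Return (FFT length, bytes) under the assumed bits-per-word packing."""
    words = -(-exponent_bits // bits_per_word)   # ceiling division
    length = 1
    while length < words:                        # round up to a power of two
        length *= 2
    return length, length * BYTES_PER_DOUBLE


# A 10-million-digit number needs about 10**7 / log10(2) bits
exponent_bits = 33_219_281

doubles_that_fit = LOCAL_STORE_BYTES // BYTES_PER_DOUBLE
length, nbytes = fft_footprint(exponent_bits)

print(doubles_that_fit)              # → 32768
print(length, nbytes)                # → 2097152 16777216
print(nbytes // LOCAL_STORE_BYTES)   # → 64
```

Under these assumptions the data is about 64 times the size of one local memory, so the transform would have to be streamed through in pieces.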
|
|
|
|
|
|
#22 |
|
Feb 2004
France
22·229 Posts |
|
|
|
|