mersenneforum.org  

Old 2005-02-09, 06:01   #12
Peter Nelson
 
Oct 2004

232 Posts
Cell

As is well known on the forum, LL testing is a sequential algorithm.
The FFT used in each iteration is, however, parallelisable.
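
To make the sequential part concrete, here is a minimal sketch of the Lucas-Lehmer test in Python. It uses Python's built-in big integers for the squaring; real clients such as Prime95 do that squaring with an FFT-based multiplication, which is the step that could in principle be spread across several units. The function name `lucas_lehmer` is just an illustrative choice.

```python
def lucas_lehmer(p):
    """Return True if the Mersenne number 2**p - 1 is prime (p must be an odd prime)."""
    m = (1 << p) - 1          # the Mersenne number being tested
    s = 4                     # standard starting value of the sequence
    for _ in range(p - 2):
        # Each iteration needs the previous s, so the loop itself cannot be
        # parallelised; only the big squaring inside one iteration can.
        s = (s * s - 2) % m
    return s == 0


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13):
        print(p, lucas_lehmer(p))
```

The loop runs p - 2 times and every pass depends on the one before it, so any speedup from extra processing units has to come from inside the squaring.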

Leaving aside the single/double precision issue, the main PowerPC processor in the Cell chip could spread the FFT across the 8 subsidiary cells to speed up the FFT part of the math.

The cells are designed to talk to each other and cooperate.
The main CPU has 32K L1 and 512K L2. Each of the 8 surrounding cells has 256K of cache.

There is an onboard memory/IO controller, namely Rambus XDR @ 3.2 GHz and FlexIO @ 6.4 GHz.

Anyway, even if it's no good for LL testing, maybe it could be used as a way to do trial factoring quickly? Or do the same limitations apply?
Old 2005-02-11, 03:06   #13
ColdFury
 
Aug 2002

2⁶×5 Posts

A bigger problem is that each "cell" only has 256KB of local memory, and they do not share memory addresses.

Information transfer between units is accomplished using DMA, and would be a major bottleneck.

The cells are designed to work on tasks that don't require any interprocessor communication.
Old 2005-02-11, 03:11   #14
Uncwilly
6809 > 6502
 
"""""""""""""""""""
Aug 2003
101×103 Posts

2·4,909 Posts

Quote:
Originally Posted by ColdFury
A bigger problem is that each "cell" only has 256KB of local memory

Would they be able to factor well?
Old 2005-02-11, 03:34   #15
ColdFury
 
Aug 2002

2⁶·5 Posts

Quote:
Originally Posted by Uncwilly
Would they be able to factor well?

Depends on how amenable the factoring code is to vectorization.
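
As a rough illustration of why trial factoring is a candidate for vectorization, here is a small Python sketch. It relies on two standard facts about any factor q of 2^p - 1 (p an odd prime): q = 2kp + 1 for some k, and q is 1 or 7 mod 8. The function name `trial_factor` and the bound `max_k` are made up for this example, and this is not how Prime95 implements it.

```python
def trial_factor(p, max_k):
    """Return the smallest factor q = 2*k*p + 1 of 2**p - 1 with k <= max_k, or None."""
    for k in range(1, max_k + 1):
        q = 2 * k * p + 1
        # Factors of a Mersenne number must be 1 or 7 mod 8, so skip the rest.
        if q % 8 not in (1, 7):
            continue
        # q divides 2**p - 1 exactly when 2**p is 1 mod q.
        if pow(2, p, q) == 1:
            return q
    return None


if __name__ == "__main__":
    for p in (11, 13, 23, 29):
        print(p, trial_factor(p, 10))
```

Each candidate q is tested independently of every other one, and the working set per candidate is tiny, so batches of candidates could be handed to separate units without any communication between them.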
Old 2005-02-11, 05:11   #16
marc
 
Jun 2004
UK

2138 Posts

Fwiw, another article from today talks about single/double precision abilities (http://www.realworldtech.com/page.cf...WT021005084318).

What sounds relevant to this discussion is on page 4 (http://www.realworldtech.com/page.cf...1005084318&p=4).

Quote:
Given this estimate, the peak DP FP throughput of an 8 SPE CELL processor is approximately 25~30 GFlops when the DP FP capability of the PPE is also taken into consideration.

According to (http://mersenneforum.org/showthread.php?t=2718), one P90 year is 1.04e15 floating-point operations. Taking the upper figure of 30 GFlops, one 8 SPE CELL could do one P90 year every 1.04e15 / 30e9 seconds, or about every 9.6 hours. This translates to 2.49 P90 years per day.

In the last week PrimeNet did an average of 1483 P90 years per day. To equal PrimeNet's output, it would take only 595 CELLs.

It's highly likely I made some sort of error in that calculation, but if these processors are going to be so widespread as to be in our PlayStations, doesn't it seem possible that we might get a few in PrimeNet? If so, they could make quite a contribution.
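
For anyone who wants to check the arithmetic, the estimate can be reproduced with a few lines of Python. The constants are the figures quoted in this post (1.04e15 operations per P90 year, 1483 P90 years per day for PrimeNet, 25 to 30 GFlops peak for the CELL); the function name `cell_estimate` is only illustrative, and peak GFlops is of course an optimistic stand-in for real LL throughput.

```python
P90_YEAR_FLOP = 1.04e15              # operations in one P90 year (figure from the post)
PRIMENET_P90_YEARS_PER_DAY = 1483    # PrimeNet weekly average (figure from the post)
SECONDS_PER_DAY = 86_400


def cell_estimate(gflops):
    """Return (hours per P90 year, P90 years per day, CELLs needed to match PrimeNet)."""
    seconds_per_p90_year = P90_YEAR_FLOP / (gflops * 1e9)
    p90_years_per_day = SECONDS_PER_DAY / seconds_per_p90_year
    cells_needed = PRIMENET_P90_YEARS_PER_DAY / p90_years_per_day
    return seconds_per_p90_year / 3600, p90_years_per_day, cells_needed


if __name__ == "__main__":
    for gflops in (25, 30):
        hours, per_day, cells = cell_estimate(gflops)
        print(f"{gflops} GFlops: {hours:.1f} h per P90 year, "
              f"{per_day:.2f} P90 years/day, {cells:.0f} CELLs")
```

Running it for both ends of the quoted 25 to 30 GFlops range shows how sensitive the CELL count is to which peak figure is assumed.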
Old 2005-02-11, 07:53   #17
lpmurray
 
Sep 2002

89 Posts

It looks like within 2005, PCs with the Cell processor could be completing 10 million digit numbers in 1-2 days instead of a month. If someone wants to make some real money, they need to put these things, complete with memory, on a PCI card so you can pop 5 or 6 in a PC.
Imagine being able to complete 25-120 10 million digit numbers per PC per month... SIGN ME UP
Old 2005-02-11, 07:58   #18
Digital Concepts
 
Aug 2002

2×3³ Posts

Quote:
Originally Posted by Paulie
Unfortunately Cell is geared to single precision SIMD. GIMPS needs double precision.

GIMPS uses double precision (floating point), but I don't think it is necessary; it could have used integer storage. One of the Lucas clients (was that G or M?) uses single precision, doesn't it?
Old 2005-02-11, 14:15   #19
rogue
 
"Mark"
Apr 2003
Between here and the

2·3²·353 Posts

Not to rain on your parade, but the CELL is a PowerPC CPU, not an x86 CPU. In other words, it will not run Prime95. It should run GLucas or MLucas. Also, according to the article, 'Moreover, these SP operations are not fully IEEE754 compliant in terms of rounding modes' and 'the SPE’s double precision unit is fully IEEE854 compliant'. Since IEEE854 is a generalization of IEEE754, DP FP might be IEEE754 compliant, but I don't know. I'm not an expert on FFTs, but I have to assume that the current versions of GLucas and MLucas assume IEEE754 compliance.

The current 2.5 GHz PowerPC 970 (aka G5) is around 19 GFLOPS for a single CPU, whereas the CELL (with 8 SPEs) is around 25-30 GFLOPS. That might sound impressive, but even on the G5, GLucas/MLucas run at about half the speed of Prime95 on a similarly clocked P4. There are a number of reasons for this. One is that GLucas and MLucas are not coded in assembler; they have some assembler macros, but not much. Another is that Prime95 can take advantage of SSE and SSE2 on x86, but AltiVec on PPC is useless since it only supports single precision.
Old 2005-02-11, 19:17   #20
tha
 
Dec 2002

5·163 Posts

If the Cell processor significantly benefits from a hand-optimized FFT routine, then all the better. Such a routine would have great benefits, not just for Mersenne, but for all math programs that make use of it. As quite a few TOP 1000 number crunchers are used to run FFT-dependent algorithms, such an optimized routine could win some fame.
Old 2005-02-11, 21:22   #21
ColdFury
 
Aug 2002

500₈ Posts

The entire FFT algorithm's dataset would need to fit in the cell's 256K memory. These are very simple devices. There is no memory virtualization, and the cells do not share a common memory space, like normal co-processors. This means no swapping or any other tricks.
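
To put a rough number on that, here is a back-of-the-envelope sketch in Python for a 10 million digit candidate. The 18 bits stored per double-precision FFT element is an assumption chosen for illustration (the real figure depends on the FFT length and the client), and the constant names are made up for this example.

```python
import math

DIGITS = 10_000_000              # size of the candidate, in decimal digits
BITS_PER_ELEMENT = 18            # ASSUMPTION: bits of the number held in each double
BYTES_PER_DOUBLE = 8
LOCAL_STORE_BYTES = 256 * 1024   # local memory of one cell, as stated above

# Smallest exponent p for which 2**p - 1 has at least DIGITS decimal digits.
exponent_bits = math.ceil(DIGITS / math.log10(2))

# Number of double-precision elements needed to hold one residue.
fft_elements = math.ceil(exponent_bits / BITS_PER_ELEMENT)
dataset_bytes = fft_elements * BYTES_PER_DOUBLE

if __name__ == "__main__":
    print(f"exponent: about {exponent_bits} bits")
    print(f"FFT data: about {dataset_bytes / 2**20:.1f} MiB")
    print(f"local stores needed: about {dataset_bytes / LOCAL_STORE_BYTES:.0f}")
```

Under these assumptions a single residue is on the order of tens of local stores in size, before counting twiddle factors or working buffers, so it could not sit in one cell's 256K.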
Old 2005-02-11, 21:56   #22
T.Rex
 
Feb 2004
France

2²·229 Posts
Cell Architecture Explained

Also have a look at:
http://www.blachford.info/computer/Cells/Cell0.html

Tony

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.