Merrimac & Stanford Streaming Supercomputer Project

The recent stir about programming GPUs, especially the Brook for GPUs language, reminded me of another project under way at Stanford that is related as far as stream programming is concerned.

This is the article where I first saw it, and I feel it best captures how Merrimac relates to other supercomputer projects:
EE TIMES--Novel processor stirs petascale controversy -- http://www.eetimes.com/semi/news/OEG20031124S0033

I feel this project, whether or not it succeeds, is a valuable example of a different paradigm. Basically, they are making a strong push to innovate in the core silicon of supercomputers rather than build them from clusters of commercial off-the-shelf processors. They identify memory bandwidth and exposing parallelism as the most important places to focus attention for supercomputer performance on scientific applications.

Quote:
 Merrimac uses stream architecture and advanced interconnection networks to give an order of magnitude more performance per unit cost than cluster-based scientific computers built from the same technology. Organizing the computation into streams and exploiting the resulting locality using a register hierarchy enables a stream architecture to reduce the memory bandwidth required by representative applications by an order of magnitude or more. Hence a processing node with a fixed bandwidth (expensive) can support an order of magnitude more arithmetic units (inexpensive). This in turn allows a given level of performance to be achieved with fewer nodes (a 1-PFLOPS machine, for example, with just 8,192 nodes) resulting in greater reliability, and simpler system management. Merrimac is designed to be a streaming scientific computer that can be scaled from a $20K 2 TFLOPS workstation to a $20M 2 PFLOPS supercomputer.
A number of documents can be found here: http://merrimac.stanford.edu/resources.html
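
To put the figures in the quote above in perspective, here is a quick back-of-the-envelope sketch. It uses only the numbers from the quote (1 PFLOPS on 8,192 nodes, $20K for 2 TFLOPS, $20M for 2 PFLOPS). The decimal prefixes (1 TFLOPS = 1e12 FLOPS) and the helper names are my own assumptions, not anything taken from the Merrimac documents.

```python
# Rough arithmetic on the figures quoted above.
# Assumption: decimal prefixes, i.e. 1 TFLOPS = 1e12 and 1 PFLOPS = 1e15 FLOPS.
TFLOPS = 1e12
PFLOPS = 1e15


def per_node_flops(total_flops, nodes):
    """Average performance each node must deliver (hypothetical helper)."""
    return total_flops / nodes


def dollars_per_tflops(cost_dollars, total_flops):
    """Cost of one TFLOPS at a given machine size (hypothetical helper)."""
    return cost_dollars / (total_flops / TFLOPS)


# "a 1-PFLOPS machine ... with just 8,192 nodes"
node = per_node_flops(1 * PFLOPS, 8192)

# "$20K 2 TFLOPS workstation" and "$20M 2 PFLOPS supercomputer"
workstation = dollars_per_tflops(20_000, 2 * TFLOPS)
supercomputer = dollars_per_tflops(20_000_000, 2 * PFLOPS)

print(f"per-node performance: {node / 1e9:.1f} GFLOPS")
print(f"workstation:   ${workstation:,.0f} per TFLOPS")
print(f"supercomputer: ${supercomputer:,.0f} per TFLOPS")
```

Under those assumptions each node works out to roughly 122 GFLOPS, and the cost per TFLOPS is the same $10,000 at both ends of the range, so the quote is in effect describing cost that scales linearly with performance.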

