#89 | |
|
Nov 2003
2²×5×373 Posts |
Quote:
Nice. If I provide code (source if you like) and data could you run both a single-thread and double-thread benchmark of the lattice sieve on this machine?
#90 | |
|
"Oliver"
Mar 2005
Germany
11·101 Posts |
Quote:
But sadly it's not my machine :( I found these screenshots in a German hardware forum.
#91 |
|
Sep 2002
2×331 Posts |
There are now two companies making FPGA-based Opteron coprocessors.
http://www.eetimes.com/news/semi/sho...leID=188702712

The coprocessors plug directly into an empty CPU socket and can be dynamically reconfigured, thus permitting users to change logic configurations to better match the algorithms that need acceleration. DRC Computer Corp. and XtremeData Inc. are delivering programmable solutions that can accelerate time-critical algorithms. These coprocessors leverage the flexibility of Xilinx and Altera FPGAs, respectively, so that they can be configured to accelerate graphics, XML, floating point, video transcoding and other applications.

Both the DRC and XtremeData solutions are modules that combine an FPGA with static RAM, flash memory (XtremeData only), and interface logic to support 8- or 16-bit HyperTransport interfaces. DRC offers three versions of its module: the DRC100-L60ES and L60, which are based on the 60k logic cell LX60 Virtex 4 FPGA, and the DRC110-L160, which is based on the 152k logic cell LX160 FPGA.

The XD1000 from XtremeData employs Altera's largest Stratix II FPGA, the EP2S180...the company has several enhanced versions of XD1000 planned for future release. To develop the hardware-based algorithms XtremeData leverages Altera's SOPC Builder and C2H (C-language to hardware) tools as well as Altera's soft intellectual property blocks such as the NIOS processor core.

A full development system with a dual-socket motherboard and one XD1000 module sells for about $15,000 in small quantities; the XD1000 module sells for $6,500 apiece.
#92 |
|
Sep 2002
2×331 Posts |
A link with more details for the XtremeData coprocessor using Altera's FPGA.
http://www.altera.com/corporate/news...tml?f=hp&k=wn1

XtremeData has packaged the Stratix II EP2S180 device, the industry’s highest-density, highest-performance FPGA in production, onto a credit card-sized board that fits into the secondary CPU sockets of any 2P or 4P AMD Opteron processor-based motherboard. The XD1000 supports tight board-height form factors, including 1U servers, server blades and Advanced Telecom Computing Architecture (ATCA) platforms.

The XD1000 includes multiple HyperTransport interfaces that are 16 bits wide running at 3.2 Gbps. It also features a 128-bit-wide DDR333 memory interface, up to 8 Mbytes of high-speed SRAM and 32 Mbytes of flash memory. Additionally, XtremeData has several next-generation variants of XD1000 planned for future release. XtremeData used Altera’s SOPC Builder system integration tool and the Nios® II soft-core CPU to develop the XD1000.

The XD1000 uses a HyperTransport bus to achieve low-latency communication with the host AMD Opteron processor. This means that the traditional latency chain of CPU-to-north bridge-to-south bridge (via PCI interface)-to-FPGA has been reduced to a point-to-point CPU-to-FPGA link. Compared to competing I/O board systems, the XD1000 offers a more scalable solution. It gives access to more memory (via DIMM modules) and provides higher bandwidth and lower latency interconnects than north bridge solutions, at a much lower total cost of ownership. For example, FPGA-based hardware acceleration used in medical CT imaging runs the overall application 10 times faster when each 3-GHz AMD Opteron processor is coupled with an FPGA, resulting in significant system-level savings for power, space and cost.

The XtremeData coprocessing development system is a complete design environment. It includes a 2P AMD Opteron processor-based PC with an XD1000 coprocessor module, a reference design containing HyperTransport and DDR interfaces and a JTAG download cable for configuring the FPGA and probing internal FPGA signals using Altera’s SignalTap® II embedded logic analyzer. Altera and XtremeData are committed to jointly developing libraries and tools that can be easily used by application developers. The two companies are also working with several leading universities to make the XD1000 available as a research platform to enable additional innovations.

Last fiddled with by dsouza123 on 2006-06-08 at 18:17
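As a rough sanity check on the "128-bit-wide DDR333 memory interface" quoted above, here is a minimal Python sketch of the theoretical peak bandwidth. It assumes DDR333 means about 333.33 million transfers per second (a 166.67 MHz clock at double data rate) and counts 1 GB as 10^9 bytes; the function and constant names are made up for illustration. The HyperTransport figure is left out because the quoted text does not say whether 3.2 Gbps is per pin or per link.

```python
def peak_bandwidth_gb_s(width_bits: int, transfers_per_s: float) -> float:
    """Theoretical peak of a parallel interface in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * transfers_per_s / 1e9


# Assumption: DDR333 = ~333.33 million transfers/s (166.67 MHz, double data rate).
DDR333_TRANSFERS_PER_S = 1e9 / 3

# 128-bit interface as stated in the press release.
ddr_peak = peak_bandwidth_gb_s(128, DDR333_TRANSFERS_PER_S)
print(f"128-bit DDR333 peak: {ddr_peak:.2f} GB/s")
```

This is an upper bound only; sustained bandwidth on real hardware would be lower.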
#93 |
|
Apr 2003
Berlin, Germany
551₈ Posts |
AMD presented more details on MPF:
http://www.thechannelinsider.com/pri...ls/191008.aspx
A photo of the beast: http://news.com.com/2300-1006_3-6124...4500&subj=news
Most interesting for Prime95 should be these features (many are already known, but several details were not):
#94 |
|
Sep 2002
2×331 Posts |
Other AMD features (reductions):
The L1 cache drops from 128KB (64KB data and 64KB code) to 64KB (32 and 32), and the L2 drops from 1024KB to 512KB. The 64KB L1 is a surprising change, since the Athlon/Opteron chips have had 128KB since the beginning. The 512KB L2 is within the range of previous L2 sizes, which went from 256KB to 512KB and, more recently, 1024KB.
#95 | |
|
Apr 2003
Berlin, Germany
169₁₆ Posts |
Quote:
The confusion might be caused by an AMD slide showing the cache infrastructure, where only 64 kB of L1 per core is shown. But this is actually the infrastructure for the data cache. See here: http://epscontest2.home.comcast.net/...ad/Slide51.JPG
About these 64 kB they say: "keeps most critical data", "2 128 bit data paths" (L1D+L1I will have four 128 bit data paths in Barcelona), "2 loads per cycle" (same as for K8 L1D).
#96 | |
|
Apr 2003
Berlin, Germany
169₁₆ Posts |
Confirmation of the 128 kB L1 from Johan (of AnandTech):
Quote:
#97 |
|
Apr 2003
Berlin, Germany
19² Posts |
The "Software Optimization Guide for AMD Family 10h Processors" is available now:
http://www.amd.com/us-en/assets/cont...docs/40546.pdf
Besides all the stuff already known, there is some information that is new even to me, such as that the L3 cache is bandwidth adaptive: it runs at lower latency and lower bandwidth when there is less traffic, and increases bandwidth (while also increasing latency) once cache traffic reaches some threshold. Most SSEn instructions are now decoded more efficiently, allowing more of them to reside in the scheduler, so that it can exploit ILP better.
I've got an idea for how to find out how Prime95 might run on K10 compared to K8. The availability of this manual makes it possible to run some simulations, which should come closer to reality in the labs than any SWAG.