|
|
#34 |
|
Dec 2010
Monticello
5×359 Posts |
|
|
|
|
|
|
#35 |
|
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3·29·83 Posts |
I don't know about any supercomputers, but Tesla cards with a 515 GFLOPS peak run about $1500 a pop.
Server card: http://compeve.com/video-cards/pci-e...85b-633246-001
Workstation card: http://www.tigerdirect.com/applicati...?EdpNo=6391103
Stats: http://en.wikipedia.org/wiki/Nvidia_Tesla
They both run in a PCIe 2.0 x16 slot. Now, $1500 for a 1% performance increase isn't bad (assuming GIMPS throughput is 50 TFLOPS, which admittedly is a bit low). |
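For anyone who wants to check the 1% figure, here is a minimal sketch of the arithmetic. It uses only the numbers quoted in the post; the 50 TFLOPS total is the poster's own rough assumption, and the calculation compares a card's peak rate against the project's sustained throughput, so it is an upper bound at best. The function names are made up for illustration.

```python
# Figures quoted in the post above. The 50 TFLOPS project total is the
# poster's stated assumption, not a measured value.
CARD_PEAK_GFLOPS = 515.0
CARD_PRICE_USD = 1500.0
GIMPS_THROUGHPUT_GFLOPS = 50_000.0  # 50 TFLOPS expressed in GFLOPS


def throughput_gain_percent(card_gflops: float, total_gflops: float) -> float:
    """Percentage increase in total throughput from adding one card."""
    return 100.0 * card_gflops / total_gflops


def price_per_gflops(price_usd: float, gflops: float) -> float:
    """Purchase cost in dollars per GFLOPS of peak performance."""
    return price_usd / gflops


gain = throughput_gain_percent(CARD_PEAK_GFLOPS, GIMPS_THROUGHPUT_GFLOPS)
cost = price_per_gflops(CARD_PRICE_USD, CARD_PEAK_GFLOPS)
print(f"gain: {gain:.2f}%  cost: ${cost:.2f} per peak GFLOPS")
```

Changing `GIMPS_THROUGHPUT_GFLOPS` shows how sensitive the percentage is to the assumed project total.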
|
|
|
|
|
#36 |
|
Bamboozled!
"๐บ๐๐ท๐ท๐ญ"
May 2003
Down not across
27026₈ Posts |
I have recently been looking at low-power computation and have purchased an mbed system to play with. It is a 96MHz ARM processor, memory, a micro-USB socket, some LEDs and a reset button, all built into a standard 40-pin DIP package. The whole thing takes 100mW. It's also easy to program in C and/or C++. Roughly speaking, the compute performance is comparable with a good PC of around 1995 vintage.
I then started seeing what else is available in the ARM range and came across these beasties: TI AM3703. If I read it correctly, you get a 1GHz 32-bit CPU for around USD 15 and it draws about 1W. They come in a 15mm square package. It seems to me that a PCI board could easily hold 16 of them, a significant amount of memory and any necessary glue, perhaps including ethernet and/or USB and/or another ARM for system control. What you would then have is a snazzy little system for learning real parallel computing because the interconnect and topology could be entirely under software control. Computational power should be useful but not astounding --- comparable with a 4GHz 4-core x86 processor perhaps. What would be much more interesting would be to build boards with 100 or 128 of them on each side ...
Comments?
Paul |
|
|
|
|
|
#37 |
|
Dec 2010
Monticello
5×359 Posts |
Let's see... you are saying 16W to match a 4-core x86 CPU that probably pulls 100W. At least approximately... very interesting... I like the power efficiency. PCs aren't terribly energy efficient in lots of ways. All of that branch prediction circuitry and speculative execution must use some power!
I have to say I think the on-board memory and interconnect are going to be the biggest issue. You'll need memory to run FFTs for LL tests, but quite a bit of interconnect to run factoring algorithms. How do you think your card will do compared to a processor on Blue Gene, an NVIDIA GTX 560 Ti, or your other favorite supercomputer, in terms of J/GFLOP or J/GHz-day? |
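A rough sketch of the power comparison being made here, taking the thread's premise at face value: 16 ARM chips at about 1W each are assumed to match one 4-core x86 CPU at about 100W. Both wattages are the posters' estimates, equal throughput is an assumption rather than a measurement, and the helper names are invented for illustration.

```python
# Estimates taken from the posts above; none of these are measurements.
ARM_CHIPS = 16
ARM_WATTS_EACH = 1.0
X86_WATTS = 100.0
SECONDS_PER_DAY = 86_400


def board_power_watts(chips: int, watts_each: float) -> float:
    """Total draw of the ARM chips alone (memory and glue logic ignored)."""
    return chips * watts_each


def joules_per_day(watts: float) -> float:
    """Energy used by running continuously for one day at the given power."""
    return watts * SECONDS_PER_DAY


arm_watts = board_power_watts(ARM_CHIPS, ARM_WATTS_EACH)

# If both systems finish the same work in the same time, the energy ratio
# is simply the power ratio.
energy_ratio = joules_per_day(X86_WATTS) / joules_per_day(arm_watts)

print(f"ARM board: {arm_watts:.0f} W, {joules_per_day(arm_watts):,.0f} J/day")
print(f"x86 CPU:   {X86_WATTS:.0f} W, {joules_per_day(X86_WATTS):,.0f} J/day")
print(f"x86 uses {energy_ratio:.2f}x the energy for the same work")
```

Memory, interconnect and power-supply losses are left out, so the real gap would likely be smaller than this ratio suggests.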
|
|
|
|
|
#38 | |
|
Bamboozled!
"๐บ๐๐ท๐ท๐ญ"
May 2003
Down not across
10111000010110₂ Posts |
Quote:
Interconnect shouldn't be too hard --- the ARM chips have any number of I/O pins under software control. My guess is that each ARM will be comparable to each processor in a GPU in compute performance. A GPU has hundreds of them for a power budget of 0.3W each, say, so will outperform one of these cards many times over. OTOH, the programming model of a GPU is heavily constrained and it's very hard to get sustained compute performance. The real attraction of the idea, from my point of view, is that it might be an ideal educational tool for developing parallel algorithms and designing parallel computers.
Paul |
|
|
|
|
|
|
#39 |
|
Tribal Bullet
Oct 2004
DED₁₆ Posts |
Years ago some researchers put 8x100MHz StrongARM processors on a 33MHz PCI board, ostensibly for neural net related computations. I think their board got to the prototype stage with a few copies made, and the total power draw was well short of the PCI limit (25W).
Nowadays high-performance integer-only embedded processors run at 1+GHz with very low power, and come with lots of onboard cache and interfaces to high-performance DRAM (check out the Cortex family in this list). Most of them use the ARM architecture, although there are also high-performance MIPS models in the network processor space. It's possible the highest-performance MIPS chips are in the Loongson line. |
|
|
|
|
|
#40 | |
|
Jan 2008
France
1001010101₂ Posts |
Quote:
First, the Cortex-A8 FPU is non-pipelined. Second, it's only dual-issue, without out-of-order execution. Third, it's not 64-bit. Fourth, memory bandwidth is typically rather low because the target market doesn't require high bandwidth. So as a compute engine it's probably not a good thing (even from a perf/W point of view), but as an educational tool it might be fun for sure.
|
|
|
|
|
|
|
#41 | |
|
Bamboozled!
"๐บ๐๐ท๐ท๐ญ"
May 2003
Down not across
2×17×347 Posts |
Quote:
Second: Also correct.
Third: I confess I was benchmarking my mbed on problems which don't need 64-bit arithmetic.
Fourth: Correct per CPU. Give each CPU its own memory and the effective bandwidth is raised 16-fold. To a first approximation, anyway.
It would still make a nice educational tool IMO, so we're in agreement there.
Paul |
|
|
|
|
|
|
#42 | ||
|
Jan 2008
France
3×199 Posts |
Quote:
Quote:
|
||
|
|
|
|
|
#43 |
|
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
7221₁₀ Posts |
My Core i7-2600K draws about 75W at 3.5 GHz across 4 cores plus hyperthreads.
|
|
|
|
|
|
#44 |
|
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
1C35₁₆ Posts |
http://en.wikipedia.org/wiki/FLOPS#Cost_of_computing
According to that, these days we run about $1.80 per GFLOPS. That means all of GIMPS on current hardware is worth ~60,000 GFLOPS × $1.80 = $108,000. So if we're smart about it, with $2000 we could increase throughput by 1-2%. |
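The estimate above works out as follows. This is just the post's arithmetic written down, using its figures of $1.80 per GFLOPS and roughly 60,000 GFLOPS for the whole project; both are the poster's numbers, and the variable names are illustrative.

```python
# Figures from the post above; the project total is the poster's estimate.
USD_PER_GFLOPS = 1.80
GIMPS_GFLOPS = 60_000.0
BUDGET_USD = 2_000.0

# Replacement cost of the whole project's throughput at current prices.
project_value_usd = GIMPS_GFLOPS * USD_PER_GFLOPS

# Throughput the budget buys at the same price, as a share of the total.
bought_gflops = BUDGET_USD / USD_PER_GFLOPS
gain_percent = 100.0 * bought_gflops / GIMPS_GFLOPS

print(f"project value: ${project_value_usd:,.0f}")
print(f"${BUDGET_USD:,.0f} buys {bought_gflops:,.0f} GFLOPS, "
      f"a {gain_percent:.2f}% increase")
```

The result lands inside the 1-2% range quoted in the post, provided the hardware really can be bought at the list price per GFLOPS.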
|
|
|