mersenneforum.org  

Old 2011-09-20, 21:41   #34
Christenson
 
 
Dec 2010
Monticello

1795₁₀ Posts

Quote:
Originally Posted by jasonp View Post
What a great thread. Is that Salem guy still protesting by turning off more computers?
If he is, I'll have to protest by turning another, new one on!
Old 2011-09-21, 06:20   #35
Dubslow
Basketry That Evening!
 
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3×29×83 Posts

I don't know about any supercomputers, but Tesla cards with 515 GFLOPS peak run about $1500 a pop.
http://compeve.com/video-cards/pci-e...85b-633246-001 Server card
http://www.tigerdirect.com/applicati...?EdpNo=6391103 Workstation card

Stats
http://en.wikipedia.org/wiki/Nvidia_Tesla

They both run in a PCIe 2.0 x16 slot. Now, $1500 for a roughly 1% performance increase isn't bad (assuming GIMPS throughput is 50 TFLOPS, which admittedly is a bit low).
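
For anyone who wants to check the 1% figure, here is a minimal sketch of the arithmetic. The 515 GFLOPS and 50 TFLOPS numbers are the ones quoted above; the function name is just made up for illustration. It uses the card's peak rating, so a real-world gain would likely be smaller.

```python
def throughput_gain_percent(card_gflops, project_tflops):
    """Percentage increase in project throughput from adding one card."""
    project_gflops = project_tflops * 1000.0  # 1 TFLOPS = 1000 GFLOPS
    return 100.0 * card_gflops / project_gflops

# Figures from the post: one 515 GFLOPS (peak) card, 50 TFLOPS assumed for GIMPS.
gain = throughput_gain_percent(515, 50)
print(f"One card adds about {gain:.2f}% to the assumed project throughput")
```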
Old 2011-09-22, 09:48   #36
xilman
Bamboozled!
 
 
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

2²·2,663 Posts

I have recently been looking at low-power computation and have purchased an mbed system to play with. It is a 96MHz ARM processor, memory, a micro-USB socket, some LEDs and a reset button, all built into a standard 40-pin DIP package. The whole thing takes 100mW. It's also easy to program in C and/or C++. Roughly speaking, the compute performance is comparable with a good PC of around 1995 vintage.

I then started seeing what else is available in the ARM range and came across these beasties: TI am3703. If I read it correctly, you get a 1GHz 32-bit cpu for around USD15 and it draws about 1W. They come in a 15mm square package.

It seems to me that a PCI board could easily hold 16 of them, a significant amount of memory and any necessary glue, perhaps including ethernet and/or USB and/or another ARM for system control. What you would then have is a snazzy little system for learning real parallel computing because the interconnect and topology could be entirely under software control. Computational power should be useful but not astounding --- comparable with a 4GHz 4-core x86 processor perhaps.
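
A rough sketch of what those per-chip figures add up to for a 16-chip board. The inputs are the approximate numbers from the post; the totals cover the CPUs only (no memory, glue logic or board cost), and the names are purely illustrative.

```python
# Figures quoted in the post; treat them as rough estimates.
CHIPS_PER_BOARD = 16
WATTS_PER_CHIP = 1.0   # "about 1W"
USD_PER_CHIP = 15.0    # "around USD15"
CLOCK_GHZ = 1.0        # 1GHz per chip

def board_totals(chips=CHIPS_PER_BOARD, watts=WATTS_PER_CHIP,
                 usd=USD_PER_CHIP, ghz=CLOCK_GHZ):
    """Sum the per-chip figures over the whole board (CPUs only)."""
    return {
        "power_w": chips * watts,
        "cpu_cost_usd": chips * usd,
        "aggregate_ghz": chips * ghz,
    }

print(board_totals())
```

The aggregate clock comes out at 16 GHz, the same as 4 cores at 4 GHz. Treating a cycle on one chip as equal to a cycle on the other is an assumption, and a generous one.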

What would be much more interesting would be to build boards with 100 or 128 of them on each side ...

Comments?

Paul
Old 2011-09-22, 12:18   #37
Christenson
 
 
Dec 2010
Monticello

5×359 Posts

Let's see... you are saying 16W to match a 4-core x86 CPU that probably pulls 100W. At least approximately... very interesting... I like the power efficiency. PCs aren't terribly energy efficient in lots of ways. All of that branch prediction circuitry and speculative execution must use some power!

Have to say I think the on-board memory and interconnect is going to be the biggest issue. You'll need memory to run FFTs for LL tests, but quite a bit of interconnect to run factoring algorithms.
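
For reference, the LL (Lucas-Lehmer) test itself is tiny; a minimal sketch in Python follows. It leans on Python's built-in big integers for the squaring. Production GIMPS software does that squaring with FFT-based multiplication instead, which is where the memory demand comes from. This sketch is for illustration only and is far too slow for exponents of current interest.

```python
def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime. p must itself be prime."""
    if p == 2:
        return True  # 2**2 - 1 = 3 is prime; the loop below needs odd p
    m = (1 << p) - 1  # the Mersenne number being tested
    s = 4
    for _ in range(p - 2):
        # One big squaring modulo m per iteration; this is the step
        # that real implementations carry out with an FFT.
        s = (s * s - 2) % m
    return s == 0

print([p for p in (3, 5, 7, 11, 13, 17, 19) if lucas_lehmer(p)])
```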

How do you think your card will do compared to a processor on Blue Gene, an NVIDIA GTX 560 Ti, or your other favorite supercomputer, in terms of J/GFLOP or J/GHz-day?
Old 2011-09-22, 12:30   #38
xilman
Bamboozled!
 
 
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

299C₁₆ Posts

Quote:
Originally Posted by Christenson View Post
Let's see... you are saying 16W to match a 4-core x86 CPU that probably pulls 100W. At least approximately... very interesting... I like the power efficiency. PCs aren't terribly energy efficient in lots of ways. All of that branch prediction circuitry and speculative execution must use some power!

Have to say I think the on-board memory and interconnect is going to be the biggest issue. You'll need memory to run FFTs for LL tests, but quite a bit of interconnect to run factoring algorithms.

How do you think your card will do compared to a processor on Blue Gene, an NVIDIA GTX 560 Ti, or your other favorite supercomputer, in terms of J/GFLOP or J/GHz-day?
It's the low power aspect that first attracted me.

Interconnect shouldn't be too hard --- the ARM chips have any number of I/O pins under software control.

My guess is that each ARM will be comparable to each processor in a GPU in compute performance. A GPU has hundreds of them for a power budget of 0.3W each, say, so will outperform one of these cards many times over. OTOH, the programming model of a GPU is heavily constrained and it's very hard to get sustained compute performance.

The real attraction of the idea, from my point of view, is that it might be an ideal educational tool for developing parallel algorithms and designing parallel computers.


Paul
Old 2011-09-22, 13:05   #39
jasonp
Tribal Bullet
 
 
Oct 2004

DD1₁₆ Posts

Years ago some researchers put 8x100MHz StrongARM processors on a 33MHz PCI board, ostensibly for neural net related computations. I think their board got to the prototype stage with a few copies made, and the total power draw was well short of the PCI limit (25W).

Nowadays high-performance integer-only embedded processors run at 1+GHz with very low power, and come with lots of onboard cache and interfaces to high-performance DRAM (check out the Cortex family in this list). Most of them use the ARM architecture, although there are also high-performance MIPS models in the network processor space. It's possible the highest-performance MIPS chips are in the Loongson line.
Old 2011-09-23, 09:49   #40
ldesnogu
 
 
Jan 2008
France

3·181 Posts

Quote:
Originally Posted by xilman View Post
I then started seeing what else is available in the ARM range and came across these beasties: TI am3703. If I read it correctly, you get a 1GHz 32-bit cpu for around USD15 and it draws about 1W. They come in a 15mm square package.

[...]Computational power should be useful but not astounding --- comparable with a 4GHz 4-core x86 processor perhaps.
The computational power of 16 Cortex-A8@1 GHz will be much lower than 4 x86@4 GHz.

First, Cortex-A8 FPU is non-pipelined.
Second, it's only dual-issue, without out-of-order execution.
Third, it's not 64-bit.
Fourth, memory bandwidth is typically rather low because the target market doesn't require high bandwidth.

So as a compute engine it's probably not a good thing (even from a perf/W point of view), but as an educational tool it might be fun for sure.
Old 2011-09-23, 14:06   #41
xilman
Bamboozled!
 
 
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

24634₈ Posts

Quote:
Originally Posted by ldesnogu View Post
The computational power of 16 Cortex-A8@1 GHz will be much lower than 4 x86@4 GHz.

First, Cortex-A8 FPU is non-pipelined.
Second, it's only dual-issue, without out-of-order execution.
Third, it's not 64-bit.
Fourth, memory bandwidth is typically rather low because the target market doesn't require high bandwidth.

So as a compute engine it's probably not a good thing (even from a perf/W point of view), but as an educational tool it might be fun for sure.
First: correct.
Second: Also correct.
Third: I confess I was benchmarking my mbed on problems which don't need 64-bit arithmetic.
Fourth: Correct per cpu. Give each cpu its own memory and the effective bandwidth is raised 16-fold. To a first approximation, anyway.

Still make a nice educational tool IMO, so we're in agreement there.

Paul
Old 2011-09-23, 14:15   #42
ldesnogu
 
 
Jan 2008
France

3×181 Posts

Quote:
Originally Posted by xilman View Post
Fourth: Correct per cpu. Give each cpu its own memory and the effective bandwidth is raised 16-fold. To a first approximation, anyway.
Good point. I was just pointing out the weaknesses of existing ARM chips, which alas can't saturate their memory interface due to poor memory controllers. It's sad to see a 2-core Cortex-A9 cluster only able to reach 300 or 400 MB/s when it could be several GB/s.
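
To put a rough number on the 16-fold estimate, here is a small sketch that multiplies a per-chip bandwidth by the chip count. It plugs in the 300 to 400 MB/s range mentioned above for a Cortex-A9 cluster. Applying that figure to each chip on the proposed board is purely an assumption (those chips are Cortex-A8 parts and may well differ), and the function name is illustrative.

```python
def aggregate_bandwidth_mb_s(per_chip_mb_s, chips=16):
    """Total bandwidth when every chip has its own private memory.

    First approximation only: ignores any traffic between chips.
    """
    return per_chip_mb_s * chips

for per_chip in (300, 400):  # MB/s, assumed per-chip figure
    total = aggregate_bandwidth_mb_s(per_chip)
    print(f"{per_chip} MB/s per chip -> {total / 1000:.1f} GB/s across 16 chips")
```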

Quote:
Still make a nice educational tool IMO, so we're in agreement there.
Definitely.
Old 2011-09-24, 01:04   #43
Dubslow
Basketry That Evening!
 
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·29·83 Posts

My Core i7-2600K is drawing about 75W at 3.5 GHz over 4 cores + hyperthreads.
Old 2011-09-25, 06:37   #44
Dubslow
Basketry That Evening!
 
Dubslow's Avatar
 
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

1C35₁₆ Posts

http://en.wikipedia.org/wiki/FLOPS#Cost_of_computing
According to that, these days we run about $1.80 per GFLOPS. That means all of GIMPS on current hardware is worth ~60,000 GFLOPS × $1.80 = $108,000. So if we're smart about it, with $2000 we could increase throughput by 1-2%.
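
The same estimate as a quick sketch, for anyone who wants to plug in other numbers. The $1.80 per GFLOPS and 60,000 GFLOPS figures are the ones used above; the function names are made up for illustration, and the result assumes new hardware can actually be bought at that average price.

```python
USD_PER_GFLOPS = 1.80    # figure taken from the Wikipedia table
GIMPS_GFLOPS = 60_000    # assumed total GIMPS throughput

def replacement_cost_usd(gflops=GIMPS_GFLOPS, usd_per_gflops=USD_PER_GFLOPS):
    """Cost of buying the whole project's throughput at today's price."""
    return gflops * usd_per_gflops

def gain_percent(budget_usd):
    """Throughput increase, in percent, that a given budget would buy."""
    return 100.0 * budget_usd / replacement_cost_usd()

print(f"Replacement cost: ${replacement_cost_usd():,.0f}")
print(f"$2000 buys about {gain_percent(2000):.2f}% more throughput")
```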

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.