mersenneforum.org ARM-based servers...

2012-08-23, 18:43   #1
debrouxl

Sep 2009

2·3·163 Posts

ARM-based servers...

I've stumbled across the Baserock Slab ( http://www.baserock.com/servers/specifications ) and MiTAC GFX series ( http://www.mitac.com/Business/GFX_servers.html ), two ARM-based servers announced in the past few weeks.

* the Baserock Slab is 8 x (quad-core ARMv7-A @ 1.33 GHz + 2 GB ECC DDR3 + 30-120 GB SSD) + 2 x 10 Gbps SFP+ Ethernet + 4 x 1 Gbps "classical" Ethernet in a 1U rack of half depth. That's nothing to sneer at, especially with a 260W PSU.
* the MiTAC GFX is 64 quad-core ARMv7-A @ 1.6 GHz + 32 HDDs in 4U rack. Not sure about the amount of RAM, since the indicated 16 GB seems low for a 256-core system - perhaps it's 16 GB for each of the 8 "compute modules" ?

The performance per watt of ARM-based gear is clearly significantly higher than that of x86_64-based gear... Future 32-bit and 64-bit ARM cores will improve, but so will x86_64 cores, so the ratio might not change that much.

How would people around here estimate the crunching abilities of those platforms ?

High-end GPUs are probably too far above x86_64 CPUs at TF on Mersenne numbers for these ARM servers to dethrone them; but I think that servers like the Baserock Slab could prove good NFS machines, if memory bandwidth approaches that of x86_64 machines (and that might be a big "if"):

* with 512 MB of RAM per core, and 1 GB already announced for the next few months (maybe they'll raise the amount of RAM per core further later, I don't know), 15e wouldn't be a problem;
* 5 Gbps internal + 2x10 Gbps external network interconnect could prove attractive for MPI post-processing.
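The RAM-per-core figures in this post can be checked with a few lines of Python. This is only a rough sketch: the function name `ram_per_core_mb` is made up for illustration, and the "16 GB per compute module" reading of the MiTAC spec is the guess from the post, not a confirmed figure.

```python
def ram_per_core_mb(total_ram_gb: float, cores: int) -> float:
    """Return RAM per core in MB, taking 1 GB = 1024 MB."""
    return total_ram_gb * 1024 / cores

# Baserock Slab: each of the 8 nodes is a quad-core with 2 GB of RAM.
slab_mb = ram_per_core_mb(2, 4)

# MiTAC GFX: 64 quad-cores = 256 cores.
# Reading 1: the quoted 16 GB is for the whole 4U system.
mitac_whole_system_mb = ram_per_core_mb(16, 64 * 4)
# Reading 2 (a guess): 16 GB for each of the 8 compute modules.
mitac_per_module_mb = ram_per_core_mb(16 * 8, 64 * 4)

print(f"Baserock Slab:            {slab_mb:.0f} MB/core")
print(f"MiTAC GFX (16 GB total):  {mitac_whole_system_mb:.0f} MB/core")
print(f"MiTAC GFX (16 GB/module): {mitac_per_module_mb:.0f} MB/core")
```

Under the second reading the MiTAC box would land on the same 512 MB per core as the Slab, which is why that reading looks more plausible than 64 MB per core.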
2012-08-23, 19:01   #2
fivemack
(loop (#_fork))

Feb 2006
Cambridge, England

1100100101100₂ Posts

For crunching heavy-duty FP, I am not convinced that currently-available ARM processors are flops-per-watt competitive with Ivy Bridge.

http://fullshovel.wordpress.com/2012...a-vs-c-on-arm/ runs scimark; yes, I appreciate this is a series of toy-sized benchmarks, but the Pandaboard has a pretty awful memory controller and so I'd expect it to do relatively better on things running out of cache. On one of the two cores on a Pandaboard ES, the matrix-multiply does 150MIPS; on one of the four on a Sandy Bridge it does 1770MIPS. A pandaboard running flat-out uses about six watts; I think one active core on an SNB can get by with less than sixty. The test with the best ratio gets 240MIPS on 1xARM and 1150 on 1xSNB.

http://www.phoronix.com/scan.php?pag..._cluster&num=4 does something similar; running an embarrassingly parallel benchmark over 12 cores on six pandaboards, he gets 53 Mops at 30.4 watts. In http://www.phoronix.com/scan.php?pag...cluster&num=11 he runs a slightly different benchmark on four threads of one i7/3770K and gets 277 Mops at 107 watts.

ARM's selling point if you're not fully loading the machines is irrefutable. But if you are, a single i7/3770K - which will run happily from a 260W PSU even if you put a dual-port 10GbE PCIe card in it - offers performance comparable to the whole baserock slab.

And the ARM server machines (the other one you might want to stumble across is http://www.boston.co.uk/solutions/viridis/default.aspx ) are at present boutique items designed to give software developers a time-to-market advantage, and so are really a lot more expensive than straight IVB boxes; the Boston Viridis FAQ gives an implied price of $3000 for a single card with four quad-core ARMs on it (IE comparable performance to one dual-core IVB), though I'll admit that that system has an exciting between-cards interconnect for which you'd have to pay five hundred dollars for an Infiniband QDR HCA and another $500-per-port for the switch.

I have just spent £103.28 buying myself an Odroid-X (Exynos 4412 so quad 1.4GHz Cortex-A9, 1G memory, though only 100Mbps ethernet - effectively a Galaxy S3 without the display) from http://www.hardkernel.com/renewal_20...=G133999328931 to see if I can get gnfs-lasieve4I15e running. This will inevitably cause an Exynos 5250 devboard to be released before my Odroid-X turns up from Gyeonggi Korea: consider this a public service.

To get more than 4GB total memory you will have to wait for Cortex-A15-based chips (eg the Exynos 5250, OMAP 543x, Tegra 4) because the memory controller for the A9 only has 32-bit physical addresses. 4GB on a package-on-package (the cellphone chips, and therefore the cheap devboards) is unlikely to show up before 2013.

Last fiddled with by fivemack on 2012-08-23 at 19:19
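The throughput-per-watt comparison in this post can be sketched in Python as below. The figures are the ones quoted above; the 6 W and 60 W values are the post's own rough estimates (60 W being an upper bound), so the ratios are only indicative. The helper name `per_watt` is made up for illustration.

```python
def per_watt(throughput: float, watts: float) -> float:
    """Throughput divided by power draw, in the benchmark's own unit per watt."""
    return throughput / watts

# scimark, single core; power figures are the estimates from the post
# (whole Pandaboard at about 6 W, one active SNB core at under 60 W).
matmul_arm = per_watt(150, 6)
matmul_snb = per_watt(1770, 60)
best_ratio_arm = per_watt(240, 6)
best_ratio_snb = per_watt(1150, 60)

# Phoronix runs, measured at the wall. The two benchmarks differ slightly,
# so this is not a strict like-for-like comparison.
cluster_arm = per_watt(53, 30.4)   # six Pandaboards, 12 cores
cluster_ivb = per_watt(277, 107)   # one i7/3770K, four threads

print(f"matmul:     ARM {matmul_arm:.1f}  vs SNB {matmul_snb:.1f} per W")
print(f"best ratio: ARM {best_ratio_arm:.1f}  vs SNB {best_ratio_snb:.1f} per W")
print(f"cluster:    ARM {cluster_arm:.2f} vs IVB {cluster_ivb:.2f} Mops/W")
```

With these assumptions the x86 part comes out ahead on the matrix multiply and on the cluster runs, while the single test with the best ARM ratio goes the other way.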
2012-08-23, 19:45   #3
debrouxl

Sep 2009

1111010010₂ Posts

Quote:
 But if you are, a single i7/3770K - which will run happily from a 260W PSU even if you put a dual-port 10GbE PCIe card in it - offers performance comparable to the whole baserock slab.
ACK.

Quote:
 though I'll admit that that system has an exciting between-cards interconnect for which you'd have to pay five hundred dollars for an Infiniband QDR HCA and another $500-per-port for the switch.
That's pretty expensive indeed... maybe, in the mid-term, they'll have no choice but to lower their price tags, due to non-IB interconnects such as the one in the Boston Viridis?

Quote:
 I have just spent £103.28 buying myself an Odroid-X (Exynos 4412 so quad 1.4GHz Cortex-A9, 1G memory, though only 100Mbps ethernet - effectively a Galaxy S3 without the display) ... to see if I can get gnfs-lasieve4I15e running.
Good
I might get my hands on one such system in the next few months as well.

Quote:
 To get more than 4GB total memory you will have to wait for Cortex-A15-based chips
Yup, in fact I knew that, but I failed to mention it explicitly in the "later".
Cortex-A15 chips will bring large-RAM support to the 32-bit ARM architecture, and 64-bit ARM chips (probably not before 2014, sadly) won't have that 4 GB limit at all.
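The 4 GB ceiling follows directly from the width of the physical address. A minimal Python sketch is below; the 32-bit figure is from the thread, while the 40-bit figure for the Cortex-A15 (its large physical address extension) is not stated in the thread and is added here as an assumption for comparison.

```python
def addressable_gib(physical_address_bits: int) -> float:
    """Bytes reachable with the given address width, expressed in GiB."""
    return 2 ** physical_address_bits / 2 ** 30

# Cortex-A9: 32-bit physical addresses, as stated in the thread.
a9_limit_gib = addressable_gib(32)

# Cortex-A15: 40-bit physical addresses (assumption, not from the thread).
a15_limit_gib = addressable_gib(40)

print(f"32-bit physical addresses: {a9_limit_gib:.0f} GiB")
print(f"40-bit physical addresses: {a15_limit_gib:.0f} GiB")
```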

2012-08-24, 14:04   #4
ldesnogu

Jan 2008
France

2×5²×11 Posts

Quote:
 Originally Posted by fivemack http://www.phoronix.com/scan.php?pag..._cluster&num=4 does something similar; running an embarrassingly parallel benchmark over 12 cores on six pandaboards, he gets 53 Mops at 30.4 watts. In http://www.phoronix.com/scan.php?pag...cluster&num=11 he runs a slightly different benchmark on four threads of one i7/3770K and gets 277 Mops at 107 watts.
The problem with that setup is that the ARM cluster is made of 6 full boards. That setup is said to idle at 15-16W and observed peak power is 31W.

For the example you give, let's say idle is 15W; subtracting that from the 31W peak leaves about 16W attributable to the benchmark, for 55.2 Mop/s. So 3.45 Mop/s/W.

The Ivy Bridge system idles at 41W and draws 107W on the benchmark. So 277.9 Mop/s for 66W above idle, which gives 4.21 Mop/s/W.

Of course, this assumes that idling is really idling on both platforms.
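The idle-subtracted calculation above, as a small Python sketch. The numbers are the ones used in this post; the function name `marginal_mops_per_watt` is made up for illustration, and the wall-power ratios are added only for contrast.

```python
def marginal_mops_per_watt(mops: float, load_w: float, idle_w: float) -> float:
    """Throughput divided by the power drawn above idle."""
    return mops / (load_w - idle_w)

# Six-Pandaboard cluster: 31 W peak, taking 15 W as idle.
arm_marginal = marginal_mops_per_watt(55.2, 31, 15)

# Ivy Bridge system: 107 W on the benchmark, 41 W idle.
ivb_marginal = marginal_mops_per_watt(277.9, 107, 41)

# For contrast: the same throughput against total wall power.
arm_wall = 55.2 / 31
ivb_wall = 277.9 / 107

print(f"ARM cluster: {arm_marginal:.2f} Mop/s/W above idle, {arm_wall:.2f} at the wall")
print(f"Ivy Bridge:  {ivb_marginal:.2f} Mop/s/W above idle, {ivb_wall:.2f} at the wall")
```

Subtracting idle power narrows the gap between the two platforms, but with these figures it does not reverse it.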

Anyway, I think that for many FP-intensive tasks IVB would be more power efficient. Perhaps things will change with ARMv8 and proper FP SIMD support.

2012-08-24, 15:05   #5
fivemack
(loop (#_fork))

Feb 2006
Cambridge, England

2²×3²×179 Posts

Quote:
 Originally Posted by ldesnogu The problem with that setup is that the ARM cluster is made of 6 full boards. That setup is said to idle at 15-16W and observed peak power is 31W.
There are a couple of startups who've tried to attack the problem of the high power consumption of idling x86 systems - Seamicro's SM10000-XE Sandy Bridge machine piles up low-voltage Xeons, avoids duplicating motherboard peripherals, and 'reduces the power consumed by the CPU by consolidating and powering down unused functions', though its web page only gives an 'average power consumption' figure and that's 3.5kW for 64 quad-cores.

I've not got a good handle on the power consumption of DRAM, though I've heard disconcertingly high figures on the order of one watt per gigabyte at idle - that gives a slightly unfair advantage to the unfortunately memory-constrained ARM systems.

Quote:
 Anyway, I think that for many FP-intensive tasks IVB would be more power efficient. Perhaps things will change with ARMv8 and proper FP SIMD support.
Yes, ARMv8 has double-precision SIMD, but it's roughly SSE2-level: operations on only 128 bits at a time.
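To put "only 128 bits at a time" in concrete terms, here is a tiny Python sketch counting double-precision lanes per vector operation. The 128-bit figures are from the post; the 256-bit AVX width on Ivy Bridge is not mentioned in the thread and is added here as an assumption for comparison.

```python
def doubles_per_vector(vector_bits: int, element_bits: int = 64) -> int:
    """Number of double-precision values that fit in one SIMD register."""
    return vector_bits // element_bits

armv8_lanes = doubles_per_vector(128)  # ARMv8 SIMD, per the post
sse2_lanes = doubles_per_vector(128)   # SSE2, per the post
avx_lanes = doubles_per_vector(256)    # AVX width: assumption, not from the thread

print(f"ARMv8 SIMD: {armv8_lanes} doubles per operation")
print(f"SSE2:       {sse2_lanes} doubles per operation")
print(f"AVX:        {avx_lanes} doubles per operation")
```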


All times are UTC.
