AMD64 on Solaris
Hi guys,
Since we found out that Linux isn't using the AMD processors and the shared memory efficiently on the machines I use for P-1, I am considering switching to OpenSolaris 11.1. Does anyone know how I could run my P-1 work on OpenSolaris? |
What models of AMD processor, chipset and memory sticks ?
|
4x AMD Opteron 6176 (4x 12 cores)
AMD SR5690/SR5670/SP5100 chipset
4x 8x4 GB DDR3 Registered ECC (Quad Channel); in theory this should be either Kingston DDR3-1600 or DDR3-1333 |
That's an interesting system :smile:
Which version of the Linux kernel and which distribution, BTW ? Linux powers ~95% of the world's most-powerful super-computers (and people are not necessarily using no-fee Linux distros on them, so the cost advantage does not necessarily hold), so workloads on which Linux does a bad job are supposed to be relatively infrequent. Does GMP-ECM build on OpenSolaris ? |
Hey, we are also very surprised that Linux is doing a bad job here (we have a lot of Intel systems working with no problem), but believe me, it does.
Kernel version is 2.6.32-131.17.1.el6.x86_64 (SL6.1), but we even tried with 3.2: no change... Actually we have more than one such system: we installed OpenSolaris 11.1 on one of them and got a factor 1.6-1.7 speed improvement on a Pi calculation. GMP can be compiled on it, so I think that GMP-ECM should also... I will try to compile mprime on it... I think it should work!? |
I would guess that the substantial performance changes are a result of Solaris having better default NUMA handling: have you tried playing around with numactl under linux?
I had problems with the kernel moving jobs away from the processor that their memory is attached to; starting them with 'taskset -c X numactl -l' ensures that the job stays on processor X and has its memory allocated from processor X's pool, which can help quite a bit. 'numactl -i 0-7' will allocate memory interleaved across all eight memory controllers, which may be better for jobs that want to use lots of memory from a single thread. |
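For anyone launching several pinned jobs, here is a minimal Python sketch of the two launch styles described above. It only assembles the command lines and does not run them. The helper names and the `./my_job` program are hypothetical placeholders; the `taskset -c`, `numactl -l` and `numactl -i` options are the ones quoted in the post.

```python
import shlex


def local_pinned(cpu, command):
    """Build argv for: taskset -c <cpu> numactl -l <command...>.

    Pins the job to one CPU and asks for memory from that CPU's local node.
    """
    return ["taskset", "-c", str(cpu), "numactl", "-l"] + list(command)


def interleaved(nodes, command):
    """Build argv for: numactl -i <nodes> <command...>.

    Spreads the job's memory across the given nodes, e.g. "0-7".
    """
    return ["numactl", "-i", str(nodes)] + list(command)


if __name__ == "__main__":
    # "./my_job" is a hypothetical program name, used only for illustration.
    print(shlex.join(local_pinned(3, ["./my_job"])))
    print(shlex.join(interleaved("0-7", ["./my_job"])))
```

The resulting lists can be handed to `subprocess.run` on a machine where `taskset` and `numactl` are installed.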
Just out of curiosity, what are your compiler options under OpenSolaris - GCC-only or also SunStudio?
|
[QUOTE=fivemack;319412]I had problems with the kernel moving jobs away from the processor that their memory is attached to; starting them with 'taskset -c X numactl -l' ensures that the job stays on processor X and has its memory allocated from processor X's pool, which can help quite a bit.
'numactl -i 0-7' will allocate memory interleaved across all eight memory controllers, which may be better for jobs that want to use lots of memory from a single thread.[/QUOTE]
Amazing! Thanks a lot! But... since I have 4 CPUs (each with quad-channel memory), why eight memory controllers?
[QUOTE=ewmayer;319522]Just out of curiosity, what are your compiler options under OpenSolaris - GCC-only or also SunStudio?[/QUOTE]
I use(d) SunStudio, with -fast -library=sunperf -xipo=2 -xtarget=barcelona |
On a Magny Cours system you don't quite have four CPUs with quad-channel memory controllers; you have eight CPUs with dual-channel memory controllers, packaged two to a socket.
If you do 'numactl --hardware' then you get the node distances table
[code]
node distances:
node   0   1   2   3   4   5   6   7
  0:  10  16  16  22  16  22  16  22
  1:  16  10  22  16  22  16  22  16
  2:  16  22  10  16  16  22  16  22
  3:  22  16  16  10  22  16  22  16
  4:  16  22  16  22  10  16  16  22
  5:  22  16  22  16  16  10  22  16
  6:  16  22  16  22  16  22  10  16
  7:  22  16  22  16  22  16  16  10
[/code] |
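To see which nodes are "near" and which are "far" from a given node, the table can be parsed with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the text is pasted from the post rather than read from a live `numactl --hardware` call. As far as I know, the values are relative costs reported by the firmware, with 10 meaning local memory, rather than a count of cycles.

```python
DISTANCES_TEXT = """\
node distances:
node   0   1   2   3   4   5   6   7
  0:  10  16  16  22  16  22  16  22
  1:  16  10  22  16  22  16  22  16
  2:  16  22  10  16  16  22  16  22
  3:  22  16  16  10  22  16  22  16
  4:  16  22  16  22  10  16  16  22
  5:  22  16  22  16  16  10  22  16
  6:  16  22  16  22  16  22  10  16
  7:  22  16  22  16  22  16  16  10
"""


def parse_distances(text):
    """Return {node: [distance to node 0, distance to node 1, ...]}."""
    rows = {}
    for line in text.splitlines():
        head, sep, tail = line.partition(":")
        # Only rows of the form "<node number>: d0 d1 ..." carry distances.
        if sep and head.strip().isdigit():
            rows[int(head)] = [int(value) for value in tail.split()]
    return rows


def nodes_at(rows, node, distance):
    """List the nodes whose distance from `node` equals `distance`."""
    return [other for other, d in enumerate(rows[node]) if d == distance]


if __name__ == "__main__":
    rows = parse_distances(DISTANCES_TEXT)
    for node in sorted(rows):
        print(node, "near:", nodes_at(rows, node, 16), "far:", nodes_at(rows, node, 22))
```

In this table every node has four neighbours at distance 16 and three at distance 22, so where a job's memory lives relative to its CPU makes a difference.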
[QUOTE=fivemack;319628]On a Magny Cours system you don't quite have four CPUs with quad-channel memory controllers; you have eight CPUs with dual-channel memory controllers, packaged two to a socket.
[/QUOTE]
Of course: [URL]http://www.anandtech.com/print/2978[/URL]
I thought the "package" was seen as one processor. Hmmm... Thanks a lot!
[QUOTE=fivemack;319628] If you do 'numactl --hardware' then you get the node distances table [/QUOTE]
Is node distance measured in CPU cycles or HT-link cycles? |