AMD Opteron
Is there any performance advantage from running on an Opteron? Would the software have to be rewritten or recompiled to get an advantage from the 64-bitness? Are there plans to do this? Thanks.
Andy |
Not yet to all your questions :-)
Luigi |
As far as I've heard, no, because the "64-bitness" doesn't really affect Prime that much, since it relies on floating point operations. It may help indirectly, because other things that do take advantage of the 64-bit CPU (like the OS and other apps) will require less of its time, leaving more time for Prime. |
George does not have access to an Opteron to do any enhancements for it.
Based on existing benchmark testing I've seen around the web, the existing SSE2 code DOES work well with an Opteron, but is still somewhat slower than a P-IV due to relative clock rates, though fairly close *per* clock despite not having specifically Opteron-optimised code anywhere. George has indicated that there might be some small gains to make from optimizing the cache usage, but doesn't think it would make a BIG difference. Same answers for the Athlon64, since it's not yet released and AMD doesn't appear to be sending samples (yet) to anyone other than big manufacturers (esp. motherboard makers). |
Running an Opteron vs an Athlon on a 32-bit OS, there should be a few advantages.
SSE2 code will work. Even though the extra general-purpose registers available to 64-bit code can't be used, internally there should be more resources/temporary registers to work with. Each assembly-level instruction is broken down into many smaller internal instructions which take advantage of all available hardware resources (hidden registers, running multiple internal instructions in parallel, etc.). The extra hardware needed for the 64-bit general-purpose registers sits unused in 32-bit mode, which should allow better register management/register renaming.
The L2 cache is larger: an Athlon has 256K (the exception being some older, slower parts at 900 MHz and below, which had 512K), while the Opteron can have 512K or 1024K, so there should be fewer cache misses. Also, the cache bus width is double. |
There is an Opteron SMP article up at [url=http://www.amdmb.com/article-display.php?ArticleID=248&PageID=10]AMDMB.COM[/url] that has some Prime95 benchmarks. I don't know what version they tested with, but the benchmark results only go up to 1792K.
Selected benchmark figures (I chose the highest numbers listed for each FFT size):
[code:1]
256K  - 16.628
512K  - 35.543
1024K - 78.184
1792K - 149.181
[/code:1]
Which puts the Opteron 244 (1.8GHz) at about equal to a 1.6GHz-ish P4. While they cast Prime95 in an interesting light - "Prime 95 is a benchmark used to find Mersenne Prime numbers" - at least they provided a link to www.mersenne.org! :) |
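For anyone who wants to turn the AMDMB figures quoted above into a per-clock comparison, here is a minimal sketch. It assumes the timings are milliseconds per iteration (the post does not state a unit), and it takes the "1.6GHz-ish P4" estimate at face value. The function and variable names are made up for illustration.

```python
# Figures quoted in the post for the Opteron 244 (1.8 GHz), keyed by FFT size in K.
# ASSUMPTION: values are milliseconds per iteration; the post gives no unit.
OPTERON_CLOCK_HZ = 1.8e9
P4_EQUIV_CLOCK_HZ = 1.6e9  # the poster's rough "1.6GHz-ish P4" estimate

timings_ms = {256: 16.628, 512: 35.543, 1024: 78.184, 1792: 149.181}


def cycles_per_iteration(ms, clock_hz):
    """Clock cycles spent on one iteration that takes `ms` milliseconds."""
    return ms * 1e-3 * clock_hz


def clock_ratio(opteron_hz, p4_hz):
    """Opteron clocks spent per P4 clock when both finish in the same wall time."""
    return opteron_hz / p4_hz


for fft_k, ms in timings_ms.items():
    mcycles = cycles_per_iteration(ms, OPTERON_CLOCK_HZ) / 1e6
    print(f"{fft_k}K FFT: {ms} ms, about {mcycles:.1f} million Opteron cycles")

# If a 1.6 GHz P4 really matches a 1.8 GHz Opteron, the Opteron needs this
# many times more clocks for the same work:
print(f"clock ratio: {clock_ratio(OPTERON_CLOCK_HZ, P4_EQUIV_CLOCK_HZ):.3f}")
```

Under those assumptions the Opteron would need roughly one eighth more clocks than the P4 for the same iteration, which is the sense in which the two are "fairly close" per clock.
|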
If I'm not mistaken, the FPU of the Opteron has 3 modes. 1 is the normal mode, another is a hybrid mode and the last is a completely new way of doing things.
I'd expect that the classic FPU mode will be little better than an Athlon (perhaps extra cache, SSE, more FSB/memory bandwidth, and lower latency will help a little). But what about the other 2 modes? Won't these avoid the antiquated floating point stacks of the x86 architecture? Understandably, this will require a lot of re-coding as a downside. |
I believe you are mistaken about the FPU in the Opteron.
It is the same as the Athlon. The CPU/general instruction set is what has 3 modes.
32-bit mode uses the same instruction set as an Athlon (plus SSE2). Needs a 32-bit OS and 32-bit programs (existing programs work).
64-bit mode uses the new instruction set, with extra registers that are 64 bits wide (has SSE2). Needs a 64-bit OS and 64-bit programs (recompiled or new 64-bit programs).
Mixed 32/64-bit mode can run programs using either instruction set (has SSE2). Needs a 64-bit OS (not sure) and 32-bit or 64-bit programs (32-bit ones are existing programs). |
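The three modes listed above boil down to which combination of OS and program is running. AMD's own names for them are legacy mode and long mode, the latter with 64-bit and compatibility sub-modes. A rough sketch of that mapping follows; the function names are hypothetical, and the OS check is only a heuristic based on the machine string, not a reliable test.

```python
import platform
import struct


def pointer_bits():
    """Pointer width of the running process: 32 for a 32-bit build, 64 for a 64-bit build."""
    return struct.calcsize("P") * 8


def guess_os_is_64bit():
    """HEURISTIC (assumption): treat the OS as 64-bit if the machine name contains '64'."""
    return "64" in platform.machine()


def describe_mode(process_bits, os_is_64bit):
    """Map a program/OS combination onto the three modes described in the post."""
    if process_bits == 64:
        if os_is_64bit:
            return "64-bit mode"
        return "invalid: a 64-bit program needs a 64-bit OS"
    if os_is_64bit:
        return "mixed 32/64-bit mode (32-bit program on a 64-bit OS)"
    return "32-bit mode"


# Report the combination this interpreter happens to be running in.
print(describe_mode(pointer_bits(), guess_os_is_64bit()))
```

The point of the sketch is only that an existing 32-bit build keeps working in two of the three cases, while the extra registers are reachable in just one.
|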
[quote="dsouza123"]Running an Opteron vs an Athlon on a 32-bit OS, there should be a few advantages.
SSE2 code will work. Even though the extra general purpose registers available to 64-bit code aren't available, internally there should be more resources/temporary registers to work with. Each assembly level instruction is broken down into many smaller internal instructions which take advantage of all available hardware resources ( hidden registers, running multiple internal instructions in parallel, etc ). There is available extra unused hardware, needed for the extra general purpose registers, which should allow better register management/ register renaming. The L2 cache is larger, an Athlon has 256k, (exception some old slower 900 Mhz and less had 512k). The Opteron can have 512k or 1024k. So less cache misses. Also the cache bus width is double.[/quote]
x87 code is nearly as fast as SSE2 with packed doubles, because an SSE2 operation on 2 double-precision values in most cases translates into 2 internal ops. The register file (88 regs) is still the same. And Opteron and Athlon are really fast on x87 code (unlike the P4). At least SSE2 doesn't need to explicitly calculate with the first register, but the FXCH needed in x87 code doesn't eat any resources besides one of three instruction issue slots. The number of internal registers depends on the max. possible load of operations.
The Opteron has two new pipeline stages to arrange incoming code in a better way for decoding and issuing. Currently some code alignment and instruction reordering optimizations are necessary (by hand or by compiler) to create an optimal instruction stream.
The bigger L2 cache won't help much, since the memory interface is much faster (max. 6.4GB/s vs. 3.2GB/s, with 40-60% higher latency on the Athlon) and the cache hit rate of Prime95 is already very good (over 97% on my AXP with 256kB L2, AFAIR). But there are surely optimizations possible by changing code which is currently optimized for the P4. There are some things which the P4 does fast and the Opteron doesn't, and vice versa. |
The P4 was a dog until George managed to get one in order to do development on it. So I think it is too early to count the Opteron/Athlon64 out.
|
[quote="trif"]The P4 was a dog until George managed to get one in order to do development on it. So I think it is too early to count the Opteron/Athlon64 out.[/quote]There was big potential for doing such optimizations since normal FPU code (x87) is very slow on P4 and SSE2 is really fast.
On the Opteron, x87 is really fast and SSE2 is not really faster ;) (it can't do more adds and muls per cycle than x87). But SSE2 has other advantages: you can use the registers freely for calculation (no need to use the first as one operand), and in 64-bit mode you have 16 SSE2 regs instead of 8. |