#1 |
Mar 2003
Braunschweig, Germany
2·113 Posts |
Some Opteron news...
I just got my paper copy of the German publication c't magazine. Andreas Stiller ran some benchmarks on an Opteron 244 system. Prime95 v22.12 W32 on the 1.8 GHz Opteron showed about half the performance of a P IV 3.06 GHz. There was also a hint that George Woltman is already working on some Opteron optimizations. I am still undecided whether I should wait for the Atestosteron 64 ;) The Optimisteron is far too expensive for me *g* Tau |
#2 |
P90 years forever!
Aug 2002
Yeehaw, FL
2²·13·157 Posts |
The only optimization I plan is to prefetch 64-byte cache lines instead of 128-byte cache lines. This will be a modest improvement, but it looks like the P4 will be the CPU of choice for some time to come.
|
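As a rough illustration of the prefetch change described above, the following Python sketch lists the byte offsets at which one prefetch per cache line would be issued over a buffer. It is only a model and not Prime95's code, which is written in assembly; the function name `prefetch_offsets` and the 4096-byte buffer are made up for the example. The 128-byte and 64-byte line sizes are the ones mentioned in the post.

```python
def prefetch_offsets(buffer_bytes, line_bytes):
    """Byte offsets where one prefetch per cache line would be issued.

    Hypothetical helper for illustration only: it models the stride of
    the prefetches, it does not prefetch anything.
    """
    return list(range(0, buffer_bytes, line_bytes))


BUFFER_BYTES = 4096  # arbitrary example size, not taken from Prime95

offsets_128 = prefetch_offsets(BUFFER_BYTES, 128)  # 128-byte lines
offsets_64 = prefetch_offsets(BUFFER_BYTES, 64)    # 64-byte lines

# Halving the line size doubles the number of prefetches needed
# to cover the same amount of data.
print(len(offsets_128), len(offsets_64))
```

With a stride of 128 bytes on a machine with 64-byte lines, every other line would never be prefetched, which is why the stride has to follow the line size.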
#3 |
Aug 2002
111₁₀ Posts |
Hi,
George, I'm now writing some SSE2 code for Glucas with Opterons in mind. I see I still have a chance to reduce the gap between Glucas and Prime95 if you don't touch the code too much ;). Regards. Guillermo. |
#4 |
P90 years forever!
Aug 2002
Yeehaw, FL
2²·13·157 Posts |
An excellent review at http://www.xbitlabs.com/articles/cpu/display/athlon64.html does not bode well for the Opteron. From the conclusion:
Athlon 64 is not very successful in traditional calculating tasks, such as scientific calculations |
#5 |
Aug 2002
3·37 Posts |
Hi,
> http://www.xbitlabs.com/articles/cpu/display/athlon64.html does not bode well for the Opteron.
I'm afraid that most of these tests were made using 32-bit software written for the Pentium 4. Yes, the low core frequency is a problem, but the Opteron also has 8 additional 128-bit registers, which could reduce register pressure (not to mention its 16 64-bit integer registers). I think that when software begins to use all these advantages the gap will shrink drastically. IMHO, AMD urgently needs a good compiler. Guillermo |
#6 |
Dec 2002
1010₂ Posts |
I might be able to come up with some actual hammer hardware to test code optimizations.
|
#7 | |
Aug 2002
157₈ Posts |
Hi,
Quote:
Guillermo. |
|
#8 | |
Apr 2003
Berlin, Germany
169₁₆ Posts |
Quote:
The cause of the Athlon 64's lower performance in this review is the code. An Athlon can "peephole"-optimize the P3/P4 code it is fed a little in its schedulers, but that can't work wonders if the available resources aren't used efficiently. We know that running P3 code on a P4, and vice versa, can cause significant differences in performance, and this was one reason for the P4's poor performance in its first months on the market.
I'd also recommend these reviews for their detailed information on FPU/FFT speed, SSE2, and so on:
Ace's Hardware has a good review (see page 14 for compiled C-code FFT and other tests): http://www.aceshardware.com/read.jsp?id=55000251
tecchannel (German) also looks closely at SSE2, SMP and so on: http://www.tecchannel.de/hardware/1164/index.html
Regards, DDB
BTW, 1.4 GHz Opterons are around $280 and will come down over the next months. But overclocked 1700+ chips ($50-$60) running at 2.2 GHz or more (at 1.5-1.6 V) are also welcome for computation ;) |
|
#9 |
P90 years forever!
Aug 2002
Yeehaw, FL
2²·13·157 Posts |
From the same Ace's Hardware article, the graph on this page http://www.aceshardware.com/read.jsp?id=55000253 shows a huge difference in L2 cache bandwidth. Prime95 reads and writes a lot of data to the L2 cache, so this could have a significant impact.
Without having an Opteron on hand, I think the biggest problem the Opteron has competing with a P4 running prime95 is raw clock speed. Both have a theoretical throughput of one FPU mul and one FPU add per clock cycle. Since both are doing a pretty good job of approaching this theoretical limit (a P4 reaches about 55% of the theoretical maximum FPU throughput), the Opteron cannot overcome the 3.06 vs 1.8 GHz speed disadvantage. |
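The clock-speed argument above can be put into numbers with a few lines of Python. This is only back-of-the-envelope arithmetic using the figures from the post (one FPU mul plus one FPU add per cycle, 3.06 GHz vs 1.8 GHz, about 55% efficiency on the P4); the function name `peak_gflops` and the constant names are invented for the example, and no figure is assumed for the Opteron's efficiency.

```python
def peak_gflops(clock_ghz, flops_per_cycle=2):
    """Peak rate assuming one FPU mul plus one FPU add per clock cycle."""
    return clock_ghz * flops_per_cycle


P4_CLOCK_GHZ = 3.06
OPTERON_CLOCK_GHZ = 1.8
P4_EFFICIENCY = 0.55  # fraction of peak the post quotes for the P4

p4_peak = peak_gflops(P4_CLOCK_GHZ)
opteron_peak = peak_gflops(OPTERON_CLOCK_GHZ)
p4_sustained = P4_EFFICIENCY * p4_peak

print(f"P4 peak:        {p4_peak:.2f} GFLOPS")
print(f"Opteron peak:   {opteron_peak:.2f} GFLOPS")
print(f"Peak ratio:     {p4_peak / opteron_peak:.2f}")
print(f"P4 at 55%:      {p4_sustained:.2f} GFLOPS")
```

Under these assumptions the P4's peak is 1.7 times the Opteron's, and the P4's sustained rate at 55% (roughly 3.4 GFLOPS) already comes close to the Opteron's theoretical peak of 3.6 GFLOPS.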
#10 | ||
Aug 2002
3×37 Posts |
Quote:
George, how many loads/stores and clock cycles could you have saved with 8 more registers?
Quote:
BTW, is there any advantage to the 64-bit integer multiply in factoring tasks? The Opteron takes only 4 clocks for a 64x64=128-bit mul, and imuls have a throughput of only one clock.
||
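To show where a 64x64=128-bit multiply would come into play in factoring, here is a minimal Python sketch of the standard test for whether a candidate q divides the Mersenne number 2^p - 1, namely whether 2^p mod q equals 1. It is not taken from Prime95 or Glucas; the helper names `mul_64x64_128` and `divides_mersenne` are hypothetical, and Python's big integers stand in for the hardware multiply. The point is that for q below 2^64, every squaring step is exactly one widening 64x64 multiply followed by a reduction.

```python
MASK64 = (1 << 64) - 1


def mul_64x64_128(a, b):
    """Return the (high, low) 64-bit halves of the 128-bit product a * b.

    Stand-in for a hardware widening multiply; illustrative only.
    """
    assert 0 <= a <= MASK64 and 0 <= b <= MASK64
    product = a * b
    return product >> 64, product & MASK64


def divides_mersenne(p, q):
    """True if q divides 2**p - 1, using left-to-right binary exponentiation."""
    assert p >= 1 and 1 < q <= MASK64
    result = 1
    for bit in bin(p)[2:]:
        # Squaring a residue below 2**64 needs a full 128-bit product.
        high, low = mul_64x64_128(result, result)
        result = ((high << 64) | low) % q
        if bit == "1":
            # Multiplying by the base 2 is just a doubling.
            result = (result * 2) % q
    return result == 1


# 2**11 - 1 = 2047 = 23 * 89
print(divides_mersenne(11, 23), divides_mersenne(11, 89), divides_mersenne(11, 47))
```

On a 32-bit CPU the same squaring has to be assembled from several narrower multiplies, which is the potential saving the question is about.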
#11 | |
P90 years forever!
Aug 2002
Yeehaw, FL
2²·13·157 Posts |
Quote:
When I wrote the P4 SSE2 code, my tests on small snippets of assembly code showed that I could completely ignore the L1 cache. That is, the P4's L2 bandwidth and its ability to schedule reads far enough in advance let prime95 run out of the L2 cache nearly as fast as out of the L1 cache (which is a good thing, since the L1 cache is so small). I don't remember what the Opteron's L1 cache size is, but if it is a decent size you could further reduce the L2 cache reads and writes by juggling the assembly code to process more data while it is in the L1 cache. The downside is that code complexity goes up a little and it may be hard to avoid store-forwarding penalties. |
|