#188 | |
|
Aug 2003
52 Posts |
Quote:
Bok |
#189 |
|
Apr 2003
Berlin, Germany
192 Posts |
No prob, thanks.
Here is also a win32 version of the program: http://www.informatik.uni-rostock.de...test_win32.zip (it should be run in the console to see the results).
Regards, DB/Matthias
#190 |
|
Aug 2003
52 Posts |
OK,
I've got it running XP Pro at the moment anyway, so I'll do that one first. Or do you need 64-bit Win 2003 for the test?
Bok
#191 |
|
Apr 2003
Berlin, Germany
192 Posts |
The result is OS-independent, so any of the versions will do. I made versions for different OSes only to make it easier for people to run the test quickly.
#192 |
|
Aug 2003
52 Posts |
DresdenBoy, these are the results from XP Pro running on the Opteron 1.8:
Running MMX/SSE/SSE2 reading speed tests...
78bf9ff
Time for reading 32kB from cache using MOVQ :2593l (12.64 Bytes per cycle)
Time for reading 4kB from cache using MOVQ :341l (12.01 Bytes per cycle)
Time for reading 32kB cache using MOVAPD :4169l (7.86 Bytes per cycle)
Time for reading 4kB from cache using MOVAPD :533l (7.68 Bytes per cycle)
Time for reading 32kB from cache using MOVDQA :4130l (7.93 Bytes per cycle)
Time for reading 4kB from cache using MOVDQA :533l (7.68 Bytes per cycle)
Time for reading 32kB from cache using MOVAPS :4117l (7.96 Bytes per cycle)
Time for reading 4kB from cache using MOVAPS :533l (7.68 Bytes per cycle)
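The bytes-per-cycle figures in this output can be reproduced by dividing the buffer size by the cycle count. A minimal sketch, assuming "32kB" and "4kB" mean 32768 and 4096 bytes and that the number before the trailing l is the cycle count (the function name and the table are illustrative, not taken from the test program's source):

```python
KB = 1024  # assumption: the test's "kB" is 1024 bytes

def bytes_per_cycle(buffer_bytes: int, cycles: int) -> float:
    """Average number of bytes read per CPU cycle for one pass over the buffer."""
    if cycles <= 0:
        raise ValueError("cycle count must be positive")
    return buffer_bytes / cycles

# (instruction, buffer size in kB, cycle count) copied from the output above
WIN32_RESULTS = [
    ("MOVQ", 32, 2593), ("MOVQ", 4, 341),
    ("MOVAPD", 32, 4169), ("MOVAPD", 4, 533),
    ("MOVDQA", 32, 4130), ("MOVDQA", 4, 533),
    ("MOVAPS", 32, 4117), ("MOVAPS", 4, 533),
]

for insn, size_kb, cycles in WIN32_RESULTS:
    rate = bytes_per_cycle(size_kb * KB, cycles)
    print(f"{insn:7s} {size_kb:2d}kB: {rate:.2f} bytes/cycle")
```

Under these assumptions the computed rates agree with the printed ones to two decimals, which suggests the program reports plain size divided by cycles.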
#193 |
|
Aug 2003
52 Posts |
And the results from Suse EL8.2 64-bit are:
Opteron64:~/readtst # ./readtest
Running MMX/SSE2 reading speed tests...
Time for reading 32k from L1 cache using MMX :2239 (14.64 Bytes per cycle)
Time for reading 32k from L1 cache using SSE2 :4116 (7.96 Bytes per cycle)
Time for reading 32k from L1 cache using IntSSE2 :4116 (7.96 Bytes per cycle)
Bok
#194 |
|
Apr 2003
Berlin, Germany
192 Posts |
Thanks. The results show that nothing has changed for these instructions in the revision C core. And please ignore the l's after the cycle counts in the Windows version; they are leftovers from the 64-bit code.
So we have to look for other alternatives to use in the code.
BTW, according to AMD, they want to sell a lot of Athlon 64s in the next months. I hope we can win many of those users if they know that they could get a "tuned" Prime95 client for Opteron.
And before I forget to post it: Detailed Architecture of AMDs 64bit Core
That should answer all remaining questions.
BTW, today is the official release of Athlon 64 and Athlon FX (maybe also Athlon 64 M). But I can't follow the events, since my colleagues and I have to attend a project workshop in Frankfurt over the next few days. That will take all our time, so my Opteron research will stall for some days.
Regards, Matthias/DB
#195 | |
|
P90 years forever!
Aug 2002
Yeehaw, FL
7,537 Posts |
From your article:
Quote:
#196 |
|
Apr 2003
Berlin, Germany
192 Posts |
If you go back to the main page (Chip-Architect), you'll find the articles in which Hans de Vries analyzed the Prescott core.
#197 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
7,537 Posts |
The Chip-Architect article says we should be able to load 16 bytes per cycle. Your tests indicate this is not happening.
Prime95 is slow on the Opteron because the FPU is starved for data. We need to figure out why your test gets only half the expected bandwidth. It would also be nice to run your tests reading data from the L2 cache. We may well have two separate problems.
Everything I've tried to resolve this data bandwidth problem has failed.
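To put numbers on "half the expected bandwidth", here is a small sketch that compares the rates posted earlier in the thread with the 16 bytes per cycle figure from the article. The helper name is illustrative, and the rates are the 32kB Windows results as posted:

```python
PEAK_BYTES_PER_CYCLE = 16.0  # figure quoted from the Chip-Architect article

# 32kB Windows results as posted earlier in the thread (bytes per cycle)
MEASURED = {"MOVQ": 12.64, "MOVAPD": 7.86, "MOVDQA": 7.93, "MOVAPS": 7.96}

def fraction_of_peak(rate: float, peak: float = PEAK_BYTES_PER_CYCLE) -> float:
    """Measured load rate as a fraction of the quoted peak rate."""
    return rate / peak

for insn, rate in MEASURED.items():
    print(f"{insn:7s} {fraction_of_peak(rate):.0%} of the quoted peak")
```

The 128-bit loads land at roughly half of the quoted peak, while the 64-bit MOVQ loads come noticeably closer to it.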
#198 |
|
Apr 2003
Berlin, Germany
192 Posts |
I included the source code in both versions, so you can change it to suit your needs. I will also extend and update the sources to do more tests.
The Chip-Architect article also gives reasons why it's not wise to use instructions on XMM registers which expect a different data format. It also states that memory operands for FP operations are fetched by the Int units and delivered to the FPU.
Did you also have a look at http://www.digit-life.com/articles2/...ily2-add0.html, http://www.digit-life.com/articles2/...ily2-add1.html and http://www.digit-life.com/articles2/...ily2-add2.html? The second of these has a lot of details about the behaviour of the L1/L2 caches on K8.
Recently I looked at an old preview of the Hammer by www.tecchannel.de. They ran their TecMem benchmark on the CPU (an engineering sample running at 800MHz) and reached up to 16 bytes/cycle using MOVDQA. The same tests on Opteron (using a newer version of TecMem) achieved only up to 8 bytes/cycle for 128-bit accesses, while 64-bit accesses read 16 bytes/cycle.
But the engineering sample's result could have a different explanation: MOVDQA has the same opcode as MOVQ, but extended by a 0x66 prefix. If a CPU doesn't understand the SSE2 instruction (like my Athlon XP), it will execute it as the MMX equivalent. In that case the bandwidth actually achieved is just half of the expected value. That could be the case for this engineering sample, because SSE2 could have been disabled for it.
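The prefix argument can be illustrated with a toy model. The encodings are the standard ones (MOVQ mm, m64 is 0F 6F /r and MOVDQA xmm, m128 is 66 0F 6F /r). The behaviour of a CPU without SSE2 is modelled as the post describes it, not verified on hardware, and all function names are hypothetical:

```python
# 32-bit mode encodings; ModRM byte 0x00 selects register 0 and memory operand [eax]
MOVQ_MM0_EAX = bytes([0x0F, 0x6F, 0x00])         # movq   mm0,  [eax]
MOVDQA_XMM0_EAX = bytes([0x66]) + MOVQ_MM0_EAX   # movdqa xmm0, [eax]

def bytes_loaded(encoding: bytes, cpu_has_sse2: bool) -> int:
    """Bytes moved by one load instruction in this toy model."""
    has_prefix = encoding[:1] == b"\x66"
    opcode = encoding[1:3] if has_prefix else encoding[:2]
    if opcode != b"\x0f\x6f":
        raise ValueError("only the 0F 6F load opcode is modelled")
    # Assumption taken from the post: without SSE2 the 0x66 prefix has no
    # effect, so the instruction behaves like the 8-byte MMX load.
    return 16 if (has_prefix and cpu_has_sse2) else 8

def actual_over_assumed(cpu_has_sse2: bool) -> float:
    """Bytes really moved per MOVDQA divided by the 16 bytes a benchmark would count."""
    return bytes_loaded(MOVDQA_XMM0_EAX, cpu_has_sse2) / 16

print(actual_over_assumed(cpu_has_sse2=True), actual_over_assumed(cpu_has_sse2=False))
```

In this model a benchmark that counts 16 bytes per MOVDQA overstates the bandwidth by a factor of two on a CPU that treats the instruction as MOVQ.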