|
|
#1 |
|
Apr 2003
Berlin, Germany
19² Posts |
Hello,
since I keep running into interesting software and sites, here is an overview of what could be useful for Prime95 development:

OProfile, a low-overhead profiler for Linux that uses the performance counters of Intel/AMD CPUs and can go down to instruction level: http://oprofile.sourceforge.net/

AMD CodeAnalyst 2.1 beta (like VTune, but free; it has a pipeline simulator for various AMD CPUs plus performance-counter and timer-based profiling): http://www.amd.com/us-en/Processors/...2_3604,00.html (I used version 1.2 before to optimize my SSE/MMX game of life code.)

Portland Group compiler 5.0 beta, with support for the Opteron's 16 SSE2 and integer registers, which surely optimizes better for its schedulers and latencies than Intel C++ does. You can download a free version (with Opteron binaries) for Windows and Linux here: http://www.pgroup.com/AMD64.htm

GCC 3.3, which has better x86-64 support. A complete package (MinGW versions) can be found here: http://www.thisiscool.com/gcc33_mingw.htm

The compilers surely won't be of use for the handcrafted assembler, but one could look at the code they produce for key parts of the algorithm. More interesting would be a pipeline analysis of the hotspots. I don't know whether such a thing is possible for Intel-based systems right now, but simulating for Athlon/Opteron could give some clues.

EDIT: CodeAnalyst 2.1 doesn't have the nice graphical pipeline analyzer that was present in version 1.2. It is now a command line tool. It's still very useful (because of its SSE2 and x86-64 support). To give an idea of what it looks like in 1.2, here's a screenshot (you can get version 1.2 and some info by searching Google for "AMD Codeanalyst" and opening the cached page). The colored boxes have meanings like "dispatch" (yellow), "execute" (green), "retiring" (grey), some stall (red border) and so on. Moving the mouse pointer over a box gives detailed info about what is happening with that instruction during that particular cycle. http://www.informatik.uni-rostock.de...odeAnalyst.jpg |
|
|
|
|
|
#2 |
|
Apr 2003
Berlin, Germany
19² Posts |
Screenshot and more info added (I'm writing this because there is no notification about my edit).
|
|
|
|
|
|
#3 |
|
P90 years forever!
Aug 2002
Yeehaw, FL
17·487 Posts |
Dresdenboy, your ID makes me think you work for AMD! In which case I'm almost ashamed to ask this question: "Does CodeAnalyst work on Intel CPUs - especially the pipeline analyzer?" I know it won't tell me anything about Intel's pipelines but it would be cool if it would help an Intel owner optimize his code for an AMD CPU.
|
|
|
|
|
|
#4 | |
|
Aug 2002
3×37 Posts |
Hello,
Nice review! I'm still waiting for something like CodeAnalyst ... but for Linux :-( Quote:
Guillermo. |
|
|
|
|
|
|
#5 | |
|
Apr 2003
Berlin, Germany
551₈ Posts |
Quote:
CodeAnalyst works on all x86 CPUs, although event-based profiling isn't possible on the Intel ones: CA checks whether certain events are available. Here at work (not AMD) there are some PIII and P4 systems. On the PIII the CodeAnalyst 1.2 pipeline analyzer works perfectly. You can select a CPU (K6-2, Athlon, Duron, Athlon XP/MP) and the multiplier for the simulation.

That was a nice way to study SSE behaviour (and to find out that MMX is faster for AND, NOR etc.). I saw that the scheduler was often simply full of ops, because there were no integer ops in about 30 SSE instructions, and that it sometimes had to wait for the load/store unit because a stored value was being reused.

In CodeAnalyst 2.1b, Athlon 64/Opteron code can be simulated as well. Maybe I'll write a GUI some day if one isn't planned at AMD. Opteron also has better options for doing SSE2 than Athlon XP has for SSE, because many of the important ops are now directpath or double (mOP) decoded and don't fill up the issue ports like before, which would allow us to start some 64-bit mul or so. DDB |
|
|
|
|
|
|
#6 | |
|
Apr 2003
Berlin, Germany
19² Posts |
An interesting fact about the Intel compiler:
http://www.aceshardware.com/forum?read=95033881 quotes the Inquirer: people found out that SSE2 code compiled on a Xeon runs much faster on an Opteron than the same code compiled on the Opteron itself (using the same options and target). A comment (http://www.aceshardware.com/forum?read=95033983): Quote:
DDB |
|
|
|
|
|
|
#7 | ||
|
Aug 2002
20722₈ Posts |
Quote:
Quote:
|
||
|
|
|
|
|
#8 | ||
|
Aug 2002
3×37 Posts |
Hello,
Quote:
AMD people! Linux has been the OS with the better and faster support for your new processors. It's time to offer developers performance tuners and compilers for your hardware, isn't it? ;) Guillermo. |
||
|
|
|
|
|
#9 | |
|
Apr 2003
Berlin, Germany
19² Posts |
Quote:
If some pipeline analysis is necessary, one could put a few hotspot functions into a small Cygwin/MinGW app built with the same compiler (GCC/ICC) and analyse it under Windows. CodeAnalyst 2.1 doesn't even need a program database or debug info, but it's good to know the start address for a trace. I used CodeAnalyst's graphical interface to find a suitable start address by going down to disassembly level. That's possible without source code or a program database, but then you don't see the associated source lines or function names.

While profiling my game of life here, I also profiled Prime95, the Win2k kernel and other running code, all of which you can look at in the disassembled view. Pipeline analysis of this game of life also shows that it runs ~40% faster on Opteron than on Athlon XP, clock for clock.

Regards, DDB |
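To make the "small app around a hotspot" idea concrete, here is a minimal sketch in C. It is not taken from Prime95, Glucas or the game of life code mentioned above: `hotspot_kernel`, `run_harness`, the buffer size and the `HARNESS_STANDALONE` switch are all made-up names for illustration, and the loop body is only a placeholder for whatever routine you actually copy out. The standalone `main` prints the kernel's address, which may serve as a starting point when looking for a trace start address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define HARNESS_N 1024 /* hypothetical working-set size */

/* Placeholder for the hotspot copied out of the real program.
   It mixes shifts and logical ops and returns a checksum so the
   compiler cannot discard the work. */
uint64_t hotspot_kernel(const uint32_t *in, uint32_t *out, size_t n)
{
    uint64_t checksum = 0;
    assert(in != NULL && out != NULL); /* compiled out with -DNDEBUG */
    for (size_t i = 0; i < n; i++) {
        uint32_t v = in[i];
        out[i] = (v & (v << 1)) ^ (v >> 1);
        checksum += out[i];
    }
    return checksum;
}

/* Fills the input deterministically and calls the kernel `repeats`
   times, so a profiler or simulator sees a steady, repeatable load. */
uint64_t run_harness(unsigned repeats)
{
    static uint32_t in[HARNESS_N], out[HARNESS_N];
    uint64_t sum = 0;
    for (size_t i = 0; i < HARNESS_N; i++)
        in[i] = (uint32_t)(i * 2654435761u);
    for (unsigned r = 0; r < repeats; r++)
        sum += hotspot_kernel(in, out, HARNESS_N);
    return sum;
}

#ifdef HARNESS_STANDALONE
int main(void)
{
    /* Converting a function pointer to an integer is
       implementation-defined, but works with GCC on x86. */
    printf("hotspot_kernel starts at %p\n",
           (void *)(uintptr_t)&hotspot_kernel);
    printf("checksum: %llu\n", (unsigned long long)run_harness(100000));
    return 0;
}
#endif
```

Built with something like `gcc -O2 -DHARNESS_STANDALONE harness.c -o harness` (the file name is arbitrary), this gives a tiny executable to load into the analyzer. Compiling with `gcc -O2 -S` instead writes out the assembly the compiler generated for the kernel.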
|
|
|
|
|
|
#10 |
|
Apr 2003
Berlin, Germany
19² Posts |
A short note about CodeAnalyst 2.1b:
As I found out, the command line tools for pipeline analysis produce a binary file that could be used to create a graphical visualisation like the one in CodeAnalyst 1.2. Currently it's only used to produce simulation reports. DDB |
|
|
|
|
|
#11 | |
|
Aug 2002
3·37 Posts |
Quote:
Thanks, Dresdenboy. Your suggestions have been useful to me. If you like, you can get and test the Glucas code at: http://sourceforge.net/projects/glucas There is still no release with SSE2 code; it is only partially implemented so far, and you can download it from the CVS repository. Guillermo. |
|
|
|
|