interesting tools and compilers (for P4, Athlon, Opteron)
Hello,
Since I keep running into interesting software and sites, here is an overview of what could be useful for Prime95 development:

OProfile, a low-overhead profiler for Linux that uses the performance counters of Intel and AMD CPUs and can go down to instruction level: [url]http://oprofile.sourceforge.net/[/url]

AMD CodeAnalyst 2.1 beta (like VTune, but free; it has a pipeline simulator for various AMD CPUs plus performance-counter and timer-based profiling): [url]http://www.amd.com/us-en/Processors/DevelopWithAMD/0,,30_2252_3604,00.html[/url] (I used version 1.2 before to optimize my SSE/MMX Game of Life code.)

Portland Group compiler 5.0 beta, with support for the Opteron's 16 SSE2 and 16 integer registers. It should optimize better for the Opteron's schedulers and latencies than Intel C++ does. You can download a free version, with Opteron binaries, for Windows and Linux here: [url]http://www.pgroup.com/AMD64.htm[/url]

GCC 3.3 has better x86-64 support. A complete package (MingW versions) can be found here: [url]http://www.thisiscool.com/gcc33_mingw.htm[/url]

The compilers surely won't be useful for the handcrafted assembler, but one could look at what code they produce for key parts of the algorithm. More interesting would be a pipeline analysis of the hotspots. I don't know whether such a thing is currently possible for Intel-based systems, but simulating for Athlon/Opteron could give some clues.

EDIT: CodeAnalyst 2.1 doesn't have the nice graphical pipeline analyzer that was present in version 1.2; it is now a command-line tool. It's still very useful (because of its SSE2 and x86-64 support). To give an idea of what the analyzer looks like in 1.2, here's a screenshot. (You can get version 1.2 and some info by searching Google for "AMD Codeanalyst" and opening the cached page.) The colored boxes have meanings like "dispatch" (yellow), "execute" (green), "retiring" (grey), some stall (red border) and so on. Moving the mouse pointer over a box gives detailed info about what is happening with that instruction during that particular cycle.
[img]http://www.informatik.uni-rostock.de/~mw212/CodeAnalyst.jpg[/img] |
screenshot and more info added (I write this because there is no notification about my edit)
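To make the idea of looking at compiler output for key parts of the algorithm a bit more concrete, here is a minimal sketch of the kind of small kernel one could feed to each compiler. The function name `butterfly_pass` and the radix-2 butterfly itself are only illustrative assumptions; this is not code from Prime95.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a "key part of the algorithm": one radix-2
 * FFT butterfly pass over n complex values stored as split re/im arrays.
 * This is NOT Prime95's code, only a small kernel whose compiler output
 * is easy to read.
 *
 * For each i:  t = w[i] * b[i];  new a[i] = a[i] + t;  new b[i] = a[i] - t
 */
void butterfly_pass(double *a_re, double *a_im,
                    double *b_re, double *b_im,
                    const double *w_re, const double *w_im,
                    size_t n)
{
    size_t i;

    /* All arrays must be valid unless there is nothing to do. */
    assert(n == 0 || (a_re && a_im && b_re && b_im && w_re && w_im));

    for (i = 0; i < n; i++) {
        /* Complex multiply: t = w * b */
        double t_re = w_re[i] * b_re[i] - w_im[i] * b_im[i];
        double t_im = w_re[i] * b_im[i] + w_im[i] * b_re[i];
        /* Save the old a so that both outputs use the same input. */
        double x_re = a_re[i];
        double x_im = a_im[i];

        a_re[i] = x_re + t_re;
        a_im[i] = x_im + t_im;
        b_re[i] = x_re - t_re;
        b_im[i] = x_im - t_im;
    }
}
```

With GCC, something like `gcc -O2 -msse2 -mfpmath=sse -S kernel.c` (the file name is just an example) writes the generated assembly to `kernel.s`, which can then be compared with the output of the other compilers for the same source.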
|
Dresdenboy, your ID makes me think you work for AMD! In which case I'm almost ashamed to ask this question: "Does CodeAnalyst work on Intel CPUs - especially the pipeline analyzer?" I know it won't tell me anything about Intel's pipelines but it would be cool if it would help an Intel owner optimize his code for an AMD CPU.
|
Re: interesting tools and compilers (for P4, Athlon, Opteron
Hello,
Nice review! I'm still waiting for something like CodeAnalyst ... but for Linux :-(
[quote="Dresdenboy"] GCC 3.3 has better x86-64 support. A complete package (MingW versions) can be found here: [url]http://www.thisiscool.com/gcc33_mingw.htm[/url] [/quote]
I've had bad experiences with GCC 3.3 and SSE2 support. It is quite buggy when handling SSE2 code. At the moment I have to work with unstable CVS snapshots of GCC 3.4, which work without problems for Glucas with both Pentium 4 and Opteron support. Guillermo. |
[quote="Prime95"]Dresdenboy, your ID makes me think you work for AMD! In which case I'm almost ashamed to ask this question: "Does CodeAnalyst work on Intel CPUs - especially the pipeline analyzer?" I know it won't tell me anything about Intel's pipelines but it would be cool if it would help an Intel owner optimize his code for an AMD CPU.[/quote]
That's not even far-fetched ;) But it's not me who works there; it's one of my friends, who is a fab worker in Dresden. I'm just someone who has liked optimizing code since the 6502, 68k and 386 days. But since this work is often already being done for Intel CPUs these days, someone needs to do it (or help to do it) for AMD.

CodeAnalyst works on all x86 CPUs, except that event-based profiling isn't possible on Intel CPUs: CA checks whether certain events are available. Here at work (not AMD) there are some PIII and P4 systems. On a PIII the CodeAnalyst 1.2 pipeline analyzer works perfectly. You can select a CPU (K6-2, Athlon, Duron, Athlon XP/MP) and the multiplier for the simulation. That was a nice way to study SSE behaviour (and to find out that MMX is faster for AND, NOR etc.). I saw that the scheduler was often just full of ops because there were no integer ops in about 30 SSE instructions, and sometimes it had to wait for the load/store unit because a value that had just been stored was being reused.

In CodeAnalyst 2.1b, Athlon 64/Opteron code can also be simulated. Maybe I'll write a GUI some day if AMD isn't planning one. Opteron also has better options for doing SSE2 than Athlon XP has for SSE, because many of the important ops are now DirectPath or Double (mOP) decoded and don't fill up the issue ports like before, which would allow us to start some 64-bit mul or so. DDB |
An interesting fact about the Intel compiler:
[url]http://www.aceshardware.com/forum?read=95033881[/url] quotes the Inquirer as saying that people found that SSE2 code compiled [b]on[/b] a Xeon runs much faster on Opteron than the same code compiled on the Opteron itself (using the same options and target). A comment ([url]http://www.aceshardware.com/forum?read=95033983[/url]): [quote]If you compile on Opteron platform, the compiler detects the chip and produces poor performing code, so you have to compile on a P4. There is an IFDEF that basically says "If AMD chip detected, disable optimizations". The opportunity to insert that function is why Intel has been spending a ton of money buying out any compiler producers it can.[/quote] Well, for me at home (Athlon XP) the generated code is really fast. But I'll run a test and also compile it on a P4 to try the result at home. DDB |
Re: interesting tools and compilers (for P4, Athlon, Opteron
[quote="gbvalor"]Nice review!. I'm still waiting for something like CodeAnalyst ... but for linux :-([/quote]
Is VTune the same thing? [quote]To all Software Developers: INTEL INTRODUCES UPDATED VERSION OF VTUNE(TM) PERFORMANCE ANALYZER FOR LINUX INTEL(R) VTUNE(TM) PERFORMANCE ANALYZER V1.1 FOR LINUX We are excited to inform you that the VTune(TM) Performance Analyzer 1.1 for Linux* is available now for purchase, by download and on CD-ROM directly from Intel and also from Intel Software Development Product resellers worldwide. FEATURES The VTune analyzer for Linux* provides a fully native-Linux solution that allows you to reach higher levels of software performance on the latest 32-bit Intel processors, including the new Pentium(r) M processor component of Intel Centrino(TM) mobile technology, Intel(r) Xeon(TM) and Intel(r) Pentium(r) 4 processors. This new product provides a command-line capability that allows you to collect, analyze, and display performance data for your 32-bit Linux* applications, kernels and drivers. This product version highlights include: · Powerful, flexible native-Linux command line interface · Low intrusion system-wide profiling capability · Local event based sampling and call graph support · Support for multiple Red Hat* and SuSE* Linux distributions VTune(TM) Performance Analyzer 1.1 for Linux http://intel.m0.net/m/s.asp?HB8872253498X2397262X183625X EVALUATION AND PURCHASE Please visit us at http://www.intel.com/software/products to learn more about evaluation and purchase of the VTune Performance Analyzer 1.1 for Linux*. SUPPORT Every purchase of an Intel Software Development Product includes one year of support services, which provides access to Intel Premier Support and all product updates during that time. Premier Support includes online access to technical and application notes and documentation. INTEL SOFTWARE COLLEGE Also check out the Intel Software College course selections for application developers. Intel Software College offers high-quality training worldwide on Intel processors, platforms, tools and technologies. 
http://intel.m0.net/m/s.asp?HB8872253498X2397263X183625X Regards, Intel VTune(TM) Performance Analyzer Product Team [/quote] |
Re: interesting tools and compilers (for P4, Athlon, Opteron
Hello,
[quote="Xyzzy"][quote="gbvalor"]Nice review!. I'm still waiting for something like CodeAnalyst ... but for linux :-([/quote] Is VTune the same thing? [/quote]
It is [b]almost[/b] the same thing. The free evaluation is only valid for 7 days. By the time someone (like me, not very skilled) has learned to use VTune, the license has expired :-( . OTOH, I can't spend $699 on a VTune license. This is why I'm anxious to see something free for non-commercial purposes (like the Intel compilers). AMD people! Linux has been the OS with better and faster support for your new processors. It's time to offer developers performance tuners and compilers for your hardware ;) isn't it? Guillermo. |
Re: interesting tools and compilers (for P4, Athlon, Opteron
[quote="gbvalor"]Hello,
Nice review!. I'm still waiting for something like CodeAnalyst ... but for linux :-( [/quote]
Have a look at the previously mentioned OProfile for Linux, which offers counter-based profiling down to instruction level for both Intel and AMD. If some pipeline analysis is necessary, one could put some hotspot functions into a small Cygwin/MinGW app built with the same compiler (GCC/ICC) and analyse it under Windows. CodeAnalyst 2.1 doesn't even need a program database or debug info, but it's good to know the start address for a trace. I used CodeAnalyst's graphical interface to find a suitable start address by going down to disassembly level. That's possible without source code or a program database, but then you don't see the associated source lines or function names. While I profiled my Game of Life here, I also profiled Prime95, the Win2k kernel and other running code, which you can also look at in the disassembly view. Pipeline analysis of this Game of Life also shows that it runs ~40% faster on Opteron than on Athlon XP, clock for clock. Regards, DDB |
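A minimal sketch of such a small hotspot app could look like the following. Everything in it is a placeholder: `hotspot_kernel` is a made-up sum of squares standing in for the routine copied out of the real program, and `run_hotspot` and `STANDALONE_HARNESS` are names chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical hotspot: stands in for the routine copied out of the
 * real program. Here it is just a sum of squares. */
double hotspot_kernel(const double *x, size_t n)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += x[i] * x[i];
    return sum;
}

/* Calls the hotspot `repeats` times so that it dominates the run time.
 * Returns the last result and, if cpu_seconds is not NULL, stores the
 * elapsed CPU time there. */
double run_hotspot(const double *x, size_t n, unsigned long repeats,
                   double *cpu_seconds)
{
    /* Calling through a volatile function pointer keeps the compiler
     * from inlining the kernel and hoisting it out of the loop. */
    double (*volatile kernel)(const double *, size_t) = hotspot_kernel;
    double last = 0.0;
    unsigned long r;
    clock_t start, end;

    assert(n == 0 || x != NULL);

    start = clock();
    for (r = 0; r < repeats; r++)
        last = kernel(x, n);
    end = clock();

    if (cpu_seconds != NULL)
        *cpu_seconds = (double)(end - start) / CLOCKS_PER_SEC;
    return last;
}

#ifdef STANDALONE_HARNESS
/* Built only with -DSTANDALONE_HARNESS: a tiny driver for the profiler. */
int main(void)
{
    static double data[4096];
    double seconds, result;
    size_t i;

    for (i = 0; i < 4096; i++)
        data[i] = (double)i * 0.5;

    result = run_hotspot(data, 4096, 100000UL, &seconds);
    printf("result=%g  cpu time=%.3f s\n", result, seconds);
    return 0;
}
#endif
```

Built with `-DSTANDALONE_HARNESS` and the same optimization flags as the real program, the resulting executable spends nearly all of its time in the one function, so its start address is easy to find in the disassembly view.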
A short info about CodeAnalyst 2.1b:
As I found out, the command-line tools for pipeline analysis produce a binary file, which could be used to create a graphical visualisation like the one in CodeAnalyst 1.2. Currently it's only used to produce simulation reports. DDB |
Re: interesting tools and compilers (for P4, Athlon, Opteron
[quote="Dresdenboy"]
Have a look at the also mentioned Oprofile for Linux, which has counter based profiling down to instruction level both for Intel/AMD. [/quote]
OProfile looks very interesting, but I would need to compile a new kernel, so I must do things carefully. Thanks, Dresdenboy. Your suggestions have been useful to me. If you like, you can get and test the Glucas code at: http://sourceforge.net/projects/glucas There is still no release with the SSE2 code; it is currently only partially implemented, and you can download it from the CVS repository. Guillermo. |