2007-07-14, 10:03   #7
diep
Sep 2006
The Netherlands

30616 Posts

Quote:
Originally Posted by ewmayer
Depends very much on the application, and how much time/skill you possess.

To use some Mersenne-related examples: the big-FFT code used in LL testing benefits far more from careful high-level coding and data movement considerations than from ASM. (Note that ASM gave much more bang on the x86 in the early days of the project, when there were no really good HLL compilers for the x86). I can get about 2/3 the performance of George's hand-tuned Prime95 code using a simple MSVC build of my C code, and the nice thing is, the C code is portable to other platforms.

Now, for smaller snippets of code, however, a whiff of ASM can often get you a lot of bang for your buck.

The general rule is: if you have a relatively small amount of code which dominates your compute time (e.g. a critical inner-loop section or macro), then ASM is worth playing with.

In integer code the difference is much bigger, as the Core 2 can nowadays execute 4 operations per cycle. The difference you reported between your code and George's is influenced by two phenomena:

a) Entire compiler teams are ordered to do ANYTHING they can to turn exactly THAT kind of code into assembly that is as fast as possible (and even then they lose a lot of speed). So very professional people are optimizing code that you and I write over the weekend (that's when I have time for stuff like this, if at all).

You really should take point a) into account, big time. It is the biggest influence.

b) Right now processors are relatively weak in floating point if you look simply at the number of instructions per cycle they can execute (although those instructions operate on vectors, which makes them blazingly fast compared with single scalar integer instructions). This will change, of course. Before long, processors will run floating-point code as fast as integer code, which means there will then be more to gain from assembly there as well.

For my FFT on 64-bit integers, gcc produces code 50% slower than MSVC, and neither the gcc team nor the MSVC team has the code at the moment (though I'll most likely open-source it once it's a tad better). On the Core 2 I only have 64-bit OS X, not Windows, so I don't know the difference there; probably more than 50%.

Vincent