#1 |
Mar 2003
34 Posts |
![]()
I have downloaded the Intel manual (Pentium Instruction Set, vol. 2A-2B),
files 25366613.pdf and 25366713.pdf. How can I find the number of clock cycles (ticks) each ASM instruction takes? An acquaintance of mine said that ADC takes 14(!!!) cycles on the P4, but ADD only 1. I cannot believe that! |
![]() |
![]() |
#2 |
2²·3²·43 Posts |
![]()
Try downloading this:
http://www.agner.org/assem/pentopt.pdf |
![]() |
#3 |
Apr 2003
Berlin, Germany
101101001₂ Posts |
![]() |
![]() |
![]() |
#4 |
336₁₀ Posts |
![]()
The last Intel processor for which you could add up per-instruction times to get a reasonably accurate execution time was the 486.
Since the Pentium, it's a lot more complicated, and with the P4, it's unbelievably complicated. The data dependencies between instructions are far more important than the instructions themselves. With pipelined superscalar processors, there can be over a dozen instructions being executed at the same time, and over a hundred instructions in various stages of execution. The difference between the latency (time from start to finish of a single instruction) and throughput (total time for a lot of instructions divided by the number of instructions) is enormous. Imagine a juggler trying to throw three balls a second when the balls spend at least 20 seconds in the air each and some stick to the ceiling for much longer. There are a lot of balls in the air.

None of this is black magic, but it takes at least several chapters of a book to explain if you don't have the background. (Search at amazon.com for "superscalar" for a few useful books.)

As for ADC taking 14 cycles on a P4, I believe it. Intel's processors tend to do common things really fast but slow down horribly on some uncommon things. (Remember the original Pentium Pro and segment register manipulation in large model 16-bit code?) The costs are exacerbated by the extremely deep P4 pipeline, and ADC is uncommon. This is one reason why some people prefer AMD processors; they don't have as many nasty corner cases. Intel can get away with making people recompile their applications. |
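To put rough numbers on the latency/throughput distinction above, here is a toy model in Python. It is only a sketch under idealized assumptions: a pipeline that can start a fixed number of instructions per cycle, with no stalls, cache misses, or branch mispredictions. The function names and the cycle counts used in the example are made up for illustration and are not measurements of any real processor; only the juggler's figures (3 balls per second, 20 seconds in the air) come from the post.

```python
def chain_cycles(n, latency):
    """Cycles for n instructions when each one needs the previous result.

    Nothing can overlap, so the full latency is paid every time.
    """
    return n * latency


def independent_cycles(n, latency, per_cycle):
    """Cycles for n independent instructions on an idealized pipeline
    that can start `per_cycle` new instructions every cycle."""
    if n == 0:
        return 0
    issue_cycles = -(-n // per_cycle)  # ceil(n / per_cycle)
    # The last group starts in cycle (issue_cycles - 1) and
    # completes `latency` cycles after that.
    return (issue_cycles - 1) + latency


def in_flight(rate, latency):
    """Average number of items in progress (rate x latency)."""
    return rate * latency


if __name__ == "__main__":
    n, latency = 1000, 8  # illustrative numbers only
    print("dependent chain:   ", chain_cycles(n, latency), "cycles")
    print("independent, 1/clk:", independent_cycles(n, latency, 1), "cycles")
    print("independent, 3/clk:", independent_cycles(n, latency, 3), "cycles")
    # The juggler: 3 balls per second, 20 seconds in the air each.
    print("balls in the air:  ", in_flight(3, 20))
```

In this model the same 1000 instructions cost 8000 cycles as a dependent chain but only about 1000 when they are independent, which is why a single per-instruction number tells you so little about real code.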
![]() |
#5 |
Apr 2003
Berlin, Germany
19² Posts |
![]()
The manual states a latency of 8 for ADC both for Northwood and Prescott. However, handling of flags causes some slowdown on P4.
|