mersenneforum.org  

Old 2004-05-09, 06:01   #1
Cyclamen Persicum
 
Mar 2003

34 Posts
ASM Optimization

I have downloaded the Intel manual (Pentium Instruction Set, vol. 2A-2B),
files 25366613.pdf and 25366713.pdf.

How can I find the number of clock cycles (ticks) that each ASM instruction takes?
An acquaintance of mine said that ADC takes 14(!!!) cycles on the P4, while ADD takes only 1. I cannot believe that!
Old 2004-05-09, 10:06   #2
shuricus
 

2²·3²·43 Posts

Try downloading this:
http://www.agner.org/assem/pentopt.pdf
 
Old 2004-05-14, 12:18   #3
Dresdenboy
 
Apr 2003
Berlin, Germany

101101001₂ Posts

or this: http://www.intel.com/design/pentium4/manuals/248966.htm
Old 2004-05-26, 01:47   #4
Unregistered
 

336₁₀ Posts

The last Intel processor for which you could add up per-instruction times to get a reasonably accurate execution time was the 486.

Since the Pentium, it's a lot more complicated, and with the P4, it's unbelievably complicated. The data dependencies between instructions are far more important than the instructions themselves.

With pipelined superscalar processors, there can be over a dozen instructions being executed at the same time, and over a hundred instructions in various stages of execution. The difference between latency (the time from start to finish of a single instruction) and throughput (the total time for a lot of instructions divided by the number of instructions) is enormous.
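
To make the latency/throughput distinction concrete, here is a minimal sketch of an idealized cost model in Python. It is not a P4 simulator: the function names and the numbers (a latency of 8 cycles, one new instruction started per cycle) are illustrative assumptions, and a real processor has many effects that this ignores.

```python
def dependent_chain_cycles(n, latency):
    """Each instruction needs the previous result, so nothing overlaps."""
    return n * latency


def independent_cycles(n, latency, recip_throughput):
    """Idealized pipeline: a new instruction starts every `recip_throughput`
    cycles, and the last one still needs `latency` cycles to finish."""
    if n == 0:
        return 0
    return (n - 1) * recip_throughput + latency


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    n, latency, recip = 1000, 8, 1
    chained = dependent_chain_cycles(n, latency)
    overlapped = independent_cycles(n, latency, recip)
    print(f"dependent chain: {chained} cycles ({chained / n:.2f} per instruction)")
    print(f"independent:     {overlapped} cycles ({overlapped / n:.2f} per instruction)")
```

Under these assumptions the same 1000 instructions cost 8000 cycles when each one waits for the previous result, but only 1007 cycles when they are independent, which is why a single per-instruction number says so little.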

Imagine a juggler trying to throw three balls a second when the balls spend at least 20 seconds in the air each and some stick to the ceiling for much longer. There are a lot of balls in the air.
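
The juggler's numbers can be worked through directly: the number of balls in the air is the throw rate multiplied by the time each ball stays up (Little's law). A minimal sketch, with a hypothetical function name, assuming the analogy maps balls to instructions in flight:

```python
def in_flight(rate_per_second, seconds_in_flight):
    """Little's law: average items in flight = arrival rate * time in flight."""
    return rate_per_second * seconds_in_flight


if __name__ == "__main__":
    # Figures from the analogy: 3 throws per second, at least 20 seconds aloft.
    print(in_flight(3, 20))  # prints 60
```

So the juggler has at least 60 balls in the air at any moment, and more whenever one sticks to the ceiling.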

These processors are not black magic, but it takes at least several chapters of a book to explain them if you don't have the background. (Search amazon.com for "superscalar" to find a few useful books.)

As for ADC taking 14 cycles on a P4, I believe it. Intel's processors tend to do common things really fast but slow down horribly on some uncommon things. (Remember the original Pentium Pro and segment register manipulation in large model 16-bit code?) The costs are exacerbated by the extremely deep P4 pipeline, and ADC is uncommon. This is one reason why some people prefer AMD processors; they don't have as many nasty corner cases. Intel can get away with making people recompile their applications.
 
Old 2004-05-26, 07:51   #5
Dresdenboy
 
Apr 2003
Berlin, Germany

19² Posts

The manual states a latency of 8 for ADC both for Northwood and Prescott. However, handling of flags causes some slowdown on P4.
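
For context on why ADC and its flag handling matter: multi-word addition uses ADD for the lowest word and ADC for each higher word, so every ADC has to wait for the carry flag produced by the step before it. The Python sketch below emulates that chain on 32-bit limbs. The function name and the little-endian limb layout are assumptions for illustration, not code taken from any GIMPS program.

```python
MASK32 = 0xFFFFFFFF


def add_limbs(a, b):
    """Add two equal-length little-endian lists of 32-bit limbs.

    Returns (sum_limbs, carry_out). The first iteration behaves like ADD
    (carry-in is 0); every later one behaves like ADC, consuming the carry
    produced by the previous limb.
    """
    if len(a) != len(b):
        raise ValueError("operands must have the same number of limbs")
    result = []
    carry = 0
    for x, y in zip(a, b):
        total = x + y + carry        # ADC: destination + source + carry flag
        result.append(total & MASK32)
        carry = total >> 32          # new carry flag, needed by the next limb
    return result, carry


if __name__ == "__main__":
    # 0x00000001_FFFFFFFF + 0x00000000_00000001, least significant limb first
    limbs, carry = add_limbs([0xFFFFFFFF, 0x00000001], [0x00000001, 0x00000000])
    print([hex(v) for v in limbs], carry)
```

Because each step consumes the carry from the previous one, the loop is a single dependency chain, so its cost is set by the latency of ADC and not by its throughput.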
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.

≠ ± ∓ ÷ × · − √ ‰ ⊗ ⊕ ⊖ ⊘ ⊙ ≤ ≥ ≦ ≧ ≨ ≩ ≺ ≻ ≼ ≽ ⊏ ⊐ ⊑ ⊒ ² ³ °
∠ ∟ ° ≅ ~ ‖ ⟂ ⫛
≡ ≜ ≈ ∝ ∞ ≪ ≫ ⌊⌋ ⌈⌉ ∘ ∏ ∐ ∑ ∧ ∨ ∩ ∪ ⨀ ⊕ ⊗ 𝖕 𝖖 𝖗 ⊲ ⊳
∅ ∖ ∁ ↦ ↣ ∩ ∪ ⊆ ⊂ ⊄ ⊊ ⊇ ⊃ ⊅ ⊋ ⊖ ∈ ∉ ∋ ∌ ℕ ℤ ℚ ℝ ℂ ℵ ℶ ℷ ℸ 𝓟
¬ ∨ ∧ ⊕ → ← ⇒ ⇐ ⇔ ∀ ∃ ∄ ∴ ∵ ⊤ ⊥ ⊢ ⊨ ⫤ ⊣ … ⋯ ⋮ ⋰ ⋱
∫ ∬ ∭ ∮ ∯ ∰ ∇ ∆ δ ∂ ℱ ℒ ℓ
𝛢𝛼 𝛣𝛽 𝛤𝛾 𝛥𝛿 𝛦𝜀𝜖 𝛧𝜁 𝛨𝜂 𝛩𝜃𝜗 𝛪𝜄 𝛫𝜅 𝛬𝜆 𝛭𝜇 𝛮𝜈 𝛯𝜉 𝛰𝜊 𝛱𝜋 𝛲𝜌 𝛴𝜎𝜍 𝛵𝜏 𝛶𝜐 𝛷𝜙𝜑 𝛸𝜒 𝛹𝜓 𝛺𝜔