mersenneforum.org  

2003-07-21, 06:50   #34
Dresdenboy
 
 
Apr 2003
Berlin, Germany

16916 Posts

Quote:
Originally Posted by ColdFury
I really wish AMD moved to a 3 operand format with x86-64, but I guess that would have required too much redesign of the decoders.
As you may know, a 3 operand format was planned (maybe even with more than 16 XMM regs), and I don't think it would have been more complex to decode than SSE2. Maybe the Athlon/Opteron RISC cores already use such an instruction format internally for their 88-entry register file, which would have made it very easy to decode. But it's better to make existing applications faster (those with optimized SSE2 code and poor x87 code) than to require a recompile and 64-bit mode.

Quote:
Originally Posted by S3SJK
Out of interest what performance advantages would that give?
We want to calculate a = x² + 4x.
Just look at this pseudocode (for a RISC-like architecture, such as the execution cores of x86 CPUs):
[code:1]// format is instruction dest, source
fpload r0, [x]
fpload r1, [const_4]
fpmov r2, r0 ;we need it later again
fpmul r0, r0 ; x²
fpmul r2, r1 ; 4x
fpadd r0, r2 ; x²+4x
fpstore [a], r0
[/code:1]

With 3 operands it might look like this:
[code:1]// format is instruction dest, source1, source2
fpload r0, [x]
fpload r1, [const_4]
fpmul r2, r0, r0 ; x²
fpmul r1, r0, r1 ; 4x
fpadd r3, r1, r2 ; x²+4x
fpstore [a], r3
[/code:1]

We saved one instruction in this simple calculation (6 instead of 7). But if you look at complex SSE2 or x87 code, you'll see a lot of shuffling, moving and saving of registers, just so that their contents don't get overwritten all the time. While x86 CPUs have to move and save, the other CPUs (Alpha, Power, even G5) continue to calculate.
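As a rough check of the two listings above, here is a minimal sketch of a toy interpreter in Python. The tuple encoding, the `run` and `evaluate` helpers, and the dictionary-based memory are all assumptions made for illustration; the pseudo-instructions do not correspond to any real ISA.

```python
# Toy interpreter for the pseudo-assembly above (hypothetical, not a real ISA).
def run(program, memory):
    """Execute (op, dest, *sources) tuples; return (memory, instruction count)."""
    regs = {}
    mem = dict(memory)
    for op, dest, *srcs in program:
        if op == "fpload":
            regs[dest] = mem[srcs[0]]
        elif op == "fpstore":
            mem[dest] = regs[srcs[0]]
        elif op == "fpmov":
            regs[dest] = regs[srcs[0]]
        elif op in ("fpmul", "fpadd"):
            if len(srcs) == 1:
                # 2-operand form: dest doubles as the first source and is overwritten.
                a, b = regs[dest], regs[srcs[0]]
            else:
                # 3-operand form: both sources are left untouched.
                a, b = regs[srcs[0]], regs[srcs[1]]
            regs[dest] = a * b if op == "fpmul" else a + b
        else:
            raise ValueError(f"unknown op {op}")
    return mem, len(program)

TWO_OPERAND = [
    ("fpload", "r0", "x"),
    ("fpload", "r1", "const_4"),
    ("fpmov", "r2", "r0"),    # copy x because fpmul will overwrite r0
    ("fpmul", "r0", "r0"),    # x^2
    ("fpmul", "r2", "r1"),    # 4x
    ("fpadd", "r0", "r2"),    # x^2 + 4x
    ("fpstore", "a", "r0"),
]

THREE_OPERAND = [
    ("fpload", "r0", "x"),
    ("fpload", "r1", "const_4"),
    ("fpmul", "r2", "r0", "r0"),  # x^2
    ("fpmul", "r1", "r0", "r1"),  # 4x
    ("fpadd", "r3", "r1", "r2"),  # x^2 + 4x
    ("fpstore", "a", "r3"),
]

def evaluate(program, x):
    """Run a program with the given x; return (value stored to a, instruction count)."""
    mem, count = run(program, {"x": x, "const_4": 4.0})
    return mem["a"], count
```

With x = 3.0, `evaluate(TWO_OPERAND, 3.0)` gives `(21.0, 7)` and `evaluate(THREE_OPERAND, 3.0)` gives `(21.0, 6)`: same result, one instruction fewer. The sketch only counts instructions; it says nothing about latency or decode cost.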

DDB
2003-07-22, 03:39   #35
ebx
 
 
Aug 2002

101 Posts

I got a better compiler for a=x²+4x:

[code:1]// format is instruction dest, source
fpload r0, [x]
fpmov r1, r0 ;we need it later again
fpadd r0, [const_4] ;x+4
fpmul r1, r0 ; x²+4x
fpstore [a], r1
[/code:1]

Moving data between regs is the fastest instruction; it can't be compared to fpmul. A memory load/store usually costs more than one instruction, which further brings down the weight of fpmov.

2 operand vs. 3 operand is a long-running debate. There isn't any clear winner.
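The shorter listing works because the expression factors. As a worked identity:

```latex
a = x^2 + 4x = x\,(x + 4)
```

So two multiplies and one add become one add and one multiply. Note that this version also assumes fpadd can take a memory operand, which the RISC-style listings earlier in the thread did not.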
2003-07-22, 06:02   #36
Dresdenboy
 
 
Apr 2003
Berlin, Germany

192 Posts

Quote:
Originally Posted by ebx
I got a better compiler for a=x²+4x:

[code:1]// format is instruction dest, source
fpload r0, [x]
fpmov r1, r0 ;we need it later again
fpadd r0, [const_4] ;x+4
fpmul r1, r0 ; x²+4x
fpstore [a], r1
[/code:1]

Moving data between regs is the fastest instruction; it can't be compared to fpmul. A memory load/store usually costs more than one instruction, which further brings down the weight of fpmov.

2 operand vs. 3 operand is a long-running debate. There isn't any clear winner.
You are right. I didn't optimize my code; I just wanted to show the difference. And because I had a RISC architecture in mind when creating this example, I also didn't count on memory operands for fpadd/fpmul.

At least with a simple addressing mode, the load/store can be handled easily by the hardware.

IMO the advantage of 3 operand instructions is that you may use a different destination register or just one of the sources, whichever fits the algorithm best, while with 2 operands you are always required to overwrite one of the sources. The disadvantage is that the opcode needs additional bits for addressing the third register.
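To put a rough number on those additional bits, here is a minimal Python sketch. It assumes a hypothetical fixed-width encoding in which every register operand gets its own field; real x86 encodings are more involved, so treat `register_field_bits` as an illustration only.

```python
def register_field_bits(num_registers, num_operands):
    """Bits needed to name `num_operands` registers out of `num_registers`.

    Assumes (hypothetically) one fixed-width field per register operand.
    """
    if num_registers < 1 or num_operands < 0:
        raise ValueError("need at least one register and a non-negative operand count")
    # Smallest field width that can encode indices 0 .. num_registers - 1.
    bits_per_register = (num_registers - 1).bit_length()
    return bits_per_register * num_operands
```

Under that assumption, 16 registers cost 8 bits of register fields with 2 operands and 12 bits with 3 operands, so the third operand adds 4 bits per instruction; with 32 registers it would add 5.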

DDB
All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
