mersenneforum.org  

Old 2012-07-03, 20:10   #34
ewmayer
2ω=0
 
Sep 2002
República de California

3×5^2×149 Posts

[reviving this too-long-dormant thread]

Quote:
Originally Posted by __HRB__
Synthesizing AND-NOT requires two instructions, so PANDN can be twice as fast. If you need NOT, you can do that with XOR in one instruction, using a memory operand and a location filled with FFFF, if you're experiencing register pressure.
OK, but why does the ISA give us not one but *two* separate instructions - PXOR and XORPD - to do exactly the same thing (a whole-xmm-register bitwise XOR), but neither a logical (1s-comp) nor arithmetic (2s-comp) NOT of any kind?
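
For reference, the workaround from the quoted post amounts to the following, shown here as a scalar C sketch on 64-bit words rather than on xmm registers. The helper names are invented purely for illustration, and the operand order used for the AND-NOT helper is the PANDN one, i.e. (NOT dest) AND src.

```c
#include <stdint.h>

/* Logical (1s-complement) NOT synthesized as XOR against an all-ones
   constant, mirroring PXOR with a memory operand filled with FF bytes. */
static inline uint64_t not_via_xor(uint64_t x)
{
    return x ^ UINT64_MAX;
}

/* Arithmetic (2s-complement) negation: complement, then add 1. */
static inline uint64_t neg_via_xor(uint64_t x)
{
    return (x ^ UINT64_MAX) + 1;
}

/* AND-NOT in PANDN operand order: (NOT dest) AND src. */
static inline uint64_t andnot(uint64_t dest, uint64_t src)
{
    return ~dest & src;
}
```

The all-ones constant is what costs a register or a memory operand in the SSE version, which is the register-pressure point made in the quote.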
Old 2012-07-03, 20:35   #35
bsquared
 
"Ben"
Feb 2007

6256_8 Posts

Quote:
Originally Posted by ewmayer
[reviving this too-long-dormant thread]



OK, but why does the ISA give us not one but *two* separate instructions - PXOR and XORPD - to do exactly the same thing (a whole-xmm-register bitwise XOR), but neither a logical (1s-comp) nor arithmetic (2s-comp) NOT of any kind?
Hear-hear! I got a few more gray hairs over lack of NOT a while ago. WTF.
Old 2012-07-03, 20:53   #36
Brian Gladman
 
May 2008
Worcester, United Kingdom

523 Posts

Quote:
Originally Posted by bsquared
Hear-hear! I got a few more gray hairs over lack of NOT a while ago. WTF.
It is so sad that it was the x86 architecture and not the 68000 architecture that captured the market. The 68000 architecture was designed whereas the x86 architecture has been an unholy mess for as long as it has existed.

Repeated cycles of messy design decisions bolted onto earlier mess have now produced a need for backwards compatibility that is so strong that all attempts to produce something better have inevitably failed in mass market terms.

It seems we will never escape this abomination :-(
Old 2012-07-04, 01:14   #37
retina
Undefined
 
"The unspeakable one"
Jun 2006
My evil lair

13×419 Posts

Quote:
Originally Posted by Brian Gladman
It seems we will never escape this abomination :-(
If and when the tablets and smart phones finally push out the desktop and laptop markets then we will all be using ARM. Let's just hope MS's latest x86 "Surface" fails and sanity prevails with ARM taking over.
Old 2012-07-04, 02:07   #38
ewmayer
2ω=0
 
Sep 2002
República de California

3×5^2×149 Posts

Quote:
Originally Posted by Brian Gladman
It seems we will never escape this abomination :-(
Is "abomination domination" a marketing buzz-phrase?

It should be.
Old 2012-07-07, 02:15   #39
ewmayer
2ω=0
 
Sep 2002
República de California

10101110100111_2 Posts

Closely related to Useless SSE Instructions is the category "SSE Instructions Which Would Be Useful But Which Are Inexplicably Absent". One such which has caused me annoyance this day is the lack of an SSE instruction to perform floating-double <--> 64-bit integer conversions. I have an application whose outputs are packed-double (xmm-register) representations of 50-bit nonnegative ints, which I need to convert to integer form for further manipulation as 64-bit ints.

I am thinking of emulating the missing conversion by taking advantage of the 50-bit normalization, like so:

0. Start with the outputs in packed-double form;

1. Add (packed double)2^50 to each to yield identical exponent fields, effectively "aligning the hidden bits", which allows us to use a constant set of mask and shift parameters in the ensuing step;

2. Now treating the operands as packed 64-bit ints, do some simple integer-mask magic to mask off the IEEE-double exponent bits and right-justify the mantissas. This would also wipe away the extra power 2^50 we added in step [1].

Does anyone know if the operand/operation-type-mixing which occurs in step [2] will impose a significant cycle penalty? Any other ideas - not necessarily SSE-based - for efficiently doing the above type conversions are also welcome.
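
For anyone who wants to try the idea before writing the SSE version, steps [0]-[2] can be sketched in scalar C, one lane at a time. This is only an illustration under stated assumptions: the function name `double_to_u64_bias50` and the macro `TWO50` are made up for this sketch, the input is assumed to be an integer-valued double in [0, 2^50), and `double` is assumed to be IEEE-754 binary64. A packed version would presumably use ADDPD for the add, PAND for the mask and PSRLQ for the shift.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TWO50 1125899906842624.0   /* 2^50, exactly representable as a double */

/* Hypothetical helper: convert an integer-valued double in [0, 2^50)
   to uint64_t using only an add, a mask and a shift. */
static inline uint64_t double_to_u64_bias50(double x)
{
    assert(x >= 0.0 && x < TWO50);

    /* Step 1: the sum lies in [2^50, 2^51), so every input ends up with
       the same exponent field and the hidden bit represents 2^50. */
    double biased = x + TWO50;

    /* Step 2: reinterpret the bits as a 64-bit integer (memcpy avoids
       aliasing problems), then clear the sign and exponent bits. */
    uint64_t bits;
    memcpy(&bits, &biased, sizeof bits);
    bits &= (UINT64_C(1) << 52) - 1;

    /* With exponent 50, the 52-bit fraction field holds x * 2^2, so a
       right shift by 2 right-justifies the value. */
    return bits >> 2;
}
```

Clearing the exponent field is also what discards the added 2^50, since that term lives entirely in the hidden bit.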
Old 2012-07-07, 03:27   #40
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

15273_8 Posts

Quote:
Originally Posted by ewmayer
Does anyone know if the operand/operation-type-mixing which occurs in step [2] will impose a significant cycle penalty? Any other ideas - not necessarily SSE-based - for efficiently doing the above type conversions are also welcome.
IIRC, most Intel CPUs have a one clock penalty for moving an operand from the FPU to the integer units.
Old 2012-07-07, 04:35   #41
retina
Undefined
 
"The unspeakable one"
Jun 2006
My evil lair

13×419 Posts

Quote:
Originally Posted by ewmayer
1. Add (packed double)2^50 to each to yield identical exponent fields, effectively "aligning the hidden bits", which allows us to use a constant set of mask and shift parameters in the ensuing step;
If you add 2^52 instead then you only need the mask and can eliminate the shift.
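
The reason this works: a sum in [2^52, 2^53) has a unit in the last place of exactly 1, so the 52-bit fraction field is the integer itself. A scalar C sketch of that variant follows, again with made-up names (`double_to_u64_bias52`, `u64_to_double_bias52`, `TWO52`) and assuming IEEE-754 binary64 doubles and integer-valued inputs in [0, 2^52). The reverse direction is not spelled out in this thread; it is included here as an assumption about how the same trick might be run backwards, by OR-ing in the bit pattern of 2^52 and then subtracting 2^52.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TWO52 4503599627370496.0   /* 2^52, exactly representable as a double */

/* Hypothetical helper: double -> uint64_t with one add and one mask. */
static inline uint64_t double_to_u64_bias52(double x)
{
    assert(x >= 0.0 && x < TWO52);
    double biased = x + TWO52;          /* exponent field is now fixed at 52 */
    uint64_t bits;
    memcpy(&bits, &biased, sizeof bits);
    return bits & ((UINT64_C(1) << 52) - 1);   /* keep only the fraction field */
}

/* Hypothetical helper for the reverse direction (an assumption, not from
   the thread): uint64_t -> double with one OR and one subtract. */
static inline double u64_to_double_bias52(uint64_t n)
{
    assert(n < (UINT64_C(1) << 52));
    /* 0x4330000000000000 is the bit pattern of the double 2^52. */
    uint64_t bits = n | UINT64_C(0x4330000000000000);
    double biased;
    memcpy(&biased, &bits, sizeof biased);
    return biased - TWO52;              /* exact, since both are < 2^53 */
}
```

If an input is not integer-valued, the add rounds it to an integer under the current rounding mode, so the sketch should only be fed values already known to be integers.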
Old 2012-07-07, 17:43   #42
ewmayer
2ω=0
 
Sep 2002
República de California

3×5^2×149 Posts

Quote:
Originally Posted by Prime95
IIRC, most Intel CPUs have a one clock penalty for moving an operand from the FPU to the integer units.
That would be of the very-acceptable variety.

Quote:
Originally Posted by retina
If you add 2^52 instead then you only need the mask and can eliminate the shift.
Yes, the same thought occurred to me in the shower this morning - so just a single masking operation will suffice.

Time to code it and time it...

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.