mersenneforum.org  

2009-02-25, 16:51   #23
R.D. Silverman
Nov 2003
3×2,473 Posts

Quote:
Originally Posted by ewmayer View Post
Note I said "runtime", not "performance". My reaction was something along the lines of "You know, if I needed a way to get my CPU to run cooler, I'd just switch my system power options to max-battery-life mode or fill my assembly code with no-ops."

There is a famous quote from Seymour Cray: "What do we need software for? It just slows the machine down."

2009-02-26, 21:33   #24
geoff
Mar 2003
New Zealand
10010000101₂ Posts

My whinge: SSE has AND and AND-NOT, but no NOT. So I synthesize NOT from AND, AND-NOT, and say PCMPEQD, which uses an extra scratch register. Why not have AND and NOT and let the programmer synthesize AND-NOT, no scratch register required? I suppose there must be a reason.

2009-02-26, 22:09   #25
__HRB__
Dec 2008
Boycotting the Soapbox
2⁴×3²×5 Posts

Quote:
Originally Posted by geoff View Post
My whinge: SSE has AND and AND-NOT, but no NOT. So I synthesize NOT from AND, AND-NOT, and say PCMPEQD, which uses an extra scratch register. Why not have AND and NOT and let the programmer synthesize AND-NOT, no scratch register required? I suppose there must be a reason.
Synthesizing AND-NOT would take two instructions, so PANDN can be twice as fast. If you need NOT, you can do it in one instruction with XOR, using a memory operand that points to a location filled with all ones (FFFF...), if you're experiencing register pressure.
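
A minimal sketch of the two ways to get NOT discussed above, written with SSE2 intrinsics rather than hand-written asm. The names `ALL_ONES`, `not_via_xor` and `not_via_andn` are made up for illustration, the alignment attribute assumes GCC or Clang, and whether the compiler really folds the constant into a memory operand of PXOR is up to the compiler:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical all-ones constant kept in memory (16-byte aligned so it can
   serve as a memory operand; GCC/Clang attribute syntax assumed). */
static const uint32_t ALL_ONES[4] __attribute__((aligned(16))) = {
    0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu
};

/* NOT via XOR: x ^ 111...1 == ~x, one PXOR. */
static inline __m128i not_via_xor(__m128i x)
{
    return _mm_xor_si128(x, _mm_load_si128((const __m128i *)ALL_ONES));
}

/* NOT via PCMPEQD + PANDN: builds all ones in a scratch register first. */
static inline __m128i not_via_andn(__m128i x)
{
    __m128i ones = _mm_cmpeq_epi32(x, x);  /* PCMPEQD: every lane equals itself */
    return _mm_andnot_si128(x, ones);      /* PANDN: (~x) & ones */
}
```

Both functions return the same bits; the difference is only whether the all-ones value lives in memory or in a scratch register.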

2009-04-22, 04:47   #26
__HRB__
Dec 2008
Boycotting the Soapbox
2⁴·3²·5 Posts
rcpps, but no rcppd! WTF? (nt)

no text

2009-04-22, 05:38   #27
retina
Undefined
"The unspeakable one"
Jun 2006
My evil lair
13×419 Posts

Quote:
Originally Posted by __HRB__ View Post
rcpps, but no rcppd! WTF? (nt)
Since the result is only 12-bit, there seems little sense in expanding it to a 53-bit mantissa. Do four conversions in one cycle and go from there to whatever final precision is needed.

2009-04-22, 16:47   #28
__HRB__
Dec 2008
Boycotting the Soapbox
2⁴×3²×5 Posts
divps, divpd

I should have paid attention to the thread title. What I meant was that divps & divpd are superfluous, since rcpps/rcppd & Newton-Raphson are faster and can be pipelined.

Quote:
Originally Posted by retina View Post
Since the result is only 12-bit, there seems little sense in expanding it to a 53-bit mantissa. Do four conversions in one cycle and go from there to whatever final precision is needed.
The issue is that the missing rcppd forces one to use two extra instructions - convert doubles to floats and floats to doubles - blocking the execution ports for 2 cycles and adding 6-8 cycles in latency.
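
For illustration, a rough sketch of the sequence described above in SSE2 intrinsics: convert down, RCPPS, convert up, then Newton-Raphson in double precision. The function name `recip_pd_sketch` and the choice of three iterations are assumptions, not taken from any real code. It also assumes the inputs are finite, nonzero and within float range, and it does not promise a correctly rounded last bit the way divpd does:

```c
#include <emmintrin.h>  /* SSE2 intrinsics (includes the SSE ones) */

/* Approximate 1/d for two packed doubles (hypothetical helper). */
static inline __m128d recip_pd_sketch(__m128d d)
{
    __m128  f  = _mm_cvtpd_ps(d);    /* CVTPD2PS: the first extra conversion */
    __m128  r0 = _mm_rcp_ps(f);      /* RCPPS: roughly 12-bit estimate */
    __m128d x  = _mm_cvtps_pd(r0);   /* CVTPS2PD: the second extra conversion */
    const __m128d two = _mm_set1_pd(2.0);

    /* Newton-Raphson step x <- x * (2 - d*x); each step roughly doubles the
       number of correct bits: 12 -> 24 -> 48 -> limited by double rounding. */
    for (int i = 0; i < 3; ++i)
        x = _mm_mul_pd(x, _mm_sub_pd(two, _mm_mul_pd(d, x)));
    return x;
}
```

The two conversion intrinsics at the top are exactly the overhead being complained about; a hypothetical rcppd would replace all three of the first instructions with one.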

2012-03-28, 17:58   #29
bsquared
"Ben"
Feb 2007
2·3·541 Posts
pcmpgtw

Ok, so pcmpgtw isn't exactly useless, but I'm really quite upset right now over the fact that there is no unsigned equivalent.

2012-03-28, 19:45   #30
axn
Jun 2003
10761₈ Posts

Quote:
Originally Posted by bsquared View Post
Ok, so pcmpgtw isn't exactly useless, but I'm really quite upset right now over the fact that there is no unsigned equivalent.
PSUBUSW should get you almost all the way.

2012-03-28, 20:34   #31
bsquared
"Ben"
Feb 2007
110010101110₂ Posts

Quote:
Originally Posted by axn View Post
PSUBUSW should get you almost all the way.
Yeah, cool!

This will do the job:
Code:

"pxor %%xmm0, %%xmm0 \n\t"    /* xmm0 := 0 */
"psubusw %%xmm1, %%xmm2 \n\t" /* xmm2 := b - a, unsigned saturating (xmm1 = a, xmm2 = b) */
"pcmpeqw %%xmm0, %%xmm2 \n\t" /* xmm2 := a >= b ? 0xFFFF : 0 */
The extra dependency costs a cycle of latency, a "0" register must be set up (which can be reused for additional tests), and it's actually a ">=" test, but it's still a decent workaround.
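
The same workaround can be sketched with SSE2 intrinsics. The helper names `cmpge_epu16` and `cmpgt_epu16` are invented here. The second function is not from the posts above: it is the common sign-bit-flip trick, included as one possible way to get a strict ">" out of the signed pcmpgtw:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Unsigned a >= b per 16-bit lane: (b - a) saturates to 0 exactly when b <= a.
   Each result lane is 0xFFFF for true and 0 for false. */
static inline __m128i cmpge_epu16(__m128i a, __m128i b)
{
    __m128i zero = _mm_setzero_si128();                  /* PXOR    */
    return _mm_cmpeq_epi16(_mm_subs_epu16(b, a), zero);  /* PSUBUSW, PCMPEQW */
}

/* Unsigned a > b per 16-bit lane: flipping the top bit of both operands maps
   unsigned order onto signed order, so the signed PCMPGTW then applies. */
static inline __m128i cmpgt_epu16(__m128i a, __m128i b)
{
    __m128i bias = _mm_set1_epi16(-32768);               /* 0x8000 in every lane */
    return _mm_cmpgt_epi16(_mm_xor_si128(a, bias), _mm_xor_si128(b, bias));
}
```

Like the zero register, the bias constant can be set up once and reused across tests, so the strict version costs two extra XORs per comparison.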

In the spirit of this thread, though, it still sucks that this is necessary...

Last fiddled with by bsquared on 2012-03-28 at 20:35

2012-03-28, 23:05   #32
Batalov
"Serge"
Mar 2008
Phi(3,3^1118781+1)/3
2·3·5·7·43 Posts

"The only thing in the house that didn't suck was the vacuum cleaner."

2012-03-29, 06:50   #33
davieddy
"Lucan"
Dec 2006
England
6474₁₀ Posts
THX

Quote:
Originally Posted by Batalov View Post
"The only thing in the house that didn't suck was the vacuum cleaner."
I don't laugh often enough these days.

Sounds like Raymond Chandler or similar.

David

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.