mersenneforum.org Another challenge
 Register FAQ Search Today's Posts Mark Forums Read

 2005-07-25, 17:10 #1 R.D. Silverman     "Bob Silverman" Nov 2003 North of Boston 23×937 Posts Another challenge The following macros are (on the Pentium) 9 times faster than calling the library ceil() and floor() functions. a is a double. Note also that |a| < 16K. #define iceil(a) ((a) <= 0.0 ? (int)(a) : (int)(a) + 1) #define ifloor(a) ((a) >= 0.0 ? (int)(a) : (int)(a) - 1) Can anyone do better, by perhaps eliminating the branches? Note that these are very slightly wrong. If a > 0 and a is an exact integer, iceil returns a+1, instead of a. But this is OK, as long as the macro is fast.
 2005-07-25, 17:40 #2 akruppa     "Nancy" Aug 2002 Alexandria 46438 Posts The fastest way should be using the fistp opcode which uses the currently set rounding direction, which is round-to-nearest by default. Changing rounding modes in the fpu causes the pipeline to be flushed which imposes a huge penalty. This occurs for *each* type-cast to int as those are defined as truncate (round to zero)! Try static inline int dtoi (double d) { int i; __asm__ ("fistpl %0" : "=m" (i) : "t" (d) : "st"); return i; } which rounds to nearest. Adding/subtracting 0.5 should be quite fast so you'll get the rounding direction you want. You should also be able to use fesetround() (C99) to change the default rounding mode so you don't have to add a constant for your preferred rounding direction. Alex Last fiddled with by akruppa on 2005-07-25 at 17:44
2005-07-25, 17:58   #3
R.D. Silverman

"Bob Silverman"
Nov 2003
North of Boston

23·937 Posts

Quote:
 Originally Posted by akruppa The fastest way should be using the fistp opcode which uses the currently set rounding direction, which is round-to-nearest by default. Changing rounding modes in the fpu causes the pipeline to be flushed which imposes a huge penalty. This occurs for *each* type-cast to int as those are defined as truncate (round to zero)! Try static inline int dtoi (double d) { int i; __asm__ ("fistpl %0" : "=m" (i) : "t" (d) : "st"); return i; } which rounds to nearest. Adding/subtracting 0.5 should be quite fast so you'll get the rounding direction you want. You should also be able to use fesetround() (C99) to change the default rounding mode so you don't have to add a constant for your preferred rounding direction. Alex
I presume that %0 is the first arg in the subroutine argument list.
But I am not familiar with this _asm syntax. What are the items in
double quotes? And the colons?

I will try this using VC++ _asm

2005-07-25, 18:15   #4
R.D. Silverman

"Bob Silverman"
Nov 2003
North of Boston

23×937 Posts

Quote:
 Originally Posted by akruppa The fastest way should be using the fistp opcode which uses the currently set rounding direction, which is round-to-nearest by default. Changing rounding modes in the fpu causes the pipeline to be flushed which imposes a huge penalty. This occurs for *each* type-cast to int as those are defined as truncate (round to zero)! Try static inline int dtoi (double d) { int i; __asm__ ("fistpl %0" : "=m" (i) : "t" (d) : "st"); return i; } which rounds to nearest. Adding/subtracting 0.5 should be quite fast so you'll get the rounding direction you want. You should also be able to use fesetround() (C99) to change the default rounding mode so you don't have to add a constant for your preferred rounding direction. Alex

For VC++ I need to move d into a floating point register. Do you know
which register fistl acts upon? I guess I'll have to check my P IV
instruction set documentation.

BTW, this isn't very portable. But you know that.

 2005-07-25, 18:30 #5 akruppa     "Nancy" Aug 2002 Alexandria 2,467 Posts >Do you know which register fistl acts upon? It pops a value from the fpu stack. My inline function is inteded for gcc, I don't know if VC will like it. I have no way of testing, though. >BTW, this isn't very portable. But you know that. True, it isn't. I've tinkered with this problem for the interface between GMP-ECM and GWNUM a little, this was the fastest I solution found. There is an lrint() function defined in C99 which converts a double to an int, using the currently set rounding direction. When setting the proper #defines, GNU's math.h defines lrint() as a macro that uses fistp again. Without the proper #defines, it uses a library call - faster than the typecast with rounding-mode-change, but of course much slower than the fistp opcode. In the end, I just used my own fistp wrapper directly instead of fiddling with headers - the GWNUM interface code will only run on x86 so portability is not an issue for me. If it is for you, it may be worthwhile to check what it takes to make lrint() use the fistp opcode. Alex
 2005-07-25, 18:37 #6 akruppa     "Nancy" Aug 2002 Alexandria 2,467 Posts >I presume that %0 is the first arg in the subroutine argument list. >But I am not familiar with this _asm syntax. What are the items in >double quotes? And the colons? The "%0" refers to the first operand of the asm block, which is "=m" (i). This means output "=" to memory location "m" of variable i. The fistp opcode needs to store to memory, it cannot write to registers. The "t" (d) tells gcc that the asm block expects the value of variable d in the top "t" of the fpu stack. The "st" tells that the top value of the fpu stack will be popped. The colons separate the output : input : flags sections of the operands. There is also a fist opcode that does not pop the value off the stack. Might be useful if you want to immediately reuse that value. Alex Last fiddled with by akruppa on 2005-07-25 at 18:38
2005-07-25, 19:41   #7
R.D. Silverman

"Bob Silverman"
Nov 2003
North of Boston

23×937 Posts

Quote:
 Originally Posted by akruppa >I presume that %0 is the first arg in the subroutine argument list. >But I am not familiar with this _asm syntax. What are the items in >double quotes? And the colons? The "%0" refers to the first operand of the asm block, which is "=m" (i). This means output "=" to memory location "m" of variable i. The fistp opcode needs to store to memory, it cannot write to registers. The "t" (d) tells gcc that the asm block expects the value of variable d in the top "t" of the fpu stack. The "st" tells that the top value of the fpu stack will be popped. The colons separate the output : input : flags sections of the operands. There is also a fist opcode that does not pop the value off the stack. Might be useful if you want to immediately reuse that value. Alex

I have tried the following:

__inline iceil(a,d)
double a;
int d[];

{
_asm
{
mov edi, d
mov ST0, a
FIST [edi]
}
}

FIST supposedly only outputs its result to memory. [edi] is the
address pointed to by register edi. But the compiler gives "invalid argument"
Nor does it like an explicit specification that EDI is a DWORD PTR

FIST DWORD PTR[edi] does not work.

note that mov eax, DWORD PTR[edi] is perfectly valid.

How does one get FIST to return its output as the function argument?
eax is the register used by VC++ for integer returns, but

FIST eax is invalid since eax is a register, not an address.

 2005-07-25, 19:56 #8 akruppa     "Nancy" Aug 2002 Alexandria 9A316 Posts It probably complains about the "mov ST0,a". You load values onto the fpu stack with FLD, not MOV. You could try a "FLD x", where x must be a memory reference again, i.e. mov edi, x FLD [edi] mov edi, d FIST [edi] where x is a double *. It would be much nicer to tell the compiler that FIST (or FISTP) expect a double value on the fpu stack and an address to an integer, and which variables these operands should correspond to (like the GCC asm statement does). This will let the compiler merge the FIST[P] opcode into the surrounding code much better, i.e. it can decide for itself which register to use for the addressing, when to load the double on the stack, etc. But someone else will have to help with that, I have never used VC. Alex Edit: by far the cleanest and probably easiest solution if you want portability across different cpu types and compilers will be using lrint(). Last fiddled with by akruppa on 2005-07-25 at 19:59
2005-07-25, 20:09   #9
R.D. Silverman

"Bob Silverman"
Nov 2003
North of Boston

23×937 Posts

Quote:
 Originally Posted by akruppa It probably complains about the "mov ST0,a". You load values onto the fpu stack with FLD, not MOV. You could try a "FLD x", where x must be a memory reference again, i.e. mov edi, x FLD [edi] mov edi, d FIST [edi] where x is a double *. It would be much nicer to tell the compiler that FIST (or FISTP) expect a double value on the fpu stack and an address to an integer, and which variables these operands should correspond to (like the GCC asm statement does). This will let the compiler merge the FIST[P] opcode into the surrounding code much better, i.e. it can decide for itself which register to use for the addressing, when to load the double on the stack, etc. But someone else will have to help with that, I have never used VC. Alex Edit: by far the cleanest and probably easiest solution if you want portability across different cpu types and compilers will be using lrint().

Actually you were right. I was getting a report of "invalid argument".
When I clicked on the report it is *supposed* to point to the offending
code. It *was* pointing at the FISTP line, but the error was on the
mov ST0, a line.

I changed it to

mov edi, d
FLD a
FISTP [edi]

and now it seems fine.

 2005-07-25, 20:48 #10 dsouza123     Sep 2002 2·331 Posts Do you need to pass an integer array pointer to iceil ? What about only send a double and get back an int ? __inline int iceil(double a) { int d; const double h = 0.5000001f; _asm { finit fld a fadd h fist d }; return d; } There may an issue with rounding modes. Check if int is OK versus long int.
2005-07-26, 11:55   #11
R.D. Silverman

"Bob Silverman"
Nov 2003
North of Boston

23×937 Posts

Quote:
 Originally Posted by dsouza123 Do you need to pass an integer array pointer to iceil ? What about only send a double and get back an int ? __inline int iceil(double a) { int d; const double h = 0.5000001f; _asm { finit fld a fadd h fist d }; return d; } There may an issue with rounding modes. Check if int is OK versus long int.
No, I clearly don't need to pass a pointer. But your suggestion creates
two temporaries on the stack because the float instructions need memory
addresses as destinations. I am not sure if yours is slower or faster than mine
Also, the fadd is not needed in the routine. If the routine only does 'fist'
You can call it with (a + .5) to get "ceil" or (a-.5) to get "floor".

Is the finit instruction really needed??

 Similar Threads Thread Thread Starter Forum Replies Last Post devarajkandadai Miscellaneous Math 0 2012-05-31 05:17 petrw1 Lounge 14 2009-11-23 02:18 science_man_88 Miscellaneous Math 229 2009-09-07 08:08 ValerieVonck Factoring 58 2005-10-24 15:54 JuanTutors PrimeNet 2 2004-07-22 12:56

All times are UTC. The time now is 01:36.

Sun Dec 4 01:36:00 UTC 2022 up 107 days, 23:04, 0 users, load averages: 0.98, 0.92, 0.98