Another challenge
The following macros are (on the Pentium) 9 times faster than calling the library ceil() and floor() functions. a is a double. Note also that |a| < 16K.

    #define iceil(a) ((a) <= 0.0 ? (int)(a) : (int)(a) + 1)
    #define ifloor(a) ((a) >= 0.0 ? (int)(a) : (int)(a) - 1)

Can anyone do better, perhaps by eliminating the branches? Note that these are very slightly wrong. If a > 0 and a is an exact integer, iceil returns a+1 instead of a; likewise, if a < 0 and a is an exact integer, ifloor returns a-1 instead of a. But this is OK, as long as the macro is fast. |
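For readers who want to see the deviation described above, here is a small sketch that exercises the two macros exactly as posted. The function name check_macros is only an illustration and is not from the thread. Note also that each macro evaluates its argument more than once, so arguments with side effects should be avoided.

```c
#include <assert.h>

/* The macros from the post above, unchanged. */
#define iceil(a)  ((a) <= 0.0 ? (int)(a) : (int)(a) + 1)
#define ifloor(a) ((a) >= 0.0 ? (int)(a) : (int)(a) - 1)

/* Hypothetical helper (not from the thread): documents what the macros return. */
void check_macros(void)
{
    /* Non-integer inputs agree with ceil() and floor(). */
    assert(iceil(2.5)   ==  3);
    assert(iceil(-2.5)  == -2);
    assert(ifloor(2.5)  ==  2);
    assert(ifloor(-2.5) == -3);

    /* Exact integers on the "wrong" side of zero are off by one. */
    assert(iceil(2.0)   ==  3);   /* ceil(2.0) would be 2   */
    assert(ifloor(-2.0) == -3);   /* floor(-2.0) would be -2 */

    /* Exact integers on the other side of zero, and zero itself, are fine. */
    assert(iceil(-2.0)  == -2);
    assert(ifloor(2.0)  ==  2);
    assert(iceil(0.0)   ==  0);
    assert(ifloor(0.0)  ==  0);
}
```

Whether the off-by-one cases matter depends on how often exact integers occur in the caller's data.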
The fastest way should be using the fistp opcode, which uses the currently set rounding direction; this is round-to-nearest by default. Changing rounding modes in the FPU causes the pipeline to be flushed, which imposes a huge penalty. This occurs for *each* type-cast to int, as those are defined as truncate (round to zero)!

Try

    static inline int dtoi (double d)
    {
      int i;
      __asm__ ("fistpl %0" : "=m" (i) : "t" (d) : "st");
      return i;
    }

which rounds to nearest. Adding/subtracting 0.5 should be quite fast so you'll get the rounding direction you want. You should also be able to use fesetround() (C99) to change the default rounding mode so you don't have to add a constant for your preferred rounding direction.

Alex |
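As a rough sketch of the "add or subtract 0.5" idea, the wrapper below reuses the dtoi() function from the post and builds two helpers on top of it. The names ceil_approx and floor_approx are made up for illustration. The non-x86 fallback is an assumption of this sketch and not something proposed in the thread: it uses the well-known trick of adding and subtracting 1.5 * 2^52, which only behaves like fistp under the default round-to-nearest mode, with plain double evaluation, and for inputs that fit in an int.

```c
/* Round a double to int using round-to-nearest (ties to even by default). */
static inline int dtoi(double d)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int i;
    /* fistpl stores ST(0) as a 32-bit integer and pops the x87 stack. */
    __asm__ ("fistpl %0" : "=m" (i) : "t" (d) : "st");
    return i;
#else
    /* Assumed fallback (not from the thread): adding 1.5 * 2^52 forces the
       sum to be rounded to an integer; subtracting it again recovers that
       integer exactly. volatile keeps the compiler from simplifying it. */
    volatile double t = d + 6755399441055744.0;
    t = t - 6755399441055744.0;
    return (int)t;
#endif
}

/* Hypothetical helpers: approximate ceil/floor via round-to-nearest. */
int ceil_approx(double a)  { return dtoi(a + 0.5); }
int floor_approx(double a) { return dtoi(a - 0.5); }
```

Because ties round to even, exact odd integers come out off by one with this approach: ceil_approx(3.0) gives 4 and floor_approx(3.0) gives 2, while exact even integers are returned unchanged. That is a different error pattern from the original macros, but still only affects exact integers.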
[QUOTE=akruppa]The fastest way should be using the fistp opcode which uses the currently set rounding direction, which is round-to-nearest by default. [...][/QUOTE]

I presume that %0 is the first arg in the subroutine argument list. But I am not familiar with this _asm syntax. What are the items in double quotes? And the colons? I will try this using VC++ _asm. |
[QUOTE=akruppa]The fastest way should be using the fistp opcode which uses the currently set rounding direction, which is round-to-nearest by default. [...][/QUOTE]

For VC++ I need to move d into a floating point register. Do you know which register fistl acts upon? I guess I'll have to check my P IV instruction set documentation. BTW, this isn't very portable. But you know that. |
>Do you know which register fistl acts upon?

It pops a value from the FPU stack. My inline function is intended for gcc; I don't know if VC will like it. I have no way of testing, though.

>BTW, this isn't very portable. But you know that.

True, it isn't. I've tinkered with this problem a little for the interface between GMP-ECM and GWNUM; this was the fastest solution I found. There is an lrint() function defined in C99 which converts a double to a long int, using the currently set rounding direction. When setting the proper #defines, GNU's math.h defines lrint() as a macro that uses fistp again. Without the proper #defines, it uses a library call: faster than the typecast with its rounding-mode change, but of course much slower than the fistp opcode. In the end, I just used my own fistp wrapper directly instead of fiddling with headers. The GWNUM interface code will only run on x86, so portability is not an issue for me. If it is for you, it may be worthwhile to check what it takes to make lrint() use the fistp opcode.

Alex |
>I presume that %0 is the first arg in the subroutine argument list.
>But I am not familiar with this _asm syntax. What are the items in
>double quotes? And the colons?

The "%0" refers to the first operand of the asm block, which is "=m" (i). This means output "=" to memory location "m" of variable i. The fistp opcode needs to store to memory; it cannot write to registers. The "t" (d) tells gcc that the asm block expects the value of variable d in the top "t" of the FPU stack. The "st" tells gcc that the top value of the FPU stack will be popped. The colons separate the output : input : clobber sections of the operands.

There is also a fist opcode that does not pop the value off the stack. Might be useful if you want to immediately reuse that value.

Alex |
[QUOTE=akruppa]The "%0" refers to the first operand of the asm block, which is "=m" (i). [...][/QUOTE]

I have tried the following:

    __inline iceil(a,d)
    double a;
    int d[];
    {
    _asm {
         mov edi, d
         mov ST0, a
         FIST [edi]
         }
    }

FIST supposedly only outputs its result to memory. [edi] is the address pointed to by register edi. But the compiler gives "invalid argument". Nor does it like an explicit specification that EDI is a DWORD PTR:

    FIST DWORD PTR[edi]

does not work. Note that

    mov eax, DWORD PTR[edi]

is perfectly valid. How does one get FIST to return its output as the function argument? eax is the register used by VC++ for integer returns, but FIST eax is invalid since eax is a register, not an address. |
It probably complains about the "mov ST0,a". You load values onto the FPU stack with FLD, not MOV. You could try a "FLD x", where x must be a memory reference again, i.e.

    mov edi, x
    FLD [edi]
    mov edi, d
    FIST [edi]

where x is a double *. It would be much nicer to tell the compiler that FIST (or FISTP) expects a double value on the FPU stack and an address to an integer, and which variables these operands should correspond to (like the GCC asm statement does). This will let the compiler merge the FIST[P] opcode into the surrounding code much better, i.e. it can decide for itself which register to use for the addressing, when to load the double on the stack, etc. But someone else will have to help with that; I have never used VC.

Alex

Edit: by far the cleanest and probably easiest solution if you want portability across different CPU types and compilers will be using lrint(). |
[QUOTE=akruppa]It probably complains about the "mov ST0,a". You load values onto the fpu stack with FLD, not MOV. [...][/QUOTE]

Actually you were right. I was getting a report of "invalid argument". When I clicked on the report it is *supposed* to point to the offending code. It *was* pointing at the FISTP line, but the error was on the mov ST0, a line. I changed it to

    mov edi, d
    FLD a
    FISTP [edi]

and now it seems fine. |
Do you need to pass an integer array pointer to iceil?
What about only sending a double and getting back an int?

    __inline int iceil(double a)
    {
      int d;
      const double h = 0.5000001f;
      _asm {
        finit
        fld a
        fadd h
        fist d
      };
      return d;
    }

There may be an issue with rounding modes. Check if int is OK versus long int. |
[QUOTE=dsouza123]Do you need to pass an integer array pointer to iceil ? What about only send a double and get back an int ? [...][/QUOTE]

No, I clearly don't need to pass a pointer. But your suggestion creates two temporaries on the stack because the float instructions need memory addresses as destinations. I am not sure if yours is slower or faster than mine.

Also, the fadd is not needed in the routine. If the routine only does 'fist', you can call it with (a + .5) to get "ceil" or (a - .5) to get "floor". Is the finit instruction really needed?

Thanks for the advice! |