![]() |
|
|
#1 |
|
Nov 2003
22·5·373 Posts |
The following macros are (on the Pentium) 9 times faster than
calling the library ceil() and floor() functions. a is a double. Note also that |a| < 16K. #define iceil(a) ((a) <= 0.0 ? (int)(a) : (int)(a) + 1) #define ifloor(a) ((a) >= 0.0 ? (int)(a) : (int)(a) - 1) Can anyone do better, by perhaps eliminating the branches? Note that these are very slightly wrong. If a > 0 and a is an exact integer, iceil returns a+1, instead of a. But this is OK, as long as the macro is fast.
|
|
|
|
|
|
#2 |
|
"Nancy"
Aug 2002
Alexandria
46438 Posts |
The fastest way should be using the fistp opcode which uses the currently set rounding direction, which is round-to-nearest by default. Changing rounding modes in the fpu causes the pipeline to be flushed which imposes a huge penalty. This occurs for *each* type-cast to int as those are defined as truncate (round to zero)!
Try static inline int dtoi (double d) { int i; __asm__ ("fistpl %0" : "=m" (i) : "t" (d) : "st"); return i; } which rounds to nearest. Adding/subtracting 0.5 should be quite fast so you'll get the rounding direction you want. You should also be able to use fesetround() (C99) to change the default rounding mode so you don't have to add a constant for your preferred rounding direction. Alex Last fiddled with by akruppa on 2005-07-25 at 17:44 |
|
|
|
|
|
#3 | |
|
Nov 2003
22×5×373 Posts |
Quote:
But I am not familiar with this _asm syntax. What are the items in double quotes? And the colons? I will try this using VC++ _asm
|
|
|
|
|
|
|
#4 | |
|
Nov 2003
22·5·373 Posts |
Quote:
For VC++ I need to move d into a floating point register. Do you know which register fistl acts upon? I guess I'll have to check my P IV instruction set documentation. BTW, this isn't very portable. But you know that. |
|
|
|
|
|
|
#5 |
|
"Nancy"
Aug 2002
Alexandria
2,467 Posts |
>Do you know which register fistl acts upon?
It pops a value from the fpu stack. My inline function is inteded for gcc, I don't know if VC will like it. I have no way of testing, though. >BTW, this isn't very portable. But you know that. True, it isn't. I've tinkered with this problem for the interface between GMP-ECM and GWNUM a little, this was the fastest I solution found. There is an lrint() function defined in C99 which converts a double to an int, using the currently set rounding direction. When setting the proper #defines, GNU's math.h defines lrint() as a macro that uses fistp again. Without the proper #defines, it uses a library call - faster than the typecast with rounding-mode-change, but of course much slower than the fistp opcode. In the end, I just used my own fistp wrapper directly instead of fiddling with headers - the GWNUM interface code will only run on x86 so portability is not an issue for me. If it is for you, it may be worthwhile to check what it takes to make lrint() use the fistp opcode. Alex |
|
|
|
|
|
#6 |
|
"Nancy"
Aug 2002
Alexandria
2,467 Posts |
>I presume that %0 is the first arg in the subroutine argument list.
>But I am not familiar with this _asm syntax. What are the items in >double quotes? And the colons? The "%0" refers to the first operand of the asm block, which is "=m" (i). This means output "=" to memory location "m" of variable i. The fistp opcode needs to store to memory, it cannot write to registers. The "t" (d) tells gcc that the asm block expects the value of variable d in the top "t" of the fpu stack. The "st" tells that the top value of the fpu stack will be popped. The colons separate the output : input : flags sections of the operands. There is also a fist opcode that does not pop the value off the stack. Might be useful if you want to immediately reuse that value. Alex Last fiddled with by akruppa on 2005-07-25 at 18:38 |
|
|
|
|
|
#7 | |
|
Nov 2003
22×5×373 Posts |
Quote:
I have tried the following: __inline iceil(a,d) double a; int d[]; { _asm { mov edi, d mov ST0, a FIST [edi] } } FIST supposedly only outputs its result to memory. [edi] is the address pointed to by register edi. But the compiler gives "invalid argument" Nor does it like an explicit specification that EDI is a DWORD PTR FIST DWORD PTR[edi] does not work. note that mov eax, DWORD PTR[edi] is perfectly valid. How does one get FIST to return its output as the function argument? eax is the register used by VC++ for integer returns, but FIST eax is invalid since eax is a register, not an address. |
|
|
|
|
|
|
#8 |
|
"Nancy"
Aug 2002
Alexandria
1001101000112 Posts |
It probably complains about the "mov ST0,a". You load values onto the fpu stack with FLD, not MOV. You could try a "FLD x", where x must be a memory reference again, i.e.
mov edi, x FLD [edi] mov edi, d FIST [edi] where x is a double *. It would be much nicer to tell the compiler that FIST (or FISTP) expect a double value on the fpu stack and an address to an integer, and which variables these operands should correspond to (like the GCC asm statement does). This will let the compiler merge the FIST[P] opcode into the surrounding code much better, i.e. it can decide for itself which register to use for the addressing, when to load the double on the stack, etc. But someone else will have to help with that, I have never used VC. Alex Edit: by far the cleanest and probably easiest solution if you want portability across different cpu types and compilers will be using lrint(). Last fiddled with by akruppa on 2005-07-25 at 19:59 |
|
|
|
|
|
#9 | |
|
Nov 2003
22·5·373 Posts |
Quote:
Actually you were right. I was getting a report of "invalid argument". When I clicked on the report it is *supposed* to point to the offending code. It *was* pointing at the FISTP line, but the error was on the mov ST0, a line. I changed it to mov edi, d FLD a FISTP [edi] and now it seems fine. |
|
|
|
|
|
|
#10 |
|
Sep 2002
2·331 Posts |
Do you need to pass an integer array pointer to iceil ?
What about only send a double and get back an int ? __inline int iceil(double a) { int d; const double h = 0.5000001f; _asm { finit fld a fadd h fist d }; return d; } There may an issue with rounding modes. Check if int is OK versus long int. |
|
|
|
|
|
#11 | |
|
Nov 2003
22·5·373 Posts |
Quote:
two temporaries on the stack because the float instructions need memory addresses as destinations. I am not sure if yours is slower or faster than mine Also, the fadd is not needed in the routine. If the routine only does 'fist' You can call it with (a + .5) to get "ceil" or (a-.5) to get "floor". Is the finit instruction really needed?? Thanks for the advice!!! |
|
|
|
|
![]() |
| Thread Tools | |
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| A Challenge on the net | devarajkandadai | Miscellaneous Math | 0 | 2012-05-31 05:17 |
| When I was your age.....CHALLENGE | petrw1 | Lounge | 14 | 2009-11-23 02:18 |
| Challenge | science_man_88 | Miscellaneous Math | 229 | 2009-09-07 08:08 |
| rsa-640 challenge | ValerieVonck | Factoring | 58 | 2005-10-24 15:54 |
| Who is Challenge? | JuanTutors | PrimeNet | 2 | 2004-07-22 12:56 |