![]() |
![]() |
#1 |
"Bob Silverman"
Nov 2003
North of Boston
11101100001002 Posts |
![]()
A challenge: a is double.
The generic ceil() function in the C math library is slow. Horribly slow. The following code is much faster, but does not work correctly for positive exact integers: #define iceil(a) (a <= 0.0? (int)a : (int)a + 1) For the application I have, exact integers for a are very very very rare and it does not matter if iceil(a) is wrong by 1. Can anyone find a faster way? Is there anyway to do this without the branch? (this macro gets called many,many,many times) |
![]() |
![]() |
![]() |
#2 | |
"Bob Silverman"
Nov 2003
North of Boston
22×1,889 Posts |
![]() Quote:
by just computing e.g. (int)(a + .999999) |
|
![]() |
![]() |
![]() |
#3 | |
Bamboozled!
"๐บ๐๐ท๐ท๐ญ"
May 2003
Down not across
101101111110102 Posts |
![]() Quote:
Code:
#define iceil(a) ((int)a + !(*(unsigned* ) &a) >> 63))) This variant allows a to be an arbitrary expression but may not be faster because it has an implicit branch. Code:
#define iceil(a) ((int)(a) + (int (a) < 0.0)) Paul Last fiddled with by xilman on 2010-10-27 at 17:09 Reason: Remove a spurious '\'' character. |
|
![]() |
![]() |
![]() |
#4 |
Tribal Bullet
Oct 2004
5·23·31 Posts |
![]()
If you needed a round-to-nearest and not round-to-next-larger, and knew the bit size M of a floating point mantissa, then a standard trick for rounding to integer computes ((a + c) - c), where c is 3*2^(M-1). The addition will right-justify and round the mantissa in an FPU register, the subtraction replaces bits to the right of the 'binary point' with zeros. There are a few caveats though:
- you have to make sure the compiler executes the addition and then subtraction, instead of optimizing them away - you have to know the mantissa size. For x86 machines this could be 53 bits or 64 bits, depending on the OS, compiler, whatever else your app is doing, whether the operation uses SSE or x87 floating point, etc. To get around the first problem you can initialize two global variables to c and copy them to stack variables when you need rounding. You can get around the second problem by forcing the x87 precision to 53 bits, so that SSE and non-SSE code will work the same. There is a similar trick for converting floating point to integer: adding 2^M will right-justify the mantissa without rounding, so you can store to memory and then read the low 32 bits as an integer. The latter is not as necessary now as it was back in the Pentium days because integer conversion is a lot faster on modern x86 machines than it used to be; in the Pentium days the FPU would stall for 6 clocks on an integer conversion, and that was too much for a unit that could churn out an add or multiply every clock. |
![]() |
![]() |
![]() |
#5 | |
"Bob Silverman"
Nov 2003
North of Boston
22·1,889 Posts |
![]() Quote:
the rounding mode to "nearest". (one can set other modes as well) However, I actually need "ceiling", not "nearest" |
|
![]() |
![]() |
![]() |
#6 |
Oct 2007
2·53 Posts |
![]() |
![]() |
![]() |
![]() |
#7 |
"Bob Silverman"
Nov 2003
North of Boston
22·1,889 Posts |
![]() |
![]() |
![]() |
![]() |
#8 |
Feb 2005
Bristol, CT
33×19 Posts |
![]()
This should work
#define iceil(a) (a <= 0.0? (int)a : a > (int)a? (int)a+1:(int)a) Last fiddled with by WMHalsdorf on 2010-10-30 at 06:36 |
![]() |
![]() |
![]() |
#9 |
"Forget I exist"
Jul 2009
Dartmouth NS
204028 Posts |
![]()
why not floor() + 1 is that any faster ?
|
![]() |
![]() |
![]() |
#10 |
Tribal Bullet
Oct 2004
5·23·31 Posts |
![]()
The problem is that floor() and ceil() internally change the rounding mode and then do the integer conversion. Maybe they also have to check for extreme floating point values like inifinity and NaN. Changing the rounding mode is a pretty slow operation, and it's not necessary to do it every time you call floor() and ceil(), you can do it once and then do the same arithmetic they do for many inputs at once.
Getting rid of those functions and converting 'a' to an integer won't help you either, because - the C language requires integer conversion to always truncate, rather than round to nearest, so to be strictly C compliant the library has to still fiddle with the rounding mode - the CPU has to get the converted result from an FPU register to a CPU register and back (the output of iceil is an integer but later code needs it to be a double). If you are not using SSE2, that means bouncing through memory, doing the comparison and bouncing back into the FPU, which can be extremely slow. With SSE2 you can copy from the SSE2 unit to an integer register directly, but I think even today that operation has a high latency. Maybe the most efficient choice is to figure out how to use round-to-nearest and try to correct it afterwards... Last fiddled with by jasonp on 2010-10-30 at 12:32 |
![]() |
![]() |
![]() |
#11 | |
"Forget I exist"
Jul 2009
Dartmouth NS
845010 Posts |
![]() Quote:
TRUNCATE NUMBER IS LIKE FLOOR THEN JUST +1 LOL sorry caps lock was on. Last fiddled with by jasonp on 2010-10-30 at 12:34 Reason: umm, lol? |
|
![]() |
![]() |
![]() |
Thread Tools | |
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Fun with partition function | Batalov | And now for something completely different | 49 | 2022-08-04 12:08 |
Requestion for fast chebyshev theta function | danaj | Computer Science & Computational Number Theory | 9 | 2018-03-31 14:59 |
A useful function. | JM Montolio A | Miscellaneous Math | 28 | 2018-03-08 14:29 |
phi function | rula | Homework Help | 3 | 2017-01-18 01:41 |
Do normal adults give themselves an allowance? (...to fast or not to fast - there is no question!) | jasong | jasong | 35 | 2016-12-11 00:57 |