mersenneforum.org Fast Approximate Ceiling Function

2010-10-27, 13:02   #1
R.D. Silverman

"Bob Silverman"
Nov 2003
North of Boston

1D54₁₆ Posts

Fast Approximate Ceiling Function

A challenge:

a is double. The generic ceil() function in the C math library is slow. Horribly slow.

The following code is much faster, but does not work correctly for positive exact integers:

#define iceil(a) (a <= 0.0? (int)a : (int)a + 1)

For the application I have, exact integers for a are very very very rare and it does not matter if iceil(a) is wrong by 1.

Can anyone find a faster way? Is there any way to do this without the branch? (This macro gets called many, many, many times.)

2010-10-27, 15:14   #2
R.D. Silverman

"Bob Silverman"
Nov 2003
North of Boston

2²·1,877 Posts

I may just choose to accept a small number of additional inaccuracies
by just computing e.g. (int)(a + .999999)
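
To make the trade-off concrete, here is a minimal sketch of the two approximations mentioned so far, written as inline functions rather than macros so the argument is evaluated only once. The names iceil_branch and iceil_fudge are made up for illustration, and the sketch assumes the result fits in an int. Note that the (int)(a + .999999) form relies on truncation toward zero, so it only approximates the ceiling for a > -1.

```c
/* Illustrative sketch only; the function names are hypothetical. */

/* Branching form of the original macro: exact except that positive
   exact integers come out one too large. */
static inline int iceil_branch(double a)
{
    return (a <= 0.0) ? (int)a : (int)a + 1;
}

/* Fudge-factor form: no branch and correct for positive exact integers,
   but one too small when the fractional part is nonzero and below 0.000001.
   Assumes a > -1; for a <= -1 truncation toward zero gives a result
   that is one too large. */
static inline int iceil_fudge(double a)
{
    return (int)(a + 0.999999);
}
```

For example, iceil_branch(3.0) returns 4 while iceil_fudge(3.0) returns 3.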

2010-10-27, 17:08   #3
xilman
Bamboozled!

"๐บ๐๐ท๐ท๐ญ"
May 2003
Down not across

2×7³×17 Posts

I'm assuming your variable a is a 64-bit double (and that unsigned long long is also 64 bits wide). If it's a 32-bit float you need to change the integer type in the macro below to a 32-bit unsigned type and the shift count to 31. I'm also assuming IEEE floating point representation, so the sign bit is stored in the MSB of a floating point variable.
Code:
#define iceil(a) ((int)(a) + !((*(unsigned long long *)&(a)) >> 63))
Note that this will not work if a is anything but a variable. Whether it is faster than the conditional expression is open to experiment.

This variant allows a to be an arbitrary expression but may not be faster because it has an implicit branch.
Code:
#define iceil(a) ((int)(a) + ((a) > 0.0))
Your macro doesn't work if a is an expression but is easily fixed with the addition of some parentheses.

Paul

Last fiddled with by xilman on 2010-10-27 at 17:09 Reason: Remove a spurious '\'' character.
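
A sketch of the sign-bit idea that avoids the pointer cast: copying the bits with memcpy sidesteps strict-aliasing problems and also works when the argument is an expression. It assumes a 64-bit IEEE 754 double, and iceil_signbit is a hypothetical name. Unlike the original macro, it returns 1 for +0.0, because the sign bit of +0.0 is clear. Whether it beats the conditional would still have to be measured.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, assuming double is a 64-bit IEEE 754 value. */
static inline int iceil_signbit(double a)
{
    uint64_t bits;
    memcpy(&bits, &a, sizeof bits);   /* well-defined way to read the bit pattern */

    /* bits >> 63 is the sign bit: 0 for non-negative a, 1 for negative a.
       Add 1 to the truncated value only when the sign bit is clear. */
    return (int)a + (int)!(bits >> 63);
}
```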

2010-10-29, 18:10   #4
jasonp
Tribal Bullet

Oct 2004

3²·5·79 Posts

If you needed a round-to-nearest and not round-to-next-larger, and knew the bit size M of a floating point mantissa, then a standard trick for rounding to integer computes ((a + c) - c), where c is 3*2^(M-2) (that is, 3*2^51 for a 53-bit mantissa). The addition will right-justify and round the mantissa in an FPU register, the subtraction replaces bits to the right of the 'binary point' with zeros. There are a few caveats though:

- you have to make sure the compiler executes the addition and then subtraction, instead of optimizing them away
- you have to know the mantissa size. For x86 machines this could be 53 bits or 64 bits, depending on the OS, compiler, whatever else your app is doing, whether the operation uses SSE or x87 floating point, etc.

To get around the first problem you can initialize two global variables to c and copy them to stack variables when you need rounding. You can get around the second problem by forcing the x87 precision to 53 bits, so that SSE and non-SSE code will work the same.

There is a similar trick for converting floating point to integer: adding 2^(M-1) will right-justify the mantissa without rounding, so you can store to memory and then read the low 32 bits as an integer. The latter is not as necessary now as it was back in the Pentium days because integer conversion is a lot faster on modern x86 machines than it used to be; in the Pentium days the FPU would stall for 6 clocks on an integer conversion, and that was too much for a unit that could churn out an add or multiply every clock.

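
The ((a + c) - c) trick can be sketched as follows, using c = 3*2^51 = 6755399441055744, the constant that fits a 53-bit double mantissa. This assumes the arithmetic is carried out in IEEE 754 double precision with the default round-to-nearest mode (for example SSE2 on x86-64) and |a| < 2^51. The volatile qualifier is just one way to stop the compiler from folding the add and subtract away. The names MAGIC, round_magic and to_int32_magic are made up for this example, and the integer variant shown here rounds to nearest rather than truncating.

```c
#include <stdint.h>
#include <string.h>

/* 3 * 2^51: adding it moves the value into [2^52, 2^53), where
   consecutive doubles are exactly 1 apart. */
static const double MAGIC = 6755399441055744.0;

/* Round to the nearest integer (ties to even); the result stays a double. */
static inline double round_magic(double a)
{
    volatile double t = a + MAGIC;  /* the add rounds away the fractional bits */
    return t - MAGIC;               /* the subtract removes the bias again */
}

/* Same idea, but read the integer straight out of the low 32 bits of the sum.
   Assumes the rounded value fits in 32 bits. */
static inline int32_t to_int32_magic(double a)
{
    double t = a + MAGIC;
    uint64_t bits;
    uint32_t low;
    int32_t result;

    memcpy(&bits, &t, sizeof bits);       /* bit pattern of the biased sum */
    low = (uint32_t)bits;                 /* low 32 mantissa bits, two's complement */
    memcpy(&result, &low, sizeof result); /* reinterpret as a signed value */
    return result;
}
```
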
2010-10-29, 23:18   #5
R.D. Silverman

"Bob Silverman"
Nov 2003
North of Boston

1D54₁₆ Posts

The Pentium/Core-2 also has an instruction that allows the FPU to set
the rounding mode to "nearest". (one can set other modes as well)

However, I actually need "ceiling", not "nearest"

2010-10-30, 00:30   #6
Robert Holmes

Oct 2007

1101010₂ Posts

Quote:
 Originally Posted by R.D. Silverman The Pentium/Core-2 also has an instruction that allows the FPU to set the rounding mode to "nearest". (one can set other modes as well) However, I actually need "ceiling", not "nearest"
Wouldn't nearest(x + 0.4999...) do the ceiling?

2010-10-30, 01:53   #7
R.D. Silverman

"Bob Silverman"
Nov 2003
North of Boston

2²·1,877 Posts

Quote:
 Originally Posted by Robert Holmes Wouldn't nearest(x + 0.4999...) do the ceiling?
Yes, but there is a latency associated with setting the rounding mode.
And if you want to intermingle other FP computations you constantly
set/reset the rounding mode. I've tried it. It is slow.

2010-10-30, 05:41   #8
WMHalsdorf

Feb 2005
Bristol, CT

3³×19 Posts

This should work:

#define iceil(a) (a <= 0.0? (int)a : a > (int)a? (int)a+1:(int)a)

Last fiddled with by WMHalsdorf on 2010-10-30 at 06:36

2010-10-30, 11:47   #9
science_man_88

"Forget I exist"
Jul 2009
Dartmouth NS

10000011100010₂ Posts

Why not floor() + 1? Is that any faster?

2010-10-30, 12:19   #10
jasonp
Tribal Bullet

Oct 2004

110111100011₂ Posts

The problem is that floor() and ceil() internally change the rounding mode and then do the integer conversion. Maybe they also have to check for extreme floating point values like infinity and NaN. Changing the rounding mode is a pretty slow operation, and it's not necessary to do it every time you call floor() and ceil(); you can do it once and then do the same arithmetic they do for many inputs at once.

Getting rid of those functions and converting 'a' to an integer won't help you either, because

- the C language requires integer conversion to always truncate, rather than round to nearest, so to be strictly C compliant the library has to still fiddle with the rounding mode
- the CPU has to get the converted result from an FPU register to a CPU register and back (the output of iceil is an integer but later code needs it to be a double). If you are not using SSE2, that means bouncing through memory, doing the comparison and bouncing back into the FPU, which can be extremely slow. With SSE2 you can copy from the SSE2 unit to an integer register directly, but I think even today that operation has a high latency.

Maybe the most efficient choice is to figure out how to use round-to-nearest and try to correct it afterwards...

Last fiddled with by jasonp on 2010-10-30 at 12:32

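
As one way to "use round-to-nearest and correct it afterwards", the sketch below rounds with the magic-constant trick from earlier in the thread and then adds 1 whenever the rounded value fell below the input. The result stays a double, so no trip through an integer register is needed. It assumes IEEE 754 doubles, the default rounding mode, and |a| < 2^51. ROUND_MAGIC and ceil_nearest_fix are hypothetical names, and whether the comparison compiles to branch-free code depends on the compiler.

```c
/* 3 * 2^51, the round-to-nearest magic constant for a 53-bit mantissa. */
static const double ROUND_MAGIC = 6755399441055744.0;

/* Ceiling computed as "round to nearest, then fix up". */
static inline double ceil_nearest_fix(double a)
{
    volatile double t = a + ROUND_MAGIC;  /* volatile keeps the add/subtract pair intact */
    double r = t - ROUND_MAGIC;           /* r is a rounded to the nearest integer */
    return r + (double)(r < a);           /* if rounding went down, step up by one */
}
```

Under those assumptions this gives the exact ceiling, including for exact integers: ceil_nearest_fix(3.0) is 3.0 and ceil_nearest_fix(2.5) is 3.0.
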
2010-10-30, 12:28   #11
science_man_88

"Forget I exist"
Jul 2009
Dartmouth NS

2·3·23·61 Posts

Quote:
 Originally Posted by jasonp The problem with all of those choices is that floor() and ceil() internally change the rounding mode and then do the integer conversion. Changing the rounding mode is a pretty slow operation, and it's not necessary to do it every time you call floor() and ceil(), you can do it once and then do the same arithmetic they do for many inputs at once. Getting rid of those functions and converting 'a' to an integer won't help you either, because - the C language requires integer conversion to always truncate, rather than round to nearest, so to be strictly C compliant the library has to still fiddle with the rounding mode - the CPU has to get the converted result from an FPU register to a CPU register and back (the output of iceil is an integer but later code needs it to be a double). If you are not using SSE2, that means bouncing through memory, doing the comparison and bouncing back into the FPU, which can be extremely slow. With SSE2 you can copy from the SSE2 unit to an integer register directly, but I think even today that operation has a high latency.

TRUNCATE NUMBER IS LIKE FLOOR THEN JUST +1 LOL sorry caps lock was on.

Last fiddled with by jasonp on 2010-10-30 at 12:34 Reason: umm, lol?
