How to compute log with the preprocessor
The C preprocessor's #define is purely a text substitution mechanism, so the preprocessor itself will not calculate log values for you.
You might be able to with C++ templates, but that is black magic I don't understand, and currently irrelevant.
Or as I mentioned in a comment below, you could build your own pre-pre-processor that evaluates the array size equations before handing the updated code over to the standard C compiler.
Edit
In poking around some more I saw this SO question: Do any C or C++ compilers optimize within define macros?
That question is about evaluating this string of macros:
#include <math.h>
#define ROWS 15
#define COLS 16
#define COEFF 0.15
#define NODES (ROWS*COLS)
#define A_CONSTANT (COEFF*(sqrt(NODES)))
The consensus was that A_CONSTANT can be a compile-time constant, depending on how smart the compiler is and which mathematical functions are defined as intrinsics. It also alluded to GCC being smart enough to figure this out for this case.
So the answer to your question could be found in trying it, and seeing what sort of code your compiler actually produces.
A slightly shorter definition of a LOG macro, working with integers up to 32 bits, could be:
#define LOG_1(n) (((n) >= 2) ? 1 : 0)
#define LOG_2(n) (((n) >= 1<<2) ? (2 + LOG_1((n)>>2)) : LOG_1(n))
#define LOG_4(n) (((n) >= 1<<4) ? (4 + LOG_2((n)>>4)) : LOG_2(n))
#define LOG_8(n) (((n) >= 1<<8) ? (8 + LOG_4((n)>>8)) : LOG_4(n))
#define LOG(n) (((n) >= 1<<16) ? (16 + LOG_8((n)>>16)) : LOG_8(n))
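One way to check that LOG really yields an integer constant expression is to use it where the language demands one. The sketch below repeats the macros so it stands alone, and assumes a C11 compiler for static_assert from <assert.h>. The names LOG_TABLE_SIZE and log_table are made up for illustration.

```c
#include <assert.h>   /* static_assert (C11) */

/* Macros repeated from above so the sketch is self-contained. */
#define LOG_1(n) (((n) >= 2) ? 1 : 0)
#define LOG_2(n) (((n) >= 1<<2) ? (2 + LOG_1((n)>>2)) : LOG_1(n))
#define LOG_4(n) (((n) >= 1<<4) ? (4 + LOG_2((n)>>4)) : LOG_2(n))
#define LOG_8(n) (((n) >= 1<<8) ? (8 + LOG_4((n)>>8)) : LOG_4(n))
#define LOG(n)   (((n) >= 1<<16) ? (16 + LOG_8((n)>>16)) : LOG_8(n))

/* Hypothetical constant, for illustration only. */
#define LOG_TABLE_SIZE 1000

/* Each condition must be an integer constant expression to compile. */
static_assert(LOG(1) == 0, "log2(1) is 0");
static_assert(LOG(16) == 4, "log2(16) is 4");
static_assert(LOG(LOG_TABLE_SIZE) == 9, "floor(log2(1000)) is 9");

/* A file-scope array bound must be a constant expression as well. */
static int log_table[LOG(LOG_TABLE_SIZE) + 1];
```

If any of the assertions were false, or if LOG did not fold to a constant, the translation unit would fail to compile rather than misbehave at run time.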
However, before using it, check whether you really need it. People often need the logarithm only for values that are powers of 2, for example when implementing bit arrays. While it is difficult to calculate log as a constant expression, it is very easy to define a power of 2. So you may consider defining your constants as:
#define logA 4
#define A (1<<logA)
instead of:
#define A 16
#define logA LOG(A)
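As a rough illustration of why defining the logarithm first is convenient, here is a minimal bit-array sketch. All names (LOG_WORD_BITS, WORD_BITS, N_BITS, N_WORDS, bits, bit_set, bit_test) are hypothetical; only the "define the log, derive the power of 2" pattern comes from the text above. It uses 32 bits of each unsigned long, which C guarantees to be available.

```c
#include <assert.h>   /* static_assert (C11) */

#define LOG_WORD_BITS 5                       /* chosen first ...          */
#define WORD_BITS     (1u << LOG_WORD_BITS)   /* ... and the size derived  */
#define N_BITS        100u
#define N_WORDS       ((N_BITS + WORD_BITS - 1) >> LOG_WORD_BITS)

static_assert(WORD_BITS == 32, "5 is log2 of 32");

static unsigned long bits[N_WORDS];

/* Set bit i: the shift picks the word, the mask picks the bit within it. */
static void bit_set(unsigned i)
{
    bits[i >> LOG_WORD_BITS] |= 1UL << (i & (WORD_BITS - 1));
}

/* Return 1 if bit i is set, 0 otherwise. */
static int bit_test(unsigned i)
{
    return (int)((bits[i >> LOG_WORD_BITS] >> (i & (WORD_BITS - 1))) & 1UL);
}
```

Both the shift count and the mask come straight from the two constants, so no logarithm ever has to be computed.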
This answer is inspired by 5gon12eder, but with a simpler first macro. Unlike 5gon12eder's solution, this implementation gives 0 for BITS_TO_REPRESENT(0), which is arguably correct. This BITS_TO_REPRESENT(N) macro returns the number of bits needed to represent an unsigned integer less than or equal to the nonnegative integer N; storing a signed number of magnitude N would need one additional bit.
#define NEEDS_BIT(N, B) (((unsigned long)(N) >> (B)) > 0)
#define BITS_TO_REPRESENT(N) \
(NEEDS_BIT(N, 0) + NEEDS_BIT(N, 1) + \
NEEDS_BIT(N, 2) + NEEDS_BIT(N, 3) + \
NEEDS_BIT(N, 4) + NEEDS_BIT(N, 5) + \
NEEDS_BIT(N, 6) + NEEDS_BIT(N, 7) + \
NEEDS_BIT(N, 8) + NEEDS_BIT(N, 9) + \
NEEDS_BIT(N, 10) + NEEDS_BIT(N, 11) + \
NEEDS_BIT(N, 12) + NEEDS_BIT(N, 13) + \
NEEDS_BIT(N, 14) + NEEDS_BIT(N, 15) + \
NEEDS_BIT(N, 16) + NEEDS_BIT(N, 17) + \
NEEDS_BIT(N, 18) + NEEDS_BIT(N, 19) + \
NEEDS_BIT(N, 20) + NEEDS_BIT(N, 21) + \
NEEDS_BIT(N, 22) + NEEDS_BIT(N, 23) + \
NEEDS_BIT(N, 24) + NEEDS_BIT(N, 25) + \
NEEDS_BIT(N, 26) + NEEDS_BIT(N, 27) + \
NEEDS_BIT(N, 28) + NEEDS_BIT(N, 29) + \
NEEDS_BIT(N, 30) + NEEDS_BIT(N, 31) \
)
BITS_TO_REPRESENT is almost a base-2 logarithm. Since the default conversion from floating point to an integer is truncation, the integer version of a base-2 logarithm corresponds to the floating-point calculation floor(log(N)/log(2)). For N > 0, BITS_TO_REPRESENT(N) returns one greater than floor(log(N)/log(2)).
For example:
BITS_TO_REPRESENT(7) is 3, whereas floor(log(7)/log(2)) is 2.
BITS_TO_REPRESENT(8) is 4, whereas floor(log(8)/log(2)) is 3.
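The off-by-one relationship can be checked without floating point. The sketch below repeats the macro in a compact layout and compares it against floor_log2, a hypothetical helper of mine that computes the integer base-2 logarithm by repeated shifting. It assumes C11 for static_assert and arguments that fit in 32 bits.

```c
#include <assert.h>   /* static_assert (C11) */

/* Same macro as above, laid out four terms per line. */
#define NEEDS_BIT(N, B) (((unsigned long)(N) >> (B)) > 0)
#define BITS_TO_REPRESENT(N) \
    (NEEDS_BIT(N,  0) + NEEDS_BIT(N,  1) + NEEDS_BIT(N,  2) + NEEDS_BIT(N,  3) + \
     NEEDS_BIT(N,  4) + NEEDS_BIT(N,  5) + NEEDS_BIT(N,  6) + NEEDS_BIT(N,  7) + \
     NEEDS_BIT(N,  8) + NEEDS_BIT(N,  9) + NEEDS_BIT(N, 10) + NEEDS_BIT(N, 11) + \
     NEEDS_BIT(N, 12) + NEEDS_BIT(N, 13) + NEEDS_BIT(N, 14) + NEEDS_BIT(N, 15) + \
     NEEDS_BIT(N, 16) + NEEDS_BIT(N, 17) + NEEDS_BIT(N, 18) + NEEDS_BIT(N, 19) + \
     NEEDS_BIT(N, 20) + NEEDS_BIT(N, 21) + NEEDS_BIT(N, 22) + NEEDS_BIT(N, 23) + \
     NEEDS_BIT(N, 24) + NEEDS_BIT(N, 25) + NEEDS_BIT(N, 26) + NEEDS_BIT(N, 27) + \
     NEEDS_BIT(N, 28) + NEEDS_BIT(N, 29) + NEEDS_BIT(N, 30) + NEEDS_BIT(N, 31))

/* Hypothetical helper: floor(log2(n)) for n > 0, using integer shifts only. */
static int floor_log2(unsigned long n)
{
    int result = 0;
    while ((n >>= 1) != 0)
        ++result;
    return result;
}

/* The values quoted in the text, checked at compile time. */
static_assert(BITS_TO_REPRESENT(0) == 0, "zero needs no bits");
static_assert(BITS_TO_REPRESENT(7) == 3, "7 is 111 in binary");
static_assert(BITS_TO_REPRESENT(8) == 4, "8 is 1000 in binary");
```

For any N from 1 up to 2^32 − 1, BITS_TO_REPRESENT(N) should equal floor_log2(N) + 1.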
Alright, and now for the dirty brute-force preprocessor trickery.
From your question, I assume that what you actually want is not a general logarithm (which isn't even possible in integer arithmetic) but the number of bits needed to represent a given number. If we restrict ourselves to 32-bit integers, there is a solution to this, although it's not pretty.
#define IS_REPRESENTIBLE_IN_D_BITS(D, N) \
(((unsigned long) N >= (1UL << (D - 1)) && (unsigned long) N < (1UL << D)) ? D : -1)
#define BITS_TO_REPRESENT(N) \
(N == 0 ? 1 : (31 \
+ IS_REPRESENTIBLE_IN_D_BITS( 1, N) \
+ IS_REPRESENTIBLE_IN_D_BITS( 2, N) \
+ IS_REPRESENTIBLE_IN_D_BITS( 3, N) \
+ IS_REPRESENTIBLE_IN_D_BITS( 4, N) \
+ IS_REPRESENTIBLE_IN_D_BITS( 5, N) \
+ IS_REPRESENTIBLE_IN_D_BITS( 6, N) \
+ IS_REPRESENTIBLE_IN_D_BITS( 7, N) \
+ IS_REPRESENTIBLE_IN_D_BITS( 8, N) \
+ IS_REPRESENTIBLE_IN_D_BITS( 9, N) \
+ IS_REPRESENTIBLE_IN_D_BITS(10, N) \
+ IS_REPRESENTIBLE_IN_D_BITS(11, N) \
+ IS_REPRESENTIBLE_IN_D_BITS(12, N) \
+ IS_REPRESENTIBLE_IN_D_BITS(13, N) \
+ IS_REPRESENTIBLE_IN_D_BITS(14, N) \
+ IS_REPRESENTIBLE_IN_D_BITS(15, N) \
+ IS_REPRESENTIBLE_IN_D_BITS(16, N) \
+ IS_REPRESENTIBLE_IN_D_BITS(17, N) \
+ IS_REPRESENTIBLE_IN_D_BITS(18, N) \
+ IS_REPRESENTIBLE_IN_D_BITS(19, N) \
+ IS_REPRESENTIBLE_IN_D_BITS(20, N) \
+ IS_REPRESENTIBLE_IN_D_BITS(21, N) \
+ IS_REPRESENTIBLE_IN_D_BITS(22, N) \
+ IS_REPRESENTIBLE_IN_D_BITS(23, N) \
+ IS_REPRESENTIBLE_IN_D_BITS(24, N) \
+ IS_REPRESENTIBLE_IN_D_BITS(25, N) \
+ IS_REPRESENTIBLE_IN_D_BITS(26, N) \
+ IS_REPRESENTIBLE_IN_D_BITS(27, N) \
+ IS_REPRESENTIBLE_IN_D_BITS(28, N) \
+ IS_REPRESENTIBLE_IN_D_BITS(29, N) \
+ IS_REPRESENTIBLE_IN_D_BITS(30, N) \
+ IS_REPRESENTIBLE_IN_D_BITS(31, N) \
+ IS_REPRESENTIBLE_IN_D_BITS(32, N) \
) \
)
The idea is that a number n > 0 has a representation using exactly d bits if and only if n ≥ 2^(d−1) and n < 2^d. After treating the n = 0 case specially, we simply brute-force this out for all 32 possible answers.
The helper macro IS_REPRESENTIBLE_IN_D_BITS(D, N) will expand to an expression evaluating to D if N can be represented using exactly D bits and to −1 otherwise. Since exactly one of the 32 summands can be the answer, the other 31 contribute −1 each, and I add 31 to compensate. If the number cannot be represented in any of 1, …, 32 bits, then the overall result will be −1, which should help us catch some errors. Note that the D = 32 term evaluates 1UL << 32, which is only well defined where unsigned long is wider than 32 bits.
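To make the arithmetic of the sum easier to follow, here is a run-time mirror of the same brute-force idea written as a loop. The function name bits_to_represent_loop is hypothetical. It differs from the macro in one respect that is my own variation: it tests n < 2^d as (n >> 1) < 2^(d−1), so that no shift by 32 is ever performed.

```c
/* Hypothetical run-time mirror of the macro; not a compile-time constant. */
static int bits_to_represent_loop(unsigned long n)
{
    int sum = 31;   /* offsets the 31 summands that end up as -1 */
    int d;

    if (n == 0)
        return 1;   /* zero is treated specially, as in the macro */

    for (d = 1; d <= 32; ++d) {
        unsigned long lo = 1UL << (d - 1);     /* 2^(d-1) */
        int below_hi = (n >> 1) < lo;          /* same as n < 2^d */

        /* Exactly one d can match; every other d contributes -1. */
        sum += (n >= lo && below_hi) ? d : -1;
    }
    return sum;     /* -1 if n does not fit in 32 bits */
}
```

For n = 42 only d = 6 matches (32 ≤ 42 < 64), so the result is 31 + 6 − 31 = 6.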
The expression BITS_TO_REPRESENT(42) is a valid compile-time constant for use in an array length declaration.
All that said, the additional cost for always making your array 32 elements long seems acceptable for many applications and it saves you quite some trouble. So I would only use such trickery if I really had to.
Update: Just to avoid confusion: this solution does not use the preprocessor to evaluate the “logarithm”. All the preprocessor does is perform a text substitution, which you can see by compiling with the -E switch (at least for GCC). Let's have a look at this code:
int
main()
{
int digits[BITS_TO_REPRESENT(42)];
return 0;
}
It will be preprocessed to (be warned):
int
main()
{
int digits[(42 == 0 ? 1 : (31 + (((unsigned long) 42 >= (1UL << (1 - 1)) && (unsigned long) 42 < (1UL << 1)) ? 1 : -1) + (((unsigned long) 42 >= (1UL << (2 - 1)) && (unsigned long) 42 < (1UL << 2)) ? 2 : -1) + (((unsigned long) 42 >= (1UL << (3 - 1)) && (unsigned long) 42 < (1UL << 3)) ? 3 : -1) + (((unsigned long) 42 >= (1UL << (4 - 1)) && (unsigned long) 42 < (1UL << 4)) ? 4 : -1) + (((unsigned long) 42 >= (1UL << (5 - 1)) && (unsigned long) 42 < (1UL << 5)) ? 5 : -1) + (((unsigned long) 42 >= (1UL << (6 - 1)) && (unsigned long) 42 < (1UL << 6)) ? 6 : -1) + (((unsigned long) 42 >= (1UL << (7 - 1)) && (unsigned long) 42 < (1UL << 7)) ? 7 : -1) + (((unsigned long) 42 >= (1UL << (8 - 1)) && (unsigned long) 42 < (1UL << 8)) ? 8 : -1) + (((unsigned long) 42 >= (1UL << (9 - 1)) && (unsigned long) 42 < (1UL << 9)) ? 9 : -1) + (((unsigned long) 42 >= (1UL << (10 - 1)) && (unsigned long) 42 < (1UL << 10)) ? 10 : -1) + (((unsigned long) 42 >= (1UL << (11 - 1)) && (unsigned long) 42 < (1UL << 11)) ? 11 : -1) + (((unsigned long) 42 >= (1UL << (12 - 1)) && (unsigned long) 42 < (1UL << 12)) ? 12 : -1) + (((unsigned long) 42 >= (1UL << (13 - 1)) && (unsigned long) 42 < (1UL << 13)) ? 13 : -1) + (((unsigned long) 42 >= (1UL << (14 - 1)) && (unsigned long) 42 < (1UL << 14)) ? 14 : -1) + (((unsigned long) 42 >= (1UL << (15 - 1)) && (unsigned long) 42 < (1UL << 15)) ? 15 : -1) + (((unsigned long) 42 >= (1UL << (16 - 1)) && (unsigned long) 42 < (1UL << 16)) ? 16 : -1) + (((unsigned long) 42 >= (1UL << (17 - 1)) && (unsigned long) 42 < (1UL << 17)) ? 17 : -1) + (((unsigned long) 42 >= (1UL << (18 - 1)) && (unsigned long) 42 < (1UL << 18)) ? 18 : -1) + (((unsigned long) 42 >= (1UL << (19 - 1)) && (unsigned long) 42 < (1UL << 19)) ? 19 : -1) + (((unsigned long) 42 >= (1UL << (20 - 1)) && (unsigned long) 42 < (1UL << 20)) ? 20 : -1) + (((unsigned long) 42 >= (1UL << (21 - 1)) && (unsigned long) 42 < (1UL << 21)) ? 
21 : -1) + (((unsigned long) 42 >= (1UL << (22 - 1)) && (unsigned long) 42 < (1UL << 22)) ? 22 : -1) + (((unsigned long) 42 >= (1UL << (23 - 1)) && (unsigned long) 42 < (1UL << 23)) ? 23 : -1) + (((unsigned long) 42 >= (1UL << (24 - 1)) && (unsigned long) 42 < (1UL << 24)) ? 24 : -1) + (((unsigned long) 42 >= (1UL << (25 - 1)) && (unsigned long) 42 < (1UL << 25)) ? 25 : -1) + (((unsigned long) 42 >= (1UL << (26 - 1)) && (unsigned long) 42 < (1UL << 26)) ? 26 : -1) + (((unsigned long) 42 >= (1UL << (27 - 1)) && (unsigned long) 42 < (1UL << 27)) ? 27 : -1) + (((unsigned long) 42 >= (1UL << (28 - 1)) && (unsigned long) 42 < (1UL << 28)) ? 28 : -1) + (((unsigned long) 42 >= (1UL << (29 - 1)) && (unsigned long) 42 < (1UL << 29)) ? 29 : -1) + (((unsigned long) 42 >= (1UL << (30 - 1)) && (unsigned long) 42 < (1UL << 30)) ? 30 : -1) + (((unsigned long) 42 >= (1UL << (31 - 1)) && (unsigned long) 42 < (1UL << 31)) ? 31 : -1) + (((unsigned long) 42 >= (1UL << (32 - 1)) && (unsigned long) 42 < (1UL << 32)) ? 32 : -1) ) )];
return 0;
}
This looks terrible, and if it were evaluated at run-time, it would be quite a number of instructions. However, since all operands are constants (or literals, to be precise), the compiler is able to evaluate this at compile-time. It has to do so, because an array length declaration must be a constant expression in C89.
If you are using the macro in other places that are not required to be compile-time constants, it is up to the compiler whether or not it evaluates the expression. However, any reasonable compiler should be expected to perform this rather elementary optimization – known as constant folding – if optimizations are enabled. If in doubt – as always – have a look at the generated assembly code.
For example, let us consider this program.
int
main()
{
return BITS_TO_REPRESENT(42);
}
The expression in a return statement clearly isn't required to be a compile-time constant, so let's look at what code GCC will generate. (I'm using the -S switch to stop at the assembly stage.)
Even without any optimizations enabled, I get the following assembly code which shows that the macro expansion was folded into the constant 6.
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $6, %eax # See the constant 6?
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc