Bit twiddling: which bit is set?

If performance is a serious issue, then you should use intrinsics/builtins to use CPU specific instructions, such as the ones found here for GCC:

http://gcc.gnu.org/onlinedocs/gcc-4.5.0/gcc/Other-Builtins.html

  • Built-in function int __builtin_ffs(unsigned int x).

    Returns one plus the index of the least significant 1-bit of x, or if x is zero, returns zero.

  • Built-in function int __builtin_clz(unsigned int x).

    Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined.

  • Built-in function int __builtin_ctz(unsigned int x).

    Returns the number of trailing 0-bits in x, starting at the least significant bit position. If x is 0, the result is undefined.

Things like this are the core of many O(1) algorithms, such as kernel schedulers which need to find the first non-empty queue signified by an array of bits.

Note: I’ve listed the unsigned int versions, but GCC has unsigned long long versions, as well.


Finally an optimal solution. See the end of this section for what to do when the input is guaranteed to have exactly one non-zero bit: http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogDeBruijn

Here's the code:

static const int MultiplyDeBruijnBitPosition2[32] = 
{
  0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8, 
  31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
r = MultiplyDeBruijnBitPosition2[(uint32_t)(v * 0x077CB531U) >> 27];

You may be able to adapt this to a direct multiplication-based algorithm for 64-bit inputs; otherwise, simply add one conditional to see if the bit is in the upper 32 positions or the lower 32 positions, then use the 32-bit algorithm here.

Update: Here's at least one 64-bit version I just developed myself, but it uses division (actually modulo).

r = Table[v%67];

For each power of 2, v%67 has a distinct value, so just put your odd primes (or bit indices if you don't want the odd-prime thing) at the right positions in the table. 3 positions (0, 17, and 34) are not used, which might be convenient if you also want to accept all-bits-zero as an input.

Update 2: 64-bit version.

r = Table[(uint64_t)(val * 0x022fdd63cc95386dull) >> 58];

This is my original work, but I got the B(2,6) De Bruijn sequence from this chess site so I can't take credit for anything but figuring out what a De Bruijn sequence is and using Google. ;-)

Some additional remarks on how this works:

The magic number is a B(2,6) De Bruijn sequence. It has the property that, if you look at a 6-consecutive-bit window, you can obtain any six-bit value in that window by rotating the number appropriately, and that each possible six-bit value is obtained by exactly one rotation.

We fix the window in question to be the top 6 bit positions, and choose a De Bruijn sequence with 0's in the top 6 bits. This makes it so we never have to deal with bit rotations, only shifts, since 0's will come into the bottom bits naturally (and we could never end up looking at more than 5 bits from the bottom in the top-6-bits window).

Now, the input value of this function is a power of 2. So multiplying the De Bruijn sequence by the input value performs a bitshift by log2(value) bits. We now have in the upper 6 bits a number which uniquely determines how many bits we shifted by, and can use that as an index into a table to get the actual length of the shift.

This same approach can be used for arbitrarily-large or arbitrarily-small integers, as long as you're willing to implement the multiplication. You simply have to find a B(2,k) De Bruijn sequence where k is the number of bits. The chess wiki link I provided above has De Bruijn sequences for values of k ranging from 1 to 6, and some quick Googling shows there are a few papers on optimal algorithms for generating them in the general case.