Efficient bitwise operations for counting bits or finding the rightmost/leftmost set bit
If you want the fastest way, you will need to use non-portable methods.
Windows/MSVC:
- _BitScanForward()
- _BitScanReverse()
- __popcnt()
GCC:
- __builtin_ffs()
- __builtin_ctz()
- __builtin_clz()
- __builtin_popcount()
These typically map directly to native hardware instructions, so it doesn't get much faster than these.
But since standard C and C++ long had no equivalent functionality (C++20's <bit> header and C23's <stdbit.h> added some later), they're generally only accessible via compiler intrinsics.
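Here is a rough sketch of how the intrinsics listed above could be wrapped behind one interface, with a plain-C fallback for other compilers. The wrapper names (popcount32, lowest_set_bit, highest_set_bit) are made up for this example, and returning -1 for a zero input is my own convention; the intrinsics themselves leave that case undefined or signal it separately.

```c
#include <limits.h>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

/* Hypothetical wrapper: number of 1 bits in x. */
static inline unsigned popcount32(unsigned x)
{
#if defined(__GNUC__)
    return (unsigned)__builtin_popcount(x);
#elif defined(_MSC_VER)
    return __popcnt(x);              /* needs a CPU that supports POPCNT */
#else
    unsigned n = 0;
    while (x) { x &= x - 1; n++; }   /* clears the lowest set bit each pass */
    return n;
#endif
}

/* Hypothetical wrapper: 0-based index of the lowest set bit, -1 if x == 0. */
static inline int lowest_set_bit(unsigned x)
{
    if (x == 0) return -1;           /* __builtin_ctz(0) is undefined */
#if defined(__GNUC__)
    return __builtin_ctz(x);
#elif defined(_MSC_VER)
    unsigned long idx;
    _BitScanForward(&idx, x);
    return (int)idx;
#else
    int i = 0;
    while (!(x & 1u)) { x >>= 1; i++; }
    return i;
#endif
}

/* Hypothetical wrapper: 0-based index of the highest set bit, -1 if x == 0. */
static inline int highest_set_bit(unsigned x)
{
    if (x == 0) return -1;           /* __builtin_clz(0) is undefined */
#if defined(__GNUC__)
    return (int)(sizeof(unsigned) * CHAR_BIT) - 1 - __builtin_clz(x);
#elif defined(_MSC_VER)
    unsigned long idx;
    _BitScanReverse(&idx, x);
    return (int)idx;
#else
    int i = -1;
    while (x) { x >>= 1; i++; }
    return i;
#endif
}
```

For example, lowest_set_bit(12) and highest_set_bit(12) would give 2 and 3, since 12 is 1100 in binary. Note that __builtin_clz() counts leading zeros, so it has to be subtracted from the bit width to get an index.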
Take a look at ffs(3), ffsl(3), fls(3), flsl(3).
The ffs() and ffsl() functions find the first bit set (beginning with the least significant bit) in i and return the index of that bit.
The fls() and flsl() functions find the last bit set in i and return the index of that bit.
You might be interested in bitstring(3), too.
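As a small illustration of the semantics described above: ffs() numbers bits starting from 1 at the least significant bit and returns 0 when no bit is set. ffs() is declared in <strings.h> on POSIX systems, but fls() comes from the BSDs and may not be available everywhere, so the sketch below includes a stand-in for it. The name fls_fallback is hypothetical, and I am assuming it should mirror the same 1-based convention.

```c
#include <strings.h>   /* ffs(): POSIX */

/* Hypothetical stand-in for BSD fls(): 1-based index of the most
 * significant set bit, or 0 if no bit is set. */
static int fls_fallback(int i)
{
    unsigned int v = (unsigned int)i;
    int pos = 0;
    while (v) {        /* shift right until nothing is left */
        v >>= 1;
        pos++;
    }
    return pos;
}

/* Thin helper so both directions read the same way at the call site. */
static int first_set(int i)
{
    return ffs(i);     /* 1-based index of the least significant set bit */
}
```

With i = 12 (binary 1100), first_set(12) should give 3 and fls_fallback(12) should give 4. Subtract 1 if you need 0-based bit positions.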
Quoting from http://graphics.stanford.edu/~seander/bithacks.html
The best method for counting bits in a 32-bit integer v is the following:
    unsigned int v; // count bits set in this (32-bit value)
    unsigned int c; // store the total here
    v = v - ((v >> 1) & 0x55555555);                    // reuse input as temporary
    v = (v & 0x33333333) + ((v >> 2) & 0x33333333);     // temp
    c = ((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24; // count
The best bit counting method takes only 12 operations, which is the same as the lookup-table method, but avoids the memory and potential cache misses of a table. It is a hybrid between the purely parallel method above and the earlier methods using multiplies (in the section on counting bits with 64-bit instructions), though it doesn't use 64-bit instructions. The counts of bits set in the bytes is done in parallel, and the sum total of the bits set in the bytes is computed by multiplying by 0x1010101 and shifting right 24 bits.
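To make the quoted snippet easier to drop into a program, here is a sketch of it wrapped in a function, plus a naive reference version to check it against. The function names are my own, and I've written the grouping as (v + (v >> 4)) & 0x0F0F0F0F explicitly, which is what C operator precedence already gives the quoted line. It assumes unsigned int is 32 bits wide, as on most current platforms.

```c
#include <stdint.h>

/* Hypothetical name: the parallel bit count from the quote above. */
static uint32_t popcount_swar(uint32_t v)
{
    v = v - ((v >> 1) & 0x55555555u);                 /* each 2-bit field holds its own bit count */
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); /* each 4-bit field holds a count 0..4 */
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 /* each byte holds a count 0..8 */
    return (v * 0x01010101u) >> 24;                   /* top byte becomes the sum of all four bytes */
}

/* Hypothetical reference implementation: test one bit at a time. */
static uint32_t popcount_naive(uint32_t v)
{
    uint32_t c = 0;
    for (; v; v >>= 1)
        c += v & 1u;
    return c;
}
```

Tracing v = 0xFF by hand: the first step gives 0xAA, the second 0x44, the third 0x08, and the multiply and shift leave 8 in the low bits of the result.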