Shift masked bits to the lsb

This operation is known as compress right. It is implemented as part of BMI2 as the PEXT instruction, in Intel processors as of Haswell.

Unfortunately, without hardware support it is quite an annoying operation. Of course there is an obvious solution, just moving the bits one by one in a loop. Here is the one given by Hacker's Delight:

unsigned compress(unsigned x, unsigned m) {
   unsigned r, s, b;    // Result, shift, mask bit. 

   r = 0; 
   s = 0; 
   do {
      b = m & 1; 
      r = r | ((x & b) << s); 
      s = s + b; 
      x = x >> 1; 
      m = m >> 1; 
   } while (m != 0); 
   return r; 
}
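
For comparison, here is a minimal sketch of using the hardware instruction where it is available and falling back to the loop otherwise. `_pext_u32` is the BMI2 intrinsic from `<immintrin.h>`. The names `compress_loop` and `compress_any`, and the choice to select at compile time via the `__BMI2__` macro (defined by GCC and Clang when building with `-mbmi2`), are assumptions made for this illustration.

```c
#include <stdint.h>
#if defined(__BMI2__)
#include <immintrin.h>   /* _pext_u32 */
#endif

/* Portable fallback: the same bit-by-bit loop as above, on fixed-width types. */
static uint32_t compress_loop(uint32_t x, uint32_t m) {
    uint32_t r = 0;      /* result */
    unsigned s = 0;      /* position of the next result bit */
    while (m != 0) {
        uint32_t b = m & 1u;     /* current mask bit */
        r |= (x & b) << s;       /* copy the data bit if it is selected */
        s += b;
        x >>= 1;
        m >>= 1;
    }
    return r;
}

/* Hypothetical wrapper: PEXT when compiled with BMI2 enabled, the loop otherwise. */
static uint32_t compress_any(uint32_t x, uint32_t m) {
#if defined(__BMI2__)
    return _pext_u32(x, m);
#else
    return compress_loop(x, m);
#endif
}
```

With x = 0xB4 (10110100) and m = 0xA9 (10101001), the selected bits are 1, 1, 0, 0 from high to low, so both paths should return 0xC.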

But there is another way, also given by Hacker's Delight, which does less looping (the number of iterations is logarithmic in the number of bits) but does more work per iteration:

unsigned compress(unsigned x, unsigned m) {
   unsigned mk, mp, mv, t; 
   int i; 

   x = x & m;           // Clear irrelevant bits. 
   mk = ~m << 1;        // We will count 0's to right. 

   for (i = 0; i < 5; i++) {
      mp = mk ^ (mk << 1);             // Parallel prefix. 
      mp = mp ^ (mp << 2); 
      mp = mp ^ (mp << 4); 
      mp = mp ^ (mp << 8); 
      mp = mp ^ (mp << 16); 
      mv = mp & m;                     // Bits to move. 
      m = m ^ mv | (mv >> (1 << i));   // Compress m. 
      t = x & mv; 
      x = x ^ t | (t >> (1 << i));     // Compress x. 
      mk = mk & ~mp; 
   } 
   return x; 
}

Notice that many of the values there depend only on m. Since you only have 512 different masks, you could precompute them and simplify the code to something like this (not tested):

unsigned compress(unsigned x, int maskindex) {
   unsigned t; 
   int i; 

   x = x & masks[maskindex][0];

   for (i = 0; i < 5; i++) {
      t = x & masks[maskindex][i + 1]; 
      x = x ^ t | (t >> (1 << i));
   } 
   return x; 
}
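
The `masks` table is not defined above, so here is one possible way to fill it, as a sketch rather than a tested implementation. It assumes row `k` holds the mask itself in column 0 and the per-round `mv` values in columns 1 to 5, which is the layout the lookup code implies. `NUM_MASKS`, `precompute_mask` and `compress_table` are names invented for this example.

```c
#include <stdint.h>

#define NUM_MASKS 512   /* assumption: the 512 masks mentioned above */

/* masks[k][0] is the mask itself; masks[k][i + 1] is the "bits to move"
   value (mv) for round i of the parallel-prefix loop. */
static uint32_t masks[NUM_MASKS][6];

/* Fill one row by running only the mask-dependent half of compress(). */
static void precompute_mask(int maskindex, uint32_t m) {
    uint32_t mk = ~m << 1;               /* we will count 0's to the right */
    masks[maskindex][0] = m;
    for (int i = 0; i < 5; i++) {
        uint32_t mp = mk ^ (mk << 1);    /* parallel prefix */
        mp ^= mp << 2;
        mp ^= mp << 4;
        mp ^= mp << 8;
        mp ^= mp << 16;
        uint32_t mv = mp & m;            /* bits to move in this round */
        masks[maskindex][i + 1] = mv;
        m = (m ^ mv) | (mv >> (1 << i)); /* compress m */
        mk &= ~mp;
    }
}

/* Same as the table-driven compress() above, with fixed-width types. */
static uint32_t compress_table(uint32_t x, int maskindex) {
    x &= masks[maskindex][0];
    for (int i = 0; i < 5; i++) {
        uint32_t t = x & masks[maskindex][i + 1];
        x = (x ^ t) | (t >> (1 << i));
    }
    return x;
}
```

Call `precompute_mask` once per mask at startup. After that, each `compress_table` call only does the five mask-and-shift rounds.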

Of course, all of these can be turned into "not a loop" by unrolling; the second and third ways are probably more suitable for that. That's a bit of a cheat, however.
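
To make the unrolling remark concrete, here is a sketch of the table-driven version with the five rounds written out. The function name `compress_unrolled`, the `row` parameter and the constant `row_A9` are made up for this example. The row values for the mask 0xA9 were traced by hand from the parallel-prefix code, so treat them as an illustration.

```c
#include <stdint.h>

/* Hypothetical unrolled form: row[0] is the mask, row[1..5] are the
   precomputed "bits to move" values for shifts of 1, 2, 4, 8 and 16. */
static uint32_t compress_unrolled(uint32_t x, const uint32_t row[6]) {
    uint32_t t;
    x &= row[0];                               /* clear irrelevant bits */
    t = x & row[1]; x = (x ^ t) | (t >> 1);
    t = x & row[2]; x = (x ^ t) | (t >> 2);
    t = x & row[3]; x = (x ^ t) | (t >> 4);
    t = x & row[4]; x = (x ^ t) | (t >> 8);
    t = x & row[5]; x = (x ^ t) | (t >> 16);
    return x;
}

/* Row for the mask 0xA9 (10101001): bit 5 moves by 1, then the bits now at
   positions 3 and 4 move by 2, then bit 7 moves by 4. */
static const uint32_t row_A9[6] = { 0xA9, 0x20, 0x18, 0x80, 0x00, 0x00 };
```

For a mask known at compile time, the zero rows can simply be dropped, which leaves three mask-and-shift steps for 0xA9.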

You can use the pack-by-multiplication technique similar to the one described here. This way you don't need any loop and can mix the bits in any order.

For example, with the mask 0b10101001 == 0xA9 like above and 8-bit data abcdefgh (where a-h are the 8 bits), you can use the expression below to get 0000aceh:

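uint8_t compress_maskA9(uint8_t x)
{
    const uint8_t mask1 = 0xA9 & 0xF0;  // selects bits a and c
    const uint8_t mask2 = 0xA9 & 0x0F;  // selects bits e and h
    // Unsigned constants, so the products wrap modulo 2^32 instead of overflowing a signed int.
    return (((x & mask1)*0x03000000u >> 28) & 0x0C) | ((x & mask2)*0x50000000u >> 30);
}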

In this specific case some of the 4 bits overlap when the partial products are added during the multiplication step, which causes unwanted carries, so I've split them into 2 parts: the first one extracts bits a and c, and the second one extracts e and h. There are other ways to split the bits as well, such as a & h, then c & e. You can see the results compared to Harold's function live on ideone.

An alternative way with only one multiplication:

const uint32_t X = (x << 8) | x;
return (X & 0x8821)*0x12050000 >> 28;

I got this by duplicating the bits so that they're spaced farther apart, leaving enough room to avoid the carry. This is often better than splitting into 2 multiplications.
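
The two-line snippet is only a fragment, so here is a sketch that wraps it in a complete function and checks it against the bit-by-bit loop for all 256 inputs. The names `compress_ref`, `compress_maskA9_dup` and `count_mismatches` are invented for this example; the constants 0x8821 and 0x12050000 are the ones from the snippet.

```c
#include <stdint.h>

/* Reference: bit-by-bit compress, as in the first answer. */
static uint32_t compress_ref(uint32_t x, uint32_t m) {
    uint32_t r = 0;
    unsigned s = 0;
    while (m != 0) {
        uint32_t b = m & 1u;
        r |= (x & b) << s;
        s += b;
        x >>= 1;
        m >>= 1;
    }
    return r;
}

/* Extract bits a, c, e, h of abcdefgh (mask 0xA9) into 0000aceh
   with a single multiplication. */
static uint8_t compress_maskA9_dup(uint8_t x) {
    const uint32_t X = ((uint32_t)x << 8) | x;   /* abcdefghabcdefgh */
    /* Pick a and e from the upper copy, c and h from the lower copy. */
    return (uint8_t)(((X & 0x8821u) * 0x12050000u) >> 28);
}

/* Count the inputs where the two disagree; 0 means the magic number works. */
static int count_mismatches(void) {
    int bad = 0;
    for (unsigned x = 0; x < 256; x++) {
        if (compress_maskA9_dup((uint8_t)x) != compress_ref(x, 0xA9u)) {
            bad++;
        }
    }
    return bad;
}
```

The exhaustive check is cheap for 8-bit inputs, and it is a quick way to catch a partial product that lands on a bit you meant to keep.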

If you want the result's bits reversed (i.e. 0000heca), you can easily change the magic numbers accordingly:

// result: he00 | 00ca
return (((x & 0x09)*0x88000000u >> 28) & 0x0C) | (((x & 0xA0)*0x04800000u) >> 30);

Or you can extract the 3 bits e, c and a at the same time and handle h separately (as I mentioned above, there are often multiple solutions), so that you need only one multiplication:

return ((x & 0xA8)*0x12400000u >> 29) | ((x & 0x01) << 3); // result: 0eca | h000

But a better alternative may be the duplication approach from the second snippet above:

const uint32_t X = (x << 8) | x;
return (X & 0x2881)*0x80290000 >> 28;

Correctness check: http://ideone.com/PYUkty

For a larger number of masks, you can precompute the magic numbers corresponding to those masks and store them in an array so that you can look them up immediately. I calculated those by hand, but you can do that automatically.
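
As one naive way to automate this (a sketch under my own assumptions, not the method used to find the numbers above), you can build the obvious candidate multiplier for an 8-bit mask and then brute-force check it over all 256 inputs. A mask whose candidate fails needs splitting or bit duplication, as shown earlier. The helper names `popcount8`, `candidate_magic` and `magic_works` are hypothetical.

```c
#include <stdint.h>

/* Reference: bit-by-bit compress. */
static uint32_t compress_ref(uint32_t x, uint32_t m) {
    uint32_t r = 0;
    unsigned s = 0;
    while (m != 0) {
        uint32_t b = m & 1u;
        r |= (x & b) << s;
        s += b;
        x >>= 1;
        m >>= 1;
    }
    return r;
}

/* Number of set bits in an 8-bit value. */
static int popcount8(uint8_t v) {
    int n = 0;
    for (int p = 0; p < 8; p++) {
        n += (v >> p) & 1;
    }
    return n;
}

/* Candidate multiplier: the j-th set mask bit (counting from the lsb),
   located at position p, is shifted up to bit 32 - k + j, where k is the
   number of set bits, so the packed bits end up in the top k bits. */
static uint32_t candidate_magic(uint8_t mask) {
    int k = popcount8(mask), j = 0;
    uint32_t magic = 0;
    for (int p = 0; p < 8; p++) {
        if ((mask >> p) & 1) {
            magic |= UINT32_C(1) << (32 - k + j - p);
            j++;
        }
    }
    return magic;
}

/* Returns 1 if ((x & mask) * magic) >> (32 - k) matches the reference for
   every 8-bit x, and 0 if some partial products collide. */
static int magic_works(uint8_t mask, uint32_t magic) {
    int k = popcount8(mask);
    if (k == 0) return 1;   /* nothing to extract; also avoids a shift by 32 */
    for (uint32_t x = 0; x < 256; x++) {
        uint32_t got = ((x & mask) * magic) >> (32 - k);
        if (got != compress_ref(x, mask)) return 0;
    }
    return 1;
}
```

For the mask 0x81 the candidate 0x41000000 passes the check. For 0xA9 the candidate 0x17000000 fails, which matches the overlap problem described above.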

Explanation

We have abcdefgh & mask1 = a0c00000. Multiply it by magic1:

    ........................a0c00000
 ×  00000011000000000000000000000000 (magic1 = 0x03000000)
    ────────────────────────────────
    a0c00000........................
 + a0c00000......................... (the leading "a" bit is outside int's range
    ────────────────────────────────  so it'll be truncated)
r1 = acc.............................

=> (r1 >> 28) & 0x0C = 0000ac00

Similarly, we multiply abcdefgh & mask2 = 0000e00h by magic2:

  ........................0000e00h
× 01010000000000000000000000000000 (magic2 = 0x50000000)
  ────────────────────────────────
  e00h............................
+ 0h..............................
  ────────────────────────────────
r2 = eh..............................

=> (r2 >> 30) = 000000eh

Combining them gives the expected result:

((r1 >> 28) & 0x0C) | (r2 >> 30) = 0000aceh

And here's the demo for the second snippet:

                  abcdefghabcdefgh
&                 1000100000100001 (0x8821)
  ────────────────────────────────
                  a000e00000c0000h
× 00010010000001010000000000000000 (0x12050000)
  ────────────────────────────────
  000h
  00e00000c0000h
+ 0c0000h
  a000e00000c0000h
  ────────────────────────────────
= acehe0h0c0c00h0h
& 11110000000000000000000000000000
  ────────────────────────────────
= aceh

For the reversed order case:

                  abcdefghabcdefgh
&                 0010100010000001 (0x2881)
  ────────────────────────────────
                  00c0e000a000000h
× 10000000001010010000000000000000 (0x80290000)
  ────────────────────────────────
  000a000000h
  00c0e000a000000h
+ 0e000a000000h
  h
  ────────────────────────────────
= hecaea00a0h0h00h
& 11110000000000000000000000000000
  ────────────────────────────────
= heca

Related:

How to create a byte out of 8 bool values (and vice versa)?
Redistribute least significant bits from a 4-byte array to a nibble
