Reverse Bit Order of 32-bit Integers
MMIX assembly (28 Bytes)
64 bit numbers
rbit:
SETH $1,#0102 # load matrix in 16-byte steps
ORMH $1,#0408
ORML $1,#1020
ORL $1,#4080
MOR $0,$1,$0 # multiplication 1
MOR $0,$0,$1 # multiplication 2
POP 1,0 # return
This assembles to:
rbit:
E0010102 # SETH $1,#0102
E9010408 # ORMH $1,#0408
EA011020 # ORML $1,#1020
EB014080 # ORL $1,#4080
DC000100 # MOR $0,$1,$0
DC000001 # MOR $0,$0,$1
F8010000 # POP 1,0
How does it work?
The MOR
instruction performs a matrix multiplication on two 64-bit quantities used as two 8x8 matrices of booleans. A boolean number with digits abcdefghklmnopqr2 is used as a matrix like this:
/ abcd \
| efgh |
| klmn |
\ opqr /
The MOR
instruction multiplies the matrices represented by their arguments where multiplication is and
and addition is or
. It is:
/ 0001 \ / abcd \ / opqr \
| 0010 | \/ | efgh | -- | klmn |
| 0100 | /\ | klmn | -- | efgh |
\ 1000 / \ opqr / \ abcd /
and furthermore:
/ opqr \ / 0001 \ / rqpo \
| klmn | \/ | 0010 | -- | nmlk |
| efgh | /\ | 0100 | -- | hgfe |
\ abcd / \ 1000 / \ dcba /
which is the reverse order of bits of the original number.
32 bit numbers
If you just want the reverse of a 32 bit number instead of a 64 bit number, you can use this modified method:
rbit:
SETL $1,#0408 # load first matrix in two steps
ORML $1,#0102
MOR $1,$1,$0 # apply first matrix
SLU $2,$1,32 # compile second matrix
16ADDU $1,$2,$1
MOR $1,$0,$1 # apply second matrix
POP 1,0 # return
assembled:
rbit:
E3010408 # SETL $1,#0408
EA010102 # ORML $1,#0102
DC010001 # MOR $1,$1,$0
3B020120 # SLU $2,$1,32
2E010201 # 16ADDU $1,$2,$1
DC010001 # MOR $1,$0,$1
F8010000 # POP 1,0
The first matrix multiplication basically works like this:
/ 0000 \ / 0000 \ / 0000 \
| 0000 | \/ | 0000 | -- | 0000 |
| 0001 | /\ | abcd | -- | efgh |
\ 0010 / \ efgh / \ abcd /
the corresponding octabyte is #0000000001020408
which we load in the first two instructions. The second multiplication looks like this:
/ 0000 \ / 0001 \ / 0000 \
| 0000 | \/ | 0010 | -- | 0000 |
| efgh | /\ | 0100 | -- | hgfe |
\ abcd / \ 1000 / \ dcba /
The corresponding octabyte is #0102040810204080
which we create from the first matrix like this:
SLU $2,$1,#32 # $2 = #0102040800000000
16ADDU $1,$2,$1 # $2 = $2 + $1 << 4
= $2 + #0000000010204080
# = #0102040810204080
The second multiplication is business as usual, the resulting code has the same length (28 bytes).
80386 assembly (13 12 bytes)
As a function in AT&T syntax using the cdecl calling convention.
# reverse bits of a 32 bit word
.text
.globl rbit
.type rbit,@function
rbit:
push $32 # prepare loop counter
pop %ecx
0: shrl 4(%esp) # shift lsb of argument into carry flag
adc %eax,%eax # shift carry flag into lsb
loop 0b # decrement %ecx and jump until ecx = 0
ret # return
This function assembles to the following byte sequence:
6a 20 59 d1 6c 24 04 11 c0 e2 f8 c3
Broken down into instructions:
6a 20 push $32
59 pop %ecx
d1 6c 24 04 shrl 0x4(%esp)
11 c0 adc %eax,%eax
e2 f8 loop .-6
c3 ret
It works like this: In each of the 32 iterations of the loop, the argument, which is located at 4(%esp)
, is right shifted by one position. The last bit is implicitly shifted into the carry flag. The adc
instruction adds two values and adds the value of the carry flag to the result. If you add a value to itself, i.e. %eax
, you effectively left-shift it by one position. This makes adc %eax,%eax
a convenient way to left shift %eax
by one position while shifting the content of the carry flag into the low-order bit.
I repeat this process 32 times so that the entire content of 4(%esp)
is dumped into %eax
. I never explicitly initialize %eax
as its previous contents are shifted out during the loop.
C, 63 52 48
Original version:
int r(int n){int r=0,i=32;for(;i--;r=r<<1|n&1,n>>=1);return r;}
Updated version (with changes suggeted by Allbeert, es1024, and Dennis):
r(n){int r,i=32;for(;i--;r=r*2|n&1,n>>=1);return r;}
Note: Since the second version omits setting r=0
, the code is assuming that an int
is 32 bits. If this assumption is false, the function will most likely produce an incorrect result, depending on the initial state of r
on entry to the function.
Final version (with further changes suggested by Dennis and Alchymist):
r(n,r,i){for(;32-i++;r=r*2|n&1,n>>=1);return r;}
Note: This puts the declaration of the work variables r
and i
into the parameter list. Parameters are as follows: n
is the number to be bit-reversed. r
and i
are work variables that must be passed as 0.