Efficient arbitrary-sized integer packing in Python

Assuming the poster wants to pack a large integer as a binary string, i.e. not use one byte of storage per digit in the number. One way of doing this seems to be:

import marshal

a = 47L
print marshal.dumps(a)

This prints:

'l\x01\x00\x00\x00/\x00'

I can't say that I understand how to interpret these bits, right now ...


Came across the same issue. Starting from python 3.2, you can use int.to_bytes:

>>> (2**100).to_bytes(16, byteorder='big')
b'\x00\x00\x00\x10\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

Do you mean something like this:

def num_to_bytes(num):
    bytes = []
    num = abs(num) # Because I am unsure about negatives...
    while num > 0:
        bytes.append(chr(num % 256))
        num >>= 8
    return ''.join(reversed(bytes))

def bytes_to_num(bytes):
    num = 0
    for byte in bytes:
        num <<= 8
        num += ord(byte)
    return num

for n in (1, 16, 256, 257, 1234567890987654321):
    print n,
    print num_to_bytes(n).encode('hex'),
    print bytes_to_num(num_to_bytes(n))

Which returns:

1 01 1
16 10 16
256 0100 256
257 0101 257
1234567890987654321 112210f4b16c1cb1 1234567890987654321

I'm just not sure what to do about negatives... I'm not that familiar with bit twidling.

EDIT: Another solution (which runs about 30% faster by my tests):

def num_to_bytes(num):
    num = hex(num)[2:].rstrip('L')
    if len(num) % 2:
        return ('0%s' % num).decode('hex')
    return num.decode('hex')

def bytes_to_num(bytes):
    return int(bytes.encode('hex'), 16)

I take it you mean you only want to use as many bytes as you need to represent the number? e.g. if the number is:

  • 255 or less you'd use only 1 byte
  • 65535 or less 2 bytes
  • 16777215 or less 3 bytes
  • etc etc

On the Psion PDA they'd usually have some of packing scheme in which you read the first byte, detect if it has the highest bit set and then read another byte if it has. That way you'd just keep reading bytes until you read the "full" number. That system works quite well if most of the numbers you are dealing with are fairly small, as you'll normally only use one or two bytes per number.

The alternative is to have one (or more) bytes representing the number of total bytes used, but at that point it's basically a string in Python anyway. i.e. it's a string of base-256 digits.

Tags:

Python

Byte