Convert integer to a random but deterministically repeatable choice
Using hash and modulo
import hashlib

def id_to_choice(id_num, num_choices):
    id_bytes = id_num.to_bytes((id_num.bit_length() + 7) // 8, 'big')
    id_hash = hashlib.sha512(id_bytes)
    id_hash_int = int.from_bytes(id_hash.digest(), 'big')  # Uses explicit byteorder for system-agnostic reproducibility
    choice = id_hash_int % num_choices  # Use with small num_choices only
    return choice
>>> id_to_choice(123, 3)
0
>>> id_to_choice(456, 3)
1
Notes:
The built-in hash function must not be used because it can preserve the input's distribution, e.g. with hash(123). Alternatively, it can return values that differ when Python is restarted, e.g. with hash('123').
For converting an int to bytes, bytes(id_num) works but is grossly inefficient as it returns an array of id_num null bytes, and so it must not be used. Using int.to_bytes is better. Using str(id_num).encode() works but wastes a few bytes.
Admittedly, using modulo doesn't offer exactly uniform probability,[1][2] but this shouldn't bias much for this application because id_hash_int is expected to be very large and num_choices is assumed to be small.
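The notes above can be checked with a short sketch. It repeats id_to_choice so that it runs on its own; max_modulo_bias is a hypothetical helper name introduced here, and it assumes the SHA-512 output behaves like a uniform 512-bit integer. The hash(123) line reflects CPython behavior for small ints.

```python
import hashlib
from collections import Counter
from fractions import Fraction


def id_to_choice(id_num, num_choices):
    # Same function as in the answer, repeated so this sketch is self-contained.
    id_bytes = id_num.to_bytes((id_num.bit_length() + 7) // 8, 'big')
    id_hash_int = int.from_bytes(hashlib.sha512(id_bytes).digest(), 'big')
    return id_hash_int % num_choices


def max_modulo_bias(num_choices, hash_bits=512):
    """Largest gap between any choice's exact probability and 1/num_choices.

    Assumption: the hash value is uniform over range(2**hash_bits).
    """
    total = 2 ** hash_bits
    quotient, remainder = divmod(total, num_choices)
    # Residues below `remainder` occur quotient + 1 times, the rest quotient times.
    high = Fraction(quotient + 1, total) if remainder else Fraction(quotient, total)
    low = Fraction(quotient, total)
    uniform = Fraction(1, num_choices)
    return max(high - uniform, uniform - low)


# Note 1: in CPython a small int hashes to itself, so hash() does not scramble it.
print(hash(123) == 123)

# Note 2: byte lengths produced by bytes(), int.to_bytes() and str().encode() for 123.
print(len(bytes(123)), len((123).to_bytes(1, 'big')), len(str(123).encode()))

# Note 3: exact worst-case bias for 3 choices, plus an empirical tally over 3000 ids.
print(float(max_modulo_bias(3)))
print(Counter(id_to_choice(i, 3) for i in range(3000)))
```

Under the uniformity assumption, the bias for 3 choices is exactly 2/(3 * 2**512), far too small to observe in practice. It vanishes entirely when num_choices divides 2**512, i.e. when it is a power of two.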
Using random
The random module can be used with id_num as its seed. Creating a local random.Random instance, rather than reseeding the module-level generator, addresses concerns surrounding both thread safety and continuity. Using randrange in this manner is comparable to and simpler than hashing the seed and taking modulo.
With this approach, reproducibility is a concern not only across languages but also across future versions of Python. It is therefore not recommended.
import random

def id_to_choice(id_num, num_choices):
    localrandom = random.Random(id_num)
    choice = localrandom.randrange(num_choices)
    return choice
>>> id_to_choice(123, 3)
0
>>> id_to_choice(456, 3)
2
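The continuity concern mentioned above can be illustrated with a small sketch. It contrasts the local-instance version with id_to_choice_global, a hypothetical variant written here only for comparison, which reseeds the shared module-level generator. The seed 2024 and the helper second_draw_after are likewise arbitrary choices made for this illustration.

```python
import random


def id_to_choice(id_num, num_choices):
    # Same function as in the answer: a private generator seeded with id_num.
    localrandom = random.Random(id_num)
    return localrandom.randrange(num_choices)


def id_to_choice_global(id_num, num_choices):
    # Hypothetical variant for contrast: reseeds the shared module-level generator.
    random.seed(id_num)
    return random.randrange(num_choices)


def second_draw_after(func):
    """Seed the module-level generator, draw once, call func, then draw again."""
    random.seed(2024)
    random.random()
    func(123, 3)
    return random.random()


# Reference: the second draw of the module-level stream when nothing interferes.
random.seed(2024)
random.random()
expected_second = random.random()

# The local instance leaves the module-level stream untouched.
print(second_draw_after(id_to_choice) == expected_second)
# Reseeding the shared generator disrupts the caller's stream.
print(second_draw_after(id_to_choice_global) == expected_second)
```

Both variants return the same choice for a given id_num. The difference is only in the side effect on other code that relies on the module-level generator.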