safe enough 8-character short unique random string

Which method has fewer collisions, is faster, and is easier to read?

TLDR

random_choice is the fastest and has among the fewest collisions, but is IMO slightly harder to read.

The most readable is shortuuid_random, but it is an external dependency, is slightly slower, and has about 6x the collisions.

The methods


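import random
import secrets
import string
import uuid

import shortuuid  # third-party: pip install shortuuid

alphabet = string.ascii_lowercase + string.digits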
su = shortuuid.ShortUUID(alphabet=alphabet)

def random_choice():
    return ''.join(random.choices(alphabet, k=8))

def truncated_uuid4():
    return str(uuid.uuid4())[:8]

def shortuuid_random():
    return su.random(length=8)

def secrets_random_choice():
    return ''.join(secrets.choice(alphabet) for _ in range(8))

Results

All methods generate 8-character strings from the abcdefghijklmnopqrstuvwxyz0123456789 alphabet (truncated_uuid4 only uses its hexadecimal subset, 0-9 and a-f). Collisions are counted in a single run of 10 million draws. Time is the time in seconds to execute 1,000 calls, reported as mean ± standard deviation over 100 runs. Total time is the total execution time of the collision test.

random_choice: collisions 22 - time (s) 0.00229 ± 0.00016 - total (s) 29.70518
truncated_uuid4: collisions 11711 - time (s) 0.00439 ± 0.00021 - total (s) 54.03649
shortuuid_random: collisions 124 - time (s) 0.00482 ± 0.00029 - total (s) 51.19624
secrets_random_choice: collisions 15 - time (s) 0.02113 ± 0.00072 - total (s) 228.23106
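As a sanity check on these counts, the number of repeats expected from an ideal generator can be estimated from the size of the ID space. The sketch below is not part of the original benchmark: the helper name expected_collisions is hypothetical, and it assumes every draw is independent and uniform over all alphabet_size ** length possible strings.

```python
import math


def expected_collisions(n_draws, alphabet_size, length=8):
    """Expected repeated draws, assuming an ideal uniform generator."""
    space = alphabet_size ** length
    # Expected number of distinct values after n_draws uniform draws is
    # space * (1 - (1 - 1/space) ** n_draws); log1p/expm1 keep it stable.
    expected_distinct = -space * math.expm1(n_draws * math.log1p(-1.0 / space))
    # Every draw that is not a new distinct value is counted as a collision,
    # which mirrors how test_collisions counts them.
    return n_draws - expected_distinct


if __name__ == '__main__':
    n = 10_000_000
    print(f'36-character alphabet: {expected_collisions(n, 36):.1f}')
    print(f'16-character alphabet: {expected_collisions(n, 16):.1f}')
```

Under these assumptions the estimate is roughly 18 for the 36-character alphabet and roughly 11,600 for the hexadecimal alphabet of truncated_uuid4, which is in line with the measured 22, 15 and 11711. The 124 measured for shortuuid_random is noticeably above the ideal figure.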

Notes

The default shortuuid alphabet also includes uppercase characters and therefore produces fewer collisions. For a fair comparison, it is restricted here to the same alphabet as the other methods.
The secrets methods token_hex and token_urlsafe, while possibly faster, use different alphabets and are therefore not eligible for the comparison.
The alphabet and the ShortUUID instance are factored out as module-level variables, which speeds up the function calls. This should not affect the TLDR.
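For reference, this is roughly what the two excluded secrets helpers look like when asked for 8 characters. It is only an illustration of the alphabet difference and was not benchmarked above; the variable names are made up for this sketch.

```python
import secrets
import string

# token_hex(n) returns 2 * n hexadecimal characters (0-9, a-f).
hex_id = secrets.token_hex(4)

# token_urlsafe(n) Base64-encodes n random bytes; 6 bytes give 8 characters
# from A-Z, a-z, 0-9, '-' and '_'.
url_id = secrets.token_urlsafe(6)

hex_alphabet = set('0123456789abcdef')
urlsafe_alphabet = set(string.ascii_letters + string.digits + '-_')

if __name__ == '__main__':
    print(hex_id, url_id)
```

Both draw from a differently sized alphabet than the 36-character one used in the tests (16 and 64 symbols respectively), so their collision counts would not be comparable.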

Full testing details

import random
import secrets
from statistics import mean
from statistics import stdev
import string
import time
import timeit
import uuid

import shortuuid


alphabet = string.ascii_lowercase + string.digits
su = shortuuid.ShortUUID(alphabet=alphabet)


def random_choice():
    return ''.join(random.choices(alphabet, k=8))


def truncated_uuid4():
    return str(uuid.uuid4())[:8]


def shortuuid_random():
    return su.random(length=8)


def secrets_random_choice():
    return ''.join(secrets.choice(alphabet) for _ in range(8))


def test_collisions(fun):
    out = set()
    count = 0
    for _ in range(10_000_000):
        new = fun()
        if new in out:
            count += 1
        else:
            out.add(new)
    return count


def run_and_print_results(fun):
    round_digits = 5
    now = time.time()
    collisions = test_collisions(fun)
    total_time = round(time.time() - now, round_digits)

    trials = 1_000
    runs = 100
    func_time = timeit.repeat(fun, repeat=runs, number=trials)
    avg = round(mean(func_time), round_digits)
    std = round(stdev(func_time), round_digits)

    print(f'{fun.__name__}: collisions {collisions} - '
          f'time (s) {avg} ± {std} - '
          f'total (s) {total_time}')


if __name__ == '__main__':
    run_and_print_results(random_choice)
    run_and_print_results(truncated_uuid4)
    run_and_print_results(shortuuid_random)
    run_and_print_results(secrets_random_choice)

Your current method should be safe enough, but you could also take a look at the uuid module, e.g.:

import uuid

print(str(uuid.uuid4())[:8])

Output (differs on every run):

ef21b9ad

You can try the shortuuid library.

Install with: pip install shortuuid

Then it is as simple as:

>>> import shortuuid
>>> shortuuid.uuid()
'vytxeTZskVKR7C7WgdSP3d'
