Disable hash randomization from within python program

Probably the cleanest (and maybe the only) way is to put this at the very top of your program:

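import os
import sys

# The hash seed is fixed at interpreter startup, so re-exec with it set.
if not os.getenv('PYTHONHASHSEED'):
    os.environ['PYTHONHASHSEED'] = '0'
    # exec* does not flush buffered output, so flush first.
    sys.stdout.flush()
    sys.stderr.flush()
    os.execv(sys.executable, [sys.executable] + sys.argv)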

[the rest of your program]

If PYTHONHASHSEED is unset (or empty), this sets it to 0 and replaces the current process with a new interpreter, passing it the same arguments. The documentation for os.execv says:

These functions all execute a new program, replacing the current process; they do not return. On Unix, the new executable is loaded into the current process, and will have the same process id as the caller. Errors will be reported as OSError exceptions.

The current process is replaced immediately. Open file objects and descriptors are not flushed, so if there may be data buffered on these open files, you should flush them using sys.stdout.flush() or os.fsync() before calling an exec* function.

I suspect this isn't possible from inside an already-running interpreter, unfortunately. Looking at test_hash.py, the HashRandomizationTests class and its descendants were added in the commit that introduced this behavior. They test hashing by modifying the environment and starting a new process with PYTHONHASHSEED explicitly set. You could try copying that pattern.
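A minimal sketch of that pattern, assuming all you need is to run some code in a child interpreter with a fixed seed. The helper name hash_in_child is made up for illustration and is not taken from test_hash.py:

```python
import os
import subprocess
import sys


def hash_in_child(value: str, seed: str) -> int:
    """Return hash(value) as computed by a fresh interpreter started with
    PYTHONHASHSEED=seed (hypothetical helper, for illustration only)."""
    # Copy the current environment and override only the hash seed.
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", value],
        env=env,
    )
    return int(out.decode().strip())


if __name__ == "__main__":
    # Two separate child processes with the same seed should agree.
    print(hash_in_child("abc", "0") == hash_in_child("abc", "0"))
```

The parent process keeps whatever seed it was started with; only the children see the fixed value.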

I also just noticed you said "Every time I run my script, dict contents are iterated in a different order." I assume you're aware of collections.OrderedDict, right? That's the normal way to get a reliable iteration order. (From Python 3.7 on, plain dicts preserve insertion order too.)
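For illustration, a small sketch of what OrderedDict gives you. The keys and values here are invented:

```python
from collections import OrderedDict

# Keys come back in the order they were inserted, regardless of their hashes.
settings = OrderedDict()
settings["b"] = 2
settings["a"] = 1
settings["c"] = 3

keys_in_order = list(settings)
print(keys_in_order)
```

This fixes the iteration order only; the values returned by hash() still change between runs unless the seed is fixed.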

If you're willing to set the value in your shell environment, you could also just wrap your Python call in a bash script, e.g.:

#! /bin/bash
export PYTHONHASHSEED=0

# call your python program here

That avoids needing to manipulate your whole environment, as long as you're ok with a wrapper script.

Or even just pass the value on the command line:

$ PYTHONHASHSEED=0 python YOURSCRIPT.py

Apart from dictionary order, hash randomisation may also break existing code that uses hash() directly. A workaround that solved the problem for me in this case was to replace

hash(mystring)

with

int(hashlib.sha512(mystring).hexdigest(), 16)

For Python 3, a conversion like mystring.encode('utf-8') will be needed for standard strings. (I was working with byte strings.)

Note that the two differ in range and sign: hash() returns a signed, machine-word-sized integer, while the SHA-512 version returns a non-negative integer of up to 512 bits. With that much larger range, hash collisions are extremely unlikely.

To reproduce the same 64-bit range as hash(), one could reduce the number of hexadecimal digits to 16 (4 bits per digit) and shift the result to start at the smallest negative 64-bit number:

int(hashlib.sha256(mystring).hexdigest()[:16], 16) - 2**63

Alternatively, one can take 8 bytes and use int.from_bytes:

int.from_bytes(hashlib.sha256(mystring).digest()[:8], byteorder='big', signed=True)
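Putting the two variants side by side, here is a rough sketch for Python 3 that assumes the input is a str encoded as UTF-8. The function names stable_hash64 and stable_hash64_shifted are invented for this example:

```python
import hashlib


def stable_hash64(text: str) -> int:
    """Signed 64-bit integer derived from SHA-256; the same in every run."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Keep the first 8 bytes and read them as a signed big-endian integer.
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


def stable_hash64_shifted(text: str) -> int:
    """Same range via the hex-digit approach: 16 hex digits shifted by 2**63."""
    hex_digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return int(hex_digest[:16], 16) - 2**63


if __name__ == "__main__":
    print(stable_hash64("abc"), stable_hash64_shifted("abc"))
```

Both results fall in the range -2**63 to 2**63 - 1, but they are not equal to each other for the same input (one reinterprets the bits as two's complement, the other subtracts an offset), so pick one and use it consistently.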
