"OSError: [Errno 22] Invalid argument" when read()ing a huge file
There have been several issues over Python's history (most fixed in recent versions) with reading more than 2-4 GB at once from a file handle; an unfixable variant of the problem also occurs on 32-bit builds of Python, which simply lack the virtual address space to allocate the buffer (not I/O related, but seen most frequently when slurping large files). The workaround for hashing is to update the hash in fixed-size chunks, which is a good idea anyway, since counting on RAM being larger than the file size is a poor plan. The most straightforward approach is to change your code to:
import hashlib

with open(file, 'rb') as f:
    hasher = hashlib.sha256()  # Make empty hasher to update piecemeal
    while True:
        block = f.read(64 * (1 << 20))  # Read 64 MB at a time; big, but not memory busting
        if not block:  # Reached EOF
            break
        hasher.update(block)  # Update with new block

print('SHA256 of file is %s' % hasher.hexdigest())  # Finalize to compute digest
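The exact block size isn't critical: anything from tens of KB on up avoids the oversized-read problem, and larger blocks merely trade a little more memory for fewer read calls. The important property is that peak memory use is bounded by the block size, not the file size.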
If you're feeling fancy, you can "simplify" the loop using two-arg iter and some functools magic, replacing the whole of the while loop with:
import functools  # needed at the top of the module

for block in iter(functools.partial(f.read, 64 * (1 << 20)), b''):
    hasher.update(block)
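Two-arg iter calls its first argument (here, f.read with the 64 MB size bound in via functools.partial) over and over, yielding each result, until one equals the sentinel second argument (b'', which is exactly what read returns at EOF).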
Or on Python 3.8+, with the walrus operator, :=, it's simpler still, with no need for imports or hard-to-read code:
while block := f.read(64 * (1 << 20)):  # Assigns and tests result in conditional!
    hasher.update(block)
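Putting it together, here's a minimal self-contained sketch of the whole pattern (the function name sha256_of_file and the example path are just illustrative, not anything from the question):

import hashlib

def sha256_of_file(path, chunk_size=64 * (1 << 20)):
    """Hash a file of any size in fixed-size chunks; memory use stays bounded by chunk_size."""
    hasher = hashlib.sha256()
    with open(path, 'rb') as f:
        # read returns at most chunk_size bytes; b'' signals EOF and ends the loop
        while block := f.read(chunk_size):
            hasher.update(block)
    return hasher.hexdigest()

print('SHA256 of file is %s' % sha256_of_file('some_huge_file.bin'))  # hypothetical path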