Simultaneously calculate multiple digests (md5, sha256)?
Check out pee ("tee standard input to pipes") from moreutils. This is basically equivalent to Marco's tee command, but a little simpler to type.
$ echo foo | pee md5sum sha256sum
d3b07384d113edec49eaa6238ad5ff00 -
b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c -
$ pee md5sum sha256sum <foo.iso
f109ffd6612e36e0fc1597eda65e9cf0 -
469a38cb785f8d47a0f85f968feff0be1d6f9398e353496ff7aa9055725bc63e -
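For comparison, here is the same "read the input once, hash it many ways" idea in Python, using only the standard library. This is a minimal sketch, not part of moreutils. pee starts real md5sum and sha256sum processes, whereas this runs hashlib objects inside one process. The function name multi_digest and the 64 KiB default block size are illustrative choices.

```python
import hashlib


def multi_digest(stream, names=("md5", "sha256"), blocksize=1 << 16):
    """Return {algorithm name: hex digest} for everything read from a binary stream."""
    # hashlib.new() accepts any name listed in hashlib.algorithms_available.
    hashers = {name: hashlib.new(name) for name in names}
    while True:
        block = stream.read(blocksize)
        if not block:
            break
        # Every hasher sees the same block, so the input is read only once.
        for h in hashers.values():
            h.update(block)
    return {name: h.hexdigest() for name, h in hashers.items()}
```

For example, calling multi_digest(sys.stdin.buffer) in a script run as echo foo | python3 script.py should report the same MD5 and SHA-256 values as the pee example.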
You can use a for loop over the individual files, then use tee combined with process substitution (supported by Bash and Zsh, among others) to pipe each file to several different checksummers.
Example:
for file in *.mkv; do
    tee < "$file" >(sha256sum) | md5sum
done
You can also use more than two checksummers:
for file in *.mkv; do
    tee < "$file" >(sha256sum) >(sha384sum) | md5sum
done
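This has the disadvantage that the checksummers don't know the file name, because the file's contents are passed to them on standard input. If that's not acceptable, you have to emit the file names yourself. Be aware that the >(...) processes run asynchronously, so their output lines are not guaranteed to appear in a fixed order, or even before the next file name is printed. Complete example: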
for file in *.mkv; do
    echo "$file"
    tee < "$file" >(sha256sum) >(sha384sum) | md5sum
    echo
done > hashfilelist
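If the asynchronous ordering of the shell version is a problem, a rough Python analogue of tee plus process substitution is to start each checksummer as a child process and write every block to all of their standard inputs, then collect the results one command at a time. This is a sketch under stated assumptions: fan_out and hash_list are hypothetical helper names, the commands (md5sum, sha256sum, sha384sum) are assumed to be on PATH, and each command is assumed to print only a short result after reading all of its input. A command that writes a lot of output before it finishes reading could deadlock this simple version.

```python
import glob
import subprocess


def fan_out(path, commands, blocksize=1 << 16):
    """Stream one file into the stdin of several commands; return their stdout, in order."""
    procs = [subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
             for cmd in commands]
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(blocksize), b""):
            for p in procs:
                p.stdin.write(block)  # every command sees the same bytes, like tee
    outputs = []
    for p in procs:
        p.stdin.close()  # EOF: the checksummer now prints its result and exits
        outputs.append(p.stdout.read().decode())
        p.wait()
    return outputs


def hash_list(pattern, commands=(["md5sum"], ["sha256sum"], ["sha384sum"])):
    """Hypothetical equivalent of the hashfilelist loop: name, digests, blank line."""
    for path in sorted(glob.glob(pattern)):
        print(path)
        for out in fan_out(path, commands):
            print(out, end="")
        print()
```

Calling hash_list("*.mkv") from a script whose output is redirected to hashfilelist should produce the same layout as the shell loop, with each file's digests always printed under its own name.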
It's a pity that the openssl utility doesn't accept multiple digest commands; I guess performing the same command on multiple files is a more common use pattern. FWIW, the version of the openssl utility on my system (Mepis 11) only has commands for sha and sha1, not any of the other sha variants. But I do have a program called sha256sum, as well as md5sum.
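If you end up scripting this in Python instead, you can ask hashlib which digests are available rather than guessing from the openssl command's help text. hashlib.algorithms_guaranteed is the set every Python build provides; hashlib.algorithms_available adds whatever the linked OpenSSL library offers, so its exact contents vary between systems. The helper name supported is hypothetical.

```python
import hashlib


def supported(names):
    """Map each requested digest name to whether this Python's hashlib can compute it."""
    return {name: name in hashlib.algorithms_available for name in names}


# Names every CPython build provides (e.g. sha1, sha256, sha384, sha512).
print(sorted(hashlib.algorithms_guaranteed))
# The guaranteed names plus extras from the OpenSSL library Python is linked against.
print(sorted(hashlib.algorithms_available))
print(supported(["md5", "sha256", "sha384"]))
```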
Here's a simple Python program, dual_hash.py, that does what you want. A block size of 64 KiB appears to be optimal on my machine (Intel Pentium 4 at 2.00 GHz with 2 GB of RAM); YMMV. For small files, it runs at roughly the same speed as running md5sum and sha256sum in succession, but for larger files it is significantly faster. E.g., on a 1,967,063,040-byte file (a disk image of an SD card full of MP3 files), md5sum + sha256sum takes around 1m44.9s, while dual_hash.py takes 1m0.312s.
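Since the best block size was found by experiment and depends on the machine, here is a rough way to repeat that experiment: time the combined MD5 + SHA-256 loop for a few block sizes. This is a hypothetical helper (time_blocksizes is not part of dual_hash.py), and because of the OS page cache, repeated runs over the same file mostly measure cached reads rather than disk speed.

```python
import hashlib
import time


def time_blocksizes(fname, sizes=(1 << 12, 1 << 14, 1 << 16, 1 << 18, 1 << 20)):
    """Return {blocksize: seconds} for hashing fname with MD5 and SHA-256 together."""
    results = {}
    for size in sizes:
        md5, sha = hashlib.md5(), hashlib.sha256()
        start = time.perf_counter()
        with open(fname, "rb") as f:
            for block in iter(lambda: f.read(size), b""):
                md5.update(block)
                sha.update(block)
        results[size] = time.perf_counter() - start
    return results
```

For example, printing sorted(time_blocksizes("foo.iso").items()) lists each block size with its elapsed time.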
dual_hash.py
#! /usr/bin/env python
''' Calculate MD5 and SHA-256 digests of a file simultaneously
    Written by PM 2Ring 2014.10.23
'''
import sys
import hashlib
def digests(fname, blocksize):
    md5 = hashlib.md5()
    sha = hashlib.sha256()
    with open(fname, 'rb') as f:
        while True:
            # Read the file only once, feeding each block to both hashers
            block = f.read(blocksize)
            if not block:
                break
            md5.update(block)
            sha.update(block)
    print("md5: %s" % md5.hexdigest())
    print("sha256: %s" % sha.hexdigest())
def main(*argv):
    blocksize = 1<<16 # 64kB
    if len(argv) < 2:
        print("No filename given!\n")
        print("Calculate md5 and sha-256 message digests of a file.")
        print("Usage:\npython %s filename [blocksize]\n" % argv[0])
        print("Default blocksize=%d" % blocksize)
        return 1
    fname = argv[1]
    if len(argv) > 2:
        # Optional second argument overrides the default block size
        blocksize = int(argv[2])
    print("Calculating MD5 and SHA-256 digests of %r using a blocksize of %d" % (fname, blocksize))
    digests(fname, blocksize)
if __name__ == '__main__':
    sys.exit(main(*sys.argv))
I suppose a C/C++ version of this program would be a little faster, but not much, since most of the work is being done by the hashlib module, which is written in C (or C++). And as you noted above, the bottleneck for large files is IO speed.
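On a multi-core machine, one possible further step is to run each digest in its own thread. The hashlib documentation states that CPython releases the GIL when hashing data larger than 2047 bytes, so the MD5 and SHA-256 updates for a block can overlap. This is a sketch, not a measured improvement: threaded_digests is a hypothetical name, the 1 MiB block size is an arbitrary choice, and it cannot help if disk I/O is the bottleneck or there is only one core.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def threaded_digests(fname, names=("md5", "sha256"), blocksize=1 << 20):
    """Read fname once; update each hasher in its own worker thread, block by block."""
    hashers = [hashlib.new(n) for n in names]
    with ThreadPoolExecutor(max_workers=len(hashers)) as pool, open(fname, "rb") as f:
        for block in iter(lambda: f.read(blocksize), b""):
            # Each update runs in a worker; hashlib releases the GIL for large blocks.
            futures = [pool.submit(h.update, block) for h in hashers]
            for fut in futures:
                fut.result()  # wait (and re-raise any error) before reading the next block
    return {n: h.hexdigest() for n, h in zip(names, hashers)}
```

Waiting for both updates before reading the next block keeps each hasher's input in order, at the cost of some synchronization overhead per block.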