In bash, how to convert 8 bytes to an unsigned int (64bit LE)?
Bash is the wrong tool altogether. Shells are good at gluing bits and pieces together; text processing and arithmetic are provided on the side, and data processing isn't in their purview at all.
I'd go for Python over Perl, because Python has bignums right off the bat. Use struct.unpack to unpack the data.
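#!/usr/bin/env python3
import struct, sys
# stdin must be a seekable regular file, e.g. ./script < file
stdin = sys.stdin.buffer  # binary stream; sys.stdin itself yields text
fmt = "<" + "Q" * 8192  # 8192 little-endian unsigned 64-bit integers
header_bytes = stdin.read(65536)
header_ints = list(struct.unpack(fmt, header_bytes))
stdin.seek(-65536, 2)  # position 64 KiB before the end of the file
footer_bytes = stdin.read(65536)
footer_ints = list(struct.unpack(fmt, footer_bytes))
# your calculations here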
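For the single-value case, the same struct.unpack call can be reduced to one "Q" field. This is only a minimal sketch; the helper name le_bytes_to_u64 is made up for illustration and does not come from the answer above.

```python
import struct

def le_bytes_to_u64(data):
    """Interpret exactly 8 bytes as one unsigned 64-bit little-endian integer."""
    if len(data) != 8:
        raise ValueError("expected exactly 8 bytes")
    # "<" selects little-endian with standard sizes; "Q" is an unsigned 64-bit field.
    (value,) = struct.unpack("<Q", data)
    return value

sample = bytes([0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08])
print(hex(le_bytes_to_u64(sample)))  # prints 0x807060504030201
```

The first byte of the input ends up as the least significant byte of the result, which is what little-endian means here.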
Here's my answer to the original question. The revised question doesn't have much to do with the original, which was about converting one 8-byte sequence into the 64-bit integer it represents in little-endian order.
I don't think bash has any built-in feature for this. The following snippet sets a to a string that is the hexadecimal representation of the number that corresponds to the bytes in the specified string in big-endian order.
a=0x$(printf "%s" "$string" |
od -t x1 -An |
tr -dc '[:alnum:]')
For little-endian order, reverse the order of the bytes in the original string. In bash, and for a string of known length, you can do
a=0x$(printf "%s" "${string:7:1}${string:6:1}${string:5:1}${string:4:1}${string:3:1}${string:2:1}${string:1:1}${string:0:1}" |
od -t x1 -An |
tr -dc '[:alnum:]')
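The byte-reversal idea is not specific to the shell. As a rough illustration in Python (the function name is hypothetical), reversing the bytes and then reading the hex dump as a big-endian number gives the little-endian value:

```python
def u64_from_le_by_reversal(data):
    """Reverse the bytes, then parse the big-endian hex dump as an integer."""
    reversed_hex = data[::-1].hex()  # plays the role of the od | tr pipeline
    return int(reversed_hex, 16)

raw = bytes([0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08])
print(hex(u64_from_le_by_reversal(raw)))  # prints 0x807060504030201
```

The result matches int.from_bytes(raw, "little"), which can serve as a cross-check.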
You can also read the value in your platform's native endianness if your od supports 8-byte types.
a=0x$(printf "%s" "$string" |
od -t x8 -An |
tr -dc '[:alnum:]')
Whether you can do arithmetic on $a will depend on whether your bash supports 8-byte arithmetic. Even if it does, it'll treat it as a signed value.
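To make the signedness caveat concrete, here is a small sketch of how a 64-bit two's-complement reinterpretation behaves. It models signed 64-bit arithmetic in general rather than any particular bash build, and both helper names are invented for the example.

```python
MASK64 = (1 << 64) - 1

def as_signed64(value):
    """Reinterpret an unsigned 64-bit value as a signed two's-complement one."""
    value &= MASK64
    return value - (1 << 64) if value >= (1 << 63) else value

def as_unsigned64(value):
    """Map a signed 64-bit value back to its unsigned bit pattern."""
    return value & MASK64

print(as_signed64(0xFFFFFFFFFFFFFFFF))  # prints -1
print(as_unsigned64(-1) == 0xFFFFFFFFFFFFFFFF)  # prints True
```

Values below 2^63 are unaffected; anything with the top bit set shows up as negative when treated as signed.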
Alternatively, use Perl:
a=0x$(perl -e 'printf "%x", unpack "Q<", $ARGV[0]' "$string")
If your perl is compiled without 64-bit integer support, you'll need to break the bytes up.
a=0x$(perl -e 'printf "%x%08x\n", reverse unpack "L<L<", $ARGV[0]' "$string")
(Replace < by > for big-endian or remove it to get the platform endianness.)
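The two-halves workaround can be sketched in Python as well, purely for illustration (the function names are hypothetical). The eight bytes are unpacked as two little-endian 32-bit words, low word first, and recombined with the high word on the left:

```python
import struct

def u64_from_two_u32(data):
    """Combine two little-endian 32-bit halves into one 64-bit integer."""
    low, high = struct.unpack("<LL", data)  # low word comes first in LE order
    return (high << 32) | low

def hex_from_two_u32(data):
    """Same formatting idea as the Perl one-liner: high word, then zero-padded low word."""
    low, high = struct.unpack("<LL", data)
    return "%x%08x" % (high, low)

halves = bytes([1, 0, 0, 0, 1, 0, 0, 0])
print(hex_from_two_u32(halves))  # prints 100000001
```

The %08x padding on the low word matters: without it, a low word with leading zero digits would shift the high word into the wrong position.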
Gilles' Python method is definitely faster, but I thought I'd throw in this bash + standard single-purpose tools version as general grist to the mill. It's probably as much about bc as anything else. It has a lot of initialization code, to cater for input files smaller than 64 KiB. The hash is initialized to the file's length, and then each of the 64-bit integers is successively added to it, causing the (expected) integer overflow. bc managed to do the trick.
# This script reads 8192 8-byte blocks (64 KiB) from the head and tail of a file.
# Each 8-byte block is interpreted as an unsigned 64-bit little-endian integer.
# The integers are added to a running hash (modulo 2^64), which is printed to stdout in hex.
#
# INIT: If the file is smaller than 64k, calculate the number of unsigned ints to read
# ====
file="$1"
flen=($(du -b "$file")) # file length
qlen=8 # ui64 length in bytes
((flen<qlen)) && exit 1 # file is too short -- exit
bmax=$((64*1024)) # byte end of read (== byte max to read)
((flen<bmax)) && ((bmax=flen)) # reduce byte max to file length
qmax=$((bmax/qlen)) # ui64 end of read (== ui64 max to read)
(((qmax*qlen)<bmax)) && ((bmax=(qmax*qlen))) # round down byte max (/8)
hash=$(printf '%X' "$flen") # initial hash = file length, in uppercase hex for bc
#
# MAIN
# ====
for skip in 0 $((flen-bmax)) ;do
hash=$(dd if="$file" bs=1 count=$bmax skip=$skip 2>/dev/null |
xxd -p -u -c 8 |
{ echo -e " ibase=16 \n obase=10 \n scale=0 \n hash=$hash \n ouint=10000000000000000 "; \
sed -re "s/(..)(..)(..)(..)(..)(..)(..)(..)/hash=(hash+\8\7\6\5\4\3\2\1)%ouint/"; \
echo "hash"; } |bc)
done
echo $hash
#
# Output:
16A6528E803325FF
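For comparison, the computation described above (start from the file length, add every 64-bit little-endian integer from the first and last 64 KiB, wrap at 2^64) might look like this in Python. This is a sketch under that reading of the script, not a verified port; the names sum_u64_le and head_tail_hash are made up, and it takes a seekable binary stream plus its size rather than a file name.

```python
import io
import struct

CHUNK = 64 * 1024     # bytes read from each end of the file
MASK = (1 << 64) - 1  # keeps the running sum within 64 bits

def sum_u64_le(data):
    """Sum the complete 8-byte little-endian words in data, modulo 2**64."""
    count = len(data) // 8
    words = struct.unpack("<%dQ" % count, data[:count * 8])
    return sum(words) & MASK

def head_tail_hash(stream, size):
    """Hash a seekable binary stream whose total length is size bytes."""
    if size < 8:
        raise ValueError("need at least one 8-byte word")
    n = min(CHUNK, size)
    n -= n % 8                    # round down to a whole number of words
    total = size                  # the hash starts at the file length
    stream.seek(0)
    total += sum_u64_le(stream.read(n))   # head
    stream.seek(size - n)
    total += sum_u64_le(stream.read(n))   # tail (overlaps the head for small files)
    return total & MASK

demo = struct.pack("<2Q", 1, 2)   # a 16-byte "file" holding the integers 1 and 2
print("%X" % head_tail_hash(io.BytesIO(demo), len(demo)))  # prints 16 (16 + 3 + 3 = 22)
```

For a real file, open(path, "rb") together with os.path.getsize(path) would supply the stream and the size.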