Side-by-side comparison of more than two files containing numerical values
You could process each file and, for every number missing from the sequence 1..max (where max is the last number in that file), print a line containing a placeholder character, e.g. X. Then paste the results and replace the placeholder with a space:
paste \
<(awk 'BEGIN{n=1};{while (n<$1) {print "X";n++}};{n=$1+1};1' file1) \
<(awk 'BEGIN{n=1};{while (n<$1) {print "X";n++}};{n=$1+1};1' file2) \
<(awk 'BEGIN{n=1};{while (n<$1) {print "X";n++}};{n=$1+1};1' file3) \
| tr X ' '
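To see what each awk call contributes on its own, here is a minimal sketch run on made-up input. The helper name fill_gaps and the sample values 1, 3, 4 are assumptions for illustration only; the awk program is the one from the pipeline above.

```shell
# fill_gaps: hypothetical wrapper around the awk program used in the pipeline.
# For input sorted in ascending order it prints "X" for every integer
# missing between 1 and the last value read, and the value itself otherwise.
fill_gaps() {
    awk 'BEGIN{n=1};{while (n<$1) {print "X";n++}};{n=$1+1};1' "$@"
}

# Sample input 1, 3, 4: the missing 2 becomes an X line.
printf '%s\n' 1 3 4 | fill_gaps
```

Because every file is padded to one line per integer, paste can then line the files up row by row.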
If a certain value is missing from all files you'll get empty lines in your output (actually they're not empty, they contain only blanks). To remove them, replace tr X ' ' with sed '/[[:digit:]]/!d;s/X/ /g'
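A small sketch of what that sed filter does, using made-up pasted rows (the function name drop_blank_rows is hypothetical): rows without any digit are deleted, and the remaining X placeholders become spaces.

```shell
# drop_blank_rows: hypothetical name for the sed filter quoted above.
drop_blank_rows() {
    sed '/[[:digit:]]/!d;s/X/ /g'
}

# Made-up pasted rows: the middle one has no value in either column,
# so it is the only row that gets dropped.
printf '1\tX\nX\tX\n3\t3\n' | drop_blank_rows
```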
Also, if you need a header you can always run something like this first:
printf '\t%s' file1 file2 file3 | cut -c2-
A general solution with awk (requires GNU awk 4.0 or later, for arrays of arrays):
gawk -v level=0 '
FNR==1 {level++; head[level]=FILENAME}
!seen[$1]++ { n++; idx[$1] = n }
{ out[idx[$1]][level] = $1 }
END {
for (j=1; j<=level; j++) {
printf "%s\t", head[j]
}
print ""
for (i=1; i<=n; i++) {
for (j=1; j<=level; j++) {
printf "%s\t", out[i][j]
}
print ""
}
}
' file{1,2,3,4}
file1 file2 file3 file4
1 1 1
2 2
3 3
4 4
5
6
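If GNU awk is not available, the same idea can probably be expressed with composite keys instead of arrays of arrays. The following is only a sketch under that assumption; the function name side_by_side and the variable names are made up for illustration.

```shell
# side_by_side: hypothetical helper; a sketch of the same approach for any POSIX awk.
# A composite key (row, column) stands in for gawk's out[row][column].
side_by_side() {
    awk '
    FNR == 1     { ncol++; head[ncol] = FILENAME }   # each new file is a new column
    !($1 in row) { nrow++; row[$1] = nrow }          # first sighting of a value fixes its row
                 { cell[row[$1], ncol] = $1 }
    END {
        for (j = 1; j <= ncol; j++)
            printf "%s%s", head[j], (j < ncol ? "\t" : "\n")
        for (i = 1; i <= nrow; i++)
            for (j = 1; j <= ncol; j++)
                printf "%s%s", cell[i, j], (j < ncol ? "\t" : "\n")
    }' "$@"
}
```

Called as side_by_side file1 file2 file3 file4, it prints rows in order of first appearance, like the gawk version, but without the trailing tab on each line.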
I took a different and simpler approach to this, based on Don's comment:
gawk '
FNR==1 { printf "%s\t", FILENAME }
{ seen[$1][FILENAME] = $1 }
END {
print ""
PROCINFO["sorted_in"]="@ind_num_asc"
for (i in seen) {
for (j=1; j<ARGC; j++) {   # file names are ARGV[1] .. ARGV[ARGC-1]
printf "%s\t", seen[i][ARGV[j]]
}
print ""
}
}
' file{1,2,3,4}
file1 file2 file3 file4
1 1
2
3 3
4 4
5 5
6
7
A solution with bash, join, paste, and bad taste:
#! /usr/bin/env bash
if [ $# -lt 3 ]; then exit 1; fi
files=( '' "$@" )
declare -a temps
for ((i=0; i<=$#; i++)); do
[ $i -eq 0 -o -f "${files[$i]}" ] || exit 1
temps[$i]=$( mktemp -t "${0##*/}"_$$_XXXXXXXX ) || exit 1
done
trap 'rm -f "${temps[@]}"' EXIT HUP INT QUIT TERM
sort -u "$@" >"${temps[0]}"
TAB=$( printf '\t' )
for ((i=1; i<=$#; i++)); do
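# Left-join against the list of all items; pad unmatched items (lines with no tab) with an empty field
join -j1 -a1 -t"$TAB" "${temps[0]}" <(paste "${files[$i]}" "${files[$i]}") | \
sed "/^[^$TAB]*\$/ s/\$/$TAB/" >"${temps[$i]}"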
done
printf '%s' "${files[1]}"
for ((i=2; i<=$#; i++)); do
printf '\t%s' "${files[$i]}"
let j=i-1
let k=i-2
join -j1 -t"$TAB" "${temps[$j]}" "${temps[$i]}" >"${temps[$k]}"
cat "${temps[$k]}" >"${temps[$i]}"
done
printf '\n'
cut -d "$TAB" -f 2- <"${temps[$#]}" | sort -n
Except for the last sort -n, all this works with any text items rather than numbers, as long as the items don't contain tabs (but TAB can be changed to any other separator). Note that join expects each input file to be sorted on the join field, in the same order that sort -u produces. Also, it could be done with just 3 temporary files and some shuffling things around (but that would just increase the bad taste).
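The least obvious step in the script is padding each file against the list of all items. Here is a minimal sketch of that step in isolation, on made-up data, assuming both inputs are already sorted the way join expects. The helper name pad_column is hypothetical, and the sketch relies on join's default of joining on the first field rather than passing -j1.

```shell
TAB=$(printf '\t')

# pad_column: hypothetical helper.
#   $1 = file listing every item once (sorted), $2 = one input file (sorted)
# Each output line is "item<TAB>item" if the item occurs in $2, else "item<TAB>".
pad_column() {
    paste "$2" "$2" |
        join -a1 -t"$TAB" "$1" - |
        sed "/^[^$TAB]*\$/ s/\$/$TAB/"   # lines without a tab are unmatched items
}
```

For example, with an item list containing a, b, c and an input file containing a and c, the b line comes out as b followed by a single tab, so every line has exactly two fields before the later joins glue the columns together.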