Side-by-side comparison of more than two files containing numerical values

You could process each file and print a placeholder character, e.g. X, for every number missing from the sequence 1..max (where max is the last number in that file), then paste the results and replace the placeholder with a space. This assumes each file holds positive integers, one per line, in ascending order:

paste \
<(awk 'BEGIN{n=1};{while (n<$1) {print "X";n++}};{n=$1+1};1' file1) \
<(awk 'BEGIN{n=1};{while (n<$1) {print "X";n++}};{n=$1+1};1' file2) \
<(awk 'BEGIN{n=1};{while (n<$1) {print "X";n++}};{n=$1+1};1' file3) \
| tr X ' '
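
To see what a single awk call contributes before paste lines the columns up, here is a minimal sketch that runs the same awk program on a small sample. The wrapper name fill_gaps and the sample numbers are only for illustration:

```shell
# fill_gaps is a hypothetical wrapper; the awk program is the one used above.
fill_gaps() {
    awk 'BEGIN{n=1};{while (n<$1) {print "X";n++}};{n=$1+1};1' "$@"
}

# Sample input in which 2 and 4 are missing:
printf '%s\n' 1 3 5 | fill_gaps
# prints 1, X, 3, X, 5 on separate lines
```

Because every value ends up on the line matching its own number, paste can then put equal values from different files on the same row.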

If a certain value is missing from all files you'll get empty lines in your output (actually they're not empty, they contain only blanks).
To remove them, replace tr X ' ' with sed '/[[:digit:]]/!d;s/X/ /g'. Also, if you need a header you can always run something like this first:

 printf '\t%s' file1 file2 file3 | cut -c2-
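
For an arbitrary number of files, the pieces above could be combined roughly as follows. This is only a sketch: the function name compare_files and the cf_ variables are made up, and it writes each padded column to a temporary file instead of using process substitution, so it should also run in a plain POSIX shell (assuming mktemp is available).

```shell
# compare_files FILE...  (hypothetical helper, not part of the original answer)
compare_files() {
    cf_tmp=$(mktemp -d) || return 1
    cf_i=0
    for cf_f in "$@"; do
        cf_i=$((cf_i + 1))
        # Pad each file with X for the missing numbers, as in the answer.
        # Zero-padded names keep the glob below in argument order.
        awk 'BEGIN{n=1};{while (n<$1) {print "X";n++}};{n=$1+1};1' "$cf_f" \
            > "$cf_tmp/$(printf '%04d' "$cf_i")"
    done
    # Header line with the file names, then the aligned columns.
    printf '\t%s' "$@" | cut -c2-
    # Drop rows where no file has the value, then blank out the placeholders.
    paste "$cf_tmp"/* | sed '/[[:digit:]]/!d;s/X/ /g'
    rm -rf "$cf_tmp"
}
```

It would be called as compare_files file1 file2 file3.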

A general solution with awk (requires GNU awk 4.0 or later, for arrays of arrays):

gawk -v level=0 '
    FNR==1 {level++; head[level]=FILENAME}
    !seen[$1]++ { n++; idx[$1] = n }
    { out[idx[$1]][level] = $1 }
    END {
        for (j=1; j<=level; j++) {
            printf "%s\t", head[j]
        }
        print ""
        for (i=1; i<=n; i++) {
            for (j=1; j<=level; j++) {
                printf "%s\t", out[i][j]
            }
            print ""
        }
    }
' file{1,2,3,4}

file1   file2   file3   file4   
1   1   1       
2           2   
3   3           
    4       4   
        5       
            6
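
If gawk is not available, the same idea can probably be expressed in POSIX awk by emulating the two-dimensional array with a comma subscript, out[i, j]. The sketch below is an adaptation, not the code from the answer: the wrapper name align_first_seen is made up, and it does not print the trailing tab. As in the gawk version, rows appear in the order in which values are first seen.

```shell
# align_first_seen FILE...  (hypothetical name; POSIX awk variant of the gawk program)
align_first_seen() {
    awk '
        FNR == 1     { level++; head[level] = FILENAME }   # new file, new column
        !($1 in idx) { idx[$1] = ++n }                      # row number of a value
                     { out[idx[$1], level] = $1 }
        END {
            for (j = 1; j <= level; j++)
                printf "%s%s", head[j], (j < level ? "\t" : "\n")
            for (i = 1; i <= n; i++)
                for (j = 1; j <= level; j++)
                    printf "%s%s", out[i, j], (j < level ? "\t" : "\n")
        }
    ' "$@"
}
```

Usage would be align_first_seen file1 file2 file3 file4.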

Took a different and simpler approach to this based on Don's comment:

gawk '
    FNR==1 { printf "%s\t", FILENAME }
    { seen[$1][FILENAME] = $1 } 
    END {
        print ""
        PROCINFO["sorted_in"]="@ind_num_asc"
        for (i in seen) {
            for (j=1; j<ARGC; j++) {    # file names are ARGV[1..ARGC-1]
                printf "%s\t", seen[i][ARGV[j]]
            } 
            print ""
        }
    }
' file{1,2,3,4}

file1   file2   file3   file4       
    1   1           
            2       
3   3               
    4       4       
5       5           
            6       
7

A solution with bash, join, paste, and bad taste:

#! /usr/bin/env bash

if [ $# -lt 3 ]; then exit 1; fi

files=( '' "$@" )

declare -a temps
for ((i=0; i<=$#; i++)); do
    [ $i -eq 0 -o -f "${files[$i]}" ] || exit 1
    temps[$i]=$( mktemp -t "${0##*/}"_$$_XXXXXXXX ) || exit 1
done
trap 'rm -f "${temps[@]}"' EXIT HUP INT QUIT TERM

cat "$@" | sort -u >"${temps[0]}"

TAB=$( printf '\t' )
for ((i=1; i<=$#; i++)); do
    join -j1 -a1 -t"$TAB" "${temps[0]}" <(paste "${files[$i]}" "${files[$i]}") | \
        sed "/^[^$TAB]*\$/ s/\$/$TAB/" >"${temps[$i]}"
done

printf '%s' "${files[1]}"
for ((i=2; i<=$#; i++)); do
    printf '\t%s' "${files[$i]}"
    let j=i-1
    let k=i-2
    join -j1 -t"$TAB" "${temps[$j]}" "${temps[$i]}" >"${temps[$k]}"
    cat "${temps[$k]}" >"${temps[$i]}"
done
printf '\n'

cut -d "$TAB" -f 2- <"${temps[$#]}" | sort -n

Except for the final sort -n, all of this works with arbitrary text items rather than just numbers, as long as the items don't contain tabs (TAB can be changed to any other separator) and each input file is already sorted the way join expects. It could also be done with just 3 temporary files and some shuffling around, but that would only increase the bad taste.
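
The core trick in the script is the paste/join step: each file is pasted with itself so that the item becomes both the join key and the value, then left-joined against the sorted union of all items. A minimal sketch of that single step follows, using a made-up function name mark_presence and reading the pasted file from standard input instead of a process substitution. The sed pattern here uses [^TAB]* so that unmatched items longer than one character are padded too.

```shell
# mark_presence UNION FILE  (hypothetical helper)
# UNION: sorted list of all items; FILE: one sorted input file.
mark_presence() {
    tab=$(printf '\t')
    # "item<TAB>item" for every item in FILE, left-joined on field 1 with UNION;
    # items absent from FILE come out as a bare key, so append an empty field.
    paste "$2" "$2" |
        join -1 1 -2 1 -a1 -t"$tab" "$1" - |
        sed "/^[^$tab]*\$/ s/\$/$tab/"
}
```

With a union of 1 to 4 and a file containing 1 and 3, every output line has exactly two tab-separated fields, the second one empty for 2 and 4.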

