Bash - pair each line of file

$ join -j 2 -o 1.1,2.1 file file | awk '!seen[$1,$2]++ && !seen[$2,$1]++'
a b
a c
a d
a e
b c
b d
b e
c d
c e
d e

This assumes that no line in the input file contains any whitespace. It also assumes that the file is sorted.

The join command creates the full cross product of the lines in the file by joining the file with itself on a nonexistent field. Since no line contains whitespace, the second field is empty on every line, so every line matches every line (including itself). The non-standard -j 2 may be replaced by -1 2 -2 2 (but not by -j2 unless you use GNU join).

The awk command reads the result of this and outputs only pairs that have not yet been seen, in either order. This also drops lines paired with themselves, such as a a.
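To make the filtering step easier to follow, here is a rough Python sketch of the same idea: build the full cross product, then keep a pair only if neither ordering has been seen before. The function name unordered_pairs is made up for this illustration, and the sketch assumes the lines are already read into a list.

```python
from itertools import product


def unordered_pairs(lines):
    """Hypothetical helper: cross product plus a 'seen' filter, like join | awk."""
    seen = set()
    result = []
    for left, right in product(lines, repeat=2):   # what join produces
        # Mirrors awk's  !seen[$1,$2]++ && !seen[$2,$1]++
        first_new = (left, right) not in seen
        seen.add((left, right))
        if not first_new:
            continue                               # && short-circuits here
        second_new = (right, left) not in seen
        seen.add((right, left))
        if second_new:                             # false for self-pairs like (a, a)
            result.append((left, right))
    return result


if __name__ == "__main__":
    for left, right in unordered_pairs(["a", "b", "c", "d", "e"]):
        print(left, right)
```

With the five lines a to e this yields the same ten pairs as the shell pipeline above.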

Use this command:

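awk '{ name[$1]++ }    # record the first field of each line as an array index
    END { PROCINFO["sorted_in"] = "@ind_str_asc"    # gawk: traverse indices in ascending string order
        for (v1 in name) for (v2 in name) if (v1 < v2) print v1, v2 }
    ' files.dat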

PROCINFO is a gawk extension. If your awk doesn't support it, just leave out the PROCINFO["sorted_in"] = "@ind_str_asc" assignment and pipe the output into sort (if you want the output sorted).

(This does not require the input to be sorted.)
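For comparison, the same approach can be sketched in Python: collect the distinct values, sort them, and print every pair whose first element sorts before the second. The function name sorted_unique_pairs is invented for this example, and it treats each whole line as the value (the awk version uses the first field, which is the same thing when lines contain no whitespace).

```python
def sorted_unique_pairs(lines):
    """Hypothetical helper: the awk associative-array approach, in Python."""
    # set() plays the role of name[$1]++ (duplicates collapse into one key);
    # sorted() plays the role of PROCINFO["sorted_in"] = "@ind_str_asc".
    names = sorted(set(lines))
    # Keep a pair only when v1 sorts strictly before v2, so each unordered
    # pair appears once and no value is paired with itself.
    return [(v1, v2) for v1 in names for v2 in names if v1 < v2]


if __name__ == "__main__":
    # Unsorted input with a duplicate still gives each pair exactly once.
    for v1, v2 in sorted_unique_pairs(["c", "a", "b", "a"]):
        print(v1, v2)
```

Because the v1 < v2 test does the deduplication, the input order does not matter here, which is why this variant does not need a sorted file.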

A Python solution. The lines of the input file are fed to itertools.combinations from the standard library, which generates 2-tuples that are then formatted and printed to standard output.

python3 -c 'from itertools import combinations
with open("file") as f:
    lines = (line.rstrip() for line in f)
    lines = ("{} {}".format(x, y) for x, y in combinations(lines, 2))
    print(*lines, sep="\n")
'
