Bash - pair each line of file
$ join -j 2 -o 1.1,2.1 file file | awk '!seen[$1,$2]++ && !seen[$2,$1]++'
a b
a c
a d
a e
b c
b d
b e
c d
c e
d e
This assumes that no line in the input file contains any whitespace. It also assumes that the file is sorted.
The join command creates the full cross product of the lines in the file. It does this by joining the file with itself on a non-existing field: since no line contains whitespace, field 2 is empty on every line, so every line matches every other line. The non-standard -j 2 may be replaced by -1 2 -2 2 (but not by -j2 unless you use GNU join).
The awk command reads the result and prints a pair only if it has not yet been seen in either order.
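To make the two stages concrete, here is a minimal Python sketch of the same idea: build the full cross product, then keep a pair only if neither ordering has been seen before. The function name unordered_pairs is hypothetical and not part of the answer above; it only mirrors the logic of the pipeline, not its I/O.

```python
from itertools import product

def unordered_pairs(lines):
    """Mimic the join + awk pipeline on a list of lines."""
    seen = set()
    result = []
    # product(..., repeat=2) plays the role of the join cross product.
    for x, y in product(lines, repeat=2):
        # The awk filter drops self-pairs as a side effect (both tests
        # hit the same array key); here that is done explicitly.
        if x == y:
            continue
        # Skip the pair if it, or its mirror image, was already emitted.
        if (x, y) in seen or (y, x) in seen:
            continue
        seen.add((x, y))
        result.append((x, y))
    return result

for x, y in unordered_pairs(["a", "b", "c"]):
    print(x, y)
```

With distinct input lines this yields each unordered pair once, in the order the cross product first produces it.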
Use this command:
awk '{ name[$1]++ }
END { PROCINFO["sorted_in"] = "@ind_str_asc"
for (v1 in name) for (v2 in name) if (v1 < v2) print v1, v2 }
' files.dat
PROCINFO["sorted_in"] is a gawk extension. If your awk doesn’t support it, just leave out the PROCINFO["sorted_in"] = "@ind_str_asc" line and pipe the output into sort (if you want the output sorted).
(This does not require the input to be sorted.)
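The logic of the awk program above can be sketched in Python as follows. This is an illustration only: the function name sorted_pairs is hypothetical, and Python's default string ordering stands in for the "@ind_str_asc" traversal order, which may differ for non-ASCII input.

```python
def sorted_pairs(lines):
    """Collect the first field of each line, then pair the unique names."""
    # Like name[$1]++ in awk: keep the first whitespace-separated field,
    # de-duplicated; blank lines are skipped in this sketch.
    names = sorted({line.split()[0] for line in lines if line.split()})
    # Like the nested for loops: keep a pair only when v1 sorts before v2,
    # so each unordered pair appears once and self-pairs are excluded.
    return [(v1, v2) for v1 in names for v2 in names if v1 < v2]

for v1, v2 in sorted_pairs(["c", "a", "b", "a"]):
    print(v1, v2)
```

Because the names are collected first and sorted afterwards, the input order does not matter and duplicate lines are collapsed.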
A Python solution. The input file is fed to itertools.combinations from the standard library, which generates 2-length tuples that are formatted and printed to standard output.
python3 -c 'from itertools import combinations
with open("file") as f:
    lines = (line.rstrip() for line in f)
    lines = ("{} {}".format(x, y) for x, y in combinations(lines, 2))
    print(*lines, sep="\n")
'