Unix command to find lines common in two files

The command you are seeking is comm. eg:-

comm -12 1.sorted.txt 2.sorted.txt

Here:

-1 : suppress column 1 (lines unique to 1.sorted.txt)

-2 : suppress column 2 (lines unique to 2.sorted.txt)

To easily apply the comm command to unsorted files, use Bash's process substitution:

$ bash --version
GNU bash, version 3.2.51(1)-release
Copyright (C) 2007 Free Software Foundation, Inc.
$ cat > abc
123
567
132
$ cat > def
132
777
321

So the files abc and def have one line in common, the one with "132". Using comm on unsorted files:

$ comm abc def
123
    132
567
132
    777
    321
$ comm -12 abc def # No output! The common line is not found
$

The last line produced no output, the common line was not discovered.

Now use comm on sorted files, sorting the files with process substitution:

$ comm <( sort abc ) <( sort def )
123
            132
    321
567
    777
$ comm -12 <( sort abc ) <( sort def )
132

Now we got the 132 line!

To complement the Perl one-liner, here's its awk equivalent:

awk 'NR==FNR{arr[$0];next} $0 in arr' file1 file2

This will read all lines from file1 into the array arr[], and then check for each line in file2 if it already exists within the array (i.e. file1). The lines that are found will be printed in the order in which they appear in file2. Note that the comparison in arr uses the entire line from file2 as index to the array, so it will only report exact matches on entire lines.

Unix command to find lines common in two files

Tags:

Unix

Shell

Command Line

Related

Recent Posts