Common lines between two files

Use comm -12 file1 file2 to get common lines in both files.

comm also needs its input files to be sorted to work as expected:

comm -12 <(sort file1) <(sort file2)

From man comm:

-1     suppress column 1 (lines unique to FILE1)
-2     suppress column 2 (lines unique to FILE2)
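As a small worked example, here is a sketch that assumes two made-up sample files in a temporary directory (the file contents and the `.sorted` names are only for illustration). It writes sorted copies to temporary files instead of using process substitution, so it should also run in a plain POSIX sh:

```shell
# Work in a throwaway directory so no real files are touched.
tmpdir=$(mktemp -d)
printf '%s\n' cherry apple banana > "$tmpdir/file1"
printf '%s\n' banana date cherry > "$tmpdir/file2"

# comm expects sorted input, so sort each file first.
sort "$tmpdir/file1" > "$tmpdir/file1.sorted"
sort "$tmpdir/file2" > "$tmpdir/file2.sorted"

# -1 and -2 suppress the lines unique to each file,
# leaving only column 3 (lines present in both files).
common=$(comm -12 "$tmpdir/file1.sorted" "$tmpdir/file2.sorted")
printf '%s\n' "$common"

rm -rf "$tmpdir"
```

With these sample files the command prints banana and cherry, one per line.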

Or use grep. The -x option matches a pattern against the whole line, and the -F option tells grep to treat each pattern as a fixed string rather than a regular expression. The -f file1 option reads the patterns from file1, one per line.

grep -Fxf file1 file2
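To see what -F and -x each contribute, here is a minimal sketch on made-up sample files (the contents and the variable names loose, exact and regex are only for illustration):

```shell
tmpdir=$(mktemp -d)
printf '%s\n' cat 'a.c' > "$tmpdir/file1"
printf '%s\n' category cat abc 'a.c' > "$tmpdir/file2"

# -F only: patterns are fixed strings, but may match anywhere in a line,
# so "category" is reported because it contains "cat".
loose=$(grep -Ff "$tmpdir/file1" "$tmpdir/file2")

# -x only: patterns must match the whole line, but are treated as regexes,
# so the pattern "a.c" also matches the line "abc".
regex=$(grep -xf "$tmpdir/file1" "$tmpdir/file2")

# -F and -x together: only lines identical to a line of file1 are printed.
exact=$(grep -Fxf "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$exact"

rm -rf "$tmpdir"
```

Note that grep prints the matching lines in the order they appear in file2, and a line that occurs several times in file2 is printed each time.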

Or use awk:

awk 'NR==FNR{seen[$0]=1; next} seen[$0]' file1 file2

This reads every line of file1 into an array called seen, using the whole line as the key (in awk, $0 represents the whole current line).

The condition NR==FNR makes the block that follows it run only for the first input, file1, and not for file2. NR is the number of the line currently being processed, counted across all inputs, whereas FNR is the line number within the current input file. So FNR restarts at 1 for each input file while NR keeps increasing, and the two are equal only while awk is reading the first file.

The next statement tells awk to skip the rest of the code and start over with the next line. This repeats until NR is no longer equal to FNR, which means awk has read all lines of file1.

The final seen[$0] then runs only for file2: for each line of file2, awk looks the line up in the array and prints it if it exists there.
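To make the two counters visible, this sketch prints NR and FNR next to each line of two made-up sample files, then runs the command from above on the same files (the file contents and variable names are only for illustration):

```shell
tmpdir=$(mktemp -d)
printf '%s\n' a b > "$tmpdir/file1"
printf '%s\n' b c d > "$tmpdir/file2"

# Print NR (count across all inputs), FNR (count within the current file)
# and the line itself. FNR restarts at 1 when awk moves on to file2.
counters=$(awk '{ print NR, FNR, $0 }' "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$counters"

# While NR==FNR (first file) remember each line; afterwards print the
# lines of the second file that were remembered.
common=$(awk 'NR==FNR{seen[$0]=1; next} seen[$0]' "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$common"

rm -rf "$tmpdir"
```

The first command shows NR running from 1 to 5 while FNR goes 1, 2 and then 1, 2, 3. The second prints only b. One caveat: if file1 is empty, NR==FNR also holds while reading file2, so the command prints nothing useful in that case.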

Another simple option is using sort and uniq:

sort file1 file2|uniq -d

This sorts the lines of both files together, then uniq -d prints only the duplicated lines. BUT this is guaranteed to work only when there are NO duplicated lines within either file itself. The command below works even if a line is duplicated within one of the files:

uniq -d <(sort <(sort -u file1) <(sort -u file2))
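Here is a sketch of that pitfall on made-up sample files where x appears twice in file1 but never in file2. The second pipeline is my own rewrite of the sort -u approach using a command group instead of process substitution, on the assumption that you may want it to run in a plain POSIX sh:

```shell
tmpdir=$(mktemp -d)
printf '%s\n' x x y > "$tmpdir/file1"
printf '%s\n' y z > "$tmpdir/file2"

# Naive version: x is reported although it only occurs in file1,
# because it is duplicated inside that file.
naive=$(sort "$tmpdir/file1" "$tmpdir/file2" | uniq -d)

# De-duplicate each file first, so a line can appear twice in the
# merged stream only if it is present in both files.
safe=$({ sort -u "$tmpdir/file1"; sort -u "$tmpdir/file2"; } | sort | uniq -d)

printf 'naive: %s\n' $naive
printf 'safe: %s\n' $safe

rm -rf "$tmpdir"
```

The naive pipeline reports x and y, while the de-duplicated one reports only y.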

Since you're running on Linux, I suppose it's GNU/Linux and you are using the GNU diff command.

If you're running the GNU diff command, this is how to see all changed lines as well as common lines:

diff \
--old-line-format='-%l
' \
--new-line-format='+%l
' \
--unchanged-line-format=' %l
' \
"$@"

This is similar to classic diff output, but no file names or separator lines appear in the output: old lines are prefixed with -, new lines are prefixed with +, and common lines are prefixed with a space.

Here's an example shell script and the resulting output on test files:

$ cat diffcomm.sh
#!/bin/sh
diff \
--old-line-format='-%l
' \
--new-line-format='+%l
' \
--unchanged-line-format=' %l
' \
"$@"
$ cat > filea
a
b
c
d
$ cat > fileb
a
z
d
$ ./diffcomm.sh  filea fileb
 a
-b
-c
+z
 d
$

You can modify the output format for each class of line.

See man diff or info diff or the GNU diffutils documentation for more information.
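For instance, if you only want the common lines, one possible variation (a sketch, assuming GNU diff) is to give the old and new lines an empty format and print the unchanged lines as they are. The sample files are the same filea and fileb as above, recreated in a temporary directory:

```shell
tmpdir=$(mktemp -d)
printf '%s\n' a b c d > "$tmpdir/filea"
printf '%s\n' a z d > "$tmpdir/fileb"

# GNU diff only. Empty formats drop the old and new lines;
# %L prints each unchanged line together with its newline.
# diff exits with status 1 when the files differ, hence "|| true".
common=$(diff \
    --old-line-format='' \
    --new-line-format='' \
    --unchanged-line-format='%L' \
    "$tmpdir/filea" "$tmpdir/fileb" || true)
printf '%s\n' "$common"

rm -rf "$tmpdir"
```

On these files it prints a and d. Keep in mind that diff compares the files in order, so a line that exists in both files but at very different positions may be reported as removed and added rather than as common. If you need a plain set intersection, comm, grep or awk are the safer choice.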
