Common lines between two files
Use comm -12 file1 file2 to get the lines common to both files.
comm also needs both files to be sorted to work as expected, so you may need to sort them first:
comm -12 <(sort file1) <(sort file2)
From man comm:
-1 suppress column 1 (lines unique to FILE1)
-2 suppress column 2 (lines unique to FILE2)
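To see what these column options do, here is a small sketch with two already-sorted files. The file contents are hypothetical sample data made up for illustration. Without options, comm prints three columns (lines only in file1, lines only in file2, lines in both), each indented by one more tab than the previous; -12 leaves only the third column.

```shell
#!/bin/sh
# Work in a throwaway directory so no real files are touched
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Hypothetical sample data; both files are already sorted
printf '%s\n' apple banana cherry > file1
printf '%s\n' banana cherry date > file2

# All three columns: only in file1, only in file2, in both
comm file1 file2

# Suppress columns 1 and 2, leaving only the common lines
common=$(comm -12 file1 file2)
echo "$common"    # banana, cherry (one per line)

# Suppress columns 2 and 3, leaving the lines only in file1
only1=$(comm -23 file1 file2)
echo "$only1"     # apple
```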
Or, using grep, you need to add the -x option so that a pattern only matches a whole line. The -F option tells grep to treat each pattern as a fixed string rather than a regular expression, and -f file1 reads the patterns from file1:
grep -Fxf file1 file2
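A quick sketch of why both -x and -F matter, using hypothetical sample files: without -x the pattern apple also matches the line pineapple, and without -F the pattern a.c is a regular expression that also matches abc. Note that grep does not need sorted input and prints the matches in file2's order.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Hypothetical sample data; neither file needs to be sorted
printf '%s\n' apple a.c > file1
printf '%s\n' pineapple apple abc a.c > file2

# Substring + regex matching: every line of file2 matches something
loose=$(grep -f file1 file2)

# Fixed-string, whole-line matching: only the exact common lines
common=$(grep -Fxf file1 file2)
echo "$common"    # apple, a.c (one per line)
```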
Or using awk:
awk 'NR==FNR{seen[$0]=1; next} seen[$0]' file1 file2
This reads each whole line of file1 into an array called seen, using the whole line as the key (in awk, $0 represents the whole current line).
We use NR==FNR as the condition so that its block runs only for the first input, file1, and not for file2. In awk, NR is the number of records (lines) read so far across all inputs, while FNR is the record number within the current file and is reset to 1 at the start of each file. So FNR restarts for each input file, but NR keeps counting across all of them, and the two are equal only while the first file is being read.
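To make the difference concrete, this sketch prints FILENAME, NR and FNR for every line of two hypothetical sample files. NR keeps counting into the second file, while FNR starts again at 1.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Hypothetical two-line and one-line input files
printf '%s\n' x y > file1
printf '%s\n' z > file2

# File name, overall record number (NR), per-file record number (FNR)
counts=$(awk '{ print FILENAME, NR, FNR }' file1 file2)
echo "$counts"
# file1 1 1
# file1 2 2
# file2 3 1
```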
The next statement tells awk not to run the rest of the program for the current line, but to read the next line and start again from the top. This repeats until NR is no longer equal to FNR, which means awk has read all lines of file1.
The final pattern, seen[$0], therefore only runs for file2: for each line of file2 it looks the line up in the array and prints the line if it exists there (a pattern without an action prints the current line).
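Here is the same awk program spread over several lines with comments, run on hypothetical unsorted sample files. Unlike comm, this approach does not need sorted input, and the common lines come out in file2's order.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Hypothetical, unsorted sample data
printf '%s\n' cherry apple banana > file1
printf '%s\n' date banana cherry > file2

common=$(awk '
  NR == FNR {        # true only while reading the first file
    seen[$0] = 1     # remember the whole line as an array key
    next             # skip the rest of the program for this line
  }
  seen[$0]           # second file: print the line if file1 had it
' file1 file2)
echo "$common"       # banana, cherry (in file2 order)
```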
Another simple option is to use sort and uniq:
sort file1 file2|uniq -d
This sorts both files together, and then uniq -d prints only the lines that appear more than once. BUT this is correct only when neither file contains duplicate lines of its own; otherwise a line repeated inside a single file is reported as well. The command below always gives the right result, even if lines are duplicated within either file:
uniq -d <(sort <(sort -u file1) <(sort -u file2))
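A sketch of the pitfall, using made-up sample files in which x is repeated inside file1 but never appears in file2. The safe version is written here with a { ...; } group instead of <( ) process substitution; it is an equivalent form of the command above that also runs in plain sh.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Hypothetical data: x is duplicated within file1 only
printf '%s\n' x x y > file1
printf '%s\n' y z > file2

# Naive version: x is wrongly reported as common
naive=$(sort file1 file2 | uniq -d)
echo "$naive"    # x and y

# Deduplicate each file first, then look for lines present in both
safe=$({ sort -u file1; sort -u file2; } | sort | uniq -d)
echo "$safe"     # y
```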
Since you're running on Linux, I suppose it's GNU/Linux and you are using the GNU diff command.
If you're running the GNU diff command, this is how to see all changed lines as well as common lines:
diff \
--old-line-format='-%l
' \
--new-line-format='+%l
' \
--unchanged-line-format=' %l
' \
"$@"
This is similar to unified (diff -u) output, but every line is shown and no file names or separator lines appear in the output: old lines are prefixed with -, new lines with +, and common lines with a space.
Here's an example shell script and the resulting output on test files:
$ cat diffcomm.sh
#!/bin/sh
diff \
--old-line-format='-%l
' \
--new-line-format='+%l
' \
--unchanged-line-format=' %l
' \
"$@"
$ cat > filea
a
b
c
d
$ cat > fileb
a
z
d
$ ./diffcomm.sh filea fileb
 a
-b
-c
+z
 d
$
You can modify the output format for each class of line.
See man diff, info diff, or the GNU diffutils documentation for more information.
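For instance, to print only the common lines, the old and new line formats can be made empty. This sketch assumes GNU diff and uses %L (the line including its trailing newline) instead of %l followed by a literal newline. Note that diff exits with status 1 when the files differ, which is expected here.

```shell
#!/bin/sh
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Same sample data as the filea/fileb example above
printf '%s\n' a b c d > filea
printf '%s\n' a z d > fileb

# Print nothing for deleted or added lines; print unchanged lines as-is
common=$(diff --old-line-format='' --new-line-format='' \
              --unchanged-line-format='%L' filea fileb)
echo "$common"    # a, d (one per line)
```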