Extracting unique values between 2 sets/files
Using some lesser-known utilities:
sort file1 > file1.sorted
sort file2 > file2.sorted
comm -1 -3 file1.sorted file2.sorted
This will output duplicates, so if a line appears, say, 3 times in file2 but only 2 times in file1, the leftover occurrence will still be printed. If this is not what you want, pipe the output from sort through uniq before writing it to a file:
sort file1 | uniq > file1.sorted
sort file2 | uniq > file2.sorted
comm -1 -3 file1.sorted file2.sorted
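As a concrete sketch (the file contents here are hypothetical, chosen to show the duplicate behavior), the two pipelines differ like this:

```shell
# Sample data: "3" appears twice in file2 but only once in file1.
printf '1\n3\n' > file1
printf '1\n2\n3\n3\n' > file2

sort file1 > file1.sorted
sort file2 > file2.sorted

# Without uniq: the extra "3" in file2 survives alongside "2".
comm -1 -3 file1.sorted file2.sorted
# prints:
# 2
# 3

# With uniq, duplicates collapse first, so only "2" remains.
sort file1 | uniq > file1.sorted
sort file2 | uniq > file2.sorted
comm -1 -3 file1.sorted file2.sorted
# prints:
# 2
```

`sort -u` is a shorter way to get the same deduplicated output as `sort | uniq`.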
There are lots of utilities in the GNU coreutils package that allow for all sorts of text manipulations.
$ awk 'FNR==NR {a[$0]++; next} !($0 in a)' file1 file2
6
7
Explanation of how the code works:
- If we're working on file1, track each line of text we see.
- If we're working on file2, and have not seen the line text, then print it.
Explanation of details:
- FNR is the current file's record number
- NR is the current overall record number from all input files
- FNR==NR is true only when we are reading file1
- $0 is the current line of text
- a[$0] is a hash with the key set to the current line of text
- a[$0]++ tracks that we've seen the current line of text
- !($0 in a) is true only when we have not seen the line text
- Print the line of text if the above pattern returns true; this is the default awk behavior when no explicit action is given
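Putting the walkthrough together with some hypothetical input (chosen so the result matches the 6 and 7 shown above):

```shell
# Hypothetical sample files; only their overlap matters here.
printf '1\n2\n3\n4\n5\n' > file1
printf '4\n5\n6\n7\n' > file2

# First pass fills a[] with every line of file1;
# second pass prints file2 lines never seen in file1.
awk 'FNR==NR {a[$0]++; next} !($0 in a)' file1 file2
# prints:
# 6
# 7
```

Unlike the comm approach, this needs no pre-sorting, and the output keeps file2's original line order.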