Extracting unique values between 2 sets/files
Using some lesser-known utilities:
sort file1 > file1.sorted
sort file2 > file2.sorted
comm -1 -3 file1.sorted file2.sorted
This will output duplicates, so if a line appears, say, 3 times in file2 but only 2 times in file1, the leftover occurrence will still be printed. If this is not what you want, pipe the output from sort through uniq before writing it to a file:
sort file1 | uniq > file1.sorted
sort file2 | uniq > file2.sorted
comm -1 -3 file1.sorted file2.sorted
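As a concrete sketch (the file contents here are hypothetical, chosen to show the duplicate behavior), the two pipelines differ like this:

```shell
# Sample data: "3" appears twice in file2 but only once in file1.
printf '1\n3\n' > file1
printf '1\n2\n3\n3\n' > file2

sort file1 > file1.sorted
sort file2 > file2.sorted

# Without uniq: the extra "3" in file2 survives alongside "2".
comm -1 -3 file1.sorted file2.sorted
# prints:
# 2
# 3

# With uniq, duplicates collapse first, so only "2" remains.
sort file1 | uniq > file1.sorted
sort file2 | uniq > file2.sorted
comm -1 -3 file1.sorted file2.sorted
# prints:
# 2
```

`sort -u` is a shorter way to get the same deduplicated output as `sort | uniq`.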
There are lots of utilities in the GNU coreutils package that allow for all sorts of text manipulations.
$ awk 'FNR==NR {a[$0]++; next} !($0 in a)' file1 file2
6
7
Explanation of how the code works:
- If we're working on file1, track each line of text we see.
- If we're working on file2, and have not seen the line text, then print it.
Explanation of details:
- FNR is the current file's record number
- NR is the current overall record number from all input files
- FNR==NR is true only when we are reading file1
- $0 is the current line of text
- a[$0] is a hash with the key set to the current line of text
- a[$0]++ tracks that we've seen the current line of text
- !($0 in a) is true only when we have not seen the line text
- Print the line of text if the above pattern returns true; this is the default awk behavior when no explicit action is given
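Putting the walkthrough together with some hypothetical input (chosen so the result matches the 6 and 7 shown above):

```shell
# Hypothetical sample files; only their overlap matters here.
printf '1\n2\n3\n4\n5\n' > file1
printf '4\n5\n6\n7\n' > file2

# First pass fills a[] with every line of file1;
# second pass prints file2 lines never seen in file1.
awk 'FNR==NR {a[$0]++; next} !($0 in a)' file1 file2
# prints:
# 6
# 7
```

Unlike the comm approach, this needs no pre-sorting, and the output keeps file2's original line order.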