extracting unique values between 2 sets/files

Using some lesser-known utilities:

sort file1 > file1.sorted
sort file2 > file2.sorted
comm -1 -3 file1.sorted file2.sorted

This will output duplicates, so if there is 1 3 in file1, but 2 in file2, this will still output 1 3. If this is not what you want, pipe the output from sort through uniq before writing it to a file:

sort file1 | uniq > file1.sorted
sort file2 | uniq > file2.sorted
comm -1 -3 file1.sorted file2.sorted

There are lots of utilities in the GNU coreutils package that allow for all sorts of text manipulations.


$ awk 'FNR==NR {a[$0]++; next} !($0 in a)' file1 file2
6
7

Explanation of how the code works:

  • If we're working on file1, track each line of text we see.
  • If we're working on file2, and have not seen the line text, then print it.

Explanation of details:

  • FNR is the current file's record number
  • NR is the current overall record number from all input files
  • FNR==NR is true only when we are reading file1
  • $0 is the current line of text
  • a[$0] is a hash with the key set to the current line of text
  • a[$0]++ tracks that we've seen the current line of text
  • !($0 in a) is true only when we have not seen the line text
  • Print the line of text if the above pattern returns true, this is the default awk behavior when no explicit action is given