bash unique lines code examples

Example 1: bash return unique lines based on column

# Basic syntax:
awk -F'delimiter' '!a[$column_number]++' input_file
# Where:
#	- The column_number is the column you want to use for checking
#		if a line is unique
#	- The delimiter is the field delimiter used in the input_file. Note
#		that awk splits fields on runs of spaces and tabs by default,
#		so -F can be omitted if your file is whitespace-delimited.
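
# How '!a[$column_number]++' works: the array a counts how many times each
#	value of that column has been seen, and the expression is true (so
#	the line is printed) only on the first occurrence, while the count
#	is still zero.
# The one-liner is shorthand for the longer form below, shown here with a
#	comma delimiter and column 3 to match the example that follows:
awk -F',' '{ if (a[$3] == 0) print; a[$3]++ }' input_file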

# Example usage 1:
# Say you have a comma-delimited file with the following rows and only
#	want one copy of the row based on the third field
10,15-10-2014,abc
20,12-10-2014,bcd
10,09-10-2014,def
40,06-10-2014,ghi
10,15-10-2014,abc # Looking only at field 3, this row duplicates row 1

# Running:
awk -F',' '!a[$3]++' input_file 
# Would return:
10,15-10-2014,abc
20,12-10-2014,bcd
10,09-10-2014,def
40,06-10-2014,ghi
# The second instance of abc in the third field/column is ignored. 

# Note, for each distinct value in the specified column this keeps only
#	the first line containing it; any later lines with the same value
#	are skipped.
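
# To see which values in the chosen column occurred more than once (and
#	how often), a quick sketch built on the same array idea, using the
#	delimiter and column from the example above:
awk -F',' '{ a[$3]++ } END { for (v in a) if (a[v] > 1) print v, a[v] }' input_file
# For the sample data above this prints:
abc 2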

# Example usage 2:
# Say you have a tab-delimited file with the following rows and only
#	want one copy of each row based on the first AND fourth fields
a	11	b	17
c	23	d	27
a	31	e	17 # Looking only at fields 1 and 4, this row duplicates row 1
a	17	f	42

# Running:
awk '!a[$1$4]++' input_file
# Would return:
a	11	b	17
c	23	d	27
a	17	f	42
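
# Note, concatenating the fields directly ($1$4) can merge distinct pairs
#	into the same key in edge cases (e.g. "a" + "17" and "a1" + "7" both
#	become "a17"). A safer sketch puts a comma in the array subscript,
#	which awk joins with its SUBSEP separator:
awk '!a[$1,$4]++' input_file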

Example 2: bash print lines unique to one file

# Example usage using awk:
awk 'FNR==NR { b[$0] = 1; next } !b[$0]' input_file_1 input_file_2
# This command prints all lines of input_file_2 that aren't found in
#	input_file_1 (i.e. lines that are unique to input_file_2). Another
#	way of saying this is that it removes all lines of input_file_2
#	that also appear in input_file_1.
#	The condition FNR==NR is only true while awk reads the first file
#	(FNR is the per-file line number, NR the running total), so the
#	first block stores every line of input_file_1 as a key in the
#	associative array b, which works like a dictionary of key-value
#	pairs. Each line of input_file_2 is then looked up in the array
#	and printed only if it isn't found there.
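
# A quick way to try this without creating files, as a sketch using bash
#	process substitution (the three-line inputs are just example data):
awk 'FNR==NR { b[$0] = 1; next } !b[$0]' <(printf 'a\nb\nc\n') <(printf 'b\nc\nd\n')
# Prints:
d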

# Note, change $0 to a specific field (e.g. $1) if you want to remove
#	lines based on the values in particular columns, which don't have to
#	be the same column in each file (see the sketch after these notes).
# Note, unlike comm (and similar diff-based approaches), this solution
#	doesn't require the files to be sorted.
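
# A sketch of the field-based variant mentioned above (the field numbers
#	are just an example): drop lines of input_file_2 whose second field
#	matches the first field of any line in input_file_1:
awk 'FNR==NR { b[$1]; next } !($2 in b)' input_file_1 input_file_2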