bash keep unique lines by column code example
Example 1: bash return unique lines based on column
# Basic syntax:
awk -F'delimiter' '!a[$column_number]++' input_file
# Where:
# - column_number is the column (field) to check when deciding whether
#   a line is unique
# - delimiter is the field separator used in input_file. By default awk
#   splits fields on runs of spaces and tabs (newlines separate records,
#   not fields), so -F can be omitted for whitespace-delimited files.
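Why the idiom works: a[$column_number]++ evaluates to the number of times that key has been seen so far, which is 0 (false) on the first occurrence, so the negation is true only the first time and awk prints the line. A minimal runnable sketch, using made-up two-field rows fed in via printf:

```shell
# Deduplicate on field 2: "x" appears in the first two lines, so only
# the first of them is printed.
printf 'a x\nb x\nc y\n' | awk '!a[$2]++'
# Output:
# a x
# c y
```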
# Example usage 1:
# Say you have a comma-delimited file with the following rows and only
# want one copy of the row based on the third field
10,15-10-2014,abc
20,12-10-2014,bcd
10,09-10-2014,def
40,06-10-2014,ghi
10,15-10-2014,abc # Looking only at field 3, this row duplicates row 1
# Running:
awk -F',' '!a[$3]++' input_file
# Would return:
10,15-10-2014,abc
20,12-10-2014,bcd
10,09-10-2014,def
40,06-10-2014,ghi
# The second instance of abc in the third field/column is dropped.
# Note, this keeps only the first line seen for each distinct value in
# your specified column; later duplicates are discarded.
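The same example can be run as a self-contained pipeline, feeding the rows to awk through a here-document instead of a file:

```shell
# Deduplicate on the third comma-separated field; the last row repeats
# "abc" and is dropped.
awk -F',' '!a[$3]++' <<'EOF'
10,15-10-2014,abc
20,12-10-2014,bcd
10,09-10-2014,def
40,06-10-2014,ghi
10,15-10-2014,abc
EOF
```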
# Example usage 2:
# Say you have a tab-delimited file with the following rows and only
# want one copy of each row based on the first AND fourth fields
a 11 b 17
c 23 d 27
a 31 e 17 # Looking only at fields 1 and 4, this row duplicates row 1
a 17 f 42
# Running (the comma joins the fields with awk's SUBSEP, so that e.g.
# fields "ab","c" and "a","bc" produce different keys, which plain
# concatenation $1$4 would not):
awk '!a[$1,$4]++' input_file
# Would return:
a 11 b 17
c 23 d 27
a 17 f 42
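A caveat on multi-field keys: building the key by plain concatenation ($1$4) can merge rows that are actually distinct, because "ab"+"c" and "a"+"bc" both become "abc". Joining the fields with a comma (awk's SUBSEP) keeps them apart. A sketch with made-up two-field rows:

```shell
# Two rows whose fields differ but whose concatenation is identical:
printf 'ab c\na bc\n' | awk '!seen[$1 $2]++'   # key "abc" for both: keeps 1 row
printf 'ab c\na bc\n' | awk '!seen[$1,$2]++'   # distinct keys: keeps both rows
```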
Example 2: bash count the occurrences of each unique element in a column
# Example usage:
awk -F '\t' '{print $7}' input_file | sort | uniq -c
# Breakdown:
# awk returns the 7th tab-delimited column/field of the input_file
# sort sorts the entries so that duplicate entries are adjacent
# uniq -c prepends the number of occurrences to each distinct value
# Note, if you only want the unique values without the counts, replace
# "sort | uniq -c" with "sort -u"; that is simpler and more portable
# than stripping the count column afterwards
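A runnable sketch of the counting pipeline on made-up tab-delimited data (column 2 rather than column 7, and printf standing in for a real input_file):

```shell
# "x" appears twice and "y" once in the second column:
printf 'a\tx\nb\ty\nc\tx\n' | awk -F'\t' '{print $2}' | sort | uniq -c
# To get the number of distinct values rather than per-value counts:
printf 'a\tx\nb\ty\nc\tx\n' | awk -F'\t' '{print $2}' | sort -u | wc -l
```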