Filter lines based on a set of words in a specific column
With GNU awk, gensub could be used to remove all of those words and print the line if nothing is left:
awk -F , -v OFS=, 'gensub(/last|lst|name|nm|[0-9_-]*/,"","g",tolower($1))=="" {
    # every character of $1 was part of an allowed word, a digit, "_" or "-"
    $2="found";
    print $1, $2
}' file
Unlike sub/gsub, gensub leaves its target (here $1, and so the record) intact and returns the resulting string instead. The same approach could be used with standard awk by copying the field into a variable first.
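A minimal sketch of that portable variant, assuming any POSIX awk: the field is copied into s, gsub() edits only the copy, and the line is kept when s ends up empty. The function name filter_names and the sample lines are hypothetical, not taken from the question.

```sh
#!/bin/sh
# filter_names is a hypothetical wrapper; it reads comma-separated lines on stdin.
filter_names() {
    awk -F, -v OFS=, '{
        s = tolower($1)                           # lowercase copy of field 1; $1 itself is untouched
        gsub(/last|lst|name|nm|[0-9_-]/, "", s)   # strip allowed words, digits, "_" and "-"
        if (s == "") {                            # nothing else was in the field
            $2 = "found"
            print $1, $2
        }
    }'
}

# Hypothetical sample input
printf '%s\n' 'LastNm,x' 'last_nm,y' 'FirstName,z' '4-LastNm,w' | filter_names
```

This prints LastNm,found, last_nm,found and 4-LastNm,found; FirstName is dropped because "first" survives the gsub().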
To treat more characters than [0-9_-] as removable, you could use [^[:alpha:]] (i.e. anything that isn't a letter) as the last alternative:
last|lst|name|nm|[^[:alpha:]]
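To see what the broader class changes, here is a sketch that passes the removable pattern in as a dynamic regex, assuming a POSIX awk that supports bracket classes such as [:alpha:]. The helper filter_by and the sample lines are hypothetical. Separators such as "." or a space only disappear with [^[:alpha:]].

```sh
#!/bin/sh
# filter_by is a hypothetical helper: $1 is the ERE whose matches are removed
# from a lowercase copy of field 1; lines with nothing left are marked "found".
filter_by() {
    awk -F, -v OFS=, -v re="$1" '{
        s = tolower($1)
        gsub(re, "", s)              # re is a dynamic regex taken from the shell argument
        if (s == "") { $2 = "found"; print $1, $2 }
    }'
}

input=$(printf '%s\n' 'last.nm,a' 'last nm,b' 'LastNm,c')

echo "--- [0-9_-]"
printf '%s\n' "$input" | filter_by 'last|lst|name|nm|[0-9_-]'
echo "--- [^[:alpha:]]"
printf '%s\n' "$input" | filter_by 'last|lst|name|nm|[^[:alpha:]]'
```

With [0-9_-] only LastNm,found is printed; with [^[:alpha:]] all three lines are marked found.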
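Try this:
awk -F, -v OFS=, '
{
    # split $1 into words on every non-alphanumeric character
    split($1, w, /[^[:alnum:]]/)
    for (i in w) {
        # the whole word must be a number or one of the allowed words, else skip the record
        if (tolower(w[i]) !~ /^([0-9]+|last|nm|name|lastnm|lastname)$/) next
    }
    $2="Found"; print
}' file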
Output:
LastNm,Found
last_nm,Found
4-LastNm,Found
Explanation:
- split field $1 on every character that is not alphanumeric ([^...] negates the class [:alnum:]) to get the list of words.
- The for loop goes over these words.
- If a word does not match the regex of allowed words, next jumps to the next record.
- If that never happened, we can finally assign $2="Found" and print the record.
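To see the words the loop actually checks, this sketch prints the pieces split() produces for a few hypothetical first-column values; it assumes an awk with bracket-class support, and show_pieces is a hypothetical name.

```sh
#!/bin/sh
# show_pieces (hypothetical): print the piece count and the pieces joined by "|".
show_pieces() {
    awk '{
        n = split($0, w, /[^[:alnum:]]/)               # n = number of pieces
        out = w[1]
        for (i = 2; i <= n; i++) out = out "|" w[i]     # numeric loop keeps the order
        print n ": " out
    }'
}

printf '%s\n' '4-LastNm' 'last_nm' 'last--nm' | show_pieces
```

This prints "2: 4|LastNm", "2: last|nm" and "3: last||nm". Note the empty middle piece that a doubled separator produces; it is checked like any other word.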