filter lines based on set of words in specific column

With GNU awk, you can use gensub to strip all the allowed words from the field and print the line only if nothing is left:

awk -F , -v OFS=, 'gensub(/last|lst|name|nm|[0-9_-]*/,"","g",tolower($1))=="" {
    $2="found";
    print $1, $2
}' file

Unlike sub/gsub, gensub leaves the original record intact and returns the resulting string instead. The same approach works with standard awk if you first copy the field into a variable and run gsub on the copy.
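
A minimal sketch of that standard-awk variant, assuming a small made-up input file (`sample.csv` and the `filter_names` wrapper are hypothetical names, not part of the question). The field is copied into `key`, so gsub never touches the record itself:

```shell
# Hypothetical sample input; the file from the original question is not shown.
printf '%s\n' 'LastNm,x' 'last_nm,y' 'FirstName,z' '4-LastNm,w' > sample.csv

# Portable awk: work on a lowercase copy of $1 so the record stays intact.
filter_names() {
    awk -F, -v OFS=, '{
        key = tolower($1)
        # strip the allowed words plus digits, underscores and hyphens
        gsub(/last|lst|name|nm|[0-9_-]/, "", key)
        # nothing left over means $1 consisted only of allowed pieces
        if (key == "") { $2 = "found"; print $1, $2 }
    }' "$1"
}

filter_names sample.csv
```

With this sample the `FirstName` line is dropped because `first` survives the substitution. Note that an empty first field would also count as a match.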

To allow more separator characters than [0-9_-], you could use [^[:alpha:]] (i.e. anything that isn't a letter):

last|lst|name|nm|[^[:alpha:]]

Try this:

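awk -F, -v OFS=, '
{
    # split $1 into words on every non-alphanumeric character
    split($1, w, /[^[:alnum:]]/)
    for (i in w) {
        # skip the record if a word is not allowed (\< and \> are GNU awk word boundaries)
        if (!match(tolower(w[i]), /\<([0-9]*|last|nm|name|lastnm|lastname)\>/)) next
    }
    $2 = "Found"; print
}' file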

Output:

LastNm,Found
last_nm,Found
4-LastNm,Found

Explanation:

Split field $1 on every non-alphanumeric character ([^[:alnum:]]) to get a list of words.
Loop over these words.
If a word does not match the regex holding the allowed words, jump to the next record.
If that never happens, assign $2="Found" and print the record.
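
The steps above can also be sketched without the GNU-only `\<` and `\>` operators, by anchoring the regex with `^` and `$` so that each word must match in full. This is an assumption-laden variant, not the answer's exact code: `words.csv` and `filter_words` are hypothetical names, and unlike the original, empty pieces produced by adjacent separators pass the check because `[0-9]*` matches the empty string.

```shell
# Hypothetical sample input; the file from the original question is not shown.
printf '%s\n' 'LastNm,a' 'last_nm,b' 'First_Name,c' '4-LastNm,d' > words.csv

filter_words() {
    awk -F, -v OFS=, '{
        # split $1 into words on every non-alphanumeric character
        n = split($1, w, /[^[:alnum:]]/)
        for (i = 1; i <= n; i++) {
            # each whole word must be a digit run or one of the allowed words
            if (tolower(w[i]) !~ /^([0-9]*|last|nm|name|lastnm|lastname)$/) next
        }
        # every word passed: mark the record and print it
        $2 = "Found"; print
    }' "$1"
}

filter_words words.csv
```

Looping from 1 to the count returned by split, rather than using `for (i in w)`, also avoids stale elements left in `w` by a longer previous record in awk implementations that do not clear the array.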
