Removing rows containing NA in every column

With awk:

awk '{ for (i=2;i<=NF;i++) if ($i!="NA"){ print; break } }' file

The loop scans the fields from the second one onward and prints the line as soon as it finds a field that is not exactly NA. The break then ends the loop, so each kept line is printed only once.
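To try the command, here is a minimal sketch. It assumes a whitespace-separated table whose first column is a row ID; the file name sample.txt and its contents are made up for illustration and are not taken from the question.

```shell
# Hypothetical input: g1 and g3 have NA in every value column.
cat > sample.txt <<'EOF'
gene  v1  v2  v3  v4
g1    NA  NA  NA  NA
g2    NA  NA  2   3
g3    NA  NA  NA  NA
g4    1   2   3   2
EOF

# Keep a line as soon as one field after the first is not "NA".
filtered=$(awk '{ for (i = 2; i <= NF; i++) if ($i != "NA") { print; break } }' sample.txt)
printf '%s\n' "$filtered"
```

The header line survives because its fields (v1, v2, ...) are not NA. A line with only one field would be dropped, since the loop never runs for it.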

Using GNU sed:

sed -e '/g[0-9]\+\(\s*NA\s*\)\+$/d' filename

Short explanation:

g[0-9]\+\(\s*NA\s*\)\+$ is a regex matching g followed by at least one digit, then one or more NAs with optional whitespace around them, up to the end of the line.

sed -e '/<regex>/d' deletes all lines that match <regex>.

A more portable regexp with the same meaning, written in extended syntax with POSIX character classes instead of the GNU-specific \s and \+, would be:

sed -Ee '/g[0-9]+([[:space:]]*NA[[:space:]]*)+$/d' filename
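As a quick check of the extended-regex form, the sketch below runs it on a made-up file (sample_sed.txt is a hypothetical name, and the rows are illustrative only). It assumes a sed that accepts -E, which current GNU and BSD versions do.

```shell
# Hypothetical input: g1 and g3 consist of an ID followed only by NA values.
cat > sample_sed.txt <<'EOF'
gene  v1  v2  v3  v4
g1    NA  NA  NA  NA
g2    NA  NA  2   3
g3    NA  NA  NA  NA
g4    1   2   3   2
EOF

# Delete lines where g<digits> is followed by nothing but NAs up to end of line.
kept=$(sed -E '/g[0-9]+([[:space:]]*NA[[:space:]]*)+$/d' sample_sed.txt)
printf '%s\n' "$kept"
```

Note that this pattern only recognizes row IDs of the form g followed by digits; rows with other IDs would be kept even if every value is NA.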

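With all from the Perl List::Util module. The -a switch splits each line into @F, shift drops the first field (the row ID), and the line is printed unless every remaining field equals NA: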

$ perl -MList::Util=all -alne 'shift @F; print unless all { $_ eq "NA" } @F' file
gene  v1  v2  v3  v4
g2    NA  NA  2   3
g4    1   2   3   2
