Delete lines that have more occurrences of a specific word

Assuming the key is the 4th field and records with identical keys are consecutive (and I understood your question correctly), you could do something like:

perl -lane '
  $na = grep {$_ eq "NA"} @F;

  if ($F[3] eq $last_key) {
    if ($na < $min_na) {
      $min_na = $na; $min = $_
    }
  } else {
    print $min unless $. == 1;
    $last_key = $F[3]; $min = $_; $min_na = $na;
  }
  END{print $min if $.}' < your-file

Among consecutive lines with the same 4th field, this prints the first one with the fewest NA fields.
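If Perl isn't available, the same single-pass logic can be sketched in awk. This is a port, not the command above. The function name `fewest_na_consecutive` and the sample lines are made up for illustration. It assumes whitespace-separated fields, the key in field 4, and lines with the same key being adjacent.

```shell
#!/bin/sh
# Hypothetical helper: awk port of the Perl one-liner.
# Assumes whitespace-separated fields, the key in field 4, and that
# lines sharing a key are consecutive. Reads stdin, writes stdout.
fewest_na_consecutive() {
  awk '
    {
      # count the fields that are exactly "NA"
      na = 0
      for (i = 1; i <= NF; i++) if ($i == "NA") na++
    }
    NR > 1 && $4 == last_key {
      # same group: replace the kept line only if this one has fewer NAs,
      # so on a tie the first line seen is kept
      if (na < min_na) { min_na = na; min = $0 }
      next
    }
    {
      # key changed: print the previous group winner, start a new group
      if (NR > 1) print min
      last_key = $4; min = $0; min_na = na
    }
    END { if (NR) print min }'
}

# Made-up sample data for illustration
printf '%s\n' \
  'a NA c k1 NA' \
  'a b c k1 NA' \
  'a b c k1 e' \
  'NA NA c k2 e' \
  'x NA c k2 NA' \
  'z y x k3 w' | fewest_na_consecutive
```

With that sample it prints `a b c k1 e`, `NA NA c k2 e` and `z y x k3 w`. Both k2 lines have two NA fields, so the first one is kept.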

If they're not consecutive, you could use some sorting:

< your-file awk '{n = 0; for (i = 1; i <= NF; i++) if ($i == "NA") n++; print n, $0}' |
  sort -k5,5 -k1,1n |
  sort -muk5,5 |
  cut -d ' ' -f 2-

With BusyBox sort, you'd want to add the -s option to the second invocation, as it seems to re-sort its input to some extent despite the -m.
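To see the sort-based pipeline at work on keys that are not adjacent, here is a sketch that wraps it in a hypothetical `fewest_na_sorted` function. The NA-counting loop runs over fields 1 to NF. The sample lines are invented, and the sketch assumes GNU sort. On BusyBox, add -s to the second sort, as noted above.

```shell
#!/bin/sh
# Hypothetical helper wrapping the sort-based approach.
# Assumes whitespace-separated fields, the key in field 4, and GNU sort.
# Steps: prefix each line with its NA count (so the key becomes field 5),
# sort by key and then numerically by count, keep the first line of each
# key group (the one with the fewest NAs), then strip the count prefix.
fewest_na_sorted() {
  awk '{n = 0; for (i = 1; i <= NF; i++) if ($i == "NA") n++; print n, $0}' |
    sort -k5,5 -k1,1n |
    sort -muk5,5 |
    cut -d ' ' -f 2-
}

# Made-up sample data; same-key lines are deliberately not adjacent
printf '%s\n' \
  'a NA c k1 NA' \
  'NA NA c k2 e' \
  'a b c k1 e' \
  'z y x k3 w' \
  'x b c k2 NA' \
  'a b c k1 NA' | fewest_na_sorted
```

For that sample it should print `a b c k1 e`, `x b c k2 NA` and `z y x k3 w`, ordered by key rather than by input position. When lines tie on both key and NA count, GNU sort orders them with its last-resort whole-line comparison. The surviving line is therefore not necessarily the first one in the input. Adding -s to the first sort keeps input order among ties.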

