Column deletion based on number of string matches within column
One approach is to read the file twice: the first pass counts the F's in each column, and the second pass prints only the columns whose count is below the threshold. So something like
#!/bin/sh
awk -v n=3 '
NR==FNR { for (i=1; i<=NF; i++) if ($i == "F") c[i]++; next }
        { for (i=1; i<=NF; i++) if (c[i] < n) printf("%s ", $i); print "" }
' filename filename
The NR==FNR condition is a trick to detect whether we are reading the file for the first or the second time: assuming the file has any lines at all, it is true only during the first pass. The array c counts the number of F fields seen in each column. The next statement says that all processing for the current line is finished during the first pass. The second block is executed only when the file is read the second time.
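The same two-pass idea can be sketched in Python (the delete_columns helper name is illustrative, not part of the original answer); the first pass tallies F's per column, the second emits only the columns whose count is below n, mirroring the awk above:

```python
def delete_columns(lines, n=3, target="F"):
    """Drop every column containing n or more occurrences of target."""
    rows = [line.split() for line in lines]
    counts = {}
    # First pass: count target occurrences per column index.
    for row in rows:
        for i, field in enumerate(row):
            if field == target:
                counts[i] = counts.get(i, 0) + 1
    # Second pass: keep only columns whose count is below n.
    return [" ".join(f for i, f in enumerate(row) if counts.get(i, 0) < n)
            for row in rows]
```

Unlike the awk script, this holds the whole file in memory, so the two-pass shell version remains preferable for very large inputs.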
Here's an illustration of the transpose - line filter - transpose method. It's perhaps unsuitable for your (large file) case but may be of value to others:
$ cat file
F G F H H
G F F F A
F G F F F
F F F T F
then
$ rs -T < file | perl -alne 'print unless (grep { $_ eq "F" } @F) > 3' | rs -T
F G H H
G F F A
F G F F
F F T F
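For systems without rs, the same transpose / line filter / transpose pipeline can be sketched in Python using zip (the filter_columns name is illustrative); here "more than 3 F's" matches the perl filter above:

```python
def filter_columns(lines, max_f=3, target="F"):
    """Transpose, drop rows (former columns) containing more than
    max_f occurrences of target, then transpose back."""
    rows = [line.split() for line in lines]
    cols = list(zip(*rows))                               # transpose
    kept = [c for c in cols if c.count(target) <= max_f]  # line filter
    return [" ".join(r) for r in zip(*kept)]              # transpose back
```

Like the rs pipeline, this materializes the transposed data, so it shares the same large-file caveat.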