awk - count pattern in the entire column
Your `awk` command has several issues.

- You have not specified the field separator, so `awk` splits the lines at whitespace, not `,`. You can use the `-F','` command-line option to set the field separator.
- Your RegExp states `/^E_/` and hence would look for fields that don't start with `E_` (which none of your column 2 values does), not merely those that don't start with `E`. Remove the `_`.
- Your command would also count the header line. You can use the `FNR` internal variable (which is automatically set to the current line number within the current file) to exclude the first line.
- As noted by Rakesh Sharma, if all lines start with `E`, the command would print the empty string at the end instead of a `0` because of the use of an uninitialized variable. You can force interpretation as a number by printing `count+0` instead of `count`.
A corrected version would be:

```shell
awk -F',' 'FNR>1 && $2!~/^E/{count++} END{print count+0}' FinalOutput.csv
```
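To see the issues above concretely, here is a quick check against a small hypothetical sample file (the contents are stand-ins, not the poster's actual data):

```shell
# Hypothetical sample data standing in for FinalOutput.csv
cat > /tmp/FinalOutput.csv <<'EOF'
id,code
1,E123
2,X456
3,E789
4,Y000
EOF

# A command with the issues described above: whitespace splitting
# leaves $2 empty on every line, so the negated /^E_/ pattern
# matches all 5 lines, header included
awk '$2!~/^E_/{count++} END{print count}' /tmp/FinalOutput.csv
# prints 5

# The corrected command counts only the two non-E data rows
awk -F',' 'FNR>1 && $2!~/^E/{count++} END{print count+0}' /tmp/FinalOutput.csv
# prints 2
```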
Note that since I used the `FNR` per-file line counter (rather than the global line counter `NR`), this would also work with more than one input file where all of them have a header line, i.e. you could even use it as:

```shell
awk -F',' ' ... ' FinalOutput1.csv FinalOutput2.csv ...
```
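The `NR`/`FNR` distinction is easy to demonstrate with two small hypothetical files that each carry a header line:

```shell
# Two hypothetical input files, each with its own header
printf 'id,code\n1,E1\n2,X2\n' > /tmp/f1.csv
printf 'id,code\n3,Y3\n' > /tmp/f2.csv

# FNR restarts at 1 for each file, so both headers are skipped
awk -F',' 'FNR>1 && $2!~/^E/{count++} END{print count+0}' /tmp/f1.csv /tmp/f2.csv
# prints 2 (X2 and Y3)

# NR keeps counting across files, so the second header line
# (NR==4, $2=="code") slips through and is counted too
awk -F',' 'NR>1 && $2!~/^E/{count++} END{print count+0}' /tmp/f1.csv /tmp/f2.csv
# prints 3
```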
Some other approaches:
`awk` defaults to printing if a condition is true, so you could simply do:

```shell
$ awk -F, 'NR>1 && $2!~/^E/' file | wc -l
4
```
print the file starting from the second line, and count how many times you see a comma followed by a non-`E` character (note that this assumes only one comma per line, as shown in your example):

```shell
$ tail -n+2 file | grep -c ',[^E]'
4
```
`perl`:

```shell
$ perl -F, -lane '$c++ if $.>1 && $F[1] !~ /^E/ }{ print $c' file
4
```
`sed` and `wc`:

```shell
$ sed -n '1d; /,[^E]/p' file | wc -l
4
```
You're very close:

```shell
awk -F, 'NR>1{if ($2 !~ /^E/){count++}} END {print count}'
```

should work.

- `-F,` tells `awk` that `,` is the delimiter
- `NR>1` skips the header

I ran this on your sample file and it produced the correct output.