completely ignore lines that start with a specific pattern
To ignore some lines on a line-by-line basis, add /unwanted pattern/ {next} or ! /wanted pattern/ {next} at the beginning of the script.
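As a minimal sketch of the line-by-line form, assume hypothetical input in which comment lines start with # (the sample data and the variable names are illustrative, not from the question):

```shell
# Hypothetical sample input: two data lines and one comment line.
input='keep 1
# a comment
keep 2'

# Skip unwanted lines first; the remaining rules only see the other lines.
skip_unwanted=$(printf '%s\n' "$input" | awk '/^#/ {next} {print $2}')

# Equivalent for this input: skip every line that is NOT a wanted one.
keep_wanted=$(printf '%s\n' "$input" | awk '!/^keep/ {next} {print $2}')

printf '%s\n' "$skip_unwanted"
printf '%s\n' "$keep_wanted"
```

Both commands print 1 and 2 on separate lines. Because the {next} rule comes first, no later rule in the script ever runs on a skipped line.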
Alternatively, filter with grep before awk sees the input, using grep -v 'unwanted pattern' | awk … or grep 'wanted pattern' | awk … as appropriate. This may be faster if grep eliminates a lot of lines, because grep is typically faster than awk for the same task (grep is more specialized, so it can be optimized for its task; awk is a full programming language, so it can do a lot more but is less efficient).
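A small sketch of the grep pre-filter, again on hypothetical data (a name and a value per line, with # comment lines mixed in); the awk program here is just an illustrative sum, not anything from the question:

```shell
# Hypothetical sample input: a name and a value per line, plus comment lines.
input='# name value
a 1
# skipped
b 2'

# grep drops the comment lines, so awk only has to sum what is left.
total=$(printf '%s\n' "$input" | grep -v '^#' | awk '{sum += $2} END {print sum}')

printf '%s\n' "$total"
```

This prints 3. Whether the extra process pays off depends on how many lines grep removes; on small inputs the difference is unlikely to be noticeable.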
If you want to ignore a block of consecutive lines, awk has a convenient facility for that: add /^IRRELEVENT DATA/,/^END/ {next} at the top of the script to ignore every line from one that starts with IRRELEVENT DATA (sic) through the next line that starts with END, inclusive. You can't do that with grep; you can do it with sed (sed '/^IRRELEVENT DATA/,/^END/d' | awk …), but that is less likely to be a performance gain than grep.
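To see the range form in action, here is a sketch that runs both the awk and the sed variant on the f.txt sample data from this thread (held in a shell variable here so the snippet is self-contained; the variable names are illustrative):

```shell
# Sample data matching the f.txt example in this thread.
data='GOOD STUFF
----------------
IRRELEVENT DATA
----------------
IGNORE ALL THESE
----------------
END OF IT
----------------
GOOD STUFF'

# awk: skip everything from the start pattern through the end pattern.
with_awk=$(printf '%s\n' "$data" | awk '/^IRRELEVENT DATA/,/^END/ {next} {print}')

# sed: delete the same range before any further processing.
with_sed=$(printf '%s\n' "$data" | sed '/^IRRELEVENT DATA/,/^END/d')

printf '%s\n' "$with_awk"
```

On this input both variants keep the first two lines and the last two lines, and drop the five lines from IRRELEVENT DATA through END OF IT.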
You can also do this without next, by negating the pattern instead.
Input:
$ cat f.txt
GOOD STUFF
----------------
IRRELEVENT DATA
----------------
IGNORE ALL THESE
----------------
END OF IT
----------------
GOOD STUFF
I want to ignore lines starting with the string IRRELEVENT, IGNORE, or END (the ^ anchors the match to the start of the line):
$ awk '!/^(IRRELEVENT|IGNORE|END)/ {print}' f.txt
GOOD STUFF
----------------
----------------
----------------
----------------
GOOD STUFF