Remove lines based on pattern but keeping first n lines that match

If you want to delete all lines starting with % but preserve the first two lines of input, you could do:

sed -e 1,2b -e '/^%/d'

Though the same would be more legible with awk:

awk 'NR <= 2 || !/^%/'

Or, if you're after performance:

{ head -n 2; grep -v '^%'; } < input-file
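As a quick sanity check, the sed and awk variants above can be run against a small sample. This is only a sketch: the sample lines and the names `sample`, `sed_out` and `awk_out` are made up for illustration.

```shell
# Hypothetical sample input: two header lines, then a mix of % and other lines.
sample=$(mktemp)
printf '%s\n' '% header 1' '% header 2' 'data 1' '% comment' 'data 2' > "$sample"

# Both commands keep input lines 1 and 2 unconditionally and drop later % lines.
sed_out=$(sed -e 1,2b -e '/^%/d' "$sample")
awk_out=$(awk 'NR <= 2 || !/^%/' "$sample")

printf '%s\n' "$sed_out"
rm -f "$sample"
```

With this input, both commands should print the two header lines followed by `data 1` and `data 2`; the `% comment` line is dropped because it comes after line 2.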

If you want to preserve the first two lines that match the pattern, even when they are not the first lines of the input, awk would certainly be a better option:

awk '!/^%/ || ++n <= 2'
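To keep the first n matches for an arbitrary n, the same awk expression can take the limit as a variable. A minimal sketch, assuming a wrapper function named `keep_first_n` and an awk variable named `max` (both names are illustrative, not part of the original answer):

```shell
# keep_first_n N: read stdin, print every non-% line plus only the first N % lines.
keep_first_n() {
    # n is incremented only when the line starts with %, so it counts matches.
    awk -v max="$1" '!/^%/ || ++n <= max'
}

printf '%s\n' 'a' '% one' 'b' '% two' '% three' 'c' | keep_first_n 2
```

Here `% three` is the third match, so it should be the only line removed.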

With sed, you could use tricks like:

sed -e '/^%/!b' -e 'x;/xx/{h;d;}' -e 's/^/x/;x'

That is, use the hold space to count the matches seen so far: one x is added for each matching line that is kept, and once the hold space contains xx, further matching lines are deleted. Not terribly efficient or legible.
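The same sed script, spread over several lines with comments, may be easier to follow. This is the one-liner above rewritten as a sketch; the function name `keep_first_two_sed` is made up for illustration.

```shell
# Read stdin; keep all non-% lines and only the first two lines starting with %.
keep_first_two_sed() {
    sed '
        # Lines that do not start with % skip the rest and are printed as-is.
        /^%/!b
        # Swap: the pattern space now holds the counter, the hold space the line.
        x
        # Two matches already kept: store the counter again and drop the line.
        /xx/{
            h
            d
        }
        # Otherwise add one x to the counter and swap the line back to print it.
        s/^/x/
        x
    '
}

printf '%s\n' '%a' 'b' '%c' '%d' 'e' | keep_first_two_sed
```

On this input the third match, `%d`, should be the only line removed.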

I'm afraid sed alone is a bit too simple for this (not impossible, just rather complicated; see e.g. sed sokoban for what can be done).

How about awk?

#!/bin/awk -f
# Print every line that does not start with %, plus the first 3 lines that do.
BEGIN { c = 0; }
{
    if (/^%/) {
        if (c++ < 3) {
            print;
        }
    } else {
        print;
    }
}

If you can rely on a recent enough Bash (one that supports regular-expression matching with the =~ operator), the awk script above can be translated to:

#!/bin/bash -
c=0
while IFS= read -r line; do
    if [[ $line =~ ^% ]]; then
        if ((c++ < 3)); then
            printf '%s\n' "$line"
        fi
    else
        printf '%s\n' "$line"
    fi
done

You can also use sed or grep to do the pattern matching instead of the =~ operator.
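For instance, a variant of the loop that delegates the match to grep, so it does not need =~, might look like the sketch below. The function name `keep_first_matches` and its limit argument are assumptions added for illustration; note that starting one grep per input line is slow on large inputs.

```shell
# keep_first_matches N: read stdin, print non-% lines and the first N % lines.
keep_first_matches() {
    limit=$1
    count=0
    while IFS= read -r line; do
        # grep -q prints nothing; only its exit status is used for the test.
        if printf '%s\n' "$line" | grep -q '^%'; then
            count=$((count + 1))
            if [ "$count" -le "$limit" ]; then
                printf '%s\n' "$line"
            fi
        else
            printf '%s\n' "$line"
        fi
    done
}

printf '%s\n' '% one' 'a' '% two' '% three' '% four' 'b' | keep_first_matches 3
```

With a limit of 3, only `% four` should be dropped from this input.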

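A Perl one-liner solution (like the first sed command, it keeps the first two lines of input and deletes later lines starting with %):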

# in-place editing
perl -i -pe '$.>2 && s/^%.*//s' filename.txt

# print to the standard output
perl -ne '$.>2 && /^%/ || print' filename.txt
