Sed to print only first pattern match of the line

The .* in the regex pattern is greedy: it matches as long a string as it can, so the quotes that end up being matched are the last ones on the line.
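To make the greedy behaviour concrete, here is a minimal sketch that runs the pattern s/.*"\(.*\)".*/\1/ against the sample line used in the examples on this page. The variable names line and greedy are only for illustration.

```shell
# The leading .* takes as much of the line as it can while still leaving
# a pair of quotes for the rest of the pattern, so the capture group
# ends up holding the LAST quoted string.
line='... "foo" ... "bar" ...'
greedy=$(printf '%s\n' "$line" | sed 's/.*"\(.*\)".*/\1/')
printf '%s\n' "$greedy"    # prints: bar
```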

Since the separator is only one character here, we can use a negated bracket expression to match anything but a quote, i.e. [^"], and repeat it as [^"]* to match any number of characters that aren't quotes.

$ echo '... "foo" ... "bar" ...' | sed 's/[^"]*"\([^"]*\)".*/\1/'
foo

Another way would be to just remove everything up to the first quote, then remove everything starting from the (new) first quote:

$ echo '... "foo" ... "bar" ...' | sed 's/^[^"]*"//; s/".*$//'
foo
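Both commands above pass a line through unchanged when it contains no quotes at all. If such lines should be dropped instead, one possible variant (a sketch, not part of the original answer) combines sed -n with the p flag of the s command, so that only lines where the substitution succeeded are printed. The function name first_quoted is hypothetical.

```shell
# first_quoted is a hypothetical helper name used only for this sketch.
# Reads lines on stdin and prints the first double-quoted string of each
# line that contains a complete pair of quotes; other lines print nothing.
first_quoted() {
    sed -n 's/[^"]*"\([^"]*\)".*/\1/p'
}

printf '%s\n' '... "foo" ... "bar" ...' 'no quotes here' | first_quoted
# prints: foo
```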

In Perl regexes, the * and + quantifiers can be made non-greedy by appending a question mark, so .*? would match anything, but as few characters/bytes as possible.

I won't bore you with the classic warning against using simple regular expressions to parse HTML. Suffice it to say that you should use a dedicated parser instead. That said, the issue here is that sed uses greedy matching, so each .* always matches the longest possible string. This means that your leading .* swallows as much of the line as it can and leaves only the last quoted string for the capture group.

You could do this in sed (see below), but using a tool that allows non-greedy matches would be simpler:

$ perl -pe 's/.*?"(.*?)".*/$1/' file
data1

Since sed doesn't support non-greedy matches, you need some other trickery. The simplest would be to use the "not quotes" approach in ikkachu's answer. Here's an alternative:

$ rev file | sed 's/.*"\(.*\)".*/\1/' | rev
data1

This just reverses each line of the file character by character (rev), uses your original approach, which now works since the first occurrence is now the last, and then reverses each line back again.
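The three stages of that pipeline can be inspected one at a time. This is a small sketch on the sample line '... "foo" ... "bar" ...' rather than on the question's file, and it assumes rev (from util-linux) is installed; the variable names are illustrative only.

```shell
line='... "foo" ... "bar" ...'

# Stage 1: reverse the characters of the line.
reversed=$(printf '%s\n' "$line" | rev)

# Stage 2: the greedy pattern picks the last quoted string of the
# reversed line, which is the first one of the original line.
picked=$(printf '%s\n' "$reversed" | sed 's/.*"\(.*\)".*/\1/')

# Stage 3: reverse again to restore the original character order.
result=$(printf '%s\n' "$picked" | rev)

printf '%s\n' "$reversed" "$picked" "$result"
# prints:
# ... "rab" ... "oof" ...
# oof
# foo
```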

Here are a couple of ways you could pull out data1 from your input:

grep -oP '^[^"]*"\K[^"]*'

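sed -ne '
   # No newline yet: turn every quote into a newline, then D deletes the
   # text up to the first newline and restarts without reading new input.
   /\n/!{y/"/\n/;D;}
   # The pattern space now starts with the first quoted string;
   # P prints it, up to the next newline.
   P
'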

perl -lne '/"([^"]*)"/ and print($1),last'
