What do 'lazy' and 'greedy' mean in the context of regular expressions?
Greedy will consume as much as possible. From http://www.regular-expressions.info/repeat.html we see the example of trying to match HTML tags with <.+>
. Suppose you have the following:
<em>Hello World</em>
You may think that <.+>
(.
means any non newline character and +
means one or more) would only match the <em>
and the </em>
, when in reality it will be very greedy, and go from the first <
to the last >
. This means it will match <em>Hello World</em>
instead of what you wanted.
Making it lazy (<.+?>
) will prevent this. By adding the ?
after the +
, we tell it to repeat as few times as possible, so the first >
it comes across, is where we want to stop the matching.
I'd encourage you to download RegExr, a great tool that will help you explore Regular Expressions - I use it all the time.
'Greedy' means match longest possible string.
'Lazy' means match shortest possible string.
For example, the greedy h.+l
matches 'hell'
in 'hello'
but the lazy h.+?l
matches 'hel'
.
Greedy quantifier | Lazy quantifier | Description |
---|---|---|
* |
*? |
Star Quantifier: 0 or more |
+ |
+? |
Plus Quantifier: 1 or more |
? |
?? |
Optional Quantifier: 0 or 1 |
{n} |
{n}? |
Quantifier: exactly n |
{n,} |
{n,}? |
Quantifier: n or more |
{n,m} |
{n,m}? |
Quantifier: between n and m |
Add a ? to a quantifier to make it ungreedy i.e lazy.
Example:
test string : stackoverflow
greedy reg expression : s.*o
output: stackoverflow
lazy reg expression : s.*?o
output: stackoverflow