Matching an optional substring in a regex

(\d+)\s+(\(.*?\))?\s?Z

Note the escaped parentheses, and the ? (zero or once) quantifiers. Any of the groups you don't want to capture can be (?: non-capture groups).

I agree about the spaces. \s is a better option there. I also changed the quantifier to insure there are digits at the beginning. As far as newlines, that would depend on context: if the file is parsed line by line it won't be a problem. Another option is to anchor the start and end of the line (add a ^ at the front and a $ at the end).


You can do this:

([0-9]+) (\([^)]+\))? Z

This will not work with nested parens for Y, however. Nesting requires recursion which isn't strictly regular any more (but context-free). Modern regexp engines can still handle it, albeit with some difficulties (back-references).


This ought to work:

^\d+\s?(\([^\)]+\)\s?)?Z$

Haven't tested it though, but let me give you the breakdown, so if there are any bugs left they should be pretty straightforward to find:

First the beginning:

^ = beginning of string
\d+ = one or more decimal characters
\s? = one optional whitespace

Then this part:

(\([^\)]+\)\s?)?

Is actually:

(.............)?

Which makes the following contents optional, only if it exists fully

\([^\)]+\)\s?

\( = an opening bracket
[^\)]+ = a series of at least one character that is not a closing bracket
\) = followed by a closing bracket
\s? = followed by one optional whitespace

And the end is made up of

Z$

Where

Z = your constant string
$ = the end of the string

Tags:

Regex