How to remove all text between the outer parentheses in a string?
NOTE: `\(.*\)` matches the first `(` from the left, then matches any 0+ characters (other than a newline if a DOTALL modifier is not enabled) up to the last `)`, and does not account for properly nested parentheses.
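Here is a quick sketch of that failure mode on made-up sample strings. The greedy `.*` also swallows the text between two separate groups. A lazy `.*?` (not mentioned above, added only for comparison) stops at the first `)`, which is the wrong one once groups are nested.

```python
import re

# Greedy .* runs from the first "(" to the last ")", so " c " between the groups is lost
greedy = re.sub(r'\(.*\)', '', 'a (b) c (d) e')
# Lazy .*? stops at the first ")", which closes the inner group, not the outer one
lazy = re.sub(r'\(.*?\)', '', 'x (y (z) w) v')

print(repr(greedy))  # 'a  e'
print(repr(lazy))    # 'x  w) v'
```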
To remove nested parentheses correctly with a regular expression in Python, you may use a simple `\([^()]*\)` (matching a `(`, then 0+ chars other than `(` and `)`, and then a `)`) in a `while` loop using `re.subn`:
```python
import re

def remove_text_between_parens(text):
    n = 1  # run at least once
    while n:
        # remove non-nested/flat balanced parts; n is the number of replacements made
        text, n = re.subn(r'\([^()]*\)', '', text)
    return text
```
Basically: remove every `(...)` that has no `(` or `)` inside, and repeat until no match is found. Usage:
```python
print(remove_text_between_parens('stuff (inside (nested) brackets) (and (some(are)) here) here'))
# => stuff   here  (the spaces that surrounded the removed groups remain)
```
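To see what the loop does on each pass, the sketch below uses a hypothetical variant, `remove_parens_steps`, that runs the same `re.subn` loop but records the string after every pass. Each pass strips only the innermost groups, so the number of passes that change the string equals the deepest nesting level (3 for this sample).

```python
import re

def remove_parens_steps(text):
    """Hypothetical helper: same loop as remove_text_between_parens, but records every pass."""
    steps = [text]
    while True:
        text, n = re.subn(r'\([^()]*\)', '', text)
        if n == 0:  # no flat (...) group left, so the string is final
            break
        steps.append(text)
    return steps

steps = remove_parens_steps('stuff (inside (nested) brackets) (and (some(are)) here) here')
for step in steps:
    print(repr(step))
# 'stuff (inside (nested) brackets) (and (some(are)) here) here'
# 'stuff (inside  brackets) (and (some) here) here'
# 'stuff  (and  here) here'
# 'stuff   here'
```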
A non-regex way is also possible:
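```python
def removeNestedParentheses(s):
    ret = ''
    skip = 0  # current nesting depth
    for i in s:
        if i == '(':
            skip += 1  # entering a (possibly nested) group
        elif i == ')' and skip > 0:
            skip -= 1  # leaving a group
        elif skip == 0:
            ret += i  # keep characters outside all parentheses (including an unmatched ")")
    return ret

x = removeNestedParentheses('stuff (inside (nested) brackets) (and (some(are)) here) here')
print(x)
# => stuff   here
```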
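The two approaches only agree on balanced input. The sketch below copies both functions from above so it runs on its own; the sample string `'a (b c'` is made up to show an unmatched `(`. The regex loop never finds a complete flat group and leaves the text unchanged, while the counter never returns to depth 0 and drops everything after the `(`.

```python
import re

def remove_text_between_parens(text):
    n = 1  # run at least once
    while n:
        text, n = re.subn(r'\([^()]*\)', '', text)
    return text

def removeNestedParentheses(s):
    ret = ''
    skip = 0  # current nesting depth
    for i in s:
        if i == '(':
            skip += 1
        elif i == ')' and skip > 0:
            skip -= 1
        elif skip == 0:
            ret += i
    return ret

sample = 'a (b c'  # unbalanced: "(" is never closed
print(repr(remove_text_between_parens(sample)))  # 'a (b c'
print(repr(removeNestedParentheses(sample)))     # 'a '
```

Which behavior is preferable depends on whether unbalanced input should be left alone or treated as "everything after `(` is inside the parentheses".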
See another Python demo
As mentioned before, you'd need a recursive regex to match arbitrary levels of nesting (Python's built-in `re` module does not support recursion; the third-party `regex` module does), but if you know there can be at most one level of nesting, try this pattern:
\((?:[^)(]|\([^)(]*\))*\)
- `[^)(]` matches a character that is not a parenthesis (negated class).
- `|\([^)(]*\)` or it matches another `(`...`)` pair with any number of non-`)(` characters inside.
- `(?:`...`)*` repeats all of this any number of times inside the outer `(`...`)`.
Here is a demo at regex101
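Before the alternation, `[^)(]` is used without a `+` quantifier so that the match fails faster if the parentheses are unbalanced (a `+` inside the repeated group would nest quantifiers and cause much more backtracking).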
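Used from Python, the one-level pattern can be passed straight to `re.sub`. The sample strings below are illustrative. The second one nests two levels deep, which is more than this pattern handles, so only the inner `(c (d))` group is removed and the outer parentheses survive.

```python
import re

# One level of nesting at most: the pattern from above
ONE_LEVEL = r'\((?:[^)(]|\([^)(]*\))*\)'

ok = re.sub(ONE_LEVEL, '', 'stuff (inside (nested) brackets) here')
too_deep = re.sub(ONE_LEVEL, '', 'a (b (c (d)) e) f')

print(repr(ok))        # 'stuff  here'
print(repr(too_deep))  # 'a (b  e) f'
```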
For each additional level of nesting that might occur, you need to add another level to the pattern. E.g. for at most 2 levels:
\((?:[^)(]|\((?:[^)(]|\([^)(]*\))*\))*\)
Another demo at regex101
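Since each extra level just wraps the previous pattern in `\((?:[^)(]|` ... `)*\)`, the pattern can also be built in a loop. This is a sketch with a hypothetical helper name, `nested_parens_pattern`; depth 1 and depth 2 reproduce the two patterns above exactly. Keep in mind that the pattern grows with every level, so for deep or unknown nesting the `re.subn` loop or the counter approach is simpler.

```python
import re

def nested_parens_pattern(max_depth):
    """Hypothetical helper: regex for (...) groups nested at most max_depth levels deep."""
    pattern = r'\([^)(]*\)'  # depth 0: a flat group with no parentheses inside
    for _ in range(max_depth):
        # wrap: "(" then any mix of non-paren chars or a shallower group, then ")"
        pattern = r'\((?:[^)(]|' + pattern + r')*\)'
    return pattern

print(nested_parens_pattern(2))
print(repr(re.sub(nested_parens_pattern(2), '', 'a (b (c (d)) e) f')))  # 'a  f'
```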