Remove text between () and []
Here's a solution similar to @pradyunsg's answer (it works with arbitrary nested brackets):
def remove_text_inside_brackets(text, brackets="()[]"):
count = [0] * (len(brackets) // 2) # count open/close brackets
saved_chars = []
for character in text:
for i, b in enumerate(brackets):
if character == b: # found bracket
kind, is_close = divmod(i, 2)
count[kind] += (-1)**is_close # `+1`: open, `-1`: close
if count[kind] < 0: # unbalanced bracket
count[kind] = 0 # keep it
else: # found bracket to remove
break
else: # character is not a [balanced] bracket
if not any(count): # outside brackets
saved_chars.append(character)
return ''.join(saved_chars)
print(repr(remove_text_inside_brackets(
"This is a sentence. (once a day) [twice a day]")))
# -> 'This is a sentence. '
You can use re.sub function.
>>> import re
>>> x = "This is a sentence. (once a day) [twice a day]"
>>> re.sub("([\(\[]).*?([\)\]])", "\g<1>\g<2>", x)
'This is a sentence. () []'
If you want to remove the [] and the () you can use this code:
>>> import re
>>> x = "This is a sentence. (once a day) [twice a day]"
>>> re.sub("[\(\[].*?[\)\]]", "", x)
'This is a sentence. '
Important: This code will not work with nested symbols
Explanation
The first regex groups (
or [
into group 1 (by surrounding it with parentheses) and )
or ]
into group 2, matching these groups and all characters that come in between them. After matching, the matched portion is substituted with groups 1 and 2, leaving the final string with nothing inside the brackets. The second regex is self explanatory from this -> match everything and substitute with the empty string.
-- modified from comment by Ajay Thomas
Run this script, it works even with nested brackets.
Uses basic logical tests.
def a(test_str):
ret = ''
skip1c = 0
skip2c = 0
for i in test_str:
if i == '[':
skip1c += 1
elif i == '(':
skip2c += 1
elif i == ']' and skip1c > 0:
skip1c -= 1
elif i == ')'and skip2c > 0:
skip2c -= 1
elif skip1c == 0 and skip2c == 0:
ret += i
return ret
x = "ewq[a [(b] ([c))]] This is a sentence. (once a day) [twice a day]"
x = a(x)
print x
print repr(x)
Just incase you don't run it,
Here's the output:
>>>
ewq This is a sentence.
'ewq This is a sentence. '
This should work for parentheses. Regular expressions will "consume" the text it has matched so it won't work for nested parentheses.
import re
regex = re.compile(".*?\((.*?)\)")
result = re.findall(regex, mystring)
or this would find one set of parentheses, simply loop to find more:
start = mystring.find("(")
end = mystring.find(")")
if start != -1 and end != -1:
result = mystring[start+1:end]