Python regex match text between quotes
Use re.search() instead of re.match(). The latter matches only at the beginning of the string (like an implicit ^).
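To make the difference concrete, here is a minimal sketch. The sample text and the simplified pattern "([^"]*)" are assumptions chosen for illustration, not taken from the question:

```python
import re

# Illustrative sample text and pattern (assumptions, not from the question).
text = 'Hello, "find.me" please'
pattern = r'"([^"]*)"'

# match() only tries at position 0, where the text starts with 'H', so it fails.
print(re.match(pattern, text))            # None

# search() scans forward until the pattern matches somewhere in the string.
print(re.search(pattern, text).group(1))  # find.me
```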
Split the text on quotes and take every other element starting with the second element:
def text_between_quotes(text):
    return text.split('"')[1::2]
my_string = 'Hello, "find.me-_/\\" please help and "this quote" here'
my_string.split('"')[1::2] # ['find.me-_/\\', 'this quote']
'"just one quote"'.split('"')[1::2] # ['just one quote']
This assumes you don't have quotes within quotes, and that your text doesn't mix quote styles or use other quoting characters like `.
You should validate your input. For example, what do you want to do if there's an odd number of quotes, meaning the quotes aren't all balanced? You could discard the last item when the split produces an even number of pieces:
def text_between_quotes(text):
    split_text = text.split('"')
    between_quotes = split_text[1::2]
    # discard the last element if the quotes are unbalanced
    if len(split_text) % 2 == 0 and between_quotes and not text.endswith('"'):
        between_quotes.pop()
    return between_quotes
# ['first quote', 'second quote']
text_between_quotes('"first quote" and "second quote" and "unclosed quote')
Alternatively, you could raise an error instead of discarding the unclosed quote.
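A rough sketch of the error-raising variant might look like this. The function name text_between_quotes_strict and the choice of ValueError are assumptions made for illustration:

```python
def text_between_quotes_strict(text):
    # Hypothetical strict variant: reject unbalanced input instead of trimming it.
    split_text = text.split('"')
    # n quotes produce n + 1 pieces, so an even piece count means an odd
    # (unbalanced) number of quotes.
    if len(split_text) % 2 == 0:
        raise ValueError('unbalanced quotes in: %r' % text)
    return split_text[1::2]
```

With balanced input it behaves like the simple split version; with an unclosed quote it fails loudly, which is usually easier to debug than a silently dropped item.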
match only matches at the beginning of the text. Use search instead:
#!/usr/bin/env python
import re
text = 'Hello, "find.me-_/\\" please help with python regex'
pattern = r'"([A-Za-z0-9_\./\\-]*)"'
m = re.search(pattern, text)
print(m.group())  # the whole match, quotes included; m.group(1) is the text between them
match and search return None when they fail to match.
I guess you are getting AttributeError: 'NoneType' object has no attribute 'group' from Python. This is because you are assuming there will be a match without checking the return value of re.match.
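One way to guard against that is to test the result before calling group. This is only a sketch: the helper name first_quoted is hypothetical, and the pattern is the character class used elsewhere in this thread:

```python
import re

def first_quoted(text):
    # Hypothetical helper: return the first quoted token, or None if there is none.
    m = re.search(r'"([A-Za-z0-9_./\\-]*)"', text)
    if m is None:
        # No match: calling m.group() here would raise AttributeError.
        return None
    return m.group(1)
```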
If you write:
m = re.search(pattern, text)
match: matches only at the beginning of the text
search: scans the whole string for a match
Maybe this helps you to understand: http://docs.python.org/library/re.html#matching-vs-searching