Python regex match text between quotes

Use re.search() instead of re.match(). The latter will match only at the beginning of strings (like an implicit ^).
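A minimal sketch of the difference. The sample string and the names `text`, `pattern`, `at_start` and `anywhere` are made up for illustration and are not from the question:

```python
import re

# Illustrative input: the quoted part is not at the start of the string.
text = 'Hello, "quoted" world'
pattern = r'"([^"]*)"'

at_start = re.match(pattern, text)    # None: the string does not begin with a quote
anywhere = re.search(pattern, text)   # finds the first quoted segment anywhere

print(at_start)            # None
print(anywhere.group(1))   # quoted
```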


Split the text on quotes and take every other element starting with the second element:

def text_between_quotes(text):
    return text.split('"')[1::2]

my_string = 'Hello, "find.me-_/\\" please help and "this quote" here'
my_string.split('"')[1::2]           # ['find.me-_/\\', 'this quote']
'"just one quote"'.split('"')[1::2]  # ['just one quote']

This assumes you don't have quotes within quotes, and your text doesn't mix quotes or use other quoting characters like `.
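Since the question asks about regex, roughly the same result can be obtained with re.findall. This is only a sketch under the same assumptions (double quotes only, no nesting or escaping), and the function name text_between_quotes_re is made up for illustration:

```python
import re

def text_between_quotes_re(text):
    # [^"]* matches any run of non-quote characters, so each match
    # ends at the next double quote.
    return re.findall(r'"([^"]*)"', text)

my_string = 'Hello, "find.me-_/\\" please help and "this quote" here'
print(text_between_quotes_re(my_string))  # ['find.me-_/\\', 'this quote']
```

One difference from the split version: the pattern requires a closing quote, so a trailing unmatched quote is simply ignored. For example, '"a" and "b' gives ['a'].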

You should validate your input. For example, what should happen if there is an odd number of quotes, meaning the quotes are not balanced? You could discard the last item when the split produces an even number of pieces:

def text_between_quotes(text):
    split_text = text.split('"')
    between_quotes = split_text[1::2]
    # An even number of pieces means an odd number of quotes, so the last
    # element follows an unmatched quote: discard it.
    if len(split_text) % 2 == 0:
        between_quotes.pop()
    return between_quotes

# ['first quote', 'second quote']
text_between_quotes('"first quote" and "second quote" and "unclosed quote')

or raise an error instead.
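A sketch of the error-raising variant. The name text_between_quotes_strict and the choice of ValueError are assumptions made for illustration:

```python
def text_between_quotes_strict(text):
    # An odd number of double quotes means at least one quote is unmatched.
    if text.count('"') % 2 != 0:
        raise ValueError('unbalanced quotes in input: %r' % text)
    return text.split('"')[1::2]

print(text_between_quotes_strict('"first quote" and "second quote"'))
# ['first quote', 'second quote']

try:
    text_between_quotes_strict('"first quote" and "unclosed quote')
except ValueError as exc:
    print('rejected:', exc)
```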


re.match only looks for a match at the beginning of the text.

Use re.search instead:

#!/usr/bin/env python

import re

text = 'Hello, "find.me-_/\\" please help with python regex'
pattern = r'"([A-Za-z0-9_\./\\-]*)"'
m = re.search(pattern, text)

print(m.group())   # whole match, including the quotes
print(m.group(1))  # only the text between the quotes

Both re.match and re.search return None when they fail to match.

I guess you are getting AttributeError: 'NoneType' object has no attribute 'group' from Python. This happens because you call .group() without first checking the return value of re.match.
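A small sketch of checking the result before calling .group(). It reuses the text and pattern from the example above; the variable names m_match and m_search are illustrative:

```python
import re

text = 'Hello, "find.me-_/\\" please help with python regex'
pattern = r'"([A-Za-z0-9_\./\\-]*)"'

m_match = re.match(pattern, text)    # None: the text does not start with a quote
if m_match is None:
    print('re.match found nothing at the start of the text')
else:
    print(m_match.group(1))

m_search = re.search(pattern, text)  # scans the whole text
if m_search is not None:
    print(m_search.group(1))         # the text between the first pair of quotes
```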


If you write the following instead, the pattern is found:

m = re.search(pattern, text)

match: only matches at the beginning of the text

search: scans the whole string for the first match

Maybe this helps you to understand: http://docs.python.org/library/re.html#matching-vs-searching

Tags: Python, Regex