Python: How to use RegEx in an if statement?

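import re
# regex is your pattern, content is the string you want to test
if re.match(regex, content):
    ...  # runs only when regex matches at the start of content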

You could also use re.search depending on how you want it to match.
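As a rough illustration of the difference (the sample string and pattern here are made up for this sketch), re.match only succeeds when the pattern matches at the very start of the string, while re.search scans the whole string:

```python
import re

text = "say Proof. now"      # hypothetical sample input
pattern = r"Proof\."         # literal "Proof." (the dot is escaped)

anchored = re.match(pattern, text)    # must match starting at index 0
anywhere = re.search(pattern, text)   # may match at any position

if anchored:
    print("match: found at the start")
else:
    print("match: nothing at the start")

if anywhere:
    print(f"search: found at index {anywhere.start()}")
```

Which one you want depends on whether a hit in the middle of the string should count.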

You can run this example:

"""
A very nice interface for trying out regexes: https://regex101.com/
"""
# %%
"""Simple if statement with a regex"""
import re

regex = r"\s*Proof\.\s*"  # \. matches a literal dot; a bare . would match any character
contents = ['Proof.\n', '\nProof.\n']
for content in contents:
    assert re.match(regex, content), f'Failed on {content=} with {regex=}'
    if re.match(regex, content):
        print(content)

if re.search(r'pattern', string):
    ...  # runs only when the pattern is found somewhere in string

Simple if-regex example:

if re.search(r'ing\b', "seeking a great perhaps"):     # any words end with ing?
    print("yes")

Complex if-regex example (pattern check, extract a substring, case insensitive):

match_object = re.search(r'^OUGHT (.*) BE$', "ought to be", flags=re.IGNORECASE)
if match_object:
    assert "to" == match_object.group(1)     # what's between ought and be?

Notes:

  • Use re.search(), not re.match(). The match method only matches at the start of the string, a confusing convention. If you want that, anchor explicitly with a caret: re.search(r'^...', ...) (or, in re.MULTILINE mode, use \A, because ^ then also matches at the start of every line).

  • Use raw string syntax r'pattern' for the first parameter. Otherwise you would need to double up backslashes, as in re.search('ing\\b', ...)

  • In these examples, '\\b' or r'\b' is a special sequence meaning word-boundary for regex purposes. Not to be confused with '\b' or '\x08' backspace.

  • re.search() returns None if it doesn't find anything, which is always falsy.

  • re.search() returns a Match object if it finds anything, which is always truthy.

  • A group is the text matched by a parenthesized part of the pattern.

  • Group numbering starts at 1; group(0) is the entire match.

  • Specs

  • Tutorial
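
The notes above can be checked directly in a short script. This is only a sketch: the second sample sentence and the two-group pattern are made up for illustration.

```python
import re

# In a normal string '\b' is a single backspace character;
# in a raw string r'\b' is a backslash followed by the letter b.
assert '\b' == '\x08'
assert r'\b' == '\\b'
assert len('\b') == 1 and len(r'\b') == 2

# A Match object is truthy; None (no match) is falsy.
found = re.search(r'ing\b', "seeking a great perhaps")
missing = re.search(r'ing\b', "sought a great perhaps")
assert found is not None and bool(found) is True
assert missing is None

# group(0) is the whole match; numbered groups start at 1.
m = re.search(r'^(\w+) (\w+)', "ought to be")
groups = (m.group(0), m.group(1), m.group(2))
print(groups)
```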


The REPL makes it easy to learn APIs. Just run python, create an object and then ask for help:

$ python
>>> import re
>>> help(re.compile(r''))

at the command line shows, among other things:

search(...)

search(string[, pos[, endpos]]) --> match object or None. Scan through string looking for a match, and return a corresponding MatchObject instance. Return None if no position in the string matches.

so you can do

regex = re.compile(regex_txt, re.IGNORECASE)

match = regex.search(content)  # content comes from your file-reading code
if match is not None:
    print(match.group(0))  # use match, e.g. print the matched text
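
For a version that runs on its own, here is a sketch with stand-in values. The regex_txt and content values are hypothetical placeholders, not taken from the question; the second call uses the optional pos argument that appears in the help text quoted above.

```python
import re

regex_txt = r"facebook"               # hypothetical pattern
content = "Facebook and facebook"     # hypothetical stand-in for file contents

regex = re.compile(regex_txt, re.IGNORECASE)

first = regex.search(content)         # scans from index 0
later = regex.search(content, 1)      # pos=1: start scanning at index 1

if first is not None:
    print(first.group(0), first.span())
if later is not None:
    print(later.group(0), later.span())
```

Compiling once is mainly useful when the same pattern is applied to many strings.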

Incidentally,

regex_txt = "facebook.com"

contains an unescaped ., which matches any character, so re.compile("facebook.com").search("facebookkcom") is not None evaluates to True. You probably want

regex_txt = r"(?i)facebook\.com"

The \. matches a literal "." character instead of treating . as a special regular expression operator.

The r"..." prefix makes it a raw string, so the backslash in \. reaches the regular expression compiler unchanged instead of being interpreted by the Python parser.

The (?i) makes the regex case-insensitive like re.IGNORECASE but self-contained.
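
A quick way to see the effect of the unescaped dot is to run both patterns over a few inputs. The sample strings are made up for this sketch, and the re.escape line is an optional alternative that is not part of the answer above.

```python
import re

loose = re.compile("facebook.com")           # '.' matches any character
strict = re.compile(r"(?i)facebook\.com")    # '\.' matches only a literal dot

samples = ["facebook.com", "facebookkcom", "FACEBOOK.COM"]
results = {s: (bool(loose.search(s)), bool(strict.search(s))) for s in samples}

for s, (loose_hit, strict_hit) in results.items():
    print(f"{s!r}: loose={loose_hit} strict={strict_hit}")

# re.escape builds the escaped pattern for you from plain text.
escaped = re.compile(re.escape("facebook.com"), re.IGNORECASE)
print(bool(escaped.search("facebookkcom")))
```

The loose pattern accepts "facebookkcom" and, being case-sensitive, rejects the upper-case input; the strict pattern does the opposite on both.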
