Python's re module - saving state?

You might like this module which implements the wrapper you are looking for.


Trying out some ideas...

It looks like you would ideally want an expression with side effects. If this were allowed in Python:

if m = re.match('foo (\w+) bar (\d+)', line):
  # do stuff with m.group(1) and m.group(2)
elif m = re.match('baz whoo_(\d+)', line):
  # do stuff with m.group(1)
elif ...

... then you would clearly and cleanly be expressing your intent. But it's not. If side effects were allowed in nested functions, you could:

m = None
def assign_m(x):
  m = x
  return x

if assign_m(re.match('foo (\w+) bar (\d+)', line)):
  # do stuff with m.group(1) and m.group(2)
elif assign_m(re.match('baz whoo_(\d+)', line)):
  # do stuff with m.group(1)
elif ...

Now, not only is that getting ugly, but it's still not valid Python code -- the nested function 'assign_m' isn't allowed to modify the variable m in the outer scope. The best I can come up with is really ugly, using nested class which is allowed side effects:

# per Brian's suggestion, a wrapper that is stateful
class m_(object):
  def match(self, *args):
    self.inner_ = re.match(*args)
    return self.inner_
  def group(self, *args):
    return self.inner_.group(*args)
m = m_()

# now 'm' is a stateful regex
if m.match('foo (\w+) bar (\d+)', line):
  # do stuff with m.group(1) and m.group(2)
elif m.match('baz whoo_(\d+)', line):
  # do stuff with m.group(1)
elif ...

But that is clearly overkill.

You migth consider using an inner function to allow local scope exits, which allows you to remove the else nesting:

def find_the_right_match():
  # now 'm' is a stateful regex
  m = re.match('foo (\w+) bar (\d+)', line)
  if m:
    # do stuff with m.group(1) and m.group(2)
    return # <== exit nested function only
  m = re.match('baz whoo_(\d+)', line)
  if m:
    # do stuff with m.group(1)
    return

find_the_right_match()

This lets you flatten nesting=(2*N-1) to nesting=1, but you may have just moved the side-effects problem around, and the nested functions are very likely to confuse most Python programmers.

Lastly, there are side-effect-free ways of dealing with this:

def cond_with(*phrases):
  """for each 2-tuple, invokes first item.  the first pair where
  the first item returns logical true, result is passed to second
  function in pair.  Like an if-elif-elif.. chain"""
  for (cond_lambda, then_lambda) in phrases:
    c = cond_lambda()
    if c:
      return then_lambda(c) 
  return None


cond_with( 
  ((lambda: re.match('foo (\w+) bar (\d+)', line)), 
      (lambda m: 
          ... # do stuff with m.group(1) and m.group(2)
          )),
  ((lambda: re.match('baz whoo_(\d+)', line)),
      (lambda m:
          ... # do stuff with m.group(1)
          )),
  ...)

And now the code barely even looks like Python, let alone understandable to Python programmers (is that Lisp?).

I think the moral of this story is that Python is not optimized for this sort of idiom. You really need to just be a little verbose and live with a large nesting factor of else conditions.

Tags:

Python

Regex