Regex Problem Group Name Redefinition?

The following answer deals with how to make the above regex work in Python3.

The re2 module suggested by Max does not work in Python 3, because it raises NameError: basestring. An alternative is the regex module.

The regex module is an enhanced version of re with additional features. It also allows the same group name to be used more than once in a pattern.

You can install it via:

sudo pip install regex

If you are already using re or re2 in your program, just import the regex module under the same name:

import regex as re
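
As a rough sketch of what that buys you, here is a pattern that reuses a group name. The pattern and the sample strings are made up for illustration (they are not the regex from the question), and the snippet assumes the third-party regex package is installed; if it is not, it falls back to the standard module and simply returns None.

```python
try:
    import regex as re  # third-party package: pip install regex
    HAVE_REGEX = True
except ImportError:
    import re  # standard library fallback; it rejects duplicate group names
    HAVE_REGEX = False

# Hypothetical pattern: both alternatives capture into the same name, "NAME".
DUPLICATE_NAME_PATTERN = r"^(?P<NAME>[A-Z]\d{3})\.foo$|^(?P<NAME>R1_\d{3})\.bar$"


def extract_name(text):
    """Return the NAME capture, or None if nothing matches or regex is missing."""
    if not HAVE_REGEX:
        return None
    match = re.match(DUPLICATE_NAME_PATTERN, text)
    return match.group("NAME") if match else None


print(extract_name("A123.foo"))
print(extract_name("R1_456.bar"))
```

Whichever alternative matched, the capture is read back through the one shared name.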

No, you can't have two groups with the same name; that would defeat the purpose, wouldn't it?
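
To see this with the standard re module, here is a small sketch. The two-alternative pattern is a made-up stand-in for the one in the question, and try_compile is just a helper name I picked for the example.

```python
import re

# Hypothetical pattern that defines the group "NAME" twice.
DUPLICATE_NAME_PATTERN = r"(?P<NAME>a\d)|(?P<NAME>b\d)"

# Same alternatives folded into a single named group.
SINGLE_NAME_PATTERN = r"(?P<NAME>a\d|b\d)"


def try_compile(pattern):
    """Return None if the pattern compiles, else the error message."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


error_message = try_compile(DUPLICATE_NAME_PATTERN)
print(error_message)                      # complains about the redefined group name
print(try_compile(SINGLE_NAME_PATTERN))   # None: one group, two alternatives inside it
```

Moving the alternation inside a single named group is the idea behind the refactoring that follows.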

What you probably really want is this:

^\s*(?P<NAME>\w\d{7}|R1_(?:\d{6}_){2})(01f\.foo|\.(?:bar|goo|moo|roo))$

I refactored your regex as far as possible. I made the following assumptions:

You want to (correct me if I'm wrong):

  • ignore white space at the start of the string
  • match either of the following into a group named "NAME":
    • a word character (\w: a letter, digit or underscore) followed by 7 digits, or
    • "R1_", and two times (6 digits + "_")
  • followed by either:
    • "01f.foo" or
    • "." and ("bar" or "goo" or "moo" or "roo")
  • followed by the end of the string
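
A quick way to check those assumptions is to run the pattern against a few strings. The sample file names below are invented for the test, since I don't know what your real input looks like.

```python
import re

PATTERN = re.compile(
    r"^\s*(?P<NAME>\w\d{7}|R1_(?:\d{6}_){2})(01f\.foo|\.(?:bar|goo|moo|roo))$"
)


def name_of(text):
    """Return the NAME group if the whole string matches, else None."""
    match = PATTERN.match(text)
    return match.group("NAME") if match else None


# Invented sample inputs, not taken from the question.
SAMPLES = [
    "  A123456701f.foo",         # leading white space, first NAME form, "01f.foo"
    "R1_123456_654321_.bar",     # second NAME form, ".bar"
    "A1234567.foo",              # ".foo" without "01f" is rejected
]

for sample in SAMPLES:
    print(repr(sample), "->", name_of(sample))
```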

You could also have meant:

^\s*(?P<NAME>\w\d{7}01f|R1_(?:\d{6}_){2})\.(?:foo|bar|goo|moo|roo)$

Which is:

  • ignore white space at the start of the string
  • match either of the following into a group named "NAME":
    • a word character (\w: a letter, digit or underscore) followed by 7 digits and "01f", or
    • "R1_", and two times (6 digits + "_")
  • a dot
  • "foo", "bar", "goo", "moo" or "roo"
  • the end of the string
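
The two readings only differ on some inputs, so a side-by-side sketch may help you decide which one you meant. Again, the sample strings are invented, and FIRST, SECOND and compare are just names chosen for this example.

```python
import re

# First reading: "01f" belongs to the ".foo" suffix.
FIRST = re.compile(
    r"^\s*(?P<NAME>\w\d{7}|R1_(?:\d{6}_){2})(01f\.foo|\.(?:bar|goo|moo|roo))$"
)
# Second reading: "01f" belongs to the name.
SECOND = re.compile(
    r"^\s*(?P<NAME>\w\d{7}01f|R1_(?:\d{6}_){2})\.(?:foo|bar|goo|moo|roo)$"
)


def compare(text):
    """Return (NAME under the first reading, NAME under the second reading)."""
    names = []
    for pattern in (FIRST, SECOND):
        match = pattern.match(text)
        names.append(match.group("NAME") if match else None)
    return tuple(names)


for sample in (
    "A123456701f.foo",         # both match, but NAME differs
    "A1234567.bar",            # only the first reading matches
    "R1_123456_654321_.foo",   # only the second reading matches
    "R1_123456_654321_.bar",   # both match with the same NAME
):
    print(sample, "->", compare(sample))
```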

Reusing the same name makes sense in your case, contrary to Tamalak's reply.

Your regex compiles with Python 2.7 and also with re2. Maybe this problem has been resolved.
