Python: convert camel case to space delimited using RegEx and taking Acronyms into account

This should work with 'divLineColor', 'simpleBigURL', 'OldHTMLFile' and 'SQLServer'.

label = re.sub(r'((?<=[a-z])[A-Z]|(?<!\A)[A-Z](?=[a-z]))', r' \1', label)

Explanation:

label = re.sub(r"""
        (            # start the group
            # alternative 1
        (?<=[a-z])  # current position is preceded by a lower char
                    # (positive lookbehind: does not consume any char)
        [A-Z]       # an upper char
                    #
        |   # or
            # alternative 2
        (?<!\A)     # current position is not at the beginning of the string
                    # (negative lookbehind: does not consume any char)
        [A-Z]       # an upper char
        (?=[a-z])   # matches if next char is a lower char
                    # lookahead assertion: does not consume any char
        )           # end the group""",
    r' \1', label, flags=re.VERBOSE)

If a match is found it is replaced with ' \1', which is a string consisting of a leading blank and the match itself.

Alternative 1 for a match is an upper character, but only if it is preceded by a lower character. We want to translate abYZ to ab YZ and not to ab Y Z.

Alternative 2 for a match is an upper character, but only if it is followed by a lower character and not at the start of the string. We want to translate ABCyz to AB Cyz and not to A B Cyz.


\g<0> references the matched string of the whole pattern while \g<1> refereces the matched string of the first subpattern ((…)). So you should use \g<1> and \g<2> instead:

label = re.sub("([a-z])([A-Z])","\g<1> \g<2>",label)

I know, it's not regex. But, you can also use map like this

>>> s = 'camelCaseTest'
>>> ''.join(map(lambda x: x if x.islower() else " "+x, s))
'camel Case Test'

Tags:

Python

Regex