Fast way to split alpha and numeric chars in a Python string

Here's another approach in case you prefer to avoid regex, which can be unwieldy if you are not familiar enough with it to write or modify the pattern yourself:

from itertools import groupby

def split_text(s):
    for k, g in groupby(s, str.isalpha):
        yield ''.join(g)

print(list(split_text("Westminister15")))
print(list(split_text("Westminister15London")))
print(list(split_text("23Westminister15London")))
print(list(split_text("Westminister15London England")))

This prints:

['Westminister', '15']
['Westminister', '15', 'London']
['23', 'Westminister', '15', 'London']
['Westminister', '15', 'London', ' ', 'England']

The generator can also be easily modified to never yield whitespace-only strings, if desired.
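As a rough sketch of that modification: the variant below groups by a small classifier instead of str.isalpha, so that a digit run followed by a space (as in "15 London") is not merged into one chunk, and it skips the whitespace runs. The names char_class and split_text_no_space are made up for this example and are not part of the original answer.

```python
from itertools import groupby

def char_class(ch):
    # Hypothetical helper: put each character into one of four classes
    # so that letters, digits, whitespace and anything else form separate runs.
    if ch.isalpha():
        return "alpha"
    if ch.isdigit():
        return "digit"
    if ch.isspace():
        return "space"
    return "other"

def split_text_no_space(s):
    # Same idea as split_text, but whitespace runs are never yielded.
    for kind, group in groupby(s, char_class):
        if kind != "space":
            yield "".join(group)

print(list(split_text_no_space("Westminister15London England")))
# ['Westminister', '15', 'London', 'England']
print(list(split_text_no_space("Westminister 15 London")))
# ['Westminister', '15', 'London']
```

Punctuation still comes through as its own chunk here; drop the "other" class as well if only letters and digits are wanted.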


You can use this regex instead of yours:

>>> import re
>>> regex = re.compile(r'(\d+|\s+)')
>>> regex.split('Westminister15')
['Westminister', '15', '']
>>> regex.split('Westminister15London England')
['Westminister', '15', 'London', ' ', 'England']

You then have to filter the list to remove empty and whitespace-only strings.
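One possible way to do that filtering is a list comprehension that keeps only the parts that are non-empty after stripping. The wrapper name split_and_filter is invented for this sketch; the pattern is the one shown above.

```python
import re

regex = re.compile(r'(\d+|\s+)')

def split_and_filter(s):
    # part.strip() is an empty (falsy) string for both '' and
    # whitespace-only parts, so those are dropped.
    return [part for part in regex.split(s) if part.strip()]

print(split_and_filter('Westminister15'))
# ['Westminister', '15']
print(split_and_filter('Westminister15London England'))
# ['Westminister', '15', 'London', 'England']
```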


The problem is that, before Python 3.7, re.split() did not split on zero-length matches (newer versions do). Either way, you can get the desired result directly with re.findall():

>>> re.findall(r"[^\W\d_]+|\d+", "23Westminister15London")
['23', 'Westminister', '15', 'London']
>>> re.findall(r"[^\W\d_]+|\d+", "Westminister15London England")
['Westminister', '15', 'London', 'England']

\d+ matches one or more digits; [^\W\d_]+ matches one or more letters (any word character that is not a digit or an underscore).
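To see what the character class does in practice, here is a small illustration (the tokens helper and the sample string are made up for this example). With str patterns in Python 3, \w is Unicode-aware, so accented letters count as letters, while underscores, punctuation and spaces match neither alternative and are simply skipped.

```python
import re

# [^\W\d_] = "not a non-word character, not a digit, not an underscore",
# which leaves letters only.
TOKEN = re.compile(r"[^\W\d_]+|\d+")

def tokens(s):
    # Return the runs of letters and runs of digits, in order.
    return TOKEN.findall(s)

print(tokens("Zürich_42, Köln"))
# ['Zürich', '42', 'Köln']
```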
