`re.sub()` in pandas

With regard to your "bonus" question: you can use pandas.Series.str.replace, one of the pandas.Series.str methods, several of which work with regular expressions:

In [10]: import re

In [11]: import pandas as pd

In [12]: s = pd.Series(
    ...:     ['white male',
    ...:      'white male, white female',
    ...:      'hispanic male, 2 hispanic females',
    ...:      'black male, 2 white females'])

In [13]: mult = re.compile(r'(?:two|2) (?P<race>[a-z]+) (?P<gender>(?:fe)?male)s')
    ...:

In [14]: s.str.replace(mult, r'\g<race> \g<gender>, \g<race> \g<gender>', regex=True)
Out[14]:
0                                         white male
1                           white male, white female
2    hispanic male, hispanic female, hispanic female
3             black male, white female, white female
dtype: object
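
For comparison, here is a minimal sketch of the same substitution done with plain `re.sub()` on ordinary Python strings, since that is the behaviour the question wanted to reproduce. The sample rows (including the `two` variant) and the names `rows`, `repl` and `expanded` are made up for illustration. The alternation is wrapped in a non-capturing group so that `two` and `2` are alternatives for the count only, not for the whole pattern:

```python
import re

# Group the alternation so "two" and "2" are alternatives for the count only.
mult = re.compile(r'(?:two|2) (?P<race>[a-z]+) (?P<gender>(?:fe)?male)s')

# \g<name> refers back to the named groups captured by the pattern.
repl = r'\g<race> \g<gender>, \g<race> \g<gender>'

# Illustrative sample rows (not from the original data set).
rows = [
    'white male',
    'hispanic male, 2 hispanic females',
    'black male, two white females',
]

# Apply the substitution to each string, one at a time.
expanded = [mult.sub(repl, row) for row in rows]

for row in expanded:
    print(row)
```

Rows without a match, such as the first one, come back unchanged.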

I don't know whether these methods are significantly faster than .apply. I suspect that nothing will be very fast when working with object dtypes.
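
If you want to check this on your own data, a rough timing sketch along these lines might help. The repeated sample rows and the helper names `with_str_replace` and `with_apply` are made up for illustration, and `regex=True` is passed on the assumption that your pandas version accepts that keyword. Treat the numbers as indicative only, since they will vary with the data and the pandas version:

```python
import re
import timeit

import pandas as pd

mult = re.compile(r'(?:two|2) (?P<race>[a-z]+) (?P<gender>(?:fe)?male)s')
repl = r'\g<race> \g<gender>, \g<race> \g<gender>'

# Hypothetical benchmark data: the four example rows repeated 1,000 times.
s = pd.Series([
    'white male',
    'white male, white female',
    'hispanic male, 2 hispanic females',
    'black male, 2 white females',
] * 1000)


def with_str_replace():
    # Vectorised string accessor; regex=True is stated explicitly.
    return s.str.replace(mult, repl, regex=True)


def with_apply():
    # Row-by-row call to the compiled pattern's sub() method.
    return s.apply(lambda text: mult.sub(repl, text))


# Sanity check: both approaches should give the same strings.
same = with_str_replace().tolist() == with_apply().tolist()

# Time a handful of runs of each approach.
t_replace = timeit.timeit(with_str_replace, number=5)
t_apply = timeit.timeit(with_apply, number=5)
print(f'same result: {same}')
print(f'str.replace: {t_replace:.4f}s  apply: {t_apply:.4f}s')
```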

Note: I found this issue about these methods being on the slow side. Until someone decides it is worth writing a Cythonized implementation, you probably can't hope for much.
