`re.sub()` in pandas
With regard to your "bonus" question, you can use `pandas.Series.str.replace`, one of the `pandas.Series.str` methods, which work with regex:
In [10]: import re
In [11]: import pandas as pd
In [12]: s = pd.Series(
...: ['white male',
...: 'white male, white female',
...: 'hispanic male, 2 hispanic females',
...: 'black male, 2 white females'])
In [13]: mult = re.compile(r'(?:two|2) (?P<race>[a-z]+) (?P<gender>(?:fe)?male)s')
...:
In [14]: s.str.replace(mult, r'\g<race> \g<gender>, \g<race> \g<gender>', regex=True)
Out[14]:
0 white male
1 white male, white female
2 hispanic male, hispanic female, hispanic female
3 black male, white female, white female
dtype: object
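For comparison, here is roughly what the same substitution looks like with plain `re.sub` semantics through `.apply`. This is only a sketch: the helper name `expand_plural` is made up for illustration, the last sample row uses "two" instead of "2" to exercise both alternatives, and I'm assuming a pandas version where `str.replace` accepts `regex=True`.

```python
import re
import pandas as pd

s = pd.Series([
    'white male',
    'white male, white female',
    'hispanic male, 2 hispanic females',
    'black male, two white females',
])

# Group the alternation so that both "two" and "2" are followed by
# the named race/gender groups.
mult = re.compile(r'(?:two|2) (?P<race>[a-z]+) (?P<gender>(?:fe)?male)s')
repl = r'\g<race> \g<gender>, \g<race> \g<gender>'


def expand_plural(text):
    """Hypothetical helper: expand '2 <race> <gender>s' into two entries."""
    return mult.sub(repl, text)


# Row-by-row substitution with the compiled pattern's own sub() ...
via_apply = s.apply(expand_plural)
# ... and the vectorized string accessor doing the same job.
via_str = s.str.replace(mult, repl, regex=True)

print(via_apply.tolist() == via_str.tolist())
```

Both routes should give the same strings, so the choice mostly comes down to readability and speed.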
I don't know whether these methods are significantly faster than `.apply`. I suspect that you'll never be very fast working with `object` dtypes.
Note: I found this issue regarding these methods being on the slow side. I suppose that until the pandas developers decide it is worth writing a Cythonized implementation, you probably can't hope for much.
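If you want to settle the speed question for your own data, you could time both approaches. A rough sketch with `timeit`: the series size and the number of repetitions are arbitrary choices of mine, and the numbers will vary with your pandas version and your data, so treat it as a starting point rather than a benchmark.

```python
import re
import timeit
import pandas as pd

mult = re.compile(r'(?:two|2) (?P<race>[a-z]+) (?P<gender>(?:fe)?male)s')
repl = r'\g<race> \g<gender>, \g<race> \g<gender>'

# Arbitrary test data: 10,000 rows, half of which need a substitution.
big = pd.Series(['hispanic male, 2 hispanic females', 'white male'] * 5000)


def with_str_replace():
    # Vectorized string accessor
    return big.str.replace(mult, repl, regex=True)


def with_apply():
    # Plain Python call per element
    return big.apply(lambda text: mult.sub(repl, text))


t_str = timeit.timeit(with_str_replace, number=5)
t_apply = timeit.timeit(with_apply, number=5)
print(f'.str.replace: {t_str:.4f}s   .apply: {t_apply:.4f}s')
```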