Convert a column in a pandas DataFrame from string month to int
Probably the easiest, and one of the fastest, methods is to create a mapping dict and use map, as follows:
In [2]: df
Out[2]:
YEAR MONTH ID
0 2011 JAN 1
1 2011 FEB 1
2 2011 MAR 1
In [3]: d = {'JAN':1, 'FEB':2, 'MAR':3, 'APR':4, }
In [4]: df.MONTH = df.MONTH.map(d)
In [5]: df
Out[5]:
YEAR MONTH ID
0 2011 1 1
1 2011 2 1
2 2011 3 1
You may want to use df.MONTH = df.MONTH.str.upper().map(d) if not all MONTH values are in upper case.
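As a minimal sketch of why the case normalisation matters: map returns NaN for any value that is not a key of the dict. The sample frame, the MONTH_NUM column, and the "???" value below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with inconsistent casing and one unknown value.
df = pd.DataFrame({"MONTH": ["JAN", "feb", "Mar", "???"]})
d = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4}

# Values that are not keys of d become NaN, so normalise the case first.
mapped = df["MONTH"].str.upper().map(d)

# The nullable Int64 dtype keeps integers even when some rows are missing.
df["MONTH_NUM"] = mapped.astype("Int64")

# Collect the original values that could not be mapped.
unmapped = df.loc[df["MONTH_NUM"].isna(), "MONTH"].tolist()
```

Checking unmapped after the conversion is a cheap way to spot typos or unexpected abbreviations in the column.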
Another slower but more robust method:
In [11]: pd.to_datetime(df.MONTH, format='%b').dt.month
Out[11]:
0 1
1 2
2 3
Name: MONTH, dtype: int64
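A possible variation on the to_datetime approach, assuming an English locale (the %b directive is locale-dependent): errors='coerce' turns unparseable values into NaT instead of raising. The sample frame and the MONTH_NUM column are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; "bad" is not a valid month abbreviation.
df = pd.DataFrame({"MONTH": ["JAN", "FEB", "MAR", "bad"]})

# errors="coerce" replaces values that do not match the format with NaT.
parsed = pd.to_datetime(df["MONTH"], format="%b", errors="coerce")

# .dt.month yields NaN for NaT rows; nullable Int64 keeps the rest as integers.
df["MONTH_NUM"] = parsed.dt.month.astype("Int64")
```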
UPDATE: we can create a mapping automatically (thanks to @Quetzalcoatl)
import calendar
d = dict((v,k) for k,v in enumerate(calendar.month_abbr))
or alternatively (using only pandas), mapping each abbreviation to its number:
d = dict(zip(pd.date_range('2000-01-01', freq='MS', periods=12).strftime('%b'), range(1, 13)))
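One thing to watch with a mapping generated from calendar.month_abbr: its entries are title-cased ('Jan', 'Feb', ...) and index 0 is an empty string, so mapping upper-case data such as 'JAN' against it would return NaN. A minimal sketch that adapts the keys, assuming an English locale and the sample frame from the first answer:

```python
import calendar
import pandas as pd

# Skip index 0 (an empty string) and upper-case the keys to match 'JAN', 'FEB', ...
d = {abbr.upper(): num for num, abbr in enumerate(calendar.month_abbr) if abbr}

# Sample data in the same shape as the question's MONTH column.
df = pd.DataFrame({"MONTH": ["JAN", "FEB", "MAR"]})
df["MONTH"] = df["MONTH"].map(d)
```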
Here's a one-liner using the pandas API and the calendar.month_abbr convenience:
from calendar import month_abbr
lower_ma = [m.lower() for m in month_abbr]
# one-liner with Pandas
df['MONTH'] = df['MONTH'].str.lower().map(lambda m: lower_ma.index(m)).astype('Int8')
- Convert the calendar.month_abbr entries, which are title-cased, to lower case
- Lower-case the MONTH series before feeding it to the map method >> .str.lower()
- Use a lambda function within the map method and get the index of the corresponding month abbreviation via the .index Python list method >> .map(lambda m: lower_ma.index(m))
- Convert to integer >> .astype('Int8')
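The steps above can be sketched one at a time, which makes the intermediate results easier to inspect. The sample frame and the intermediate names lowered and indexed are hypothetical; the bound method lower_ma.index is passed directly in place of the lambda, which behaves the same way here.

```python
from calendar import month_abbr

import pandas as pd

# Hypothetical sample data with mixed casing.
df = pd.DataFrame({"MONTH": ["JAN", "Feb", "mar"]})

# Step 1: lower-case the abbreviations; index 0 is an empty string,
# so 'jan' sits at index 1 and 'dec' at index 12.
lower_ma = [m.lower() for m in month_abbr]

# Step 2: lower-case the column so it matches the list entries.
lowered = df["MONTH"].str.lower()

# Step 3: look up each abbreviation's position in the list.
indexed = lowered.map(lower_ma.index)

# Step 4: store the result as a small nullable integer.
df["MONTH"] = indexed.astype("Int8")
```

Note that list.index raises ValueError for a value that is not in the list, so this variant fails loudly on unexpected abbreviations rather than producing missing values.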