How to display the correct date century in Pandas?
Another solution is to treat the DOB as a date, and take it back to the previous century only if it is in the future (i.e. after "now"). Example:
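from datetime import datetime
import pandas as pd
df = pd.DataFrame({'DOB': ['01-06-68', '01-06-08']})
# %y maps 69-99 to 1969-1999 and 00-68 to 2000-2068
df['DOB'] = df['DOB'].apply(lambda x: datetime.strptime(x, '%d-%m-%y'))
# a birth date cannot be in the future, so move such dates back one century
df['DOB'] = df['DOB'].apply(lambda x: x if x < datetime.now() else x.replace(year=x.year - 100))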
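The same rule can probably be applied without a row-by-row `apply`. The sketch below is one possible vectorized variant, not the answer's own code: the function name `fix_century` and its `today` parameter are made up for illustration, and it assumes the column holds `dd-mm-yy` strings.

```python
import pandas as pd

def fix_century(dob, fmt="%d-%m-%y", today=None):
    """Parse two-digit-year strings and move future dates back 100 years.

    `fix_century` and `today` are hypothetical names used for this sketch.
    """
    # Use the current day as the reference unless the caller pins one.
    today = pd.Timestamp.now().normalize() if today is None else pd.Timestamp(today)
    parsed = pd.to_datetime(dob, format=fmt)
    # Replace only the rows that lie after the reference date.
    return parsed.mask(parsed > today, parsed - pd.DateOffset(years=100))

result = fix_century(pd.Series(["01-06-68", "01-06-08"]), today="2020-01-01")
print(result)
```

Pinning `today` makes the result reproducible; leaving it as `None` compares against the current date, as in the answer above.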
You can first convert to datetimes and then, where the year is 2020 or later, subtract 100 years using DateOffset:
df['DOB'] = pd.to_datetime(df['DOB'], format='%d-%m-%y')
df.loc[df['DOB'].dt.year >= 2020, 'DOB'] -= pd.DateOffset(years=100)
# equivalent to:
#mask = df['DOB'].dt.year >= 2020
#df.loc[mask, 'DOB'] = df.loc[mask, 'DOB'] - pd.DateOffset(years=100)
print (df)
DOB
0 1984-01-01
1 1985-07-31
2 1985-08-24
3 1993-12-30
4 1977-12-09
5 1990-09-08
6 1988-06-01
7 1989-10-04
8 1991-11-15
9 1968-06-01
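For context on why a correction is needed at all: `%y` follows the POSIX convention, under which two-digit years 00-68 are read as 2000-2068 and 69-99 as 1969-1999. A minimal check with the standard library (the helper name `parsed_year` is made up for this illustration):

```python
from datetime import datetime

def parsed_year(text, fmt="%d-%m-%y"):
    """Return the four-digit year that strptime assigns to a two-digit year."""
    return datetime.strptime(text, fmt).year

print(parsed_year("01-06-68"))  # 2068
print(parsed_year("01-06-69"))  # 1969
```

So a DOB of `01-06-68` lands in 2068 unless it is shifted back by a century.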
Or you can prepend 19 or 20 to the year with Series.str.replace and choose between the two results with numpy.where, based on a condition.
Note: this solution also works for years from 00 (2000) up to 20 (2020).
s1 = df['DOB'].str.replace(r'-(\d+)$', r'-19\1', regex=True)
s2 = df['DOB'].str.replace(r'-(\d+)$', r'-20\1', regex=True)
mask = df['DOB'].str[-2:].astype(int) <= 20
# pass an explicit day-first format so that 09-12-1977 is read as 9 December
df['DOB'] = pd.to_datetime(np.where(mask, s2, s1), format='%d-%m-%Y')
print (df)
DOB
0 1984-01-01
1 1985-07-31
2 1985-08-24
3 1993-12-30
4 1977-12-09
5 1990-09-08
6 1988-06-01
7 1989-10-04
8 1991-11-15
9 1968-06-01
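The threshold of 20 is hard-coded above. As a rough sketch of the same string-based idea with a configurable threshold (the function name `expand_year` and the `cutoff` parameter are assumptions made for illustration, and the input is assumed to be `dd-mm-yy` strings):

```python
import pandas as pd

def expand_year(dob, cutoff=20):
    """Read 'dd-mm-yy' strings as dates: yy <= cutoff becomes 20yy, else 19yy.

    `expand_year` and `cutoff` are hypothetical names used for this sketch.
    """
    yy = dob.str[-2:]
    # Pick the century prefix per row from the two-digit year.
    century = yy.astype(int).le(cutoff).map({True: "20", False: "19"})
    # Rebuild the string with a four-digit year and parse it day-first.
    return pd.to_datetime(dob.str[:-2] + century + yy, format="%d-%m-%Y")

dates = expand_year(pd.Series(["09-12-77", "01-06-68", "15-03-05"]))
print(dates)
```

Choosing `cutoff` is a judgment call about the data: it should be the latest two-digit year that can plausibly belong to the 2000s.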
If all years are before 2000:
s1 = df['DOB'].str.replace(r'-(\d+)$', r'-19\1', regex=True)
df['DOB'] = pd.to_datetime(s1, format='%d-%m-%Y')
print (df)
DOB
0 1984-01-01
1 1985-07-31
2 1985-08-24
3 1993-12-30
4 1977-12-09
5 1990-09-08
6 1988-06-01
7 1989-10-04
8 1991-11-15
9 1968-06-01
In this specific case, I would use this:
pd.to_datetime(df['DOB'].str[:-2] + '19' + df['DOB'].str[-2:], format='%d-%m-%Y')
Note that this will break if you have DOBs after 1999!
Output:
0 1984-01-01
1 1985-07-31
2 1985-08-24
3 1993-12-30
4 1977-12-09
5 1990-09-08
6 1988-06-01
7 1989-10-04
8 1991-11-15
9 1968-06-01
dtype: datetime64[ns]