Clean way to convert quarterly periods to datetime in pandas
Slice out the last 4 characters (the year) and the first 2 (the quarter), concatenate them, and convert the result to datetimes:
df['date'] = pd.to_datetime(df['Quarter'].str[-4:] + df['Quarter'].str[:2])
String operations in pandas are slow, so if missing values are not possible, use a list comprehension:
#python 3.6+
df['date'] = pd.to_datetime([f'{x[-4:]}{x[:2]}' for x in df['Quarter']])
#python below 3.6
#df['date'] = pd.to_datetime(['{}{}'.format(x[-4:], x[:2]) for x in df['Quarter']])
print(df)
Quarter date
0 Q3 1996 1996-07-01
1 Q4 1996 1996-10-01
2 Q1 1997 1997-01-01
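The list comprehension breaks on missing values, because NaN is a float and cannot be sliced. A minimal sketch of one way around that, assuming missing quarters should simply become NaT: convert only the non-null rows and let index alignment fill in the rest. The sample frame and the name `valid` are illustrative, and the conversion goes through pd.PeriodIndex here rather than pd.to_datetime.

```python
import pandas as pd

# Illustrative data with one missing quarter
df = pd.DataFrame({'Quarter': ['Q3 1996', None, 'Q1 1997']})

# Work only on the rows that actually hold a string
valid = df['Quarter'].dropna()

# 'Q3 1996' -> '1996Q3', parsed as a quarterly period, then mapped to the quarter start
dates = pd.PeriodIndex([f'{x[-4:]}{x[:2]}' for x in valid], freq='Q').to_timestamp()

# Assigning a Series aligns on the index, so the dropped row ends up as NaT
df['date'] = pd.Series(dates, index=valid.index)
print(df)
```

This keeps the fast list comprehension for the bulk of the data and pays for the missing-value handling only once, in dropna.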
You can (and should) use pd.PeriodIndex as a first step, then convert to timestamps using PeriodIndex.to_timestamp:
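# regex=True is required on pandas 2.0+, where str.replace defaults to literal matching
qs = df['Quarter'].str.replace(r'(Q\d) (\d+)', r'\2-\1', regex=True)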
qs
0 1996-Q3
1 1996-Q4
2 1997-Q1
Name: Quarter, dtype: object
df['date'] = pd.PeriodIndex(qs, freq='Q').to_timestamp()
df
Quarter date
0 Q3 1996 1996-07-01
1 Q4 1996 1996-10-01
2 Q1 1997 1997-01-01
The initial replace step is necessary because PeriodIndex expects your periods in the %Y-Q%q format (for example, 1996-Q3).
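Going through PeriodIndex also makes it easy to pick either edge of the quarter. A short sketch, assuming the strings are already in the year-first form; the names `starts` and `ends` are illustrative. Note that to_timestamp(how='end') returns the last nanosecond of the quarter, so normalize() is used to reset the time to midnight.

```python
import pandas as pd

# Quarter strings already rewritten as year-first
qs = pd.Series(['1996-Q3', '1996-Q4', '1997-Q1'])
periods = pd.PeriodIndex(qs, freq='Q')

# First day of each quarter (the default, how='start')
starts = periods.to_timestamp()

# Last day of each quarter; normalize() drops the 23:59:59.999999999 time part
ends = periods.to_timestamp(how='end').normalize()

print(starts)
print(ends)
```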
Another option is to use pd.to_datetime after performing string replacement in the same way as before.
df['date'] = pd.to_datetime(
    df['Quarter'].str.replace(r'(Q\d) (\d+)', r'\2-\1', regex=True), errors='coerce')
df
Quarter date
0 Q3 1996 1996-07-01
1 Q4 1996 1996-10-01
2 Q1 1997 1997-01-01
If performance is important, you can split and join the strings in a list comprehension, and it can still be done cleanly:
df['date'] = pd.to_datetime([
    '-'.join(x.split()[::-1]) for x in df['Quarter']])
df
Quarter date
0 Q3 1996 1996-07-01
1 Q4 1996 1996-10-01
2 Q1 1997 1997-01-01
Given a quarter format like 2018-Q1, one can use the built-in pd.to_datetime function. Because a general answer would have to deal with the plethora of ways one can store a quarter-year observation (e.g. 2018:1, 2018:Q1, 20181, Q1:2018, etc.), coercing the data into the format above is outside the scope of this answer.
But given a formatted series:
formatted_series = formatted_series_supplier() ...
df['date'] = pd.to_datetime(formatted_series)
For example:
>>> pd.to_datetime(pd.Series(['2018-Q1']))
0 2018-01-01
dtype: datetime64[ns]
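Coercing other layouts into the 2018-Q1 form is deliberately left out of this answer, but as a rough sketch only, here is one way it might be done for the handful of variants listed earlier. The helper `normalize_quarter` and both regular expressions are hypothetical, not part of pandas, and anything they do not recognise raises an error instead of being guessed.

```python
import re
import pandas as pd

# Hypothetical patterns covering only the layouts mentioned in this answer
_YEAR_FIRST = re.compile(r'^(\d{4})[:\-]?Q?([1-4])$')        # 2018:1, 2018:Q1, 20181, 2018-Q1
_QUARTER_FIRST = re.compile(r'^Q([1-4])[:\-\s]?(\d{4})$')    # Q1:2018, Q1 2018

def normalize_quarter(value):
    """Return the quarter as 'YYYY-Qn', or raise ValueError if unrecognised."""
    text = value.strip().upper()
    m = _YEAR_FIRST.match(text)
    if m:
        return f'{m.group(1)}-Q{m.group(2)}'
    m = _QUARTER_FIRST.match(text)
    if m:
        return f'{m.group(2)}-Q{m.group(1)}'
    raise ValueError(f'unrecognised quarter format: {value!r}')

raw = pd.Series(['2018:1', '2018:Q1', '20181', 'Q1:2018'])
formatted_series = raw.map(normalize_quarter)
print(formatted_series)
```

Raising on unknown input is a deliberate choice here: a silently wrong quarter is harder to spot later than a failed conversion.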
And if you're dealing with regulatory data, which almost always reflects the end of the quarter rather than its start (i.e. instead of 2019-01-01, you want 2019-03-31), you can use offsets like below:
df['date'] = df['date'] + pd.offsets.QuarterEnd(0)
Using the example from above and skipping the intermediate result:
>>> pd.to_datetime(pd.Series(['2018-Q1'])) + pd.offsets.QuarterEnd(0)
0 2018-03-31
dtype: datetime64[ns]
Note that the 0 argument must be passed to QuarterEnd if a date that already falls on a quarter end should stay in the same quarter. Otherwise, such a date rolls forward to the next quarter end:
>>> pd.to_datetime('2018-03-31') + pd.offsets.QuarterEnd()
Timestamp('2018-06-30 00:00:00')
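To make the difference concrete, here is a small side-by-side comparison of QuarterEnd(0) and the default QuarterEnd() applied to a quarter start and a quarter end. The variable names are illustrative.

```python
import pandas as pd

start = pd.Timestamp('2018-01-01')   # first day of Q1
end = pd.Timestamp('2018-03-31')     # last day of Q1

# n=0 rolls forward only when the date is not already a quarter end
start_n0 = start + pd.offsets.QuarterEnd(0)   # 2018-03-31
end_n0 = end + pd.offsets.QuarterEnd(0)       # 2018-03-31 (unchanged)

# The default n=1 always moves to the next quarter end after the date
start_n1 = start + pd.offsets.QuarterEnd()    # 2018-03-31
end_n1 = end + pd.offsets.QuarterEnd()        # 2018-06-30

print(start_n0, end_n0, start_n1, end_n1)
```

Both variants agree for dates inside the quarter; they only differ for dates that already sit on a quarter end, which is why QuarterEnd(0) is the safer choice when the input may contain either.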