Pandas resample with start date
My answer feels a little hacky, but uses resample
and gives the desired output. Find the date one bin length (e.g. 4 months, or month ends specifically) before the specified date, append it to s
, and then resample
:
rule = '4M'
date = '02-29-2020'
base_date = pd.to_datetime(date) - pd.tseries.frequencies.to_offset(rule)
s.loc[base_date] = np.nan
output = s.resample(rule=rule).count()
output=output[output.index >= date]
Result:
2020-02-29 32
2020-06-30 122
2020-10-31 123
2021-02-28 120
2021-06-30 122
2021-10-31 4
Freq: 4M, dtype: int64
I added output=output[output.index >= date]
b/c otherwise you get an additional empty bin:
2019-10-31 0
2020-02-29 32
2020-06-30 122
2020-10-31 123
2021-02-28 120
2021-06-30 122
2021-10-31 4
Freq: 4M, dtype: int64
Another way when dealing with months intervals could be to convert the datetime index to an integer from year and month, remove the start_date defined and some modulo value with the rule. use this in a groupby.
rule = '4M'
start = "2020-02-29"
# change types of value
d = pd.Timestamp(start)
nb = int(rule[:-1])
gr = s.groupby(d+(1+((s.index.year*12+s.index.month) #convert datetime index to int
-(d.year*12+d.month+1))//nb) # remove start and modulo rule
*pd.tseries.frequencies.to_offset(rule) # get rule freq
).count()
print (gr)
2020-02-29 32
2020-06-30 121
2020-10-31 123
2021-02-28 120
2021-06-30 122
2021-10-31 4
dtype: int64
Now compared to your method, let's say you define a date you want not being within the first X months define by your rule like 2020-07-31 with the same rule (4M). with this method, it gives:
2020-03-31 63 #you get this interval
2020-07-31 121
2020-11-30 122
2021-03-31 121
2021-07-31 95
dtype: int64
while with your method, you get:
2020-07-31 121 #you loose info from before the 2020-03-31
2020-11-30 122
2021-03-31 121
2021-07-31 95
dtype: int64
I know you stated in the question that you define the first date but with this method you could define any date as long as the rule is in month
All you need to use is pd.cut
like below:
>>> gb = pd.cut(s.index, bins).value_counts()
>>> gb.index = gb.index.categories.right
>>> gb
2020-02-29 32
2020-06-30 122
2020-10-31 123
2021-02-28 120
2021-06-30 122
2021-10-31 4
dtype: int64
there is no need to use groupby