Pandas Equivalent for SQL window function and rows range
Try groupby
with shift
then reindex
back
df['new'] = df.groupby(['customer','day']).purchase.sum().shift().reindex(pd.MultiIndex.from_frame(df[['customer','day']])).values
df
Out[259]:
customer day purchase new
0 Joe 1 5 NaN
1 Joe 1 10 NaN
2 Joe 2 10 15.0
3 Joe 2 5 15.0
4 Joe 4 10 15.0
Update
s = df.groupby(['customer','day']).apply(lambda x : df.loc[df.customer.isin(x['customer'].tolist()) & (df.day.isin(x['day']-1)|df.day.isin(x['day']-2)),'purchase'].sum())
df['new'] = s.reindex(pd.MultiIndex.from_frame(df[['customer','day']])).values
df
Out[271]:
customer day purchase new
0 Joe 1 5 0
1 Joe 1 10 0
2 Joe 2 5 15
3 Joe 2 5 15
4 Joe 4 10 10
5 Joe 7 5 0
Not sure if this is the right way to go, and this is limited since only one customer is provided; if there were different customers, I would use merge
instead of map
; Note also that there is also an implicit assumption that the days are ordered in ascending already:
Get the purchase sum based on the groupby combination of customer
and day
and create a mapping between day
and the sum:
sum_purchase = (df.groupby(["customer", "day"])
.purchase
.sum()
.shift()
.droplevel(0))
Again, for multiple customers, I would not drop the customer
index, and instead use a merge below:
Get a mapping of the days with the difference between the days:
diff_2_days = (df.drop_duplicates("day")[["day"]]
.set_index("day", drop=False)
.diff()
.day)
Create the new column by mapping the above values to the day column, then use np.where
to get columns where the diff is less than or equal to 2:
(
df.assign(
diff_2_days = df.day.map(diff_2_days),
sum_purchase = df.day.map(sum_purchase),
final=lambda df: np.where(df.diff_2_days.le(2),
df.sum_purchase,
np.nan))
.drop(columns=["sum_purchase", "diff_2_days"])
)
customer day purchase final
0 Joe 1 5 NaN
1 Joe 1 10 NaN
2 Joe 2 5 15.0
3 Joe 2 5 15.0
4 Joe 4 10 10.0
5 Joe 7 5 NaN
Ran your code in postgres to get an idea of what range does and how it differs from rows; quite insightful. I think for windows functions, SQL got this covered and easily too.
SO, let me know where this falls on its face, and I'll gladly have a rejig at it.