How to perform a cumulative sum of distinct values in a pandas DataFrame
Using duplicated + cumsum + last:
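m = df.duplicated('company')  # True for every repeat occurrence of a company
d = df['date']
(~m).cumsum().groupby(d).last()  # running count of first occurrences, read at the last row of each date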
date
2019-01-01 2
2019-01-03 3
2019-01-04 3
2019-01-05 4
dtype: int32
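For anyone who wants to run this end to end, here is a minimal sketch. The frame below is hypothetical sample data (the original df is not shown here), chosen so that it gives the same per-date counts as the output above. It assumes the rows are already sorted by date.

```python
import pandas as pd

# Hypothetical sample data, not the original poster's frame; rows sorted by date.
df = pd.DataFrame({
    "date": ["2019-01-01", "2019-01-01", "2019-01-03",
             "2019-01-03", "2019-01-04", "2019-01-05"],
    "company": ["A", "B", "A", "C", "B", "D"],
})

# True for every row whose company has already appeared earlier in the frame.
m = df.duplicated("company")

# ~m marks first occurrences; their cumulative sum is the running number of
# distinct companies. The last value within each date is that date's total.
result = (~m).cumsum().groupby(df["date"]).last()

print(result)
```

If the frame is not in date order, sorting it first with df.sort_values('date') should be enough, since the running count depends on row order.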
Another way, trying to fix anky_91's approach:
(df.company.map(hash)).expanding().apply(lambda x: len(set(x)),raw=True).groupby(df.date).max()
date
2019-01-01 2.0
2019-01-03 3.0
2019-01-04 3.0
2019-01-05 4.0
Name: company, dtype: float64
From anky_91, using category codes instead of hashes:
(df.company.astype('category').cat.codes).expanding().apply(lambda x: len(set(x)),raw=True).groupby(df.date).max()
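As a rough check, the expanding approach can be run next to the duplicated one. This sketch again uses hypothetical sample data rather than the original df, and the intermediate names (codes, running, by_expanding, by_duplicated) are only for illustration.

```python
import pandas as pd

# Hypothetical sample data, not the original poster's frame; rows sorted by date.
df = pd.DataFrame({
    "date": ["2019-01-01", "2019-01-01", "2019-01-03",
             "2019-01-03", "2019-01-04", "2019-01-05"],
    "company": ["A", "B", "A", "C", "B", "D"],
})

# expanding().apply needs numeric input, so encode each company as an integer.
codes = df["company"].astype("category").cat.codes

# For each row, count the distinct codes seen from the first row up to that row.
running = codes.expanding().apply(lambda x: len(set(x)), raw=True)

# The running count never decreases, so the max within a date is the count at
# the end of that date. apply returns floats, hence the cast back to int.
by_expanding = running.groupby(df["date"]).max().astype(int)

# Same quantity computed with duplicated + cumsum + last, for comparison.
by_duplicated = (~df.duplicated("company")).cumsum().groupby(df["date"]).last()

print(by_expanding)
print(by_duplicated)
```

Both give the same counts on this data. The expanding version rebuilds a set for every row, so on a large frame the duplicated version is likely to be noticeably faster.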