Pandas rolling window percentile rank
If you only need the rank of the last observation, as is the case with rolling apply, you can use:
def pctrank(x):
    # argsort().argmax() is the sorted position of the last element (0-based)
    i = x.argsort().argmax() + 1
    n = len(x)
    return i / n
It runs about twice as fast.
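As a minimal sketch of how this function plugs into a rolling window: the sample values and the window size of 3 are made up for illustration, and raw=True is assumed so that each window arrives as a NumPy array. With tied values, the result depends on the order argsort happens to return rather than on an averaged rank.

```python
import numpy as np
import pandas as pd

def pctrank(x):
    # sorted position of the last element (1-based), as a fraction of the window length
    i = x.argsort().argmax() + 1
    n = len(x)
    return i / n

s = pd.Series([3.0, 1.0, 2.0, 5.0, 4.0])  # hypothetical data
# raw=True hands each window to pctrank as a NumPy array, not a Series
rolled = s.rolling(3).apply(pctrank, raw=True)
```

The first two entries of rolled are NaN because the window is not yet full.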
Your lambda receives a numpy array, which does not have a .rank method; it is pandas's Series and DataFrame that have it. You can thus change it to:
pctrank = lambda x: pd.Series(x).rank(pct=True).iloc[-1]
Or you could use pure numpy along the lines of this SO answer:
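import numpy as np

def pctrank(x):
    n = len(x)
    temp = x.argsort()                     # indices that sort the window
    ranks = np.empty(n)
    ranks[temp] = (np.arange(n) + 1) / n   # percentile rank of each element
    return ranks[-1]                       # rank of the last observation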
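A quick way to check that the pandas and pure-numpy versions agree is to run both on the same window. The sketch below uses the hypothetical names pctrank_np and pctrank_pd and a made-up, tie-free window. With ties the two can differ, since rank(pct=True) averages tied ranks by default while argsort does not.

```python
import numpy as np
import pandas as pd

def pctrank_np(x):
    # pure-NumPy percentile rank of the last element
    n = len(x)
    temp = x.argsort()
    ranks = np.empty(n)
    ranks[temp] = (np.arange(n) + 1) / n
    return ranks[-1]

def pctrank_pd(x):
    # pandas equivalent: wrap the array in a Series to get .rank
    return pd.Series(x).rank(pct=True).iloc[-1]

x = np.array([0.3, 0.1, 0.4, 0.2])  # hypothetical window without ties
same = np.isclose(pctrank_np(x), pctrank_pd(x))
```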
The easiest option would be to do something like this:
from scipy import stats
# 200 is the window size; raw=True passes each window as a NumPy array, so x[-1] is its last value
dataset[name] = dataset[name].rolling(200).apply(lambda x: stats.percentileofscore(x, x[-1]), raw=True)
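Note that percentileofscore reports its result on a 0 to 100 scale, whereas rank(pct=True) returns a fraction between 0 and 1. The sketch below, using a made-up window, shows the conversion. It assumes the default kind='rank' setting of percentileofscore.

```python
import numpy as np
import pandas as pd
from scipy import stats

window = np.array([1.0, 4.0, 2.0, 3.0])  # hypothetical window; last value is 3.0
score = stats.percentileofscore(window, window[-1])  # 0-100 scale
frac = score / 100                                   # rescale to 0-1
pd_frac = pd.Series(window).rank(pct=True).iloc[-1]  # pandas fraction for comparison
```

Dividing by 100 inside the lambda gives values comparable to the pandas-based versions.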