How do I compute a weighted moving average using pandas?
With pandas, you can calculate a weighted moving average (WMA) using .rolling() combined with .apply().
Here's an example with 3 weights and window=3:
import numpy as np
import pandas as pd

# 10 random integers in [1, 6); np.random.randint takes (low, high, size)
data = {'colA': np.random.randint(1, 6, 10)}
df = pd.DataFrame(data)
weights = np.array([0.5, 0.25, 0.25])
sum_weights = np.sum(weights)
df['weighted_ma'] = (df['colA']
    .rolling(window=3, center=True)
    .apply(lambda x: np.sum(weights*x) / sum_weights, raw=False)
)
Please note that in .rolling() I have used the argument center=True. You should check whether this applies to your use case or whether you need center=False.
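To see what center actually changes, here is a minimal sketch on a small made-up series (the values and the helper name weighted_mean are illustrative assumptions, not part of the answer above). It passes raw=True, which hands each window to the function as a NumPy array and is usually faster than raw=False. Note that with this approach the first weight (0.5) multiplies the oldest value in each window.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up data for illustration
weights = np.array([0.5, 0.25, 0.25])

def weighted_mean(window):
    # `window` is a NumPy array because raw=True is passed below;
    # weights[0] is applied to the oldest value in the window.
    return np.dot(weights, window) / weights.sum()

# center=True: the result is labelled at the middle row of each window,
# so each value also depends on one *later* observation.
centered = s.rolling(window=3, center=True).apply(weighted_mean, raw=True)

# center=False (the default): the result is labelled at the last row of each
# window, so each value depends only on current and past observations.
trailing = s.rolling(window=3, center=False).apply(weighted_mean, raw=True)

print(pd.DataFrame({'value': s, 'centered': centered, 'trailing': trailing}))
```

The two columns hold the same numbers shifted by one row: the centered version has NaN at both ends, the trailing version has NaN in the first two rows. For time series where future values must not leak into the average, the trailing form is probably what you want.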
Construct a kernel with the weights, and apply it to your series using numpy.convolve.
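import pandas as pd
import numpy as np

def wma(arr, period):
    # Linearly decreasing weights: the most recent value gets weight `period`
    kernel = np.arange(period, 0, -1)
    # Pad the front with zeros so the window is one-sided (no look-ahead)
    kernel = np.concatenate([np.zeros(period - 1), kernel / kernel.sum()])
    return np.convolve(arr, kernel, 'same')

df = pd.DataFrame({'value': np.arange(11)})
df['wma'] = wma(df['value'], 4)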
Here I am interpreting WMA according to this page: https://en.wikipedia.org/wiki/Moving_average
For this type of WMA, the weights should be a linear range of n values, adding up to 1.0.
Note that I pad the front of the kernel with zeros. This is because we want a 'one-sided' window function, so that 'future' values in the time series do not affect the moving average.
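As a sanity check on the one-sided claim, the sketch below compares the convolution result with an explicit trailing rolling window. The names reference, result and masked are my own, and masking the warm-up rows is an optional extra rather than part of the answer. One thing it makes visible: in 'same' mode numpy.convolve treats values before the start of the series as zeros, so the first period - 1 outputs are partial sums rather than NaN.

```python
import numpy as np
import pandas as pd

def wma(arr, period):
    kernel = np.arange(period, 0, -1)
    kernel = np.concatenate([np.zeros(period - 1), kernel / kernel.sum()])
    return np.convolve(arr, kernel, 'same')

values = pd.Series(np.arange(11, dtype=float))
period = 4

# Reference: trailing window, oldest value weighted 1, newest weighted `period`.
ref_weights = np.arange(1, period + 1) / np.arange(1, period + 1).sum()
reference = values.rolling(period).apply(lambda w: np.dot(w, ref_weights), raw=True)

result = pd.Series(wma(values.to_numpy(), period), index=values.index)

# From index period - 1 onward the two approaches agree.
full_windows_match = np.allclose(result.iloc[period - 1:], reference.iloc[period - 1:])
print(full_windows_match)

# Before that, convolve silently used zeros for the missing history.
# If you prefer NaN there (as rolling() gives), mask those rows.
masked = result.copy()
masked.iloc[:period - 1] = np.nan
print(pd.DataFrame({'convolve': result, 'masked': masked, 'rolling': reference}))
```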
numpy.convolve is fast, unlike apply()!
You can also use numpy.correlate if you reverse the kernel.
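A minimal sketch of the numpy.correlate variant, assuming the same zero-padded kernel as in wma above (the function names wma_convolve and wma_correlate are just for illustration). The only change is that the kernel is reversed before it is passed in.

```python
import numpy as np

def build_kernel(period):
    # Same one-sided, linearly weighted kernel as in wma() above.
    kernel = np.arange(period, 0, -1)
    return np.concatenate([np.zeros(period - 1), kernel / kernel.sum()])

def wma_convolve(arr, period):
    return np.convolve(arr, build_kernel(period), 'same')

def wma_correlate(arr, period):
    # Correlation does not flip the kernel, so flip it by hand.
    return np.correlate(arr, build_kernel(period)[::-1], 'same')

arr = np.arange(11, dtype=float)
a = wma_convolve(arr, 4)
b = wma_correlate(arr, 4)
print(np.allclose(a, b))
```

This assumes real-valued input that is at least as long as the kernel (2 * period - 1 elements); I have not checked the shorter-input case.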
If data is a pandas DataFrame or Series and you want to compute the WMA over the rows, you can do it using
wma = data[::-1].cumsum().sum() * 2 / data.shape[0] / (data.shape[0] + 1)
If you want a rolling WMA of window length n, use
data.rolling(n).apply(lambda x: x[::-1].cumsum().sum() * 2 / n / (n + 1))
since n = x.shape[0] inside each window. This works because the sum of the reversed cumulative sums counts the most recent value n times, the one before it n - 1 times, and so on, while n * (n + 1) / 2 is the sum of those weights. Note that this solution might be a bit slower than the one by Sander van den Oord, but you don't have to worry about the weights.
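If you want to convince yourself that the cumsum trick really is the linearly weighted average, here is a small check against explicit weights 1..n. The sample data and variable names are made up for illustration; I pass raw=True so each window arrives as a NumPy array, but the original call without it should behave the same.

```python
import numpy as np
import pandas as pd

data = pd.Series([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])  # made-up values
n = 3

# The cumsum formulation from the answer above.
cumsum_wma = data.rolling(n).apply(
    lambda x: x[::-1].cumsum().sum() * 2 / n / (n + 1), raw=True
)

# Explicit linear weights: oldest value gets 1, newest gets n.
weights = np.arange(1, n + 1)
explicit_wma = data.rolling(n).apply(
    lambda x: np.dot(x, weights) / weights.sum(), raw=True
)

print(np.allclose(cumsum_wma.iloc[n - 1:], explicit_wma.iloc[n - 1:]))

# Whole-series version: a single number for a Series.
whole = data[::-1].cumsum().sum() * 2 / data.shape[0] / (data.shape[0] + 1)
whole_weights = np.arange(1, len(data) + 1)
print(np.isclose(whole, np.dot(data, whole_weights) / whole_weights.sum()))
```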
No, there is no implementation of that exact algorithm. I created a GitHub issue about it here:
https://github.com/pydata/pandas/issues/886
I'd be happy to take a pull request for this; the implementation should be straightforward Cython coding and can be integrated into pandas.stats.moments.