ACF confidence intervals in R vs. Python: why are they different?
It has been shown that the sample autocorrelation coefficient r(k) approximately follows a Gaussian distribution; the two libraries differ only in the variance Var(r(k)) they assign to it.

As you have found, in R the variance is simply calculated as Var(r(k)) = 1/N for all k, while in Python it is calculated using Bartlett's formula,

Var(r(k)) = (1/N) * (1 + 2*(r(1)^2 + r(2)^2 + ... + r(k-1)^2)).

This results in the first increasing, then flattening confidence band shown above.
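To make the difference concrete, here is a minimal sketch (my own helper, not library code) that computes both band half-widths from a vector of sample autocorrelations, following the two formulas above:

import numpy as np
from scipy import stats

def acf_band_half_widths(r, nobs, alpha=0.05):
    # r is an array of sample autocorrelations r(0), r(1), ..., r(K)
    z = stats.norm.ppf(1 - alpha / 2.0)

    # R-style band: Var(r(k)) = 1/N at every lag, so the width is constant
    half_width_r = np.full(len(r), z / np.sqrt(nobs))

    # Bartlett-style band: Var(r(k)) = (1/N) * (1 + 2*(r(1)^2 + ... + r(k-1)^2))
    var_bartlett = np.ones(len(r)) / nobs
    var_bartlett[0] = 0.0
    var_bartlett[2:] *= 1 + 2 * np.cumsum(r[1:-1] ** 2)
    half_width_bartlett = z * np.sqrt(var_bartlett)

    return half_width_r, half_width_bartlett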
Source code of the ACF variance calculation in Python (statsmodels):
varacf = np.ones(nlags + 1) / nobs
varacf[0] = 0
varacf[1] = 1. / nobs
varacf[2:] *= 1 + 2 * np.cumsum(acf[1:-1]**2)
These two distinct formulas are based on different assumptions. The former assumes an i.i.d. process, so that r(k) = 0 for all k != 0, while the latter assumes an MA process of order k-1, whose ACF cuts off after lag k-1 (i.e. r(j) = 0 for j >= k).
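To see those assumptions in action, here is a small simulation sketch (assuming numpy and statsmodels are installed; the series and the MA coefficient are made up for illustration). For white noise the Bartlett band stays close to the R-style 1/N band, while for an MA(1) process it widens after lag 1 and then flattens:

import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(0)
n = 500

# White noise: r(k) ~ 0 for k != 0, so Bartlett's variance stays near 1/N
wn = rng.standard_normal(n)

# MA(1): r(1) != 0, so Bartlett's variance jumps at lag 2 and then levels off
eps = rng.standard_normal(n + 1)
ma1 = eps[1:] + 0.8 * eps[:-1]

for name, x in [("white noise", wn), ("MA(1)", ma1)]:
    r, confint = acf(x, nlags=10, alpha=0.05, fft=True)
    half_width = confint[:, 1] - r  # z * sqrt(varacf) at each lag
    print(name, np.round(half_width[1:5], 3))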
Not really an answer to the theory part of this (which might be better on CrossValidated), but maybe useful ... ?
If you go to the documentation page for statsmodels.tsa.stattools.acf, it gives you an option to browse the source code. The relevant code there is:
varacf = np.ones(nlags + 1) / nobs
varacf[0] = 0
varacf[1] = 1. / nobs
varacf[2:] *= 1 + 2 * np.cumsum(acf[1:-1]**2)
interval = stats.norm.ppf(1 - alpha / 2.) * np.sqrt(varacf)
confint = np.array(lzip(acf - interval, acf + interval))
In contrast, the R source code for plot.acf shows
clim0 <- if (with.ci) qnorm((1 + ci)/2)/sqrt(x$n.used) else c(0, 0)
where ci is the confidence level (default = 0.95), so the band half-width is the same at every lag.
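For comparison, R's fixed band can be reproduced in Python like this (a one-off sketch; n and ci are placeholders for your sample size and confidence level):

from scipy import stats
import numpy as np

ci = 0.95   # confidence level, plot.acf's default
n = 500     # x$n.used in R, i.e. the number of observations used
clim0 = stats.norm.ppf((1 + ci) / 2) / np.sqrt(n)  # same half-width at every lag
print(clim0)  # about 0.0877 for n = 500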