Intuitive explanation for dividing by n-1 when calculating sample variance?

At an elementary level it is possible to give a couple of "reasons" for dividing by $n - 1$. (At higher levels there are rationales that involve discussions about $n$-dimensional vector spaces, but let's not go there now.)

"Reason 1." Suppose you are finding the sample variance of observations 2, 3, 1, 6. Then you computations might look like this:

        x   x - 3  square
        -----------------
        2    -1     1
        3     0     0
        1    -2     4
        6     3     9
        -----------------
   Tot 12     0    14
  Mean  3          Var = 14/3

If somehow one of the four rows between the dashed lines got smudged and was unreadable, you would be able to reconstruct it from the rest of the information. (2 + 3 + 'smudge' + 6 = 12; what is 'smudge'? Etc.) So in some sense, given the structure of the computation, you have only $n - 1 = 3$ rows that contain information. The jargon for that is that you have "degrees of freedom $DF = n - 1$."
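
Here is a minimal sketch of that reconstruction in Python, using the same four toy observations as in the table above: the deviations from the sample mean must sum to zero, so any one of them (and hence any one row) is determined by the other $n - 1$.

    # Deviations from the sample mean always sum to zero, so any one of them
    # is determined by the remaining n - 1 (one "degree of freedom" is lost).
    x = [2, 3, 1, 6]
    n = len(x)
    mean = sum(x) / n                      # 3.0

    dev = [xi - mean for xi in x]          # [-1.0, 0.0, -2.0, 3.0]
    print(sum(dev))                        # 0.0 -- the constraint that costs one df

    # Suppose the third row got smudged: reconstruct it from the readable rows.
    readable = dev[:2] + dev[3:]           # [-1.0, 0.0, 3.0]
    smudge = -sum(readable)                # must be -2.0, since all four sum to 0
    print(smudge)

    # Sample variance with the df = n - 1 divisor, as in the table: 14/3
    print(sum(d ** 2 for d in dev) / (n - 1))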

"Reason 2." If you divide by $n - 1$ in the definition of the sample variance $S^2$, then $E(S^2) = \sigma^2.$ In statistical terminology this means "$S^2$ is an unbiased estimator of $\sigma^2.$" If you divided by $n$ instead, then you would have an estimator of the population variance that is too small.

Note: Dividing by $n - 1$ is pretty much agreed upon, but reputable authors in statistics and probability have proposed $n$, $n + 1$, and even $n + 2$ as divisors, each giving a rationale aimed at a particular objective. None of these alternative denominators has received wide acceptance. But these discussions confirm that it is not a stupid question to ask why we use $n - 1.$

*Addendum* (Jan 25, '16): I have just read a letter by Jeffrey S. Rosenthal (U. Toronto) in the December '15 issue of the IMS Bulletin, arguing that in elementary statistics courses it is OK to use $n$ as the denominator of the sample variance. Briefly, his view is based mainly on arguments involving mean square error (MSE). For example, with normal data, the MSE for estimating $\sigma^2$ is minimized by the denominator $n + 1$ instead of $n - 1.$ (See his letter on page 9 for details.)
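
A rough simulation sketch of that MSE claim; the normal population ($\sigma^2 = 4$) and sample size $n = 5$ below are arbitrary choices. With normal data, the divisor $n + 1$ shows a smaller mean square error for $\sigma^2$ than either $n - 1$ or $n$, even though only $n - 1$ is unbiased.

    import random

    random.seed(2)
    n, reps, sigma2 = 5, 200_000, 4.0        # arbitrary illustrative settings
    divisors = (n - 1, n, n + 1)
    mse = {d: 0.0 for d in divisors}

    for _ in range(reps):
        x = [random.gauss(0, 2) for _ in range(n)]
        m = sum(x) / n
        ss = sum((xi - m) ** 2 for xi in x)  # sum of squared deviations
        for d in divisors:
            mse[d] += (ss / d - sigma2) ** 2 / reps

    for d in divisors:
        print(d, round(mse[d], 2))   # expect roughly 8.0, 5.8, 5.3: the n + 1 divisor wins on MSE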

However, in more advanced courses (as in my Comment below), a penalty for changing from $n - 1$ would be minor confusion in getting confidence intervals for $\sigma^2$ and doing tests for $\sigma^2$ based on the sample variance, mainly because $\sum (X_i - \bar X)^2/\sigma^2 \sim \chi^2(n - 1).$
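
For instance, here is a sketch of the usual interval, using scipy's chi-square quantiles and made-up data: because $\sum (X_i - \bar X)^2/\sigma^2 \sim \chi^2(n - 1)$, a 95% confidence interval for $\sigma^2$ comes from dividing the sum of squared deviations by the upper and lower $\chi^2(n - 1)$ quantiles.

    from scipy.stats import chi2

    x = [2.1, 3.4, 1.7, 6.0, 4.2, 2.9]           # made-up sample for illustration
    n = len(x)
    m = sum(x) / n
    ss = sum((xi - m) ** 2 for xi in x)          # = (n - 1) * S^2

    lower = ss / chi2.ppf(0.975, df=n - 1)       # divide by the 97.5th percentile of Chisq(n - 1)
    upper = ss / chi2.ppf(0.025, df=n - 1)       # divide by the 2.5th percentile
    print(lower, upper)                          # 95% CI for sigma^2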


The sample variance is computed using deviations from the sample mean $\bar x$ instead of the population mean $\mu$, and this is the source of the bias.

Each deviation from $\bar x$ differs from the corresponding deviation from $\mu$ by $\bar x-\mu$, and when the squared deviations are accumulated this makes the computed variance too small on average.


Hint:

$$(x_i-\mu)^2=((x_i-\bar x)+(\bar x-\mu))^2=(x_i-\bar x)^2+2(x_i-\bar x)(\bar x-\mu)+(\bar x-\mu)^2.$$

When averaging over $i$, the double product in the middle vanishes, because the average of $(x_i-\bar x)$ is $\bar x-\bar x=0$, and you get

$$\sigma^2=\overline{(x_i-\mu)^2}=\overline{(x_i-\bar x)^2}+\overline{(\bar x-\mu)^2}=s^2+\frac{\sigma^2}{n},$$

where the first and last equalities hold on average (in expectation over repeated samples) and $s^2=\overline{(x_i-\bar x)^2}$ is the variance computed with divisor $n$.
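
Taking expectations and rearranging finishes the hint:

$$E(s^2)=\sigma^2-\frac{\sigma^2}{n}=\frac{n-1}{n}\,\sigma^2,$$

so the average of the squared deviations from $\bar x$ falls short of $\sigma^2$ by the factor $\frac{n-1}{n}$, and dividing the sum of squared deviations by $n - 1$ rather than $n$ removes the bias.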
