How to calculate standard deviation with streaming inputs?

You can use the formula $\sigma = \sqrt{\overline{x^2}-(\bar x)^2}=\sqrt{\frac {\sum x^2}N-\left(\frac {\sum x}N\right)^2}$. Each sum, along with the count $N$, can be accumulated as the data comes in. Note that this is the population standard deviation, since it divides by $N$. The disadvantage compared to averaging the data first and subtracting the average from each item is that you are more prone to overflow and loss of significance, but mathematically the two are equivalent.
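As a minimal sketch of the formula above, the accumulator below keeps only $N$, $\sum x$ and $\sum x^2$. The class name `RunningSums` and its methods `add` and `std` are made up for illustration, and clamping the variance at zero is an added guard against rounding error, not part of the formula itself.

```python
import math


class RunningSums:
    """Accumulates N, sum(x) and sum(x^2) so sigma is available at any time."""

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_x2 = 0.0

    def add(self, x):
        # Update the three running totals with one new observation.
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x

    def std(self):
        # Population standard deviation: sqrt(mean of squares - square of mean).
        if self.n == 0:
            return float('nan')
        mean = self.sum_x / self.n
        # Rounding can push the difference slightly below zero, so clamp it
        # (an assumption added here to keep sqrt from failing).
        variance = max(self.sum_x2 / self.n - mean * mean, 0.0)
        return math.sqrt(variance)


acc = RunningSums()
for x in [2, 4, 4, 4, 5, 5, 7, 9]:
    acc.add(x)
print(acc.std())  # 2.0
```

When the values are large relative to their spread, the two terms under the square root are nearly equal and most significant digits cancel, which is the loss of significance mentioned above.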


To add to the accepted answer, a more numerically stable approach (due to Welford and popularized by Knuth) is to keep track of the running mean $\bar{x}$ and the running sum of squared deviations $\sum (x-\bar{x})^2$ (called M2 in the algorithm below).

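The following is copied from this Wikipedia page, which is worth a read: https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance. Note that it returns the sample variance (dividing by $n-1$), so take the square root of the result to get the standard deviation.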

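def online_variance(data):
    """Return the sample variance of an iterable in a single pass."""
    n = 0        # number of items seen so far
    mean = 0.0   # running mean
    M2 = 0.0     # running sum of squared deviations from the mean

    for x in data:
        n += 1
        delta = x - mean          # deviation from the old mean
        mean += delta / n
        M2 += delta * (x - mean)  # uses both the old and the updated mean

    if n < 2:
        # The sample variance is undefined for fewer than two items.
        return float('nan')
    # Take math.sqrt of this value to get the sample standard deviation.
    return M2 / (n - 1)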
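The function above consumes a whole iterable before returning. For a true stream, where you want the standard deviation after each new value, the same update can be wrapped in a small object. This is only a sketch: the class name `RunningStats` and the methods `push`, `sample_std` and `population_std` are hypothetical names, not taken from the Wikipedia page.

```python
import math


class RunningStats:
    """Keeps n, the running mean and M2, updated one value at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def push(self, x):
        # Same update step as in online_variance above.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def sample_std(self):
        # Divides by n - 1; undefined for fewer than two values.
        if self.n < 2:
            return float('nan')
        return math.sqrt(self.m2 / (self.n - 1))

    def population_std(self):
        # Divides by n; matches the sigma formula in the first answer.
        if self.n < 1:
            return float('nan')
        return math.sqrt(self.m2 / self.n)


stats = RunningStats()
for x in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.push(x)
    # stats.sample_std() could be read here after every value.
print(stats.population_std(), stats.sample_std())
```

Only three numbers are stored regardless of how long the stream is, so memory use stays constant.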

Tags:

Statistics