Mean vs. Median: When to Use?

Almost all analytic calculations on sets of data are more natural in terms of the mean than the median. For example, the $z$-test for the significance of a discrepancy relative to the null hypothesis uses the sample mean and the sample standard deviation (the square root of the unbiased variance estimate).
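As a rough illustration of how the mean and standard deviation enter such a test, here is a minimal sketch of a one-sample $z$ statistic. The function name `z_statistic`, the sample values, and the null value `mu0` are all made up for illustration; the sketch also assumes the sample standard deviation is an acceptable stand-in for the population value.

```python
import math
import statistics

def z_statistic(sample, mu0):
    """Standardized distance between the sample mean and a hypothesized mean mu0."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # statistics.stdev uses the n - 1 denominator (unbiased variance estimate)
    sd = statistics.stdev(sample)
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error

# Hypothetical measurements, tested against a hypothetical null value of 5.0
sample = [5.1, 4.9, 5.3, 5.2, 5.0, 5.4, 4.8, 5.1]
z = z_statistic(sample, mu0=5.0)
print(f"z = {z:.2f}")
```

Nothing in this calculation has a comparably simple counterpart for the median, which is part of why the mean dominates analytic work.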

The median, and particularly the difference between the mean and the median, is useful for characterizing how "skewed" the data are (although the skewness, which is based on the third moment about the mean, also serves that purpose).
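Both indicators can be computed in a few lines. The sketch below uses made-up data and a hypothetical helper `skewness` that takes the third central moment divided by the cube of the population standard deviation, which is one common convention among several.

```python
import statistics

def skewness(data):
    """Third central moment divided by the cubed population standard deviation."""
    n = len(data)
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum((x - mean) ** 3 for x in data) / (n * sd ** 3)

# Made-up data with a single large value in the right tail
data = [1, 2, 2, 3, 3, 3, 4, 20]

gap = statistics.fmean(data) - statistics.median(data)
g1 = skewness(data)
print(f"mean - median = {gap}, skewness = {g1:.2f}")
```

For this right-skewed sample both numbers come out positive; for a symmetric sample such as `[1, 2, 3, 4, 5]` both are zero.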

The real use of the median comes when the data set may contain extreme outliers (perhaps due to errors in early processing of the sample numbers, or a serious bias in the sample gathering procedure). Then describing the distribution in terms of quartiles (with the median dividing the second from the third quartile) can be more informative than quoting $\mu$ and $\sigma$.
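To see the effect, compare a clean sample with a copy in which one value has been corrupted. The data here are invented, and the "entry error" (60 recorded as 6000) is a hypothetical stand-in for the kind of processing mistake described above.

```python
import statistics

clean = [48, 50, 51, 52, 53, 55, 56, 58, 60]
# Hypothetical processing error: the last value, 60, is recorded as 6000
corrupted = clean[:-1] + [6000]

for label, data in (("clean", clean), ("corrupted", corrupted)):
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3
    q1, med, q3 = statistics.quantiles(data, n=4)
    print(f"{label:>9}: mean={mean:.1f} sd={sd:.1f} Q1={q1} median={med} Q3={q3}")
```

The single bad value moves the mean and standard deviation by orders of magnitude, while the three quartile cut points do not change at all in this example.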


The median is particularly handy for describing data with significant skew or a long tail. For example, if we looked at incomes, we would find a small number of rock stars, corporate executives, and hedge-fund managers each taking home multi-million-dollar salaries. These outliers carry more weight in the calculation of the mean than in that of the median, so mean income is higher than median income. The median income is closer to what we associate with the middle class.

Means are great when the distribution has been well studied and is well understood (e.g., normally distributed). Then the mean and standard deviation tell us just about everything we care to know.
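For a normal distribution this can be made concrete: given only $\mu$ and $\sigma$, the median, the quartiles, and any interval probability follow directly. The sketch below uses Python's `statistics.NormalDist` with arbitrary illustrative parameters ($\mu = 100$, $\sigma = 15$), which are not taken from any real data set.

```python
from statistics import NormalDist

# Arbitrary illustrative parameters
dist = NormalDist(mu=100.0, sigma=15.0)

# For a normal distribution the median coincides with the mean
median = dist.inv_cdf(0.5)

# Quartiles follow from mu and sigma alone
q1 = dist.inv_cdf(0.25)
q3 = dist.inv_cdf(0.75)

# Probability of falling within one standard deviation of the mean
within_one_sigma = dist.cdf(115.0) - dist.cdf(85.0)

print(f"median={median:.1f} Q1={q1:.1f} Q3={q3:.1f} P(|x-mu|<sigma)={within_one_sigma:.3f}")
```

When the data are not close to normal, these derived quantities can be badly off, which is where reporting the median and quartiles directly becomes more useful.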