Regression to the mean - a simple question

Citing the Wikipedia article on regression toward the mean, which defines it as:

the phenomenon that arises if a random variable is extreme on its first measurement but closer to the mean or average on its second measurement and if it is extreme on its second measurement but closer to the average on its first

In this case, the sons are the second measurement and they are extreme (they are tall), so the first measurement (their fathers) will tend to be closer to the average. Because the sons are tall, their fathers will, on average, be shorter than them.
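One way to make this precise is a standard textbook model, which is an assumption and not something the definition above states. Suppose father height F and son height S are bivariate normal with the same mean μ, the same standard deviation σ, and correlation ρ with 0 < ρ < 1. The expected height of a father, given his son's height s, is then:

```latex
\begin{aligned}
\mathbb{E}[F \mid S = s] &= \mu + \rho\,\frac{\sigma}{\sigma}\,(s - \mu) = \mu + \rho\,(s - \mu), \\
\mathbb{E}[F \mid S = s] - s &= (\rho - 1)(s - \mu) < 0 \quad \text{for } s > \mu .
\end{aligned}
```

So under this model the fathers of tall sons are, on average, taller than μ but shorter than their sons. Only when ρ = 1 does the effect disappear.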

I assume that here "tall" does not mean merely "taller than average" but rather "significantly taller than average", which is why your reasoning fails. If that assumption is wrong, the problem needs more thought.


Francis Galton was the first to use "regression" in this sense. The idea may be clearer if you look at his original height data, shown in his 1875 chart below (taken from Wikipedia).

Taking tall parents as the top two quarters of the chart, you can see that the average heights of their children are pulled to the left of the major axis. In other words, on average the children of tall parents are shorter than their parents but taller than average children. (Galton illustrated this with what he called the locus of horizontal tangential points.)

Conversely, taking tall children as the two right-hand quarters of the chart, you can see that the average heights of their parents are pulled below the major axis. In other words, on average the parents of tall children are shorter than their children but taller than average parents. (Galton illustrated this with what he called the locus of vertical tangential points.)

[Figure: Galton's 1875 chart of parent and child heights (from Wikipedia)]
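A short simulation can reproduce both of the patterns Galton saw. This is a sketch under an assumed model, not Galton's data: parent and child heights are drawn as bivariate normal with the same mean and standard deviation and a correlation of 0.5. The values of `MEAN`, `SD` and `RHO` and the helper names `simulate_pairs` and `compare` are hypothetical. "Tall" means the top half, matching the two quarters of the chart.

```python
import random
import statistics

# Hypothetical model (an assumption, not Galton's actual data): parent and
# child heights are bivariate normal with equal mean and SD, correlation RHO.
MEAN, SD, RHO = 68.0, 2.5, 0.5  # illustrative values, in inches


def simulate_pairs(n, seed=0):
    """Return n (parent, child) height pairs drawn from the assumed model."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        parent = MEAN + SD * z1
        # The child inherits a fraction RHO of the parent's standardized
        # deviation; the sqrt term keeps the child's SD equal to SD.
        child = MEAN + SD * (RHO * z1 + (1 - RHO**2) ** 0.5 * z2)
        pairs.append((parent, child))
    return pairs


def compare(pairs):
    """Average heights in both directions, with 'tall' = above the median."""
    parents = [p for p, _ in pairs]
    children = [c for _, c in pairs]
    p_med = statistics.median(parents)
    c_med = statistics.median(children)
    tall_p = [(p, c) for p, c in pairs if p > p_med]  # tall parents
    tall_c = [(p, c) for p, c in pairs if c > c_med]  # tall children
    return {
        "tall parents": statistics.fmean(p for p, _ in tall_p),
        "their children": statistics.fmean(c for _, c in tall_p),
        "tall children": statistics.fmean(c for _, c in tall_c),
        "their parents": statistics.fmean(p for p, _ in tall_c),
        "overall mean": statistics.fmean(parents + children),
    }


if __name__ == "__main__":
    for name, value in compare(simulate_pairs(100_000)).items():
        print(f"{name:>15}: {value:.2f}")
```

With 0 < `RHO` < 1, the children of tall parents average between the overall mean and their parents' height, and the parents of tall children fall between the overall mean and their children's height. Setting `RHO` to 1 makes each child's height equal to its parent's, and the effect disappears.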