How to gain an intuition of the affine function's definition?

Let's use another definition of an affine function.

A function ${f}\colon \mathbb R^n \to \Bbb R^m$ is called affine if and only if $$f(\mathbf{x}) = \mathbf{A}\mathbf{x} + \mathbf{b}$$ for some $\mathbf{A} \in \Bbb R^{m \times n}$ and $\mathbf{b} \in \Bbb R^m$.

Let $f\colon \Bbb R^n \to \Bbb R^m$ be an affine function and $\alpha,\beta \in \Bbb R$. Then \begin{align*} f(\alpha \cdot \mathbf{x} + \beta \cdot \mathbf{y}) &= \mathbf{A}(\alpha \mathbf{x} + \beta \mathbf{y}) + \mathbf{b}\\&=\alpha \mathbf{A}\mathbf{x} + \beta\mathbf{A}\mathbf{y} + \mathbf{b}\\ &\\ \alpha\cdot f(\mathbf{x}) + \beta\cdot f(\mathbf{y}) &= \alpha(\mathbf{A}\mathbf{x}+\mathbf{b}) + \beta(\mathbf{A}\mathbf{y} + \mathbf{b})\\ &=\alpha\mathbf{A}\mathbf{x}+\alpha\mathbf{b} + \beta\mathbf{A}\mathbf{y}+\beta\mathbf{b}\\ &=\alpha\mathbf{A}\mathbf{x}+ \beta\mathbf{A}\mathbf{y}+(\alpha+\beta)\mathbf{b} \end{align*}

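So for an affine function we require, in general, that $\alpha+\beta = 1$ for the relation $f(\alpha\mathbf{x}+\beta\mathbf{y}) = \alpha f(\mathbf{x}) + \beta f(\mathbf{y})$ to hold: the two sides differ by $(\alpha+\beta-1)\mathbf{b}$, which vanishes for every $\mathbf{b}$ only when $\alpha+\beta=1$. In other words, the form $f(\mathbf{x}) = \mathbf{A}\mathbf{x}+\mathbf{b}$ agrees with the definition via this relation, provided the relation is only required for $\alpha + \beta = 1$.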
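A quick numerical check can make this concrete. The sketch below is only an illustration: the matrix `A`, the vector `b`, the points `x`, `y` and the helper names are arbitrary choices made for this example, and exact rational arithmetic is used so that equality can be tested directly.

```python
from fractions import Fraction

def mat_vec(A, x):
    # Matrix-vector product, with A given as a list of rows.
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def affine(A, b, x):
    # f(x) = A x + b
    return [ax + bi for ax, bi in zip(mat_vec(A, x), b)]

def combo(alpha, x, beta, y):
    # alpha * x + beta * y, componentwise
    return [alpha * xi + beta * yi for xi, yi in zip(x, y)]

# Arbitrary example data (an assumption of this sketch): f maps R^2 to R^3.
A = [[2, 1], [0, 3], [1, -1]]
b = [1, -2, 5]
x = [1, 2]
y = [4, -1]

def commutes(alpha, beta):
    # Does f(alpha x + beta y) equal alpha f(x) + beta f(y)?
    lhs = affine(A, b, combo(alpha, x, beta, y))
    rhs = combo(alpha, affine(A, b, x), beta, affine(A, b, y))
    return lhs == rhs

ok_when_sum_is_one = commutes(Fraction(1, 4), Fraction(3, 4))  # alpha + beta = 1
ok_when_sum_is_five = commutes(2, 3)                           # alpha + beta = 5
print(ok_when_sum_is_one, ok_when_sum_is_five)
```

With $\mathbf{b} \neq \mathbf{0}$ the second check fails, since the two sides differ by $(\alpha+\beta-1)\mathbf{b} = 4\mathbf{b}$.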


This is a nice question. My answer lies more on the abstract side. In a nutshell, affine maps are for affine spaces the exact counterpart of linear maps for vector spaces. This point of view requires introducing the concept of an "affine space".

Reference: Gallier's notes on affine geometry, which I strongly recommend on this subject.

In the following, let $\mathbb{K}$ denote a scalar field (usually $\mathbb{R}$ or $\mathbb{C}$).

Definition. An affine space is a triple $(A, V,+)$ where $A$ is a set, $V$ is a $\mathbb{K}$-vector space and $+\colon A\times V\to A$ satisfies the following axioms (analogous to the axioms of a group action):

  1. For any $p\in A$, one has $p+\vec 0 =p$.
  2. For any $p, q\in A$ there exists a unique vector $\vec v\in V$ such that $p+\vec v=q$. This vector is denoted by $\vec v=q-p$.
  3. For any $\vec v, \vec w\in V$, and for any $p\in A$, one has $(p+\vec v)+\vec w=p+(\vec v+\vec w).$ This is equivalent to requiring that, for any three points $p_1, p_2, q\in A$, one has $p_1-p_2=(p_1-q)+(q-p_2).$

The elements of $A$ are called points, those of $V$ are called vectors. Any vector space is an affine space, and conversely, fixing an arbitrary point $o\in A$ and identifying it with the null vector $\vec 0$, one can endow $A$ with the same vector space structure as $V$. That is why the two concepts are often confused.

In a vector space, one has the internal composition law of linear combination. In an affine space, linear combinations make no sense a priori. But one can define a notion of weighted sum (or barycentric combination, or also affine combination) as follows.

Let $p_0,\ldots, p_n\in A$ be points and let $w_0,\ldots, w_n\in \mathbb{K}$ (masses or charges) be such that $w_0+\ldots +w_n=1$. Then there exists a unique point $q$ such that, for any choice of $o\in A$, $$\tag{1} \sum_{j=0}^n w_j(p_j-o) = q-o.$$ One thus defines $q=\sum_{j=0}^n w_jp_j$.

Proof. One needs to check that (1) is covariant with respect to a change of origin, that is, that if one considers another origin $o'$, equation (1) retains the same form. This is a consequence of property 3 and uses in an essential way the fact that the weights sum to $1$. By property 3, $p_j-o=(p_j-o')+(o'-o)$, so for the left-hand side of (1) one has $$ \sum_{j=0}^n w_j (p_j-o)=\sum_{j=0}^n w_j(p_j-o')+\sum_{j=0}^n w_j(o'-o)=\sum_{j=0}^n w_j(p_j-o')+(o'-o),$$ and for the right-hand side $$ q-o=(q-o')+(o'-o), $$ so a change of origin produces the same change on both sides of equation (1). Therefore $$\sum_{j=0}^n w_j(p_j-o')=q-o', $$ as claimed. $\square$
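The origin independence can also be checked on a small example. The following is a minimal sketch, assuming the affine space is $\mathbb{R}^2$ with points stored as tuples; the helper names `sub`, `translate` and `weighted_sum`, as well as the sample points and weights, are hypothetical choices for illustration.

```python
from fractions import Fraction

def sub(p, o):
    # The vector p - o, pointing from o to p.
    return tuple(pi - oi for pi, oi in zip(p, o))

def translate(o, v):
    # The point o + v.
    return tuple(oi + vi for oi, vi in zip(o, v))

def weighted_sum(points, weights, origin):
    # Form sum_j w_j (p_j - origin), then move back from the origin.
    vectors = [sub(p, origin) for p in points]
    total = tuple(
        sum(w * v[k] for w, v in zip(weights, vectors))
        for k in range(len(origin))
    )
    return translate(origin, total)

points = [(0, 0), (4, 0), (0, 8)]
unit_weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]  # sum to 1
other_weights = [1, 1, 1]                                        # sum to 3

# Weights summing to 1: the result does not depend on the origin.
q_a = weighted_sum(points, unit_weights, (0, 0))
q_b = weighted_sum(points, unit_weights, (7, -3))

# Weights not summing to 1: the result moves with the origin.
r_a = weighted_sum(points, other_weights, (0, 0))
r_b = weighted_sum(points, other_weights, (7, -3))
print(q_a == q_b, r_a == r_b)
```

Only the first pair agrees, which is the computational counterpart of the role played by $w_0+\ldots+w_n=1$ in the proof.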

One can carry the analogy between vector spaces and affine spaces a step further. In vector spaces, the natural maps to consider are linear maps, which commute with linear combinations. Similarly, in affine spaces the natural maps to consider are affine maps, which commute with weighted sums of points. This is exactly the kind of map introduced by the textbook definition quoted by the OP.
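As a sanity check, here is a short worked computation under the assumption that the affine space is $\mathbb{R}^n$ and the map has the coordinate form $f(\mathbf{x})=\mathbf{A}\mathbf{x}+\mathbf{b}$ (with $\mathbf{A}$ a matrix, not the point set $A$ above). For points $p_0,\ldots,p_n$ and weights with $w_0+\ldots+w_n=1$,

$$f\Big(\sum_{j=0}^n w_j p_j\Big) = \mathbf{A}\sum_{j=0}^n w_j p_j + \mathbf{b} = \sum_{j=0}^n w_j \mathbf{A}p_j + \Big(\sum_{j=0}^n w_j\Big)\mathbf{b} = \sum_{j=0}^n w_j\,(\mathbf{A}p_j+\mathbf{b}) = \sum_{j=0}^n w_j f(p_j).$$

The second equality is where the condition on the weights is used: it lets one write $\mathbf{b}$ as $\big(\sum_j w_j\big)\mathbf{b}$. The case $n=1$ recovers the relation $f(\alpha\mathbf{x}+\beta\mathbf{y})=\alpha f(\mathbf{x})+\beta f(\mathbf{y})$ with $\alpha+\beta=1$.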