Why formulate continuity in terms of pre-images instead of images?

It's not nice to formulate it in terms of images. The behavior of images varies wildly between various continuous functions. Take the interval $(-10,10)$ and the functions $f_1(x) = x$ and $f_2(x) = \sin x$. Then the image of $(-10,10)$ under $f_1$ is of course $(-10,10)$, but the image of $(-10,10)$ under $f_2$ is $[-1,1]$. With one function you get an open set, with another you get a closed set. You can of course explore the other cases of half-open/half-closed or other phenomena. You can't make any meaningful statement about images of open sets. You can't even say anything for certain about local behavior (the images of "small" open sets) because there are plentiful locally constant functions.

Since images of open sets don't work, you might want to flip this on its head: what about images of closed sets? This doesn't work either. Consider $\mathbb{R}$, which is closed in itself, and the function $f_3(x) = \arctan x$. The image of $\mathbb{R}$ under $f_3$ is $\left(-\frac{\pi}{2},\frac{\pi}{2}\right)$, which is open and not closed in $\mathbb{R}$.

The beauty of pre-images is that the functions we intuitively regard as continuous have open sets as pre-images of open sets. Perhaps the closest you can get to a statement about images is that continuous functions map compact sets to compact sets. However, there are discontinuous functions which also do this: consider $[-1,1]$ and the function with $f_4(x) = 1$ if $x>0$, $f_4(0) = 0$ and $f_4(x) = -1$ if $x<0$. Then $f_4$ maps every compact set to a compact set (its image is always a subset of the finite set $\{-1,0,1\}$), but it is definitely not continuous in the usual topology.
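
As a quick check of the pre-image criterion on $f_4$ (the open set $\left(-\frac12,\frac12\right)$ is chosen here purely for illustration):

$$f_4^{-1}\left(\left(-\tfrac12,\tfrac12\right)\right) = \{x\in[-1,1] : f_4(x) = 0\} = \{0\},$$

and $\{0\}$ is not open in $[-1,1]$. So the jump at $0$, which the compact-image test cannot see, is caught immediately by pre-images.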


Consider $f\colon A\to B$. Changes in the domain "control" what happens to the image; the value of $x\in A$ is "causal" for the value $f(x)\in B$. Therefore you may think that going from domain to image is the natural way to define continuity. However, continuity is about how well we can predict the image $f(x)$ if we slightly modify $x$; that is, if only we can keep $x$ "under control" well enough (i.e., keep changes, influences, error, or whatever sufficiently small), then we also keep $f(x)$ under control (i.e., keep its change or error as small as necessary/desired). In this formulation it still sounds like something going from $A$ to $B$. But if we formulate that "desire" (for a metric space) as $|f(x)-f(x_0)|<\varepsilon$ and the control we impose on the domain side as $|x-x_0|<\delta$, then we arrive exactly at the epsilon-delta definition: For any $\varepsilon>0$ (as desired) there exists (i.e., we can control) $\delta>0$ such that $|x-x_0|<\delta$ implies $|f(x)-f(x_0)|<\varepsilon$. This is where the change of sides occurs: $\varepsilon$ is picked first and then depending on it we pick $\delta$.

So can we formulate this with images of open sets? No, because while the definition implies something about the image of an open ball, namely $f(B_\delta(x_0))\subseteq B_\varepsilon(f(x_0))$ (with $B$ denoting an open ball), the problem is that $\varepsilon$ is given before we pick our $\delta$. That is, we cannot cast this into "For every open ball $B_\delta(x_0)$, the image $f(B_\delta(x_0))$ has some property". Rewriting the inclusion as $B_\delta(x_0)\subseteq f^{-1}(B_\varepsilon(f(x_0)))$ we obtain the right formulation, i.e., the pre-image of an open ball (around a point $f(x_0)$ in the image) contains an open ball (around that point $x_0$); and in the generalization: The pre-image of an open set is open.
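
To see the order of quantifiers at work, here is a small worked example; the function $f(x)=2x$ on $\mathbb{R}$ is an assumption chosen only for illustration. Fix $x_0$ and $\varepsilon>0$. Then

$$f^{-1}\bigl(B_\varepsilon(f(x_0))\bigr) = \{x : |2x-2x_0|<\varepsilon\} = B_{\varepsilon/2}(x_0),$$

so the pre-image of the $\varepsilon$-ball contains (here it even equals) an open ball around $x_0$, and any $\delta\le\varepsilon/2$ works. Going the other way, $f(B_\delta(x_0)) = B_{2\delta}(2x_0)$ for every $\delta>0$, and this lies inside $B_\varepsilon(f(x_0))$ only once $\delta$ has been chosen small enough relative to $\varepsilon$. That illustrates why the condition is awkward to phrase as a property of every ball $B_\delta(x_0)$.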


You can formulate continuity in terms of images. A function $f:X\to Y$ is continuous at a point $x$ if and only if for every neighborhood $U$ of $f(x)$ there exists a neighborhood $V$ of $x$ such that $f(V)\subseteq U$.
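
One way to relate this to the pre-image definition (a sketch, using only the general set-theoretic fact below): for any sets $V\subseteq X$ and $U\subseteq Y$,

$$f(V)\subseteq U \iff V\subseteq f^{-1}(U).$$

So the condition says that $f^{-1}(U)$ contains a neighborhood of $x$ whenever $U$ is a neighborhood of $f(x)$. If this holds at every $x\in X$ and $U$ is open, then every point of $f^{-1}(U)$ has a neighborhood inside $f^{-1}(U)$, hence $f^{-1}(U)$ is open.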