Flips in the Minimal Model Program
The goal of the MMP is to find a representative in every birational class that may reasonably be considered nice.
For curves the answer is clear: there is a unique smooth projective representative, and by any consideration that is the one that represents the class best.
For surfaces this gets complicated, as there are non-trivial birational maps between smooth projective surfaces. However, since any such map is a combination of blow-ups and blow-downs, it is relatively easy to keep order.
Observe that a $(-1)$-curve is usually defined as a curve isomorphic to $\mathbb P^1$ having self-intersection number $(-1)$. A perhaps better definition that points to higher dimensional equivalents is that a $(-1)$-curve is a curve isomorphic to $\mathbb P^1$ having an intersection number $(-1)$ against $K_X$ where $X$ is the surface on which the curve lives. These two definitions are equivalent by the adjunction formula, but the latter one has the advantage that it does not depend on $X$ being a surface.
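To spell out the equivalence, here is the adjunction computation for a smooth curve $C \cong \mathbb P^1$ on a smooth surface $X$ (the notation $g(C)$ for the genus is introduced only for this sketch):

$$2g(C) - 2 = C^2 + K_X \cdot C, \qquad g(C) = 0 \;\Longrightarrow\; C^2 + K_X \cdot C = -2.$$

So $C^2 = -1$ if and only if $K_X \cdot C = -1$.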
Let's take a look at a minimal model of a surface. Why do we pick that as our representative? In some sense there might be other ways to pick a representative, but one might argue that a minimal model is the "simplest" model that is still smooth (make a note of this, we will realize later that here smoothness is actually something else in disguise). Castelnuovo's theorem about blowing down $(-1)$-curves says that we can "get rid of them", so why not do that. Let's contract everything we can. It can be proven relatively easily that contracting a curve that is not a $(-1)$-curve will lead to singular points.
OK, so the strategy is to contract as much stuff as we can and hope that this way we get a reasonable theory. The second definition of a $(-1)$-curve suggests that the way to find what we can contract is through $K_X$, that is, the things that can be contracted without causing too much trouble are $K_X$-negative. In fact there is a more precise way to say this, but let me not get into technical details now.
So, either this way or already for surfaces one realizes that what makes a minimal model tick is that $K_X$ is nef, that is, intersecting it with any proper curve gives a non-negative number. So, now you say that $\mathbb P^2$ is a minimal surface, but its $K_X$ is anti-ample, so this is pretty far from being nef. Yes, in the modern terminology of the MMP, $\mathbb P^2$ is actually not minimal. The claim is that every variety is birational to one that is a series of Fano fiber spaces over a minimal variety.
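As a quick check of the claim about $\mathbb P^2$, write $H$ for the class of a line (a notation not used above). Then for a curve $C \subset \mathbb P^2$ of degree $d$:

$$K_{\mathbb P^2} = -3H, \qquad K_{\mathbb P^2} \cdot C = -3d < 0.$$

So every curve on $\mathbb P^2$ is $K$-negative, which is as far from nef as one can get.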
Perhaps I should mention an interesting example here, I think it is due to Iitaka, or someone from his school: Take a $3$-dimensional abelian variety $A$ and mod out by the involution $(-1)\cdot$. Resolve the resulting $64$ double points and call the result $X$. Then it is relatively easy to prove that $X$ is not birational to a smooth projective variety with a nef canonical bundle. At the time this was thought of as proof that minimal models did not exist in higher dimensions, but then Reid and Mori realized that it only means that minimal models need not be smooth. (N.B.: The above accepted answer of David starts by saying that a minimal model should be non-singular. He says it is too ambitious, but it may not be absolutely clear to everyone that this means impossible--as stated. And I promised a comment about why $2$-dimensional minimal models are smooth. The thing is, minimal models have no worse than terminal singularities. It turns out that terminal singularities are smooth in codimension $2$, so in particular a $2$-dimensional terminal singularity is actually smooth. So, one could argue that even minimal models of surfaces have terminal singularities, that is, that's the natural class of singularities for a minimal model. It just so happens that in dimension $2$, these singularities are indistinguishable from smooth points.)
Anyway, so we want $K_X$ to be nef and to obtain this we want to contract curves that are $K_X$-negative. It so happens that this can be done, but this is the result of some very deep results by Mori, Kollár, Kawamata, Reid, Shokurov and others. Now, already in dimension $2$ we get more than just blowing down $(-1)$-curves: the ruling map of a ruled surface and $\mathbb P^2$ mapping to a point are both contractions of $K_X$-negative curves. In general this is how we might end up with a Fano fibre space. It is possible that the contraction of a $K_X$-negative curve is not birational, but that's OK. This really means that the deformations of that curve cover the entire $X$; in particular $X$ is uniruled and will never have a minimal model in the sense of $K_X$ being nef.
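For the ruled surface case, the $K_X$-negativity of a fibre can be checked directly. A sketch, assuming $F \cong \mathbb P^1$ is a fibre of the ruling on a smooth surface $X$ (the letter $F$ is introduced only here): two distinct fibres are disjoint, so $F^2 = 0$, and the adjunction formula gives

$$-2 = 2g(F) - 2 = F^2 + K_X \cdot F = K_X \cdot F.$$

So the fibres are $K_X$-negative curves that sweep out all of $X$, and the ruling map contracts them.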
If the contraction is birational, then there are still two possibilities: it is a divisorial contraction or a small contraction. The former means that the exceptional set is a divisor, the latter that it is smaller than that. Now, already the former can bring in singularities, but they are not so bad and the program can continue.
When the contraction is small, there are several problems. Simply put, the singularities become too bad. The badness mainly manifests itself in the singularity being non $\mathbb Q$-Gorenstein, that is, $K$ will no longer be $\mathbb Q$-Cartier, which is otherwise needed. And it's not that this may be so, but it will be so for certain: if the target had a $\mathbb Q$-Cartier $K$, it could be pulled back, at least numerically (or some power could be pulled back). The pull-back would have to agree with $K$ upstairs since the map is an isomorphism in codimension $1$. However, a pull-back is necessarily trivial on the fiber of the map, but the fiber was chosen to be $K$-negative. This is a contradiction, so the target cannot have a $\mathbb Q$-Cartier canonical sheaf.
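The argument above can be written out as a short computation. A sketch, assuming $f: X \to Y$ is the small contraction, $C$ is a contracted curve with $K_X \cdot C < 0$, and $m > 0$ is an integer such that $mK_Y$ is Cartier (the symbols $C$ and $m$ are introduced only for this sketch). Since $f$ is an isomorphism in codimension $1$, we would have $mK_X \sim f^*(mK_Y)$, and then by the projection formula

$$m\,(K_X \cdot C) = f^*(mK_Y) \cdot C = mK_Y \cdot f_*C = 0,$$

because $f_*C = 0$ for a contracted curve. This contradicts $K_X \cdot C < 0$.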
Flips were invented to remedy this situation: the original reason for wanting to contract was to "get rid" of this $K$-negative curve, so let's get rid of it a different way. Being $K$-negative is really a curvature condition and it says something about the normal bundle of the curve inside the variety. (OK, you have to adjust this slightly for singularities, but I am not writing a precise paper here). So, the idea of the flip is this: let's change the normal bundle of the curve. So, let's "cut it out" and put it back with the opposite normal bundle, so in a "flipped" way. (Remark: this is the $3$-dimensional picture, in higher dimensions it's not just curves that get flipped, but this may be better delegated to another place).
I guess I wrote a whole bunch of things just to say that and some people have said similar things already, but perhaps this little essay gives some new insight.
To answer your question about whether a similar construction exists elsewhere, the answer is "yes". A "flip" is like a "surgery" in topology. But I am no expert on that. Actually, just to include a disclaimer: I am not claiming to be an expert on flips either.
I am also just learning this stuff, and I'm partly writing this out for my own benefit. Experts, please correct and up/down vote as appropriate!
The goal of the minimal model program is to give a standard, nonsingular, representative for each birational class of algebraic variety. As stated, this goal is too ambitious, but it will help us to understand the minimal model program if we think of it as a partially successful attempt at this goal.
Let $X$ be a compact, smooth algebraic variety of dimension $n$. Let $\omega$ be the top wedge power of the holomorphic cotangent bundle. Then the vector space, $V:=H^0(X, \omega)$, of holomorphic $n$-forms on $X$ is a birational invariant of $X$. This means that we should be able to see $V$ from just the field of meromorphic functions on $X$; here is a sketch of how to do that. So we get a rational map $X \to \mathbb{P}(V^{*})$ by the standard recipe. More generally, we can replace $\mathbb{P}(V^{*})$ with Proj of the ring $\bigoplus_{m \geq 0} H^0(X, \omega^{\otimes m})$. This is called the canonical ring; you may have heard of the recent breakthrough in proving that the canonical ring is finitely generated. We can map $X$ rationally to this Proj; the image is called the log model (it is more commonly called the canonical model). This is a partial success: it is a canonical, birational construction, but it may not be birational to $X$ and may not be smooth.
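In coordinates, the "standard recipe" looks roughly as follows. This is only a sketch, assuming $V \neq 0$ and choosing a basis $s_0, \dots, s_N$ of $V$ (the basis and the name $\varphi$ are introduced only here):

$$\varphi: X \dashrightarrow \mathbb{P}^N, \qquad x \mapsto [\,s_0(x) : s_1(x) : \cdots : s_N(x)\,].$$

The map is defined away from the common zero locus of the $s_i$. The ratios $s_i/s_j$ are meromorphic functions on $X$, which is one way to see that the construction only depends on the function field.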
There are certain well understood rules of thumb for how various subobjects of $X$ behave in the log model. For example, if $X$ is a surface and $C$ a curve with negative self intersection, then $C$ will be blown down in the log model.
Here is a more complicated example, which is relevant to your question. Let $Y$ be some variety that locally looks like the cone on the Segre embedding of $\mathbb{P}^1 \times \mathbb{P}^1$. So $Y$ is a $3$-fold with an isolated singularity. If you are familiar with the toric picture [1], it looks like the tip of a square pyramid. Inside $Y$, let $Z$ be the cone on one of the $\mathbb{P}^1$'s. This is a surface, but not a Cartier divisor. Let $X$ be $Y$ blown up along $Z$; so that the isolated singularity becomes a line. In the toric picture, the point of the pyramid has lengthened into a line segment, and two of the faces which used to touch at the point now border along an entire edge. In the log model, the line will blow back down to become a point. So the log model can turn a smooth variety, like $X$, into a singular one like $Y$.
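For readers who prefer equations, here is one way to write the local picture. The coordinates $x, y, z, w$ on $\mathbb{A}^4$ are an assumption made for this sketch, coming from the Segre map $([a:b],[c:d]) \mapsto (ac, bd, bc, ad) = (x, y, z, w)$:

$$Y = \{\, xy = zw \,\} \subset \mathbb{A}^4, \qquad Z = \{\, x = z = 0 \,\} \subset Y.$$

Here $Z$ is a plane through the singular point, namely the cone on the line $\{c = 0\}$ of one ruling. It needs two equations and is not cut out by a single function near the origin, which is the sense in which it fails to be Cartier. The cone on a line from the other ruling is the plane $\{\, x = w = 0 \,\}$.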
Now, birational geometers did not rest on their laurels when they had constructed the log model. They made other constructions, which are smoother but less canonical. Many of these constructions can be thought of as taking the log model and modifying it in some way. If the log model looks like the example of the previous paragraph, they want to take the singular point of $Y$ and replace it by a line, to look like $X$. But they have two ways they can do this; they can blow up one $\mathbb{P}^1$ or the other; giving either $X$ or $X'$. Often, replacing $X$ by $X'$ is crucial in order to improve the model somewhere else. The relationship between $X$ and $X'$ is called a flip, because we take the line inside $X$ and flip it around to point in a different direction.
[1] Cautionary note: although the toric picture is excellent for visualizing what is going on locally, you shouldn't take $X$ itself to be a toric variety. There are no global sections of $\omega$ on a toric variety, so the log model is empty. You want $X$ to locally look like a toric variety, but have global geometry which is nontoric in a way that creates lots of sections of $\omega$.
This is a comment to Charles's answer, but I need more room than what comments allow.
"Glue back differently" means that the curve is "glued back" with its normal bundle "reversed".
There is also an algebraic way to think about flips: If $f:X\to Y$ is a contraction, then $X$ can be considered as ${\rm Proj}_Y\bigoplus_{m=0}^\infty f_*\mathcal O_X(-mK_X)$. Now if $f$ is small, then the flip of $f$ is given by the morphism $f^+: X^+={\rm Proj}_Y\bigoplus_{m=0}^\infty f_*\mathcal O_X(mK_X)\to Y$. So, to prove the existence of a flip you "only" need to prove that the latter algebra is finitely generated over $\mathcal O_Y$.
This might not seem intuitive right away, but remember that Proj comes with a relatively ample divisor, so what's happening is that we make an $f$-anti-ample divisor into an $f^+$-ample one without changing it on the locus where $f$ was an isomorphism. If $X$ and $Y$ are $3$-dimensional and $f$ is a small Mori-contraction then it contracts a single rational curve and being ample is equivalent to the degree of the divisor on the curve being positive. The (anti-)ampleness of the canonical class is then governed by the normal bundle of the curve and hence "flipping" the positivity of $K_X$ on this curve is essentially the same as "flipping" the normal bundle.
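The link between $K_X$ and the normal bundle can be made explicit in the idealized smooth situation. This is only a heuristic sketch, assuming $C \cong \mathbb P^1$ is a smooth curve inside a smooth variety $X$ with normal bundle $N_{C/X}$ (so the caveat about singularities made earlier applies). Adjunction gives $K_C = (K_X \otimes \det N_{C/X})|_C$, and taking degrees:

$$-2 = K_X \cdot C + \deg N_{C/X}, \qquad\text{i.e.}\qquad K_X \cdot C = -2 - \deg N_{C/X}.$$

So making $K_X$ more positive on the curve corresponds to making the degree of the normal bundle more negative, and vice versa.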