What's the idea behind Carleman estimates?
The weight function $\phi$ indeed plays an (I would even say the) essential role in Carleman estimates, and constructing one such that the estimate holds is one of the major challenges in proving and applying Carleman estimates. I will give a very informal overview, and refer to the references below for the technicalities.
The original motivation for introducing Carleman estimates was to prove unique continuation theorems, which can informally be stated as follows: Given a partial differential operator $P$, an oriented hypersurface $\Sigma$, and a function $u$ satisfying $Pu=0$, if $u=0$ (locally) on one side $\Sigma^+$ of $\Sigma$, then it vanishes on the other side $\Sigma^-$ as well. This can be interpreted as saying that the complete information about $u$ in $\Sigma^-$ can be retrieved from information in $\Sigma^+$, i.e., the complete information flows across $\Sigma$. For linear hyperbolic equations, such information flow is described by the characteristics, and the idea behind Carleman estimates is to extend this concept to general (pseudo-)differential operators via microlocal analysis. This leads to considering so-called bicharacteristic rays, which need to satisfy the geometric property that all such rays passing near $\Sigma$ in $\Sigma^+$ must cross into $\Sigma^-$. Furthermore, solutions to $Pu=0$ should not decay exponentially along such rays towards $\Sigma$ (otherwise the information gets lost before it crosses $\Sigma$). A surface satisfying these conditions is called strongly pseudo-convex (with respect to $P$).
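For readers who have not met bicharacteristic rays before, here is a rough sketch of how they are usually defined. The notation $p(x,\xi)$ for the principal symbol of $P$ and the curve parameter $s$ are my own additions, and I assume $p$ is real-valued. The rays are the solutions of Hamilton's equations that lie in the characteristic set:

$$
\dot x(s) = \partial_\xi p\bigl(x(s),\xi(s)\bigr), \qquad
\dot \xi(s) = -\partial_x p\bigl(x(s),\xi(s)\bigr), \qquad
p\bigl(x(s),\xi(s)\bigr) = 0 .
$$

Since $p$ is constant along solutions of the first two equations, the constraint $p=0$ only needs to hold at one point of the curve. For the wave equation, the projections $x(s)$ of these curves are the familiar light rays.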
To make these conditions precise, one considers $\Sigma$ as the level set $\{\phi(x)=0\}$ (and $\Sigma^+$ as $\{\phi(x)>0\}$). The pseudo-convexity conditions above then become positivity conditions for the derivative of $\phi$ along the bicharacteristic flow (expressed using Poisson brackets). The connection is now that if $\Sigma=\{\phi(x)=0\}$ is a strongly pseudo-convex surface, then $e^{\lambda\phi}$ is a strongly pseudo-convex function (in the sense of these positivity conditions) for sufficiently large $\lambda$. The parameter $\lambda$ is tied to the decay rate along the bicharacteristic rays, and hence the Carleman weight can be seen as accounting for this decay: The weight vanishes as the information flow across $\Sigma$ decays. The weight thus localizes the estimate to that part of $\Sigma^-$ where you still have usable information from $\Sigma^+$.
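To give an idea of what such a positivity condition looks like, here is a sketch for the simplest case of an operator with real principal symbol $p(x,\xi)$ (my notation, not used above). Sign and orientation conventions differ between references, and the full definition contains a second condition for complex covectors of the form $\xi + i\tau\nabla\phi$ that I omit here. With the Poisson bracket $\{f,g\} = \partial_\xi f\cdot\partial_x g - \partial_x f\cdot\partial_\xi g$, the derivative of $\phi$ along a bicharacteristic ray is $\{p,\phi\}$, and the condition on real covectors reads, at points $x\in\Sigma$,

$$
p(x,\xi) = 0, \quad \{p,\phi\}(x,\xi) = 0, \quad \xi \neq 0
\quad\Longrightarrow\quad
\{p,\{p,\phi\}\}(x,\xi) > 0 .
$$

In words: whenever a bicharacteristic ray is tangent to $\Sigma$, the restriction of $\phi$ to that ray has strictly positive second derivative at the point of tangency.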
For example, for elliptic operators of second order, every smooth surface is strongly pseudo-convex (the decay condition is always satisfied); similarly, for hyperbolic operators with constant (real) coefficients, every non-characteristic surface is strongly pseudo-convex. In particular, for the wave operator $Pu=u_{tt} - c^2\Delta u$, any convex surface is strongly pseudo-convex, as are the zero level sets of $\phi = x^2 - \beta t^2$ for $\beta < c^2$ (this bound is sharp).
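The threshold $\beta < c^2$ can be seen from a short computation. This is only a sketch under my own conventions: I read $x^2$ as $|x|^2$, write $(\sigma,\xi)$ for the variables dual to $(t,x)$, take $p = c^2|\xi|^2 - \sigma^2$ as the principal symbol of the wave operator, and check only the positivity of the second derivative of $\phi$ along the bicharacteristic flow on real covectors, not the full definition of strong pseudo-convexity. With $\{f,g\} = \partial_\sigma f\,\partial_t g + \partial_\xi f\cdot\nabla_x g - \partial_t f\,\partial_\sigma g - \nabla_x f\cdot\partial_\xi g$ one finds

$$
\begin{aligned}
\{p,\phi\} &= (-2\sigma)(-2\beta t) + 2c^2\xi\cdot 2x = 4\beta t\sigma + 4c^2\, x\cdot\xi, \\
\{p,\{p,\phi\}\} &= (-2\sigma)(4\beta\sigma) + 2c^2\xi\cdot 4c^2\xi = 8c^4|\xi|^2 - 8\beta\sigma^2 .
\end{aligned}
$$

On the characteristic set $p=0$ we have $\sigma^2 = c^2|\xi|^2$, so

$$
\{p,\{p,\phi\}\} = 8c^2\,(c^2-\beta)\,|\xi|^2 ,
$$

which is positive for all $\xi\neq 0$ exactly when $\beta < c^2$ and vanishes identically on the characteristic set when $\beta = c^2$. Since $p$ enters this double bracket quadratically, the result does not depend on the overall sign chosen for $p$.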
[1] A good overview of unique continuation using Carleman estimates (along the lines I've given) is Daniel Tataru, Unique Continuation Problems for Partial Differential Equations, The IMA Volumes in Mathematics and its Applications, vol. 137, 2004, pp. 239-255.
[2] Carleman estimates are also useful for inverse problems for partial differential equations; this side is discussed in Victor Isakov, Inverse Problems for Partial Differential Equations, 2nd ed., 2006, Springer. (The example for the wave equation is Theorem 3.4.1.)
[3] David Dos Santos Ferreira has written a habilitation thesis on Carleman estimates.