Lagrange multipliers tangency
Tangency of the contour lines to the constraint curve is not a necessary condition for a constrained extremum.
The Wikipedia article on Lagrange multipliers carries a note, "Inaccurate intuition", criticizing the article for promoting the false intuition that the extrema of a function occur where its level curves are tangent to the constraint curve. The author of the note gives an example; a simplified version follows.
Take $$f(x,y)=x^2,$$ whose 3D plot and level curves are shown in the figure. The level curves are the vertical lines $x=\text{const}$.
The constraint curve is a circle centered at the origin (blue line): $$g(x,y)=x^2+y^2-1=0.$$ The minima occur at $(0,-1)$ and $(0,1)$, where the contour lines are perpendicular to the constraint; at these points $\nabla f=(2x,0)$ vanishes. The two maxima occur at $(-1,0)$ and $(1,0)$, where the constraint curve is indeed tangent to the contours.
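Restricting $f$ to the constraint makes these claims easy to verify by hand. Use the parametrization $x=\cos t$, $y=\sin t$, which is introduced here only for illustration:
$$f(\cos t,\sin t)=\cos^2 t,\qquad \frac{d}{dt}\cos^2 t=-2\sin t\cos t=-\sin 2t.$$
The derivative vanishes at $t\in\{0,\tfrac{\pi}{2},\pi,\tfrac{3\pi}{2}\}$. The values there are $f=1$ at $(\pm1,0)$ (the maxima) and $f=0$ at $(0,\pm1)$ (the minima).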
This does not mean, however, that the Lagrange multiplier method fails. Take the Lagrangian $$\mathscr L(x,y,\lambda)=x^2+\lambda(x^2+y^2-1)$$
and set its partial derivatives with respect to $x$, $y$ and $\lambda$ equal to zero:
$$2x(1+\lambda)=0,$$ $$2\lambda y=0,$$ $$x^2+y^2=1.$$
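For $\lambda=0$ the first equation gives $x=0$, hence $y=\pm 1$. For $\lambda=-1$ the second gives $y=0$, hence $x=\pm1$. Any other value of $\lambda$ forces $x=y=0$, which violates the constraint. These are exactly the points the picture already showed.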
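The sketch below is a quick numerical check of the system above. It evaluates the three partial derivatives of $\mathscr L$ at each of the four candidate points, together with its multiplier, and should report zero residuals. The helper name `lagrange_residuals` is hypothetical and not from the original.

```python
# Lagrangian from the text: L(x, y, lam) = x**2 + lam * (x**2 + y**2 - 1)
def lagrange_residuals(x, y, lam):
    """Return (dL/dx, dL/dy, dL/dlam); all three vanish at a critical point."""
    dL_dx = 2 * x * (1 + lam)
    dL_dy = 2 * lam * y
    dL_dlam = x**2 + y**2 - 1
    return (dL_dx, dL_dy, dL_dlam)

# The four solutions of the system, each paired with its multiplier.
critical_points = [
    (0.0, 1.0, 0.0),    # minimum, lambda = 0
    (0.0, -1.0, 0.0),   # minimum, lambda = 0
    (1.0, 0.0, -1.0),   # maximum, lambda = -1
    (-1.0, 0.0, -1.0),  # maximum, lambda = -1
]

for x, y, lam in critical_points:
    print((x, y), "lambda =", lam, "f =", x**2,
          "residuals =", lagrange_residuals(x, y, lam))
```

The minima come out with $\lambda=0$, which is the case where $\nabla f$ itself is zero and no tangency is required.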
I like to think of it this way: let $\vec c$ be a vector tangent to the constraint curve, and let $\nabla f$ be the gradient of the scalar function.
If $\vec c \cdot \nabla f \neq 0$, then we can slide along the constraint curve and change the value of the function, so we are not at a local extremum.
A necessary condition for a local extremum is therefore $\vec c \cdot \nabla f = 0$. In other words, $\nabla f$ is either zero or perpendicular to the constraint curve.
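This condition can be checked directly on the circle example from the question. The sketch assumes the parametrization $(\cos t,\sin t)$ with unit tangent $\vec c=(-\sin t,\cos t)$; both are introduced here for illustration. Analytically $\vec c\cdot\nabla f=-\sin 2t$, which vanishes exactly at the four critical points.

```python
import math

def grad_f(x, y):
    # Gradient of f(x, y) = x**2 from the example.
    return (2.0 * x, 0.0)

def tangent_dot_grad(t):
    """Return c . grad f at (cos t, sin t), where c = (-sin t, cos t) is tangent to the circle."""
    x, y = math.cos(t), math.sin(t)
    cx, cy = -math.sin(t), math.cos(t)  # unit tangent to x^2 + y^2 = 1
    gx, gy = grad_f(x, y)
    return cx * gx + cy * gy  # analytically equal to -sin(2t)

# Zero (up to rounding) at the four critical points, nonzero at t = pi/4.
for t in (0.0, math.pi / 4, math.pi / 2, math.pi, 3 * math.pi / 2):
    print(f"t = {t:.4f}  point = ({math.cos(t):+.3f}, {math.sin(t):+.3f})  "
          f"c.grad f = {tangent_dot_grad(t):+.2e}")
```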
Revised answer:
I will try to give an intuitive answer that works for any number of dimensions, because the 2-dimensional picture is limited. You will need to be familiar with subspaces and orthogonal complements to follow it. When I say "surface" below, I really just mean a differentiable manifold of any dimension. Let $\vec x \in \mathbb R^m$.
A local extremum of $f$ on a constraining $n$-dimensional surface $C$ requires that any small slide of $\vec x$ along that surface produce no first-order change in $f$. Locally, $\delta f = \delta \vec x \cdot \nabla f$, so a stationary point is one where no choice of $\delta \vec x$ tangent to $C$ changes $f$ to first order. At each point of the $n$-dimensional surface $C$ there are $n$ independent tangent directions $\delta \vec c_1, \dots, \delta \vec c_n$ along which $\vec x$ can move; all other tangent directions are linear combinations of these. A local extremum of $f$ therefore requires $\delta f = 0$ along each of them: $$\delta \vec c_1 \cdot \nabla f = 0$$ $$\delta \vec c_2 \cdot \nabla f = 0$$ $$\dots$$ $$\delta \vec c_n \cdot \nabla f = 0$$ In other words, the projection of $\nabla f$ onto the subspace spanned by the tangent vectors of $C$ must be zero. For $n=m-1$ (i.e. a single Lagrange multiplier) this means that $\nabla f$ is either zero or orthogonal to the surface $C$.
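One way to make the projection condition concrete is Gram-Schmidt in plain Python. This is a sketch under assumptions: the helper names and the test case, $f(x,y,z)=z$ on the unit sphere ($m=3$, $n=2$), are hypothetical illustrations rather than part of the original argument.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis of span(vectors), dropping dependent vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            coef = dot(w, e)
            w = [wi - coef * ei for wi, ei in zip(w, e)]
        norm = dot(w, w) ** 0.5
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis

def project_onto_span(vec, vectors):
    """Orthogonal projection of vec onto the span of the given tangent vectors."""
    proj = [0.0] * len(vec)
    for e in orthonormalize(vectors):
        coef = dot(vec, e)
        proj = [pi + coef * ei for pi, ei in zip(proj, e)]
    return proj

# Hypothetical example: f(x, y, z) = z on the unit sphere; grad f is constant.
grad_f = [0.0, 0.0, 1.0]

# North pole (0, 0, 1): tangent plane spanned by e_x, e_y -> projection is zero (stationary).
print(project_onto_span(grad_f, [[1, 0, 0], [0, 1, 0]]))
# Point (1, 0, 0): tangent plane spanned by e_y, e_z -> projection is nonzero (not stationary).
print(project_onto_span(grad_f, [[0, 1, 0], [0, 0, 1]]))
```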
We now derive the equation $\nabla f = \lambda_1 \nabla g_1 + \lambda_2 \nabla g_2 + \dots$ for multiple Lagrange multipliers.
When the constraining surface $C$ is represented as the common zero set of scalar functions (as in the Lagrange multiplier method): $$g_1(\vec x) = 0$$ $$g_2(\vec x) = 0$$ $$\dots$$ $$g_{m-n}(\vec x) = 0$$ the tangent subspace of $C$ at $\vec x$ is the subspace of directions (vectors) that do not change the value of any of these $g$ functions to first order. That is, a tangent vector $\vec c$ of $C$ is characterized by $\vec c \cdot \nabla g_i = 0$ for all $i$. This characterization assumes the gradients $\nabla g_1, \dots, \nabla g_{m-n}$ are linearly independent at $\vec x$; otherwise the set described below can be larger than the true tangent space. In other words, $\vec c$ lives in the orthogonal complement of the subspace spanned by the gradients, so the tangent subspace of $C$ is given by $$\vec c \in \left(\mathrm{span}\left\lbrace \nabla g_i\right\rbrace_i\right)_\bot.$$
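For $m=3$ with two independent constraints, the orthogonal complement of $\mathrm{span}\{\nabla g_1,\nabla g_2\}$ is one-dimensional and is spanned by the cross product $\nabla g_1\times\nabla g_2$. The sketch below relies on that shortcut, which only works in $\mathbb R^3$. The constraints (a unit sphere cut by the plane $z=0$) and the function names are hypothetical.

```python
def cross(u, v):
    """Cross product in R^3."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical constraints in R^3 (m = 3, two constraints, so n = 1):
#   g1(x, y, z) = x^2 + y^2 + z^2 - 1   (unit sphere)
#   g2(x, y, z) = z                     (plane z = 0)
# Their intersection C is the unit circle in the xy-plane.
def grad_g1(p):
    x, y, z = p
    return (2 * x, 2 * y, 2 * z)

def grad_g2(p):
    return (0.0, 0.0, 1.0)

def tangent_direction(p):
    """Spans (span{grad g1, grad g2})_perp at p, assuming the two gradients are independent."""
    return cross(grad_g1(p), grad_g2(p))

p = (0.6, 0.8, 0.0)  # a point on C
c = tangent_direction(p)
print("tangent direction:", c)
print("c . grad g1 =", dot(c, grad_g1(p)), "  c . grad g2 =", dot(c, grad_g2(p)))
```

The resulting $\vec c$ is parallel to $(-y, x, 0)$, the familiar tangent of the circle, and is orthogonal to both constraint gradients.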
As stated earlier, a local extremum requires $\nabla f$ to be either zero or orthogonal to every tangent vector of $C$ (i.e. zero when projected onto the tangent subspace). If it is not, there is a direction along the surface $C$ in which we could slide and change $f$, so the point is not an extremum. The conclusion is that $\nabla f$ lives in the orthogonal complement of the tangent subspace of $C$: $$\nabla f \in \mathrm{span}\left\lbrace \nabla g_i \right\rbrace_i \quad \left( \quad = \left( \left( \mathrm{span}\left\lbrace \nabla g_i \right\rbrace_i \right)_\bot \right)_\bot \quad \right)$$ which is the same as stating $$\nabla f = \lambda_1 \nabla g_1 + \lambda_2 \nabla g_2 + \dots.$$
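One way to test the condition $\nabla f=\lambda_1\nabla g_1+\lambda_2\nabla g_2$ numerically is to fit the $\lambda_i$ by least squares and inspect the residual. This approach is an assumption of the sketch, not something the original prescribes. A zero residual means $\nabla f$ lies in the span of the constraint gradients. The objective $f=x+y$ on the hypothetical sphere-plane circle is illustrative only, and the solver handles exactly two constraints.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fit_multipliers(grad_f, grad_g1, grad_g2):
    """Least-squares (lambda1, lambda2) for grad_f ~ l1*grad_g1 + l2*grad_g2, plus the residual.

    Solves the 2x2 normal equations by Cramer's rule; assumes grad_g1 and grad_g2
    are linearly independent, so the determinant is nonzero.
    """
    a11, a12, a22 = dot(grad_g1, grad_g1), dot(grad_g1, grad_g2), dot(grad_g2, grad_g2)
    b1, b2 = dot(grad_g1, grad_f), dot(grad_g2, grad_f)
    det = a11 * a22 - a12 * a12
    l1 = (b1 * a22 - a12 * b2) / det
    l2 = (a11 * b2 - a12 * b1) / det
    residual = [f - l1 * g1 - l2 * g2 for f, g1, g2 in zip(grad_f, grad_g1, grad_g2)]
    return l1, l2, residual

# Hypothetical problem: f = x + y subject to g1 = x^2 + y^2 + z^2 - 1 = 0 and g2 = z = 0.
def grad_f(p):
    return (1.0, 1.0, 0.0)

def grad_g1(p):
    x, y, z = p
    return (2 * x, 2 * y, 2 * z)

def grad_g2(p):
    return (0.0, 0.0, 1.0)

s = 1 / math.sqrt(2)
for p in [(s, s, 0.0), (1.0, 0.0, 0.0)]:
    l1, l2, r = fit_multipliers(grad_f(p), grad_g1(p), grad_g2(p))
    print(p, "lambda1 =", round(l1, 6), "lambda2 =", round(l2, 6), "residual =", r)
```

At $(1/\sqrt2,1/\sqrt2,0)$, the maximum of $x+y$ on the circle, the residual should vanish with $\lambda_1=1/\sqrt2$ and $\lambda_2=0$. At $(1,0,0)$ a residual of $(0,1,0)$ remains, so that point is not stationary.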
Conveniently, this one equation also admits $\nabla f = 0$ as a solution (take all $\lambda_i = 0$). So this last equation completely captures the condition for a stationary point of $f$ constrained by the functions $g_1, g_2, \dots, g_{m-n}$.