What is the intuition behind the total differential of a function of two variables?
I prefer to start from a definition of the total derivative that isn’t tied to a specific coordinate system. A function $f:\mathbb R^m\to\mathbb R^n$ is differentiable at $\mathbf v\in\mathbb R^m$ if there is a linear map $L_{\mathbf v}:\mathbb R^m\to\mathbb R^n$ such that $f(\mathbf v+\mathbf h)=f(\mathbf v)+L_{\mathbf v}[\mathbf h]+o(\|\mathbf h\|)$. The linear map $L_{\mathbf v}$ is called the differential or total derivative of $f$ at $\mathbf v$, denoted by $\mathrm df_{\mathbf v}$ or simply $\mathrm df$. The idea here is that $\mathrm df_{\mathbf v}$ is the best linear approximation to the change in $f$ near $\mathbf v$, with the error of this approximation vanishing “faster” than the displacement $\mathbf h$.
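If it helps to see the $o(\|\mathbf h\|)$ condition in action, here is a quick numerical sketch with a made-up map $f(x,y)=(x^2y,\ \sin x + y)$ and its Jacobian standing in for $L_{\mathbf v}$; the ratio of the approximation error to $\|\mathbf h\|$ shrinks toward zero as the displacement does.

```python
import numpy as np

# Made-up example: f(x, y) = (x^2 * y, sin(x) + y), checked at v = (1, 2).
def f(v):
    x, y = v
    return np.array([x**2 * y, np.sin(x) + y])

def L(v, h):
    # Candidate total derivative at v: the Jacobian matrix applied to h.
    x, y = v
    J = np.array([[2 * x * y, x**2],
                  [np.cos(x), 1.0]])
    return J @ h

v = np.array([1.0, 2.0])
direction = np.array([0.6, -0.8])      # a fixed unit direction
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = t * direction
    err = np.linalg.norm(f(v + h) - f(v) - L(v, h))
    print(t, err / np.linalg.norm(h))  # ratio -> 0, i.e. the error is o(||h||)
```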
Relative to some specific pair of bases for the domain and range of $f$, $\mathrm df$ can be represented by an $n\times m$ matrix. To see what this matrix is, you can treat $f$ as a vector of functions: $$f(\mathbf v)=\pmatrix{\phi_1(\mathbf v)\\\phi_2(\mathbf v)\\\vdots\\\phi_n(\mathbf v)}$$ or, written in terms of coordinates, $$\begin{align}y_1&=\phi_1(x_1,x_2,\dots,x_m)\\y_2&=\phi_2(x_1,x_2,\dots,x_m)\\\vdots\\y_n&=\phi_n(x_1,x_2,\dots,x_m).\end{align}$$ The matrix of $\mathrm df$ then turns out to be the Jacobian matrix of partial derivatives $$\pmatrix{{\partial\phi_1\over\partial x_1}&{\partial\phi_1\over\partial x_2}&\cdots&{\partial\phi_1\over\partial x_m}\\{\partial\phi_2\over\partial x_1}&{\partial\phi_2\over\partial x_2}&\cdots&{\partial\phi_2\over\partial x_m}\\\vdots&\vdots&\ddots&\vdots\\{\partial\phi_n\over\partial x_1}&{\partial\phi_n\over\partial x_2}&\cdots&{\partial\phi_n\over\partial x_m}}.$$ The displacement vector $\mathbf h$ can be written as $\mathrm d\mathbf v=(\mathrm dx_1,\mathrm dx_2,\dots,\mathrm dx_m)^T$. (The $\mathrm dx_i$ here can themselves be thought of as differentials of affine coordinate functions, but that’s not an important detail for this discussion.)
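If you want to see that matrix assembled mechanically, a computer algebra system will produce it directly from the component functions; here is a small sketch with made-up components $\phi_1,\phi_2$ of a map $\mathbb R^3\to\mathbb R^2$.

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')

# Made-up component functions phi_i for an f: R^3 -> R^2.
phi1 = x1 * x2 + sp.exp(x3)
phi2 = sp.sin(x1) - x2 * x3

F = sp.Matrix([phi1, phi2])
J = F.jacobian([x1, x2, x3])   # the 2x3 matrix representing df
print(J)
```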
For the special case of a scalar function $f:\mathbb R^m\to\mathbb R$, $\mathrm df[\mathbf h]$ becomes $${\partial f\over\partial x_1}\mathrm dx_1+{\partial f\over\partial x_2}\mathrm dx_2+\cdots+{\partial f\over\partial x_m}\mathrm dx_m.$$ Now, the partial derivative ${\partial f\over\partial x_i}$ is just the directional derivative of $f$ in the direction of the $x_i$-axis, so this formula expresses the total derivative of $f$ in terms of its directional derivatives in a particular set of directions. Notice that there was nothing special about the basis we chose for $\mathbb R^m$. If we choose a different basis, $\mathrm df$ will have the same form, but the derivatives will be taken in a different set of directions. In your case of $\mathbb R^2$, a basis consists of two vectors, so derivatives in only two directions are sufficient to completely specify the total derivative. If you understand it as a linear map from $\mathbb R^2$ to $\mathbb R$, this should come as no surprise.
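For a concrete sketch of the scalar case, take a made-up function $f(x_1,x_2)=x_1^2x_2+\sin x_2$: the same number comes out whether you assemble $\mathrm df[\mathbf h]$ from the two partial derivatives or differentiate along the displacement directly.

```python
import sympy as sp

x1, x2, t = sp.symbols('x1 x2 t')
f = x1**2 * x2 + sp.sin(x2)              # made-up scalar function on R^2

h1, h2 = 3, -1                            # components of a displacement h
v = {x1: 2, x2: 1}                        # base point

# df[h] assembled from the two partial derivatives
df_h = (sp.diff(f, x1) * h1 + sp.diff(f, x2) * h2).subs(v)

# the same number as a directional derivative: d/dt f(v + t*h) at t = 0
g = f.subs({x1: 2 + t * h1, x2: 1 + t * h2})
dir_deriv = sp.diff(g, t).subs(t, 0)

print(df_h, dir_deriv)                    # both give 8 - cos(1)
```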
Suppose I have a scalar field on the plane given by the formula
$$ s = x + y^2 + e^r + \sin(\theta) $$
Yes, this formula for $s$ mixes Cartesian and polar coordinates on the plane!
Using this formula, we can compute the total differential to be
$$ \mathrm{d}s = \mathrm{d}x + 2 y \,\mathrm{d}y + e^r \,\mathrm{d}r + \cos(\theta) \,\mathrm{d}\theta $$
So don't think of it as doing some calculation with just the right number of partial derivatives; think of it as the natural extension of the familiar rules for computing derivatives. Partial derivatives only enter the picture when you are specifically interested in computing the differential of a function that has more than one argument, e.g. to compute $\mathrm{d}f(s,t)$ for some function $f$ of two arguments.
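Here is a small sketch of that last point, using made-up arguments $u$ and $v$ (each itself a function of $x$ and $y$) and the two-argument function $f(u,v)=uv$: the partials of $f$ supply the coefficients of $\mathrm{d}u$ and $\mathrm{d}v$, and the result agrees with differentiating directly.

```python
import sympy as sp

x, y = sp.symbols('x y')
du, dv = sp.symbols('du dv')               # formal differentials of the two arguments

u = x + y**2                               # made-up arguments, each a function of (x, y)
v = sp.sin(x * y)

# For a two-argument function such as f(u, v) = u*v, the partials of f supply
# the coefficients:  d(u*v) = v du + u dv
df = v * du + u * dv

# Check against differentiating u*v directly with respect to x
lhs = sp.diff(u * v, x)
rhs = df.subs({du: sp.diff(u, x), dv: sp.diff(v, x)})
print(sp.simplify(lhs - rhs))              # 0
```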
Of course, we can rewrite $\mathrm{d}s$, e.g. using the equations $\mathrm{d}x = \mathrm{d}(r \cos(\theta)) = \cos(\theta)\, \mathrm{d}r - r \sin(\theta) \, \mathrm{d}\theta$ and $\mathrm{d}y = \sin(\theta) \,\mathrm{d}r + r \cos(\theta) \, \mathrm{d}\theta$, to get rid of the $\mathrm{d}x$ and $\mathrm{d}y$ terms and leave the result in terms of $\mathrm{d}r$ and $\mathrm{d}\theta$.
In the plane there are only two independent differentials, so we can always rewrite such an expression as a linear combination of just two of them.
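Both steps — taking the differential of the mixed-coordinate formula for $s$, then eliminating $\mathrm{d}x$ and $\mathrm{d}y$ in favour of $\mathrm{d}r$ and $\mathrm{d}\theta$ — are easy to check symbolically; a quick sketch (treating $\mathrm{d}r$ and $\mathrm{d}\theta$ as formal symbols):

```python
import sympy as sp

r, theta = sp.symbols('r theta')
dr, dtheta = sp.symbols('dr dtheta')      # formal differentials

x = r * sp.cos(theta)
y = r * sp.sin(theta)

# ds written in mixed coordinates, exactly as above ...
dx = sp.diff(x, r) * dr + sp.diff(x, theta) * dtheta   # cos(theta) dr - r sin(theta) dtheta
dy = sp.diff(y, r) * dr + sp.diff(y, theta) * dtheta   # sin(theta) dr + r cos(theta) dtheta
ds_mixed = dx + 2 * y * dy + sp.exp(r) * dr + sp.cos(theta) * dtheta

# ... and ds computed by writing s purely in terms of (r, theta)
s = x + y**2 + sp.exp(r) + sp.sin(theta)
ds_polar = sp.diff(s, r) * dr + sp.diff(s, theta) * dtheta

print(sp.simplify(sp.expand(ds_mixed - ds_polar)))      # 0: the two expressions agree
print(sp.collect(sp.expand(ds_mixed), [dr, dtheta]))    # coefficients of dr and dtheta
```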
In my opinion, the better way to think about things is that the total differential is the most natural form of the derivative, and the partial derivative is a linear functional on differential forms; e.g. in the standard $x$-$y$ coordinates, $\partial/\partial x$ is the mapping that sends $\mathrm{d}x \to 1$ and $\mathrm{d}y \to 0$.
So, using the notation $\partial z / \partial x$ for the action of $\partial / \partial x$ on $\mathrm{d}z$, we see that if we have an equation
$$ \mathrm{d}z = f \,\mathrm{d}x + g \,\mathrm{d}y $$
then
$$ \frac{\partial z}{\partial x} = f \cdot 1 + g \cdot 0 = f$$ $$ \frac{\partial z}{\partial y} = f \cdot 0 + g \cdot 1 = g$$
and so we'd have
$$ \mathrm{d}z = \frac{\partial z}{\partial x}\, \mathrm{d}x + \frac{\partial z}{\partial y} \,\mathrm{d} y$$
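A small sketch of that bookkeeping, with a made-up $z = x^2 y + \sin x$: compute $\mathrm{d}z$, then recover the partial derivatives by sending $\mathrm{d}x \to 1,\ \mathrm{d}y \to 0$ (and vice versa).

```python
import sympy as sp

x, y = sp.symbols('x y')
dx, dy = sp.symbols('dx dy')             # formal differentials

z = x**2 * y + sp.sin(x)                 # a made-up function of (x, y)

# dz = f dx + g dy, computed directly
dz = sp.diff(z, x) * dx + sp.diff(z, y) * dy

# "Apply" d/dx: send dx -> 1 and dy -> 0; similarly for d/dy
dz_dx = dz.subs({dx: 1, dy: 0})
dz_dy = dz.subs({dx: 0, dy: 1})

print(dz_dx, dz_dy)                      # 2*x*y + cos(x)  and  x**2
```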
Aside: another advantage the total differential has over the partial derivative is that it's actually self-contained. In the plane, $\partial / \partial x$ has no meaning on its own; e.g. if we set $w=x+y$, then $\partial / \partial x$ means something different when expressing things as a function of $(x,y)$ than it does when expressing things as a function of $(x,w)$. (In the former it sends $\mathrm{d}y \to 0$; in the latter it sends $\mathrm{d}w \to 0$, and thus $\mathrm{d}y \to -1$.)
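Here is the same $w=x+y$ point spelled out symbolically, with a made-up function $z = x + 2y$: written in terms of $(x,y)$ the coefficient of $\mathrm{d}x$ is $1$, but written in terms of $(x,w)$ it is $-1$.

```python
import sympy as sp

x, y, w = sp.symbols('x y w')

z_xy = x + 2 * y                 # a made-up function, written in (x, y)
z_xw = z_xy.subs(y, w - x)       # the same function, written in (x, w) since y = w - x

print(sp.diff(z_xy, x))          # 1   (y held fixed)
print(sp.diff(z_xw, x))          # -1  (w = x + y held fixed)
```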
Any infinitely small change in $(x,y)$ includes a change $dx$ in $x$ and a change $dy$ in $y$. The change in $z$ resulting from the change in $x$ is $\dfrac{\partial z}{\partial x} \, dx$, and to that we add the change in $z$ resulting from the change in $y$, namely $\dfrac{\partial z}{\partial y} \, dy$; together these give $dz = \dfrac{\partial z}{\partial x} \, dx + \dfrac{\partial z}{\partial y} \, dy$.