Gradient of a Scalar Field is Perpendicular to Equipotential Surface
A simple way to understand this in an arbitrary dimension, is the following (you can make it rigorous if you work a little, but I doubt that there's a way to avoid the implicit function theorem).
The directional derivative of $\phi$ at the point $\vec r$ and in a direction $\vec u$ is given by $\vec u\cdot \vec \nabla\phi$. It is the derivative of $\phi(\gamma(t))|_{t=0}$ if $\gamma(t)$ is a curve such that $\gamma(0)=\vec r$ and $\dot \gamma(0)=\left.\frac{d\gamma}{dt}\right|_{t=0}=\vec u$. If $\gamma$ is contained in the equipotential surface, then clearly $\dot \phi=0$ identically, and in particular the gradient is perpendicular to all $\vec u$'s which are tangent to the equipotential surface.
To answer your question about coordinate independence, the gradient is a vector. As such, it is a geometrical entity that does not depend on the choice of coordinates. However, its coordinate representation clearly does depend on the choice of coordinates, but in a simple manner: a rotation of the coordinates will result in a rotation of the vector of components.
Here is an intuitive explanation. Let $p$ be any point. We want to show that $\nabla \phi(p)$ is perpendicular to the surface $\phi(x)=\phi(p)$ which passes through $p$. Let $\vec v$ be any unit vector at $p$. By the chain rule, the directional derivative of $\phi$ in the direction $\vec v$ at $p$ is $$\vec v \cdot \nabla \phi(p).$$ If $\vec v$ is tangent to the surface $\phi(x)=\phi(p)$, the directional derivative is $0$ because $\varphi$ is constant on the surface $\phi(x)=\phi(p)$. Hence this means $\vec v$ is perpendicular to $\nabla \phi(p)$.
First, when you say that the gradient is perpendicular to the scalar potential, you need to be clear that you really mean it is perpendicular to the normal vector of the surface described by that scalar potential (i.e. $\phi(x,y,z)=0$). A vector can't be perpendicular to a scalar, except w.r.t. that scalar field's normal vector.
At some particular point (x,y,z), use the implicit function theorem to solve for a surface as a function of (x,y), that is $z(x,y)$. Then the normal vector of this surface is defined as the cross product of the two partial derivatives, $\vec{n} = \frac{\partial}{\partial{x}}([x,y,z(x,y)])\times \frac{\partial}{\partial{y}}([x,y,z(x,y)])$.
Since these two vectors form a basis for the tangent plane, and the gradient lies within the tangent plane, then the gradient vector must be perpendicular to the surface normal (because the normal is the cross product of vectors in the tangent plane, and is hence perpendicular to vectors in the tangent plane).
A definition of the gradient that does not make reference to any particular coordinates is given by $$\nabla \phi = \lim_{S\to0}\frac{1}{V}\oint_{S}\vec{n}\phi{dS}$$ where $S$ is 'surface' bounding a 'volume' $V$ with the convention that, when implied by the dimensions of the problem, $S$ will be taken to mean a patch of a 2-D surface or an arc of a curve and $V$ will analogously be the area or length, respectively. Using this, you can show that the different ways of calculating the gradient in different coordinate systems are equal.
As a side note, your intuition should tell you that the gradient, and other vector quantities that arise in physical identities, ought to be independent of the coordinates because otherwise, certain physical laws would change just by changing the coordinates you work with, which doesn't make physical sense.