Understanding weighted inner product and weighted norms
Weighted norms have a variety of uses. Suppose you're measuring the size of vectors that are coming out of some random or physical process, and they look like this: $$ \begin{bmatrix} +5.4\times 10^{-10} \\ -1.3\times 10^{+6} \\ \end{bmatrix} \begin{bmatrix} +1.8\times 10^{-9} \\ -4.3\times 10^{+5} \\ \end{bmatrix} \begin{bmatrix} -2.3\times 10^{-9} \\ +3.4\times 10^{+5} \\ \end{bmatrix} \begin{bmatrix} +8.6\times 10^{-10} \\ +3.6\times 10^{+6} \\ \end{bmatrix} \begin{bmatrix} -3.2\times 10^{-10} \\ +2.7\times 10^{+6} \\ \end{bmatrix} $$ Would it make sense to use the standard Euclidean norm $\|\cdot\|_2$ to measure the size of these vectors? I say no. The values of $x_1$ hover around $10^{-9}$ in magnitude, those of $x_2$ around $10^6$. Since $x_1$ is so much smaller than $x_2$, $\|x\|_2\approx |x_2|$. You're losing information about $x_1$ with this measurement.
What you might choose to do in this circumstance is select a diagonally weighted norm $\|x\|_D\triangleq\sqrt{x^*Dx}$, with the values of $D_{ii}>0$ chosen to "normalize" each entry. For instance, I might choose $D_{11}=10^{18}$ and $D_{22}=10^{-12}$. The values of $D^{1/2} x$ are $$ \begin{bmatrix} +0.54 \\ -1.3 \end{bmatrix} \begin{bmatrix} +1.8 \\ -0.43 \end{bmatrix} \begin{bmatrix} -2.3 \\ +0.34 \end{bmatrix} \begin{bmatrix} +0.86 \\ +3.6 \end{bmatrix} \begin{bmatrix} -0.32 \\ +2.7 \end{bmatrix} $$ Now small relative changes in $x_1$ will have approximately the same impact on the norm $\|x\|_D=\sqrt{x^*Dx}=\|D^{1/2}x\|_2$ as small relative changes in $x_2$. This is probably a more informative norm for this set of vectors than a standard Euclidean norm.
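Here is a minimal numpy sketch of the computation above; the sample vectors and the weights $D_{11}=10^{18}$, $D_{22}=10^{-12}$ are the ones from the text, and everything else is purely illustrative:

```python
import numpy as np

xs = np.array([
    [+5.4e-10, -1.3e+6],
    [+1.8e-09, -4.3e+5],
    [-2.3e-09, +3.4e+5],
    [+8.6e-10, +3.6e+6],
    [-3.2e-10, +2.7e+6],
])

sqrt_d = np.sqrt(np.array([1e18, 1e-12]))   # diagonal of D^{1/2}

for x in xs:
    # ||x||_2 is dominated by |x_2|; ||x||_D = ||D^{1/2} x||_2 sees both entries
    print(f"||x||_2 = {np.linalg.norm(x):.3e}   "
          f"||x||_D = {np.linalg.norm(sqrt_d * x):.2f}")
```

The Euclidean norms come out around $10^6$, tracking $|x_2|$ alone, while the weighted norms stay of order one and respond to changes in both entries.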
Diagonally weighted norms are probably the easiest to justify intuitively, but in fact more general weighted norms have their uses. For instance, they come up often in proofs about Newton's method.
For information about matrix square roots, Wikipedia, or any reasonably good linear algebra text, really is not a bad place to start. Square roots exist for any Hermitian positive semidefinite matrix, that is, any Hermitian matrix with nonnegative real eigenvalues.
Two types of square roots are typically considered for a real symmetric/complex Hermitian PSD matrix $M$. The lower triangular Cholesky factor $L$ satisfying $M=LL^*$ is simpler to compute in practice. But the symmetric/Hermitian square root $Q=M^{1/2}$ satisfying $M=Q^2$ is often preferred in proofs, because then you don't have to keep track of transposes, and because sometimes it is helpful for $Q$ and $M$ to share eigenvectors.
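If you want to experiment, here is a small sketch assuming nothing beyond numpy and a generic SPD matrix; it builds both square roots and checks their defining identities. Computing the symmetric root from the eigendecomposition also makes the shared eigenvectors explicit:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
M = B @ B.T + 4 * np.eye(4)             # a generic symmetric positive definite M

# Cholesky factor: lower triangular, M = L L^T
L = np.linalg.cholesky(M)

# Symmetric square root: Q = V diag(sqrt(w)) V^T shares M's eigenvectors V
w, V = np.linalg.eigh(M)
Q = (V * np.sqrt(w)) @ V.T

print(np.allclose(L @ L.T, M))          # True
print(np.allclose(Q @ Q, M))            # True
print(np.allclose(Q, Q.T))              # True: Q is symmetric; L is not
```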
With the symmetric square root defined, the derivation of (2) is straightforward: $$\|M^{1/2}x\|_2 = \left(x^*M^{*/2}M^{1/2}x\right)^{1/2} = \left(x^*M^{1/2}M^{1/2}x\right)^{1/2} = \left(x^*Mx\right)^{1/2} = \|x\|_M.$$ Here is a derivation for (4). First, we convert the numerator: $$\|M^{1/2}AN^{-1/2}\|_2 = \max_{\|x\|_2=1} \|M^{1/2}(AN^{-1/2}x)\|_2 = \max_{\|x\|_2=1} \|AN^{-1/2}x\|_M$$ Now we define $y=N^{-1/2} x$, or $x=N^{1/2} y$: $$\max_{\|x\|_2=1} \|AN^{-1/2}x\|_M = \max_{\|N^{1/2} y\|_2=1} \|Ay\|_M = \max_{\|y\|_N=1}\|Ay\|_M.$$
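Both identities are easy to sanity-check numerically. The following sketch, with randomly generated SPD weights $M$ and $N$, verifies (2) directly, and verifies (4) by checking that the maximizing $y$ (namely $N^{-1/2}v$, with $v$ the top right singular vector of $M^{1/2}AN^{-1/2}$) attains the spectral norm:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

def random_spd(n):
    B = rng.standard_normal((n, n))
    return B @ B.T + n * np.eye(n)

def spd_sqrt(M):
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(w)) @ V.T

M, N, A = random_spd(n), random_spd(n), rng.standard_normal((n, n))
Ms, Ns_inv = spd_sqrt(M), np.linalg.inv(spd_sqrt(N))

# (2): ||M^{1/2} x||_2 equals sqrt(x^T M x)
x = rng.standard_normal(n)
print(np.isclose(np.linalg.norm(Ms @ x), np.sqrt(x @ M @ x)))          # True

# (4): ||M^{1/2} A N^{-1/2}||_2 equals the max of ||A y||_M over ||y||_N = 1,
# attained at y = N^{-1/2} v with v the top right singular vector
lhs = np.linalg.norm(Ms @ A @ Ns_inv, 2)
_, _, Vt = np.linalg.svd(Ms @ A @ Ns_inv)
y = Ns_inv @ Vt[0]
Ay = A @ y
print(np.isclose(np.sqrt(Ay @ M @ Ay) / np.sqrt(y @ N @ y), lhs))      # True
```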
The above answers are perfectly nice. I just want to point out another example: energy norms.
I don't know how familiar you are with differential equations and/or calculus of variations, but I'll give it a try anyway.
Consider the following integral:
$$ E(v) = \frac{1}{2}\int_\Omega |\nabla v|^2 dx $$ where $\Omega$ is a nice bounded domain in $\mathbb{R}^n$ (say with a smooth boundary, no corners or spikes). In many applications this represents the internal energy of a system in a configuration given by the function $v$. For instance, if $v$ is the displacement from a reference configuration, $E(v)$ represents the elastic energy of the system (assuming linear elasticity).
The above integral can be rewritten as
$$ E(v) = a(v,v) $$ with
$$ a(u,v) = \frac{1}{2}\int_\Omega \nabla u\cdot\nabla v dx $$
Now, suppose we have a finite dimensional representation of the function $v$ (if you know finite elements you know where I'm heading). This means
$$ v(x) = \sum_{i=1}^n v_i \varphi_i(x), $$ where all the $\varphi_i(x)$ are fixed and known a priori.
If you plug this expression into the definition of $E$, you get (being careful not to mess up the indices)
$$ E(v) = \frac{1}{2}\int_\Omega \nabla\left(\sum_{j=1}^n v_j \varphi_j(x)\right)\cdot\nabla\left(\sum_{i=1}^n v_i \varphi_i(x)\right)dx\\ = \cdots = \sum_{i=1}^n\sum_{j=1}^n v_iv_j\frac{1}{2}\int_\Omega \nabla \varphi_j \cdot \nabla \varphi_i dx $$
Now let $\underline{v}$ be the vector of the coefficients $v_i$ and $A$ the matrix whose entries are
$$ a_{ij} = a(\varphi_j,\varphi_i) = \frac{1}{2}\int_\Omega \nabla \varphi_j\cdot\nabla\varphi_i dx $$ Under certain assumptions on $v$ (for instance, $v=0$ on $\partial\Omega$, the boundary of $\Omega$), it can be shown that this is indeed a positive definite matrix.
Now, if the system is in a configuration described by $v$ and $v$ is expressed as above, then the energy of the system is given by
$$ E(v) = a(v,v) = \underline{v}^tA\underline{v} $$ which is precisely a weighted norm of $\underline{v}$ (squared). Here the matrix is not exactly a weight, but rather it encodes the physics of the phenomenon. It is possible to show that, if $v$ is expanded as before, and you pick your basis functions $\varphi_i$ in such a way that
$$ \int_\Omega \varphi_i\varphi_j dx = \begin{cases} 0\mbox{ if }i\neq j\\ 1\mbox{ if }i=j, \end{cases} $$ then the squared Euclidean norm of $\underline{v}$ corresponds to the value of the integral
$$ I(v) = \int_\Omega v^2 dx $$ which is the square of an important norm of $v$ (the $L^2$ norm), but it measures a different energy of the system.
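To make the energy norm concrete, here is a minimal sketch, assuming $\Omega=(0,1)$, $v=0$ at the endpoints, and piecewise linear "hat" basis functions on a uniform grid; in 1D the integrals $a_{ij}=\frac{1}{2}\int\varphi_j'\varphi_i'\,dx$ can be evaluated exactly and yield a tridiagonal matrix:

```python
import numpy as np

n = 6                                 # interior grid nodes
h = 1.0 / (n + 1)                     # uniform mesh width on (0, 1)

# a_ij = (1/2) * integral of phi_j' phi_i' dx for hat functions, exactly:
# (1 / 2h) * tridiag(-1, 2, -1)
A = (0.5 / h) * (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))

print(np.all(np.linalg.eigvalsh(A) > 0))      # True: positive definite

# Energy of the interpolant of v(x) = sin(pi x); the continuous energy is
# E(v) = (1/2) * integral of (pi cos(pi x))^2 dx = pi^2 / 4 ~ 2.47
v = np.sin(np.pi * np.linspace(h, 1 - h, n))
print(v @ A @ v)                              # approximately pi^2 / 4
```

The eigenvalue check confirms the positive definiteness claimed above, and $\underline{v}^tA\underline{v}$ approximates the continuous energy $\pi^2/4$ of $v(x)=\sin(\pi x)$.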
The significance of "weighted norms", and why we need them:
Note that the norms you consider are induced by Hermitian bilinear forms, so we should discuss Hermitian bilinear forms.
From the point of view of linear algebra there is nothing special about the "standard" Hermitian bilinear form $\langle x,y\rangle=x^{*}Iy$. All Hermitian bilinear forms are simply bilinear forms with the additional Hermitian property. It can be shown that for every Hermitian bilinear form $\langle\cdot,\cdot\rangle$ (i.e. an inner product) there exists a Hermitian positive definite matrix $M$ such that $\langle x,y\rangle=x^{*}My$. And that is all. None of those inner products is in any way "better" than the others.
The only reason you distinguish the "standard" inner product $\langle x,y\rangle=x^{*}Iy$ is the choice of a particular basis of your space (i.e. coordinate system). You perceive it as the "best" inner product because its matrix "$M$" is the identity matrix $I$, the simplest of all Hermitian positive definite matrices.
So in fact the distinction between the "standard" norm and weighted norms is artificial and has no real meaning.
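If a concrete illustration helps: with $M$ an arbitrary SPD matrix chosen for the example, the change of coordinates $z=M^{1/2}x$ turns the $M$-inner product into the standard one, which is exactly the sense in which the distinction is a matter of basis.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((3, 3))
M = B @ B.T + 3 * np.eye(3)          # an arbitrary symmetric positive definite M

w, V = np.linalg.eigh(M)
Ms = (V * np.sqrt(w)) @ V.T          # the symmetric square root M^{1/2}

x, y = rng.standard_normal(3), rng.standard_normal(3)
# <x, y>_M in the original basis equals the "standard" inner product of
# the transformed coordinates M^{1/2} x and M^{1/2} y
print(np.isclose(x @ M @ y, (Ms @ x) @ (Ms @ y)))   # True
```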