How is quantum superposition different from a mixed state?
The state
\begin{equation} |\Psi \rangle = \frac{1}{\sqrt{2}}\left(|\psi_1\rangle +|\psi_2\rangle \right) \end{equation}
is a pure state. This means there is not a 50% chance that the system is in the state $|\psi_1\rangle$ and a 50% chance that it is in the state $|\psi_2\rangle$. There is a 0% chance that the system is in either of those states, and a 100% chance that the system is in the state $|\Psi\rangle$.
The point is that these statements are all made before I make any measurements.
It is true that if I measure the observable corresponding to $\psi$ ($\psi$-gular momentum :)), then there is a 50% chance after collapse the system will end up in the state $|\psi_1\rangle$.
However, let's say I choose to measure a different observable. Let's say the observable is called $\phi$, and let's say that $\phi$ and $\psi$ are incompatible observables in the sense that, as operators, $[\hat{\psi},\hat{\phi}]\neq0$. (I realize I'm using $\psi$ in a sense you didn't originally intend, but hopefully you know what I mean.) The incompatibility means that $|\psi_1 \rangle$ is not just proportional to $|\phi_1\rangle$; it is a superposition of $|\phi_1\rangle$ and $|\phi_2\rangle$ (the two operators cannot be simultaneously diagonalized).
Then we want to re-express $|\Psi\rangle$ in the $\phi$ basis. Let's say that we find \begin{equation} |\Psi\rangle = |\phi_1\rangle \end{equation}
For example, this would happen if \begin{equation} |\psi_1\rangle = \frac{1}{\sqrt{2}}(|\phi_1\rangle+|\phi_2\rangle) \end{equation} \begin{equation} |\psi_2\rangle = \frac{1}{\sqrt{2}}(|\phi_1\rangle-|\phi_2\rangle) \end{equation} Then I can ask for the probability of measuring $\phi$ and having the system collapse to the state $|\phi_1\rangle$, given that the state is $|\Psi\rangle$: it is 100%. So I have predictions for the two experiments, one measuring $\psi$ and the other $\phi$, given knowledge that the state is $|\Psi\rangle$.
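To make the two predictions concrete, here is a minimal sketch that just applies the Born rule $|\langle \text{outcome}|\text{state}\rangle|^2$ to the vectors above. Everything is written in the $\phi$ basis; the helper names `inner` and `born` are mine, not standard library functions.

```python
import math

# Work in the phi basis: |phi_1> = (1, 0), |phi_2> = (0, 1).
s = 1 / math.sqrt(2)
phi1, phi2 = (1.0, 0.0), (0.0, 1.0)
psi1 = (s, s)     # (|phi_1> + |phi_2>) / sqrt(2)
psi2 = (s, -s)    # (|phi_1> - |phi_2>) / sqrt(2)

def inner(a, b):
    """Inner product <a|b> for tuples of real or complex amplitudes."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))

def born(outcome, state):
    """Born-rule probability |<outcome|state>|^2."""
    return abs(inner(outcome, state)) ** 2

# |Psi> = (|psi_1> + |psi_2>) / sqrt(2), expressed in the phi basis.
Psi = tuple(s * (a + b) for a, b in zip(psi1, psi2))

# Measuring psi: both outcomes come out equally likely.
p_psi1 = born(psi1, Psi)
p_psi2 = born(psi2, Psi)

# Measuring phi: only phi_1 can occur.
p_phi1 = born(phi1, Psi)
p_phi2 = born(phi2, Psi)

print(p_psi1, p_psi2, p_phi1, p_phi2)
```

Up to floating-point rounding, the $\psi$ measurement gives one half for each outcome, while the $\phi$ measurement gives $\phi_1$ with certainty.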
But now let's say that there's a 50% chance that the system is in the pure state $|\psi_1\rangle$, and a 50% chance that the system is in the pure state $|\psi_2\rangle$. Not a superposition, but a genuine uncertainty as to what the state of the system is. If the state is $|\psi_1 \rangle$, then there is a 50% chance that measuring $\phi$ will collapse the system into the state $|\phi_1\rangle$. Meanwhile, if the state is $|\psi_2\rangle$, I again get a 50% chance of finding the system in $|\phi_1\rangle$ after measuring. So the probability of finding the system in the state $|\phi_1\rangle$ after measuring $\phi$ is (50% being in $\psi_1$)(50% measuring $\phi_1$) + (50% being in $\psi_2$)(50% measuring $\phi_1$) = 50%. This differs from the pure-state case, where the same probability was 100%.
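The same bookkeeping for the mixture can be sketched as follows. The point is that classical weights multiply probabilities, whereas the superposition adds amplitudes before squaring. This is only an illustration with real amplitudes; `mixture_probability` is a made-up helper name.

```python
import math

s = 1 / math.sqrt(2)
phi1 = (1.0, 0.0)   # |phi_1> in the phi basis
psi1 = (s, s)       # (|phi_1> + |phi_2>) / sqrt(2)
psi2 = (s, -s)      # (|phi_1> - |phi_2>) / sqrt(2)

def born(outcome, state):
    """Born-rule probability for real amplitude tuples."""
    amplitude = sum(o * x for o, x in zip(outcome, state))
    return abs(amplitude) ** 2

def mixture_probability(outcome, ensemble):
    """Average the Born probabilities over (classical weight, state) pairs."""
    return sum(weight * born(outcome, state) for weight, state in ensemble)

# Genuine ignorance: 50% |psi_1>, 50% |psi_2>.
ensemble = [(0.5, psi1), (0.5, psi2)]
p_mixed = mixture_probability(phi1, ensemble)

# Superposition: add the amplitudes first, then square.
Psi = tuple(s * (a + b) for a, b in zip(psi1, psi2))
p_pure = born(phi1, Psi)

print(p_mixed, p_pure)
```

In the mixture the two $|\phi_2\rangle$ components never get the chance to cancel, which is why the result stays at one half instead of going to one.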
So the difference between a 'density matrix' type uncertainty and a 'quantum superposition' of a pure state lies in the ability of quantum amplitudes to interfere, which you can measure by preparing many copies of the same state and then measuring incompatible observables.
Apart from the already mathematically detailed answers given above, perhaps it would be useful to have a physical picture in mind -- the double slit experiment.
The classical 50:50 picture corresponds to the case where you send the particle at random, with a 50% chance each, through one slit or the other. This produces no interference pattern on the receiving screen. This is a maximally mixed state, and it has no information content.
A quantum superposition sends the particle through both slits at once, and this produces an interference pattern at the screen. I'm using the language "both slits at once" because we physicists are raised on this sort of language, and there's really no way of getting around it. Bohr himself liked to say that we're suspended by words. This state can be used to transmit information: say one guy modulates the positions of the slits, so the resulting fringes seen by another guy at the screen are modulated as well, and the information is contained in the modulation. Of course, this modulation will ultimately be limited by the speed of the particles, which is limited by the speed of light. Because this is a pure state, the fringe contrast is perfect, so the information is transmitted optimally.
This suggests a fundamental difference between classical probabilities and quantum probabilities: the latter carry a phase, can interfere, and can produce deterministic outcomes.
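As a rough numerical cartoon of the double-slit picture, one can compare the fringe contrast for the two cases. This is a sketch under simplifying assumptions that are mine, not part of the argument above: equal amplitudes from both slits, a relative phase `delta` standing in for the position on the screen, and no single-slit envelope.

```python
import cmath
import math

def coherent_intensity(delta):
    """Superposition: add the two path amplitudes, then take |.|^2."""
    a1 = 1 / math.sqrt(2)
    a2 = cmath.exp(1j * delta) / math.sqrt(2)
    return abs(a1 + a2) ** 2

def incoherent_intensity(delta):
    """50:50 mixture: take |.|^2 for each path, then average the intensities."""
    i1 = abs(1.0) ** 2
    i2 = abs(cmath.exp(1j * delta)) ** 2
    return 0.5 * i1 + 0.5 * i2

def visibility(intensity, samples=360):
    """Fringe contrast (I_max - I_min) / (I_max + I_min) over one period."""
    values = [intensity(2 * math.pi * k / samples) for k in range(samples)]
    return (max(values) - min(values)) / (max(values) + min(values))

v_superposition = visibility(coherent_intensity)
v_mixture = visibility(incoherent_intensity)
print(v_superposition, v_mixture)
```

The superposition gives an intensity of $1+\cos\delta$, so the contrast is essentially perfect, while the mixture gives a flat intensity with essentially zero contrast.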
The sentence from Wikipedia:
"For example, there may be a 50% probability that the state vector is $| \psi_1 \rangle$ and a 50% chance that the state vector is $| \psi_2 \rangle$ . This system would be in a mixed state."
is false.
The difference between pure states and partially or completely mixed states is only a difference in the structure of the density matrix.
For a pure (assumed normalized) state $\psi$, the density matrix is $\rho =|\psi\rangle \langle \psi|$, and this matrix has rank one, so in some basis $\rho$ may be written $\rho = \text{Diag}(1,0,0,\dots,0)$.
Density matrices with rank different from one correspond to partially or completely mixed states.
Compare a pure and a mixed density matrix (in a basis $\psi_1 , \psi_2$):
$$\rho_\text{pure} =\frac{1}{2} \begin{pmatrix} 1&1\\1&1 \end{pmatrix}, \quad \quad \rho_\text{mixed} =\frac{1}{2} \begin{pmatrix} 1&0\\0&1 \end{pmatrix}$$ where the pure density matrix is built from a pure state $\psi = \frac{1}{\sqrt{2}}(\psi_1 + \psi_2)$, with $\langle \psi_1| \psi_2 \rangle = 0$, and where the mixed density matrix is a classical statistical matrix.
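The rank criterion can be checked by hand for these two matrices, but here is a small sketch that does it numerically. It also computes the purity $\mathrm{Tr}(\rho^2)$, which is not mentioned above but is a standard equivalent test: it equals 1 exactly for pure states and is smaller otherwise. The helpers `rank2` and `purity` are illustrative and only handle 2x2 matrices.

```python
def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace2(m):
    """Trace of a 2x2 matrix."""
    return m[0][0] + m[1][1]

def purity(rho):
    """Tr(rho^2): 1 for a pure state, less than 1 for a mixed state."""
    return trace2(matmul2(rho, rho))

def rank2(m, tol=1e-12):
    """Rank of a 2x2 matrix, decided from its determinant and entries."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if abs(det) > tol:
        return 2
    if any(abs(x) > tol for row in m for x in row):
        return 1
    return 0

# The two matrices from the text, in the (psi_1, psi_2) basis.
rho_pure = [[0.5, 0.5], [0.5, 0.5]]
rho_mixed = [[0.5, 0.0], [0.0, 0.5]]

print(rank2(rho_pure), purity(rho_pure))
print(rank2(rho_mixed), purity(rho_mixed))
```

The pure matrix has rank one and purity 1; the mixed one has rank two and purity 1/2, the smallest value possible in two dimensions.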
It is easy to see that the probability of finding the system in state $1$ is the same for the two density matrices:
$$p_1 = \mathrm{Tr}(\rho P_1) = \mathrm{Tr}(\rho |\psi_1\rangle \langle \psi_1|) = \rho_{11}=\frac{1}{2}$$
In the same way, one finds, for the two matrices, $p_2 = \rho_{22}=\frac{1}{2}$.
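These traces are easy to verify with a few lines of code. The sketch below evaluates $\mathrm{Tr}(\rho P)$ for the projectors $P_1$ and $P_2$. As an extra check that is my own addition, it also uses the projector `P_Psi` onto $\frac{1}{\sqrt{2}}(\psi_1+\psi_2)$, which probes the off-diagonal entries.

```python
def trace_product(rho, projector):
    """Tr(rho P) for 2x2 matrices given as nested lists."""
    return sum(rho[i][j] * projector[j][i] for i in range(2) for j in range(2))

# Density matrices in the (psi_1, psi_2) basis, as in the text.
rho_pure = [[0.5, 0.5], [0.5, 0.5]]
rho_mixed = [[0.5, 0.0], [0.0, 0.5]]

# Projectors onto psi_1 and psi_2.
P1 = [[1.0, 0.0], [0.0, 0.0]]
P2 = [[0.0, 0.0], [0.0, 1.0]]

# Assumption for illustration: projector onto (psi_1 + psi_2) / sqrt(2).
P_Psi = [[0.5, 0.5], [0.5, 0.5]]

for name, rho in (("pure", rho_pure), ("mixed", rho_mixed)):
    p1 = trace_product(rho, P1)
    p2 = trace_product(rho, P2)
    p_Psi = trace_product(rho, P_Psi)
    print(name, p1, p2, p_Psi)
```

Both matrices give 1/2 for $P_1$ and $P_2$, since those projectors only see the diagonal. For `P_Psi` the pure matrix gives 1 and the mixed matrix gives 1/2, so the two states can be told apart only by a measurement that is sensitive to the off-diagonal terms.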