Intuitive way to understand covariance and contravariance in Tensor Algebra
Since you asked for an intuitive way to understand covariance and contravariance, I think this will do.
First of all, remember that the reason for having covariant or contravariant tensors is that you want to represent the same thing in different coordinate systems. Such a new representation is achieved by a transformation using a set of partial derivatives. In tensor analysis, a good transformation is one that leaves invariant the quantity you are interested in.
For example, we consider the transformation from one coordinate system $x^1,...,x^{n}$ to another $x^{'1},...,x^{'n}$:
$x^{i}=f^{i}(x^{'1},x^{'2},...,x^{'n})$ where $f^{i}$ are certain functions.
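To have something concrete in mind (this example is mine, chosen just to fix ideas), take the passage from plane polar coordinates $x^{'1}=r$, $x^{'2}=\theta$ to Cartesian coordinates: $x^{1}=x^{'1}\cos x^{'2}$ and $x^{2}=x^{'1}\sin x^{'2}$.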
Take a look at a couple of specific quantities. How do the coordinate differentials transform? The answer is:
$dx^{i}=\displaystyle \frac{\partial x^{i}}{\partial x^{'k}}dx^{'k}$
Every quantity which, under a transformation of coordinates, transforms like the coordinate differentials is called a contravariant tensor.
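In the polar example above, for instance, $dx^{1}=\displaystyle \frac{\partial x^{1}}{\partial x^{'1}}dx^{'1}+\frac{\partial x^{1}}{\partial x^{'2}}dx^{'2}=\cos x^{'2}\,dx^{'1}-x^{'1}\sin x^{'2}\,dx^{'2}$, and any contravariant vector $A^{i}$ picks up exactly the same coefficients: $A^{i}=\displaystyle \frac{\partial x^{i}}{\partial x^{'k}}A^{'k}$.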
How do the partial derivatives of a scalar $\Phi$ transform?
$\displaystyle \frac{\partial \Phi}{\partial x^{i}}=\frac{\partial \Phi}{\partial x^{'k}}\frac{\partial x^{'k}}{\partial x^{i}}$
Every quantity which, under a coordinate transformation, transforms like the derivatives of a scalar is called a covariant tensor.
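To see both rules in action at once, here is a small numerical sketch of my own (not part of the original answer; it assumes `numpy` is available) using the polar example from above: the differentials pick up the Jacobian $\partial x^{i}/\partial x^{'k}$, while the gradient of a scalar picks up the inverse Jacobian $\partial x^{'k}/\partial x^{i}$.

```python
import numpy as np

# My own numerical check (not from the original answer): for the map
#   x^1 = r cos(theta),  x^2 = r sin(theta),
# coordinate differentials transform with the Jacobian d x^i / d x'^k (contravariant),
# while the gradient of a scalar transforms with the inverse Jacobian d x'^k / d x^i (covariant).

r, theta = 2.0, 0.7
J = np.array([                        # J[i, k] = d x^i / d x'^k
    [np.cos(theta), -r * np.sin(theta)],
    [np.sin(theta),  r * np.cos(theta)],
])
J_inv = np.linalg.inv(J)              # J_inv[k, i] = d x'^k / d x^i

def cart(p):
    """Cartesian coordinates of the point with polar coordinates p = (r, theta)."""
    return np.array([p[0] * np.cos(p[1]), p[0] * np.sin(p[1])])

# Contravariant rule: dx^i = (d x^i / d x'^k) dx'^k
d_polar = np.array([1e-6, 2e-6])                  # small displacement (dr, dtheta)
d_cart = J @ d_polar                              # predicted (dx^1, dx^2)
print(np.allclose(d_cart, cart(np.array([r, theta]) + d_polar) - cart([r, theta])))  # True (to 1st order)

# Covariant rule: dPhi/dx^i = (dPhi/dx'^k)(d x'^k / d x^i), here for Phi = r^2 = (x^1)^2 + (x^2)^2
grad_polar = np.array([2 * r, 0.0])               # (dPhi/dr, dPhi/dtheta)
grad_cart = grad_polar @ J_inv                    # predicted (dPhi/dx^1, dPhi/dx^2)
x1, x2 = cart([r, theta])
print(np.allclose(grad_cart, [2 * x1, 2 * x2]))   # True
```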
Accordingly, a reasonable generalization is having a quantity which transforms like the product of the components of two contravariant tensors, that is
$A^{ik}=\displaystyle \frac{\partial x^{i}}{\partial x^{'l}}\frac{\partial x^{k}}{\partial x^{'m}}A^{'lm}$
which is called a contravariant tensor of rank two. The same applies to covariant tensors of rank $n$ or mixed tensors of rank $n$.
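Spelled out (these are the standard laws, just not written above), a covariant tensor of rank two transforms as $A_{ik}=\displaystyle \frac{\partial x^{'l}}{\partial x^{i}}\frac{\partial x^{'m}}{\partial x^{k}}A^{'}_{lm}$, and a mixed tensor of rank two as $A^{i}_{k}=\displaystyle \frac{\partial x^{i}}{\partial x^{'l}}\frac{\partial x^{'m}}{\partial x^{k}}A^{'l}_{m}$.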
Keeping in mind the analogy to coordinate differentials and the derivatives of a scalar, take a look at this picture, which I think will help make it clearer:
From Wikipedia:
> The contravariant components of a vector are obtained by projecting onto the coordinate axes. The covariant components are obtained by projecting onto the normal lines to the coordinate hyperplanes.
Finally, you may want to read: Basis vectors
By the way, I don't recommend relying blindly on the picture given by matrices, especially when you are doing calculations.
I prefer to think of them as maps instead of matrices. When you move to tensor bundles over manifolds, you won't have global coordinates, so it might be preferable to think this way.
So $x_i$ is a map which sends vectors to reals. Since it's a tensor, you're only concerned with how it acts on basis elements. It's nice to think of them in terms of dual bases: then $x_i(x^j)=\delta_{ij}$, which is defined as $1$ when $i=j$ and $0$ otherwise.
Similarly, $x^i$ is a map which sends covectors to reals, and is defined by $x^i(x_j)=\delta_{ij}$.
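As a tiny concrete case (my own, just for illustration): in $\mathbb{R}^2$ with basis $\{x^1, x^2\}$, the covector $x_1$ simply reads off the first component, $x_1(a\,x^1+b\,x^2)=a\,x_1(x^1)+b\,x_1(x^2)=a$, and symmetrically $x^1$ reads off the first component of a covector.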
If you have more indices, then you're dealing with a tensor product $V^*\otimes\dotsb\otimes V^*\otimes V\otimes\dotsb\otimes V$, say with $n$ copies of the vector space and $m$ copies of the dual. An element of this vector space takes in $m$ vectors and gives you back $n$ vectors, again in a tensorial way. So, for example, $X_{ijk}$ is a trilinear map; $X^{ijk}$ is a trivector (an ordered triple of vectors up to linearity); $X_{ij}^k$ is a bilinear map taking two vectors to one vector; and so on.
It's worth thinking about these in terms of the tensors you've seen already. The dot product, for example, is your basic (0,2)-tensor. The cross product is a (1,2)-tensor. If you study Riemannian manifolds, it turns out you can use the metric to "raise and lower indices"; so the Riemannian curvature tensor, for example, is alternately defined as a (1,3)-tensor and a (0,4)-tensor, depending on the author's needs.
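For instance (a standard fact, not spelled out above), lowering the upper index of the $(1,3)$ curvature tensor with the metric gives the $(0,4)$ version, $R_{lijk}=g_{lm}R^{m}{}_{ijk}$, and contracting with the inverse metric $g^{lm}$ raises it back.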
The covariance or contravariance of certain quantities tells you how to transform them to keep the result invariant under a change of coordinate system. You transform covariant quantities one way, while you do the inverse with the contravariant ones.
To describe a vector you need coordinates $v^j$ and basis vectors $\mathbf{e_j}$. So the linear combination of the two gives you the actual vector $v^j \mathbf{e_j}$.
But you are free to choose the basis, so in a different basis the same vector may be described as $w^j \mathbf{f_j}$.
So $v^j \mathbf{e_j} = w^j \mathbf{f_j}$
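A quick sanity check of my own: if you simply double every basis vector, $\mathbf{f_j}=2\,\mathbf{e_j}$, then to describe the same arrow the components must halve, $w^j=\tfrac12 v^j$. The components change in the opposite way to the basis, which is the whole point of the "contra" in contravariant.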
The basis vectors themselves can be expressed as linear combinations of the other basis vectors:
$\mathbf{e_j} = A^k_j \mathbf{f_k}$.
Here $A$ is the basis transformation matrix. Let $B$ be another matrix, the inverse of $A$, so their product gives the identity matrix (Kronecker delta):
$B^l_j A^k_l = \delta^k_j$
Let's take $w^j \mathbf{f_j}$ and multiply it by the identity; nothing changes:
$w^j \delta^k_j \mathbf{f_k}$
Expand the delta as the product of the two matrices; still nothing changes:
$w^j B^l_j A^k_l \mathbf{f_k}$
Parenthesize it like this and you can see something:
$\left( w^j B^l_j \right) \left( A^k_l \mathbf{f_k} \right)$
In the right bracket you get back $\mathbf{e_l}$, while in the left bracket there must then be $v^l$.
You can see the basis vectors are transformed with $A$, while the coordinates are transformed with $B$. The basis vectors vary one way, while the coordinates vary exactly the opposite way. The basis vectors are covariant, the coordinates are contravariant.
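Here is a short numerical sketch of my own (not part of the original answer; it assumes `numpy`) mirroring the argument above: the basis vectors pick up $A$, the components pick up $B=A^{-1}$, and the vector itself doesn't care.

```python
import numpy as np

# A small numerical sketch (my own illustration): basis vectors transform with A,
# components with B = A^{-1}, and the vector they describe stays the same.

rng = np.random.default_rng(0)

f = np.eye(3)                  # columns of f are the basis vectors f_k
A = rng.normal(size=(3, 3))    # basis change matrix A^k_j (assumed invertible)
B = np.linalg.inv(A)           # its inverse, B^l_j

e = f @ A                      # e_j = A^k_j f_k : basis vectors transform with A
w = rng.normal(size=3)         # components w^j of some vector in the f basis
v = B @ w                      # v^l = B^l_j w^j : components transform with B

# Same geometric vector either way: v^j e_j == w^j f_j
print(np.allclose(e @ v, f @ w))   # True
```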
Upper and lower indices just denote whether you need to use the basis change matrix or its inverse. So if you have a tensor, say ${F^{abc}}_{defg}$, then based on the index placement alone you already know how its components are built from the components $\tilde F$ in a different coordinate system: ${F^{abc}}_{defg} = {\tilde F^{hij}}{}_{klmn}\, B^a_h B^b_i B^c_j\, A^k_d A^l_e A^m_f A^n_g$.
Also, if you take care to always contract upper indices with lower ones when multiplying, the result will be invariant, i.e. independent of the coordinate system. This is a good opportunity to self-check your work.
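For example, for a covector $u_i$ and a vector $v^i$: $u_i v^i = \left(\tilde u_k A^k_i\right)\left(B^i_l \tilde v^l\right) = \tilde u_k \,\delta^k_l\, \tilde v^l = \tilde u_k \tilde v^k$, so the contraction has the same value in both coordinate systems.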
Index placement is also helpful for checking whether an object is really a tensor or just a symbol.
For example, the metric tensor $g_{ij}$ has two covariant indices, which means its components must be built from the components in a different coordinate system like this: $g_{ij} = \tilde g_{kl} A^k_i A^l_j$.
And indeed: $g_{ij} = \mathbf{e_i} \cdot \mathbf{e_j} = \left( \mathbf{f_k} A^k_i \right) \cdot \left( \mathbf{f_l} A^l_j \right) = \left( \mathbf{f_k} \cdot \mathbf{f_l} \right) A^k_i A^l_j = \tilde{g}_{kl} A^k_i A^l_j $
Similarly, you can check that the Christoffel symbols $\Gamma^m_{jk}$ aren't tensors, because they don't transform like that.
The covariant derivative $\nabla_k v^m = \partial_k v^m + v^j \Gamma^m_{jk}$, on the other hand, does transform as a tensor, though showing that requires more symbol folding.
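For completeness (this is the standard transformation rule, not derived in the answer above), under a change of coordinates the Christoffel symbols pick up an extra, inhomogeneous term, which is exactly what spoils their tensor character: $\tilde\Gamma^{m}_{jk}=\displaystyle \frac{\partial \tilde x^{m}}{\partial x^{a}}\frac{\partial x^{b}}{\partial \tilde x^{j}}\frac{\partial x^{c}}{\partial \tilde x^{k}}\Gamma^{a}_{bc}+\frac{\partial \tilde x^{m}}{\partial x^{a}}\frac{\partial^{2} x^{a}}{\partial \tilde x^{j}\,\partial \tilde x^{k}}$. In $\nabla_k v^m$ this extra term cancels against the non-tensorial part of $\partial_k v^m$, which is why the combination transforms properly.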