Why do we care about dual spaces?
Let $V$ be a vector space (over any field, but for concreteness I will take the field to be $\mathbb R$ from now on; everything is just as interesting in that case). Certainly one of the interesting concepts in linear algebra is that of a hyperplane in $V$.
For example, if $V = \mathbb R^n$, then a hyperplane is just the solution set to an equation of the form $$a_1 x_1 + \cdots + a_n x_n = b,$$ for some $a_i$ not all zero and some $b$. Recall that solving such equations (or simultaneous sets of such equations) is one of the basic motivations for developing linear algebra.
Now remember that when a vector space is not given to you as $\mathbb R^n$, it doesn't normally have a canonical basis, so we don't have a canonical way to write its elements down via coordinates, and so we can't describe hyperplanes by explicit equations like the one above. (Or better, we can, but only after choosing coordinates, and this is not canonical.)
How can we canonically describe hyperplanes in $V$?
For this we need a conceptual interpretation of the above equation. And here linear functionals come to the rescue. More precisely, the map
$$\begin{align*} \ell: \mathbb{R}^n &\rightarrow \mathbb{R} \\ (x_1,\ldots,x_n) &\mapsto a_1 x_1 + \cdots + a_n x_n \end{align*}$$
is a linear functional on $\mathbb R^n$, and so the above equation for the hyperplane can be written as $$\ell(v) = b.$$
More generally, if $V$ is any vector space, and $\ell: V \to \mathbb R$ is any non-zero linear functional (i.e. non-zero element of the dual space), then for any $b \in \mathbb R,$ the set
$$\{v \, | \, \ell(v) = b\}$$
is a hyperplane in $V$, and all hyperplanes in $V$ arise this way.
So this gives a reasonable justification for introducing the elements of the dual space to $V$; they generalize the notion of linear equation in several variables from the case of $\mathbb R^n$ to the case of an arbitrary vector space.
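To make this concrete, here is an illustrative example (my own choice of space and functional, not one taken from the discussion above). Let $V$ be the space of real polynomials of degree at most $2$, which does not come with preferred coordinates, and define
$$\ell(p) = \int_0^1 p(t)\,dt.$$
This is a non-zero linear functional on $V$, so for each $b \in \mathbb R$ the set $\{p \, | \, \ell(p) = b\}$ is a hyperplane in $V$, described without any coordinates. Only after choosing the basis $1, t, t^2$ and writing $p = c_0 + c_1 t + c_2 t^2$ does it become an explicit equation of the familiar kind:
$$c_0 + \tfrac{1}{2} c_1 + \tfrac{1}{3} c_2 = b.$$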
Now you might ask: why do we make them a vector space themselves? Why do we want to add them to one another, or multiply them by scalars?
There are lots of reasons for this; here is one: Remember how important it is, when you solve systems of linear equations, to add equations together, or to multiply them by scalars (here I am referring to all the steps you typically make when performing Gaussian elimination on a collection of simultaneous linear equations)? Well, under the dictionary above between linear equations and linear functionals, these processes correspond precisely to adding together linear functionals, or multiplying them by scalars. If you ponder this for a bit, you can hopefully convince yourself that making the set of linear functionals a vector space is a pretty natural thing to do.
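The dictionary between equations and functionals can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions: the class name `Functional`, the sample system, and the sample solution `v` are hypothetical, and functionals on $\mathbb R^n$ are stored simply by their coefficients. One elimination step (row 2 minus 3 times row 1) is then literally a linear combination of functionals.

```python
from fractions import Fraction


class Functional:
    """A linear functional on R^n, stored by its coefficients (a_1, ..., a_n).

    Hypothetical helper for illustration only.
    """

    def __init__(self, coeffs):
        self.coeffs = tuple(Fraction(c) for c in coeffs)

    def __call__(self, v):
        # Evaluate a_1 x_1 + ... + a_n x_n on the vector v.
        return sum(a * x for a, x in zip(self.coeffs, v))

    def __add__(self, other):
        # Adding two equations corresponds to adding their functionals.
        return Functional(a + b for a, b in zip(self.coeffs, other.coeffs))

    def __rmul__(self, scalar):
        # Scaling an equation corresponds to scaling its functional.
        return Functional(scalar * a for a in self.coeffs)


# Sample system (an assumption for this sketch):
#   x + 2y = 5
#  3x + 4y = 11
l1, b1 = Functional([1, 2]), 5
l2, b2 = Functional([3, 4]), 11

# One Gaussian elimination step: (row 2) - 3 * (row 1).
l3 = l2 + (-3) * l1
b3 = b2 - 3 * b1

# Any solution of the original system also satisfies the new equation.
v = (1, 2)
assert l1(v) == b1 and l2(v) == b2
assert l3(v) == b3
```

Here `l3` has coefficients $(0, -2)$ and `b3` is $-4$, i.e. the equation $-2y = -4$ that elimination produces. The point of the sketch is that `+` and scalar `*` on functionals are exactly the operations elimination needs, which is what the vector space structure on the dual provides.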
Summary: just as concrete vectors $(x_1,\ldots,x_n) \in \mathbb R^n$ are naturally generalized to elements of vector spaces, concrete linear expressions $a_1 x_1 + \cdots + a_n x_n$ in $x_1,\ldots, x_n$ are naturally generalized to linear functionals.
Since there is no answer giving the following point of view, I'll allow myself to resuscitate the post.
The dual is intuitively the space of "rulers" (or measurement instruments) of our vector space. Its elements measure vectors. This is what makes the dual space and its relatives so important in Differential Geometry, for instance. This immediately motivates the study of the dual space. For motivations in other areas, the other answers cover the ground well.
This picture also explains some facts intuitively. For instance, the fact that there is no canonical isomorphism between a vector space and its dual can then be seen as a consequence of the fact that rulers need scaling, and there is no canonical way to provide one scaling for space. However, if we were to measure the measurement instruments, how could we proceed? Is there a canonical way to do so? Well, if we want to measure our measures, why not measure them by how they act on what they are supposed to measure? We need no bases for that. This justifies intuitively why there is a natural embedding of the space into its bidual. (Note, however, that this fails to justify why it is an isomorphism in the finite-dimensional case.)
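Written out as a formula (the symbol $\iota$ is my own notation here), "measuring a ruler by how it acts on a vector" is the evaluation map
$$\iota: V \to V^{**}, \qquad \iota(v)(\ell) = \ell(v) \quad \text{for all } \ell \in V^*.$$
Each $\iota(v)$ is linear in $\ell$ because addition and scalar multiplication in $V^*$ are defined pointwise, and $\iota$ is linear in $v$ because each $\ell$ is linear:
$$\iota(v + w)(\ell) = \ell(v + w) = \ell(v) + \ell(w) = \iota(v)(\ell) + \iota(w)(\ell).$$
No basis appears anywhere in this definition, which is the sense in which the map is natural.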
There are some very beautiful and easily accessible applications of duality, adjointness, etc. in Rota's modern reformulation of the Umbral Calculus. You'll quickly gain an appreciation for the power of such duality once you see how easily this approach unifies hundreds of diverse special-function identities, and makes their derivation essentially trivial. For a nice introduction see Steven Roman's book "The Umbral Calculus".