Why is convexity more important than quasi-convexity in optimization?
There are many reasons why convexity is more important than quasi-convexity in optimization theory. I'd like to mention one that the other answers so far haven't covered in detail. It is related to Rahul Narain's comment that the class of quasi-convex functions is not closed under addition.
Duality theory makes heavy use of optimizing functions of the form $f+L$ over all linear functions $L$. If a function $f$ is convex, then for any linear $L$ the function $f+L$ is convex, and hence quasi-convex. I recommend proving the converse as an exercise: if $f+L$ is quasi-convex for every linear function $L$, then $f$ is convex. In other words, $f+L$ is quasi-convex for all linear functions $L$ if and only if $f$ is convex.
Thus, for every quasi-convex but non-convex function $f$ there is a linear function $L$ such that $f+L$ is not quasi-convex. I encourage you to construct an example of a quasi-convex function $f$ and a linear function $L$ such that $f+L$ has local minima which are not global minima.
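If you would like to check your construction against one possibility (a sketch; many other choices work): take $f(x) = \sqrt{|x|}$, which is quasi-convex because its sublevel sets $\{x : \sqrt{|x|} \le t\} = [-t^2, t^2]$ are intervals, and take $L(x) = -x/2$. Then $g = f + L$ satisfies $g(0) = 0$ and $g(x) > 0$ for all $x < 0$ and for $0 < x < 4$, so $0$ is a strict local minimizer; but $g(9) = 3 - 9/2 < 0$ and $g(x) \to -\infty$ as $x \to +\infty$, so $0$ is not a global minimizer. Indeed, the sublevel set $\{x : g(x) \le 0\} = \{0\} \cup [4, \infty)$ is not convex, so $g$ is not quasi-convex.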
Thus, in some sense convex functions are the class of functions for which the techniques used in duality theory apply.
Sophisticated optimization methods are developed to deal with problems where the solution is not obvious. In general, we may not know much about the behavior of the cost function, so we cannot decide a priori whether there is a unique minimum or not. We need optimization methods to find the minimum, or a minimum if there are several.
For convex problems, the global behavior is determined by purely local properties. If a function is everywhere locally convex, it is globally convex too, and any local minimizer is a global one. So for such problems, local solutions are sufficient, which makes them a lot easier to deal with.
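A short argument for that last claim (a sketch): if $x^*$ were a local minimizer of a convex $f$ but $f(y) < f(x^*)$ for some $y$, then for every $t \in (0,1]$
$$ f\big((1-t)x^* + t y\big) \le (1-t)f(x^*) + t f(y) < f(x^*), $$
so points arbitrarily close to $x^*$ (take $t$ small) would have strictly smaller cost, contradicting local optimality.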
For your first question, the answer is basically yes.
Gradient methods are usually descent methods, so the cost function decreases at non-optimal points. Many such methods also have the property that no accumulation point of the generated sequence can have a non-zero gradient; call these 'guaranteed' descent methods. (This slightly awkward wording allows the cost function to be non-differentiable at a minimum.)
Here is a sufficient condition that applies to the problem above: let the initial point be $x_0$, and let $L_{x_0} = \{x \mid f(x) \leq f(x_0) \}$. Suppose $L_{x_0}$ is compact and the gradient is non-zero at every point other than the minimizer. Then any 'guaranteed' descent method will converge to the minimizer. (I'm assuming that the cost function $f$ is $C^1$ at non-optimal points.) The problem you have above satisfies these conditions.
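To make the 'guaranteed descent' idea concrete, here is a minimal sketch, not the exact problem from the question: steepest descent with Armijo backtracking applied to a smooth convex function with compact sublevel sets and a unique minimizer. The example function, its gradient, and the starting point are illustrative choices of mine.

```python
import numpy as np

def backtracking_descent(f, grad, x0, alpha=0.3, beta=0.5, tol=1e-8, max_iter=10_000):
    """A generic 'guaranteed' descent method: steepest descent with Armijo
    backtracking. Under the conditions above (compact sublevel set at x0,
    nonzero gradient away from the minimizer, f in C^1), the iterates
    converge to the minimizer."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # (approximately) stationary, stop
            break
        t = 1.0
        # Backtrack until the Armijo sufficient-decrease condition holds,
        # which guarantees f(x_{k+1}) < f(x_k) at non-optimal points.
        while f(x - t * g) > f(x) - alpha * t * np.dot(g, g):
            t *= beta
        x = x - t * g
    return x

# Illustrative example only (not the problem from the question):
# f(x, y) = log(exp(x + 2y) + exp(-x) + exp(-y)) is smooth, strictly convex,
# and has compact sublevel sets, so the sufficient condition applies.
def f(v):
    x, y = v
    return np.log(np.exp(x + 2 * y) + np.exp(-x) + np.exp(-y))

def grad(v):
    x, y = v
    e1, e2, e3 = np.exp(x + 2 * y), np.exp(-x), np.exp(-y)
    s = e1 + e2 + e3
    return np.array([(e1 - e2) / s, (2 * e1 - e3) / s])

x_star = backtracking_descent(f, grad, x0=[2.0, -1.0])
```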
The second question is more of a discussion topic. Some points to consider:
- Duality is another big plus for convex problems. Dual problems are often more amenable to solution.
- Problems can often be transformed into convex problems; posynomials are a good example (see the sketch after this list).
- Often convexity of the sublevel sets is the important criterion. Various generalizations (e.g., pseudo- and quasi-convexity) extend the usefulness of convexity to broader settings.
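To flesh out the posynomial remark (a standard sketch, not tied to any particular problem): a posynomial
$$ f(x) = \sum_{k=1}^{K} c_k \, x_1^{a_{1k}} x_2^{a_{2k}} \cdots x_n^{a_{nk}}, \qquad c_k > 0, \; x_i > 0, $$
is generally not convex in $x$, but the change of variables $x_i = e^{y_i}$ together with taking logarithms gives
$$ \log f(e^{y_1}, \dots, e^{y_n}) = \log \sum_{k=1}^{K} e^{a_k^T y + b_k}, \qquad b_k = \log c_k, $$
a log-sum-exp of affine functions of $y$, which is convex. Applying the same transformation to posynomial constraints turns a geometric program into a convex problem.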