Why are planar graphs so exceptional?
(I think that the question of why planar graphs are exceptional is important. It can be asked not only in the context of graphs embeddable on other surfaces. Let me edit and elaborate, also borrowing from the remarks.)
Duality: Perhaps duality is the crucial property of planar graphs. There is a theorem asserting that the dual of a graphic matroid M is a graphic matroid if and only if M is the matroid of a planar graph. In this case, the dual of M is the matroid of the dual graph of G. (See this wikipedia article). This means that the circuits of a planar graph are in one to one correspondence with cuts of the dual graph.
One important manifestation of the uniqueness of planar graphs (which I believe is related to duality) is Kasteleyn's formula for the number of perfect matchings and the connection with counting trees.
Robust geometric descriptions: Another conceptual difference is that (3-connected or maximal) planar graphs are graphs of convex 3-dimensional polytopes and thus have extra geometric properties that graphs on surfaces do not share.
The geometric definition of planar graphs (unlike various generalizations) is very robust. A graph is planar if it can be drawn in the plane such that the edges do not intersect in their interiors and are represented by Jordan curves; The class of planar graphs is also what we get if we replace "Jordan curves" by "line intervals," or if we replace "no intersection" by "even number of crossings". The Koebe-Andreev-Thurston theorem allows to represent every planar graph by the "touching graph" of nonoverlapping circles. Both (related) representations via convex polytopes and by circle packings, can respect the group of automorphisms of the graph and its dual.
Simple inductive constructions. Another exceptional property of the class of planar graphs is that planar graphs can be constructed by simple inductive constructions. (In this respect they are similar to the class of trees, although the inductive constructions are not so simple as for trees.) This fails for most generalizations of planar graphs.
A related important property of planar graphs, maps, and triangulations (with labeled vertices) is that they can be enumerated very nicely. This is Tutte theory. (It has deep extensions to surfaces.)
It is often the case that results about planar graphs extend to other classes. As I mentioned, Tutte theory extends to triangulations of other surfaces. Another example is the fundamental Lipton-Tarjan separator theorem, which extends to all graphs with a forbidden minor.
The study of planar graphs have led to important graph theoretic concepts Another reason (of a different nature) why planar graphs are exceptional is that several important graph-theoretic concepts were disvovered by looking at planar graphs (or special planar graphs.) The notion of vertex coloring of graphs came (to the best of my knowledge) from the four color conjecture about planar graphs. Similarly, Hamiltonian paths and cycles were first studied for planar graphs.
Graphs on surfaces and other notions generalizing planarity. Considering the class of all graphs that can be embedded in a given surface is a natural and important extension of planarity. But, in fact, for various questions, graphs embeddable on surfaces may not be the right generalization of planar graphs.
David Eppstein mentioned another generalization via the colin de Verdier invariant. This describes a hiearachy of graphs where the next class after planar graphs are "linklessly embeddable graphs". Those are graphs that can be embedded in space without having two disjoint cyles geometrically link. As it turned out this is also a very robust notion and it leads to a beautiful class of graphs. (They all have at most 4v-10 edges where v is the number of vertices; The known case of Hadwiger's conjecture for graphs not having K_6 minor implies that they are all 5 colorable.) Further classes in this hierarchy are still very mysterious. Other extensions of planarity are: 3) (not literally) Graphs not having K_r as a minor; 4;5) (Both very problematic) As Joe mentioned, graphs of d-polytopes, and also graphs obtained from sphere packings in higher dimensions; 6) (not graphs) r-dimensional simplicial complexes that cannot be embedded in twice the dimension, 7) [A notion that I promoted over the years:] graphs (and skeleta) of d-polytope with vanishing second (toric) g-number, and many more.
Forbidden minors and coloring. As for the second and third items in the question. About coloring I am not sure if we should consider 4-coloring planar graphs and coloring graphs on other surfaces as very related phenomena. Regarding forbidden minors. Kuratowski's theorem on surfaces is a special case (and also an important step of the proof) of a much more general result (Wagner's conjecture proved by Robertson and Seymour) about any minor-closed class of graphs. This result can be regarded as extending Kuratowski theorem and also (and perhaps more importantly) extending Kruskal and Nash-Williams theorem on trees. Indeed Kuratoski's theorem is related nicely to the more general picture of topological obstruction to embeddibility. If you would like to propose a different (perhaps topological) understanding of the extension of Kuratowski's theorem for surfaces, then maybe you should start by the well-quasi ordering theorem for trees.
As Gil Kalai notes Steinitz's Theorem (A graph G is the vertex-edge graph of a convex 3-dimensional polyhedron, if and only if G is planar and 3-connected) is very powerful. It means that the combinatorial properties (hamiltonian circuits, matchings, etc.) of something 3-dimensional can be studied by looking at a special kind of graph in 2-dimensions. Nothing completely like this happens for higher dimensional convex polytopes. Similarly, we do not know how to study all "toroidal polytopes" using a special class of graphs on the torus.
What seems intriguing is the interface between capturing "geometric" properties of objects from combinatorial (graph theory) properties. For example, for some time I have been interested in which 3-dimensional convex polyhedra with all of its faces being triangles can be realized with isosceles triangles as faces. One way to attack this is to look at the planar 3-connected graphs graphs with at least 4 vertices all of whose faces are triangles (maximal planar graphs). It turns out one can color the edges of such a graph with two colors a and b so that each face gets two a edges and one b edge. Now one can ask the question of whether these colorings can be realized with congruent isosceles triangles, the colors a and b being used to code the two lengths of edges of the congruent isosceles triangles. It is not hard to show that some maximal planar graphs can not be realized with congruent isosceles triangles but telling exactly which graphs can be so realized seems much harder. Also, the graph theory made it possible for me to see many ways to realize such polyhedra geometrically, when such a realization is possible - in other words the same combinatorial type often can be realized with congruent isosceles triangles in many ways) other than ways which had lots of symmetry. If 4 divides the number of faces of a maximal planar graph one can color its edges with two colors a and b so that the number of a, a, b triangles and b, b, a triangles are equal. However, I have been unable to explicitly tell when such colorings translate into geometrical realizations. Loosely speaking, the combinatorial questions seem "easy" while the geometric ones seem "hard." I do not know if the maximal planar graphs with at least 4 vertices can be realized by convex polyhedra with isosceles triangles. (My guess is yes.) There are also interesting questions when one allows mixtures of isosceles triangles and equilateral triangles. (It is known there are exactly 8 convex polyhedra with only equilateral triangles for faces.) One of the sticking points of these questions is trying to tell when two triangles which share an edge in the graph flatten out and lie in one plane in the realization in 3-space.
My point though is that Steinitz's Theorem allowed progress on many questions and encouraged me to formulate new geometrical questions that would not have occurred to me otherwise.
Best,
Joe Malkevitch
Another (less known) characterization of planar graphs is Schnyder's theorem, which characterizes planar graphs according to order dimension. That is, a graph is planar if and only if its incidence poset has order dimension at most 3.
Also, I would be remiss to not mention the beautiful (strong) Hanani-Tutte theorem:
A graph is planar if and only if it has a drawing in the plane such that every two non-adjacent edges cross an even number of times.