Solving NP problems in (usually) Polynomial time?
This phenomenon extends beyond the traveling salesman problem, and indeed beyond NP, for there are even some undecidable problems with the feature that most instances can be solved very quickly.
There is an emerging subfield of complexity theory called generic-case complexity, which is concerned with decision problems in the generic case, that is, with solving most or nearly all instances of a given problem. This contrasts with the situation in classical complexity theory, where it is in effect the worst-case complexity that drives many complexity classifications. (And even for approximate solutions to NP-hard problems, the worst-case phenomenon is still present.)
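To make "most or nearly all instances" precise (my notation, not the original answer's), one typically measures the asymptotic density of the set of inputs on which a partial algorithm succeeds:

$$\rho(S) \;=\; \lim_{n\to\infty} \frac{|S \cap B_n|}{|B_n|},$$

where $B_n$ is the set of instances of size at most $n$. A problem is generically decidable (in, say, linear time) if some correct partial algorithm runs within that bound on a set $S$ with $\rho(S)=1$.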
Particularly interesting is the black-hole phenomenon, by which the difficulty of an infeasible or even undecidable problem is concentrated in a very tiny region, outside of which it is easy. (Here, tiny means tiny with respect to some natural measure, such as asymptotic density.) For example, many of the classical decision problems from combinatorial group theory, such as the word problem and the conjugacy problem, are linear-time solvable in the generic case. This phenomenon provides a negative answer to the analogue of your question 1 for these problems. The fact that the problems are easily solved outside the black hole provides a negative answer to the analogue of question 2. And I think that the fact that these problems are actually undecidable as total problems suggests that this manner of solving almost all cases of a problem will not help us with P vs. NP, in your question 3.
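Schematically, a black-hole-style partial algorithm has the following shape (a minimal Python sketch; the helpers `in_easy_region` and `decide_easy` are hypothetical stand-ins for whatever fast tests a particular problem admits):

```python
def generic_case_decide(x, in_easy_region, decide_easy):
    """Black-hole pattern: a fast test recognizes the 'easy' region
    (which has asymptotic density one), and the problem is decided
    quickly there; inside the black hole we simply give up."""
    if in_easy_region(x):      # fast membership test for the easy set
        return decide_easy(x)  # fast decision procedure, valid on that set
    return None                # inside the black hole: no answer
```

All of the undecidability or intractability of the full problem lives in the final branch.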
For question 4, let me mention that an extreme version of the black-hole phenomenon is provided even by the classical halting problem. Of course, this is the most famous of undecidable problems. Nevertheless, Alexei Miasnikov and I proved that for one of the standard Turing machine models with a one-way infinite tape, there is an algorithm that solves the halting problem on a set of asymptotic measure one. That is, there is a set A of Turing machine programs such that (1) almost every program is in A, in the sense of asymptotic density, (2) A is linear-time decidable, and (3) the halting problem is linear-time decidable for programs in A. This result appears in J. D. Hamkins and A. Miasnikov, "The halting problem is decidable on a set of asymptotic probability one," Notre Dame J. Formal Logic 47, 2006, http://arxiv.org/abs/math/0504351. Inside the black hole, the complement of A, the problem is of course intractable. The proof, unfortunately, does not fully generalize to all the other implementations of Turing machines, since for other models one finds a black hole of some measure intermediate between 0 and 1, rather than measure 0.
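To get a feel for why the one-way tape matters, here is a rough empirical sketch (my own illustration, not the Hamkins–Miasnikov algorithm): sample random programs and check how often the head runs off the left end of the tape before any control state repeats. That is a "trivial" behavior, detectable in linear time, which in that model's convention terminates the computation and so settles halting for the program.

```python
import random

def random_program(num_states, alphabet=(0, 1)):
    """A random transition table for a one-way-tape machine:
    (state, symbol) -> (write, move, next_state), move in {-1, +1}."""
    return {
        (q, a): (random.choice(alphabet), random.choice((-1, 1)),
                 random.randrange(num_states))
        for q in range(num_states) for a in alphabet
    }

def falls_off_before_state_repeat(prog, num_states):
    """Simulate on an all-zero one-way tape; report whether the head runs
    off the left end before any control state is revisited. One of the two
    must happen within num_states steps."""
    tape, head, state, seen = {}, 0, 0, set()
    while state not in seen:
        seen.add(state)
        write, move, state = prog[(state, tape.get(head, 0))]
        tape[head] = write
        head += move
        if head < 0:
            return True   # fell off the left edge of the one-way tape
    return False          # a control state repeated first

if __name__ == "__main__":
    for n in (4, 8, 16, 32, 64):
        trials = 2000
        hits = sum(falls_off_before_state_repeat(random_program(n), n)
                   for _ in range(trials))
        print(f"{n:3d} states: fell off first in {hits / trials:.2%} of samples")
```

The fraction should drift upward as the number of states grows; the theorem makes precise the sense in which this kind of behavior has asymptotic measure one for that machine model.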
Hmm, it's not an NP-complete problem, but hopefully it's still relevant to (4) and to a question I think is implicit in (2).
It's well known that linear programming is in P, but in practice the simplex algorithm (which is exponential in the worst case) is usually the fastest method for solving LP problems, and it's virtually always competitive with the polynomial-time interior-point methods. However, sampling uniformly from some space of problems is an unrealistic model, so average-case analysis isn't convincing. Spielman and Teng introduced the notion of "smoothed analysis" to remedy this, and showed that the simplex algorithm has polynomial smoothed complexity. Glancing at Spielman's page, it looks like this has also been applied to the knapsack problem, although the link to the paper is broken.
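As a toy illustration of both points (assuming SciPy's `linprog` with its HiGHS backends; the problem sizes, noise level `sigma`, and data are made up), one can pit a simplex solver against an interior-point solver on a random LP whose data has been perturbed in the smoothed-analysis spirit:

```python
import time
import numpy as np
from scipy.optimize import linprog  # assumes SciPy >= 1.6 (HiGHS solvers)

rng = np.random.default_rng(0)

# A random LP: minimize c.x subject to A x <= b, 0 <= x <= 10.
m, n = 300, 150
A = rng.standard_normal((m, n))
b = A @ np.abs(rng.standard_normal(n)) + 1.0   # keeps the region nonempty
c = rng.standard_normal(n)

# Smoothed-analysis-style perturbation: tiny Gaussian noise on the data.
sigma = 1e-3
A += sigma * rng.standard_normal(A.shape)

for method in ("highs-ds", "highs-ipm"):       # dual simplex vs. interior point
    t0 = time.perf_counter()
    res = linprog(c, A_ub=A, b_ub=b, bounds=(0, 10.0), method=method)
    print(f"{method}: value {res.fun:.4f} in {time.perf_counter() - t0:.3f}s")
```

This is purely illustrative of the setup, not of the Spielman–Teng analysis itself; their result concerns the simplex method's expected running time after such perturbations.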
Re (1): What do you mean by "small?" :) I suspect that the heuristics would fail to help much if you took a random instance of, say, 3SAT with the right number of clauses and variables, and reduced this to an instance of TSP. But you'd get some polynomial-size blowup, so...
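For concreteness, "the right number of clauses" usually means clause density near the empirical satisfiability threshold of roughly 4.27 clauses per variable, where random 3SAT instances tend to be hardest. A minimal generator (illustrative only; the ratio is the commonly quoted empirical value):

```python
import random

def random_3sat(n_vars, ratio=4.27, seed=None):
    """Random 3-SAT near the empirical satisfiability threshold:
    each clause picks 3 distinct variables, each negated with prob. 1/2."""
    rng = random.Random(seed)
    clauses = []
    for _ in range(round(ratio * n_vars)):
        vs = rng.sample(range(1, n_vars + 1), 3)
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in vs))
    return clauses

print(random_3sat(10, seed=0)[:3])
```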
Re (3): It's correct that the existence of such an algorithm, on its own, would imply neither P = NP nor P != NP. But for "practical considerations" it might be hugely important, depending on what the constants were, and it would certainly spur investigation into whether there was a worst-case polynomial algorithm along the same lines.
ETA: Actually, here's a construction of an NP-complete problem together with an algorithm that unconditionally runs in average-case polynomial time. The problem is the union of a language in P and an NP-complete language (solvable in exp(n) time), arranged so that the number of instances of size n of the first language is something like exp(n^3), while the number of instances of the second is exp(n); the exponential work on the rare hard instances is then swamped in the average (spelled out below).
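Spelling out the averaging under those rough counts: even if the algorithm spends time $e^{n}$ on each of the $\sim e^{n}$ hard instances and only $\mathrm{poly}(n)$ on each of the $\sim e^{n^{3}}$ easy ones, the expected time over a uniformly random instance of size $n$ is

$$\frac{e^{n^{3}}\,\mathrm{poly}(n) + e^{n}\cdot e^{n}}{e^{n^{3}} + e^{n}} \;\le\; \mathrm{poly}(n) + e^{\,2n - n^{3}},$$

which is polynomial, since the second term vanishes as $n$ grows.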
So the interesting thing about (3) is what the existence of an average-case polynomial algorithm for every problem in NP would tell us. And there the answer is still "nothing," but it's conceivable that we could prove P = NP under this assumption.
By the way, Impagliazzo talks about some of these issues (although not all of them; the paper's 15 years old and predates Spielman-Teng, for instance) in perhaps the greatest survey paper ever written, his "A Personal View of Average-Case Complexity." I highly recommend it.
Another approach to NP-complete problems is via what has come to be called parameterized complexity.
The idea is that a problem may be "hard" when measured only in terms of the input size, but one can often identify a parameter k so that the problem is solvable in time polynomial in the input size, with the exponential (or worse) blow-up confined to k. In some situations this allows one to find algorithms that are "tractable" for small values of k (a small worked example follows the links below).
http://en.wikipedia.org/wiki/Parameterized_complexity
http://fpt.wikidot.com/
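A standard concrete example (my illustration, not taken from the links above) is k-Vertex Cover, which the bounded-search-tree technique solves in roughly $O(2^{k}\cdot|E|)$ time: polynomial in the graph, exponential only in the parameter k.

```python
def vertex_cover_at_most_k(edges, k):
    """Bounded search tree for k-Vertex Cover: pick any uncovered edge (u, v);
    one of its endpoints must be in the cover, so branch on both choices.
    Depth <= k and branching factor 2 give roughly O(2^k * |E|) work."""
    uncovered = list(edges)
    if not uncovered:
        return set()          # nothing left to cover
    if k == 0:
        return None           # edges remain but no budget left
    u, v = uncovered[0]
    for w in (u, v):
        rest = [e for e in uncovered if w not in e]
        sub = vertex_cover_at_most_k(rest, k - 1)
        if sub is not None:
            return sub | {w}
    return None

if __name__ == "__main__":
    edges = [(1, 2), (2, 3), (3, 4), (4, 1), (2, 4)]
    print(vertex_cover_at_most_k(edges, 2))   # a cover of size 2, e.g. {2, 4}
    print(vertex_cover_at_most_k(edges, 1))   # None: no cover of size 1
```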