Does it make sense to choose a longer password than the output of a hash?
If you consider a set of potential passwords of size P, with a hash function with N possible output values, then the probability that there exists at least one collision in this set is quite low when P is less than the square root of N, and quite high beyond. See the birthday problem. As the Wikipedia page says, that probability is approximately equal to 1 − e^(−P²/(2·N)).
With figures: if you use passwords consisting of 10 characters (uppercase letters, lowercase letters and digits), then P = 62^10 = 839299365868340224; with a 128-bit hash function, N = 2^128. The formula above means that there may exist at least one colliding pair among all these passwords with probability close to 0.1%. On the other hand, if you add one character to your passwords (i.e. the potential passwords have length 11, not 10), then the probability that there exists at least one collision rises to 98.1%.
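The figures above can be reproduced with a short sketch of the birthday-bound approximation (the function name is mine, not part of any library):

```python
import math

def collision_probability(p, n):
    """Birthday-bound approximation: 1 - e^(-P^2 / (2N))."""
    return 1 - math.exp(-p * p / (2 * n))

n = 2 ** 128                                 # 128-bit hash output space
print(collision_probability(62 ** 10, n))    # ~0.001  (about 0.1%)
print(collision_probability(62 ** 11, n))    # ~0.981  (about 98.1%)
```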
Now all of this is about the probability that a collision exists; not the probability of hitting one.
Collisions are not relevant to password hashing. Password hashing works on preimage resistance: given the hash, how hard it is to guess a corresponding password. Note that I said "a", not "the": for the attacker, it does not matter whether he finds the same password the user chose; he just wants a password which grants access, and any password which matches the hash output will do the trick.
Note that while MD5 is "broken" for collisions, it is not so for preimages (well, for preimages it is "slightly dented", but not significantly for the purposes of this question).
There are two ways to break preimage resistance:
Guess the password. This means trying all potential passwords until a/the right one is found. If there are P possible passwords with uniform probability, then this has average cost P/2: the user chose one of the passwords, and the attacker will need, on average, to try half of them before hitting that exact password.
Get lucky. Try passwords (random, consecutive... it does not matter) until a matching hash value is found. This has average cost N/2.
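As a toy illustration of the "guess the password" attack (the four-entry candidate list is made up for the example; a real attacker would enumerate all P candidates in the set, and MD5 appears here only because it is the hash discussed above):

```python
import hashlib

# Hypothetical tiny candidate set; a real attack enumerates all P passwords.
candidates = ["hunter2", "letmein", "correct horse", "s3cret"]

# A hash value the attacker obtained, e.g. from a leaked database.
target = hashlib.md5(b"letmein").hexdigest()

# Try candidates until one hashes to the target value.
found = next(p for p in candidates
             if hashlib.md5(p.encode()).hexdigest() == target)
print(found)   # letmein
```

Any candidate whose hash matches grants access, whether or not it is the password the user originally chose.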
The password hashing strength will be no more than the lower of the two. In that sense, using a set of possible passwords which is larger than the output of the hash function (e.g. P > 2^128 for a 128-bit hash function) does not bring additional security, because beyond that point the "get lucky" attack becomes a better bargain for the attacker than the "guess the password" attack, and the "get lucky" attack does not depend on how the user actually chose his password.

Please note that I say "size of the set of passwords" and NOT "password length". All the analysis above is based on how many password values could have been chosen, with uniform probability. If you use only 200-letter passwords, but you may only choose ten thousand of them (e.g. because each "password" is a sentence from your favourite book, and the attacker knows that book), then the size of the set of potential passwords is 10000, not 62^200.
In practice, P is bounded by the user's brain (the user has to remember the password) and is invariably lower than N. A "very strong" password is one drawn from a selection process with a P of 2^80 or more; that's sufficient for security, and yet far below the 2^128 of MD5 or the 2^192 of bcrypt. But it seems unrealistic to expect average users to choose very strong passwords. Instead, we must cope with weak passwords, with P around 2^30 or so (meaning: try one billion possible passwords, and you'll have broken the passwords of half your users). Mitigation measures are then slow hashing (make each guess expensive) and salts (don't allow the attacker to attack several passwords in parallel at reduced cost). See this answer.
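Both mitigations can be sketched with only Python's standard library; PBKDF2-HMAC-SHA256 stands in here for bcrypt, and the iteration count of 600,000 is an illustrative choice, not a prescription:

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None, iterations=600_000):
    """Slow, salted hashing: the iteration count makes each guess expensive,
    and the random per-user salt stops the attacker from amortizing work
    across many password hashes at once."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest

def verify(password, salt, digest, iterations=600_000):
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison

salt, digest = hash_password("correct horse battery staple")
print(verify("correct horse battery staple", salt, digest))  # True
print(verify("tr0ub4dor&3", salt, digest))                   # False
```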
Hashing reduces an infinite space, i.e. possible data inputs, to a finite space, i.e. possible hashes. Therefore there will always be collisions.
Technically, if you restrict your input set M to a size less than 2^h, where h is the size of your hash output in bits, then you decrease your chance of a collision. In fact, as the bit-length of the messages in M approaches h (so the size of M approaches 2^h), the probability of a collision when exhaustively hashing all values in M tends to 1.
That being said, given a large enough value of h, exhaustively hashing all of M is highly impractical - for SHA256 you'd need to perform 2^255 operations before hitting a 50% chance of collision with a pre-chosen value.
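The 2^255 figure, and the much lower "free collision" birthday bound for comparison, can be checked with a few lines of arithmetic (variable names are mine):

```python
import math

H = 256  # SHA-256 output size in bits

# Matching a pre-chosen value: each random try hits with probability 2^-H,
# so a 50% chance needs about 2^(H-1) attempts.
prechosen_tries = 2 ** (H - 1)

# Any collision at all (birthday bound): 50% after roughly
# sqrt(2 ln 2) * 2^(H/2) hashes.
birthday_tries = math.sqrt(2 * math.log(2)) * 2 ** (H / 2)

print(prechosen_tries.bit_length() - 1)   # 255
print(math.log2(birthday_tries))          # ~128.24
```

The gap between the two (2^255 versus roughly 2^128) is exactly why collision resistance breaks long before preimage resistance does.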
The important thing to remember is that, for a message longer than h bits, your security is never less than that of an h-bit message, assuming there are no specific vulnerabilities in the hash that make multi-block messages less secure.
So to be blunt: yes, a message shorter than your hash output length does, statistically, have a lower chance of collision, but the number of operations required to find that collision scales with length.