Proof of correctness: Algorithm for diameter of a tree in graph theory
Let's call the endpoint found by the first BFS x. The crucial step is proving that the x found in this first step always "works" -- that is, that it is always at one end of some longest path. (Note that in general there can be more than one equally-longest path.) If we can establish this, it's straightforward to see that a BFS rooted at x will find some node as far as possible from x, which must therefore be an overall longest path.
Hint: Suppose (to the contrary) that there is a longer path between two vertices u and v, neither of which is x.
Observe that, on the unique path between u and v, there must be some highest (closest to the root) vertex h. There are two possibilities: either h is on the path from the root of the BFS to x, or it is not. Show a contradiction by showing that in both cases, the u-v path can be made at least as long by replacing some path segment in it with a path to x.
[EDIT] Actually, it may not be necessary to treat the 2 cases separately after all. But I often find it easier to break a configuration into several (or even many) cases, and treat each one separately. Here, the case where h is on the path from the BFS root to x is easier to handle, and gives a clue for the other case.
[EDIT 2] Coming back to this later, it now seems to me that the two cases that need to be considered are (i) the u-v path intersects the path from the root to x (at some vertex y, not necessarily at the u-v path's highest point h); and (ii) it doesn't. We still need h to prove each case.
I'm going to work out j_random_hacker's hint. Let s, t
be a maximally distant pair. Let u
be the arbitrary vertex. We have a schematic like
u
|
|
|
x
/ \
/ \
/ \
s t ,
where x
is the junction of s, t, u
(i.e. the unique vertex that lies on each of the three paths between these vertices).
Suppose that v
is a vertex maximally distant from u
. If the schematic now looks like
u
|
|
|
x v
/ \ /
/ *
/ \
s t ,
then
d(s, t) = d(s, x) + d(x, t) <= d(s, x) + d(x, v) = d(s, v),
where the inequality holds because d(u, t) = d(u, x) + d(x, t)
and d(u, v) = d(u, x) + d(x, v)
. There is a symmetric case where v
attaches between s
and x
instead of between x
and t
.
The other case looks like
u
|
*---v
|
x
/ \
/ \
/ \
s t .
Now,
d(u, s) <= d(u, v) <= d(u, x) + d(x, v)
d(u, t) <= d(u, v) <= d(u, x) + d(x, v)
d(s, t) = d(s, x) + d(x, t)
= d(u, s) + d(u, t) - 2 d(u, x)
<= 2 d(x, v)
2 d(s, t) <= d(s, t) + 2 d(x, v)
= d(s, x) + d(x, v) + d(v, x) + d(x, t)
= d(v, s) + d(v, t),
so max(d(v, s), d(v, t)) >= d(s, t)
by an averaging argument, and v
belongs to a maximally distant pair.