Calculating the shortest distance between n number of points on a scatter graph

This really isn't a shortest-path problem per se, since you're not just finding the shortest path between a pair of vertices, but it is a graph theory problem. It is most emphatically not solvable by "common sense".

If you don't have to connect the points in a chain, (e.g. you are allowed to connect A,B,C,D by connecting A to each of the other three) then this is a minimum spanning tree problem, and can be solved by Prim's Algorithm (http://en.wikipedia.org/wiki/Prim%27s_algorithm) or Kruskal's Algorithm (http://en.wikipedia.org/wiki/Kruskal%27s_algorithm) efficiently. If not, this is a special case of the Travelling Salesman problem (http://en.wikipedia.org/wiki/Travelling_salesman_problem) which is computationally difficult.

Unless you know something special about how the points are chosen, you probably aren't going to get out of computing most of the $\binom{n}{2}$ distances between them.


Your Math Teacher's response is unsatisfactory, because in research, "common sense" will not lead you to some of the more unintuitive results. Your problem is actually quite meaningful: it is a version of the Shortest Path Problem.

Your problem is analogous to the Shortest Path Problem because if you have $n$ points on a Cartesian Plane and you want to find the shortest path to connect all of them, then that is analogous to having a complete graph with $n$ vertices, and wanting to choose the set of edges such that the graph is connected, but the sum of the edge weights is minimized. In this problem, the edge weight is just the distance between two points. There are many interesting solutions to the Shortest Path Problem - you should take a look at them.

So let's take a look at the "common sense" solution: the simplest intuitive algorithmic solution would be to start at any given point $(x_1,y_1)$, find the nearest $(x_b,y_b)$, connect those with a line, and then connect $(x_b,y_b)$ to its nearest neighbour, etc., until you are done. Here is a counterexample in which this algorithm fails. On the upper diagram, we start at $a$, on the lower diagram, we start at $d$:
enter image description here

The algorithm fails in this instance because the selection of the starting point matters. It is clearly not arbitrary - in this example, if we start at $a$ rather than at $d$, we get different 'shortest path' lengths.

So what's a working algorithm in terms of Euclidian Geometry that will always solve this problem? I'm not so sure, especially given considerations of time-complexity for large $n$. However, any algorithm capable of solving the Shortest-Path Problem should be able to do it.

Tags:

Algorithms