Trajectory Clustering: Which Clustering Method?
Every clustering algorithm needs a metric. You need to define distance between your samples. In your case simple Euclidean distance is not a good idea, especially if the trajectories can have different lengths.
Once you have defined a metric, you can use any clustering algorithm that accepts a custom metric. You probably do not know the correct number of clusters beforehand, in which case hierarchical clustering is a good option. K-means doesn't allow a custom metric, but there are modifications of K-means that do (like K-medoids).
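To make the "custom metric" point concrete, here is a minimal sketch of single-linkage hierarchical clustering that only ever looks at a precomputed distance matrix, so any trajectory distance can be plugged in. The function name `agglomerative`, the `threshold` stopping rule and the toy matrix are my own assumptions for illustration, and the brute-force search is only meant for small data sets.

```python
def agglomerative(dist, threshold):
    """Single-linkage agglomerative clustering on a precomputed distance matrix.

    dist is a symmetric n x n list of lists. Merging stops once the two
    closest clusters are farther apart than `threshold` (hypothetical
    stopping rule, so the number of clusters need not be known up front).
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy distance matrix for four trajectories (made-up numbers).
D = [
    [0.0, 1.0, 9.0, 8.0],
    [1.0, 0.0, 7.0, 9.0],
    [9.0, 7.0, 0.0, 2.0],
    [8.0, 9.0, 2.0, 0.0],
]
clusters = agglomerative(D, threshold=3.0)
print(clusters)
```

In practice you would fill `D` with pairwise distances between your trajectories and pick the threshold by inspecting the merge distances.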
The hard part is defining the distance between two trajectories (time series). A common approach is DTW (Dynamic Time Warping). To improve performance, you can approximate each trajectory with a smaller number of points (there are many algorithms for that, for example Ramer-Douglas-Peucker line simplification).
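For reference, a minimal sketch of the textbook DTW recurrence in plain Python, assuming each trajectory is a list of (x, y) points and using Euclidean distance between points as the local cost. The name `dtw_distance` and the sample trajectories are just illustrative. It is O(n*m) per pair, which is why reducing the number of points helps.

```python
import math


def dtw_distance(a, b):
    """DTW distance between two trajectories given as lists of (x, y) points."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i points of a
    # with the first j points of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local point-to-point cost
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats its point
                cost[i][j - 1],      # b advances, a repeats its point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


short = [(0, 0), (1, 0), (2, 0)]
dense = [(0, 0), (0.5, 0), (1, 0), (1.5, 0), (2, 0)]  # same path, more samples
shifted = [(0, 1), (1, 1), (2, 1)]                     # parallel path, 1 unit away

print(dtw_distance(short, dense))
print(dtw_distance(short, shifted))
```

Note that the two trajectories do not need the same number of points, which is exactly the property plain Euclidean distance lacks.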
It might be a little late but I am also working on the same problem. I suggest you take a look at TRACLUS, an algorithm created by Jae-Gil Lee, Jiawei Han and Kyu-Young Whang, published at SIGMOD’07. http://web.engr.illinois.edu/~hanj/pdf/sigmod07_jglee.pdf
This is so far the best approach I have seen for clustering trajectories because:
- It can discover common sub-trajectories.
- It focuses on segments instead of points (so it filters out noise and outliers).
- It works over trajectories of different length.
Basically, it is a two-phase approach:
Phase one - Partition: Divide each trajectory into line segments. This is done using MDL (minimum description length) optimization.
- Complexity: O(n), where n is the number of points in a trajectory
- Input: Set of trajectories.
- Output: Set D of segments
Phase two - Group: This phase discovers the clusters using a version of density-based clustering similar to DBSCAN, applied to segments rather than points. The authors define their own distance measure made of three components: perpendicular distance, parallel distance and angular distance.
- Complexity: O(n log n), where n is the number of segments in set D
- Input: Set D of segments, parameter ε (epsilon), which sets the neighborhood threshold, and parameter MinLns, the minimum number of lines that can constitute a cluster.
- Output: Set C of clusters, each of which is a set of segments (i.e. the clustered trajectories).
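To give an idea of what the three-component distance looks like, here is a rough 2D sketch written from my reading of the paper, so please check it against the paper's definitions before relying on it. The function names and the equal default weights are my own choices, and it assumes the longer of the two segments has non-zero length.

```python
import math


def _project(p, s, e):
    """Project point p onto the infinite line through s and e."""
    dx, dy = e[0] - s[0], e[1] - s[1]
    t = ((p[0] - s[0]) * dx + (p[1] - s[1]) * dy) / (dx * dx + dy * dy)
    return (s[0] + t * dx, s[1] + t * dy)


def segment_distance(seg_a, seg_b, w_perp=1.0, w_par=1.0, w_ang=1.0):
    """Weighted sum of perpendicular, parallel and angular distance.

    Each segment is ((x1, y1), (x2, y2)). The weights are hypothetical defaults.
    """
    # The longer segment is the reference; the shorter one is projected onto it.
    longer, shorter = sorted((seg_a, seg_b), key=lambda s: math.dist(*s), reverse=True)
    (si, ei), (sj, ej) = longer, shorter
    len_i, len_j = math.dist(si, ei), math.dist(sj, ej)
    if len_i == 0:
        raise ValueError("the longer segment must have non-zero length")
    ps, pe = _project(sj, si, ei), _project(ej, si, ei)

    # Perpendicular distance: Lehmer mean of the two point-to-line distances.
    l1, l2 = math.dist(sj, ps), math.dist(ej, pe)
    d_perp = 0.0 if l1 + l2 == 0 else (l1 * l1 + l2 * l2) / (l1 + l2)

    # Parallel distance: how far the projections are from the nearest endpoint.
    d_par = min(math.dist(ps, si), math.dist(ps, ei),
                math.dist(pe, si), math.dist(pe, ei))

    # Angular distance: |shorter| * sin(theta), or |shorter| if theta >= 90 degrees.
    if len_j == 0:
        d_ang = 0.0
    else:
        dot = (ei[0] - si[0]) * (ej[0] - sj[0]) + (ei[1] - si[1]) * (ej[1] - sj[1])
        cos_t = max(-1.0, min(1.0, dot / (len_i * len_j)))
        d_ang = len_j * math.sqrt(1.0 - cos_t * cos_t) if cos_t > 0 else len_j

    return w_perp * d_perp + w_par * d_par + w_ang * d_ang


a = ((0, 0), (4, 0))
b = ((1, 1), (3, 1))  # parallel to a, one unit above, shorter
print(segment_distance(a, b))
```

Segments that are parallel and overlapping end up close, while segments pointing in different directions are pushed apart by the angular term.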
Finally, they calculate a representative trajectory for each cluster, which is nothing other than the common sub-trajectory discovered in that cluster.
They have pretty cool examples and the paper is very well explained. Once again this is not my algorithm, so don't forget to cite them if you are doing research.
PS: I made some slides based on their work, just for educational purposes: http://www.slideshare.net/ivansanchez1988/trajectory-clustering-traclus-algorithm
Neither will work, because what is a proper mean here?
Have a look at distance based clustering methods, such as hierarchical clustering (for small data sets, but you probably don't have thousands of trajectories) and DBSCAN.
Then you only need to choose an appropriate distance function that allows for, e.g., differences in the time and spatial resolution of the trajectories.
Distance functions such as dynamic time warping (DTW) distance can accommodate this.
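As a sketch of how a distance-based method uses such a function, here is a bare-bones DBSCAN that works purely on a precomputed distance matrix (for example, pairwise DTW distances between trajectories). The function name, parameter names and the toy matrix are my own assumptions. It is the plain O(n^2) version without any index, which is fine for a small number of trajectories.

```python
def dbscan(dist, eps, min_pts):
    """DBSCAN on a precomputed distance matrix.

    Returns one label per item: 0, 1, ... for clusters and -1 for noise.
    A point's neighborhood includes the point itself.
    """
    n = len(dist)
    labels = [None] * n  # None = not visited yet
    cluster = -1
    for p in range(n):
        if labels[p] is not None:
            continue
        neighbors = [q for q in range(n) if dist[p][q] <= eps]
        if len(neighbors) < min_pts:
            labels[p] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[p] = cluster
        queue = [q for q in neighbors if q != p]
        while queue:
            q = queue.pop()
            if labels[q] == -1:
                labels[q] = cluster  # former noise becomes a border point
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_neighbors = [r for r in range(n) if dist[q][r] <= eps]
            if len(q_neighbors) >= min_pts:
                queue.extend(q_neighbors)  # q is a core point: keep expanding
    return labels


# Toy distances: trajectories 0-2 are similar, trajectory 3 is an outlier.
D = [
    [0, 1, 1, 10],
    [1, 0, 1, 10],
    [1, 1, 0, 10],
    [10, 10, 10, 0],
]
labels = dbscan(D, eps=2, min_pts=2)
print(labels)
```

Unlike k-means, nothing here needs a mean of trajectories; only pairwise distances are used.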