Connected Component Labeling - Implementation
I'll first give you the code and then explain it a bit:
// upper bound on the grid dimensions (an assumed value; pick one that fits your input)
const int MAX = 1000;

// direction vectors
const int dx[] = {+1, 0, -1, 0};
const int dy[] = {0, +1, 0, -1};

// matrix dimensions
int row_count;
int col_count;

// the input matrix
int m[MAX][MAX];

// the labels, 0 means unlabeled
int label[MAX][MAX];

void dfs(int x, int y, int current_label) {
    if (x < 0 || x == row_count) return; // out of bounds
    if (y < 0 || y == col_count) return; // out of bounds
    if (label[x][y] || !m[x][y]) return; // already labeled or not marked with 1 in m

    // mark the current cell
    label[x][y] = current_label;

    // recursively mark the neighbors
    for (int direction = 0; direction < 4; ++direction)
        dfs(x + dx[direction], y + dy[direction], current_label);
}

void find_components() {
    int component = 0;
    for (int i = 0; i < row_count; ++i)
        for (int j = 0; j < col_count; ++j)
            if (!label[i][j] && m[i][j]) dfs(i, j, ++component);
}
This is a common way of solving this problem.
The direction vectors are just a nice way to find the neighboring cells (in each of the four directions).
The dfs function performs a depth-first search of the grid. That simply means it will visit all the cells reachable from the starting cell. Each cell it visits is marked with current_label.
The find_components function goes through all the cells of the grid and starts a component labeling if it finds an unlabeled cell (marked with 1).
This can also be done iteratively using an explicit stack, which avoids deep recursion on large components. If you replace the stack with a queue, you obtain a breadth-first search (BFS).
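Here is a minimal sketch of the iterative variant, in case it helps. It is not the code from above: I'm assuming the grid is passed as a std::vector instead of the global arrays, and the names label_components and pending are my own.

```cpp
#include <stack>
#include <utility>
#include <vector>

// Labels the 4-connected components of a 0/1 grid without recursion.
// Returns the number of components; labels[r][c] stays 0 for cells that are 0 in grid.
// Assumes a rectangular grid (every row has the same length).
int label_components(const std::vector<std::vector<int>>& grid,
                     std::vector<std::vector<int>>& labels) {
    const int dx[] = {+1, 0, -1, 0};
    const int dy[] = {0, +1, 0, -1};
    const int rows = static_cast<int>(grid.size());
    const int cols = rows ? static_cast<int>(grid[0].size()) : 0;
    labels.assign(rows, std::vector<int>(cols, 0));

    int component = 0;
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j) {
            if (labels[i][j] || !grid[i][j]) continue; // labeled already, or not a 1
            ++component;

            // the explicit stack replaces the call stack of the recursive dfs
            std::stack<std::pair<int, int>> pending;
            labels[i][j] = component;
            pending.push({i, j});

            while (!pending.empty()) {
                const std::pair<int, int> cell = pending.top();
                pending.pop();
                for (int d = 0; d < 4; ++d) {
                    const int nx = cell.first + dx[d];
                    const int ny = cell.second + dy[d];
                    if (nx < 0 || nx >= rows || ny < 0 || ny >= cols) continue;
                    if (labels[nx][ny] || !grid[nx][ny]) continue;
                    labels[nx][ny] = component; // label when pushing, so no cell is pushed twice
                    pending.push({nx, ny});
                }
            }
        }
    return component;
}
```

Swapping std::stack for std::queue (and top() for front()) turns this into the BFS version; the labels come out the same, only the visiting order changes.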
This can be solved with union find (although DFS, as shown in the other answer, is probably a bit simpler).
The basic idea behind this data structure is to repeatedly merge elements that belong to the same component. Each component is represented as a tree in which every node keeps track of its own parent (instead of the parent keeping track of its children). You can check whether two elements are in the same component by walking from each one up to its root and comparing the roots, and you can merge two components by simply making one root the parent of the other.
A short code sample demonstrating this:
#include <iostream>
using std::cout;

const int w = 5, h = 5;
int input[w][h] = {{1,0,0,0,1},
                   {1,1,0,1,1},
                   {0,1,0,0,1},
                   {1,1,1,1,0},
                   {0,0,0,1,0}};
// component[i] is the parent of cell i, where cell (x, y) has index x*h + y
int component[w*h];

void doUnion(int a, int b)
{
    // get the root component of a and b, and set the one's parent to the other
    while (component[a] != a)
        a = component[a];
    while (component[b] != b)
        b = component[b];
    component[b] = a;
}

void unionCoords(int x, int y, int x2, int y2)
{
    // only merge if the second cell is inside the grid and both cells are 1
    if (y2 < h && x2 < w && input[x][y] && input[x2][y2])
        doUnion(x*h + y, x2*h + y2);
}

int main()
{
    // initially every cell is its own root
    for (int i = 0; i < w*h; i++)
        component[i] = i;
    // merge each cell with its neighbors in the next row and the next column
    for (int x = 0; x < w; x++)
        for (int y = 0; y < h; y++)
        {
            unionCoords(x, y, x+1, y);
            unionCoords(x, y, x, y+1);
        }

    // print the array
    for (int x = 0; x < w; x++)
    {
        for (int y = 0; y < h; y++)
        {
            if (input[x][y] == 0)
            {
                cout << ' ';
                continue;
            }
            int c = x*h + y;
            while (component[c] != c) c = component[c];
            cout << (char)('a'+c);
        }
        cout << "\n";
    }
}
The above will show each group of ones using a different letter of the alphabet.
p   i
pp ii
 p  i
pppp
   p
It should be easy to modify this to get the components separately or get a list of elements corresponding to each component. One idea is to replace cout << (char)('a'+c); above with componentMap[c].push_back(Point(x,y)), with componentMap being a map<int, list<Point>>. Each entry in this map will then correspond to a component and give a list of its points.
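A rough sketch of that idea, under a few assumptions of mine: Point is a small hypothetical struct, the grid is passed in as a std::vector, and the helper names findRoot and groupByComponent are made up for this example. The union pass is the same as in main() above.

```cpp
#include <list>
#include <map>
#include <vector>

struct Point { int x, y; }; // hypothetical helper type, not part of the code above

// Walk up the parent links until we reach a root (a node that is its own parent).
int findRoot(const std::vector<int>& component, int c) {
    while (component[c] != c) c = component[c];
    return c;
}

// Returns one map entry per component: root index -> list of the cells in it.
// Assumes a rectangular grid (every row has the same length).
std::map<int, std::list<Point>> groupByComponent(const std::vector<std::vector<int>>& input) {
    const int w = static_cast<int>(input.size());
    const int h = w ? static_cast<int>(input[0].size()) : 0;

    std::vector<int> component(w * h);
    for (int i = 0; i < w * h; i++) component[i] = i;

    // union pass: merge every 1-cell with its 1-neighbors in the next row and column
    for (int x = 0; x < w; x++)
        for (int y = 0; y < h; y++) {
            if (!input[x][y]) continue;
            const int a = findRoot(component, x * h + y);
            if (x + 1 < w && input[x + 1][y])
                component[findRoot(component, (x + 1) * h + y)] = a;
            if (y + 1 < h && input[x][y + 1])
                component[findRoot(component, x * h + y + 1)] = a;
        }

    // instead of printing a letter, file each cell under its root
    std::map<int, std::list<Point>> componentMap;
    for (int x = 0; x < w; x++)
        for (int y = 0; y < h; y++)
            if (input[x][y])
                componentMap[findRoot(component, x * h + y)].push_back(Point{x, y});
    return componentMap;
}
```

The keys are root indices, so they are not consecutive; if you need labels 1, 2, 3, ... you can renumber them while iterating over the map.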
There are various optimisations to improve the efficiency of union find, such as path compression and union by rank or size. The above is just a basic implementation.
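To give an idea of what an optimised version can look like, here is a sketch that uses path compression and union by size, two standard union-find optimisations. The DisjointSet struct and its member names are my own for illustration, not something taken from the code above.

```cpp
#include <numeric>
#include <utility>
#include <vector>

struct DisjointSet {
    std::vector<int> parent; // parent[i] == i means i is a root
    std::vector<int> size;   // size[r] is the number of elements in the tree rooted at r

    explicit DisjointSet(int n) : parent(n), size(n, 1) {
        std::iota(parent.begin(), parent.end(), 0); // every element starts as its own root
    }

    int find(int a) {
        // first pass: locate the root
        int root = a;
        while (parent[root] != root) root = parent[root];
        // second pass (path compression): point every node on the path straight at the root
        while (parent[a] != root) {
            const int next = parent[a];
            parent[a] = root;
            a = next;
        }
        return root;
    }

    // Merges the sets containing a and b; returns false if they were already merged.
    bool unite(int a, int b) {
        a = find(a);
        b = find(b);
        if (a == b) return false;
        // union by size: hang the smaller tree under the larger one to keep trees shallow
        if (size[a] < size[b]) std::swap(a, b);
        parent[b] = a;
        size[a] += size[b];
        return true;
    }
};
```

To use it for the grid, call unite(x*h + y, x2*h + y2) wherever the basic version calls doUnion, and find(x*h + y) wherever it walks up to the root by hand.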