Find clusters of bacteria
Detecting features is easily done with the scipy ndimage measurements module. It has the added advantage of speed, if you go this way.
import numpy as np
from scipy.ndimage.measurements import label, find_objects
q = np.genfromtxt('bacteria.txt', dtype='S1', comments=':', delimiter=1)
arr = (q == b'#') # convert to boolean mask because ' ' evaluates to True
labelled, num_features = label(arr)
def count_edge_objects(labelled):
hulls = find_objects(labelled)
nbr_edgeobjects = 0
for rowslice, colslice in hulls:
if (rowslice.start == 0 or rowslice.stop == labelled.shape[0] or
colslice.start == 0 or colslice.stop == labelled.shape[1]):
nbr_edgeobjects += 1
return nbr_edgeobjects
print('{} objects'.format(num_features - count_edge_objects(labelled)))
# output:
# 4 objects
In the dataset you've shown there are 2 objects near the edge: the one at the top and the one at the bottom. Remark that I'm currently assuming the dataset has an equal amount of characters on each line (if not, check out the missing_values
option of np.genfromtxt
)
The problem you had is that from the moment you formed two clusters, you couldn't join them. Even if eventually the two clusters were meant to be joined by the addition of intermediate nodes.
This can be solved by an application of a union-find data structure. An unoptimized python version is:
s = """\
### \
##### \
####### \
####### \
###### \
###### ## \
#### ##### \
## ###### ####\
# ###### #### \
### ########## ##### \
####### #### ## ###### \
######### ## # ##### \
# #### ### ### \
##### #### # ## ## \
##### ###### # \
###### ######## \
#### ######## \
####### \
####### \
"""
representatives = {i: i for i, c in enumerate(s) if c == '#'}
nrows, ncols = 19, 44
def neighbours(idx):
i, j = divmod(idx, ncols)
if i > 0: yield idx - ncols
if i < nrows - 1: yield idx + ncols
if j > 0: yield idx - 1
if j < ncols - 1: yield idx + 1
def representative(a):
while representatives[a] != a: a = representatives[a]
return a
def join(a, b):
repr_a, repr_b = representative(a), representative(b)
if repr_a != repr_b: representatives[repr_a] = repr_b
for idx in representatives:
for n in neighbours(idx):
if s[n] == '#': join(idx, n)
cluster_count = len(set(map(representative, representatives)))
Result:
6
You could also have created also a graph and used depth first search to find the connected components. The advantage of the above method is that it's incremental and you can update easily the clusters with the addition of new points.