Extracting the dimensions of a rectangle
So here's what I came up with. It's a bit labour intensive, but it does get us to the right answer eventually. I will be directly using the connected components output that you have shown with the last image.

1. Use morphological image skeletonization so that we get the skeleton of the blob. This gives us the most minimal contour representation: a one-pixel-wide boundary that goes through the middle of each thick edge. You can achieve this through Scikit-image's `skeletonize` method.
2. Use the Hough Transform, a line detection method, on the skeletonized image. In summary, it parameterizes lines in the polar domain and the output is a set of `rho` and `theta` pairs that tell us which lines are detected in the skeletonized image. We can use OpenCV's `cv2.HoughLines` for that. It is very important that you do this on the skeletonized image, or we would have a lot of candidate lines parallel to the true delineation of the bounding box and you wouldn't be able to distinguish between them.
3. Take each pair of lines and find their point of intersection. We'd expect that, over all pairs of lines, there will be 4 predominant clusters of intersections that give us the corners of the rectangle.
4. Because of the noise in the contours, we may get more than four points of intersection. We can use the convex hull to finally get 4 points of intersection for the rectangle. In summary, the convex hull algorithm operates on a list of points and returns the subset of points that minimally encompasses the list. We can use `cv2.convexHull`.
5. Finally, due to the quantization of the Hough Transform, there may be multiple points within the vicinity of each corner. Therefore, apply K-Means clustering to find 4 clusters of points and thus their centroids. We can use `cv2.kmeans` for that.
6. Once we find the centroids, we can simply iterate through each pair of points in a cyclical fashion to find the distances between adjacent corners and thus the dimensions you care about.
Let's step through each point one by one:
Step #1 - Morphological Image Skeletonization
Using Scikit-image's `skeletonize`, we can skeletonize the connected components image that you've shown above. Take note that you need to convert the image to binary before you proceed, and we'll need to convert back to unsigned 8-bit integer afterwards for the rest of the process. I've downloaded the image above and saved it locally. We can then run the `skeletonize` method:
from skimage.morphology import skeletonize
import cv2
import numpy as np

im = cv2.imread('K7ELI.png', 0)  # read in as grayscale
out = skeletonize(im > 0)

# Convert back to uint8
out = 255 * (out.astype(np.uint8))
We get this image:
Step #2 - Use the Hough Transform
Using the Hough Transform, we can detect the most prominent lines in this image:
lines = cv2.HoughLines(out,1,np.pi/180,60)
Here we specify the search space so that we look for lines with a distance bin size of 1 pixel and an angle bin size of 1 degree, or `pi / 180` radians. In summary, the Hough Transform looks at each edge point, iterates through a range of angles `theta` subtended from the origin to that edge point, and calculates the corresponding value of `rho` respecting the bin size. Each (`rho`, `theta`) pair gets logged into a 2D histogram and registers a vote. We threshold this 2D histogram so that any bins beyond a certain count become line candidates. In the above line of code, we set the threshold for the bin counts to 60.
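To make the voting concrete, here's a minimal NumPy sketch of the accumulator that a Hough Transform builds internally. This is purely illustrative: `hough_votes` is a hypothetical helper name, and the real `cv2.HoughLines` is an optimized C++ implementation, not this loop.

def hough_votes(binary_img, theta_res=np.pi/180):
    rows, cols = binary_img.shape
    diag = int(np.ceil(np.hypot(rows, cols)))  # largest possible |rho|
    thetas = np.arange(0, np.pi, theta_res)
    # One bin per (rho, theta) pair; rho is offset by diag so negative
    # distances map to valid indices (rho bin size of 1 pixel via rounding)
    acc = np.zeros((2 * diag + 1, len(thetas)), dtype=np.int32)
    ys, xs = np.nonzero(binary_img)
    for x, y in zip(xs, ys):
        for t_idx, theta in enumerate(thetas):
            rho = int(round(x * np.cos(theta) + y * np.sin(theta)))
            acc[rho + diag, t_idx] += 1  # one vote per edge point per angle
    return acc, thetas, diag

Thresholding `acc` at 60 and reading off the surviving (`rho`, `theta`) bins is exactly what the call above returns.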
This code is optional, but I wanted to show you what the visualized lines look like:
img_colour = np.dstack([im, im, im])
for rho, theta in lines[:,0]:
    a = np.cos(theta)
    b = np.sin(theta)
    x0 = a * rho
    y0 = b * rho
    # Extend each detected line well past the image borders in both directions
    x1 = int(x0 + 1000 * (-b))
    y1 = int(y0 + 1000 * (a))
    x2 = int(x0 - 1000 * (-b))
    y2 = int(y0 - 1000 * (a))
    cv2.line(img_colour, (x1, y1), (x2, y2), (0, 0, 255), 2)
I pulled this code from OpenCV's Hough Line Transform tutorial. It draws the detected lines in the image in red. I get the following image:
As we can see, there are four points of intersection in the image. It's our job next to find these points of intersection.
Step #3 - Find points of intersection
In the Hough Transform, we can relate the length `rho` of the line from the origin to a point `(x, y)` in the image, subtended at the angle `theta`, by:

rho = x*cos(theta) + y*sin(theta)

We can also form the equation of the line `y = m*x + c` in Cartesian form. We can transform between the two by dividing both sides of the `rho` equation by `sin(theta)` and then moving the relevant terms to each side:

y = -(cos(theta)/sin(theta))*x + rho/sin(theta)

In other words, the slope is `m = -1/tan(theta)` and the intercept is `c = rho/sin(theta)`.
Therefore, we should cycle through all unique pairs of lines and, using the above equation, find their point of intersection by setting their Cartesian forms equal to each other. In the interest of saving space I won't derive this for you; simply set two lines in Cartesian form equal to each other and solve for the `x` coordinate of intersection. Once that's done, substitute this value into either of the two lines to find the `y` coordinate. We should skip intersection points that fall outside of the image, which happens with two nearly parallel lines or with pairs of lines that go in the same direction and never intersect within the image.
pts = []
for i in range(lines.shape[0]):
    (rho1, theta1) = lines[i,0]
    m1 = -1 / np.tan(theta1)
    c1 = rho1 / np.sin(theta1)
    for j in range(i + 1, lines.shape[0]):
        (rho2, theta2) = lines[j,0]
        m2 = -1 / np.tan(theta2)
        c2 = rho2 / np.sin(theta2)
        # Skip (near-)parallel lines - they don't intersect
        if np.abs(m1 - m2) <= 1e-8:
            continue
        x = (c2 - c1) / (m1 - m2)
        y = m1 * x + c1
        # Keep only intersections that fall inside the image
        if 0 <= x < im.shape[1] and 0 <= y < im.shape[0]:
            pts.append((int(x), int(y)))
`pts` is a list of tuples containing all of the points of intersection that fall within the bounds of the image.
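As an aside, if you'd rather avoid the slope-intercept form entirely (it blows up for vertical lines, where `theta = 0` makes both `tan(theta)` and `sin(theta)` zero), you can solve the two `rho`/`theta` equations directly as a 2x2 linear system. This is a minimal sketch of that alternative; `intersect` is a hypothetical helper, not part of the method above:

def intersect(rho1, theta1, rho2, theta2):
    # Solve [cos(t1) sin(t1); cos(t2) sin(t2)] [x; y] = [rho1; rho2]
    A = np.array([[np.cos(theta1), np.sin(theta1)],
                  [np.cos(theta2), np.sin(theta2)]])
    b = np.array([rho1, rho2])
    if np.abs(np.linalg.det(A)) < 1e-8:  # (near-)parallel lines
        return None
    x, y = np.linalg.solve(A, b)
    return x, y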
Step #4 - Use the Convex Hull
We can take this list of tuples and use the convex hull to find the points that define the outer perimeter of the rectangle. Take note that the order of the points defining the rectangle is counter-clockwise. This doesn't matter for this step, but it will matter later:
pts = np.array(pts)
pts = pts[:,None] # We need to convert to a 3D numpy array with a singleton 2nd dimension
hull = cv2.convexHull(pts)
`hull` contains a 3D NumPy array that is a subset of the original points of intersection, forming the outer boundary of the rectangle. We can draw these points on the image for illustration:
out2 = np.dstack([im, im, im])
for pt in hull[:,0]:
    cv2.circle(out2, tuple(pt), 2, (0, 255, 0), 2)
I've taken the original image and drawn the corner points in green. We get this image:
Step #5 - Apply K-Means clustering
As you can see in the image above, there are multiple points that map to each corner. It would be good if we could consolidate the multiple points at each corner into a single point. One way is to average all of the points at each corner, and the easiest way to do that out of the box is to use K-Means clustering. We need the centroids, as these give us the final corner points of the rectangle. We need to make sure we specify 4 clusters to find.
From the K-Means clustering tutorial from the OpenCV docs, we can use this code:
# Define criteria = ( type, max_iter = 10 , epsilon = 1.0 )
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
# Set flags (Just to avoid line break in the code)
flags = cv2.KMEANS_RANDOM_CENTERS
# Apply KMeans
# The convex hull points need to be float32
z = hull.copy().astype(np.float32)
compactness,labels,centers = cv2.kmeans(z,4,None,criteria,10,flags)
The first parameter is the convex hull of points, which must be in `float32` as required by the algorithm. The second parameter specifies the number of clusters we want to search for, so 4 in our case. The third parameter you can skip; it is a placeholder for the best cluster ID each point is assigned to, but we don't need it. `criteria` are the K-Means parameters used for the mechanics of the algorithm, and the fifth parameter tells us how many attempts we should run to find the best clusters. We choose 10, meaning that we run K-Means 10 times and choose the clustering configuration with the least amount of error. The error is stored in the `compactness` variable output by the algorithm. Finally, the last parameter is a set of optional flags; we set this so that the initial centroids of the algorithm are simply selected at random from the points.

`labels` provides the cluster ID assigned to each point, and `centers` is the key variable we need, which returns:
array([[338.5 , 152.5 ],
[302.6667, 368.6667],
[139. , 340. ],
[178.5 , 127. ]], dtype=float32)
These are the four corner points of the rectangle. We can see where these line up by drawing them directly on the original image, and we also get this image:
out3 = np.dstack([im, im, im])
for pt in centers:
    # cv2.circle expects integer coordinates, so round the float32 centroids
    cv2.circle(out3, tuple(pt.astype(np.int32)), 2, (0, 255, 0), 2)
Step #6 - Measure the lengths now
Finally, we can cycle through each pair of adjacent corner points and find the corresponding dimensions. Take note that because K-Means returns the centroids in random order due to the random nature of the algorithm, we can run the convex hull on these centroids to ensure that the order is circular:
centers = cv2.convexHull(centers)[:,0]
for (i, j) in zip(range(4), [1, 2, 3, 0]):
    length = np.sqrt(np.sum((centers[i] - centers[j])**2.0))
    print('Length of side {}: {}'.format(i + 1, length))
We thus get:
Length of side 1: 219.11654663085938
Length of side 2: 166.1582489013672
Length of side 3: 216.63160705566406
Length of side 4: 162.019287109375
If you want some perspective on how the bounding box lines up, let's actually draw the lines defined by these centres on the image:
out4 = np.dstack([im, im, im])
for (i, j) in zip(range(4), [1, 2, 3, 0]):
    # cv2.line expects integer coordinates
    cv2.line(out4, tuple(centers[i].astype(np.int32)), tuple(centers[j].astype(np.int32)), (0, 0, 255), 2)
We get:
To see where this lines up with the original image, let's just repeat the code above but drawing the lines on the original image. I downloaded a copy of the original image to do so:
out5 = cv2.imread('no8BP.png')  # Note - grayscale image read in as colour
for (i, j) in zip(range(4), [1, 2, 3, 0]):
    cv2.line(out5, tuple(centers[i].astype(np.int32)), tuple(centers[j].astype(np.int32)), (0, 0, 255), 2)
For the sake of completeness, here's the entire code from start to finish without all of the debug outputs - we go from reading the image to drawing the lines in the original image with printing the lengths of each side in the detected rectangle.
from skimage.morphology import skeletonize
import cv2
import numpy as np
# Step #1 - Skeletonize
im = cv2.imread('K7ELI.png', 0)
out = skeletonize(im > 0)
# Convert to uint8
out = 255*(out.astype(np.uint8))
# Step #2 - Hough Transform
lines = cv2.HoughLines(out,1,np.pi/180,60)
# Step #3 - Find points of intersection
pts = []
for i in range(lines.shape[0]):
    (rho1, theta1) = lines[i,0]
    m1 = -1 / np.tan(theta1)
    c1 = rho1 / np.sin(theta1)
    for j in range(i + 1, lines.shape[0]):
        (rho2, theta2) = lines[j,0]
        m2 = -1 / np.tan(theta2)
        c2 = rho2 / np.sin(theta2)
        # Skip (near-)parallel lines
        if np.abs(m1 - m2) <= 1e-8:
            continue
        x = (c2 - c1) / (m1 - m2)
        y = m1 * x + c1
        if 0 <= x < im.shape[1] and 0 <= y < im.shape[0]:
            pts.append((int(x), int(y)))
# Step #4 - Find convex hull
pts = np.array(pts)
pts = pts[:,None] # We need to convert to a 3D numpy array with a singleton 2nd dimension
hull = cv2.convexHull(pts)
# Step #5 - K-Means clustering
# Define criteria = ( type, max_iter = 10 , epsilon = 1.0 )
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
# Set flags (Just to avoid line break in the code)
flags = cv2.KMEANS_RANDOM_CENTERS
# Apply KMeans
# The convex hull points need to be float32
z = hull.copy().astype(np.float32)
compactness,labels,centers = cv2.kmeans(z,4,None,criteria,10,flags)
# Step #6 - Find the lengths of each side
centers = cv2.convexHull(centers)[:,0]
for (i, j) in zip(range(4), [1, 2, 3, 0]):
    length = np.sqrt(np.sum((centers[i] - centers[j])**2.0))
    print('Length of side {}: {}'.format(i + 1, length))
# Draw the sides of each rectangle in the original image
out5 = cv2.imread('no8BP.png')  # Note - grayscale image read in as colour
for (i, j) in zip(range(4), [1, 2, 3, 0]):
    cv2.line(out5, tuple(centers[i].astype(np.int32)), tuple(centers[j].astype(np.int32)), (0, 0, 255), 2)
# Show the image
cv2.imshow('Output', out5); cv2.waitKey(0); cv2.destroyAllWindows()
It's not perfect, but this simple approach should be a good starting point for you:
import cv2, math
import numpy as np
img = cv2.imread(R'D:\dev\projects\stackoverflow\dimensions_of_rectangle\img1.png')
print(img.shape)
img_moments = cv2.moments(img[:,:,0])  # use only one channel here (cv2.moments operates only on single-channel images)
print(img_moments)
# print(dir(img_moments))
# calculate centroid (center of mass of image)
x = img_moments['m10'] / img_moments['m00']
y = img_moments['m01'] / img_moments['m00']
# calculate orientation of image intensity (it corresponds to the image intensity axis)
u00 = img_moments['m00']
u20 = img_moments['m20'] - x*img_moments['m10']
u02 = img_moments['m02'] - y*img_moments['m01']
u11 = img_moments['m11'] - x*img_moments['m01']
u20_prim = u20/u00
u02_prim = u02/u00
u11_prim = u11/u00
angle = 0.5 * math.atan(2*u11_prim / (u20_prim - u02_prim))
print('The image should be rotated by: ', math.degrees(angle) / 2.0, ' degrees')
rows, cols = img.shape[:2]  # shape is (rows, cols)
# rotate the image by half of this angle
rotation_matrix = cv2.getRotationMatrix2D((cols/2, rows/2), math.degrees(angle / 2.0), 1)
img_rotated = cv2.warpAffine(img, rotation_matrix, (cols, rows))
# print(img_rotated.shape, img_rotated.dtype)
cv2.imwrite(R'D:\dev\projects\stackoverflow\dimensions_of_rectangle\img1_rotated.png', img_rotated)
img_rotated_clone = np.copy(img_rotated)
img_rotated_clone2 = np.copy(img_rotated)
# first method - just calculate bounding rect
bounding_rect = cv2.boundingRect(img_rotated[:, :, 0])
cv2.rectangle(img_rotated_clone, (bounding_rect[0], bounding_rect[1]),
(bounding_rect[0] + bounding_rect[2], bounding_rect[1] + bounding_rect[3]), (255,0,0), 2)
# second method - find columns and rows with biggest sums
def nlargest_cols(a, n):
    col_sums = [(np.sum(col), idx) for idx, col in enumerate(a.T)]
    return sorted(col_sums, key=lambda t: t[0])[-n:]

def nlargest_rows(a, n):
    row_sums = [(np.sum(row), idx) for idx, row in enumerate(a)]
    return sorted(row_sums, key=lambda t: t[0])[-n:]
top15_cols_indices = nlargest_cols(img_rotated[:,:,0], 15)
top15_rows_indices = nlargest_rows(img_rotated[:,:,0], 15)
for a in top15_cols_indices:
    cv2.line(img_rotated_clone, (a[1], 0), (a[1], rows), (0, 255, 0), 1)

for a in top15_rows_indices:
    cv2.line(img_rotated_clone, (0, a[1]), (cols, a[1]), (0, 0, 255), 1)
cv2.imwrite(R'D:\dev\projects\stackoverflow\dimensions_of_rectangle\img2.png', img_rotated_clone)
Of course you need to adjust paths. img1.png is the second image from your question, img1_rotated is the result of rotating the image:
and img2 is the final output:
The blue rectangle is method 1 (just a bounding rect), and the green and red lines (15 red and 15 green, all 1 pixel wide) are the second method.
The algorithm is quite simple:
- Calculate image moments to determine the main axis of image intensity (I don't know how to describe it well - check the wiki page https://en.wikipedia.org/wiki/Image_moment#Examples_2 ). Basically this is the angle by which you have to rotate the image to make white pixels distributed horizontally or vertically.
- Once you know the angle - rotate the image (and save the result).
- Method 1 - calculate and draw rotated rect of all pixels.
- Method 2 - find the 15 rows and 15 columns with the biggest sums (== biggest count of white pixels) and draw horizontal/vertical lines in those rows/columns. Note that the number 15 was selected by trial and error, but it should be easy to select 2 columns (and rows) with a big sum which are not close to each other. Those columns/rows are good candidates to be rectangle boundaries; see the sketch after this list.
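For instance, here's a minimal sketch of that refinement under the same assumptions (`boundary_cols` is a hypothetical helper, not part of the code above): keep the top-n columns by sum and treat the smallest and largest indices as the left/right boundaries; rows work the same way.

def boundary_cols(a, n=15):
    # Sum each column, keep the n columns with the most white pixels,
    # and treat the extreme indices as the rectangle's left/right edges
    col_sums = a.sum(axis=0)
    top = np.argsort(col_sums)[-n:]
    return top.min(), top.max()

left, right = boundary_cols(img_rotated[:,:,0])
width = right - left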
Hope this is what you were looking for; let me know if you have any questions.