Measuring the percentage of a given color in an area?
Update to account for multiple box sets on a single page
In order to extract the box information from a set of boxes, two tweaks were needed. First, I needed to explicitly state an ImageSize to get the desired resolution. Second, I needed to adjust the Binarize criterion so that the boxes, which are now surrounded by grey borders, could be identified by MorphologicalComponents.
(* Convert the first page of the PDF to an image 2500 pixels wide *)
i = Image[
   First@Import[
     "https://dl.dropboxusercontent.com/u/8003134/grid2.pdf",
     "Pages"], ImageSize -> 2500];
(* Treat a pixel as foreground when all three channels are below 0.75 *)
boxes = ComponentMeasurements[
   MorphologicalComponents[
    Binarize[i,
     And[#[[1]] < 0.75, #[[2]] < 0.75, #[[3]] < 0.75] &]],
   {"BoundingBox", "Centroid"}];
centers = (Range@Length@boxes /. boxes)[[All, 2]];
dims = (Range@Length@boxes /. boxes)[[All, 1]];
(* Bounding boxes are measured from the bottom-left corner; ImageTake counts rows from the top *)
intensity =
  1 - ImageMeasurements[
      ImageTake[i,
       ImageDimensions[i][[2]] - {#[[2, 2]], #[[1, 2]]},
       {#[[1, 1]], #[[2, 1]]}], "MeanIntensity"] & /@ dims;
intensity = Rescale[intensity, {Min[intensity], Max[intensity]}];
(* The iterator is named k so that it does not shadow the image i *)
Show[i, Epilog ->
  Table[Text[NumberForm[intensity[[k]], {1, 2}], centers[[k]]],
   {k, Length@boxes}]]
The resulting image is too big to display. Instead, here is a cropped image showing the complete upper-left hand set of boxes and bits of the neighboring boxes to demonstrate that the code is working.
Some additional tweaking of the code will be necessary if you want precision better than about 5%. My first suggestion would be to raise the image size even further and fine-tune the Binarize criterion.
Note that replacing First with Last in the definition of i should get you the second page.
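To see what the Binarize criterion mentioned above does, it may help to apply the same predicate to a couple of hand-picked pixel values. This is only a sketch: the names dark and toy and the two pixel values are made up for illustration. Binarize[image, f] sets a pixel to 1 when f gives True for its channel values, so raising or lowering the 0.75 threshold changes which grey levels count as box border.

```mathematica
(* The same test as in the code above: all three channels below 0.75 *)
dark = And[#[[1]] < 0.75, #[[2]] < 0.75, #[[3]] < 0.75] &;

dark[{0.9, 0.9, 0.9}]   (* False: light grey is treated as background *)
dark[{0.2, 0.2, 0.2}]   (* True: dark grey is treated as border *)
dark[{0., 1., 0.}]      (* False: pure green fails the test on the G channel *)

(* Hypothetical two-pixel RGB image, one light and one dark pixel *)
toy = Image[{{{0.9, 0.9, 0.9}, {0.2, 0.2, 0.2}}}];
ImageData[Binarize[toy, dark]]
```

Because green pixels fail the test, only the grey or black outlines end up as components for MorphologicalComponents to label.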
Old approach
This works for a PDF containing a single set of boxes.
Here's another approach that makes use of ComponentMeasurements:
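(* Convert the first (and only) page of the PDF to an image *)
i = Image[
   First@Import["https://dl.dropboxusercontent.com/u/8003134/img.pdf",
     "Pages"]];
(* Treat a pixel as foreground only when it is pure black, i.e. part of a box outline *)
boxes = ComponentMeasurements[
   MorphologicalComponents[
    Binarize[i, #[[1]] == #[[2]] == #[[3]] == 0 &]],
   {"BoundingBox", "Centroid"}];
centers = (Range@Length@boxes /. boxes)[[All, 2]];
dims = (Range@Length@boxes /. boxes)[[All, 1]];
(* Bounding boxes are measured from the bottom-left corner; ImageTake counts rows from the top *)
intensity =
  1 - ImageMeasurements[
      ImageTake[i,
       ImageDimensions[i][[2]] - {#[[2, 2]], #[[1, 2]]},
       {#[[1, 1]], #[[2, 1]]}], "MeanIntensity"] & /@ dims;
intensity = Rescale[intensity, {Min[intensity], Max[intensity]}];
(* The iterator is named k so that it does not shadow the image i *)
Show[i, Epilog ->
  Table[Text[intensity[[k]], centers[[k]]], {k, Length@boxes}]]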
First, I find the squares in the image with MorphologicalComponents. I use the bounding boxes extracted with ComponentMeasurements to ImageTake each square. To determine the amount of green in the square, I use ImageMeasurements, which won't give 0 for the completely empty box since I've made no effort to remove the black bounding line. I resolve this by rescaling the intensity, assuming that there is at least one completely filled and one completely empty box. Finally, since there are already answers showing how to extract the values into a table, I use the "Centroid" from ComponentMeasurements to Show the values over the original image.
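The rescaling step can be illustrated on a toy list. The numbers below are made up and stand in for raw measurements where the emptiest box reads slightly above 0 because of its border and the fullest reads slightly below 1.

```mathematica
(* Hypothetical raw values: emptiest box, half-filled box, fullest box *)
raw = {2/25, 1/2, 23/25};

(* Map the smallest value to 0 and the largest to 1, as in the code above *)
Rescale[raw, {Min[raw], Max[raw]}]
(* {0, 1/2, 1} *)
```

If no box on the page is completely empty or completely full, this rescaling will stretch the values too far, which is why the assumption matters.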
Since your image elements are of uniform size on a rectangular grid, you can use ImagePartition to split them apart, and then a combination of ImageCrop and ImageTake to get just the internal areas. From there you can use ImageData to access the raster, and Tally to count the pixels by color.
Example:
count = Tally[Join @@ ImageData @ ImageCrop @ #] &;
i = Import["http://i.imgur.com/62buGws.png"];
cells = ImagePartition[i, {48, 48}, {50, 50}];
Map[count, cells, {2}][[2, 3]]
{{{0.188235, 0.188235, 0.188235}, 2}, {{0., 0., 0.}, 176}, {{1., 1., 1.},
616}, {{0.74902, 1., 0.74902}, 44}, {{0., 1., 0.}, 1276}, {{0.0470588, 0.0627451, 0.0470588}, 2}}
The [[2, 3]] selects the square in the second row, third column. (The first row is blank.)
These are the tallies of the RGB values for the cell, including the black border. {{0.74902, 1., 0.74902}, 44} is the row of light green pixels on the margin between green and white. You'll have to decide how you want to count these. You can then process the values accordingly. This is a general approach since it counts all colors in the cell; you can find specific ratios, etc., using that data.
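As a small illustration of what the tally step produces, here is the same Join and Tally combination applied to a hand-written 2 x 2 raster. The variable name pixels and its values are made up; in the real code the raster comes from ImageData.

```mathematica
(* Hypothetical 2 x 2 raster: three pure green pixels and one white pixel *)
pixels = {{{0., 1., 0.}, {0., 1., 0.}},
          {{1., 1., 1.}, {0., 1., 0.}}};

(* Join @@ flattens the rows into one list of RGB triples; Tally counts each distinct triple *)
Tally[Join @@ pixels]
(* {{{0., 1., 0.}, 3}, {{1., 1., 1.}, 1}} *)
```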
To count green pixels, defined as RGB values where G equals 1, weighted by saturation, we could use:
green = Tr @ Cases[#, {{x_, o_ /; o == 1, x_}, c_} :> (1 - x) c] &;
The Condition o_ /; o == 1 is used rather than the simpler 1. to make the match more robust.
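A short sketch of why the Condition is more robust, followed by the weighting applied to a made-up tally. The list toyTally is hypothetical; the definition of green is repeated from above so the snippet runs on its own.

```mathematica
(* A literal 1. only matches the machine real 1., not the integer 1 *)
MatchQ[1, 1.]              (* False *)
MatchQ[1, o_ /; o == 1]    (* True *)
MatchQ[1., o_ /; o == 1]   (* True *)

green = Tr @ Cases[#, {{x_, o_ /; o == 1, x_}, c_} :> (1 - x) c] &;

(* Hypothetical tally: 3 pure green, 2 half-saturated green, 1 white, 5 black *)
toyTally = {{{0., 1., 0.}, 3}, {{0.5, 1., 0.5}, 2},
            {{1., 1., 1.}, 1}, {{0., 0., 0.}, 5}};
green[toyTally]
(* 4. : 3*1 + 2*0.5 + 1*0, with the black pixels not matching at all *)
```

White matches the pattern but gets weight 0, so it does not need to be filtered out separately.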
Now:
dat = Map[Composition[green, count], Rest @ cells, {2}];
dat // First
{0., 1936., 1287.04, 1936., 1936., 1440.96, 1243.04, 1452., 902.086, 956.957, 956.957, 924., 968.}
If we know a priori that the inside dimensions of each cell are 44*44, we can find the fraction of each cell that is green with:
dat / 44^2 // MatrixForm
ybeltukov uses a similar method that is more optimized for this particular operation, but I feel that image processing functions such as ImagePartition and ImageCrop offer additional features that you may find valuable in other applications, for example if each cell were to have a different size.
Edit: actually ybeltukov's code is not presently accurate because it does not use e.g. ImageCrop, so it is not counting only the pixels inside the box.
Note that your comment about the file being a PDF instead of a PNG changes the question, in that there are ways to process a PDF that you cannot apply to a PNG.
First, import the "Pages" of the PDF and store the first (and only) one in i:
i = First @ Import["https://dl.dropboxusercontent.com/u/8003134/img.pdf", "Pages"];
Now the image is in fact scalable Graphics, and the black squares and green rectangles are stored as JoinedCurve and FilledCurve objects:
Cases[i, _JoinedCurve, Infinity, 1]
Cases[i, _FilledCurve, Infinity, 1]
(*
{JoinedCurve[{{{0, 2, 0}, {0, 1, 0}, {0, 1, 0}}},
{{{2., 647.}, {47., 647.}, {47., 602.}, {2., 602.}}},
CurveClosed -> {1}]}
{FilledCurve[{{{0, 2, 0}, {0, 1, 0}, {0, 1, 0}}},
{{{2., 597.}, {47., 597.}, {47., 552.}, {2., 552.}}}]}
*)
Note that the coordinates of the squares and rectangles are in the second argument.
Also, upon inspection, the squares are stored in order by columns. So we can extract the squares, Partition them, and map the coordinates of a corner to the index of the square. Each green rectangle shares two corners with its enclosing square, so we pick a corner common to both (the 4th coordinate). We can get the area of a green rectangle by subtracting the coordinates of opposite corners and multiplying the differences, and store the results as rules mapping corners to areas. We can get the total area of a square the same way.
corner2idx = With[{squares = Cases[i, JoinedCurve[_, {rect_}, ___] :> rect, Infinity]},
Flatten @ MapIndexed[#[[4]] -> Reverse[#2] &, Partition[squares, 13], {2}]];
corner2area = With[{green = Cases[i, FilledCurve[_, {rect_}, ___] :> rect, Infinity]},
#[[4]] -> Times @@ First@Differences[#[[{2, 4}]]] & /@ green];
totalArea =
Times @@ First @ Differences[
Cases[i, JoinedCurve[_, {rect_}, ___] :> rect, Infinity, 1][[1, {2, 4}]]]
(*
2025.
*)
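The corner arithmetic can be checked by hand on the square printed earlier. The variable names below are only for illustration, and the half-height rectangle is a made-up example rather than one taken from the file.

```mathematica
(* Corner list of the first JoinedCurve shown above *)
square = {{2., 647.}, {47., 647.}, {47., 602.}, {2., 602.}};

diagonal = square[[{2, 4}]]           (* {{47., 647.}, {2., 602.}} *)
sides = First@Differences[diagonal]   (* {-45., -45.} *)
Times @@ sides                        (* 2025. *)

(* Hypothetical green rectangle filling the lower half of that square *)
half = {{2., 624.5}, {47., 624.5}, {47., 602.}, {2., 602.}};
Times @@ First@Differences[half[[{2, 4}]]]/2025.   (* 0.5 *)
```

Both differences come out negative for corners listed in this order, so their product is positive without needing Abs.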
Now we can put it all together, using the rules corner2idx that map corner coordinates to indices to create a list of array rules that map each index to the corresponding area. Dividing by the total area gives the proportion of green in each square.
proportions = SparseArray[corner2area /. corner2idx, {13, 13}, 0.]/totalArea;
Round[proportions, 0.02] // MatrixForm
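How the two rule lists combine can be seen on a made-up 1 x 2 grid. All names prefixed with toy, and the coordinates and areas they hold, are hypothetical and chosen only to keep the example small.

```mathematica
(* Two squares, identified by their 4th corner, mapped to grid positions *)
toyCorner2idx = {{2., 602.} -> {1, 1}, {52., 602.} -> {1, 2}};

(* Only the second square contains a green rectangle *)
toyCorner2area = {{52., 602.} -> 1012.5};

(* Replacing each corner by its index gives rules that SparseArray accepts *)
toyRules = toyCorner2area /. toyCorner2idx
(* {{1, 2} -> 1012.5} *)

Normal[SparseArray[toyRules, {1, 2}, 0.]/2025.]
(* {{0., 0.5}} *)
```

Squares without a FilledCurve have no rule, so they fall back to the default value 0. given as the third argument of SparseArray.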