Measuring the percentage of a given color in an area?

Update to account for multiple box sets on a single page

To extract the box information from a page containing several sets of boxes, two tweaks were needed. First, I had to set an explicit ImageSize to get the desired resolution. Second, I had to adjust the Binarize criterion so that the boxes, which are now surrounded by grey borders, could be identified by MorphologicalComponents.

i = Image[
   First@Import[
     "https://dl.dropboxusercontent.com/u/8003134/grid2.pdf", 
     "Pages"], ImageSize -> 2500];
boxes = ComponentMeasurements[
   MorphologicalComponents[
    Binarize[i, 
     And[#[[1]] < 0.75, #[[2]] < 0.75, #[[3]] < 
        0.75] &]], {"BoundingBox", "Centroid"}];
centers = (Range@Length@boxes /. boxes)[[All, 2]];
dims = (Range@Length@boxes /. boxes)[[All, 1]];
intensity = 
  1 - ImageMeasurements[
      ImageTake[i, 
       ImageDimensions[i][[2]] - {#[[2, 2]], #[[1, 2]]}, {#[[1, 
          1]], #[[2, 1]]}], "MeanIntensity"] & /@ dims;
intensity = Rescale[intensity, {Min[intensity], Max[intensity]}];
(* k is the iterator here so that it does not shadow the image i *)
Show[i, Epilog -> 
  Table[Text[NumberForm[intensity[[k]], {1, 2}], centers[[k]]], {k, 
    Length@boxes}]]

The resulting image is too big to display. Instead, here is a cropped image showing the complete upper-left set of boxes and parts of the neighboring sets, to demonstrate that the code is working.

Mathematica graphics

Some additional tweaking of the code will be necessary if you want precision better than about 5%. My first suggestion would be to raise the image size even further and to fine-tune the Binarize criterion.

Note that replacing First with Last in the definition of i should get you the second page.

Old approach

This works for a PDF containing a single set of boxes.

Here's another approach that makes use of ComponentMeasurements:

i = Image[
   First@Import["https://dl.dropboxusercontent.com/u/8003134/img.pdf",
      "Pages"]];
boxes = ComponentMeasurements[
   MorphologicalComponents[
    Binarize[i, #[[1]] == #[[2]] == #[[3]] == 0 &]], {"BoundingBox", 
    "Centroid"}];
centers = (Range@Length@boxes /. boxes)[[All, 2]];
dims = (Range@Length@boxes /. boxes)[[All, 1]];
intensity = 
  1 - ImageMeasurements[
      ImageTake[i, 
       ImageDimensions[i][[2]] - {#[[2, 2]], #[[1, 2]]}, {#[[1, 
          1]], #[[2, 1]]}], "MeanIntensity"] & /@ dims;
intensity = Rescale[intensity, {Min[intensity], Max[intensity]}];
(* k is the iterator here so that it does not shadow the image i *)
Show[i, Epilog -> 
  Table[Text[intensity[[k]], centers[[k]]], {k, Length@boxes}]]

Mathematica graphics

First, I find the squares in the image with MorphologicalComponents. I then use the bounding boxes extracted by ComponentMeasurements to ImageTake each square. To determine the amount of green in a square, I use ImageMeasurements with "MeanIntensity". This won't give 0 for a completely empty box, since I've made no effort to remove the black bounding line. I resolve this by rescaling the intensities, assuming that the image contains at least one completely filled and one completely empty box. Finally, since there are already answers showing how to extract the values into a table, I use the "Centroid" from ComponentMeasurements to Show the values over the original image.
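Two of the steps above can be checked on toy numbers. This is only a sketch: toRowsCols is a hypothetical helper name, and the image height, bounding box, and raw values are made up for illustration. The first part mirrors how the code turns a "BoundingBox" (y measured from the bottom) into the row and column ranges that ImageTake expects (rows counted from the top); the second part mirrors the final Rescale.

```mathematica
(* toRowsCols is a hypothetical helper; h is the image height in pixels. *)
(* A bounding box is {{xmin, ymin}, {xmax, ymax}} with y counted from the bottom, *)
(* whereas ImageTake counts rows from the top, hence h - {ymax, ymin}. *)
toRowsCols[h_, {{xmin_, ymin_}, {xmax_, ymax_}}] := {h - {ymax, ymin}, {xmin, xmax}};

toRowsCols[100, {{10, 20}, {40, 60}}]
(* {{40, 80}, {10, 40}} *)

(* Made-up "1 - MeanIntensity" values: the emptiest box maps to 0, the fullest to 1. *)
raw = {1/10, 1/2, 9/10};
Rescale[raw, {Min[raw], Max[raw]}]
(* {0, 1/2, 1} *)
```

The rescaling is only as good as its assumption: if no box on the page is completely empty or completely full, the extremes are still mapped to 0 and 1.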


Since your image elements are of uniform size on a rectangular grid, you can use ImagePartition to split them apart, and then a combination of ImageCrop and ImageTake to get just the internal areas. From there you can use ImageData to access the raster, and Tally to count the pixels by color.

Example:

count = Tally[Join @@ ImageData @ ImageCrop @ #] &;

i = Import["http://i.imgur.com/62buGws.png"];

cells = ImagePartition[i, {48, 48}, {50, 50}];

Map[count, cells, {2}][[2, 3]]
{{{0.188235, 0.188235, 0.188235}, 2}, {{0., 0., 0.}, 176}, {{1., 1., 1.}, 616},
 {{0.74902, 1., 0.74902}, 44}, {{0., 1., 0.}, 1276}, {{0.0470588, 0.0627451, 0.0470588}, 2}}

[[2, 3]] is to look at the square on the second row, third column. (The first row is blank.)

These are the tallies of the RGB values for each cell, including the black border. {{0.74902, 1., 0.74902}, 44} is the row of light green pixels on the margin between green and white. You'll have to decide how you want to count these. You can then process the values accordingly. This is a general approach since it counts all colors in the cell; you can find specific ratios, etc., using that data.
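To see what the flatten-and-tally step does on its own, here is a minimal sketch on a hand-written 2×2 RGB raster. The pixels list is made up for illustration and stands in for what ImageData would return for a tiny image.

```mathematica
(* A made-up 2x2 raster: one white pixel and three pure green pixels. *)
pixels = {{{1., 1., 1.}, {0., 1., 0.}},
          {{0., 1., 0.}, {0., 1., 0.}}};

(* Join @@ flattens the rows into a single list of RGB triples; *)
(* Tally then counts each distinct triple in order of first appearance. *)
Tally[Join @@ pixels]
(* {{{1., 1., 1.}, 1}, {{0., 1., 0.}, 3}} *)
```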

To count green pixels, defined as RGB values where G equals 1, weighted by saturation, we could use:

green = Tr @ Cases[#, {{x_, o_ /; o == 1, x_}, c_} :> (1 - x) c] &;

The Condition o_ /; o == 1 is used rather than the simpler 1. to make the match more robust.
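As a worked check, the sketch below applies green to the tally printed above for cell [[2, 3]], typed in by hand (so the values carry only the printed precision). White matches the pattern but contributes (1 - 1) × 616 = 0, the light green margin contributes about 0.25 × 44, and pure green contributes its full count.

```mathematica
green = Tr @ Cases[#, {{x_, o_ /; o == 1, x_}, c_} :> (1 - x) c] &;

(* Tally for cell [[2, 3]], copied from the output shown earlier. *)
tally = {{{0.188235, 0.188235, 0.188235}, 2}, {{0., 0., 0.}, 176},
   {{1., 1., 1.}, 616}, {{0.74902, 1., 0.74902}, 44},
   {{0., 1., 0.}, 1276}, {{0.0470588, 0.0627451, 0.0470588}, 2}};

green[tally]
(* approximately 1287.04 = 0*616 + 0.25098*44 + 1*1276 *)

(* Why the Condition: a literal 1. does not match an exact integer 1, *)
(* whereas the numerical comparison o == 1 accepts both 1 and 1. *)
{MatchQ[1, 1.], MatchQ[1, o_ /; o == 1], MatchQ[1., o_ /; o == 1]}
(* {False, True, True} *)
```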

Now:

dat = Map[Composition[green, count], Rest @ cells, {2}];

dat // First
{0., 1936., 1287.04, 1936., 1936., 1440.96, 1243.04, 1452.,
    902.086, 956.957, 956.957, 924., 968.}

If we know a priori that the inside dimensions of each cell are 44 by 44 pixels, we can find the fraction of each cell's interior that is green with:

dat / 44^2 // MatrixForm


ybeltukov uses a similar method that is more optimized for this particular operation, but I feel that using image processing functions such as ImagePartition and ImageCrop offers additional features that you may find valuable in other applications, for example if each cell were to have a different size.

Edit: ybeltukov's code is not presently accurate, because it does not use ImageCrop or an equivalent step and therefore also counts pixels that lie outside the box interior.


Note that your comment about the file being a PDF instead of a PNG changes the question, in that there are ways to process a PDF that are not available for a PNG.

First import the "Pages" of the PDF and store the first (and only) one in i:

i = First @ Import["https://dl.dropboxusercontent.com/u/8003134/img.pdf", "Pages"];

Now the image is in fact scalable Graphics and the black squares and green rectangles are stored as JoinedCurve and FilledCurve objects:

Cases[i, _JoinedCurve, Infinity, 1]
Cases[i, _FilledCurve, Infinity, 1]
(*
   {JoinedCurve[{{{0, 2, 0}, {0, 1, 0}, {0, 1, 0}}},
                {{{2., 647.}, {47., 647.}, {47., 602.}, {2., 602.}}},
                CurveClosed -> {1}]}
   {FilledCurve[{{{0, 2, 0}, {0, 1, 0}, {0, 1, 0}}},
                {{{2., 597.}, {47., 597.}, {47., 552.}, {2., 552.}}}]}
*)

Note that the coordinates of the squares and rectangles are in the second argument.

Also, upon inspection, the squares are stored in order by columns. So we can extract the squares, Partition them, and map the coordinates of one corner to the index of the square. Each green rectangle shares two corners with its enclosing square, so we pick a corner common to both (the 4th coordinate). We can get the area of a green rectangle by subtracting the coordinates of opposite corners and multiplying the two differences, and store the results as rules mapping corners to areas. We can get the total area of a square the same way.

corner2idx = With[{squares = Cases[i, JoinedCurve[_, {rect_}, ___] :> rect, Infinity]},
   Flatten @ MapIndexed[#[[4]] -> Reverse[#2] &, Partition[squares, 13], {2}]];

corner2area = With[{green = Cases[i, FilledCurve[_, {rect_}, ___] :> rect, Infinity]},
   #[[4]] -> Times @@ First@Differences[#[[{2, 4}]]] & /@ green];

totalArea = 
 Times @@ First @ Differences[
   Cases[i, JoinedCurve[_, {rect_}, ___] :> rect, Infinity, 1][[1, {2, 4}]]]
(*
   2025.
*)
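The corner arithmetic can be followed step by step on the one square printed earlier. This sketch just copies those coordinates by hand; the intermediate names opposite and sides are introduced here for illustration only.

```mathematica
(* Corner coordinates of the first JoinedCurve shown above. *)
square = {{2., 647.}, {47., 647.}, {47., 602.}, {2., 602.}};

opposite = square[[{2, 4}]]
(* {{47., 647.}, {2., 602.}}  : two diagonally opposite corners *)

sides = First @ Differences[opposite]
(* {-45., -45.}  : signed width and height *)

Times @@ sides
(* 2025.  : the two signs cancel, giving the area *)
```

Because both differences have the same sign for corners 2 and 4, the product comes out positive without needing Abs.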

Now we can put it all together, using the rules corner2idx that map corner coordinates to indices to create a list of array rules that map the index to the corresponding area. Dividing by the total area gives the proportion of green in each square.

proportions = SparseArray[corner2area /. corner2idx, {13, 13}, 0.]/totalArea;

Round[proportions, 0.02] // MatrixForm

Mathematica graphics
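The rule composition in the last step can be tried on a toy 2×2 grid. Everything in this sketch is hypothetical (the names corner2idxToy and corner2areaToy, the two corners, and the half-filled area); it only illustrates how ReplaceAll turns corner -> area rules into index -> area rules that SparseArray accepts.

```mathematica
(* Hypothetical data: two squares in one column, only the lower one has a green rectangle. *)
corner2idxToy = {{2., 602.} -> {1, 1}, {2., 552.} -> {2, 1}};
corner2areaToy = {{2., 552.} -> 1012.5};

(* Replacing each corner by its index yields array rules. *)
rules = corner2areaToy /. corner2idxToy
(* {{2, 1} -> 1012.5} *)

(* Squares without a green rectangle keep the default value 0. *)
Normal[SparseArray[rules, {2, 2}, 0.]/2025.]
(* {{0., 0.}, {0.5, 0.}} *)
```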