Finding the largest rectangular submatrix

Update: Putting together pieces from several sources (links below) to identify the largest contiguous rectangle in a binary matrix:

ClearAll[poP, stutteringAccumulate, largestRectangleInHistogram, maxRectangle]
SetAttributes[poP, HoldAllComplete];
poP[a_] := Module[{b}, If[EmptyQ[a], False, b = Last[a]; Set[a, Most[a]]; b]]
stutteringAccumulate = FoldList[#2  #1 + #2 &, #] &;
largestRectangleInHistogram = Module[{max = 0, a = Join[{-1}, #, { -1}], n = 2 + Length@#, 
     stack = {1},  h, area, i, index = 1, height = 0}, 
    For[i = 1, i <= n, ++i, 
     While[a[[i]] < a[[Last@stack]], 
      h = a[[poP[stack]]]; 
      area = h (i - Last[stack] - 1); max = Max[max, area]; 
      If[max > area, index = index; height = height, index = i; height = h];
      ]; AppendTo[stack, i]];
    {height, {# - 1 - max/height, # - 2} &@index, max}] &;
maxRectangle[mat_] := Module[{lr = largestRectangleInHistogram /@ stutteringAccumulate[mat], 
    l = List /@ Range[Length@mat]},
   {#4 - {# - 1, 0}, #2, #3} & @@ MaximalBy[Join[lr, l, 2], Last][[1]]];

Examples:

mat = {{1, 1, 1, 1, 0, 0}, {1, 1, 1, 1, 1, 1}, {0, 0, 0, 1, 1, 1}, {1, 1, 0, 1, 1, 1}, 
   {1, 1, 1, 1, 1, 1}, {1, 1, 1, 1, 1, 1}};

Construct a matrix of histograms:

histograms = stutteringAccumulate @ mat

{{1, 1, 1, 1, 0, 0}, {2, 2, 2, 2, 1, 1}, {0, 0, 0, 3, 2, 2}, {1, 1, 0, 4, 3, 3}, {2, 2, 1, 5, 4, 4}, {3, 3, 2, 6, 5, 5}}

Find the largest rectangle for each row of histograms:

largestrecs = largestRectangleInHistogram /@ histograms

{{1, {1, 4}, 4}, {2, {1, 4}, 8}, {2, {4, 6}, 6}, {3, {4, 6}, 9}, {4, {4, 6}, 12}, {5, {4, 6}, 15}}

Pick from largestrecs the one with largest area:

{rows, cols, area} = maxRectangle[mat]

{{2, 6}, {4, 6}, 15}

Row[Labeled[##, Top] & @@@ Transpose[{MatrixForm /@ {mat, histograms, 
      MapAt[Style[#, Red, Bold] &, mat, Span @@@ maxRectangle[mat][[;; 2]]]}, 
   {"mat", "histograms", "max rectangle"}}]]

enter image description here

Row[Labeled[BarChart[#, ImageSize -> 100, 
     Background -> If[maxRectangle[mat][[-1]] == 
        largestRectangleInHistogram[#][[-1]], LightBlue, White]], 
    Style[largestRectangleInHistogram@#, 12], Top] & /@ histograms, Spacer[5]]

enter image description here

With SeedRandom[1]; mat = RandomInteger[1, {20, 40}]; as input

maxRectangle[mat]

{{17, 20}, {32, 33}, 8}

MatrixForm @ MapAt[Style[#, Red, Bold] &, mat, Span @@@ maxRectangle[mat][[;; 2]]]

enter image description here

Grid[Partition[Labeled[BarChart[#, ImageSize -> 100, 
   Background -> If[maxRectangle[mat][[-1]] == largestRectangleInHistogram[#][[-1]], 
     LightBlue, White]], Style[largestRectangleInHistogram@#, 10], Top] & /@ 
   (stutteringAccumulate@mat), 10]]

enter image description here

SeedRandom[123]
dta = RandomInteger[10, 50];
lr = largestRectangleInHistogram[dta];
BarChart[dta, BarSpacing -> 0, 
 ChartLabels -> Placed[Range@Length@dta, Axis], ImageSize -> Large, 
 PlotLabel -> Style[lr, 16], 
 Epilog -> {EdgeForm[Red], FaceForm[Opacity[.5, Red]], 
   Rectangle @@ (Transpose[{lr[[2]], {0, lr[[1]]}}] + {{-1/2, 0}, {1/2, 0}}) }]

enter image description here

Sources:

The idea of using an increasing stack to find the largest rectangle in a histogram and implementation is from this answer by Pei. The function largestRectangleInHistogram above is a Mathematica implementation of Pei's python function largestRectangleArea which is modified to return the column indices and the height in addition to the area of the largest rectangle.

The function poP is a slightly modified version of Pop from rosettacode - Stack.

The function stutteringAccumulate is from the posts by ciao and by Chip Hurst.

Okkes's links to Tushar Roy's YouTube videos has been extremely useful; especially, Maximum Rectangular Area in Histogram and Maximum Size Rectangle of All 1's Dynamic Programming.

Update 2: Dealing with non-necessarily-contiguous case for small matrices:

sa = SparseArray[mat];
al = DeleteCases[sa["AdjacencyLists"], {}];
nzprows = Union@sa["NonzeroPositions"][[All, 1]]; 
rowindices = MaximalBy[Subsets[nzprows, {2, Infinity}], 
  Length[#] Length[Intersection @@ #] &@al[[#]] &, 10];
rowscols = {#, Intersection @@ al[[#]]} & /@ rowindices;

Grid[Prepend[{## & @@ #, Times @@ Length /@ #} & /@ rowscols, 
  {"rows", "columns", "area"}], Dividers -> All] // TeXForm

$\begin{array}{|c|c|c|} \hline \text{rows} & \text{columns} & \text{area} \\ \hline \{2,4,5,6\} & \{1,2,4,5,6\} & 20 \\ \hline \{2,5,6\} & \{1,2,3,4,5,6\} & 18 \\ \hline \{1,2,5,6\} & \{1,2,3,4\} & 16 \\ \hline \{2,4,5\} & \{1,2,4,5,6\} & 15 \\ \hline \{2,4,6\} & \{1,2,4,5,6\} & 15 \\ \hline \{4,5,6\} & \{1,2,4,5,6\} & 15 \\ \hline \{1,2,4,5,6\} & \{1,2,4\} & 15 \\ \hline \{2,3,4,5,6\} & \{4,5,6\} & 15 \\ \hline \{2,5\} & \{1,2,3,4,5,6\} & 12 \\ \hline \{2,6\} & \{1,2,3,4,5,6\} & 12 \\ \hline \end{array}$

Original answer:

A brute force approach:

mat = {{1, 1, 1, 1, 0, 0}, {1, 1, 1, 1, 1, 1}, {0, 0, 0, 1, 1, 1}, {1, 1, 0, 1, 1, 1}, 
   {1, 1, 1, 1, 1, 1}, {1, 1, 1, 1, 1, 1}};
pairs = Transpose /@ MaximalBy[DeleteDuplicates[CoordinateBounds /@ 
   Subsets[SparseArray[mat]["NonzeroPositions"], {2}]], 
     Min[#] Total[#, 2] &@mat[[## & @@ Span @@@ #]] &]

{{{2, 4}, {6, 6}}}

Introduction

The problem you describe is called the maximum biclique problem in graph theory.

Definitions: A clique is a complete subgraph. A biclique is a complete bipartite subgraph (of a bipartite graph).

We can interpret your matrix $A$ as a bipartite incidence matrix: $a_{ij}=1$ means that vertex $i$ of the first partition is connected to vertex $j$ of the second partition of a bipartite graph.

A question remains: what does "largest" submatrix, or equivalently "largest" biclique mean? There are multiple interpretations, e.g.

Submatrix with most elements = biclique with most edges. The maximum edge biclique problem is NP-complete, meaning that there is no fast exact solution for large cases.
Submatrix with most #rows + #columns in total = biclique with most vertices. This problem can be solved in polynomial time by reducing it to bipartite matching.

Submatrix with most elements (maximum edge biclique)

I believe you want the 1st interpretation, but this does not seem to have fast solutions. Other answers already give slow solutions.

Submatrix with max #rows + #columns (maximum vertex biclique)

We can attack the 2nd problem in multiple ways. The easiest is to reduce it to simple clique finding: connect all vertex pairs within each of the two partitions, and look for maximum cliques. Clique finding is still computationally hard, but Mathematica already has a function for it, so the implementation will be easy.

Map to simple clique finding

The problem you are trying to solve is equivalent to biclique finding, i.e. finding complete bipartite graphs. We can think of the matrix as the bipartite incidence matrix of a bipartite graph.

Let us take this bipartite graph, and connect all vertex pair within each of the two partitions. Then we just need to look for maximal cliques to find maximal bicliques.

The following code looks for all maximal cliques:

Clear[getCliques]
getCliques[mat_] :=
 Module[{am, g, m, n},
  {m, n} = Dimensions[mat];
  am = ArrayFlatten[{ (* transform bipartite incidence matrix to ajdacency matrix *)
     {ConstantArray[1, {m, m}], mat},
     {Transpose[mat], ConstantArray[1, {n, n}]}
     }
    ];
  g = AdjacencyGraph[am];
  FindClique[g, Length /@ FindClique[g], All]
 ]

Here's some code to convert these to row/column indices and highlight them in the graph:

Clear[cliqueToRowCol]
cliqueToRowCol[mat_][clique_] := 
 With[{m = First@Dimensions[mat]}, {Select[clique, # <= m &], 
   Select[clique, # > m &] - m}]

Clear[highlight]
highlight[mat_][rowCol_] := 
 MatrixForm@MapAt[Style[#, Red] &, mat, Tuples[rowCol]]

This is your matrix:

mat = {{1, 1, 1, 1, 0, 0}, {1, 1, 1, 1, 1, 1}, {0, 0, 0, 1, 1, 1}, {1, 1, 0, 1, 1, 1}, {1, 1, 1, 1, 1, 1}, {1, 1, 1, 1, 1, 1}};

And this is the solutions we can find:

cliqueToRowCol[mat] /@ getCliques[mat]
(* { {{2, 5, 6},    {1, 2, 3, 4, 5, 6}}, 
     {{2, 4, 5, 6}, {1, 2, 4, 5, 6}} } *)

highlight[mat] /@ cliqueToRowCol[mat] /@ getCliques[mat]

enter image description here

Once again, this method does not find submatrices which have the most elements. It finds those for which #row + #columns is maximal. In this case, one of the solutions happens to also have the most element ($4 \times 5 = 20$). The other only has $3 \times 6 = 18$. For both sub-matrices, $4+5 = 3+6 = 9$.

Map to maximum matching

The other way is as follows:

Maximum vertex biclique finding is equivalent to finding a maximum independent vertex set in the bipartite complement graph. This, in turn, is equivalent to finding a minimum vertex cover: the vertices not in the covert will form the independent vertex set. According to Kőnig's theorem, in bipartite graphs, the minimum vertex cover can be formed from a maximum matching.

There is a function in Mathematica for finding a maximum matching: FindIndependentEdgeSet. To transform it to a min vertex cover, we can use a relatively simple algorithm, which so far I was too lazy to implement (might update this answer: http://tryalgo.org/en/matching/2016/08/05/konig/

This finds ONLY SQUARE sub matrix. Based on this explanation.

    SeedRandom@8
    {m, n} = {15, 20};
(initMat = RandomChoice[{20, 1} -> {1, 0}, {m, n}]) // MatrixForm;


mat = initMat;
MatrixForm[
  mat2 = Table[
    If[mat[[i + 1, j + 1]] == 0, mat[[i + 1, j + 1]] = 0, 
     mat[[i + 1, j + 1]] = 
      Min[{mat[[i, j]], mat[[i, j + 1]], mat[[i + 1, j]]}] + 1], {i, 
     m - 1}, {j, n - 1}]];

max = Max@mat2;

pos = Position[mat2, max] + 1;

pairs = Table[{pos[[i]] - max + 1, pos[[i]]}, {i, Length@pos}];

highlight[list_, position_] := 
 Grid[list, Background -> {None, None, # -> Yellow & /@ position}]

Table[highlight[initMat, 
  Join @@ CoordinateBoundsArray[Transpose@pairs[[i]]]], {i, 
  Length@pairs}]

enter image description here

Finding the largest rectangular submatrix

Introduction

Submatrix with most elements (maximum edge biclique)

Submatrix with max #rows + #columns (maximum vertex biclique)

Map to simple clique finding

Map to maximum matching

Tags:

Matrix

Regions

Graphs And Networks

Related

Recent Posts