Finding the largest rectangular submatrix
Update: Putting together pieces from several sources (links below) to identify the largest contiguous rectangle in a binary matrix:
ClearAll[poP, stutteringAccumulate, largestRectangleInHistogram, maxRectangle]
SetAttributes[poP, HoldAllComplete];
poP[a_] := Module[{b}, If[EmptyQ[a], False, b = Last[a]; Set[a, Most[a]]; b]]
stutteringAccumulate = FoldList[#2 #1 + #2 &, #] &;
largestRectangleInHistogram = Module[{max = 0, a = Join[{-1}, #, { -1}], n = 2 + Length@#,
stack = {1}, h, area, i, index = 1, height = 0},
For[i = 1, i <= n, ++i,
While[a[[i]] < a[[Last@stack]],
h = a[[poP[stack]]];
area = h (i - Last[stack] - 1); max = Max[max, area];
If[max > area, index = index; height = height, index = i; height = h];
]; AppendTo[stack, i]];
{height, {# - 1 - max/height, # - 2} &@index, max}] &;
maxRectangle[mat_] := Module[{lr = largestRectangleInHistogram /@ stutteringAccumulate[mat],
l = List /@ Range[Length@mat]},
{#4 - {# - 1, 0}, #2, #3} & @@ MaximalBy[Join[lr, l, 2], Last][[1]]];
Examples:
mat = {{1, 1, 1, 1, 0, 0}, {1, 1, 1, 1, 1, 1}, {0, 0, 0, 1, 1, 1}, {1, 1, 0, 1, 1, 1},
{1, 1, 1, 1, 1, 1}, {1, 1, 1, 1, 1, 1}};
Construct a matrix of histograms:
histograms = stutteringAccumulate @ mat
{{1, 1, 1, 1, 0, 0}, {2, 2, 2, 2, 1, 1}, {0, 0, 0, 3, 2, 2}, {1, 1, 0, 4, 3, 3}, {2, 2, 1, 5, 4, 4}, {3, 3, 2, 6, 5, 5}}
Find the largest rectangle for each row of histograms
:
largestrecs = largestRectangleInHistogram /@ histograms
{{1, {1, 4}, 4}, {2, {1, 4}, 8}, {2, {4, 6}, 6}, {3, {4, 6}, 9}, {4, {4, 6}, 12}, {5, {4, 6}, 15}}
Pick from largestrecs
the one with largest area:
{rows, cols, area} = maxRectangle[mat]
{{2, 6}, {4, 6}, 15}
Row[Labeled[##, Top] & @@@ Transpose[{MatrixForm /@ {mat, histograms,
MapAt[Style[#, Red, Bold] &, mat, Span @@@ maxRectangle[mat][[;; 2]]]},
{"mat", "histograms", "max rectangle"}}]]
Row[Labeled[BarChart[#, ImageSize -> 100,
Background -> If[maxRectangle[mat][[-1]] ==
largestRectangleInHistogram[#][[-1]], LightBlue, White]],
Style[largestRectangleInHistogram@#, 12], Top] & /@ histograms, Spacer[5]]
With SeedRandom[1]; mat = RandomInteger[1, {20, 40}];
as input
maxRectangle[mat]
{{17, 20}, {32, 33}, 8}
MatrixForm @ MapAt[Style[#, Red, Bold] &, mat, Span @@@ maxRectangle[mat][[;; 2]]]
Grid[Partition[Labeled[BarChart[#, ImageSize -> 100,
Background -> If[maxRectangle[mat][[-1]] == largestRectangleInHistogram[#][[-1]],
LightBlue, White]], Style[largestRectangleInHistogram@#, 10], Top] & /@
(stutteringAccumulate@mat), 10]]
SeedRandom[123]
dta = RandomInteger[10, 50];
lr = largestRectangleInHistogram[dta];
BarChart[dta, BarSpacing -> 0,
ChartLabels -> Placed[Range@Length@dta, Axis], ImageSize -> Large,
PlotLabel -> Style[lr, 16],
Epilog -> {EdgeForm[Red], FaceForm[Opacity[.5, Red]],
Rectangle @@ (Transpose[{lr[[2]], {0, lr[[1]]}}] + {{-1/2, 0}, {1/2, 0}}) }]
Sources:
The idea of using an increasing stack to find the largest rectangle in a histogram and implementation is from this answer by Pei. The function largestRectangleInHistogram
above is a Mathematica implementation of Pei's python function largestRectangleArea
which is modified to return the column indices and the height in addition to the area of the largest rectangle.
The function poP
is a slightly modified version of Pop
from rosettacode - Stack.
The function stutteringAccumulate
is from the posts by ciao and by Chip Hurst.
Okkes's links to Tushar Roy's YouTube videos has been extremely useful; especially, Maximum Rectangular Area in Histogram and Maximum Size Rectangle of All 1's Dynamic Programming.
Update 2: Dealing with non-necessarily-contiguous case for small matrices:
sa = SparseArray[mat];
al = DeleteCases[sa["AdjacencyLists"], {}];
nzprows = Union@sa["NonzeroPositions"][[All, 1]];
rowindices = MaximalBy[Subsets[nzprows, {2, Infinity}],
Length[#] Length[Intersection @@ #] &@al[[#]] &, 10];
rowscols = {#, Intersection @@ al[[#]]} & /@ rowindices;
Grid[Prepend[{## & @@ #, Times @@ Length /@ #} & /@ rowscols,
{"rows", "columns", "area"}], Dividers -> All] // TeXForm
$\begin{array}{|c|c|c|} \hline \text{rows} & \text{columns} & \text{area} \\ \hline \{2,4,5,6\} & \{1,2,4,5,6\} & 20 \\ \hline \{2,5,6\} & \{1,2,3,4,5,6\} & 18 \\ \hline \{1,2,5,6\} & \{1,2,3,4\} & 16 \\ \hline \{2,4,5\} & \{1,2,4,5,6\} & 15 \\ \hline \{2,4,6\} & \{1,2,4,5,6\} & 15 \\ \hline \{4,5,6\} & \{1,2,4,5,6\} & 15 \\ \hline \{1,2,4,5,6\} & \{1,2,4\} & 15 \\ \hline \{2,3,4,5,6\} & \{4,5,6\} & 15 \\ \hline \{2,5\} & \{1,2,3,4,5,6\} & 12 \\ \hline \{2,6\} & \{1,2,3,4,5,6\} & 12 \\ \hline \end{array}$
Original answer:
A brute force approach:
mat = {{1, 1, 1, 1, 0, 0}, {1, 1, 1, 1, 1, 1}, {0, 0, 0, 1, 1, 1}, {1, 1, 0, 1, 1, 1},
{1, 1, 1, 1, 1, 1}, {1, 1, 1, 1, 1, 1}};
pairs = Transpose /@ MaximalBy[DeleteDuplicates[CoordinateBounds /@
Subsets[SparseArray[mat]["NonzeroPositions"], {2}]],
Min[#] Total[#, 2] &@mat[[## & @@ Span @@@ #]] &]
{{{2, 4}, {6, 6}}}
Introduction
The problem you describe is called the maximum biclique problem in graph theory.
Definitions: A clique is a complete subgraph. A biclique is a complete bipartite subgraph (of a bipartite graph).
We can interpret your matrix $A$ as a bipartite incidence matrix: $a_{ij}=1$ means that vertex $i$ of the first partition is connected to vertex $j$ of the second partition of a bipartite graph.
A question remains: what does "largest" submatrix, or equivalently "largest" biclique mean? There are multiple interpretations, e.g.
Submatrix with most elements = biclique with most edges. The maximum edge biclique problem is NP-complete, meaning that there is no fast exact solution for large cases.
Submatrix with most #rows + #columns in total = biclique with most vertices. This problem can be solved in polynomial time by reducing it to bipartite matching.
Submatrix with most elements (maximum edge biclique)
I believe you want the 1st interpretation, but this does not seem to have fast solutions. Other answers already give slow solutions.
Submatrix with max #rows + #columns (maximum vertex biclique)
We can attack the 2nd problem in multiple ways. The easiest is to reduce it to simple clique finding: connect all vertex pairs within each of the two partitions, and look for maximum cliques. Clique finding is still computationally hard, but Mathematica already has a function for it, so the implementation will be easy.
Map to simple clique finding
The problem you are trying to solve is equivalent to biclique finding, i.e. finding complete bipartite graphs. We can think of the matrix as the bipartite incidence matrix of a bipartite graph.
Let us take this bipartite graph, and connect all vertex pair within each of the two partitions. Then we just need to look for maximal cliques to find maximal bicliques.
The following code looks for all maximal cliques:
Clear[getCliques]
getCliques[mat_] :=
Module[{am, g, m, n},
{m, n} = Dimensions[mat];
am = ArrayFlatten[{ (* transform bipartite incidence matrix to ajdacency matrix *)
{ConstantArray[1, {m, m}], mat},
{Transpose[mat], ConstantArray[1, {n, n}]}
}
];
g = AdjacencyGraph[am];
FindClique[g, Length /@ FindClique[g], All]
]
Here's some code to convert these to row/column indices and highlight them in the graph:
Clear[cliqueToRowCol]
cliqueToRowCol[mat_][clique_] :=
With[{m = First@Dimensions[mat]}, {Select[clique, # <= m &],
Select[clique, # > m &] - m}]
Clear[highlight]
highlight[mat_][rowCol_] :=
MatrixForm@MapAt[Style[#, Red] &, mat, Tuples[rowCol]]
This is your matrix:
mat = {{1, 1, 1, 1, 0, 0}, {1, 1, 1, 1, 1, 1}, {0, 0, 0, 1, 1, 1}, {1, 1, 0, 1, 1, 1}, {1, 1, 1, 1, 1, 1}, {1, 1, 1, 1, 1, 1}};
And this is the solutions we can find:
cliqueToRowCol[mat] /@ getCliques[mat]
(* { {{2, 5, 6}, {1, 2, 3, 4, 5, 6}},
{{2, 4, 5, 6}, {1, 2, 4, 5, 6}} } *)
highlight[mat] /@ cliqueToRowCol[mat] /@ getCliques[mat]
Once again, this method does not find submatrices which have the most elements. It finds those for which #row + #columns is maximal. In this case, one of the solutions happens to also have the most element ($4 \times 5 = 20$). The other only has $3 \times 6 = 18$. For both sub-matrices, $4+5 = 3+6 = 9$.
Map to maximum matching
The other way is as follows:
Maximum vertex biclique finding is equivalent to finding a maximum independent vertex set in the bipartite complement graph. This, in turn, is equivalent to finding a minimum vertex cover: the vertices not in the covert will form the independent vertex set. According to Kőnig's theorem, in bipartite graphs, the minimum vertex cover can be formed from a maximum matching.
There is a function in Mathematica for finding a maximum matching: FindIndependentEdgeSet
. To transform it to a min vertex cover, we can use a relatively simple algorithm, which so far I was too lazy to implement (might update this answer: http://tryalgo.org/en/matching/2016/08/05/konig/
This finds ONLY SQUARE sub matrix. Based on this explanation.
SeedRandom@8
{m, n} = {15, 20};
(initMat = RandomChoice[{20, 1} -> {1, 0}, {m, n}]) // MatrixForm;
mat = initMat;
MatrixForm[
mat2 = Table[
If[mat[[i + 1, j + 1]] == 0, mat[[i + 1, j + 1]] = 0,
mat[[i + 1, j + 1]] =
Min[{mat[[i, j]], mat[[i, j + 1]], mat[[i + 1, j]]}] + 1], {i,
m - 1}, {j, n - 1}]];
max = Max@mat2;
pos = Position[mat2, max] + 1;
pairs = Table[{pos[[i]] - max + 1, pos[[i]]}, {i, Length@pos}];
highlight[list_, position_] :=
Grid[list, Background -> {None, None, # -> Yellow & /@ position}]
Table[highlight[initMat,
Join @@ CoordinateBoundsArray[Transpose@pairs[[i]]]], {i,
Length@pairs}]