From a list of lists of integer, choose a minimal length list of integers that intersects each sublist
LinearProgramming
By enumerating your lists and associating each element of Union @@ data
with set of indices of lists that contain this element, you can translate this problem to a "minimum set cover problem". You're looking for minimal number of sets of indices such that their union gives set of all indices. "Minimal set cover problems" can be formulated as "integer linear programs" that in Mathematica can be solved by LinearProgramming
function.
In this case we are looking for vector $x$ of $k = 255$ zeros or ones such that $x_i$ is $1$ when $i$ is in our result set and $0$ if it isn't. So we want to minimize $\sum_{i=1}^k x_i$, in other words c.x
with $c$ being vector of ones. So c = ConstantArray[1, k]
should be used as first argument of LinearProgramming
.
Our first constrain is that for each list from data
, number of its elements present in result should be at least $1$ which can be written as m.x >= b
with $b$ being vector of ones b = ConstantArray[1, n]
, $n$ being number of lists n = Length@data
, and $m$ being $n \times k$ matrix such that each row corresponds to list from data and has $1$ at positions associated with elements present in list and $0$ at positions associated with elements not present in list. This m
should be used as second -, and b
as third argument of LinearProgramming
.
Second constrain is that for each $i$: $x_i \in \{0, 1\}$. Which can be communicated to Mathematica by using ConstantArray[{0, 1}, k]
as fourth argument of LinearProgramming
and restricting domain (fifth argument) to Integers
.
Putting it together we get:
n = Length@data;
k = 255;
m = ConstantArray[0, {n, k}];
MapIndexed[(m[[First@#2, #1]] = 1) &, data];
(x = LinearProgramming[ConstantArray[1, k], m, ConstantArray[1, n], ConstantArray[{0, 1}, k], Integers]) // MaxMemoryUsed // AbsoluteTiming
res = Flatten@Position[x, 1, {1}, Heads -> False]
(* LinearProgramming::lpip: Warning: integer linear programming will use a machine-precision approximation of the inputs. *)
(* {464.846, 1107448} *)
(* {9, 15, 48, 58, 122} *)
In less than 8 minutes we get result set containing 5 elements: {9, 15, 48, 58, 122}
.
Let's check that this is a correct answer:
And @@ (IntersectingQ[res, #] & /@ data)
(* True *)
Random Searches
You could try to search for minimal sets using algorithm similar to one used in other answers but with some randomization added.
We start by defining fast compiled function rejecting sets containing chosen element:
selectNonContaining = Compile[{{searchMatrix, _Integer, 2}, {remaining, _Integer, 1}, {chosen, _Integer}},
With[{result = Internal`Bag@Most@{0}},
Do[
If[Compile`GetElement[searchMatrix, i, chosen] === 0,
Internal`StuffBag[result, i]
],
{i, remaining}
];
Internal`BagPart[result, All]
],
RuntimeOptions -> "Speed", CompilationTarget -> "C"
];
Final minset
function accepting arbitrary function choiceFunc
selecting element from all elements of remaining sets.
minset // ClearAll
minset[choiceFunc_, data_, searchMatrix_, indices_] :=
Module[{remaining = indices, result = {}, chosen},
While[remaining =!= {},
chosen = choiceFunc[Join @@ data[[remaining]]];
result = {result, chosen};
remaining = selectNonContaining[searchMatrix, remaining, chosen]
];
Flatten@result
]
Prepare auxiliary data useful for fast searching:
packedData = Developer`ToPackedArray /@ data;
n = Length@data;
searchMatrix = ConstantArray[0, {n, Max@packedData}];
MapIndexed[(searchMatrix[[First@#2, #1]] = 1) &, data];
indices = Range@n;
With First@*Commonest
choice function we recreate greedy algorithm from other answers:
minset[First@*Commonest, packedData, searchMatrix, indices] // RepeatedTiming
(* {0.00035, {36, 17, 8, 10, 35, 44}} *)
As noted in Carl Woll`s answer, instead of first commonest element we can choose random commonest element:
(results = Table[minset[RandomChoice@*Commonest, packedData, searchMatrix, indices], 10^4]) // MaxMemoryUsed // AbsoluteTiming
KeySort@Counts[Length /@ results]
(* {3.51511, 1176080} *)
(* <|6 -> 10000|> *)
But we get only 6-element sets, so it seems that it's highly unlikely for greedy algorithm to find smaller set on this data.
Instead of commonest element we could simply pick random remaining element:
(results = Table[minset[RandomChoice, packedData, searchMatrix, indices], 10^4]) // MaxMemoryUsed // AbsoluteTiming
KeySort@Counts[Length /@ results]
Select[results, Length@# === 5 &]
Function[res, And @@ (IntersectingQ[res, #] & /@ data)] /@ %
(* {1.42398, 1410688} *)
(* <|5 -> 2, 6 -> 43, 7 -> 554, 8 -> 2419, 9 -> 3608, 10 -> 2385, 11 -> 768, 12 -> 189, 13 -> 27, 14 -> 4, 15 -> 1|> *)
(* {{123, 110, 14, 79, 124}, {44, 109, 115, 24, 116}} *)
(* {True, True} *)
In less than one and half second I've found two 5-element sets.
I got best result by balancing greed and randomness, by randomly choosing between random element and random commonest element in each step:
(results = Table[minset[RandomChoice[{RandomChoice@*Commonest, RandomChoice}]@# &, packedData, searchMatrix, indices], 10^4]) // MaxMemoryUsed // AbsoluteTiming
KeySort@Counts[Length /@ results]
Select[results, Length@# === 5 &]
Function[res, And @@ (IntersectingQ[res, #] & /@ data)] /@ %
(* {2.89473, 1263376} *)
(* <|5 -> 36, 6 -> 2852, 7 -> 4073, 8 -> 2359, 9 -> 590, 10 -> 83, 11 -> 7|> *)
(* {{43, 90, 13, 85, 62}, {11, 87, 97, 91, 122}, {92, 76, 126, 101, 2}, {18, 116, 19, 29, 49},
{47, 84, 49, 58, 48}, {48, 21, 23, 22, 12}, {103, 19, 28, 116, 82}, {101, 37, 105, 66, 46},
{114, 111, 61, 120, 87}, {103, 40, 113, 93, 122}, {43, 20, 23, 33, 28}, {82, 92, 99, 98, 63},
{72, 68, 6, 47, 81}, {75, 69, 29, 97, 101}, {42, 37, 40, 84, 72}, {48, 36, 40, 75, 51},
{124, 37, 105, 59, 126}, {116, 27, 101, 28, 19}, {45, 53, 55, 18, 38}, {6, 12, 15, 106, 81},
{29, 78, 13, 27, 115}, {95, 31, 11, 15, 96}, {103, 61, 68, 65, 43}, {22, 94, 12, 91, 20},
{59, 115, 124, 79, 77}, {31, 121, 101, 29, 112}, {78, 109, 59, 57, 68}, {100, 115, 45, 51, 97},
{78, 105, 97, 103, 28}, {88, 103, 102, 92, 14}, {76, 78, 95, 72, 11}, {117, 51, 118, 91, 102},
{123, 7, 16, 93, 17}, {7, 5, 14, 114, 45}, {112, 71, 76, 16, 28}, {36, 96, 40, 19, 1}} *)
(* {True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True} *)
This method found 36 5-element sets in less than three seconds.
Automating Random Search
We can automate random searching function, making it repeat search until it reaches certain specified length of result, or runs out of given time.
minsetRepeated // ClearAll
minsetRepeated // Options = {
"ChoiceFunction" -> (RandomChoice[{RandomChoice@*Commonest, RandomChoice}]@#&),
"LengthGoal" -> 1, TimeConstraint -> 1
};
minsetRepeated[data_, OptionsPattern[]] := Module[
{packedData, n, k, result, lengthGoal, choiceFunc, indices, searchMatrix},
packedData = DeleteDuplicates[Developer`ToPackedArray /@ data];
k = Max@packedData;
result = Range@k;
TimeConstrained[
{lengthGoal, choiceFunc} = OptionValue@{"LengthGoal", "ChoiceFunction"};
n = Length@data;
indices = Range@n;
searchMatrix = ConstantArray[0, {n, k}];
MapIndexed[(searchMatrix[[First@#2, #1]] = 1)&, data];
While[Length@result > lengthGoal,
Module[{remaining = indices, bag = Internal`Bag@{}, chosen},
While[Internal`BagLength@bag < Length@result && remaining =!= {},
chosen = choiceFunc[Join @@ packedData[[remaining]]];
Internal`StuffBag[bag, chosen];
remaining = selectNonContaining[searchMatrix, remaining, chosen]
];
If[Internal`BagLength@bag < Length@result, result = Internal`BagPart[bag, All]]
]
],
OptionValue@TimeConstraint
];
Sort@result
]
Default TimeConstraint
is one second, so it'll give best result it could find in one second:
SeedRandom@0
minsetRepeated@data
(* {24, 44, 53, 116, 117} *)
Which, as we can see, is enough to find 5-element set.
We can also set specific "LengthGoal"
, so that function will stop immediately after finding set with given Length
:
SeedRandom@0
minsetRepeated[data, "LengthGoal" -> 5] // AbsoluteTiming
(* {0.005669, {24, 44, 53, 116, 117}} *)
Finding first 5-element set took less than six milliseconds.
While it's possible for this random search to find minimal set very fast, it can't guarantee that it actually is set with minimal length.
Here is another implementation of @FalafelPita's algorithm:
greedy[sets_] := Module[{s = First @ Commonest @ Flatten @ sets},
Sow[s];
greedy[Cases[sets, x_ /; FreeQ[x,s]]]
]
greedy[{}] := Null
minset[sets_] := Reap[greedy[sets]][[-1,1]]
For your data set:
res = minset[data]
{36, 17, 8, 10, 35, 44}
Check:
Min[Length[Intersection[res, #]]& /@ data]
1
I don't think this algorithm is guaranteed to find a minimal set, but I can only think of a brute force method to check, and in this case I expect that such a method would take more than 400 hours. I did modify the greedy
function to randomize the commonest element it picks, but over a run of 10^4 trials, the minimal set always contained 6 elements. So, perhaps 6 is the length of the minimal set?
Here's one approach for finding such a set. It begins by counting the number of sets in which each element appears. Then, (one of) the most frequently appearing element(s) is Sow
n, and all sets that contained that element are removed from the list of remaining sets. This process continues until all sets have been removed, and then the list of Sow
n elements is returned.
findSmallSet[data_] := Module[
{nSets, remainingSets, remainingElements, common},
nSets = Length[data];
remainingSets = DeleteDuplicates[data];
Reap[
While[nSets > 0,
remainingElements = Flatten[remainingSets];
common = Commonest[remainingElements];
If[ListQ[common], common = First[common]];
Sow[common];
remainingSets =
DeleteCases[remainingSets, _?(MemberQ[#, common] &)];
nSets = Length[remainingSets];
];
][[2, 1]]
]
Older Solution
The first solution I posted was a bit less efficient because it discarded frequencies to create a list of unique elements, but then calculated frequencies later.
freqCounts[elementsList_, sets_] :=
Function[element, Count[sets, _?(MemberQ[#, element] &)]] /@
elementsList;
findSmallSet[data_] := Module[
{nSets, remainingSets, remainingElements, frequentElement,
freqElemPos},
nSets = Length[data];
remainingSets = DeleteDuplicates[data];
remainingElements = DeleteDuplicates[Flatten[data]];
Reap[
While[nSets > 0,
freqElemPos =
Position[#, Max[#]] &[
freqCounts[remainingElements, remainingSets]];
If[ListQ[freqElemPos], freqElemPos = First[freqElemPos]];
Sow[frequentElement = Extract[remainingElements, freqElemPos]];
remainingSets =
DeleteCases[remainingSets, _?(MemberQ[#, frequentElement] &)];
remainingElements = DeleteDuplicates[Flatten[remainingSets]];
nSets = Length[remainingSets];
];
][[2, 1]]
]