Delete duplicates from list of lists as if on a necklace
A high-performance solution
Since you are planning to work with thousands of necklaces, it may be much faster to introduce a canonical form.
The main point is to write a necklace canonization function, which for all equivalent necklaces should return exactly the same result (canonical form). You can then apply canonization function to all necklaces in your list and use standard DeleteDuplicates
procedure afterwards.
For simplicity we can take First@Sort@equivalentForms[necklace]
as a canonical form (here equivalentForms
generates all necklaces equivalent to a given one). In this case the full solution can be written as:
equivalentForms[nl_] := Join[NestList[RotateLeft, nl, Length[nl] - 1], NestList[RotateLeft, Reverse[nl], Length[nl] - 1]];
canonicalForm[nl_] := First@Sort@equivalentForms[nl];
myDeleteDuplicateNecklaces[list_] := DeleteDuplicates[Map[canonicalForm,list]]
(thanks to @LLlAMnYP for suggesting a more idiomatic code for equivalentForms
)
For your example we get:
smallList = {{1, 1, 2, 1, 1, 2}, {1, 2, 1, 1, 2, 1}, {1, 2, 2, 1, 2, 2},
{1, 2, 2, 2, 1, 3}, {1, 2, 3, 1, 2, 3}, {1, 3, 1, 2, 2, 2},
{1, 3, 2, 1, 3, 2}, {2, 2, 1, 2, 2, 1}, {2, 2, 1, 3, 1, 2},
{2, 2, 2, 1, 3, 1}, {2, 3, 1, 2, 3, 1}};
myDeleteDuplicateNecklaces[smallList]
{{1, 1, 2, 1, 1, 2}, {1, 2, 2, 1, 2, 2}, {1, 2, 2, 2, 1, 3}, {1, 2, 3, 1, 2, 3}}
Benchmarks for large input:
Let's take a list of 5000 necklaces:
largeList = RandomInteger[{1, 3}, {5000, 10}];
and compare calculation times (in seconds)
@kglr's answer
First@AbsoluteTiming[f[largeList];]
158.356
@halirutan's answer
First@RepeatedTiming[deleteNecklaceDuplicates[largeList];]
1.941
This answer:
First@RepeatedTiming[myDeleteDuplicateNecklaces[largeList];]
0.077
As we can see from the benchmarks, for large lists this solution is 10 to 1000 times faster.
Update: compilation and parallel evaluation optimizations
@LLlAMnYP and @halirutan showed that canonization procedure can be significantly optimized using Mathematica's compilation and parallelization capabilities. They provided the following highly-optimized code, which calculates all canonical forms in parallel and gives further speedup:
canonicalFormC = Compile[{{list, _Integer, 1}},
Module[{l =
NestList[RotateLeft, list, Length[list] - 1]~Join~
NestList[RotateLeft, Reverse[list], Length[list] - 1]},
Compile`GetElement[l, First[Ordering[l]]]
],
RuntimeAttributes -> {Listable},
Parallelization -> True,
CompilationTarget -> "C",
RuntimeOptions -> "Speed"
];
myDeleteDuplicateNecklacesC[list_] := DeleteDuplicates[canonicalFormC[list]]
Benchmark of the compiled procedure:
First@RepeatedTiming[myDeleteDuplicateNecklacesC[largeList];]
0.00576
Thus, compilation and parallelization optimizations give additional 10x speedup.
Preface
If one could create a function f
that calculates a canonical form of a necklace that turns all equivalent necklaces {t1, t2, ...}
into one unique form t
the solution is simple: Take your input list, create the canonical form of each item and delete all duplicates. If the function f
is fast, then this approach should be the way to go.
As shown in Shadowray's answer, the direct way of creating all possible allowed permutations and just take the smallest is incredibly fast, especially if you combine it with a parallel compilation as shown in my answer. Therefore, please go and check out his answer.
Answer
If you want to calculate this on a list with thousands of necklaces of say length 10 you will need a fast approach (if you are not a very patient tea-drinker).
Let us first think how we can decide the equivalence of two lists t1
and t2
in an optimised way. I suggest the following sequence of 3 steps where each step is harder to calculate:
- If the
Total
of the two lists is not equal, they are definitely not in the same equivalence class - If they are identical or identical to the
Reverse
of the other, they are in the same equivalence class - Assume we have
{1,2,3}
and{3,1,2}
. What we do is we join one with itself{1,2,3,1,2,3}
and now we go stepwise through this longer list and compare the current 3 elements. Since we already tested for exact equality, we start at position 2:- is {3,1,2} equal to {2,3,1}, no.
- is {3,1,2} equal to {3,1,2}, yes!
If you do step 3. with both, {1,2,3}
and the reversed {3,2,1}
you catch all rotated/reflected equivalences. Let us compile this down so that we can parallelised compare one element with the whole list:
pickFunc = Compile[{{t1, _Integer, 1}, {t2, _Integer, 1}},
Module[{t = {0}, t1Rev = {0}, len = Length[t2], res = True},
If[Total[t1] != Total[t2], Return[True]];
If[t1 === t2 || t1 === Reverse[t2], Return[False]];
t = Join[t2, t2];
t1Rev = Reverse[t1];
Do[
If[t1 === t[[i ;; i + len - 1]] ||
t1Rev === t[[i ;; i + len - 1]],
res = False;
Break[]
], {i, 2, Length[t2]}
];
res
], RuntimeAttributes -> {Listable}, Parallelization -> True
];
Taking your original list
we can now test
pickFunc[list[[3]], list]
(* {True, True, False, True, True, True, True, False, True, True, True} *)
At all False
positions we have an element that is equivalent to the test item. Why have I made the function return False
? Because now I know which elements I have to take out for further processing.
The rest of the algorithm is as follows: We start with the initial list and its first element. We store the first element and pick out all that are not equivalent. This cleaned list is our new starting point and we iterate all over again. On our way, we collect all first items in res
until our list
to check is empty.
deleteNecklaceDuplicates[list_List] := Module[{l = list, res = {}},
res = {};
While[Length[l] > 0,
res = {res, l[[1]]};
l = Pick[l, pickFunc[l[[1]], l]]
];
Partition[Flatten[res], Length[First[list]]]
]
Sidenote: I don't Append
to res
because it is slow. Instead, I build a nested result list with res = {res, newitem}
. At the end I flatten out res
and partition it again.
list = {{1, 1, 2, 1, 1, 2}, {1, 2, 1, 1, 2, 1}, {1, 2, 2, 1, 2,
2}, {1, 2, 2, 2, 1, 3}, {1, 2, 3, 1, 2, 3}, {1, 3, 1, 2, 2,
2}, {1, 3, 2, 1, 3, 2}, {2, 2, 1, 2, 2, 1}, {2, 2, 1, 3, 1,
2}, {2, 2, 2, 1, 3, 1}, {2, 3, 1, 2, 3, 1}};
deleteNecklaceDuplicates[list]
(* {{1, 1, 2, 1, 1, 2}, {1, 2, 2, 1, 2, 2}, {1, 2, 2, 2, 1,
3}, {1, 2, 3, 1, 2, 3}} *)
But here comes the cool part! Using @kglr's implementation in f
:
list = RandomInteger[{1, 3}, {500, 10}];
f[list] === deleteNecklaceDuplicates[list]
(* True *)
Let us time this:
AbsoluteTiming[f[list];]
(* {1.74933, Null} *)
and
AbsoluteTiming[deleteNecklaceDuplicates[list];]
(* {0.11883, Null} *)
But how fast are we on a list with thousands? Let's try:
list = RandomInteger[{1, 3}, {5000, 10}];
AbsoluteTiming[deleteNecklaceDuplicates[list];]
(* {3.13899, Null} *)
I hope 3 seconds is fast enough.
ClearAll[f]
f = DeleteDuplicates[#, MemberQ[Join @@ NestList[RotateLeft /@ # &,
{#, Reverse @ #}, Length@#], #2] &] &;
list = {{1, 1, 2, 1, 1, 2}, {1, 2, 1, 1, 2, 1}, {1, 2, 2, 1, 2, 2},
{1, 2, 2, 2, 1, 3}, {1, 2, 3, 1, 2, 3}, {1, 3, 1, 2, 2, 2},
{1, 3, 2, 1, 3, 2}, {2, 2, 1, 2, 2, 1}, {2, 2, 1, 3, 1, 2},
{2, 2, 2, 1, 3, 1}, {2, 3, 1, 2, 3, 1}};
f@list
{{1, 1, 2, 1, 1, 2}, {1, 2, 2, 1, 2, 2}, {1, 2, 2, 2, 1, 3}, {1, 2, 3, 1, 2, 3}}
Also:
<< Combinatorica`
ClearAll[dihedralL, f2]
dihedralL = ListNecklaces[Length@#, #, Dihedral] &;
f2 = DeleteDuplicates[#, {} =!= Intersection[dihedralL@#, dihedralL@#2] &] &;
f2 @ list == f @list
True
Update: a version that avoids Slots and pure functions (#
, #2
, &
):
ClearAll[f3, necklace, mytestfunction]
necklace[a_] := Join @@ NestList[RotateLeft /@ # &, {a, Reverse@a}, Length@a]
mytestfunction[a_, b_] := MemberQ[necklace[a], b]
f3[input_, testfunction_] := DeleteDuplicates[input, testfunction];
f3[list, mytestfunction]
{{1, 1, 2, 1, 1, 2}, {1, 2, 2, 1, 2, 2}, {1, 2, 2, 2, 1, 3}, {1, 2, 3, 1, 2, 3}}