Delete duplicates from list of lists as if on a necklace

A high-performance solution

Since you are planning to work with thousands of necklaces, it may be much faster to introduce a canonical form.

The main point is to write a necklace canonization function, which for all equivalent necklaces should return exactly the same result (canonical form). You can then apply canonization function to all necklaces in your list and use standard DeleteDuplicates procedure afterwards.

For simplicity we can take First@Sort@equivalentForms[necklace] as a canonical form (here equivalentForms generates all necklaces equivalent to a given one). In this case the full solution can be written as:

equivalentForms[nl_] := Join[NestList[RotateLeft, nl, Length[nl] - 1], NestList[RotateLeft, Reverse[nl], Length[nl] - 1]];
canonicalForm[nl_] := First@Sort@equivalentForms[nl];
myDeleteDuplicateNecklaces[list_] := DeleteDuplicates[Map[canonicalForm,list]]

(thanks to @LLlAMnYP for suggesting a more idiomatic code for equivalentForms)

For your example we get:

smallList = {{1, 1, 2, 1, 1, 2}, {1, 2, 1, 1, 2, 1}, {1, 2, 2, 1, 2, 2},
{1, 2, 2, 2, 1, 3}, {1, 2, 3, 1, 2, 3}, {1, 3, 1, 2, 2, 2},
{1, 3, 2, 1, 3, 2}, {2, 2, 1, 2, 2, 1}, {2, 2, 1, 3, 1, 2},
{2, 2, 2, 1, 3, 1}, {2, 3, 1, 2, 3, 1}};

myDeleteDuplicateNecklaces[smallList]

{{1, 1, 2, 1, 1, 2}, {1, 2, 2, 1, 2, 2}, {1, 2, 2, 2, 1, 3}, {1, 2, 3, 1, 2, 3}}

Benchmarks for large input:

Let's take a list of 5000 necklaces:

largeList = RandomInteger[{1, 3}, {5000, 10}];

and compare calculation times (in seconds)

@kglr's answer

First@AbsoluteTiming[f[largeList];]

158.356

@halirutan's answer

First@RepeatedTiming[deleteNecklaceDuplicates[largeList];]

1.941

This answer:

First@RepeatedTiming[myDeleteDuplicateNecklaces[largeList];]

0.077

As we can see from the benchmarks, for large lists this solution is 10 to 1000 times faster.

Update: compilation and parallel evaluation optimizations

@LLlAMnYP and @halirutan showed that canonization procedure can be significantly optimized using Mathematica's compilation and parallelization capabilities. They provided the following highly-optimized code, which calculates all canonical forms in parallel and gives further speedup:

canonicalFormC = Compile[{{list, _Integer, 1}},
   Module[{l =
      NestList[RotateLeft, list, Length[list] - 1]~Join~
       NestList[RotateLeft, Reverse[list], Length[list] - 1]},
    Compile`GetElement[l, First[Ordering[l]]]
    ],
   RuntimeAttributes -> {Listable},
   Parallelization -> True,
   CompilationTarget -> "C",
   RuntimeOptions -> "Speed"
   ];
myDeleteDuplicateNecklacesC[list_] := DeleteDuplicates[canonicalFormC[list]]

Benchmark of the compiled procedure:

First@RepeatedTiming[myDeleteDuplicateNecklacesC[largeList];]

0.00576

Thus, compilation and parallelization optimizations give additional 10x speedup.

Preface

If one could create a function f that calculates a canonical form of a necklace that turns all equivalent necklaces {t1, t2, ...} into one unique form t the solution is simple: Take your input list, create the canonical form of each item and delete all duplicates. If the function f is fast, then this approach should be the way to go.

As shown in Shadowray's answer, the direct way of creating all possible allowed permutations and just take the smallest is incredibly fast, especially if you combine it with a parallel compilation as shown in my answer. Therefore, please go and check out his answer.

Answer

If you want to calculate this on a list with thousands of necklaces of say length 10 you will need a fast approach (if you are not a very patient tea-drinker).

Let us first think how we can decide the equivalence of two lists t1 and t2 in an optimised way. I suggest the following sequence of 3 steps where each step is harder to calculate:

If the Total of the two lists is not equal, they are definitely not in the same equivalence class
If they are identical or identical to the Reverse of the other, they are in the same equivalence class
Assume we have {1,2,3} and {3,1,2}. What we do is we join one with itself {1,2,3,1,2,3} and now we go stepwise through this longer list and compare the current 3 elements. Since we already tested for exact equality, we start at position 2:
- is {3,1,2} equal to {2,3,1}, no.
- is {3,1,2} equal to {3,1,2}, yes!

If you do step 3. with both, {1,2,3} and the reversed {3,2,1} you catch all rotated/reflected equivalences. Let us compile this down so that we can parallelised compare one element with the whole list:

pickFunc = Compile[{{t1, _Integer, 1}, {t2, _Integer, 1}},
   Module[{t = {0}, t1Rev = {0}, len = Length[t2], res = True},
    If[Total[t1] != Total[t2], Return[True]];
    If[t1 === t2 || t1 === Reverse[t2], Return[False]];
    t = Join[t2, t2];
    t1Rev = Reverse[t1];
    Do[
     If[t1 === t[[i ;; i + len - 1]] || 
       t1Rev === t[[i ;; i + len - 1]],
      res = False;
      Break[]
      ], {i, 2, Length[t2]}
     ];
    res
    ], RuntimeAttributes -> {Listable}, Parallelization -> True
   ];

Taking your original list we can now test

pickFunc[list[[3]], list]
(* {True, True, False, True, True, True, True, False, True, True, True} *)

At all False positions we have an element that is equivalent to the test item. Why have I made the function return False? Because now I know which elements I have to take out for further processing.

The rest of the algorithm is as follows: We start with the initial list and its first element. We store the first element and pick out all that are not equivalent. This cleaned list is our new starting point and we iterate all over again. On our way, we collect all first items in res until our list to check is empty.

deleteNecklaceDuplicates[list_List] := Module[{l = list, res = {}},
  res = {};
  While[Length[l] > 0,
   res = {res, l[[1]]};
   l = Pick[l, pickFunc[l[[1]], l]]
   ];
  Partition[Flatten[res], Length[First[list]]]
  ]

Sidenote: I don't Append to res because it is slow. Instead, I build a nested result list with res = {res, newitem}. At the end I flatten out res and partition it again.

list = {{1, 1, 2, 1, 1, 2}, {1, 2, 1, 1, 2, 1}, {1, 2, 2, 1, 2, 
    2}, {1, 2, 2, 2, 1, 3}, {1, 2, 3, 1, 2, 3}, {1, 3, 1, 2, 2, 
    2}, {1, 3, 2, 1, 3, 2}, {2, 2, 1, 2, 2, 1}, {2, 2, 1, 3, 1, 
    2}, {2, 2, 2, 1, 3, 1}, {2, 3, 1, 2, 3, 1}};
deleteNecklaceDuplicates[list]
(* {{1, 1, 2, 1, 1, 2}, {1, 2, 2, 1, 2, 2}, {1, 2, 2, 2, 1, 
  3}, {1, 2, 3, 1, 2, 3}} *)

But here comes the cool part! Using @kglr's implementation in f:

list = RandomInteger[{1, 3}, {500, 10}];

f[list] === deleteNecklaceDuplicates[list]
(* True *)

Let us time this:

AbsoluteTiming[f[list];]
(* {1.74933, Null} *)

and

AbsoluteTiming[deleteNecklaceDuplicates[list];]
(* {0.11883, Null} *)

But how fast are we on a list with thousands? Let's try:

list = RandomInteger[{1, 3}, {5000, 10}];
AbsoluteTiming[deleteNecklaceDuplicates[list];]
(* {3.13899, Null} *)

I hope 3 seconds is fast enough.

ClearAll[f]
f = DeleteDuplicates[#, MemberQ[Join @@ NestList[RotateLeft /@ # &, 
    {#, Reverse @ #}, Length@#], #2] &] &;

list = {{1, 1, 2, 1, 1, 2}, {1, 2, 1, 1, 2, 1}, {1, 2, 2, 1, 2, 2}, 
   {1, 2, 2, 2, 1, 3}, {1, 2, 3, 1, 2, 3}, {1, 3, 1, 2, 2, 2},
   {1, 3, 2, 1, 3, 2}, {2, 2, 1, 2, 2, 1}, {2, 2, 1, 3, 1, 2}, 
   {2, 2, 2, 1, 3, 1}, {2, 3, 1, 2, 3, 1}};

f@list

{{1, 1, 2, 1, 1, 2}, {1, 2, 2, 1, 2, 2}, {1, 2, 2, 2, 1, 3}, {1, 2, 3, 1, 2, 3}}

Also:

<< Combinatorica`
ClearAll[dihedralL, f2]
dihedralL = ListNecklaces[Length@#, #, Dihedral] &;
f2 = DeleteDuplicates[#, {} =!= Intersection[dihedralL@#, dihedralL@#2] &] &;

f2 @ list == f @list

True

Update: a version that avoids Slots and pure functions (#, #2, &):

ClearAll[f3, necklace, mytestfunction]

necklace[a_] := Join @@ NestList[RotateLeft /@ # &, {a, Reverse@a}, Length@a]

mytestfunction[a_, b_] := MemberQ[necklace[a], b]

f3[input_, testfunction_] := DeleteDuplicates[input, testfunction];

f3[list, mytestfunction]

{{1, 1, 2, 1, 1, 2}, {1, 2, 2, 1, 2, 2}, {1, 2, 2, 2, 1, 3}, {1, 2, 3, 1, 2, 3}}

Delete duplicates from list of lists as if on a necklace

Update: compilation and parallel evaluation optimizations

Preface

Answer

Tags:

Permutation

List Manipulation

Canonicalization

Performance Tuning

Related

Recent Posts