Improve running speed for DeleteDuplicates
Here is a semi-imperative way that gives the same result as tup2 from the original question, but much faster:
tup2b = Module[{keep}
  , keep[t_] :=
      ( keep[t] = False
      ; keep[t[[{4, 5, 6, 1, 2, 3, 10, 11, 12, 7, 8, 9}]]] = False
      ; True
      )
  ; Select[tup1, keep]
  ];
tup2b === tup2
(* True *)
Length[tup2b] === Length[tup2] === 52650
(* True *)
On my machine, the calculation of tup2 takes a couple of hours whereas the approach shown here is sub-second. This approach also makes it easy to add other equivalence criteria if desired.
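For instance, here is a minimal sketch of how a second equivalence criterion could be folded in; the extra permutation used below is invented purely for illustration and is not part of the original question:
(* sketch only: the second permutation is a made-up extra equivalence criterion *)
tup2ext = Module[{keep},
   keep[t_] := (
     keep[t] = False;
     keep[t[[{4, 5, 6, 1, 2, 3, 10, 11, 12, 7, 8, 9}]]] = False;
     keep[t[[{7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6}]]] = False;
     True);
   Select[tup1, keep]];
Each added line simply marks one more equivalent form of the current entry as already seen; if the criteria combine to generate further equivalent forms, those images would need to be marked as well.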
How It Works
The function keep is used as a predicate to Select to determine whether to keep each entry of the list. The first time each element is encountered, keep returns True. But as a side-effect it also adds a new definition to keep that records that the entry and its equivalent permutation are no longer to be kept. The new definition will return False if either entry is encountered later in the list. In this way keep effectively maintains a set of entries seen so far (along with their equivalents).
For the example list given in the question, it is not strictly necessary to record that we have seen each unpermuted entry since there are no duplicates. But in the general case that might not be so.
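A toy example (not part of the original problem) makes the mechanism easy to see: here pairs are considered equivalent under swapping their two elements, and the input contains both a swapped duplicate and an exact duplicate:
Module[{keep},
  keep[t_] := (keep[t] = False; keep[t[[{2, 1}]]] = False; True);
  Select[{{1, 2}, {2, 1}, {3, 4}, {1, 2}}, keep]]
(* {{1, 2}, {3, 4}} *)
The first {1, 2} is kept and simultaneously blacklists both {1, 2} and {2, 1}, so later occurrences of either form are dropped.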
This method scans each entry of the list once, so it runs in time roughly proportional to the length $n$ of the list. Technically, the time is more on the order of $n \ln(n)$ due to the hashing involved in saving and testing entries, but for small $n$ the difference is not that noticeable.
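One rough way to check that claim empirically (a sketch only; actual timings will vary by machine) is to time the same keep/Select construction on prefixes of tup1 of increasing length:
(* time the scan on prefixes of tup1; roughly doubling n should roughly double the time *)
Table[
  {n, First @ AbsoluteTiming[
      Module[{keep},
        keep[t_] := (keep[t] = False;
          keep[t[[{4, 5, 6, 1, 2, 3, 10, 11, 12, 7, 8, 9}]]] = False;
          True);
        Select[Take[tup1, n], keep]]]},
  {n, {5000, 10000, 20000, 40000}}]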
By contrast, and as Leonid Shifrin points out, DeleteDuplicates must in principle compare every pair of elements, so the run time is proportional to $n^2$, a much larger number of iterations.
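For reference, the slow computation of tup2 presumably looked something like the following pairwise-test form (a guess at the question's code, which is not reproduced here); the custom comparison function is what forces DeleteDuplicates into comparing every pair:
(* assumed reconstruction of the slow original; the exact code in the question may differ *)
tup2slow = DeleteDuplicates[tup1,
   #1 === #2[[{4, 5, 6, 1, 2, 3, 10, 11, 12, 7, 8, 9}]] &];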
You can also get tup2 from tup1 using:
1. Union
ClearAll[fA]
fA = Union[Sort[{#, #[[{4, 5, 6, 1, 2, 3, 10, 11, 12, 7, 8, 9}]]}] & /@ #][[All, 1]] &;
tup2A = fA @ tup1; // AbsoluteTiming // First
0.213222
Length @ tup2A
52650
2. DeleteDuplicates
ClearAll[fB]
fB = DeleteDuplicates[
Sort[{#, #[[{4, 5, 6, 1, 2, 3, 10, 11, 12, 7, 8, 9}]]}] & /@ #][[All, 1]] &;
tup2B = fB @ tup1; // AbsoluteTiming // First
0.257217
3. GroupOrbits + PermutationGroup
ClearAll[fC]
fC = GroupOrbits[PermutationGroup[{{4, 5, 6, 1, 2, 3, 10, 11, 12, 7, 8, 9}}], #,
    Permute][[All, 1]] &;
tup2C = fC @ tup1; // AbsoluteTiming // First
0.640413
4. Memoization
ClearAll[fD]
fD = Module[{f0},
f0[x_] := (f0[x] = f0[x[[{4, 5, 6, 1, 2, 3, 10, 11, 12, 7, 8, 9}]]] = Sequence[]; x);
f0 /@ #] &;
tup2D = fD @ tup1; // AbsoluteTiming // First
0.794055
5. DeleteDuplicatesBy
ClearAll[fE]
fE = DeleteDuplicatesBy[Sort[{#, #[[{4, 5, 6, 1, 2, 3, 10, 11, 12, 7, 8, 9}]]}] &];
tup2E = fE @ tup1; // AbsoluteTiming // First
1.13389
6. GroupBy
ClearAll[fF]
fF = Values @ GroupBy[#,
Sort[{#, #[[{4, 5, 6, 1, 2, 3, 10, 11, 12, 7, 8, 9}]]}] &,
First] &;
tup2F = fF @ tup1; // AbsoluteTiming // First
1.28655
All six results match tup2b from WReach's answer:
tup2b == tup2A == tup2B == tup2C == tup2D == tup2E == tup2F
True
In comparison, tup2b takes about a second:
tup2b = Module[{keep},
   keep[t_] := (keep[t] = False;
     keep[t[[{4, 5, 6, 1, 2, 3, 10, 11, 12, 7, 8, 9}]]] = False;
     True);
   Select[tup1, keep]]; // AbsoluteTiming // First
1.06063
A tuple is deleted if both its first pair of triples and its second pair of triples are identical when exchanged (but not when they are already the same without swapping). That means we can construct the undeleted tuples directly like so:
tup2 =
Module[{
pair = Tuples[Tuples[{{0, 1}, {0, -1, 1}, {0, -1, 1}}], 2],
unique
},
unique = DeleteDuplicates[pair, #1 === Reverse@#2 &];
Flatten /@
DeleteDuplicates@
Join[Tuples[{unique, pair}], Tuples[{pair, unique}]]];
Takes about a tenth of a second on my laptop.
UPDATE: If you want to use the existing tup1 and delete the duplicates, you can keep the ones that appear in tup2 like so:
tup3 = Intersection[tup1, tup2];
which is very fast. If for some reason you need to keep tup1 in the original order, you can do something a bit slower:
tup4 = Select[tup1, AssociationThread[tup2, ConstantArray[True, Length[tup2]]]];
This takes another 0.2 seconds on my laptop.
Given the way the problem is stated, it's much easier to construct tuples you know are unique than to delete duplicates after the fact.