How to get list of duplicates when using DeleteDuplicates?

Perhaps one of the simplest ways is to use Tally:

p = {1, 2, 4, 4, 6, 7, 8, 8};

Cases[Tally @ p, {x_, n_ /; n > 1} :> x]

{4, 8}
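To see why the pattern {x_, n_ /; n > 1} works, it helps to look at what Tally returns: a list of {element, count} pairs in order of first appearance. A small illustration follows; the Select form is only an equivalent spelling for comparison, not a faster one.

```mathematica
p = {1, 2, 4, 4, 6, 7, 8, 8};

(* Tally pairs each distinct element with its count *)
tallied = Tally[p]
(* {{1, 1}, {2, 1}, {4, 2}, {6, 1}, {7, 1}, {8, 2}} *)

(* keep the pairs whose count exceeds 1, then take the elements *)
dupes = First /@ Select[tallied, Last[#] > 1 &]
(* {4, 8} *)
```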

A somewhat faster formulation:

Pick[#, Unitize[#2 - 1], 1] & @@ Transpose[Tally @ p]

Taking the optimization to a rather excessive degree:

#[[SparseArray[#2, Automatic, 1]["AdjacencyLists"]]] & @@ Transpose[Tally @ p]
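The SparseArray form relies on the "AdjacencyLists" property, which as far as I know is undocumented. For a vector it returns the positions of the entries that differ from the background value (the third argument, here 1). Every count from Tally is at least 1, so those are exactly the counts greater than 1. A sketch on the counts of the small example list:

```mathematica
(* the two columns of Tally[{1, 2, 4, 4, 6, 7, 8, 8}] *)
elems  = {1, 2, 4, 6, 7, 8};
counts = {1, 1, 2, 1, 1, 2};

(* positions whose value differs from the background value 1 *)
pos = SparseArray[counts, Automatic, 1]["AdjacencyLists"]
(* {3, 6} *)

(* index the distinct elements with those positions *)
elems[[pos]]
(* {4, 8} *)
```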

Though not as fast as the SparseArray-optimized form of Tally, an alternative is to use Split after sorting. This is reasonably clean and fast:

Flatten[Split @ Sort @ p, {2}][[2]]

{4, 8}
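The intermediate steps make the Flatten trick easier to follow: Split groups equal neighbours into runs, and Flatten[..., {2}] acts as a ragged transpose, so the second row holds the second element of every run of length two or more. Note that [[2]] gives a Part error when the list has no duplicates at all. The Select variant at the end is my own fallback for that case, not part of the method above.

```mathematica
p = {1, 2, 4, 4, 6, 7, 8, 8};

(* runs of identical elements after sorting *)
runs = Split[Sort[p]]
(* {{1}, {2}, {4, 4}, {6}, {7}, {8, 8}} *)

(* ragged transpose: row k collects the k-th element of each run *)
Flatten[runs, {2}]
(* {{1, 2, 4, 6, 7, 8}, {4, 8}} *)

(* fallback that returns {} instead of failing when nothing repeats *)
safe = First /@ Select[runs, Length[#] > 1 &]
(* {4, 8} *)
```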

For Integer data this method is twice as fast as any other listed here:

With[{s = Sort @ p},
 DeleteDuplicates @ 
   s[[ SparseArray[Unitize @ Differences @ s, Automatic, 1]["AdjacencyLists"] ]]
]
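The idea behind this form is that, in a sorted list, a zero difference between neighbours marks a repeated value. The sketch below uses Position in place of the SparseArray lookup purely for readability; it finds the same positions but is not meant to be fast.

```mathematica
s = Sort[{1, 2, 4, 4, 6, 7, 8, 8}];

(* a zero means s[[i]] equals s[[i + 1]] *)
d = Differences[s]
(* {1, 2, 0, 2, 1, 1, 0} *)

(* positions of the zeros *)
pos = Flatten @ Position[d, 0]
(* {3, 7} *)

(* DeleteDuplicates matters when a value occurs three or more times *)
DeleteDuplicates[s[[pos]]]
(* {4, 8} *)
```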

Timings

p = RandomInteger[1*^8, 1*^6];

Cases[Tally @ p, {x_, n_ /; n > 1} :> x] // Timing // First

Pick[#, Unitize[#2 - 1], 1] & @@ Transpose[Tally @ p] // Timing // First

#[[SparseArray[#2, Automatic, 1]["AdjacencyLists"]]] & @@ Transpose[Tally @ p] //
  Timing // First

Flatten[Split @ Sort @ p, {2}][[2]] // Timing // First

With[{s = Sort @ p},
 DeleteDuplicates @ 
   s[[ SparseArray[Unitize @ Differences @ s, Automatic, 1]["AdjacencyLists"] ]]
] // Timing // First
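Because the Tally-based forms return duplicates in order of first appearance and the Sort-based forms return them sorted, comparing their results needs a Sort first. A quick sanity check one might run before timing; the seed and list size are arbitrary choices for illustration.

```mathematica
SeedRandom[1];
q = RandomInteger[100, 200];   (* 200 draws from 101 values, so duplicates are guaranteed *)

r1 = Cases[Tally @ q, {x_, n_ /; n > 1} :> x];   (* order of first appearance *)
r2 = Flatten[Split @ Sort @ q, {2}][[2]];        (* sorted order *)

(* the same elements, up to order *)
agree = Sort[r1] === r2
(* True *)
```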

If you have to use DeleteDuplicates, you can use Sow/Reap (lstA is the input list; the output below is consistent with lstA = {1, 2, 4, 4, 6, 7, 8, 8}):

{#, Pick @@ Transpose[GatherBy[#2[[1]]][[;; , 1]]]} & @@ Reap[
   DeleteDuplicates[lstA, (Sow[{#1, SameQ[##]}]; SameQ[##]) &]]

{{1, 2, 4, 6, 7, 8}, {4, 8}}

Here's a more general and faster approach:

myClone[list_, test_: Identity] := Composition[
       {#[[1]], #[[2]], Position[list, #] & /@ #[[2]]} &,
       {#[[;; , 1]], DeleteCases[#, {_}][[;; , 1]]} &,
       GatherBy[#, test] &
       ][list]

lstA = RandomInteger[10, 10]
myClone[lstA]

{5, 6, 3, 5, 10, 10, 8, 7, 0, 10}

{{5, 6, 3, 10, 8, 7, 0}, {5, 10}, {{{1}, {4}}, {{5}, {6}, {10}}}}
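One thing to keep in mind when passing a test other than Identity: the position step uses Position[list, #], which matches exact elements only, so it reports where the first member of each group occurs rather than every element that is equivalent under the test. A small sketch of that behaviour, with ToLowerCase as an illustrative test of my own choosing:

```mathematica
(* definition repeated from above so the example is self-contained *)
myClone[list_, test_: Identity] := Composition[
       {#[[1]], #[[2]], Position[list, #] & /@ #[[2]]} &,
       {#[[;; , 1]], DeleteCases[#, {_}][[;; , 1]]} &,
       GatherBy[#, test] &
       ][list]

(* "B" and "b" are grouped together, but only "B" itself is located *)
res = myClone[{"a", "B", "b"}, ToLowerCase]
(* {{"a", "B"}, {"B"}, {{{2}}}} *)
```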

I am posting a second answer because this is a different method unrelated to the first.
I wondered how I might approach this if Tally did not exist. I came up with using Ordering on a reverse-sorted list as a way to look for duplicates. It seems to work, and I think it's fairly interesting. By nature it sorts the list of duplicates rather than giving them in order of appearance as Tally does.

duplicates[p_] :=
  With[{sp = Sort @ p}, sp[[
     "AdjacencyLists" //
       SparseArray[Unitize[1 - Differences @ Ordering @ Reverse @ sp], Automatic, 1]
    ]]
  ] // DeleteDuplicates

duplicates[{1, 2, 4, 4, 6, 7, 8, 8}]

{4, 8}
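A walk-through on the small list may make the Ordering trick clearer. In the reverse-sorted list, distinct values have strictly decreasing positions, while equal values keep their original relative order, so a difference of exactly +1 in the ordering marks a repeat. The sketch uses Position instead of the SparseArray lookup for readability only.

```mathematica
sp = Sort[{1, 2, 4, 4, 6, 7, 8, 8}];

(* positions in Reverse[sp] = {8, 8, 7, 6, 4, 4, 2, 1}, smallest value first *)
ord = Ordering[Reverse[sp]]
(* {8, 7, 5, 6, 4, 3, 1, 2} *)

(* a step of +1 only occurs between equal elements *)
d = Differences[ord]
(* {-1, -2, 1, -2, -1, -2, 1} *)

pos = Flatten @ Position[d, 1]
(* {3, 7} *)

(* DeleteDuplicates handles values that occur three or more times *)
DeleteDuplicates[sp[[pos]]]
(* {4, 8} *)
```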

It is competitively fast compared to my first answer:

p = RandomInteger[8*^6, 2*^6];

Cases[Tally@p, {x_, n_ /; n > 1} :> x] // Length // Timing

Pick[#, Unitize[#2 - 1], 1] & @@ Transpose[Tally@p] // Length // Timing

#[[SparseArray[#2, Automatic, 1]["AdjacencyLists"]]] & @@ Transpose[Tally@p] // 
  Length // Timing

duplicates[p] // Length // Timing

{1.70041, 212277}

{0.592804, 212277}

{0.608404, 212277}

{0.358802, 212277}

On lists with extreme duplication it is a bit slower than the two optimized Tally forms, though still faster than the Cases version:

p = RandomInteger[1*^6, 5*^6];

(* same timing code as before *)

{1.57561, 959792}

{0.904806, 959792}

{0.904806, 959792}

{0.982806, 959792}
