How to get list of duplicates when using DeleteDuplicates?
Perhaps one of the simplest ways is to use Tally:
p = {1, 2, 4, 4, 6, 7, 8, 8};
Cases[Tally @ p, {x_, n_ /; n > 1} :> x]
{4, 8}
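To see why this works, it may help to look at what Tally returns before the filtering step. The following is a small sketch on the same list. The Select and Counts spellings are alternatives added here for illustration, not part of the original answer, and Counts assumes version 10 or later.

```mathematica
p = {1, 2, 4, 4, 6, 7, 8, 8};

(* Tally pairs each distinct element with its count, in order of first appearance *)
Tally[p]
(* {{1, 1}, {2, 1}, {4, 2}, {6, 1}, {7, 1}, {8, 2}} *)

(* keep the pairs whose count exceeds 1, then extract the elements *)
viaSelect = First /@ Select[Tally[p], Last[#] > 1 &]
(* {4, 8} *)

(* the same idea using an Association of counts *)
viaCounts = Keys[Select[Counts[p], # > 1 &]]
(* {4, 8} *)
```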
A somewhat faster formulation:
Pick[#, Unitize[#2 - 1], 1] & @@ Transpose[Tally @ p]
Taking the optimization to a rather excessive degree:
#[[SparseArray[#2, Automatic, 1]["AdjacencyLists"]]] & @@ Transpose[Tally @ p]
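Both faster forms work on the two columns of the tally rather than on its rows. As a rough walkthrough (the names vals, counts, mask and pos are introduced here only for illustration), the steps look like this:

```mathematica
p = {1, 2, 4, 4, 6, 7, 8, 8};

(* split the tally into distinct values and their counts *)
{vals, counts} = Transpose[Tally[p]]
(* {{1, 2, 4, 6, 7, 8}, {1, 1, 2, 1, 1, 2}} *)

(* counts - 1 is zero for unique elements; Unitize maps every nonzero entry to 1 *)
mask = Unitize[counts - 1]
(* {0, 0, 1, 0, 0, 1} *)

viaPick = Pick[vals, mask, 1]
(* {4, 8} *)

(* with 1 as the background value, the stored entries of the sparse vector
   are exactly the counts that differ from 1; their positions index vals *)
pos = SparseArray[counts, Automatic, 1]["AdjacencyLists"];
viaSparse = vals[[pos]]
(* {4, 8} *)
```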
Though not as fast as the SparseArray-optimized form of Tally, an alternative is to use Split after sorting. This is reasonably clean and fast:
Flatten[Split @ Sort @ p, {2}][[2]]
{4, 8}
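The intermediate results make the trick easier to follow: Flatten with {2} transposes the ragged list of runs, so the second row collects the second element of every run that has one. Note that taking part 2 assumes at least one duplicate exists. The Select variant below is an alternative sketched here for illustration (not from the original answer) that returns an empty list in that case.

```mathematica
p = {1, 2, 4, 4, 6, 7, 8, 8};

(* runs of identical elements in the sorted list *)
runs = Split[Sort[p]]
(* {{1}, {2}, {4, 4}, {6}, {7}, {8, 8}} *)

(* ragged transpose: row 1 holds first elements, row 2 holds second elements *)
Flatten[runs, {2}]
(* {{1, 2, 4, 6, 7, 8}, {4, 8}} *)

(* alternative: keep runs longer than 1 and take their first element *)
dups = First /@ Select[runs, Length[#] > 1 &]
(* {4, 8} *)

(* on a duplicate-free list the alternative gives an empty list *)
none = First /@ Select[Split[Sort[{3, 1, 2}]], Length[#] > 1 &]
(* {} *)
```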
For Integer data this method is twice as fast as any other listed here:
With[{s = Sort @ p},
DeleteDuplicates @
s[[ SparseArray[Unitize @ Differences @ s, Automatic, 1]["AdjacencyLists"] ]]
]
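The idea behind this version is that equal neighbours in the sorted list have a difference of zero. The sketch below walks through the steps, using Pick in place of the SparseArray lookup purely to keep the intermediate values visible; that substitution is made here for illustration and is not claimed to be as fast.

```mathematica
s = Sort[{1, 2, 4, 4, 6, 7, 8, 8}];

(* zero wherever an element equals its successor *)
d = Differences[s]
(* {1, 2, 0, 2, 1, 1, 0} *)

(* select the elements of s that sit at those zero positions *)
dups = Pick[Most[s], Unitize[d], 0]
(* {4, 8} *)

(* an element occurring three times yields two zeros,
   which is why DeleteDuplicates is applied at the end *)
t = Sort[{5, 4, 4, 4}];
raw = Pick[Most[t], Unitize[Differences[t]], 0]
(* {4, 4} *)
DeleteDuplicates[raw]
(* {4} *)
```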
Timings
p = RandomInteger[1*^8, 1*^6];
Cases[Tally @ p, {x_, n_ /; n > 1} :> x] // Timing // First
Pick[#, Unitize[#2 - 1], 1] & @@ Transpose[Tally @ p] // Timing // First
#[[SparseArray[#2, Automatic, 1]["AdjacencyLists"]]] & @@ Transpose[Tally @ p] //
Timing // First
Flatten[Split @ Sort @ p, {2}][[2]] // Timing // First
With[{s = Sort @ p},
DeleteDuplicates @
s[[ SparseArray[Unitize @ Differences @ s, Automatic, 1]["AdjacencyLists"] ]]
] // Timing // First
0.827 0.343 0.265 0.343 0.11
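Before comparing timings it is worth confirming that the methods agree. A minimal check, assuming a small test list with guaranteed duplicates and sorting each result because the Tally-based forms return duplicates in order of appearance (the names r1 to r4 are introduced here only for illustration, and r4 uses Pick in place of the SparseArray lookup):

```mathematica
(* 50 draws from 11 possible values, so duplicates must occur *)
p = RandomInteger[10, 50];

r1 = Cases[Tally @ p, {x_, n_ /; n > 1} :> x];
r2 = Pick[#, Unitize[#2 - 1], 1] & @@ Transpose[Tally @ p];
r3 = Flatten[Split @ Sort @ p, {2}][[2]];
r4 = With[{s = Sort @ p},
   DeleteDuplicates @ Pick[Most[s], Unitize @ Differences @ s, 0]];

(* True when all four methods find the same set of duplicated values *)
agree = Sort[r1] === Sort[r2] === Sort[r3] === Sort[r4]
```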
If you have to use DeleteDuplicates, you can use Sow/Reap, where lstA is the list being deduplicated:
{#, Pick @@ Transpose[GatherBy[#2[[1]]][[;; , 1]]]} & @@ Reap[
DeleteDuplicates[lstA, (Sow[{#1, SameQ[##]}]; SameQ[##]) &]]
{{1, 2, 4, 6, 7, 8}, {4, 8}}
Here's a more general and faster approach:
myClone[list_, test_: Identity] := Composition[
{#[[1]], #[[2]], Position[list, #] & /@ #[[2]]} &,
{#[[;; , 1]], DeleteCases[#, {_}][[;; , 1]]} &,
GatherBy[#, test] &
][list]
lstA = RandomInteger[10, 10]
myClone[lstA]
{5, 6, 3, 5, 10, 10, 8, 7, 0, 10}
{{5, 6, 3, 10, 8, 7, 0}, {5, 10}, {{{1}, {4}}, {{5}, {6}, {10}}}}
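The three parts of the result are the distinct elements, the duplicated elements, and the positions of each duplicated element. The following sketch retraces the steps of myClone on the list shown above with the default test; the intermediate names (groups, distinct, dups) are introduced here only for illustration.

```mathematica
list = {5, 6, 3, 5, 10, 10, 8, 7, 0, 10};

(* group identical elements, keeping order of first appearance *)
groups = GatherBy[list, Identity]
(* {{5, 5}, {6}, {3}, {10, 10, 10}, {8}, {7}, {0}} *)

(* one representative per group: the list without duplicates *)
distinct = groups[[All, 1]]
(* {5, 6, 3, 10, 8, 7, 0} *)

(* drop single-element groups; what remains are the duplicated elements *)
dups = DeleteCases[groups, {_}][[All, 1]]
(* {5, 10} *)

(* where each duplicated element occurs in the original list *)
positions = Position[list, #] & /@ dups
(* {{{1}, {4}}, {{5}, {6}, {10}}} *)
```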
I am posting a second answer because this is a different method unrelated to the first.
I wondered how I might approach this if Tally did not exist. I came up with using Ordering on a reverse-sorted list as a way to look for duplicates. It seems to work, and I think it's fairly interesting. By nature it sorts the list of duplicates rather than giving them in order of appearance as Tally does.
duplicates[p_] :=
With[{sp = Sort @ p}, sp[[
"AdjacencyLists" //
SparseArray[Unitize[1 - Differences @ Ordering @ Reverse @ sp], Automatic, 1]
]]
] // DeleteDuplicates
duplicates[{1, 2, 4, 4, 6, 7, 8, 8}]
{4, 8}
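A step-by-step trace may clarify why this finds duplicates. In the reverse-sorted list, distinct values are visited from back to front, so consecutive entries of the ordering decrease, while equal values keep their relative order and therefore differ by exactly 1. The sketch below uses Pick instead of the SparseArray lookup only to expose the intermediate values; the names ord, d and mask are illustrative.

```mathematica
sp = Sort[{1, 2, 4, 4, 6, 7, 8, 8}];

(* positions in the reversed list, taken in ascending order of value *)
ord = Ordering[Reverse[sp]]
(* {8, 7, 5, 6, 4, 3, 1, 2} *)

(* a difference of +1 marks two equal neighbours *)
d = Differences[ord]
(* {-1, -2, 1, -2, -1, -2, 1} *)

(* turn those +1 entries into zeros and everything else into ones *)
mask = Unitize[1 - d]
(* {1, 1, 0, 1, 1, 1, 0} *)

dups = DeleteDuplicates[Pick[Most[sp], mask, 0]]
(* {4, 8} *)
```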
It is competitively fast compared to my first answer:
p = RandomInteger[8*^6, 2*^6];
Cases[Tally@p, {x_, n_ /; n > 1} :> x] // Length // Timing
Pick[#, Unitize[#2 - 1], 1] & @@ Transpose[Tally@p] // Length // Timing
#[[SparseArray[#2, Automatic, 1]["AdjacencyLists"]]] & @@ Transpose[Tally@p] //
Length // Timing
duplicates[p] // Length // Timing
{1.70041, 212277} {0.592804, 212277} {0.608404, 212277} {0.358802, 212277}
On lists with extreme duplication it is a bit slower than the Pick and SparseArray forms, though still faster than Cases:
p = RandomInteger[1*^6, 5*^6];
(* same timing code as before *)
{1.57561, 959792} {0.904806, 959792} {0.904806, 959792} {0.982806, 959792}