Determining duplicate values in an array
I think this is clearest done outside of numpy. You'll have to time it against your numpy solutions if you are concerned with speed.
>>> import numpy as np
>>> from collections import Counter
>>> a = np.array([1, 2, 1, 3, 3, 3, 0])
>>> [item for item, count in Counter(a).items() if count > 1]
[1, 3]
Note: This is similar to Burhan Khalid's answer, but iterating over items() and unpacking each (item, count) pair, rather than subscripting in the condition, should be faster.
As of numpy version 1.9.0, np.unique has an argument return_counts which greatly simplifies your task:
u, c = np.unique(a, return_counts=True)
dup = u[c > 1]
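For completeness, here is a self-contained version of the snippet above that you can paste and run. It reuses the sample array from the Counter answer; the print calls are only there to show what each intermediate array holds.

```python
import numpy as np

a = np.array([1, 2, 1, 3, 3, 3, 0])

# u holds the sorted unique values, c how often each one occurs
u, c = np.unique(a, return_counts=True)

# boolean mask: keep only the values that occur more than once
dup = u[c > 1]

print(u.tolist())    # [0, 1, 2, 3]
print(c.tolist())    # [1, 2, 1, 3]
print(dup.tolist())  # [1, 3]
```

Note that the result comes back sorted, since np.unique sorts its output.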
This is similar to using Counter, except you get a pair of arrays instead of a mapping. I'd be curious to see how they perform relative to each other.
It's probably worth mentioning that even though np.unique is quite fast in practice due to its numpyness, it has worse algorithmic complexity than the Counter solution. np.unique is sort-based, so it runs asymptotically in O(n log n) time. Counter is hash-based, so it has O(n) complexity on average. This will not matter much for anything but the largest datasets.
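If you want to check this on your own machine, a rough timing sketch could look like the following. The helper names dups_counter and dups_unique are made up for this example, the array sizes and repeat count are arbitrary, and I call a.tolist() before handing the data to Counter (an assumption on my part, so that it hashes plain Python ints rather than numpy scalars). Treat the numbers you get as indicative only.

```python
import timeit
from collections import Counter

import numpy as np


def dups_counter(a):
    # hypothetical helper: hash-based counting, expected O(n)
    return [item for item, count in Counter(a.tolist()).items() if count > 1]


def dups_unique(a):
    # hypothetical helper: sort-based counting, O(n log n)
    u, c = np.unique(a, return_counts=True)
    return u[c > 1].tolist()


rng = np.random.default_rng(0)
for n in (1_000, 100_000):
    a = rng.integers(0, n, size=n)
    # sanity check: both approaches must find the same duplicated values
    assert sorted(dups_counter(a)) == dups_unique(a)
    t_counter = timeit.timeit(lambda: dups_counter(a), number=5)
    t_unique = timeit.timeit(lambda: dups_unique(a), number=5)
    print(f"n={n}: Counter {t_counter:.4f}s, np.unique {t_unique:.4f}s")
```

Which one wins will depend on the array size, the dtype, and your numpy version, so it is worth rerunning with data shaped like your own.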