Remove elements from one array if present in another array, keep duplicates - NumPy / Python
Using searchsorted
With sorted B, we can use searchsorted -
A[B[np.searchsorted(B,A)] != A]
From the NumPy docs, searchsorted(a,v) finds the indices into a sorted array a such that, if the corresponding elements in v were inserted before those indices, the order of a would be preserved. So, let's say idx = searchsorted(B,A) and we index into B with those: B[idx]. This gives a mapped version of B corresponding to every element in A. Comparing this mapped version against A tells us, for every element in A, whether there is a match in B. Finally, we index into A to select the non-matching ones.
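To make these steps concrete, here is a small trace of the intermediate arrays. A and B are made-up example data. As in the one-liner, B is assumed to be sorted, and every value of A is assumed to be no larger than B.max(), so no insertion index equals len(B).

```python
import numpy as np

# Made-up example data: B is sorted, A contains duplicates.
A = np.array([1, 2, 3, 3, 4, 5, 7, 1])
B = np.array([2, 4, 7])

idx = np.searchsorted(B, A)  # insertion position of each A element in B
mapped = B[idx]              # B value aligned with each A element
mask = mapped != A           # True where the A element has no match in B
out = A[mask]                # non-matching elements, duplicates kept

print(idx)     # [0 0 1 1 1 2 2 0]
print(mapped)  # [2 2 4 4 4 7 7 2]
print(out)     # [1 3 3 5 1]
```

Both 3s and both 1s survive, because the comparison is done element by element on A rather than on unique values.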
Generic case (B is not sorted):
If B is not already sorted, as the method above requires, sort it first and then use the proposed method.
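A minimal sketch of the sort-first route, working on a sorted copy of B made with np.sort. The function name remove_sorted_first is hypothetical. Like the one-liner, it assumes no value of A exceeds B.max().

```python
import numpy as np

def remove_sorted_first(A, B):
    """Hypothetical helper: drop elements of A found in B, keeping duplicates.

    Assumes every value of A is <= B.max(); otherwise searchsorted
    returns len(B) and the indexing below raises IndexError.
    """
    Bs = np.sort(B)  # sorted copy; B itself is left untouched
    return A[Bs[np.searchsorted(Bs, A)] != A]

A = np.array([5, 1, 9, 1, 3, 9])
B = np.array([9, 1, 4])  # not sorted
print(remove_sorted_first(A, B))  # [5 3]
```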
Alternatively, we can use the sorter argument with searchsorted -
sidx = B.argsort()
out = A[B[sidx[np.searchsorted(B,A,sorter=sidx)]] != A]
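The sorter argument makes searchsorted treat B as if it were reordered by sidx, so the returned positions refer to the sorted order. Passing them through sidx converts them back to positions in the original, unsorted B. The trace below uses made-up data and, again, assumes no value of A exceeds B.max().

```python
import numpy as np

A = np.array([5, 1, 9, 1, 3, 9])
B = np.array([9, 1, 4])                   # unsorted; np.sort(B) would be [1 4 9]

sidx = B.argsort()                        # [1 2 0]: B[sidx] is sorted
pos = np.searchsorted(B, A, sorter=sidx)  # positions in sorted order
orig = sidx[pos]                          # same positions, mapped back into B
out = A[B[orig] != A]

print(pos)   # [2 0 2 0 1 2]
print(orig)  # [0 1 0 1 2 0]
print(out)   # [5 3]
```

This avoids materializing a sorted copy of B, at the cost of one extra fancy-indexing step through sidx.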
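More generic case (A has values higher than any in B):
For such values searchsorted returns len(B), which is out of bounds as an index, so we reset those indices to 0. The value fetched there cannot equal the A element anyway, so the element is still correctly flagged as non-matching -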
sidx = B.argsort()
idx = np.searchsorted(B,A,sorter=sidx)
idx[idx==len(B)] = 0
out = A[B[sidx[idx]] != A]
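To see why the extra idx[idx==len(B)] = 0 step is needed, this sketch compares the unguarded and guarded versions when A contains a value larger than anything in B. The data and the function names remove_unguarded and remove_guarded are made up for illustration.

```python
import numpy as np

def remove_unguarded(A, B):
    # Generic-case version without the out-of-range fix.
    sidx = B.argsort()
    return A[B[sidx[np.searchsorted(B, A, sorter=sidx)]] != A]

def remove_guarded(A, B):
    # More generic version: indices equal to len(B) are reset to 0.
    sidx = B.argsort()
    idx = np.searchsorted(B, A, sorter=sidx)
    idx[idx == len(B)] = 0
    return A[B[sidx[idx]] != A]

A = np.array([2, 10, 4, 10, 6])
B = np.array([6, 2])  # 10 is larger than every value in B

try:
    remove_unguarded(A, B)
except IndexError as exc:
    print("IndexError:", exc)  # searchsorted returned len(B) == 2 for the 10s

result = remove_guarded(A, B)
print(result.tolist())  # [10, 4, 10]
```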
Using in1d/isin
We can also use np.in1d, which is fairly straightforward (the docs should help clarify): it looks for a match in B for every element in A, and then we can use boolean indexing with the inverted mask to select the non-matching ones -
A[~np.in1d(A,B)]
Same with isin -
A[~np.isin(A,B)]
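For reference, np.isin returns a boolean mask with the same shape as A, and ~ inverts it. Here is a small sketch with made-up data. Note that np.in1d is deprecated as of NumPy 2.0 in favor of np.isin, so the sketch uses isin only.

```python
import numpy as np

A = np.array([3, 8, 3, 1, 8, 5])
B = np.array([8, 1])

mask = np.isin(A, B)  # True where the A element appears anywhere in B
print(mask)           # [False  True False  True  True False]
print(A[~mask])       # [3 3 5]
```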
With the invert flag -
A[np.in1d(A,B,invert=True)]
A[np.isin(A,B,invert=True)]
These solve the generic case, where B is not necessarily sorted.
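As a quick sanity check, this sketch compares the isin approach with the more generic searchsorted version on random integer data. The random setup and the helper names are illustrative only. Both versions should select exactly the same elements, in the same order, with duplicates kept.

```python
import numpy as np

def remove_searchsorted(A, B):
    # More generic searchsorted version (handles A values above B.max()).
    sidx = B.argsort()
    idx = np.searchsorted(B, A, sorter=sidx)
    idx[idx == len(B)] = 0
    return A[B[sidx[idx]] != A]

def remove_isin(A, B):
    # isin with invert=True marks the elements of A that are NOT in B.
    return A[np.isin(A, B, invert=True)]

rng = np.random.default_rng(0)
A = rng.integers(0, 20, size=50)
B = rng.integers(0, 15, size=8)  # unsorted, may repeat; A can exceed B.max()

print(np.array_equal(remove_searchsorted(A, B), remove_isin(A, B)))  # True
```

isin needs no sorting step in user code. The searchsorted route makes the sort and the binary search explicit.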
I am not very familiar with numpy, but how about using sets:
C = set(A.flat) - set(B.flat)
EDIT: As pointed out in the comments, sets cannot hold duplicate values (nor do they preserve order), so this drops the duplicates in A.
Another solution is to use a lambda expression:
C = np.array(list(filter(lambda x: x not in B, A)))
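To illustrate the difference the EDIT points out, here is a sketch with made-up data. The set difference collapses duplicates and has no defined order, while the filter version keeps them. Passing dtype=A.dtype is an addition here. It keeps an empty result in A's integer dtype instead of letting np.array default to float.

```python
import numpy as np

A = np.array([4, 2, 4, 7, 2, 9])
B = np.array([2])

C_set = set(A.flat) - set(B.flat)  # duplicates collapsed, order arbitrary
C_filter = np.array(list(filter(lambda x: x not in B, A)), dtype=A.dtype)

print(sorted(int(x) for x in C_set))  # [4, 7, 9]
print(C_filter)                       # [4 4 7 9]
```

The filter version loops in Python and scans B once per element of A, so it will be slower than the vectorized options on large arrays.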