Python find elements in one list that are not in the other
TL;DR:
SOLUTION (1)
import numpy as np
main_list = np.setdiff1d(list_2,list_1)
# yields the elements in `list_2` that are NOT in `list_1`
SOLUTION (2) You want a sorted list
def setdiff_sorted(array1, array2, assume_unique=False):
    ans = np.setdiff1d(array1, array2, assume_unique).tolist()
    if assume_unique:
        return sorted(ans)
    return ans
main_list = setdiff_sorted(list_2,list_1)
EXPLANATIONS:
(1) You can use NumPy's setdiff1d(array1, array2, assume_unique=False).
assume_unique tells the function whether the arrays ARE ALREADY UNIQUE.
If False, the unique elements are determined first.
If True, the function assumes that the elements are already unique and skips determining the unique elements.
This yields the unique values in array1 that are not in array2. assume_unique is False by default.
If you are concerned with the unique elements (based on the response of Chinny84), then simply use the following (where assume_unique=False is the default value):
import numpy as np
list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m"]
main_list = np.setdiff1d(list_2,list_1)
# yields the elements in `list_2` that are NOT in `list_1`
(2) For those who want answers to be sorted, I've made a custom function:
import numpy as np
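def setdiff_sorted(array1, array2, assume_unique=False):
    ans = np.setdiff1d(array1, array2, assume_unique).tolist()
    # With assume_unique=True, setdiff1d does not guarantee sorted output,
    # so sort explicitly in that case.
    if assume_unique:
        return sorted(ans)
    return ans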
To get the answer, run:
main_list = setdiff_sorted(list_2,list_1)
SIDE NOTES:
(a) Solution 2 (custom function setdiff_sorted) returns a list (compared to an array in solution 1).
(b) If you aren't sure whether the elements are unique, just use the default setting of NumPy's setdiff1d in both solutions 1 and 2. What can be an example of a complication? See note (c).
(c) Things will be different if either of the two lists is not unique.
Say list_2 is not unique: list_2 = ["a", "f", "c", "m", "m"]. Keep list_1 as is: list_1 = ["a", "b", "c", "d", "e"].
Leaving assume_unique at its default value yields ["f", "m"] (in both solutions). HOWEVER, if you set assume_unique=True, both solutions give ["f", "m", "m"]. Why? Because the user ASSUMED that the elements are unique, so the duplicates are never removed. Hence, IT IS BETTER TO KEEP assume_unique at its default value. Note that both answers are sorted in this example.
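To make note (c) concrete, here is a minimal sketch that runs both settings on the lists with the duplicated "m". The variable names default_result and trusting_result are only illustrative and are not part of the solutions above.

```python
import numpy as np

list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m", "m"]  # "m" appears twice

# Default (assume_unique=False): NumPy deduplicates the inputs first,
# so the repeated "m" is reported only once.
default_result = np.setdiff1d(list_2, list_1).tolist()

# assume_unique=True: NumPy skips deduplication, so the repeated "m" survives.
trusting_result = np.setdiff1d(list_2, list_1, assume_unique=True).tolist()

print(default_result)
print(trusting_result)
```

The second call keeps both copies of "m" because nothing removes them; the function simply trusts the caller's claim that the input was unique.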
You can use sets:
main_list = list(set(list_2) - set(list_1))
Output:
>>> list_1=["a", "b", "c", "d", "e"]
>>> list_2=["a", "f", "c", "m"]
>>> set(list_2) - set(list_1)
set(['m', 'f'])
>>> list(set(list_2) - set(list_1))
['m', 'f']
Per @JonClements' comment, here is a tidier version:
>>> list_1=["a", "b", "c", "d", "e"]
>>> list_2=["a", "f", "c", "m"]
>>> list(set(list_2).difference(list_1))
['m', 'f']
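Note that converting to a set discards both the original order and any duplicates in list_2. If either matters, one possible variation (a sketch, not part of the answer above) is a list comprehension that uses a set only for the membership test. It assumes the elements are hashable, and the name exclude is just illustrative.

```python
list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m", "m"]

# Build the lookup set once so each membership test is O(1) on average.
exclude = set(list_1)

# Keep items of list_2 in their original order, duplicates included.
main_list = [item for item in list_2 if item not in exclude]

print(main_list)  # ['f', 'm', 'm']
```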