Remove duplicates in a list of objects with Python
This seems pretty minimal:
    new_dict = dict()
    for obj in myList:
        if obj.title not in new_dict:
            new_dict[obj.title] = obj
set(list_of_objects) will only remove duplicates if Python can tell what a duplicate is, that is, you need to define what makes an object unique.
To do that, make the object hashable by defining both the __hash__ and __eq__ methods, as described here:
http://docs.python.org/glossary.html#term-hashable
Note that defining only __eq__ is not enough for set(): in Python 3, a class that defines __eq__ without __hash__ becomes unhashable.
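One caveat is worth checking for yourself before relying on __eq__ alone. The sketch below uses a hypothetical class, OnlyEq (not from the question), that defines __eq__ but no __hash__. In Python 3 such a class gets __hash__ set to None, so its instances compare equal but cannot be placed in a set.

```python
class OnlyEq:
    """Hypothetical class that defines __eq__ but not __hash__."""

    def __init__(self, title):
        self.title = title

    def __eq__(self, other):
        if not isinstance(other, OnlyEq):
            return NotImplemented
        return self.title == other.title


def can_build_set(objects):
    """Return True if the objects can be placed in a set."""
    try:
        set(objects)
    except TypeError:  # raised for unhashable types
        return False
    return True


items = [OnlyEq("The Shining"), OnlyEq("The Shining")]
print(items[0] == items[1])   # equality works
print(can_build_set(items))   # but building a set fails in Python 3
```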
EDIT: How to implement the __eq__ method:
You'll need to know, as I mentioned, what makes your object unique. Suppose we have a Book with attributes author_name and title whose combination is unique (so we can have many books by Stephen King, and many books named The Shining, but only one book named The Shining by Stephen King). Then the implementation is as follows:
    def __eq__(self, other):
        return self.author_name == other.author_name \
            and self.title == other.title
Similarly, this is how I sometimes implement the __hash__ method:
    def __hash__(self):
        return hash(('title', self.title,
                     'author_name', self.author_name))
You can check that if you create a list of 2 books with the same author and title, the book objects will be equal (with the == operator). Also, when set() is used, it will remove one book.
EDIT: This is an old answer of mine, but I only now notice that the last paragraph had an error, which is now corrected: objects with the same hash() won't give True when compared with is. Hashability is what matters, however, if you intend to use the objects as elements of a set or as keys in a dictionary.
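Putting the two methods together, here is a minimal sketch of such a Book class. The constructor and the isinstance check are my own additions for illustration, and the hash uses a plain (author_name, title) tuple rather than the labelled tuple above; either works as long as it hashes the same fields that __eq__ compares.

```python
class Book:
    """Illustrative Book whose uniqueness is the (author_name, title) pair."""

    def __init__(self, author_name, title):
        self.author_name = author_name
        self.title = title

    def __eq__(self, other):
        if not isinstance(other, Book):
            return NotImplemented
        return (self.author_name, self.title) == (other.author_name, other.title)

    def __hash__(self):
        # Hash the same fields that __eq__ compares.
        return hash((self.author_name, self.title))


a = Book("Stephen King", "The Shining")
b = Book("Stephen King", "The Shining")
c = Book("Stephen King", "It")

print(a == b)          # True: same author and title
print(a is b)          # False: still two distinct objects
print(len({a, b, c}))  # 2: the set keeps only one of the equal books
```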
Since the objects aren't hashable, you can't put them in a set directly. Their titles should be hashable, though.
Here's the first part.
    seen_titles = set()
    new_list = []
    for obj in myList:
        if obj.title not in seen_titles:
            new_list.append(obj)
            seen_titles.add(obj.title)
You're going to need to describe what database/ORM etc. you're using for the second part though.
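The seen-set loop above generalizes to any attribute if you pass in a key function. This is only a sketch: unique_by and the Post class are hypothetical names standing in for your own code, and the key must return something hashable.

```python
def unique_by(items, key):
    """Return items in original order, keeping the first item for each key."""
    seen = set()
    result = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            result.append(item)
    return result


class Post:
    """Hypothetical stand-in for the objects in myList."""

    def __init__(self, title, body):
        self.title = title
        self.body = body


myList = [Post("a", "first"), Post("b", "second"), Post("a", "third")]
deduped = unique_by(myList, key=lambda obj: obj.title)
print([p.body for p in deduped])  # ['first', 'second']
```

Unlike a plain set(), this keeps the original order and always keeps the first object seen for each title.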
If you can't (or won't) define __eq__ for the objects, you can use a dict-comprehension to achieve the same end:
    unique = list({item.attribute: item for item in mylist}.values())
Note that this will contain the last instance of a given key, e.g. for
    mylist = [Item(attribute=1, tag='first'), Item(attribute=1, tag='second'), Item(attribute=2, tag='third')]
you get
    [Item(attribute=1, tag='second'), Item(attribute=2, tag='third')]
You can get around this by iterating over mylist[::-1] instead (if the full list is present), which keeps the first instance of each key (the results then follow the reversed list's order).
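To see the last-wins behaviour next to a first-wins alternative, here is a small sketch. Item is modelled as a namedtuple purely for illustration, and the setdefault loop is my own suggestion rather than part of the answer above. The orderings shown assume Python 3.7+, where dicts preserve insertion order.

```python
from collections import namedtuple

# Hypothetical Item type matching the example above.
Item = namedtuple("Item", ["attribute", "tag"])

mylist = [Item(1, "first"), Item(1, "second"), Item(2, "third")]

# Dict comprehension: later items overwrite earlier ones with the same key.
last_wins = list({item.attribute: item for item in mylist}.values())

# setdefault only stores an item if its key is not present yet,
# so the first instance is kept and the original order is preserved.
first_by_key = {}
for item in mylist:
    first_by_key.setdefault(item.attribute, item)
first_wins = list(first_by_key.values())

print([i.tag for i in last_wins])   # ['second', 'third']
print([i.tag for i in first_wins])  # ['first', 'third']
```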