How to get list of objects with unique attribute

How about using a dict (since its keys are unique)?

Assuming we have

class Object:
    def __init__(self, id):
        self.id = id


Aobject = Object(1)
Bobject = Object(1)
Cobject = Object(2)
objects = [Aobject, Bobject, Cobject]

then a list of objects unique by their id field can be generated with a dict comprehension in Python 3:

unique_objects = list({object_.id: object_ for object_ in objects}.values())

in Python 2.7 (where dict.values() already returns a list):

unique_objects = {object_.id: object_ for object_ in objects}.values()

and in Python < 2.7 (which has no dict comprehensions):

unique_objects = dict([(object_.id, object_) for object_ in objects]).values()

Finally, we can write a function (Python 3 version):

def unique(elements, key):
    return list({key(element): element for element in elements}.values())

where elements may be any iterable and key is a callable that returns a hashable value for each element (in our particular case, key is operator.attrgetter('id')).
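
As a quick usage check, here is a minimal sketch that calls the function above with operator.attrgetter('id'). The short names a, b and c are only for illustration. It assumes Python 3.7+, where a dict keeps keys in the order they were first inserted; a later object with the same id overwrites the earlier one.

```python
from operator import attrgetter


class Object:
    def __init__(self, id):
        self.id = id


def unique(elements, key):
    # Later elements overwrite earlier ones that share the same key.
    return list({key(element): element for element in elements}.values())


# a and b share id 1, so only one of them survives.
a, b, c = Object(1), Object(1), Object(2)
result = unique([a, b, c], key=attrgetter('id'))
ids = [obj.id for obj in result]
print(ids)
```

Note that for id 1 the surviving object is b, the last one seen, not a.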

Marcin's answer works fine, but it doesn't look Pythonic to me: the list comprehension mutates the seen object from the outer scope, and there is some magic in calling set.add and combining its result (which is None) with obj through or.
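
If the side effect inside the comprehension is the main concern, the same set-based idea can be spelled out as an explicit loop. The sketch below is one possible way to write it, not code from either answer; the name unique_first is hypothetical.

```python
def unique_first(elements, key):
    # Hypothetical helper: yields the first element seen for each key.
    seen = set()
    for element in elements:
        k = key(element)
        if k not in seen:
            seen.add(k)
            yield element


# Illustration data: deduplicate words by their first letter.
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']
result = list(unique_first(words, key=lambda word: word[0]))
print(result)
```

Here the set is local to the function, so nothing outside it gets mutated, and the first element for each key is the one that is kept.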

And last but not least:

Benchmark

import timeit

setup = '''
import random


class Object:
    def __init__(self, id):
        self.id = id


objects = [Object(random.randint(-100, 100))
           for i in range(1000)]
'''
solution = '''
seen = set()
result = [seen.add(object_.id) or object_
          for object_ in objects
          if object_.id not in seen]
'''
print('list comprehension + set: ',
      min(timeit.Timer(solution, setup).repeat(7, 1000)))
solution = '''
result = list({object_.id: object_
               for object_ in objects}.values())
'''
print('dict comprehension: ',
      min(timeit.Timer(solution, setup).repeat(7, 1000)))

On my machine this gives:

list comprehension + set:  0.20700953400228173
dict comprehension:  0.1477799109998159

seen = set()

# never use list as a variable name
[seen.add(obj.id) or obj for obj in mylist if obj.id not in seen]

This works because set.add returns None, so the expression in the list comprehension always yields obj, but only if obj.id has not already been added to seen.

(The expression could only evaluate to None if obj is None; in that case, obj.id would raise an exception. If mylist may contain None values, change the test to if obj is not None and obj.id not in seen.)

Note that this will give you the first object in the list which has a given id. @Abhijit's answer will give you the last such object.
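
To make the first-versus-last difference concrete, here is a small sketch that runs both approaches on the same input. The name attribute is an assumption added only so the objects can be told apart; it is not part of the original class.

```python
class Object:
    def __init__(self, name, id):
        self.name = name  # assumed label, used only to identify objects
        self.id = id


mylist = [Object('A', 1), Object('B', 1), Object('C', 2),
          Object('D', 2), Object('E', 3)]

# set-based comprehension: keeps the first object for each id
seen = set()
first = [seen.add(obj.id) or obj for obj in mylist if obj.id not in seen]

# dict comprehension: later objects overwrite earlier ones
last = list({obj.id: obj for obj in mylist}.values())

first_names = [obj.name for obj in first]
last_names = [obj.name for obj in last]
print(first_names, last_names)
```

The first list keeps A, C and E, while the second keeps B, D and E.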

Update:

Alternatively, an OrderedDict could be a good choice:

import collections
seen = collections.OrderedDict()

for obj in mylist:
    # eliminate this check if you want the last item
    if obj.id not in seen:
        seen[obj.id] = obj

list(seen.values())
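
On Python 3.7 and later a plain dict also preserves insertion order, so the same idea may not need OrderedDict at all. The following is a sketch under that version assumption, using dict.setdefault to keep the first object for each id.

```python
class Object:
    def __init__(self, id):
        self.id = id


mylist = [Object(1), Object(1), Object(2)]

seen = {}
for obj in mylist:
    # setdefault stores obj only if the id is not already a key,
    # so the first object with each id is the one that is kept.
    seen.setdefault(obj.id, obj)

unique_objects = list(seen.values())
print([obj.id for obj in unique_objects])
```

To keep the last object instead, a plain assignment seen[obj.id] = obj would do.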

Suppose your list of objects, somelist, looks something like this:

[(Object [A] [1]), (Object [B] [1]), (Object [C] [2]), (Object [D] [2]), (Object [E] [3])]

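You can do something like this (the output shown is from Python 2.7, where values() returns a list; in Python 3, wrap the expression in list()):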

>>> {e.id:e for e in somelist}.values()
[(Object [B] [1]), (Object [D] [2]), (Object [E] [3])]
