How to get list of objects with unique attribute
How about using a dict (since its keys are unique)?
Assuming we have
class Object:
    def __init__(self, id):
        self.id = id
Aobject = Object(1)
Bobject = Object(1)
Cobject = Object(2)
objects = [Aobject, Bobject, Cobject]
then a list of Objects that are unique by their id field can be generated with a dict comprehension. In Python 3:
unique_objects = list({object_.id: object_ for object_ in objects}.values())
in Python 2.7
unique_objects = {object_.id: object_ for object_ in objects}.values()
and in Python <2.7
unique_objects = dict([(object_.id, object_) for object_ in objects]).values()
Finally, we can write a function (Python 3 version)
def unique(elements, key):
    return list({key(element): element for element in elements}.values())
where elements may be any iterable and key is some callable which returns hashable objects from elements (key equals operator.attrgetter('id') in our particular case).
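As a quick illustration of how the function above could be called, here is a minimal sketch that reuses the Object class from this answer. It assumes Python 3.7 or later, where dicts keep insertion order; the variable names a, b, c and ids are only for illustration.

```python
from operator import attrgetter


class Object:
    def __init__(self, id):
        self.id = id


def unique(elements, key):
    # A later element replaces an earlier one that has the same key,
    # but the key keeps the position of its first insertion.
    return list({key(element): element for element in elements}.values())


a, b, c = Object(1), Object(1), Object(2)
unique_objects = unique([a, b, c], key=attrgetter('id'))
ids = [obj.id for obj in unique_objects]
print(ids)
```

Because later values overwrite earlier ones, the object kept for id 1 here is b, not a.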
Marcin's answer works fine but doesn't look Pythonic to me, since the list comprehension mutates the seen object from the outer scope. There is also some magic in calling the set.add method and combining its result (which is None) with obj through or.
And the final but no less important part:
Benchmark
import timeit

setup = '''
import random
class Object:
    def __init__(self, id):
        self.id = id
objects = [Object(random.randint(-100, 100))
           for i in range(1000)]
'''
solution = '''
seen = set()
result = [seen.add(object_.id) or object_
          for object_ in objects
          if object_.id not in seen]
'''
print('list comprehension + set: ',
      min(timeit.Timer(solution, setup).repeat(7, 1000)))
solution = '''
result = list({object_.id: object_
               for object_ in objects}.values())
'''
print('dict comprehension: ',
      min(timeit.Timer(solution, setup).repeat(7, 1000)))
on my machine gives
list comprehension + set: 0.20700953400228173
dict comprehension: 0.1477799109998159
seen = set()
# never use list as a variable name
[seen.add(obj.id) or obj for obj in mylist if obj.id not in seen]
This works because set.add returns None, so the expression in the list comprehension always yields obj, but only if obj.id has not already been added to seen.
(The expression could only evaluate to None if obj is None; in that case, obj.id would raise an exception. In case mylist contains None values, change the test to if obj and (obj.id not in seen).)
Note that this will give you the first object in the list which has a given id. @Abhijit's answer will give you the last such object.
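To make the first-versus-last difference concrete, here is a small sketch comparing the two approaches. The Item class and its name attribute are hypothetical and exist only so the objects can be told apart; the dict variant assumes Python 3.

```python
class Item:
    # Hypothetical class for illustration: name distinguishes objects
    # that share the same id.
    def __init__(self, id, name):
        self.id = id
        self.name = name


mylist = [Item(1, 'A'), Item(1, 'B'), Item(2, 'C')]

# set-based approach: keeps the first object seen for each id
seen = set()
first_wins = [seen.add(obj.id) or obj for obj in mylist if obj.id not in seen]

# dict-based approach: a later object overwrites an earlier one
last_wins = list({obj.id: obj for obj in mylist}.values())

first_names = [obj.name for obj in first_wins]
last_names = [obj.name for obj in last_wins]
print(first_names, last_names)
```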
Update:
Alternatively, an OrderedDict could be a good choice:
import collections
seen = collections.OrderedDict()
for obj in mylist:
    # eliminate this check if you want the last item
    if obj.id not in seen:
        seen[obj.id] = obj
list(seen.values())
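On Python 3.7 and later a plain dict also preserves insertion order, so the same idea might be sketched without OrderedDict. This is a variation rather than the approach given above: it swaps the explicit membership check for dict.setdefault, and the Item class is hypothetical.

```python
class Item:
    # Hypothetical class for illustration only.
    def __init__(self, id, name):
        self.id = id
        self.name = name


mylist = [Item(1, 'A'), Item(1, 'B'), Item(2, 'C')]

seen = {}
for obj in mylist:
    # setdefault stores obj only if obj.id is not already a key,
    # so the first object with each id is the one that is kept.
    seen.setdefault(obj.id, obj)

unique_first = list(seen.values())
names = [obj.name for obj in unique_first]
print(names)
```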
Given that your list of objects, somelist, is something like
[(Object [A] [1]), (Object [B] [1]), (Object [C] [2]), (Object [D] [2]), (Object [E] [3])]
You can do something like this
>>> {e.id:e for e in somelist}.values()
[(Object [B] [1]), (Object [D] [2]), (Object [E] [3])]