Pythonic way of ignoring the last element when doing set difference
Here's how you might write your own class to override a tuple's normal hashing behaviour:
a_data = [('1', '2', '3', 'a'), ('1', '2', '4', 'a'), ('1', '2', '5', 'b')]
b_data = [('1', '2', '3', 'b'), ('1', '2', '4', 'b'), ('1', '2', '6', 'b')]
class HashableIgnoresLastElement(tuple):
    def __eq__(self, other):
        return self[:-1] == other[:-1]
    def __hash__(self):
        return hash(self[:-1])
a = set(map(HashableIgnoresLastElement, a_data))
b = set(map(HashableIgnoresLastElement, b_data))
print(b - a)
with output
{('1', '2', '6', 'b')}
To modify the way sets of tuples behave, we have to modify the way tuples are hashed.
From the Python glossary entry for "hashable":
An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.
Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.
So in order to make the hashing ignore the last element, we have to override the dunder methods __eq__ and __hash__ appropriately. This isn't hard: all we have to do is slice off the last element and then delegate to the corresponding methods of a normal tuple.
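As a quick sanity check of the idea above, here is a minimal sketch that repeats the class from this answer and confirms that two tuples differing only in their last element compare equal, hash the same, and collapse into a single set member. The names x, y and collapsed are made up for illustration.

```python
class HashableIgnoresLastElement(tuple):
    """Tuple subclass whose equality and hash ignore the final element."""

    def __eq__(self, other):
        # Slicing a tuple subclass returns a plain tuple,
        # so this delegates to the normal tuple comparison.
        return self[:-1] == other[:-1]

    def __hash__(self):
        # Hash only the part that takes part in equality.
        return hash(self[:-1])


# Hypothetical sample values that differ only in the last element.
x = HashableIgnoresLastElement(('1', '2', '3', 'a'))
y = HashableIgnoresLastElement(('1', '2', '3', 'b'))

same = x == y                    # equal, because the last element is ignored
same_hash = hash(x) == hash(y)   # equal objects must share a hash value

collapsed = set()
collapsed.add(x)
collapsed.add(y)                 # treated as a duplicate of x

print(same, same_hash, len(collapsed))  # True True 1
```

Note that b - a returns members of b, so the last element you see in the result is the one stored in b.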
Further reading:
- How to make an object properly hashable?
- https://docs.python.org/3/reference/datamodel.html
Here's one approach that defines a and b as lists rather than sets, since it seems to me that the most straightforward solution involves indexing b:
a = [('1', '2', '3', 'a'), ('1', '2', '4', 'a'), ('1', '2', '5', 'b')]
b = [('1', '2', '3', 'b'), ('1', '2', '4', 'b'), ('1', '2', '6', 'b')]
# rebuild both collections of tuples without their last elements
a_ = {tuple(t) for *t, _ in a}
b_ = [tuple(t) for *t, _ in b]
# keep the elements of b whose truncated tuple is not in a_
[b[ix] for ix, j in enumerate(b_) if j not in a_]
# [('1', '2', '6', 'b')]
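The same idea can be wrapped in a small helper that takes a key function, which avoids building the intermediate b_ list. This is only a sketch: difference_by_key is a hypothetical name, not a standard-library function, and it assumes the keys are hashable.

```python
def difference_by_key(left, right, key):
    """Return the items of `left` whose key does not occur among the keys of `right`."""
    seen = {key(item) for item in right}  # keys present in `right`
    return [item for item in left if key(item) not in seen]


a = [('1', '2', '3', 'a'), ('1', '2', '4', 'a'), ('1', '2', '5', 'b')]
b = [('1', '2', '3', 'b'), ('1', '2', '4', 'b'), ('1', '2', '6', 'b')]

# Compare on everything except the last element.
result = difference_by_key(b, a, key=lambda t: t[:-1])
print(result)  # [('1', '2', '6', 'b')]
```

Unlike a set difference, this keeps the order of b and any duplicates it contains.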
Sets work fine. It's your data that doesn't work right. If two values look different but should count as the same, then define a data type which behaves the way you want. Then set works on its own.
class thing:
    def __init__(self, a, b, c, d):
        self.a, self.b, self.c, self.d = a, b, c, d
    def __repr__(self):
        return str((self.a, self.b, self.c, self.d))
    def __hash__(self):
        return hash((self.a, self.b, self.c))
    def __eq__(self, other):
        return self.a == other.a and self.b == other.b and self.c == other.c
a = {thing('1', '2', '3', 'a'), thing('1', '2', '4', 'a'), thing('1', '2', '5', 'b')}
b = {thing('1', '2', '3', 'b'), thing('1', '2', '4', 'b'), thing('1', '2', '6', 'b')}
print(b - a)
# {('1', '2', '6', 'b')}
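If hand-writing __hash__ and __eq__ feels error-prone, a dataclass can generate them instead. The sketch below is a variation on this answer, not the code above: it assumes a frozen dataclass is acceptable, and the class name Thing and its str field types are chosen for illustration. Marking d with field(compare=False) leaves it out of both the generated equality and the generated hash.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Thing:
    a: str
    b: str
    c: str
    # Excluded from __eq__ and, by default, from __hash__ as well.
    d: str = field(compare=False)


a = {Thing('1', '2', '3', 'a'), Thing('1', '2', '4', 'a'), Thing('1', '2', '5', 'b')}
b = {Thing('1', '2', '3', 'b'), Thing('1', '2', '4', 'b'), Thing('1', '2', '6', 'b')}

result = b - a
print(result)  # {Thing(a='1', b='2', c='6', d='b')}
```

The generated repr shows field names, so the printed form differs from the tuple-style output above.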