Pythonic way of ignoring the last element when doing set difference

Here's how you might write your own class to override a tuple's normal hashing behaviour:

a_data = [('1', '2', '3', 'a'), ('1', '2', '4', 'a'), ('1', '2', '5', 'b')]
b_data = [('1', '2', '3', 'b'), ('1', '2', '4', 'b'), ('1', '2', '6', 'b')]

class HashableIgnoresLastElement(tuple):
    def __eq__(self, other):
        return self[:-1] == other[:-1]

    def __hash__(self):
        return hash(self[:-1])

a = set(map(HashableIgnoresLastElement, a_data))
b = set(map(HashableIgnoresLastElement, b_data))

print(b - a)

with output

{('1', '2', '6', 'b')}

To modify the way sets of tuples behave, we have to modify the way tuples are hashed.

From the Python glossary entry for "hashable":

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

So, to make set membership ignore the last element, we have to override the dunder methods __eq__ and __hash__ accordingly. This turns out not to be hard: all we have to do is slice off the last element and then delegate to the corresponding methods of a normal tuple.
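One detail worth checking is that == and != stay consistent. Because tuple supplies its own __ne__, overriding only __eq__ in a subclass appears to leave != comparing the full tuples. The following is a minimal sketch of a more defensive variant, not a replacement for the class above; the name IgnoresLastElement and the isinstance check are assumptions added for illustration.

```python
class IgnoresLastElement(tuple):
    """Hypothetical variant: ==, != and hash() all ignore the last item."""

    def __eq__(self, other):
        # Decline to compare against non-tuples instead of raising.
        if not isinstance(other, tuple):
            return NotImplemented
        # Slicing a tuple subclass yields a plain tuple, so this
        # delegates to the ordinary tuple comparison.
        return self[:-1] == other[:-1]

    def __ne__(self, other):
        # Defined explicitly so that != mirrors == rather than falling
        # back to tuple's own comparison of all elements.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Equal objects must hash equally, so hash the same slice.
        return hash(self[:-1])


x = IgnoresLastElement(('1', '2', '3', 'a'))
y = IgnoresLastElement(('1', '2', '3', 'b'))

print(x == y)              # True
print(x != y)              # False
print(hash(x) == hash(y))  # True
print(len({x, y}))         # 1
```

Sets and dictionaries only rely on __hash__ and __eq__, so the extra __ne__ matters mainly when the wrapped tuples are also compared directly elsewhere in the code.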

Further reading:

How to make an object properly hashable?
https://docs.python.org/3/reference/datamodel.html

Here's one approach that defines a and b as lists rather than sets, since it seems to me that the most straightforward solution involves indexing b:

a = [('1', '2', '3', 'a'), ('1', '2', '4', 'a'), ('1', '2', '5', 'b')]
b = [('1', '2', '3', 'b'), ('1', '2', '4', 'b'), ('1', '2', '6', 'b')]

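# strip the last element from each tuple (extended unpacking needs Python 3);
# a_ is a set for fast membership tests, b_ a list so it stays aligned with b
a_ = {tuple(t) for *t, _ in a}
b_ = [tuple(t) for *t, _ in b]

# keep each element of b whose truncated form does not appear in a_
print([b[ix] for ix, j in enumerate(b_) if j not in a_])
# [('1', '2', '6', 'b')]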
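The same idea can be wrapped in a small helper that takes a key function, which avoids the parallel list and the extended unpacking (so it should also run on Python 2.7). This is only a sketch; the name difference_by_key and its parameters are made up for illustration.

```python
def difference_by_key(left, right, key):
    """Return the items of left whose key(item) matches no item in right.

    Hypothetical helper: order and duplicates of left are preserved.
    """
    seen = {key(item) for item in right}
    return [item for item in left if key(item) not in seen]


a = [('1', '2', '3', 'a'), ('1', '2', '4', 'a'), ('1', '2', '5', 'b')]
b = [('1', '2', '3', 'b'), ('1', '2', '4', 'b'), ('1', '2', '6', 'b')]

# Compare on everything except the last element.
result = difference_by_key(b, a, key=lambda t: t[:-1])
print(result)
# [('1', '2', '6', 'b')]
```

Unlike a true set difference, this returns a list, so the result keeps the order of b.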

Sets work fine. It's your data that doesn't work right. If two items look different but should count as the same, define a data type that behaves the way you want. Then set works fine on its own.

class thing:
    def __init__(self, a, b, c, d):
        self.a, self.b, self.c, self.d = a, b, c, d

    def __repr__(self):
        return str((self.a, self.b, self.c, self.d))

    def __hash__(self):
        return hash((self.a, self.b, self.c))

    def __eq__(self, other):
        return (self.a, self.b, self.c) == (other.a, other.b, other.c)

a = {thing('1', '2', '3', 'a'), thing('1', '2', '4', 'a'), thing('1', '2', '5', 'b')}
b = {thing('1', '2', '3', 'b'), thing('1', '2', '4', 'b'), thing('1', '2', '6', 'b')}
print(b - a)

{('1', '2', '6', 'b')}
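On Python 3.7 or later (not on the 2.7 the question is tagged with), a similar type can be sketched with a dataclass, letting the standard library generate __eq__ and __hash__. The class name Thing and the str annotations are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Thing:
    a: str
    b: str
    c: str
    # compare=False leaves d out of the generated __eq__ and __hash__.
    d: str = field(compare=False)


a = {Thing('1', '2', '3', 'a'), Thing('1', '2', '4', 'a'), Thing('1', '2', '5', 'b')}
b = {Thing('1', '2', '3', 'b'), Thing('1', '2', '4', 'b'), Thing('1', '2', '6', 'b')}

diff = b - a
print(diff)
# {Thing(a='1', b='2', c='6', d='b')}
```

frozen=True is what makes the instances hashable here, and it also keeps the fields from being reassigned after the object has been placed in a set.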
