Should I override hashCode() of Collections?
- I'll have to go into all fields and respective parent classes recursively to make sure they all implement hashCode() properly, because otherwise hashCode() of MyClass might not take into consideration some values. Is this right?
That's correct. It's not as onerous as it sounds because the rule of thumb is that you only need to override hashCode() if you override equals(). You don't have to worry about classes that use the default equals(); the default hashCode() will suffice for them.
Also, for your class, you only need to hash the fields that you compare in your equals() method. If one of those fields is a unique identifier, for instance, you could get away with just checking that field in equals() and hashing it in hashCode().
All of this is predicated upon you also overriding equals(). If you haven't overridden that, don't bother with hashCode() either.
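As a rough illustration of the "unique identifier" case, here is a minimal sketch. The class Customer and its fields are hypothetical names made up for this example, and it assumes id really does identify an instance uniquely:

```java
// Hypothetical class: "id" is assumed to uniquely identify a customer.
final class Customer {
    private final long id;
    private final String displayName; // deliberately not part of equality

    Customer(long id, String displayName) {
        this.id = id;
        this.displayName = displayName;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Customer)) return false;
        // Only the identifier is compared...
        return id == ((Customer) o).id;
    }

    @Override
    public int hashCode() {
        // ...so only the identifier is hashed.
        return Long.hashCode(id);
    }
}
```

Because hashCode() uses exactly the field that equals() compares, two equal customers always land in the same hash bucket.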
- What do I do with that Collection? Can I always rely on its hashCode() method? Will it take into consideration all child values that might exist in my someInterface object?
Yes, you can rely on any collection type in the Java standard library to implement hashCode() correctly. And yes, any List or Set will take into account its contents (it will mix together the items' hash codes).
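To see this for yourself, here is a small sketch (the class and method names are made up for the example). It checks that two different List implementations with the same contents agree on hashCode(), that a Set's hash code does not depend on insertion order, and that the hash code changes when the contents change. Keep in mind that the collection can only mix the hash codes its elements report, so the elements' own hashCode() still has to be consistent with their equals().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

class CollectionHashDemo {
    // Different List implementations, same contents: equal, with equal hash codes.
    static boolean sameListHash() {
        List<String> a = new ArrayList<>(Arrays.asList("x", "y"));
        List<String> b = new LinkedList<>(Arrays.asList("x", "y"));
        return a.equals(b) && a.hashCode() == b.hashCode();
    }

    // A Set's hash code is derived from its elements, regardless of insertion order.
    static boolean sameSetHash() {
        Set<String> a = new HashSet<>(Arrays.asList("x", "y"));
        Set<String> b = new HashSet<>(Arrays.asList("y", "x"));
        return a.equals(b) && a.hashCode() == b.hashCode();
    }

    // Changing the contents changes the hash code of this particular list.
    static boolean hashChangesWithContents() {
        List<String> a = new ArrayList<>(Arrays.asList("x", "y"));
        int before = a.hashCode();
        a.add("z");
        return before != a.hashCode();
    }
}
```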
So you want to compute, from the contents of your object, a unique key that you can look up in a HashMap to check whether the "heavy" calculation that you don't want to do twice has already been done for a given deep combination of fields.
Using hashCode alone:
I believe hashCode is not the appropriate thing to use in the scenario you are describing.
hashCode should always be used in association with equals(). It's part of its contract, and it's an important part, because hashCode() returns an int, and although one may try to make hashCode() as well-distributed as possible, it is not going to be unique for every possible object of the same class, except for very specific cases (it's easy for Integer, Byte and Character, for example, because they have no more possible values than an int has).
If you want to see for yourself, try generating strings of up to 4 letters (lower and upper case), and see how many of them have identical hash codes.
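One way to run that experiment is sketched below. The class name and helper methods are made up for the example. It enumerates every string of 1 to maxLen ASCII letters and counts how many distinct hash codes come out; anything less than the total means collisions. The main method uses a maximum length of 3 to keep the run short; a length of 4 produces several million strings and needs correspondingly more memory.

```java
import java.util.HashSet;
import java.util.Set;

class StringCollisionDemo {
    private static final String LETTERS =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Returns { number of strings generated, number of distinct hash codes }.
    static long[] count(int maxLen) {
        Set<Integer> hashes = new HashSet<>();
        long total = generate("", maxLen, hashes);
        return new long[] { total, hashes.size() };
    }

    // Recursively extends the prefix by one letter at a time,
    // recording the hash code of every string it builds.
    private static long generate(String prefix, int remaining, Set<Integer> hashes) {
        if (remaining == 0) return 0;
        long total = 0;
        for (int i = 0; i < LETTERS.length(); i++) {
            String s = prefix + LETTERS.charAt(i);
            hashes.add(s.hashCode());
            total += 1 + generate(s, remaining - 1, hashes);
        }
        return total;
    }

    public static void main(String[] args) {
        long[] r = count(3);
        System.out.println(r[0] + " strings, " + r[1] + " distinct hash codes");
    }
}
```

A well-known pair you can check by hand: "Aa" and "BB" have the same hash code, since 65 * 31 + 97 and 66 * 31 + 66 are both 2112.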
HashMap therefore uses both the hashCode() and equals() methods when it looks for things in the hash table. There will be elements that have the same hashCode(), and you can only tell whether one of them is the element you are looking for by testing each of them with equals() against your object.
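A quick sketch of what that means in practice (the class name is made up for the example): "Aa" and "BB" have identical hash codes, yet a HashMap keeps them apart because it falls back on equals() within the bucket.

```java
import java.util.HashMap;
import java.util.Map;

class CollidingKeysDemo {
    static Map<String, Integer> build() {
        Map<String, Integer> m = new HashMap<>();
        // Both keys hash to the same int, so they share a bucket...
        m.put("Aa", 1);
        m.put("BB", 2);
        // ...but equals() tells them apart, so neither entry overwrites the other.
        return m;
    }
}
```

If the map had used the bare hash code as the key instead, the second put would have silently replaced the first entry.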
Using hashCode and equals together
In this approach, you use the object itself as the key in the hash map, and give it an appropriate equals method.
To implement the equals method you need to go deeply into all your fields. All of their classes must have an equals() that matches what you think of as equal for the sake of your big calculation. Special care needs to be taken when your objects implement an interface. If the calculation is based on calls to that interface, and different objects that implement the interface return the same value in those calls, then they should implement equals in a way that reflects that.
And their hashCode is supposed to match their equals: when two values are equal, their hash codes must be equal.
You then build your equals and hashCode based on all those items. You may use Objects.equals(Object, Object) and Objects.hash(Object...) to save yourself a lot of boilerplate code.
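A minimal sketch of that pattern follows. CalcInput and its three fields are hypothetical, standing in for whatever your real class holds. Objects.equals is null-safe, and Objects.hash combines the hash codes of all the arguments passed to it (the List contributes the hash of its contents).

```java
import java.util.List;
import java.util.Objects;

// Hypothetical input class for the "big calculation".
final class CalcInput {
    private final String name;
    private final int scale;
    private final List<String> tags;

    CalcInput(String name, int scale, List<String> tags) {
        this.name = name;
        this.scale = scale;
        this.tags = tags;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CalcInput)) return false;
        CalcInput other = (CalcInput) o;
        // Compare every field that influences the calculation.
        return scale == other.scale
                && Objects.equals(name, other.name)
                && Objects.equals(tags, other.tags);
    }

    @Override
    public int hashCode() {
        // Hash exactly the fields compared in equals().
        return Objects.hash(name, scale, tags);
    }
}
```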
But is this a good approach?
While you can cache the result of hashCode() in the object and re-use it without recalculation as long as you don't mutate the object, you can't do that for equals, because its result depends on the other object too. This means that the calculation of equals is going to be lengthy.
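For what it's worth, caching the hash code could look something like the sketch below. ImmutableInput is a hypothetical class, and the approach assumes the object is never mutated after construction (which is why the constructor takes a defensive copy). It uses the same "0 means not computed yet" convention that String uses internally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable wrapper around the values that feed the calculation.
final class ImmutableInput {
    private final List<String> values;
    private int cachedHash; // 0 means "not computed yet"

    ImmutableInput(List<String> values) {
        // Defensive copy, so later changes to the caller's list can't affect us.
        this.values = Collections.unmodifiableList(new ArrayList<>(values));
    }

    @Override
    public int hashCode() {
        int h = cachedHash;
        if (h == 0) {
            // Computed at most once, unless the real hash happens to be 0.
            h = values.hashCode();
            cachedHash = h;
        }
        return h;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ImmutableInput)) return false;
        ImmutableInput other = (ImmutableInput) o;
        // The cached hash gives a cheap early "no", but a "yes" still needs
        // the full comparison, which cannot be cached per object.
        return hashCode() == other.hashCode() && values.equals(other.values);
    }
}
```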
The more often the equals() method is called for each object, the more this cost adds up.
If, for example, you are going to have 30 objects in the HashMap, but 300,000 objects are going to come along and be compared to them, only for you to find that they are equal to ones already there, you'll be making 300,000 heavy comparisons.
If there are only going to be very few cases in which an object has the same hashCode as another, or falls in the same bucket in the HashMap, and so requires a comparison, then going the equals() way may work well.
If you decide to go this way, you'll need to remember:
If the object is a key in a HashMap, it should not be mutated as long as it's there. If you need to mutate it, you may need to make a deep copy of it and keep the copy in the hash map. Deep copying again requires consideration of all the objects and interfaces inside to see if they are copyable at all.
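Here is a small sketch of why that matters, using a plain List as the key (the class and method names are made up for the example). After the key is mutated, the entry is still in the map but can no longer be found through it; storing a copy avoids the problem. A shallow copy is enough here only because the elements are immutable Strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MutatedKeyDemo {
    // Mutating a key that is already in the map "loses" the entry.
    static boolean lostAfterMutation() {
        Map<List<String>, String> cache = new HashMap<>();
        List<String> key = new ArrayList<>();
        key.add("a");
        cache.put(key, "result");

        key.add("b"); // the key's hash code changes while it sits in the map

        // The entry is still there, but a lookup with the mutated key misses it.
        return cache.get(key) == null && cache.size() == 1;
    }

    // Storing a copy keeps the map's key stable.
    static boolean safeWithCopy() {
        Map<List<String>, String> cache = new HashMap<>();
        List<String> key = new ArrayList<>();
        key.add("a");
        cache.put(new ArrayList<>(key), "result"); // the map owns its own copy

        key.add("b"); // does not affect the stored key

        List<String> probe = new ArrayList<>();
        probe.add("a");
        return "result".equals(cache.get(probe));
    }
}
```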
Creating a unique key for each object
Back to your original idea, we have established that hashCode is not a good candidate for a key in a hash map. A better candidate for that would be a hash function such as MD5 or SHA-1 (or more advanced hashes, like SHA-256, but you don't need cryptographic strength in your case), where collisions are far rarer than with a mere int. You could take all the values in your class, transform them into a byte array, hash it with such a hash function, and take its hexadecimal string value as your map key.
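The hashing step could be sketched as follows, using java.security.MessageDigest. The class and method names are made up for the example, and SHA-256 is picked here only because every Java platform is required to support it; MD5 or SHA-1 would be used the same way. How you turn your fields into the byte array is up to you and is not shown here.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestKey {
    // Hashes the given bytes and returns the digest as a lowercase hex string,
    // suitable for use as a String key in a HashMap.
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Should not happen: SHA-256 is a standard algorithm name.
            throw new IllegalStateException(e);
        }
    }
}
```

The resulting string can then serve as the key of something like a Map<String, Result>, where Result stands for whatever your heavy calculation produces.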
Naturally, this is not a trivial calculation. So you need to think if it's really saving you much time over the calculation you are trying to avoid. It is probably going to be faster than repeatedly calling equals() to compare objects, as you do it only once per instance, with the values it had at the time of the "big calculation".
For a given instance, you could cache the result and not calculate it again unless you mutate the object. Or you could just calculate it again only just before doing the "big calculation".
However, you'll need the "cooperation" of all the objects you have inside your class. That is, they will all need to be reasonably convertible into a byte array in such a way that two equivalent objects produce the same bytes (including the same issue with the interface objects that I mentioned above).
You should also beware of situations in which you have, for example, two strings "AB" and "CD" which, when simply concatenated, give you the same bytes as "A" and "BCD", and then you'll end up with the same hash for two different objects.
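One common way around that ambiguity is to write each field's length before its bytes. The sketch below is only one possible encoding (a fixed separator that cannot occur in the data would be another), and the class and method names are made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class FieldEncoding {
    // Naive encoding: plain concatenation, so field boundaries are lost.
    static byte[] naive(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (String f : fields) {
            sb.append(f);
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Length-prefixed encoding: each field is preceded by its byte length,
    // so ("AB", "CD") and ("A", "BCD") produce different byte arrays.
    static byte[] lengthPrefixed(String... fields) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            for (String f : fields) {
                byte[] bytes = f.getBytes(StandardCharsets.UTF_8);
                out.writeInt(bytes.length);
                out.write(bytes);
            }
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            // Writing to an in-memory buffer is not expected to fail.
            throw new UncheckedIOException(e);
        }
    }
}
```

Feeding the length-prefixed bytes to the hash function, rather than the naive ones, keeps "AB" + "CD" and "A" + "BCD" from ending up with the same key.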