How does HashSet compare elements for equality?
Here's a clarification on a part of the answer that has been left unsaid: the element type of your HashSet<T> doesn't have to implement IEqualityComparer<T>; it just has to override Object.GetHashCode() and Object.Equals(Object obj).
Instead of this:
public class a : IEqualityComparer<a>
{
    public int GetHashCode(a obj) { /* Implementation */ }
    public bool Equals(a obj1, a obj2) { /* Implementation */ }
}
You do this:
public class a
{
    public override int GetHashCode() { /* Implementation */ }
    public override bool Equals(object obj) { /* Implementation */ }
}
It is subtle, but this tripped me up for the better part of a day while trying to get HashSet to function the way it is intended. And like others have said, HashSet<a> will end up calling a.GetHashCode() and a.Equals(obj) as necessary when working with the set.
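To make this concrete, here is a minimal sketch of a type that overrides both methods and is then deduplicated by HashSet<T>. The Point class and the OverrideDemo helper are hypothetical names invented for this illustration, and the hash formula is just one common way to combine two fields.

```csharp
using System.Collections.Generic;

// Hypothetical element type, used only for illustration.
public sealed class Point
{
    public int X { get; }
    public int Y { get; }

    public Point(int x, int y) { X = x; Y = y; }

    // Two points are equal when both coordinates match.
    public override bool Equals(object obj) =>
        obj is Point other && X == other.X && Y == other.Y;

    // Equal points must produce equal hash codes.
    public override int GetHashCode()
    {
        unchecked { return (X * 397) ^ Y; }
    }
}

public static class OverrideDemo
{
    public static int CountDistinct()
    {
        var set = new HashSet<Point>();
        set.Add(new Point(1, 2));
        set.Add(new Point(1, 2)); // rejected: equal to the first element
        set.Add(new Point(3, 4));
        return set.Count;         // 2
    }
}
```

Without the two overrides, the set would fall back to reference equality and keep all three instances.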
HashSet uses Equals and GetHashCode(). CompareTo is for ordered sets.
If you want unique objects, but you don't care about their iteration order, HashSet<T> is typically the best choice.
It uses an IEqualityComparer<T> (EqualityComparer<T>.Default unless you specify a different one on construction).
When you add an element to the set, it will find the hash code using IEqualityComparer<T>.GetHashCode, and store both the hash code and the element (after checking whether the element is already in the set, of course).
To look an element up, it will first use IEqualityComparer<T>.GetHashCode to find the hash code, then for all elements with the same hash code, it will use IEqualityComparer<T>.Equals to compare for actual equality.
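One way to observe this two-step lookup is to pass in a comparer that counts how often each method is called. The sketch below is illustrative only: CountingComparer and LookupDemo are hypothetical names, the hash function is deliberately weak so that collisions occur, and the exact number of Equals calls is an implementation detail that should not be relied on.

```csharp
using System.Collections.Generic;

// Hypothetical comparer that counts calls made by the set.
public sealed class CountingComparer : IEqualityComparer<string>
{
    public int HashCalls { get; private set; }
    public int EqualsCalls { get; private set; }

    // Deliberately weak hash: all strings of the same length collide.
    public int GetHashCode(string s) { HashCalls++; return s.Length; }

    public bool Equals(string a, string b) { EqualsCalls++; return string.Equals(a, b); }
}

public static class LookupDemo
{
    public static (bool Found, bool Missing, int HashCalls, int EqualsCalls) Run()
    {
        var comparer = new CountingComparer();

        // "ab" and "cd" share a hash code, so adding "cd" needs an Equals check.
        var set = new HashSet<string>(comparer) { "ab", "cd" };

        bool found = set.Contains("cd");    // hash first, then Equals on colliding entries
        bool missing = set.Contains("xyz"); // no stored entry has this hash code

        return (found, missing, comparer.HashCalls, comparer.EqualsCalls);
    }
}
```

Each Add and Contains call asks the comparer for a hash code; Equals is only consulted for stored elements whose hash code matches.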
That means you have two options:
- Pass a custom IEqualityComparer<T> into the constructor. This is the best option if you can't modify the T itself, or if you want a non-default equality relation (e.g. "all users with a negative user ID are considered equal"). This is almost never implemented on the type itself (i.e. Foo doesn't implement IEqualityComparer<Foo>) but in a separate type which is only used for comparisons.
- Implement equality in the type itself, by overriding GetHashCode and Equals(object). Ideally, implement IEquatable<T> in the type as well, particularly if it's a value type. These methods will be called by the default equality comparer.
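The first option can be sketched using the "negative user ID" relation mentioned above. The User type, NegativeIdsEqualComparer, and ComparerDemo are hypothetical names made up for this example; the point is only that the comparer lives in its own type and that GetHashCode agrees with Equals.

```csharp
using System.Collections.Generic;

// Hypothetical type that we assume we cannot modify.
public sealed class User
{
    public int Id { get; }
    public User(int id) { Id = id; }
}

// Separate comparer type: all users with a negative ID are considered equal.
public sealed class NegativeIdsEqualComparer : IEqualityComparer<User>
{
    public bool Equals(User x, User y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        if (x.Id < 0 && y.Id < 0) return true;
        return x.Id == y.Id;
    }

    // Must be consistent with Equals: every negative ID maps to the same hash code.
    public int GetHashCode(User user) => user.Id < 0 ? -1 : user.Id;
}

public static class ComparerDemo
{
    public static int CountDistinct()
    {
        var set = new HashSet<User>(new NegativeIdsEqualComparer())
        {
            new User(-5), new User(-9), new User(7), new User(7)
        };
        return set.Count; // 2: one "negative" user and the user with ID 7
    }
}
```

If GetHashCode returned the raw ID for negative users, equal elements could land in different buckets and the set would no longer treat them as duplicates.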
Note how none of this is in terms of an ordered comparison - which makes sense, as there are certainly situations where you can easily specify equality but not a total ordering. This is all the same as Dictionary<TKey, TValue>, basically.
If you want a set which uses ordering instead of just equality comparisons, you should use SortedSet<T> from .NET 4 - which allows you to specify an IComparer<T> instead of an IEqualityComparer<T>. This will use IComparer<T>.Compare - which will delegate to IComparable<T>.CompareTo or IComparable.CompareTo if you're using Comparer<T>.Default.
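As a rough sketch of the ordered case, the example below builds one SortedSet<T> with the default comparer and one with a custom IComparer<T>. LengthComparer and SortedDemo are hypothetical names chosen for illustration. Note that in a sorted set, two elements count as duplicates when Compare returns 0, regardless of Equals.

```csharp
using System.Collections.Generic;

// Hypothetical comparer that orders strings by length only.
public sealed class LengthComparer : IComparer<string>
{
    public int Compare(string x, string y) => x.Length.CompareTo(y.Length);
}

public static class SortedDemo
{
    public static string DefaultOrder()
    {
        // Comparer<int>.Default delegates to int's IComparable<int>.CompareTo.
        var numbers = new SortedSet<int> { 3, 1, 2 };
        return string.Join(",", numbers); // "1,2,3"
    }

    public static string ByLength()
    {
        // "kiwi" has the same length as "pear", so Compare returns 0 and it is dropped.
        var words = new SortedSet<string>(new LengthComparer()) { "pear", "fig", "kiwi", "banana" };
        return string.Join(",", words); // "fig,pear,banana"
    }
}
```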