Define: What is a HashSet?

    1. A HashSet holds a set of objects, but in a way that allows you to easily and quickly determine whether an object is already in the set or not. It does so by internally managing an array and storing the object using an index which is calculated from the hashcode of the object. Take a look here
  1. HashSet is an unordered collection containing unique elements. It has the standard collection operations Add, Remove, Contains, but since it uses a hash-based implementation, these operations are O(1). (As opposed to List for example, which is O(n) for Contains and Remove.) HashSet also provides standard set operations such as union, intersection, and symmetric difference. Take a look here

  2. There are different implementations of Sets. Some make insertion and lookup operations super fast by hashing elements. However, that means that the order in which the elements were added is lost. Other implementations preserve the added order at the cost of slower running times.

The HashSet class in C# goes for the first approach, thus not preserving the order of elements. It is much faster than a regular List. Some basic benchmarks showed that HashSet is decently faster when dealing with primary types (int, double, bool, etc.). It is a lot faster when working with class objects. So the point is that HashSet is fast.

The only catch of HashSet is that there is no access by indices. To access elements you can either use an enumerator or use the built-in function to convert the HashSet into a List and iterate through that. Take a look here


Simply said and without revealing the kitchen secrets: a set in general, is a collection that contains no duplicate elements, and whose elements are in no particular order. So, A HashSet<T> is similar to a generic List<T>, but is optimized for fast lookups (via hashtables, as the name implies) at the cost of losing order.


A HashSet has an internal structure (hash), where items can be searched and identified quickly. The downside is that iterating through a HashSet (or getting an item by index) is rather slow.

So why would someone want be able to know if an entry already exists in a set?

One situation where a HashSet is useful is in getting distinct values from a list where duplicates may exist. Once an item is added to the HashSet it is quick to determine if the item exists (Contains operator).

Other advantages of the HashSet are the Set operations: IntersectWith, IsSubsetOf, IsSupersetOf, Overlaps, SymmetricExceptWith, UnionWith.

If you are familiar with the object constraint language then you will identify these set operations. You will also see that it is one step closer to an implementation of executable UML.

Tags:

C#

Hashset