How does C++ STL unordered_map resolve collisions?
I found this answer looking for how to detect when my types are colliding, so I will post this in case that is the intent of the question.:
I believe there's some misconception about "Unique keys No two elements in the container can have equivalent keys."
look at the code below
//pseudocode
std::unordered_map<int, char> hashmap;
hashmap[5] = 'a';
hashmap[5] = 'b'; //replace 'a' with 'b', there is no collision being handled.
I think the Jerry's answer is referring to the internal system that it uses to shrink keys to appropriate array indices.
If you want collisions to be handled for your types (with buckets), you need std::unordered_multimap
and will have to iterate over
Hopefully this code can be read without the context I generated it with. it basically checks to see if any element in the bucket associated with the hash is the element I'm looking for.
//sp is std::shared_ptr
//memo is std::unordered_multimap< int, sp<AStarNode> >
//there's probably multiple issues with this code in terms of good design (like using int keys rather than unsigned)
bool AStar_Incremental::hasNodeBeenVisited(sp<AStarNode> node)
{
using UMIter = std::unordered_multimap<int, sp<AStarNode> >::iterator;
bool bAlreadyVisited = false;
//get all values for key in O(1*)
int hash = WorldGrid::hashGrid(node->location);
std::pair<UMIter, UMIter> start_end = memo.equal_range(hash); //bucket range
UMIter start = start_end.first;
UMIter end = start_end.second;
//hopefully this is implemented to be O(m) where m is the bucket size.
for(UMIter bucketIter = start; bucketIter != end; ++bucketIter)
{
sp<AStarNode> previousNode = bucketIter->second;
sf::Vector2i& previousVisit = previousNode->location;
if (previousVisit == node->location)
{
bAlreadyVisited = true;
break;
}
}
return bAlreadyVisited;
}
The standard defines a little more about this than most people seem to realize.
Specifically, the standard requires (§23.2.5/9):
The elements of an unordered associative container are organized into buckets. Keys with the same hash code appear in the same bucket.
The interface includes a bucket_count
that runs in constant time. (table 103). It also includes a bucket_size
that has to run in time linear on the size of the bucket.
That's basically describing an implementation that uses collision chaining. When you do use collision chaining, meeting all the requirements is somewhere between easy and trivial. bucket_count()
is the number of elements in your array. bucket_size()
is the number of elements in the collision chain. Getting them in constant and linear time respectively is simple and straightforward.
By contrast, if you use something like linear probing or double hashing, those requirements become all but impossible to meet. Specifically, all the items that hashed to a specific value need to land in the same bucket, and you need to be able to count those buckets in constant time.
But, if you use something like linear probing or double hashing, finding all the items that hashed to the same value means you need to hash the value, then walk through the "chain" of non-empty items in your table to find how many of those hashed to the same value. That's not linear on the number of items that hashed to the same value though--it's linear on the number of items that hashed to the same or a colliding value.
With enough extra work and a fair amount of stretching the meaning of some of the requirements almost to the breaking point, it might be barely possible to create a hash table using something other than collision chaining, and still at least sort of meet the requirements--but I'm not really certain it's possible, and it would certain involve quite a lot of extra work.
Summary: all practical implementations of std::unordered_set
(or unordered_map
) undoubtedly use collision chaining. While it might be (just barely) possible to meet the requirements using linear probing or double hashing, such an implementation seems to lose a great deal and gain nearly nothing in return.