B-Tree vs Hash Table

I think Hashmaps don't scale as well, and can be expensive when the entire map needs to be rehashed.


The time complexity of hashtables is constant only for sufficiently sized hashtables (there need to be enough buckets to hold the data). The size of a database table is not known in advance so the table must be rehashed now and then to get optimal performance out of an hashtable. The rehashing is also expensive.


Actually, it seems that MySQL uses both kind of indexes either a hash table or a b-tree according to the following link.

The difference between using a b-tree and a hash table is that the former allows you to use column comparisons in expressions that use the =, >, >=, <, <=, or BETWEEN operators, while the latter is used only for equality comparisons that use the = or <=> operators.


You can only access elements by their primary key in a hashtable. This is faster than with a tree algorithm (O(1) instead of log(n)), but you cannot select ranges (everything in between x and y). Tree algorithms support this in Log(n) whereas hash indexes can result in a full table scan O(n). Also the constant overhead of hash indexes is usually bigger (which is no factor in theta notation, but it still exists). Also tree algorithms are usually easier to maintain, grow with data, scale, etc.

Hash indexes work with pre-defined hash sizes, so you end up with some "buckets" where the objects are stored in. These objects are looped over again to really find the right one inside this partition.

So if you have small sizes you have a lot of overhead for small elements, big sizes result in further scanning.

Todays hash tables algorithms usually scale, but scaling can be inefficient.

There are indeed scalable hashing algorithms. Don't ask me how that works - its a mystery to me too. AFAIK they evolved from scalable replication where re-hashing is not easy.

Its called RUSH - Replication Under Scalable Hashing, and those algorithms are thus called RUSH algorithms.

However there may be a point where your index exceeds a tolerable size compared to your hash sizes and your entire index needs to be re-built. Usually this is not a problem, but for huge-huge-huge databases, this can take days.

The trade off for tree algorithms is small and they are suitable for almost every use case and thus are default.

However if you have a very precise use case and you know exactly what and only what is going to be needed, you can take advantage of hashing indexes.