Why Doesn't Java's TreeMap Allow an Initial Size?
Unlike `HashMap`, which re-allocates its internal array as new entries are inserted, `TreeMap` does not generally reallocate its nodes when new ones are added. The difference can be very loosely illustrated as that between an `ArrayList` and a `LinkedList`: the first re-allocates to resize, while the second does not. That is why setting the initial size of a `TreeMap` is roughly as meaningless as trying to set the initial size of a `LinkedList`.
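This is visible directly in the constructors the two classes expose (the class and method names below are mine, for illustration only):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class CapacityDemo {
    // HashMap is array-backed, so it accepts a capacity hint.
    static Map<String, Integer> presizedHashMap() {
        return new HashMap<>(1_000_000); // initial-capacity constructor exists
    }

    // TreeMap is node-based, so it offers no equivalent constructor.
    static Map<String, Integer> treeMap() {
        // new TreeMap<>(1_000_000) would not compile: no such constructor.
        return new TreeMap<>();
    }

    public static void main(String[] args) {
        Map<String, Integer> hash = presizedHashMap();
        Map<String, Integer> tree = treeMap();
        hash.put("a", 1);
        tree.put("a", 1);
        System.out.println(hash.get("a") + " " + tree.get("a")); // prints "1 1"
    }
}
```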
The speed difference is due to the different time complexity of the two containers: inserting N nodes into a `HashMap` is O(N), while for a `TreeMap` it's O(N log N), which for 1,000,000 nodes is roughly a 20x asymptotic difference (log2 of 10^6 ≈ 20). Although the difference in asymptotic complexity does not translate directly into a timing difference, because of the different constant factors dictated by the individual algorithms, it serves as a good way to decide which algorithm is going to be faster on very large inputs.
Am I wrong to assume a TreeMap's array's initial size should be able to be set?
Yes, that assumption is incorrect. A `TreeMap` doesn't have an array. A `TreeMap` uses binary nodes with two children.
If you are suggesting that the number of children in a tree node should be a parameter, then you need to figure out how that impacts search time. And I think that it turns the search time from O(log2 N) to O(log2 M * log2(N/M)), where N is the number of elements and M is the average number of children per node. (And I'm making some optimistic assumptions ...) That's not a "win".
Is there a different reason that it is so slow?
Yes. The reason that a (large) `TreeMap` is slow relative to a (large) `HashMap` under optimal circumstances is that a lookup in a balanced binary tree with N entries requires looking at roughly log2 N tree nodes. By contrast, in an optimal `HashMap` a lookup involves one hashcode calculation and looking at O(1) hash-chain nodes.
Notes:
- `TreeMap` uses a binary tree organization that gives balanced trees, so O(log2 N) is the worst-case lookup time.
- `HashMap` performance depends on the collision rate of the hash function and key space. In the worst case, where all keys end up on the same hash chain, a `HashMap` has O(N) lookup.
- In theory, `HashMap` performance becomes O(N) when you reach the maximum possible hash array size; i.e. ~2^31 entries. But if you have a `HashMap` that large, you should probably be looking at an alternative map implementation with better memory usage and garbage collection characteristics.
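To see where the "roughly log2 N tree nodes" figure comes from, here is a small sketch (the `treeComparisons` helper is mine, for illustration): for a million entries it works out to about 20 node visits.

```java
public class LookupCost {
    // Rough number of node visits for a lookup in a balanced binary tree
    // of n entries, i.e. ceil(log2 n).
    static int treeComparisons(long n) {
        if (n <= 1) {
            return 1; // a single-node tree still needs one comparison
        }
        return 64 - Long.numberOfLeadingZeros(n - 1);
    }

    public static void main(String[] args) {
        // A balanced tree of a million entries is about 20 levels deep.
        System.out.println(treeComparisons(1_000_000)); // prints 20
    }
}
```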
A `TreeMap` is always balanced. Every time you add a node to the tree, it must make sure all the nodes stay in order according to the provided comparator. You don't specify a size because the `TreeMap` is designed for a smoothly sorted group of nodes, so that you can traverse through the nodes easily.
A `HashMap` needs a sizeable amount of free space for the things you store in it. My professor always told me that it needs 5 times the amount of space of the objects you are storing in it. So specifying the size from the initial creation of the `HashMap` improves its speed. Otherwise, if more objects go into the `HashMap` than you planned for, it has to "size up": re-allocate its internal array and rehash every entry into it.
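A sketch of that presizing advice (the `capacityFor` helper is my own; `HashMap`'s actual default load factor is 0.75, which is less generous than the "5 times" rule of thumb above):

```java
import java.util.HashMap;
import java.util.Map;

public class Presizing {
    // Capacity needed so that `expected` entries never trigger a resize,
    // given HashMap's default load factor of 0.75.
    static int capacityFor(int expected) {
        return (int) Math.ceil(expected / 0.75);
    }

    public static void main(String[] args) {
        int expected = 1_000_000;
        // Presized: the backing array is allocated once, so inserting the
        // expected number of entries never forces a rehash.
        Map<Integer, Integer> presized = new HashMap<>(capacityFor(expected));
        for (int i = 0; i < expected; i++) {
            presized.put(i, i);
        }
        System.out.println(presized.size()); // prints 1000000
    }
}
```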