Why does this HashSet look sorted when printed?
You are right that s1
is sorted because it's a TreeSet
, but s
isn't really sorted. If you add a few more elements to s
, you can see something weird happening.
After adding 32, 12, 13, and 14
[0, 32, 1, 12, 77, 13, 14]
So now you see that it's somewhat ordered, but not really. The reason for that is that HashSet
has a capacity of 16 by default, and it will grow later as the need arises. So when you add an element to it, picture it taking the hashcode of that element, modulo 16 so that it fits inside the internal "list" of 16 buckets. Since the hashcode of an Integer
is the int it represents, we can pretend all it does is do (the element to be added) % 16
.
So when you added, 0, 1, and 77, it probably looked something like this internally:
Bucket Elements
0 0
1 1
2
3
4
5
...
13 77 (77 % 16 is 13, so it's placed here)
14
15
16
Then we add 32. 32 % 16 is 0, but we already have 0
in the first bucket. Luckily, to prevent collisions such as this one, HashSet
s use linked lists instead of single elements, so we just append 32 to our linked list containing 0.
Bucket Elements
0 0 -> 32
1 1
2
3
4
5
...
13 77
14
15
16
It works the same way for 12, 13 and 14 too:
Bucket Elements
0 0 -> 32
1 1
2
3
4
5
...
12 12
13 77 -> 13
14 14
15
16
And if you go through the buckets in order, printing each linked list in that bucket, you get [0, 32, 1, 12, 77, 13, 14]
See it run
This happens only by chance.
HashSets
are a special implementation of HashMap
but they still use the hashCode to place object in buckets.
The standard hashCode
for an Integer
is the int
value itself.
Specifying such low values coupled with the load factor and bucket algorithm causes them to be placed in different buckets based on that code but the buckets happen to be sequential. If you change values to something larger they will not be ordered because the algorithm only uses part of the hashCode to select the bucket, so the chance of them being sequential is reduced. This would also be the case for much larger sets of randomly distributed numbers.
Set<Integer> s = new HashSet<Integer>();
s.add(57999999);
s.add(67999999);
s.add(77999999);
System.out.println(s);
TreeSet<Integer> s1 = new TreeSet<Integer>();
s1.add(57999999);
s1.add(67999999);
s1.add(77999999);
System.out.println(s1);
On my machine runing Windows and Java 14, they print as follows:
[67999999, 77999999, 57999999]
[57999999, 67999999, 77999999]