Why does `sxhash` return a constant for all structs?

I've queried Franz support and this was their response. Presumably SBCL is doing something similar for similar reasons.

The function cl:sxhash always returns the same value for structure objects. The reason for this is because it has no extra space to store a unique hash code within it. As a result, using structures as keys is very inefficient. The excl::hash-table-stats function demonstrates this when given a hash-table with structs used as keys; the histogram becomes the worst case, because every key wants the same index.

The decision was made to keep the same behavior for structure objects, because the automatic inclusion of a hashing slot in all structure objects would have made all structs an average of one word longer. For small structs this is unacceptable for many of our users.

Instead, a user may define a struct with an extra slot, and the constructor for that struct type could store a unique value into that slot (either a random value or a value gotten by incrementing a counter each time the constructor is run). Also, create a hash generating function which accesses this hash-slot to generate its value. If the structs to be hashed are buried inside a list, then this hash function would need to know how to traverse these keys to obtain a unique value. Finally, then, build your hash-table using the documented :hash-function argument to make-hash-table (still using the equal test argument), to create a hash-table which will be well-distributed.

Alternatively, and if you can guarantee that none of the slots in your structures will be changed after they are used as keys in the hash-table, you can use the equalp test function in your make-hash-table call, rather than equal. If you do, however, make sure that these struct objects don't change, because then they may not be found in the hash-table.

There are two important properties of sxhash: If (equal x y) then (= (sxhash x) (sxhash y)) and the value returned by sxhash is always the same for any object (even between lisp images).

Now structures are equal if only if they are eq (i.e. They have the same address) but sxhash cannot simply return the address (or some hash thereof) of the structure because the address can change due to garbage collection. When designing a lisp implementation one must therefore choose whether to have sxhash be the same for every structure or to store some identity in every structure which does not change when the garbage collector moves the structure and so can be used to sxhash the object. Most implementations (including Franz and sbcl) consider adding such a value either a waste of space or useless if only a few spare bits are given to it.

This tradeoff will ultimately only affect a user attempt at implementing hash tables as the implementation's own hash tables can use the address of objects and notify the garbage collector of this so they can rehash when the object moves (I don't know whether or not any implementations do do this). Some implementations (including sbcl) allow you to customise the built-in hash tables with your own comparison/hashing operations. Perhaps if you implemented hashing yourself you could add an extra field to structures for this.

I believe that the result returned by sxhash in sbcl is determined by hashing the name of the type of the structure.

Why does `sxhash` return a constant for all structs?

Tags:

Lisp

Struct

Common Lisp

Related

Recent Posts