Memory footprint of Haskell data types

The ghc-datasize package provides the recursiveSize function to calculate the size of a GHC object. However...

A garbage collection is performed before the size is calculated, because the garbage collector would make heap walks difficult.

...so it wouldn't be practical to call this often!

Also see How to find out GHC's memory representations of data types? and How can I determine size of a type in Haskell?.


(The following applies to GHC, other compilers may use different storage conventions)

Rule of thumb: a constructor costs one word for a header, and one word for each field. Exception: a constructor with no fields (like Nothing or True) takes no space, because GHC creates a single instance of these constructors and shares it amongst all uses.

A word is 4 bytes on a 32-bit machine, and 8 bytes on a 64-bit machine.

So e.g.

data Uno = Uno a
data Due = Due a b

an Uno takes 2 words, and a Due takes 3.

The Int type is defined as

data Int = I# Int#

now, Int# takes one word, so Int takes 2 in total. Most unboxed types take one word, the exceptions being Int64#, Word64#, and Double# (on a 32-bit machine) which take 2. GHC actually has a cache of small values of type Int and Char, so in many cases these take no heap space at all. A String only requires space for the list cells, unless you use Chars > 255.

An Int8 has identical representation to Int. Integer is defined like this:

data Integer
  = S# Int#                            -- small integers
  | J# Int# ByteArray#                 -- large integers

so a small Integer (S#) takes 2 words, but a large integer takes a variable amount of space depending on its value. A ByteArray# takes 2 words (header + size) plus space for the array itself.

Note that a constructor defined with newtype is free. newtype is purely a compile-time idea, and it takes up no space and costs no instructions at run time.

More details in The Layout of Heap Objects in the GHC Commentary.