Why does GHC make fix so confounding?

The difference is in sharing vs copying.1

fix1 f = x where x = f x    -- more visually apparent way to write the same thing

fix2 f = f (fix2 f)

If we substitute the definition into itself, both are reduced as the same infinite application chain f (f (f (f (f ...)))). But the first definition uses explicit naming; in Haskell (as in most other languages) sharing is enabled by the ability to name things: one name is more or less guaranteed to refer to one "entity" (here, x). The 2nd definition does not guarantee any sharing - the result of a call fix2 f is substituted into the expression, so it might as well be substituted as a value.

But a given compiler could in theory be smart about it and use sharing in the second case as well.

The related issue is "Y combinator". In untyped lambda calculus where there is no naming constructs (and thus no self-reference), Y combinator emulates self-reference by arranging for the definition to be copied, so referring to the copy of self becomes possible. But in implementations which use environment model to allow for named entities in a language, direct reference by name becomes possible.

To see a more drastic difference between the two definitions, compare

fibs1 = fix1 ( (0:) . (1:) . g ) where g (a:t@(b:_)) = (a+b):g t
fibs2 = fix2 ( (0:) . (1:) . g ) where g (a:t@(b:_)) = (a+b):g t

See also:

  • In Scheme, how do you use lambda to create a recursive function?
  • Y combinator discussion in "The Little Schemer"
  • Can fold be used to create infinite lists?

(especially try to work out the last two definitions in the last link above).


1 Working from the definitions, for your example fix (\g x -> let x2 = x+1 in x : g x2) we get

fix1 (\g x -> let x2 = x+1 in x : g x2)
 = fix1 (\g x -> x : g (x+1))
 = fix1 f where {f = \g x -> x : g (x+1)}
 = fix1 f where {f g x = x : g (x+1)}
 = x      where {x = f x ; f g x = x : g (x+1)}
 = g      where {g = f g ; f g x = x : g (x+1)}   -- both g in {g = f g} are the same g
 = g      where {g = \x -> x : g (x+1)}           -- and so, here as well
 = g      where {g x = x : g (x+1)}

and thus a proper recursive definition for g is actually created. (in the above, we write ....x.... where {x = ...} for let {x = ...} in ....x...., for legibility).

But the second derivation proceeds with a crucial distinction of substituting a value back, not a name, as

fix2 (\g x -> x : g (x+1))
 = fix2 f             where {f g x = x : g (x+1)}
 = f (fix2 f)         where {f g x = x : g (x+1)}
 = (\x-> x : g (x+1)) where {g = fix2 f ; f g x = x : g (x+1)}
 = h                  where {h   x = x : g (x+1) ; g = fix2 f   ; f g x = x : g (x+1)}

so the actual call will proceed as e.g.

take 3 $ fix2 (\g x -> x : g (x+1)) 10
 = take 3 (h 10)      where {h   x = x : g (x+1) ; g = fix2 f   ; f g x = x : g (x+1)}
 = take 3 (x:g (x+1)) where {x = 10 ;              g = fix2 f   ; f g x = x : g (x+1)}
 = x:take 2 (g x2)    where {x2 = x+1 ; x = 10 ;   g = fix2 f   ; f g x = x : g (x+1)}
 = x:take 2 (g x2)    where {x2 = x+1 ; x = 10 ; g = f (fix2 f) ; f g x = x : g (x+1)}
 = x:take 2 (x2 : g2 (x2+1))   where {             g2 = fix2 f  ;
                             x2 = x+1 ; x = 10 ;                  f g x = x : g (x+1)}
 = ......

and we see that a new binding (for g2) is established here, instead of the previous one (for g) being reused as with the fix1 definition.


How does x ever come to mean fix x in the first definition?

fix f = let x = f x in x

Let bindings in Haskell are recursive

First of all, realize that Haskell allows recursive let bindings. What Haskell calls "let", some other languages call "letrec". This feels pretty normal for function definitions. For example:

ghci> let fac n = if n == 0 then 1 else n * fac (n - 1) in fac 5
120

But it can seem pretty weird for value definitions. Nevertheless, values can be recursively defined, due to Haskell's non-strictness.

ghci> take 5 (let ones = 1 : ones in ones)
[1,1,1,1,1]

See A gentle introduction to Haskell sections 3.3 and 3.4 for more elaboration on Haskell's laziness.

Thunks in GHC

In GHC, an as-yet-unevaluated expression is wrapped up in a "thunk": a promise to perform the computation. Thunks are only evaluated when they absolutely must be. Suppose we want to fix someFunction. According to the definition of fix, that's

let x = someFunction x in x

Now, what GHC sees is something like this.

let x = MAKE A THUNK in x

So it happily makes a thunk for you and moves right along until you demand to know what x actually is.

Sample evaluation

That thunk's expression just happens to refer to itself. Let's take the ones example and rewrite it to use fix.

ghci> take 5 (let ones recur = 1 : recur in fix ones)
[1,1,1,1,1]

So what will that thunk look like?
We can inline ones as the anonymous function \recur -> 1 : recur for a clearer demonstration.

take 5 (fix (\recur -> 1 : recur))

-- expand definition of fix
take 5 (let x = (\recur -> 1 : recur) x in x)

Now then, what is x? Well, even though we're not quite sure what x is, we can still go through with the function application:

take 5 (let x = 1 : x in x)

Hey look, we're back at the definition we had before.

take 5 (let ones = 1 : ones in ones)

So if you believe you understand how that one works, then you have a good feel of how fix works.


Is there any advantage to using the first definition over the second?

Yes. The problem is that the second version can cause a space leak, even with optimizations. See GHC trac ticket #5205, for a similar problem with the definition of forever. This is why I mentioned thunks: because let x = f x in x allocates only one thunk: the x thunk.


It's easy to see how this definition works by applying equational reasoning.

fix :: (a -> a) -> a
fix f = let x = f x in x

What will x evaluate to when we try to evaluate fix f? It's defined as f x, so fix f = f x. But what is x here? It's f x, just as before. So you get fix f = f x = f (f x). Reasoning in this way you get an infinite chain of applications of f: fix f = f (f (f (f ...))).

Now, substituting (\f x -> let x' = x+1 in x:f x') for f you get

fix (\f x -> let x' = x+1 in x:f x')
    = (\f x -> let x' = x+1 in x:f x') (f ...)
    = (\x -> let x' = x+1 in x:((f ...) x'))
    = (\x -> x:((f ...) x + 1))
    = (\x -> x:((\x -> let x' = x+1 in x:(f ...) x') x + 1))
    = (\x -> x:((\x -> x:(f ...) x + 1) x + 1))
    = (\x -> x:(x + 1):((f ...) x + 1))
    = ...

Edit: Regarding your second question, @is7s pointed out in the comments that the first definition is preferable because it is more efficient.

To find out why, let's look at the Core for fix1 (:1) !! 10^8:

a_r1Ko :: Type.Integer    
a_r1Ko = __integer 1

main_x :: [Type.Integer]   
main_x =
  : @ Type.Integer a_r1Ko main_x

main3 :: Type.Integer
main3 =
  !!_sub @ Type.Integer main_x 100000000

As you can see, after the transformations fix1 (1:) essentially became main_x = 1 : main_x. Note how this definition refers to itself - this is what "tying the knot" means. This self-reference is represented as a simple pointer indirection at runtime:

fix1

Now let's look at fix2 (1:) !! 100000000:

main6 :: Type.Integer
main6 = __integer 1

main5
  :: [Type.Integer] -> [Type.Integer]
main5 = : @ Type.Integer main6

main4 :: [Type.Integer]
main4 = fix2 @ [Type.Integer] main5

main3 :: Type.Integer
main3 =
  !!_sub @ Type.Integer main4 100000000

Here the fix2 application is actually preserved:

fix2

The result is that the second program needs to do allocation for each element of the list (but since the list is immediately consumed, the program still effectively runs in constant space):

$ ./Test2 +RTS -s
   2,400,047,200 bytes allocated in the heap
         133,012 bytes copied during GC
          27,040 bytes maximum residency (1 sample(s))
          17,688 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)
 [...]

Compare that to the behaviour of the first program:

$ ./Test1 +RTS -s          
          47,168 bytes allocated in the heap
           1,756 bytes copied during GC
          42,632 bytes maximum residency (1 sample(s))
          18,808 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)
[...]