Deriving the Y-Combinator

In this post I'll derive the Y-combinator and explain all the steps taken. The key to understanding the Y-combinator is knowing precisely what it does. I am sure many people never understand Y-combinator because of this very reason, or because the introduction is too lengthy to comprehend. Therefore I'll give the shortest possible explanation of Y-combinator before I derive it so that you know what the end result should be.

The Y-combinator allows an anonymous function to call itself, that is, it allows anonymous function recursion. Since anonymous functions don't have names, they can't refer to themselves easily. The Y-combinator is a clever construct helps them to refer to themselves. That's it. That's all the Y-combinator does. Remember that.

I'll use the Scheme programming language to derive the Y-combinator. My derivation is based on the one in "The Little Schemer" book. I took the examples from the book, filled in the missing steps and explained the steps in more details.

Here it is.

Suppose we want to write the length function, which, given a list, returns the number of elements in it. It's really easy if we can give the function a name:

(define length
  (lambda (list)
    (cond
      ((null? list) 0)
      (else
        (add1 (length (cdr list)))))))

But now suppose that we can't give names - we can only use anonymous functions:

(lambda (list)
  (cond
    ((null? list) 0)
    (else
      (add1 (??? (cdr list)))))))

Suddenly there is no way this anonymous function can refer to itself. What do we put in ???? We can't refer to the name length anymore because it's an anonymous function, which doesn't have any name. One thing we can try is to put the function itself in the place of ???:

(lambda (list)
  (cond
    ((null? list) 0)
    (else
      (add1 (

  (lambda (list)                   ; the
    (cond                          ; 
      ((null? list) 0)             ; function
      (else                        ;
       (add1 (??? (cdr list))))))  ; itself

  (cdr list))))))

This is not much better, we are still left with ???. But there is a bright side to it, this function can determine lengths of lists with one element and no elements. Let's try inserting the same anonymous function in place of ??? again:

(lambda (list)
  (cond
    ((null? list) 0)
    (else
      (add1 (
  (lambda (list) 
    (cond 
      ((null? list) 0) 
      (else 
       (add1 (
    (lambda (list)
      (cond
        ((null? list) 0)
        (else
         (add1 (??? (cdr list))))))
    (cdr list))))))  
  (cdr list))))))

Now this function can determine lengths of lists with 0, 1 and 2 elements. If we continue this way, we can construct the length function for lists with a certain number of elements. But this is not what we want, we the real length function that works for lists with any number of elements.

As we all know, repeating code is not a good thing. Let's try to factor out the repetitions and rewrite the original anonymous function slightly. Instead of leaving ??? in the code, let's pass it to an anonymous function via an argument called length.

((lambda (length)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (length (cdr list)))))))
 ???)

Notice that this code invokes the first lambda (length) function and passes it ??? as the length argument. This function in turn returns the original anonymous function:

(lambda (list)
  (cond
    ((null? list) 0)
    (else
     (add1 (??? (cdr list))))))

Now let's try constructing the previous two length functions that work for lists with a maximum of one and two elements. First the function for lists with a maximum of one element:

((lambda (f)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (f (cdr list)))))))
 ((lambda (g)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (g (cdr list)))))))
 ???))

Here ??? first gets passed to lambda (g), which returns the original anonymous function, which then gets passed to lambda (f), which in turn returns the function that works for lists with one element:

(lambda (list)
  (cond
    ((null? list) 0)
    (else
      (add1 (
  (lambda (list)
    (cond
      ((null? list) 0)
      (else
       (add1 (??? (cdr list))))))
  (cdr list))))))

Similarly, the following code returns a function that works for lists with a maximum of two elements:

((lambda (f)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (f (cdr list)))))))
 ((lambda (g)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (g (cdr list)))))))
 ((lambda (h)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (h (cdr list)))))))
 ???)))

Since the argument names f, g,h in (lambda (f) ...), (lambda (g) ...), (lambda (h) ...) are independent, we can rename all of them to length, to make it look more similar to the length function:

((lambda (length)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (length (cdr list)))))))
 ((lambda (length)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (length (cdr list)))))))
 ((lambda (length)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (length (cdr list)))))))
 ???)))

We are still repeating code. The obvious thing we can do about it is create an anonymous function called mk-length (make length) that creates this code for us:

((lambda (mk-length)
   (mk-length ???))
 (lambda (length)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (length (cdr list))))))))

This is pretty tricky. Observe how (lambda (mk-length) ...) gets the (lambda (length) ...) function passed as the mk-length argument, which in turn accepts ??? as an argument and returns our original anonymous function:

(lambda (list)
  (cond
    ((null? list) 0)
    (else
     (add1 (??? (cdr list))))))

Now let's try constructing length functions for lists of length one and two. For the list of length one, we just need to apply mk-length on itself:

((lambda (mk-length)
   (<span style="color:green">mk-length</span></code>
     (<span style="color:blue">mk-length</span> ???)))
 (lambda (length)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (length (cdr list))))))))

Let's go through this code. First (lambda (length) ...) gets passed to the lambda (mk-length) function as the mk-length argument. Then it applies the result of (mk-length ???) (which is the original anonymous function) to the mk-length. This produces our well known function that works on lists with one or none elements:

(lambda (list)
  (cond
    ((null? list) 0)
    (else
      (add1 (
  (lambda (list)
    (cond
      ((null? list) 0)
      (else
       (add1 (??? (cdr list))))))
  (cdr list))))))

Similarly, by adding another call to mk-length, we get the function that works for lists with two or less elements:

((lambda (mk-length)
   (mk-length
     (mk-length
       (mk-length ???)))
 (lambda (length)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (length (cdr list))))))))

Now, since we don't know what ??? is, let's just call it mk-length and see what happens:

((lambda (mk-length)
   (mk-length mk-length))
 (lambda (length)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (length (cdr list))))))))

Nothing bad actually happens, we still get the same original anonymous function, except in place of ??? we have the function:

(lambda (length)
  (lambda (list)
    (cond
      ((null? list) 0)
      (else
       (add1 (length (cdr list)))))))

Notice also that argument names mk-length and length in lambda (mk-length) and lambda (length) are independent. Therefore we can rename length to mk-length to remind that the first argument to mk-length is mk-length:

((lambda (mk-length)
   (mk-length mk-length))
 (lambda (mk-length)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (<span style="color:green">mk-length</span> (cdr list))))))))

And now comes the key trick. We replace mk-length with a self-application (mk-length mk-length):

((lambda (mk-length)
   (mk-length mk-length))
 (lambda (mk-length)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 ((mk-length mk-length) (cdr list))))))))

If you try this code out, you'll see that it returns the length of a list for any list. Here is an example of applying it to a list of 10 elements:

(((lambda (mk-length)
   (mk-length mk-length))
 (lambda (mk-length)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 ((mk-length mk-length) (cdr list))))))))
 '(a b c d e f g h i j))

The function works because it keeps adding recursive uses by passing mk-length to itself, just as it is about to expire. This is not yet the Y-combinator, but we have successfully managed to recursively call an anonymous function. Now we need to massage the code a bit to separate the anonymous function from the self-applicative code.

The first step is to move the self-applicative code out as much as possible:

((lambda (mk-length)
   (mk-length mk-length))
 (lambda (mk-length)
   <strong>((lambda (length)
      (lambda (list)
        (cond
          ((null? list) 0)
          (else
           (add1 (length (cdr list)))))))</strong>
   (lambda (x)
     ((mk-length mk-length) x)))))

Now the code in bold looks very much like the length function we need! Let's move it outside as well:

((lambda (le)
   ((lambda (mk-length)
      (mk-length mk-length))
    (lambda (mk-length)
      (le (lambda (x)
            ((mk-length mk-length) x))))))
 <strong>(lambda (length)
    (lambda (list)
      (cond
        ((null? list) 0)
        (else
         (add1 (length (cdr list))))))))</strong>

There we have it! The anonymous length function has been separated from the self-applicative call. The code in bold is the length function, and the code not-in-bold is the Y-combinator!

Now let's rename mk-length to f and join the lines to make it more compact:

((lambda (le)
   ((lambda (f) (f f))
    (lambda (f)
      (le (lambda (x) ((f f) x))))))
 (lambda (length)
   (lambda (list)
      (cond
        ((null? list) 0)
        (else
         (add1 (length (cdr list))))))))

And we are done. We can now write down the definition of Y-combinator:

(define Y
  (lambda (le)
    ((lambda (f) (f f))
     (lambda (f)
       (le (lambda (x) ((f f) x)))))))

And we can apply it to our anonymous length function:

((Y (lambda (length)
     (lambda (list)
       (cond
         ((null? list) 0)
         (else
          (add1 (length (cdr list))))))))
 '(a b c d e f g h i j))

; ==> 10

This is one of many derivations of the applicative-order Y-combinator. See the publication "The Why of Y" for another derivation.