Y-Combinator

In this post I'll derive the Y-combinator and explain all the steps taken. The key to understanding the Y-combinator is knowing precisely what it does. I am sure many people never understand Y-combinator because of this very reason, or because the introduction is too lengthy to comprehend. Therefore I'll give the shortest possible explanation of Y-combinator before I derive it so that you know what the end result should be.

The Y-combinator allows an anonymous function to call itself, that is, it allows anonymous function recursion. Since anonymous functions don't have names, they can't refer to themselves easily. The Y-combinator is a clever construct helps them to refer to themselves. That's it. That's all the Y-combinator does. Remember that.

I'll use the Scheme programming language to derive the Y-combinator. My derivation is based on the one in "The Little Schemer" book. I took the examples from the book, filled in the missing steps and explained the steps in more details.

Here it is.

Suppose we want to write the length function, which, given a list, returns the number of elements in it. It's really easy if we can give the function a name:

(define length
  (lambda (list)
    (cond
      ((null? list) 0)
      (else
        (add1 (length (cdr list)))))))

But now suppose that we can't give names - we can only use anonymous functions:

(lambda (list)
  (cond
    ((null? list) 0)
    (else
      (add1 (??? (cdr list)))))))

Suddenly there is no way this anonymous function can refer to itself. What do we put in ???? We can't refer to the name length anymore because it's an anonymous function, which doesn't have any name. One thing we can try is to put the function itself in the place of ???:

(lambda (list)
  (cond
    ((null? list) 0)
    (else
      (add1 (
             
  (lambda (list)                   ; the
    (cond                          ; 
      ((null? list) 0)             ; function
      (else                        ;
       (add1 (??? (cdr list))))))  ; itself
  
  (cdr list))))))

This is not much better, we are still left with ???. But there is a bright side to it, this function can determine lengths of lists with one element and no elements. Let's try inserting the same anonymous function in place of ??? again:

(lambda (list)
  (cond
    ((null? list) 0)
    (else
      (add1 (
  (lambda (list) 
    (cond 
      ((null? list) 0) 
      (else 
       (add1 (
    (lambda (list)
      (cond
        ((null? list) 0)
        (else
         (add1 (??? (cdr list))))))
    (cdr list))))))  
  (cdr list))))))

Now this function can determine lengths of lists with 0, 1 and 2 elements. If we continue this way, we can construct the length function for lists with a certain number of elements. But this is not what we want, we the real length function that works for lists with any number of elements.

As we all know, repeating code is not a good thing. Let's try to factor out the repetitions and rewrite the original anonymous function slightly. Instead of leaving ??? in the code, let's pass it to an anonymous function via an argument called length.

((lambda (length)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (length (cdr list)))))))
 ???)

Notice that this code invokes the first lambda (length) function and passes it ??? as the length argument. This function in turn returns the original anonymous function:

(lambda (list)
  (cond
    ((null? list) 0)
    (else
     (add1 (??? (cdr list))))))

Now let's try constructing the previous two length functions that work for lists with a maximum of one and two elements. First the function for lists with a maximum of one element:

((lambda (f)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (f (cdr list)))))))
 ((lambda (g)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (g (cdr list)))))))
 ???))

Here ??? first gets passed to lambda (g), which returns the original anonymous function, which then gets passed to lambda (f), which in turn returns the function that works for lists with one element:

(lambda (list)
  (cond
    ((null? list) 0)
    (else
      (add1 (
  (lambda (list)
    (cond
      ((null? list) 0)
      (else
       (add1 (??? (cdr list))))))
  (cdr list))))))

Similarly, the following code returns a function that works for lists with a maximum of two elements:

((lambda (f)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (f (cdr list)))))))
 ((lambda (g)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (g (cdr list)))))))
 ((lambda (h)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (h (cdr list)))))))
 ???)))

Since the argument names f, g,h in (lambda (f) ...), (lambda (g) ...), (lambda (h) ...) are independent, we can rename all of them to length, to make it look more similar to the length function:

((lambda (length)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (length (cdr list)))))))
 ((lambda (length)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (length (cdr list)))))))
 ((lambda (length)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (length (cdr list)))))))
 ???)))

We are still repeating code. The obvious thing we can do about it is create an anonymous function called mk-length (make length) that creates this code for us:

((lambda (mk-length)
   (mk-length ???))
 (lambda (length)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (length (cdr list))))))))

This is pretty tricky. Observe how (lambda (mk-length) ...) gets the (lambda (length) ...) function passed as the mk-length argument, which in turn accepts ??? as an argument and returns our original anonymous function:

(lambda (list)
  (cond
    ((null? list) 0)
    (else
     (add1 (??? (cdr list))))))

Now let's try constructing length functions for lists of length one and two. For the list of length one, we just need to apply mk-length on itself:

((lambda (mk-length)
   (mk-length
     (mk-length ???)))
 (lambda (length)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (length (cdr list))))))))

Let's go through this code. First (lambda (length) ...) gets passed to the lambda (mk-length) function as the mk-length argument. Then it applies the result of (mk-length ???) (which is the original anonymous function) to the mk-length. This produces our well known function that works on lists with one or none elements:

(lambda (list)
  (cond
    ((null? list) 0)
    (else
      (add1 (
  (lambda (list)
    (cond
      ((null? list) 0)
      (else
       (add1 (??? (cdr list))))))
  (cdr list))))))

Similarly, by adding another call to mk-length, we get the function that works for lists with two or less elements:

((lambda (mk-length)
   (mk-length
     (mk-length
       (mk-length ???)))
 (lambda (length)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (length (cdr list))))))))

Now, since we don't know what ??? is, let's just call it mk-length and see what happens:

((lambda (mk-length)
   (mk-length mk-length))
 (lambda (length)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (length (cdr list))))))))

Nothing bad actually happens, we still get the same original anonymous function, except in place of ??? we have the function:

(lambda (length)
  (lambda (list)
    (cond
      ((null? list) 0)
      (else
       (add1 (length (cdr list)))))))

Notice also that argument names mk-length and length in lambda (mk-length) and lambda (length) are independent. Therefore we can rename length to mk-length to remind that the first argument to mk-length is mk-length:

((lambda (mk-length)
   (mk-length mk-length))
 (lambda (mk-length)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 (mk-length (cdr list))))))))

And now comes the key trick. We replace mk-length with a self-application (mk-length mk-length):

((lambda (mk-length)
   (mk-length mk-length))
 (lambda (mk-length)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 ((mk-length mk-length) (cdr list))))))))

If you try this code out, you'll see that it returns the length of a list for any list. Here is an example of applying it to a list of 10 elements:

(((lambda (mk-length)
   (mk-length mk-length))
 (lambda (mk-length)
   (lambda (list)
     (cond
       ((null? list) 0)
       (else
        (add1 ((mk-length mk-length) (cdr list))))))))
 '(a b c d e f g h i j))

The function works because it keeps adding recursive uses by passing mk-length to itself, just as it is about to expire. This is not yet the Y-combinator, but we have successfully managed to recursively call an anonymous function. Now we need to massage the code a bit to separate the anonymous function from the self-applicative code.

The first step is to move the self-applicative code out as much as possible:

((lambda (mk-length)
   (mk-length mk-length))
 (lambda (mk-length)
   ((lambda (length)
      (lambda (list)
        (cond
          ((null? list) 0)
          (else
           (add1 (length (cdr list)))))))
   (lambda (x)
     ((mk-length mk-length) x)))))

Now the code in bold looks very much like the length function we need! Let's move it outside as well:

((lambda (le)
   ((lambda (mk-length)
      (mk-length mk-length))
    (lambda (mk-length)
      (le (lambda (x)
            ((mk-length mk-length) x))))))
 (lambda (length)
    (lambda (list)
      (cond
        ((null? list) 0)
        (else
         (add1 (length (cdr list))))))))

There we have it! The anonymous length function has been separated from the self-applicative call. The code in bold is the length function, and the code not-in-bold is the Y-combinator!

Now let's rename mk-length to f and join the lines to make it more compact:

((lambda (le)
   ((lambda (f) (f f))
    (lambda (f)
      (le (lambda (x) ((f f) x))))))
 (lambda (length)
   (lambda (list)
      (cond
        ((null? list) 0)
        (else
         (add1 (length (cdr list))))))))

And we are done. We can now write down the definition of Y-combinator:

(define Y
  (lambda (le)
    ((lambda (f) (f f))
     (lambda (f)
       (le (lambda (x) ((f f) x)))))))

And we can apply it to our anonymous length function:

((Y (lambda (length)
     (lambda (list)
       (cond
         ((null? list) 0)
         (else
          (add1 (length (cdr list))))))))
 '(a b c d e f g h i j))

; ==> 10

This is one of many derivations of the applicative-order Y-combinator. See the publication "The Why of Y" for another derivation.

Comments

February 19, 2010, 14:46

Great post!

I think this approach to deriving the Y combinator really appeals to the hackers. I'll share with my students.

I teach a compilers class, and I wrote up an alternate derivation for my students that starts with fixed points; you might be interested:

Deriving a memoizing Y combinator for JavaScript from fixed point theory.

Matt Permalink
February 19, 2010, 15:36

s/But this is now what we want, we a real length function that works for lists with any number of elements./But this is not what we want, we want a real length function that works for lists with any number of elements./

February 19, 2010, 16:38

most programming languages that support anonymous functions can also name the anonymous function(might be an extra line of code but that's fine).

When is a situation where we "cannot give names" to an anonymous function?

bluestorm Permalink
February 19, 2010, 18:11

tommy > there is a quirk in the Erlang syntax, wich makes that you can't have a local declaration of a recursive function (inside an expression) : actually, there is no syntax for named function declaration inside expressions, so you have to use anonymous functions. Hence, you can't declare a local recursive function without such a fixpoint operator. This is actually a bad idea, as you end up reimplementing a part of the runtime system in a probably slow and certainly cryptic way, but that was just for the example.

Peteris > this combinator is just one among the different fixpoint combinators available. There is for example the Turing Fixpoint combinator, wich is also derivable. Do you know where in your derivation you made a decision that lead you to this combinator instead of another ? I have a gut feeling that the derivation is not "canonical" and that one could probably use a slightly different succession of software-engineering-and-common-sense steps that would naturally lead to another combinator.

Also, I'm not sure I agree with your presentation of the Y-combinator. You describe it as something about recursion and anonymous functions. I believe it is much more general than that : fixpoint combinators allow you to control recursion. It is all about parametrizing the function you call over at your recursion sites, instead of hardcoding recursion using the language syntax (or lack of). From a software engineering point of view, you could say that it decouples two aspects of your function : the implementation (with domain-specific knowledge and all) and the "looping" / "tying the knot" process, wich is deferred to the fixpoint combinator.

You can do quite interesting (but somewhat hacky) things once you've decoupled those two separate concerns. Matt mentioned memoization (automagic memoization of inner recursive calls, wich have been exposed by the decoupling process), there are quite a few folk examples, see for example the following article That About Wraps it Up.

It's a funny hack, for example, to write in your favorite language a function that take any derecursified/decoupled function and, assuming the original function was tail-recursive, produce a tail-recursive implementation of it, using an explicit call stack, wich doesn't depend on the implementation of tail recursion in the compiler/runtime.

dm3 Permalink
February 19, 2010, 18:54

most programming languages that support anonymous functions can also name the anonymous function(might be an extra line of code but that’s fine).

When is a situation where we “cannot give names” to an anonymous function?

When we need to explain how the Y combinator works.

February 19, 2010, 18:59

tommy i don't think you understand the point of this.

you need to realize that there are many applications of the Y combinator, many times the language you THINK "names" anon functions doesn't actually name them.

my only critique on this post is that you don't talk about omega, which to me is the easiest way to understand why Y works

bruce Permalink
February 19, 2010, 19:03

This is nice! Just like the http://www.ece.uc.edu/~franco/C511/html/Scheme/ycomb.html explanation from some time in 1999. Good to see a refresh!!

February 19, 2010, 19:20

Matt, thanks for noticing the typo. Gonna fix it.

Tommy, I can only think of lambda calculus that, if I understand correctly, uses Y-combinator to recurse.

Bluestorm, I am also familiar with Y! (Y-bang) combinator (from The Seasoned Schemer book). But I am not exactly sure how it's different from Y-combinator. I am still thinking about it. The derivation is similar to Y-combinator's but it's a bit different. The book says that it works the same as Y-combinator but then gives an example which Y-combinator processes but Y! combinator goes into an infinite loop. And thanks for the in-depth comment!

I wrote the Y! out when I was reading that chapter. I posted the code here: (define Y-bang ...) and example where Y-bang fails, but Y works.

Bruce, nice explanation! Thanks for sharing!

Paulo Permalink
February 20, 2010, 06:58

s/And now comes they they key trick/And now comes the key trick/

February 20, 2010, 19:50

Paulo, fixed. :)

Roman Permalink
February 20, 2010, 23:12

You didn't close a span:

(which is the original anonymous function) to the <code><span style="color:green">mk-length</code></code>.

Also, this: "Now the code in bold looks very much like the length function we need!" refers to nonexisting boldness.

Aside from that, an interesting read.

February 20, 2010, 23:35

Roman, closed now, thanks for noticing! Also put the code in blog there. :)

February 21, 2010, 14:13

The problem with the Y-Combinator is, that it cannot be typed. We have Y = \s.(\x.s(x x))(\x.s(x x)), such that Ys reduces to (\x.s(x x))(\x.s (x x)) which reduces to s((\x.s(x x))(\x.s(x x))) = s Ys, thus, having Ys reducing to all sssss...sYs, which means, Ys is not strongly normalizing - and since all typed lambda terms are strongly normalizing, it cannot be typed (actually, in the same way, no fixpoint combinator can be typed).

Anyway, even strongly typed programming languages like Haskell have recursion, even though mostly through calling the function by its name. In the theory of typed lambda terms, you therefore mostly axiomatically define recursion operators, like R:(a->a)->(a->bool)->a, which applies the first argument to a until (a->bool) gets true (there are a lot of other possibilities, though). Anyway, these cannot be expressed as terms, they must be defined axiomatically.

February 21, 2010, 20:08

dasuxullebt, thanks for the insightful comment.

March 18, 2012, 07:33

Wow, that's a really great post! After reading The Little Schemer I had problems with understanding Y-combinator, but now everything is clear :)

November 20, 2012, 13:23

very good explanation, thanks. There is a recent talk worth watching about this esoteric exercises: http://www.infoq.com/presentations/Y-Combinator
I'm among readers of 'The Little Schemer' too :-)
I just completed the functional programming course in Scala offered by Martin Odersky on Coursera and find this concepts really delicious

December 15, 2012, 07:33

Good read! You see all this clearly, and I dare ask for more :-)
One intriguing thing about chapter 9 is to discover that the first way that comes to head for simplifying the value function (by factoring out (make-length make-length)) results in a function which enters an infinite loop BEFORE even starting the actual computation.
Have you got an explanation for this (apart from just saying "hey, it happens!" as the book does)? Is this something general about factoring out f(f)? or does it happen just here?
I trust it is a general problem with f(f), but I've thought long about it without coming to a conclusion. Your view will be greatly appreciated.
Thank you and have a good day.

Leave a new comment

(why do I need your e-mail?)

(Your twitter name, if you have one. (I'm @pkrumins, btw.))

Type the first letter of your name: (just to make sure you're a human)

Please preview the comment before submitting to make sure it's OK.

Advertisements