Computer Science February 19, 2010

# Deriving the Y-Combinator

In this post I'll derive the Y-combinator and explain all the steps taken. The key to understanding the Y-combinator is knowing precisely what it does. I am sure many people never understand Y-combinator because of this very reason, or because the introduction is too lengthy to comprehend. Therefore I'll give the shortest possible explanation of Y-combinator before I derive it so that you know what the end result should be.

The Y-combinator allows an anonymous function to call itself, that is, it allows anonymous function recursion. Since anonymous functions don't have names, they can't refer to themselves easily. The Y-combinator is a clever construct helps them to refer to themselves. That's it. That's all the Y-combinator does. Remember that.

I'll use the Scheme programming language to derive the Y-combinator. My derivation is based on the one in "The Little Schemer" book. I took the examples from the book, filled in the missing steps and explained the steps in more details.

Here it is.

Suppose we want to write the `length` function, which, given a list, returns the number of elements in it. It's really easy if we can give the function a name:

```(define length
(lambda (list)
(cond
((null? list) 0)
(else
```

But now suppose that we can't give names - we can only use anonymous functions:

```(lambda (list)
(cond
((null? list) 0)
(else
```

Suddenly there is no way this anonymous function can refer to itself. What do we put in `???`? We can't refer to the name `length` anymore because it's an anonymous function, which doesn't have any name. One thing we can try is to put the function itself in the place of `???`:

```(lambda (list)
(cond
((null? list) 0)
(else

(lambda (list)                   ; the
(cond                          ;
((null? list) 0)             ; function
(else                        ;
(add1 (??? (cdr list))))))  ; itself

(cdr list))))))
```

This is not much better, we are still left with `???`. But there is a bright side to it, this function can determine lengths of lists with one element and no elements. Let's try inserting the same anonymous function in place of `???` again:

```(lambda (list)
(cond
((null? list) 0)
(else
(lambda (list)
(cond
((null? list) 0)
(else
(lambda (list)
(cond
((null? list) 0)
(else
(cdr list))))))
(cdr list))))))
```

Now this function can determine lengths of lists with 0, 1 and 2 elements. If we continue this way, we can construct the `length` function for lists with a certain number of elements. But this is not what we want, we the real `length` function that works for lists with any number of elements.

As we all know, repeating code is not a good thing. Let's try to factor out the repetitions and rewrite the original anonymous function slightly. Instead of leaving `???` in the code, let's pass it to an anonymous function via an argument called `length`.

```((lambda (length)
(lambda (list)
(cond
((null? list) 0)
(else
???)
```

Notice that this code invokes the first `lambda (length)` function and passes it `???` as the `length` argument. This function in turn returns the original anonymous function:

```(lambda (list)
(cond
((null? list) 0)
(else
```

Now let's try constructing the previous two length functions that work for lists with a maximum of one and two elements. First the function for lists with a maximum of one element:

```((lambda (f)
(lambda (list)
(cond
((null? list) 0)
(else
((lambda (g)
(lambda (list)
(cond
((null? list) 0)
(else
???))
```

Here `???` first gets passed to `lambda (g)`, which returns the original anonymous function, which then gets passed to `lambda (f)`, which in turn returns the function that works for lists with one element:

```(lambda (list)
(cond
((null? list) 0)
(else
(lambda (list)
(cond
((null? list) 0)
(else
(cdr list))))))
```

Similarly, the following code returns a function that works for lists with a maximum of two elements:

```((lambda (f)
(lambda (list)
(cond
((null? list) 0)
(else
((lambda (g)
(lambda (list)
(cond
((null? list) 0)
(else
((lambda (h)
(lambda (list)
(cond
((null? list) 0)
(else
???)))
```

Since the argument names `f`, `g`,`h` in `(lambda (f) ...)`, `(lambda (g) ...)`, `(lambda (h) ...)` are independent, we can rename all of them to `length`, to make it look more similar to the `length` function:

```((lambda (length)
(lambda (list)
(cond
((null? list) 0)
(else
((lambda (length)
(lambda (list)
(cond
((null? list) 0)
(else
((lambda (length)
(lambda (list)
(cond
((null? list) 0)
(else
???)))
```

We are still repeating code. The obvious thing we can do about it is create an anonymous function called `mk-length` (make length) that creates this code for us:

```((lambda (mk-length)
(mk-length ???))
(lambda (length)
(lambda (list)
(cond
((null? list) 0)
(else
```

This is pretty tricky. Observe how `(lambda (mk-length) ...)` gets the `(lambda (length) ...)` function passed as the `mk-length` argument, which in turn accepts `???` as an argument and returns our original anonymous function:

```(lambda (list)
(cond
((null? list) 0)
(else
```

Now let's try constructing length functions for lists of length one and two. For the list of length one, we just need to apply `mk-length` on itself:

```((lambda (mk-length)
(mk-length
(mk-length ???)))
(lambda (length)
(lambda (list)
(cond
((null? list) 0)
(else
```

Let's go through this code. First `(lambda (length) ...)` gets passed to the `lambda (mk-length)` function as the `mk-length` argument. Then it applies the result of `(mk-length ???)` (which is the original anonymous function) to the `mk-length`. This produces our well known function that works on lists with one or none elements:

```(lambda (list)
(cond
((null? list) 0)
(else
(lambda (list)
(cond
((null? list) 0)
(else
(cdr list))))))
```

Similarly, by adding another call to `mk-length`, we get the function that works for lists with two or less elements:

```((lambda (mk-length)
(mk-length
(mk-length
(mk-length ???)))
(lambda (length)
(lambda (list)
(cond
((null? list) 0)
(else
```

Now, since we don't know what `???` is, let's just call it `mk-length` and see what happens:

```((lambda (mk-length)
(mk-length mk-length))
(lambda (length)
(lambda (list)
(cond
((null? list) 0)
(else
```

Nothing bad actually happens, we still get the same original anonymous function, except in place of `???` we have the function:

```(lambda (length)
(lambda (list)
(cond
((null? list) 0)
(else
```

Notice also that argument names `mk-length` and `length` in `lambda (mk-length)` and `lambda (length)` are independent. Therefore we can rename `length` to `mk-length` to remind that the first argument to `mk-length` is `mk-length`:

```((lambda (mk-length)
(mk-length mk-length))
(lambda (mk-length)
(lambda (list)
(cond
((null? list) 0)
(else
```

And now comes the key trick. We replace `mk-length` with a self-application `(mk-length mk-length)`:

```((lambda (mk-length)
(mk-length mk-length))
(lambda (mk-length)
(lambda (list)
(cond
((null? list) 0)
(else
```

If you try this code out, you'll see that it returns the length of a list for any list. Here is an example of applying it to a list of 10 elements:

```(((lambda (mk-length)
(mk-length mk-length))
(lambda (mk-length)
(lambda (list)
(cond
((null? list) 0)
(else
'(a b c d e f g h i j))
```

The function works because it keeps adding recursive uses by passing `mk-length` to itself, just as it is about to expire. This is not yet the Y-combinator, but we have successfully managed to recursively call an anonymous function. Now we need to massage the code a bit to separate the anonymous function from the self-applicative code.

The first step is to move the self-applicative code out as much as possible:

```((lambda (mk-length)
(mk-length mk-length))
(lambda (mk-length)
((lambda (length)
(lambda (list)
(cond
((null? list) 0)
(else
(lambda (x)
((mk-length mk-length) x)))))
```

Now the code in bold looks very much like the `length` function we need! Let's move it outside as well:

```((lambda (le)
((lambda (mk-length)
(mk-length mk-length))
(lambda (mk-length)
(le (lambda (x)
((mk-length mk-length) x))))))
(lambda (length)
(lambda (list)
(cond
((null? list) 0)
(else
```

There we have it! The anonymous `length` function has been separated from the self-applicative call. The code in bold is the `length` function, and the code not-in-bold is the Y-combinator!

Now let's rename `mk-length` to `f` and join the lines to make it more compact:

```((lambda (le)
((lambda (f) (f f))
(lambda (f)
(le (lambda (x) ((f f) x))))))
(lambda (length)
(lambda (list)
(cond
((null? list) 0)
(else
```

And we are done. We can now write down the definition of Y-combinator:

```(define Y
(lambda (le)
((lambda (f) (f f))
(lambda (f)
(le (lambda (x) ((f f) x)))))))
```

And we can apply it to our anonymous `length` function:

```((Y (lambda (length)
(lambda (list)
(cond
((null? list) 0)
(else
'(a b c d e f g h i j))

; ==> 10
```

This is one of many derivations of the applicative-order Y-combinator. See the publication "The Why of Y" for another derivation.

Great post!

I think this approach to deriving the Y combinator really appeals to the hackers. I'll share with my students.

I teach a compilers class, and I wrote up an alternate derivation for my students that starts with fixed points; you might be interested:

s/But this is now what we want, we a real length function that works for lists with any number of elements./But this is not what we want, we want a real length function that works for lists with any number of elements./

most programming languages that support anonymous functions can also name the anonymous function(might be an extra line of code but that's fine).

When is a situation where we "cannot give names" to an anonymous function?

tommy > there is a quirk in the Erlang syntax, wich makes that you can't have a local declaration of a recursive function (inside an expression) : actually, there is no syntax for named function declaration inside expressions, so you have to use anonymous functions. Hence, you can't declare a local recursive function without such a fixpoint operator. This is actually a bad idea, as you end up reimplementing a part of the runtime system in a probably slow and certainly cryptic way, but that was just for the example.

Peter > this combinator is just one among the different fixpoint combinators available. There is for example the Turing Fixpoint combinator, wich is also derivable. Do you know where in your derivation you made a decision that lead you to this combinator instead of another ? I have a gut feeling that the derivation is not "canonical" and that one could probably use a slightly different succession of software-engineering-and-common-sense steps that would naturally lead to another combinator.

Also, I'm not sure I agree with your presentation of the Y-combinator. You describe it as something about recursion and anonymous functions. I believe it is much more general than that : fixpoint combinators allow you to control recursion. It is all about parametrizing the function you call over at your recursion sites, instead of hardcoding recursion using the language syntax (or lack of). From a software engineering point of view, you could say that it decouples two aspects of your function : the implementation (with domain-specific knowledge and all) and the "looping" / "tying the knot" process, wich is deferred to the fixpoint combinator.

You can do quite interesting (but somewhat hacky) things once you've decoupled those two separate concerns. Matt mentioned memoization (automagic memoization of inner recursive calls, wich have been exposed by the decoupling process), there are quite a few folk examples, see for example the following article That About Wraps it Up.

It's a funny hack, for example, to write in your favorite language a function that take any derecursified/decoupled function and, assuming the original function was tail-recursive, produce a tail-recursive implementation of it, using an explicit call stack, wich doesn't depend on the implementation of tail recursion in the compiler/runtime.

most programming languages that support anonymous functions can also name the anonymous function(might be an extra line of code but that’s fine).

When is a situation where we “cannot give names” to an anonymous function?

When we need to explain how the Y combinator works.

tommy i don't think you understand the point of this.

you need to realize that there are many applications of the Y combinator, many times the language you THINK "names" anon functions doesn't actually name them.

my only critique on this post is that you don't talk about omega, which to me is the easiest way to understand why Y works

This is nice! Just like the http://www.ece.uc.edu/~franco/C511/html/Scheme/ycomb.html explanation from some time in 1999. Good to see a refresh!!

Matt, thanks for noticing the typo. Gonna fix it.

Tommy, I can only think of lambda calculus that, if I understand correctly, uses Y-combinator to recurse.

Bluestorm, I am also familiar with Y! (Y-bang) combinator (from The Seasoned Schemer book). But I am not exactly sure how it's different from Y-combinator. I am still thinking about it. The derivation is similar to Y-combinator's but it's a bit different. The book says that it works the same as Y-combinator but then gives an example which Y-combinator processes but Y! combinator goes into an infinite loop. And thanks for the in-depth comment!

I wrote the Y! out when I was reading that chapter. I posted the code here: (define Y-bang ...) and example where Y-bang fails, but Y works.

Bruce, nice explanation! Thanks for sharing!

s/And now comes they they key trick/And now comes the key trick/

Paulo, fixed. :)

You didn't close a span:

(which is the original anonymous function) to the <code><span style="color:green">mk-length</code></code>.

Also, this: "Now the code in bold looks very much like the length function we need!" refers to nonexisting boldness.

Aside from that, an interesting read.

Roman, closed now, thanks for noticing! Also put the code in blog there. :)

The problem with the Y-Combinator is, that it cannot be typed. We have Y = \s.(\x.s(x x))(\x.s(x x)), such that Ys reduces to (\x.s(x x))(\x.s (x x)) which reduces to s((\x.s(x x))(\x.s(x x))) = s Ys, thus, having Ys reducing to all sssss...sYs, which means, Ys is not strongly normalizing - and since all typed lambda terms are strongly normalizing, it cannot be typed (actually, in the same way, no fixpoint combinator can be typed).

Anyway, even strongly typed programming languages like Haskell have recursion, even though mostly through calling the function by its name. In the theory of typed lambda terms, you therefore mostly axiomatically define recursion operators, like R:(a->a)->(a->bool)->a, which applies the first argument to a until (a->bool) gets true (there are a lot of other possibilities, though). Anyway, these cannot be expressed as terms, they must be defined axiomatically.

dasuxullebt, thanks for the insightful comment.

Wow, that's a really great post! After reading The Little Schemer I had problems with understanding Y-combinator, but now everything is clear :)

I'm among readers of 'The Little Schemer' too :-)
I just completed the functional programming course in Scala offered by Martin Odersky on Coursera and find this concepts really delicious

Good read! You see all this clearly, and I dare ask for more :-)
One intriguing thing about chapter 9 is to discover that the first way that comes to head for simplifying the value function (by factoring out (make-length make-length)) results in a function which enters an infinite loop BEFORE even starting the actual computation.
Have you got an explanation for this (apart from just saying "hey, it happens!" as the book does)? Is this something general about factoring out f(f)? or does it happen just here?
I trust it is a general problem with f(f), but I've thought long about it without coming to a conclusion. Your view will be greatly appreciated.
Thank you and have a good day.

Peter - 5 years on and your article is still really useful. I had coped with The Little Schemer up until the Y combinator - you got me through it.
For entertainment value only you might want to see my attempt in Javascript at http://jsbin.com/loromo/edit.
Thanks!

There's also an account of deriving the Y combinatory in something approximating the untyped lambda calculus in "An Introduction to Functional Programming Through Lambda Calculus" (Addison-Wesley, 1989) in Chapter 4.