Follow me on Twitter for my latest adventures!

In this post I'll derive the Y-combinator and explain all the steps taken. The key to understanding the Y-combinator is knowing precisely what it does. I am sure many people never understand Y-combinator because of this very reason, or because the introduction is too lengthy to comprehend. Therefore I'll give the shortest possible explanation of Y-combinator before I derive it so that you know what the end result should be.

The Y-combinator allows an anonymous function to call itself, that is, **it allows anonymous function recursion**. Since anonymous functions don't have names, they can't refer to themselves easily. The Y-combinator is a clever construct helps them to refer to themselves. That's it. That's all the Y-combinator does. Remember that.

I'll use the Scheme programming language to derive the Y-combinator. My derivation is based on the one in "The Little Schemer" book. I took the examples from the book, filled in the missing steps and explained the steps in more details.

Here it is.

Suppose we want to write the `length`

function, which, given a list, returns the number of elements in it. It's really easy if we can give the function a name:

```
(define length
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (length (cdr list)))))))
```

But now suppose that we can't give names - we can only use anonymous functions:

```
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (??? (cdr list)))))))
```

Suddenly there is no way this anonymous function can refer to itself. What do we put in ** ???**? We can't refer to the name

`length`

anymore because it's an anonymous function, which doesn't have any name. One thing we can try is to put the function itself in the place of **:**

`???`

```
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (
(lambda (list) ; the
(cond ;
((null? list) 0) ; function
(else ;
(add1 (??? (cdr list)))))) ; itself
(cdr list))))))
```

This is not much better, we are still left with ** ???**. But there is a bright side to it, this function can determine lengths of lists with one element and no elements. Let's try inserting the same anonymous function in place of

**again:**

`???`

```
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (??? (cdr list))))))
(cdr list))))))
(cdr list))))))
```

Now this function can determine lengths of lists with 0, 1 and 2 elements. If we continue this way, we can construct the `length`

function for lists with a certain number of elements. But this is not what we want, we the real `length`

function that works for lists with any number of elements.

As we all know, repeating code is not a good thing. Let's try to factor out the repetitions and rewrite the original anonymous function slightly. Instead of leaving `???`

in the code, let's pass it to an anonymous function via an argument called `length`

.

```
((lambda (length)
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (length (cdr list)))))))
???)
```

Notice that this code invokes the first `lambda (length)`

function and passes it `???`

as the `length`

argument. This function in turn returns the original anonymous function:

```
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (??? (cdr list))))))
```

Now let's try constructing the previous two length functions that work for lists with a maximum of one and two elements. First the function for lists with a maximum of one element:

```
((lambda (f)
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (f (cdr list)))))))
((lambda (g)
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (g (cdr list)))))))
???))
```

Here `???`

first gets passed to `lambda (g)`

, which returns the original anonymous function, which then gets passed to `lambda (f)`

, which in turn returns the function that works for lists with one element:

```
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (??? (cdr list))))))
(cdr list))))))
```

Similarly, the following code returns a function that works for lists with a maximum of two elements:

```
((lambda (f)
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (f (cdr list)))))))
((lambda (g)
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (g (cdr list)))))))
((lambda (h)
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (h (cdr list)))))))
???)))
```

Since the argument names `f`

, `g`

,`h`

in `(lambda (f) ...)`

, `(lambda (g) ...)`

, `(lambda (h) ...)`

are independent, we can rename all of them to `length`

, to make it look more similar to the `length`

function:

```
((lambda (length)
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (length (cdr list)))))))
((lambda (length)
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (length (cdr list)))))))
((lambda (length)
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (length (cdr list)))))))
???)))
```

We are still repeating code. The obvious thing we can do about it is create an anonymous function called `mk-length`

(make length) that creates this code for us:

```
((lambda (mk-length)
(mk-length ???))
(lambda (length)
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (length (cdr list))))))))
```

This is pretty tricky. Observe how `(lambda (mk-length) ...)`

gets the `(lambda (length) ...)`

function passed as the `mk-length`

argument, which in turn accepts `???`

as an argument and returns our original anonymous function:

```
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (??? (cdr list))))))
```

Now let's try constructing length functions for lists of length one and two. For the list of length one, we just need to apply `mk-length`

on itself:

((lambda (mk-length) (mk-length (mk-length ???))) (lambda (length) (lambda (list) (cond ((null? list) 0) (else (add1 (length (cdr list))))))))

Let's go through this code. First `(lambda (length) ...)`

gets passed to the `lambda (mk-length)`

function as the `mk-length`

argument. Then it applies the result of `(mk-length ???)`

(which is the original anonymous function) to the `mk-length`

. This produces our well known function that works on lists with one or none elements:

```
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (??? (cdr list))))))
(cdr list))))))
```

Similarly, by adding another call to `mk-length`

, we get the function that works for lists with two or less elements:

```
((lambda (mk-length)
(mk-length
(mk-length
(mk-length ???)))
(lambda (length)
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (length (cdr list))))))))
```

Now, since we don't know what `???`

is, let's just call it `mk-length`

and see what happens:

```
((lambda (mk-length)
(mk-length mk-length))
(lambda (length)
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (length (cdr list))))))))
```

Nothing bad actually happens, we still get the same original anonymous function, except in place of `???`

we have the function:

```
(lambda (length)
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (length (cdr list)))))))
```

Notice also that argument names `mk-length`

and `length`

in `lambda (mk-length)`

and `lambda (length)`

are independent. Therefore we can rename `length`

to `mk-length`

to remind that the first argument to `mk-length`

is `mk-length`

:

```
((lambda (mk-length)
(mk-length mk-length))
(lambda (mk-length)
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (mk-length (cdr list))))))))
```

And now comes the key trick. We replace `mk-length`

with a self-application `(mk-length mk-length)`

:

```
((lambda (mk-length)
(mk-length mk-length))
(lambda (mk-length)
(lambda (list)
(cond
((null? list) 0)
(else
(add1 ((mk-length mk-length) (cdr list))))))))
```

If you try this code out, you'll see that it returns the length of a list for any list. Here is an example of applying it to a list of 10 elements:

```
(((lambda (mk-length)
(mk-length mk-length))
(lambda (mk-length)
(lambda (list)
(cond
((null? list) 0)
(else
(add1 ((mk-length mk-length) (cdr list))))))))
'(a b c d e f g h i j))
```

The function works because it keeps adding recursive uses by passing `mk-length`

to itself, just as it is about to expire. This is not yet the Y-combinator, but we have successfully managed to recursively call an anonymous function. Now we need to massage the code a bit to separate the anonymous function from the self-applicative code.

The first step is to move the self-applicative code out as much as possible:

((lambda (mk-length) (mk-length mk-length)) (lambda (mk-length)((lambda (length) (lambda (list) (cond ((null? list) 0) (else (add1 (length (cdr list)))))))(lambda (x) ((mk-length mk-length) x)))))

Now the code in bold looks very much like the `length`

function we need! Let's move it outside as well:

((lambda (le) ((lambda (mk-length) (mk-length mk-length)) (lambda (mk-length) (le (lambda (x) ((mk-length mk-length) x))))))(lambda (length) (lambda (list) (cond ((null? list) 0) (else (add1 (length (cdr list))))))))

There we have it! The anonymous `length`

function has been separated from the self-applicative call. The code in **bold** is the `length`

function, and the code not-in-bold is the Y-combinator!

Now let's rename `mk-length`

to `f`

and join the lines to make it more compact:

```
((lambda (le)
((lambda (f) (f f))
(lambda (f)
(le (lambda (x) ((f f) x))))))
(lambda (length)
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (length (cdr list))))))))
```

And we are done. We can now write down the definition of Y-combinator:

```
(define Y
(lambda (le)
((lambda (f) (f f))
(lambda (f)
(le (lambda (x) ((f f) x)))))))
```

And we can apply it to our anonymous `length`

function:

```
((Y (lambda (length)
(lambda (list)
(cond
((null? list) 0)
(else
(add1 (length (cdr list))))))))
'(a b c d e f g h i j))
; ==> 10
```

This is one of many derivations of the applicative-order Y-combinator. See the publication "The Why of Y" for another derivation.

## Comments

Greatpost!I think this approach to deriving the Y combinator really appeals to the hackers. I'll share with my students.

I teach a compilers class, and I wrote up an alternate derivation for my students that starts with fixed points; you might be interested:

Deriving a memoizing Y combinator for JavaScript from fixed point theory.

s/But this is now what we want, we a real length function that works for lists with any number of elements./But this is not what we want, we want a real length function that works for lists with any number of elements./

most programming languages that support anonymous functions can also name the anonymous function(might be an extra line of code but that's fine).

When is a situation where we "cannot give names" to an anonymous function?

tommy > there is a quirk in the Erlang syntax, wich makes that you can't have a local declaration of a recursive function (inside an expression) : actually, there is no syntax for named function declaration inside expressions, so you have to use anonymous functions. Hence, you can't declare a local recursive function without such a fixpoint operator. This is actually a bad idea, as you end up reimplementing a part of the runtime system in a probably slow and certainly cryptic way, but that was just for the example.

Peteris > this combinator is just one among the different fixpoint combinators available. There is for example the Turing Fixpoint combinator, wich is also derivable. Do you know where in your derivation you made a decision that lead you to this combinator instead of another ? I have a gut feeling that the derivation is not "canonical" and that one could probably use a slightly different succession of software-engineering-and-common-sense steps that would naturally lead to another combinator.

Also, I'm not sure I agree with your presentation of the Y-combinator. You describe it as something about recursion and anonymous functions. I believe it is much more general than that : fixpoint combinators allow you to control recursion. It is all about parametrizing the function you call over at your recursion sites, instead of hardcoding recursion using the language syntax (or lack of). From a software engineering point of view, you could say that it decouples two aspects of your function : the implementation (with domain-specific knowledge and all) and the "looping" / "tying the knot" process, wich is deferred to the fixpoint combinator.

You can do quite interesting (but somewhat hacky) things once you've decoupled those two separate concerns. Matt mentioned memoization (automagic memoization of inner recursive calls, wich have been exposed by the decoupling process), there are quite a few folk examples, see for example the following article That About Wraps it Up.

It's a funny hack, for example, to write in your favorite language a function that take any derecursified/decoupled function and, assuming the original function was tail-recursive, produce a tail-recursive implementation of it, using an explicit call stack, wich doesn't depend on the implementation of tail recursion in the compiler/runtime.

When we need to explain how the Y combinator works.

tommy i don't think you understand the point of this.

you need to realize that there are many applications of the Y combinator, many times the language you THINK "names" anon functions doesn't actually name them.

my only critique on this post is that you don't talk about omega, which to me is the easiest way to understand why Y works

This is nice! Just like the http://www.ece.uc.edu/~franco/C511/html/Scheme/ycomb.html explanation from some time in 1999. Good to see a refresh!!

Matt, thanks for noticing the typo. Gonna fix it.

Tommy, I can only think of lambda calculus that, if I understand correctly, uses Y-combinator to recurse.

Bluestorm, I am also familiar with Y! (Y-bang) combinator (from The Seasoned Schemer book). But I am not exactly sure how it's different from Y-combinator. I am still thinking about it. The derivation is similar to Y-combinator's but it's a bit different. The book says that it works the same as Y-combinator but then gives an example which Y-combinator processes but Y! combinator goes into an infinite loop. And thanks for the in-depth comment!

I wrote the Y! out when I was reading that chapter. I posted the code here: (define Y-bang ...) and example where Y-bang fails, but Y works.

Bruce, nice explanation! Thanks for sharing!

s/And now comes they they key trick/And now comes the key trick/

Paulo, fixed. :)

You didn't close a span:

Also, this: "Now the code in bold looks very much like the length function we need!" refers to nonexisting boldness.

Aside from that, an interesting read.

Roman, closed now, thanks for noticing! Also put the code in blog there. :)

The problem with the Y-Combinator is, that it cannot be typed. We have Y = \s.(\x.s(x x))(\x.s(x x)), such that Ys reduces to (\x.s(x x))(\x.s (x x)) which reduces to s((\x.s(x x))(\x.s(x x))) = s Ys, thus, having Ys reducing to all sssss...sYs, which means, Ys is not strongly normalizing - and since all typed lambda terms are strongly normalizing, it cannot be typed (actually, in the same way, no fixpoint combinator can be typed).

Anyway, even strongly typed programming languages like Haskell have recursion, even though mostly through calling the function by its name. In the theory of typed lambda terms, you therefore mostly axiomatically define recursion operators, like R:(a->a)->(a->bool)->a, which applies the first argument to a until (a->bool) gets true (there are a lot of other possibilities, though). Anyway, these cannot be expressed as terms, they must be defined axiomatically.

dasuxullebt, thanks for the insightful comment.

Wow, that's a really great post! After reading The Little Schemer I had problems with understanding Y-combinator, but now everything is clear :)

very good explanation, thanks. There is a recent talk worth watching about this esoteric exercises: http://www.infoq.com/presentations/Y-Combinator

I'm among readers of 'The Little Schemer' too :-)

I just completed the functional programming course in Scala offered by Martin Odersky on Coursera and find this concepts really delicious

Good read! You see all this clearly, and I dare ask for more :-)

One intriguing thing about chapter 9 is to discover that the first way that comes to head for simplifying the value function (by factoring out (make-length make-length)) results in a function which enters an infinite loop BEFORE even starting the actual computation.

Have you got an explanation for this (apart from just saying "hey, it happens!" as the book does)? Is this something general about factoring out f(f)? or does it happen just here?

I trust it is a general problem with f(f), but I've thought long about it without coming to a conclusion. Your view will be greatly appreciated.

Thank you and have a good day.

## Leave a new comment