After reading about the Y combinator in the book The Little Schemer, simply staring at its code becomes a great way to kill time. In the past few days, whenever I got bored (or need an excuse for procrastination), I spent hours trying to make sense out of its mysterious implementation. It’s like monad: everybody has a hard time reaching that "Eureka" moment, but once he finally groks it, he believes that he can explain it more clearly than the tutorial he read (which is often not the case).

So here I am writing yet another Y combinator tutorial. I think the reason why Y combinator is so hard to fathom is that when you read the code, you are facing its entire complexity. In fact, however, its derivation can be broken into a few fairly intuitive steps, and if each step is well-motivated, we no long need a "Eurika" to understand the whole thing.

The code snippets in this tutorial are written in Racket, but it should be trivial to translate them into other Lisp dialects or other dynamic functional languages. [1]

Y? Why?

Before deriving the Y combinator, let see what it is about. Consider the following simple recursive function that returns the length of a given list:

(define len
(λ (lst)
(cond
[(empty? lst) 0]
[else (+ 1 (len (rest lst)))])))

(len '(1 2 3 4 5)) ; => 5


Simple enough, except that if you think about it, it’s quite weird: len is used in the process of defining len itself - that is, the identifier len is used before it has been bound to anything, since the lambda to which we intend to bind it hasn’t yet been defined at that point! We should expect an "unbound identifier" error, but somehow the interpreter helps take care of it. But what if the interpreter is not so helpful? We have been able to write anonymous functions using lambda literals; can we write recursive anonymous functions, too?

Answering this question helps us better understand the essence of the language. If recursive anonymous functions is impossible, then define should be something essential to the language; otherwise it is merely a syntactic sugar, since everything can be anonymous.

We can’t write the len function as we like to, so what shall we do? Maybe we can write some other function mk-len that constructs len for us. Let’s start by simply wrapping a lambda without arguments around len to get a new function mk-len. Then all uses of len can be substituted with (mk-len).

(define mk-len
(λ ()
(λ (lst)
(cond
[(empty? lst) 0]
[else (+ 1 ((mk-len) (rest lst)))]))))

((mk-len) '(1 2 3 4 5)) ; => 5


We haven’t got rid of the recursion in mk-len yet. Well, to achieve the effect of recursion, mk-len needs to somehow refer to itself using some name anyway - but not the name we defined. Is there another way of binding a value to a name?

It turns out that there is: when we pass a value to a function as its argument, the value is automatically bound to the name of the corresponding parameter within the body of the function. That’s why ((λ (x) x) 123) returns 123 instead of an error "unbound identifier: x".

So how about passing the function mk-len as a argument of itself, so that it can refer to itself using the name of that parameter? Let’s see:

(define mk-len
(λ (self)
(λ (lst)
(cond
[(empty? lst) 0]
[else (+ 1 ((self self) (rest lst)))]))))

((mk-len mk-len) '(1 2 3 4 5)) ; => 5


With this simple trick, now mk-len can recurse without knowing its name! Let’s further extract a function U to do the self-application (mk-len mk-len):

(define U
(λ (f) (f f)))

((U mk-len) '(1 2 3 4 5)) ; => 5


Now we no longer need to give the function mk-len a name:

((U (λ (self)
(λ (lst)
(cond
[(empty? lst) 0]
[else (+ 1 ((self self) (rest lst)))]))))
'(1 2 3 4 5)) ; => 5


Congratulations! Now you know how to do recursion on anonymous functions!

By the way, the simple function U has a name: the U combinator. (Hold on, we’ll get to the Y soon.)

The Chicken Or The Egg Or The Fixed Point

While the U combinator allows us to write recursive anonymous functions, at the point of recursion we must always do the ugly self-application (self self). This is annoying. We would really want to write our mk-len as:

(define mk-len
(λ (rlen)
(λ (lst)
(cond
[(empty? lst) 0]
[else (+ 1 (rlen (rest lst)))]))))


Now the problem is to find some function F that transforms mk-len into another function that behaves exactly the same as len.

First let’s take a look at this new function for a moment. It takes a function rlen as its argument and returns another function. It basically says: "give me a function rlen that can be used to compute the length of (rest lst), then I will make a function for you that can compute the length of lst".

If lst is empty, what rlen you provide doesn’t matter, since ((mk-len rlen) empty) always returns 0 without calling rlen. But when lst is not empty, if we want ((mk-len rlen) lst) to return the length of lst, we need to provide a rlen that correctly computes the length of (rest lst). If we provide a function that can compute the length of a list with 10 elements, mk-len returns a function that can compute the length of a list with 11 elements. If rlen is able to compute the length of a list with 100 elements, (mk-len rlen) can compute the length of a list with 101 elements.

It follows that if we could provide a rlen such that (rlen (rest lst)) always correctly returns the length of (rest lst) for any non-empty lst, then ((mk-len rlen) lst) would always return the length of lst - that is, (mk-len rlen) would behave like the recursive len we defined at the beginning of this tutorial.

So what function should rlen be? Since lst can be any list, (rest lst) can also be any list (when lst is not empty). So we are basically asking for a rlen that is able to compute the length of any list - that is, rlen should also behave like len.

In other words, to transform mk-len into a function that behaves like len, we need to pass it a function rlen that behaves like len, but the only way to get such a function is by transforming mk-len. Now we seem to be stuck at this chicken-egg dilemma!

What should we do? Let’s just cheat by passing len to mk-len:

((mk-len len) '(1 2 3 4 5)) ; => 5


It works as we expected. Observe carefully, we can notice something curious: (mk-len len) not only behaves like len; in fact they are exactly the same function, i.e. len = (mk-len len). Therefore len is by definition the fixed point of mk-len. We can just define len in terms of mk-len following this definition, and it is equivalent to the original len.

(define len
(mk-len
(λ (lst) (len lst))))

(len '(1 2 3 4 5)) ; => 5


Now we can extract a function F that computes mk-len‘s fixed point len:

(define F
(λ (f)
(local [(define fx
(f (λ (x)
(fx x))))]
fx)))

((F mk-len) '(1 2 3 4 5)) ; => 5


And again the function mk-len can be anonymous:

((F (λ (rlen)
(λ (lst)
(cond
[(empty? lst) 0]
[else (+ 1 (rlen (rest lst)))]))))
'(1 2 3 4 5)) ; => 5


Congratulations (again)! Now you know how to do recursion on anonymous functions without the ugly (self self)! We are done…right?

Simply Y

Not quite, because we are still cheating: the definition of fixed point fx in F blatantly calls its own name to recurse. It seems that we are starting all over again. Are we making any progress?

In fact we have made progress: the users of our F function can now write recursive anonymous functions using a very clean syntax, and we are free to do whatever we want to eliminate recursion of fx in F, as long as its behavior doesn’t change.

What do we do? Remember how we used the simple U combinator to avoid recursion? Let’s try it on fx:

(define F
(λ (f)
(U (λ (self)
(f (λ (x)
((self self) x)))))))

((F mk-len) '(1 2 3 4 5)) ; => 5


The self-application (self self) is still somewhat ugly, but it only appears once in F. Someone needs to do the dirty job, so that all functions like mk-len can be pretty. That’s life.

Now if we put the definition of U in F, and change a few names, we get:

(define Y
(λ (f)
((λ (g) (g g))
(λ (g)
(f (λ (x) ((g g) x)))))))

((Y mk-len) '(1 2 3 4 5)) ; => 5


This Y is what we call: (drum roll) … the famous Y combinator discovered by Haskell Curry! There you have it. Not that hard, right?

In fact, if we define F in different ways and then use U to remove recursion, we can easily get many non-recursive functions that work just like Y. For example, because (F f) returns the fixed point of f, then by definition (F f) = (f (F f)). Let’s write the definition of F in this way [2]:

(define F
(λ (f)
(f (λ (x)
((F f) x)))))

((F mk-len) '(1 2 3 4 5)) ; => 5


Similarly, we can use U to remove the recursion:

(define F
(U (λ (self)
(λ (f)
(f (λ (x)
(((self self) f) x)))))))

((F mk-len) '(1 2 3 4 5)) ; => 5


Then, again, put in the definition of U and change some names:

(define Θ
((λ (f) (f f))
(λ (g) (λ (f)
(f (λ (x)
(((g g) f) x)))))))

((Θ mk-len) '(1 2 3 4 5)) ; => 5


This Θ is what we call: (drum roll) … the (less) famous Turing combinator discovered by Alan Turing!

I can go on and on. Such higher-order functions like Y and Θ that computes a fixed-point of other functions are called fixed-point combinators. In fact, there are infinitely many of them.

So Y is not so mysterious, and it’s not so special, either. I hope you are not too disappointed.

Wrap It Up (Pun Intended)

Now you know what Y combinator does: it computes the fixed point of another function. We can use Y to achieve anonymous recursion because a recursive function (like len) can be rewritten as the fixed-point of another function (like mk-len). So (Y mk-len) gives us len.

But there’s something more interesting. Consider some function f, in its definition it invokes another function g. If we want to control who f is talking to, we can make g an parameter. Now step back and stare at mk-len for a moment. Think about what we have done: we parameterized the point of recursion! When the normal len calls itself, it is very certain about it, and there’s nothing we can do. However, mk-len has no idea what rlen we pass to it! By writing recursion as fixed point computation, we gain the power and freedom of controlling what happens in a recursive call without modifying mk-len‘s code.

Consider that we want to print some log at each recursive call. For a normal recursive function like len, there’s nothing we can do but to modify the code. If there are 10 such functions, we need to modify each of them and create a lot of mess. On the other hand, for functions like mk-len, obviously we can simply modify our fixed-point combinator:

(define Y
(λ (f)
((λ (g) (g g))
(λ (g)
(f (λ (x) (begin (displayln x)
((g g) x))))))))
((Y mk-len) '(1 2 3 4 5))


This prints out the following:

(2 3 4 5)
(3 4 5)
(4 5)
(5)
()


Now if we want to print log after each recursive call, we would need to modify the Y again. If we want to do something else, modify again…

Well, we can do better than this. We can create wrappers around functions like mk-len that controls the recursion calls in different ways. Here’s an example:

(define log-start-wrapper
(λ (mk-len)
(λ (log)
(λ (x)
(begin
(printf "Start computing for: ~a~n" x)
((mk-len log) x))))))

((Y (log-start-wrapper mk-len)) '(1 2 3 4 5))


It prints out the following:

Start computing for: (1 2 3 4 5)
Start computing for: (2 3 4 5)
Start computing for: (3 4 5)
Start computing for: (4 5)
Start computing for: (5)
Start computing for: ()


Similarly, we can define log-start-wrapper that prints a log after each recursive call:

(define log-end-wrapper
(λ (mk-len)
(λ (log)
(λ (x)
(local [(define result
((mk-len log) x))]
(begin
(printf "Result for ~a is: ~a~n" x result)
result))))))

((Y (log-end-wrapper mk-len)) '(1 2 3 4 5))


It prints out:

Result for () is: 0
Result for (5) is: 1
Result for (4 5) is: 2
Result for (3 4 5) is: 3
Result for (2 3 4 5) is: 4
Result for (1 2 3 4 5) is: 5


We can even use several wrappers together:

((Y (log-start-wrapper (log-end-wrapper mk-len))) '(1 2 3 4 5))


This gives you:

Start computing for: (1 2 3 4 5)
Start computing for: (2 3 4 5)
Start computing for: (3 4 5)
Start computing for: (4 5)
Start computing for: (5)
Start computing for: ()
Result for () is: 0
Result for (5) is: 1
Result for (4 5) is: 2
Result for (3 4 5) is: 3
Result for (2 3 4 5) is: 4
Result for (1 2 3 4 5) is: 5


In this example, wrappers are like decorators for recursive function.

With such flexibility, there are actually more funky stuff we can do with wrappers. For example, we can cache the result of recursive calls to avoid redundant computation (a.k.a memoization), or modify the result (or even change the type of the result) of the recursive calls. You can read this interesting paper "That About Wraps it Up" for more information.

 [1] In the cases of statically typed languages, it gets more complicated or even impossible. Let’s just ignore them in this tutorial for the sake of clarity.
 [2] A subtle point here: we write (λ (x) ((F f) x)) instead of just (F f). This is because otherwise in order to pass (F f) as an argument of f, we first need to evaluate (F f), which expands to (f (F f)), to evaluate which we again need to evaluate first evaluate (F f)… The program will hang until it finally crashes from a stack overflow. Wrapping a lambda around (F f) delays its evaluation, making sure x passed to (F f) is evaluated first.