The Lambda Calculus

A brief history

The Lambda Calculus was created in 1928 by Alonzo Church. (To be expanded).

At first glance, the Lambda Calculus formalises the fundamental concept of function application from mathematics. Consider the function $f(x) = x + 1$. Then $f(2)$ denotes the application of the function $f$ to the argument $2$. In order to compute the result, all occurrences of the variable $x$ in the body of the function are replaced by the parameter (here $2$). The result is $2+1$, which is subsequently computed using the laws of arithmetic.

At its core, the Lambda Calculus is not (a priori) designed to describe, e.g., addition or other mathematical operators (however, as we shall see, it is possible to encode numbers and a subset of arithmetic in the Lambda Calculus).

The Lambda Calculus defines three constructs:

variables (henceforth denoted as $x, y, \ldots$)
functions (similar to \x -> … in Haskell)
function applications (similar to function calls (f x) in Haskell)

Formally, let $V$ stand for a countably infinite set whose elements are called variables (an unlimited supply is needed later, when variables are renamed to new, unused ones). A $\lambda$-expression $e$ is recursively defined as:

$e ::= x \in V \mid \lambda x.e \mid (e_1\;e_2)$

Examples of $\lambda$-expressions:

$x$
$\lambda x.x$
$(\lambda x.\lambda y.x\;z)$

The set $V$ of variables will henceforth be implicit.
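To make the grammar concrete, here is a minimal sketch in Python. The language and the names Var, Lam, App and show are our own, hypothetical choices and are not part of the text. Each class mirrors one alternative of $e ::= x \mid \lambda x.e \mid (e_1\;e_2)$, and show prints a term in the notation used above.

```python
from dataclasses import dataclass
from typing import Union

# One class per alternative of the grammar  e ::= x | λx.e | (e1 e2).

@dataclass(frozen=True)
class Var:
    name: str        # a variable x from V

@dataclass(frozen=True)
class Lam:
    param: str       # the parameter x of λx.e
    body: "Term"     # the body e

@dataclass(frozen=True)
class App:
    fn: "Term"       # e1 in (e1 e2)
    arg: "Term"      # e2 in (e1 e2)

Term = Union[Var, Lam, App]

def show(e: Term) -> str:
    """Render a term in the notation of the text: x, λx.e and (e1 e2)."""
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Lam):
        return f"λ{e.param}.{show(e.body)}"
    return f"({show(e.fn)} {show(e.arg)})"

# The three example λ-expressions.
examples = [
    Var("x"),
    Lam("x", Var("x")),
    App(Lam("x", Lam("y", Var("x"))), Var("z")),
]
for e in examples:
    print(show(e))
```

Running it prints x, λx.x and (λx.λy.x z). A curried two-parameter function such as $\lambda x.\lambda y.body$ is simply a Lam nested inside another Lam.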

Note that, in the Lambda Calculus, functions are curried (each function takes one parameter at a time). Thus, a Haskell function \x y -> body would be written as $\lambda x.\lambda y.body$.

What is a lambda-expression?

Lambda-expressions are not designed to suit particular programming applications: there is no predefined concept of primitive value (such as an integer or a char), nor are there special functions to construct lists. Hence, at first glance, most lambda-expressions do not seem to stand for anything tangible (yet). However, we can recognise certain basic functions, such as:

$\lambda x.x$ - the identity function
$\lambda x.y$ - the constant function which always returns $y$
$\lambda x.\lambda y.x$ (resp. $\lambda x.\lambda y.y$) - selector functions, which are expected to be called with two parameters (curried) and return the first (resp. the second) one.

In the Lambda Calculus, the names chosen for bound variables (function parameters) are unimportant. For instance, $\lambda x.x$ and $\lambda y.y$ stand for the same identity function. Similarly, $\lambda x.\lambda x.x$ and $\lambda x.\lambda y.y$ stand for the same function which, when called, returns the identity function.
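The claim that parameter names are unimportant can be checked mechanically. In the Python sketch below, the classes Var, Lam, App and the function alpha_eq are hypothetical names of our own. The function compares two terms while recording, for each bound name, the depth of the λ that binds it. Two bound occurrences match when their binders sit at the same depth, and free variables must match by name.

```python
from dataclasses import dataclass
from typing import Dict, Union

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: "Term"

@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"

Term = Union[Var, Lam, App]

def alpha_eq(a: Term, b: Term) -> bool:
    """True iff a and b differ only in the names of their bound variables."""
    def go(a: Term, b: Term, env_a: Dict[str, int], env_b: Dict[str, int], depth: int) -> bool:
        # env_a / env_b map each bound name to the depth of its innermost enclosing binder.
        if isinstance(a, Var) and isinstance(b, Var):
            if a.name in env_a or b.name in env_b:
                # Bound occurrences match iff they refer to binders at the same depth.
                return env_a.get(a.name) == env_b.get(b.name)
            return a.name == b.name          # free variables must have the same name
        if isinstance(a, Lam) and isinstance(b, Lam):
            # An inner λ shadows an outer one with the same name: the dict update overrides it.
            return go(a.body, b.body, {**env_a, a.param: depth}, {**env_b, b.param: depth}, depth + 1)
        if isinstance(a, App) and isinstance(b, App):
            return (go(a.fn, b.fn, env_a, env_b, depth)
                    and go(a.arg, b.arg, env_a, env_b, depth))
        return False                          # different constructs
    return go(a, b, {}, {}, 0)

x, y = Var("x"), Var("y")
print(alpha_eq(Lam("x", x), Lam("y", y)))                        # True
print(alpha_eq(Lam("x", Lam("x", x)), Lam("x", Lam("y", y))))    # True
print(alpha_eq(Lam("x", Lam("y", x)), Lam("x", Lam("y", y))))    # False
```

The last line compares the two selectors, which are genuinely different functions. Note also that $\lambda x.y$ and $\lambda x.z$ are not identified, since only bound names may be changed.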

Step 1: reduction

The algebraic function application $f(par)$ of a function $f(x)=body$ to a parameter $par$ is encoded in the Lambda Calculus as the lambda-expression:

$(\lambda x.body\;par)$

where $body, par$ are $\lambda$-expressions and $x$ is a variable.

Informally, computing the function call or reducing the expression amounts to:

the substitution of all occurrences of variable $x$ with parameter $par$ in the body of the function.

We denote such a substitution by $body[par/x]$, where $body, par$ are $\lambda$-expressions and $x$ is a variable.

We define $body[par/x]$ recursively, over all types of lambda-expressions, as follows:

a. $x[par/x] = par$
b. $y[par/x] = y$, if $y \neq x$
c. $\{\lambda x.body\}[par/x] = \lambda x.body[par/x]$, where curly brackets denote the scope of the substitution.
d. $(e_1\;e_2)[par/x] = (e_1[par/x]\;e_2[par/x])$
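Rules a. to d. translate directly into a recursive function. The Python sketch below uses the hypothetical names subst_naive and show. It reads rule c. literally, always substituting inside the body of a function whatever its parameter is called, so it reproduces the examples that follow, including the unexpected $\lambda x.y$.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: "Term"

@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"

Term = Union[Var, Lam, App]

def show(e: Term) -> str:
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Lam):
        return f"λ{e.param}.{show(e.body)}"
    return f"({show(e.fn)} {show(e.arg)})"

def subst_naive(e: Term, par: Term, x: str) -> Term:
    """e[par/x] following rules a. to d. literally; rule c. is the flawed one."""
    if isinstance(e, Var):
        return par if e.name == x else e                  # rules a. and b.
    if isinstance(e, Lam):
        # Rule c.: always substitute inside the body, whatever the parameter's name.
        return Lam(e.param, subst_naive(e.body, par, x))
    return App(subst_naive(e.fn, par, x), subst_naive(e.arg, par, x))   # rule d.

x, y, z = Var("x"), Var("y"), Var("z")
print(show(subst_naive(App(Lam("x", App(x, y)), y), z, "x")))   # (λx.(z y) y)
print(show(subst_naive(Lam("x", Lam("x", x)), z, "x")))         # λx.λx.z
# Reducing (λx.λx.x y) computes {λx.x}[y/x]:
print(show(subst_naive(Lam("x", x), y, "x")))                   # λx.y  (expected λx.x)
```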

Examples:

$(\lambda x.(x\;y)\;y)[z/x] = (\{\lambda x.(x\;y)\}[z/x]\;y[z/x]) =$
$= (\lambda x.(x\;y)[z/x]\;y) = (\lambda x.(x[z/x]\;y[z/x])\;y) = (\lambda x.(z\;y)\;y)$
$\{\lambda x.\lambda x.x\}[z/x] = \lambda x.\lambda x.z$

These examples illustrate a conceptual problem with point c. from our definition. Suppose we would like to reduce:

$$ (\lambda x.\lambda x.x\; y)$$

On the left-hand side we have a function which returns the identity function, hence this call should produce $\lambda x.x$. However, if we substitute $x$ by $y$ in the body of the function, i.e. compute $\{\lambda x.x\}[y/x]$, by c. we get $\lambda x.y$, which is not what we would expect.

The problem here is related to scoping. In general, in a programming language, each variable is in the scope of either a:

variable definition (e.g. int x = ….), or
function definition (e.g. int f(int x) {…}).

In the Lambda Calculus, we do not have variable definitions, hence a variable need not be bound to a particular value. However, scoping is still required. In the previous example, the single occurrence of the variable $x$ is the parameter of the function $\lambda x.x$, not of $\lambda x.\lambda x.x$. Hence, informally, replacing it by $y$ in $\lambda x.x$, as done above, should be illegal.

Free and bound variables

We label each occurrence of $x$ in the following expression: $(\lambda x.\lambda x.x_1\;x_2)$

Informally, the occurrence $x_1$ is bound, because $x_1$ is the parameter of the function $\lambda x.x_1$. However, $x_2$ is free: it does not designate the parameter of any enclosing function.

An occurrence of a variable is either bound or free. Suppose $x_i$ is an occurrence of the variable $x$ in the $\lambda$-expression $e$. Then:

if $e=x$, then $x_i$ is free in $e$;
if $e=\lambda x.e'$, then $x_i$ is bound in $e$;
if $e=\lambda y.e'$ with $y \neq x$, then $x_i$ is bound/free in $e$ iff it is bound/free in $e'$;
if $e=(e_1\;e_2)$, then $x_i$ is bound/free in $e$ iff it is bound/free in $e_1$ or $e_2$ (note that $x_i$ occurs either in $e_1$ or in $e_2$).

A variable is bound in $e$ iff all its occurrences in $e$ are bound; otherwise it is free in $e$.

For example, in the expression:

$e = (\lambda x.\lambda x.x\;x)$

we have two occurrences of the variable $x$: the first is bound in $e$, whereas the second is free. Hence the variable $x$ is free in $e$.
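The set of free variables can be computed by following the structure of the definition. A variable is free in itself, $\lambda x$ binds the free occurrences of $x$ in its body, and an application collects the free variables of both sides. Here is a Python sketch, where free_vars and the classes are hypothetical names:

```python
from dataclasses import dataclass
from typing import Set, Union

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: "Term"

@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"

Term = Union[Var, Lam, App]

def free_vars(e: Term) -> Set[str]:
    """Names of the variables that have at least one free occurrence in e."""
    if isinstance(e, Var):
        return {e.name}                          # e = x: the occurrence is free
    if isinstance(e, Lam):
        return free_vars(e.body) - {e.param}     # λx binds the free x's of its body
    return free_vars(e.fn) | free_vars(e.arg)   # (e1 e2): free in e1 or in e2

x = Var("x")
e = App(Lam("x", Lam("x", x)), x)          # (λx.λx.x x)
print(free_vars(e))                        # {'x'}: the second occurrence is free
print(free_vars(Lam("x", Lam("x", x))))    # set()
```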

The definition of a free variable makes more sense when judged in the context of a reduction. Let us replace the phrase:

the substitution of all occurrences of variable $x$ with parameter $par$ in the body of the function.

with:

the substitution of all free occurrences of variable $x$ with parameter $par$ in the body of the function.

in the informal text for function application, and consider the following application:

$$ (\lambda x.(x\;\lambda x.x)\; z)$$

the first occurrence of $x$ refers to the formal parameter of the function $\lambda x.(x\;\lambda x.x)$;
the second occurrence of $x$ refers to the formal parameter of the function $\lambda x.x$.

Thus, $x$ is free in $(x\;\lambda x.x)$, because its first occurrence is free, and precisely those free occurrences need to be replaced by the substitution. The expected result of the application is $(z\;\lambda x.x)$.

Let us fix point c. with this in mind:

c1. $\{\lambda y.body\}[par/x] = \lambda y.body[par/x]$, if $x \neq y$
c2. $\{\lambda x.body\}[par/x] = \lambda x.body$
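Rules c1. and c2. can be tried out before the next fix. The Python sketch below uses the hypothetical names subst_c and beta. It implements a., b., c1., c2. and d., and gives the expected $(z\;\lambda x.x)$ for the application above. It also reproduces the incorrect $\lambda y.\lambda y.y$ discussed next.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: "Term"

@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"

Term = Union[Var, Lam, App]

def show(e: Term) -> str:
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Lam):
        return f"λ{e.param}.{show(e.body)}"
    return f"({show(e.fn)} {show(e.arg)})"

def subst_c(e: Term, par: Term, x: str) -> Term:
    """e[par/x] with rules a., b., c1., c2. and d. (no renaming yet)."""
    if isinstance(e, Var):
        return par if e.name == x else e                   # a., b.
    if isinstance(e, Lam):
        if e.param == x:
            return e                                       # c2.: x is rebound, stop here
        return Lam(e.param, subst_c(e.body, par, x))       # c1.: no check for capture
    return App(subst_c(e.fn, par, x), subst_c(e.arg, par, x))   # d.

def beta(app: App) -> Term:
    """Reduce the application (λx.body par) to body[par/x]."""
    assert isinstance(app.fn, Lam), "the left-hand side must be a function"
    return subst_c(app.fn.body, app.arg, app.fn.param)

x, y, z = Var("x"), Var("y"), Var("z")
print(show(beta(App(Lam("x", App(x, Lam("x", x))), z))))    # (z λx.x)
inner = App(Lam("x", Lam("y", x)), y)                         # (λx.λy.x y)
print(show(Lam("y", beta(inner))))                            # λy.λy.y  (wrong!)
```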

However, point c1. still suffers from a problem. To see this, consider the expression:

$$ \lambda y.(\lambda x.\lambda y.x\;y) $$

Here, we can reduce the inner application and, by applying the substitution rule c1., we would obtain $\lambda y.\lambda y.y$. However, this result is incorrect. By re-examining the expression, we observe that $\lambda x.\lambda y.x$ is a selector. The application $(\lambda x.\lambda y.x\;y)$ should return a constant function, not the identity function.

At second glance, we can observe that, by simply applying c1., the scoping of $y$ has changed: $y$ was initially the formal parameter of $\lambda y.(\lambda x.\lambda y.x\;y)$. After the reduction, it became the formal parameter of $\lambda y.y$, which changes the entire meaning of the expression.

To solve this, we should:

first rename the parameter $y$ of the function $\lambda y.x$, together with all its bound occurrences, to a new, unused variable, say $z$. The result is $\lambda y.(\lambda x.\lambda z.x\;y)$;
next, proceed with c1. as before, which now yields $\lambda y.\lambda z.y$: the selector applied to $y$ produces the constant function $\lambda z.y$.

To capture this, we replace c1. by the following two rules (c2. is kept unchanged):

c1'. $\{\lambda y.body\}[par/x] = \lambda y.body[par/x]$, if $x \neq y$ and $y$ is not free in $par$
c1''. $\{\lambda y.body\}[par/x] = \{\lambda z.body[z/y]\}[par/x]$, if $x \neq y$ and $y$ is free in $par$, where $z$ is a fresh variable: $z \neq x$, and $z$ is free in neither $body$ nor $par$
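Putting rules a., b., c1', c2., d. and the renaming rule together gives capture-avoiding substitution. Below is a sketch in Python. The fresh-name scheme (z, z1, z2, ...) in the hypothetical helper fresh is our own assumption: any variable that differs from $x$ and is free in neither $body$ nor $par$ would do.

```python
from dataclasses import dataclass
from typing import Set, Union

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: "Term"

@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"

Term = Union[Var, Lam, App]

def show(e: Term) -> str:
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Lam):
        return f"λ{e.param}.{show(e.body)}"
    return f"({show(e.fn)} {show(e.arg)})"

def free_vars(e: Term) -> Set[str]:
    if isinstance(e, Var):
        return {e.name}
    if isinstance(e, Lam):
        return free_vars(e.body) - {e.param}
    return free_vars(e.fn) | free_vars(e.arg)

def fresh(avoid: Set[str]) -> str:
    """First name among z, z1, z2, ... that is not in `avoid` (our own naming scheme)."""
    name, i = "z", 0
    while name in avoid:
        i += 1
        name = f"z{i}"
    return name

def subst(e: Term, par: Term, x: str) -> Term:
    """Capture-avoiding e[par/x]."""
    if isinstance(e, Var):
        return par if e.name == x else e                       # a., b.
    if isinstance(e, Lam):
        y, body = e.param, e.body
        if y == x:
            return e                                           # c2.
        if y not in free_vars(par):
            return Lam(y, subst(body, par, x))                 # c1'.
        # y is free in par: rename the parameter to a fresh z, then continue as in c1'.
        z = fresh(free_vars(body) | free_vars(par) | {x})
        return Lam(z, subst(subst(body, Var(z), y), par, x))
    return App(subst(e.fn, par, x), subst(e.arg, par, x))     # d.

x, y, z = Var("x"), Var("y"), Var("z")
inner = App(Lam("x", Lam("y", x)), y)                  # (λx.λy.x y)
reduced = subst(inner.fn.body, inner.arg, inner.fn.param)
print(show(Lam("y", reduced)))                         # λy.λz.y
# When z itself is taken, the next candidate name is used:
print(show(subst(Lam("y", App(x, z)), y, "x")))        # λz1.(y z)
```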

Step 2: reduction order(s)

In Step 1, we have satisfactorily defined the reduction: $$ (\lambda x.body\;par) \Rightarrow body[par/x] $$

of a function application, via a thorough definition of the substitution of all free occurrences of a variable ($x$) in an expression ($body$).

In Step 2, we look at more complicated expressions, in which several different reductions are possible. Let $e_1 \Rightarrow e_2 \Rightarrow \ldots \Rightarrow e_n$ be a sequence of zero or more reductions (hence $n$ can be 1). We write $e_1 \Rightarrow^* e_n$: $e_1$ reduces to $e_n$ in zero or more steps.

Consider:

$$ \underline{(\lambda x.z\;\underline{(\lambda x.(x\;x)\;\lambda y.y)})}$$

where we have underlined the two possible reductions. If we start with the second (inner) reduction, we obtain:

$(\lambda x.z\;\underline{(\lambda y.y\;\lambda y.y)})$, which, by the underlined reduction, becomes:
$(\lambda x.z\;\lambda y.y)$, which reduces to
$z$.

On the other hand, had we started with the first (outer) reduction, we would have obtained $z$ in a single step. The following result shows that the choice of which application to reduce first does not affect the ultimate result obtained via a sequence of reductions:

Theorem (Church-Rosser)

Let $e, e_1, e_2$ be $\lambda$-expressions such that $e \Rightarrow^* e_1$, $e \Rightarrow^* e_2$ and $e_1 \neq e_2$. Then there exists $e'$ such that $e_1 \Rightarrow^* e'$ and $e_2 \Rightarrow^* e'$.
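The theorem can be illustrated, though not proven, on the first example of this step. The Python sketch below enumerates every possible reduction sequence, reducing any application $(\lambda x.body\;par)$ anywhere in the expression, and collects the irreducible expressions it reaches. The helper names reducts and irreducible_results are hypothetical. The search remembers visited expressions and gives up after a fixed number of them, because some expressions never stop reducing.

```python
from dataclasses import dataclass
from typing import List, Set, Union

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: "Term"

@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"

Term = Union[Var, Lam, App]

def free_vars(e: Term) -> Set[str]:
    if isinstance(e, Var):
        return {e.name}
    if isinstance(e, Lam):
        return free_vars(e.body) - {e.param}
    return free_vars(e.fn) | free_vars(e.arg)

def fresh(avoid: Set[str]) -> str:
    name, i = "z", 0
    while name in avoid:
        i += 1
        name = f"z{i}"
    return name

def subst(e: Term, par: Term, x: str) -> Term:
    """Capture-avoiding e[par/x], as defined in Step 1."""
    if isinstance(e, Var):
        return par if e.name == x else e
    if isinstance(e, Lam):
        if e.param == x:
            return e
        if e.param not in free_vars(par):
            return Lam(e.param, subst(e.body, par, x))
        z = fresh(free_vars(e.body) | free_vars(par) | {x})
        return Lam(z, subst(subst(e.body, Var(z), e.param), par, x))
    return App(subst(e.fn, par, x), subst(e.arg, par, x))

def reducts(e: Term) -> List[Term]:
    """Every expression obtainable from e by reducing exactly one application (λx.body par)."""
    out: List[Term] = []
    if isinstance(e, App):
        if isinstance(e.fn, Lam):
            out.append(subst(e.fn.body, e.arg, e.fn.param))    # reduce e itself
        out += [App(f, e.arg) for f in reducts(e.fn)]         # reduce inside e1
        out += [App(e.fn, a) for a in reducts(e.arg)]         # reduce inside e2
    elif isinstance(e, Lam):
        out += [Lam(e.param, b) for b in reducts(e.body)]     # reduce inside a body
    return out

def irreducible_results(e: Term, limit: int = 10_000) -> Set[Term]:
    """Explore all reduction sequences from e; return the irreducible expressions reached."""
    seen: Set[Term] = set()
    todo, results = [e], set()
    while todo:
        t = todo.pop()
        if t in seen:
            continue
        seen.add(t)
        if len(seen) > limit:
            raise RuntimeError("too many expressions; reduction may not terminate")
        nxt = reducts(t)
        if not nxt:
            results.add(t)
        todo.extend(nxt)
    return results

x, y, z = Var("x"), Var("y"), Var("z")
# (λx.z (λx.(x x) λy.y))
e = App(Lam("x", z), App(Lam("x", App(x, x)), Lam("y", y)))
print(irreducible_results(e))    # {Var(name='z')}
```

Every path, whether it reduces the outer or the inner application first, ends in the same expression $z$.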

However, it is possible to choose a reduction sequence which does not terminate. Consider:

$$ \underline{(\lambda x.z\;\underline{(\lambda x.(x\;x)\;\lambda y.(y\;y))})}$$

If we select the outer application first, the result is, as before, $z$. However, if we keep selecting the inner application, we obtain a non-terminating reduction sequence: the inner application reduces to $(\lambda y.(y\;y)\;\lambda y.(y\;y))$, which reduces to itself.

We consider two strategies for selecting which application to reduce:

normal strategy (informally: reduce the function first)
applicative strategy (informally: reduce the parameter first)

In our first example, the detailed reduction sequence follows the applicative strategy: we always reduced the function parameters first. In our second example, selecting the outer application first follows the normal strategy: we applied the function without reducing its parameter, which the function ignores anyway, hence avoiding non-termination.

We write $e \Rightarrow^n e'$ (resp. $e \Rightarrow^a e'$) whenever $e$ reduces to $e'$ in zero or more reduction steps, via the normal (resp. applicative) strategy.

We now formally define $\Rightarrow^a$ and $\Rightarrow^n$:

Applicative evaluation

We describe the strategies using the notation:

$$ \frac{A}{B} $$

which expresses that $B$ is true whenever $A$ is true.

$$ (TFun) \frac{e \text{ cannot be reduced via } \Rightarrow^a}{(\lambda x.body\; e) \Rightarrow^a body[e/x]} $$

$$ (TApp1) \frac{e \Rightarrow^a e'}{(\lambda x.body\; e) \Rightarrow^a (\lambda x.body\;e')} $$

The rule $(TApp1)$ ensures that, if the parameter $e$ of an application can be reduced to $e'$, then this reduction is performed before the application itself. The application is computed (via $(TFun)$) only after its parameter can no longer be reduced.

$$ (TApp2) \frac{e \Rightarrow^a e'}{(e\;e'')\Rightarrow^a (e'\;e'')}$$

The rule $(TApp2)$ ensures that, whenever the left-hand side of an application is not a function but is reducible, it will be reduced.

In the following example, we illustrate a sequence of reductions via the applicative strategy.

$$ ((\lambda x.(x\;x)\;\lambda y.y)\;(\lambda x.x\;\lambda x.x))$$

In our particular example, we cannot apply $(TApp1)$, since the left-hand side of the application is not a function. However, we can apply $(TApp2)$: since $(\lambda x.(x\;x)\;\lambda y.y) \Rightarrow^a (\lambda y.y\;\lambda y.y)$, the expression becomes:

$$ ((\lambda y.y\;\lambda y.y)\;(\lambda x.x\;\lambda x.x))$$

As before, we apply $(TApp2)$ and obtain:

$$ (\lambda y.y\;(\lambda x.x\;\lambda x.x))$$

Since the evaluation is applicative, we are forced to apply $(TApp1)$, which reduces the parameter:

$$ (\lambda y.y\;\lambda x.x)$$

Finally, via $(TFun)$, we obtain $\lambda x.x$.
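The applicative rules can be read as a function that performs one $\Rightarrow^a$ step, or reports that no rule applies. This Python sketch uses the hypothetical names step_a and trace. It reads the premise of $(TFun)$ as "the parameter can no longer be reduced", as the explanation of $(TApp1)$ states, and no rule reduces inside a function body. Tracing the example above reproduces its four steps.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Union

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: "Term"

@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"

Term = Union[Var, Lam, App]

def show(e: Term) -> str:
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Lam):
        return f"λ{e.param}.{show(e.body)}"
    return f"({show(e.fn)} {show(e.arg)})"

def free_vars(e: Term) -> Set[str]:
    if isinstance(e, Var):
        return {e.name}
    if isinstance(e, Lam):
        return free_vars(e.body) - {e.param}
    return free_vars(e.fn) | free_vars(e.arg)

def fresh(avoid: Set[str]) -> str:
    name, i = "z", 0
    while name in avoid:
        i += 1
        name = f"z{i}"
    return name

def subst(e: Term, par: Term, x: str) -> Term:
    """Capture-avoiding e[par/x], as defined in Step 1."""
    if isinstance(e, Var):
        return par if e.name == x else e
    if isinstance(e, Lam):
        if e.param == x:
            return e
        if e.param not in free_vars(par):
            return Lam(e.param, subst(e.body, par, x))
        z = fresh(free_vars(e.body) | free_vars(par) | {x})
        return Lam(z, subst(subst(e.body, Var(z), e.param), par, x))
    return App(subst(e.fn, par, x), subst(e.arg, par, x))

def step_a(e: Term) -> Optional[Term]:
    """One ⇒^a step, or None when no applicative rule applies."""
    if isinstance(e, App):
        if isinstance(e.fn, Lam):
            arg = step_a(e.arg)
            if arg is not None:
                return App(e.fn, arg)                    # (TApp1): reduce the parameter first
            return subst(e.fn.body, e.arg, e.fn.param)   # (TFun): the parameter is irreducible
        fn = step_a(e.fn)
        if fn is not None:
            return App(fn, e.arg)                        # (TApp2): reduce the left-hand side
    return None   # variables, functions, and applications that no rule matches

def trace(e: Term, step: Callable[[Term], Optional[Term]], max_steps: int = 100) -> List[str]:
    """Apply `step` until no rule applies (or max_steps is reached); record each expression."""
    out = [show(e)]
    for _ in range(max_steps):
        nxt = step(e)
        if nxt is None:
            break
        e = nxt
        out.append(show(e))
    return out

x, y = Var("x"), Var("y")
ident_x, ident_y = Lam("x", x), Lam("y", y)
example = App(App(Lam("x", App(x, x)), ident_y), App(ident_x, ident_x))
for line in trace(example, step_a):
    print(line)
```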

Normal evaluation

We only require two rules to specify normal evaluation:

$$ (TFun) \frac{}{(\lambda x.body\; e) \Rightarrow^n body[e/x]} $$

$$ (TApp) \frac{e \Rightarrow^n e'}{(e\;e'')\Rightarrow^n (e'\;e'')}$$

Observe that, under the normal strategy, parameters are never reduced before being substituted into the function body. We illustrate this on the previous example:

$$ ((\lambda x.(x\;x)\;\lambda y.y)\;(\lambda x.x\;\lambda x.x)) $$
$$ \Rightarrow^n ((\lambda y.y\;\lambda y.y)\;(\lambda x.x\;\lambda x.x)) \quad \text{via } (TApp) $$
$$ \Rightarrow^n (\lambda y.y\;(\lambda x.x\;\lambda x.x)) \quad \text{via } (TApp) $$
$$ \Rightarrow^n (\lambda x.x\;\lambda x.x) \quad \text{via } (TFun) $$
$$ \Rightarrow^n \lambda x.x \quad \text{via } (TFun) $$
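The normal rules are even shorter as code: $(TFun)$ has no premise, and only the left-hand side of an application is ever reduced. In this Python sketch, step_n and trace are hypothetical names. Tracing the example reproduces the derivation above step by step.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Union

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: "Term"

@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"

Term = Union[Var, Lam, App]

def show(e: Term) -> str:
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Lam):
        return f"λ{e.param}.{show(e.body)}"
    return f"({show(e.fn)} {show(e.arg)})"

def free_vars(e: Term) -> Set[str]:
    if isinstance(e, Var):
        return {e.name}
    if isinstance(e, Lam):
        return free_vars(e.body) - {e.param}
    return free_vars(e.fn) | free_vars(e.arg)

def fresh(avoid: Set[str]) -> str:
    name, i = "z", 0
    while name in avoid:
        i += 1
        name = f"z{i}"
    return name

def subst(e: Term, par: Term, x: str) -> Term:
    """Capture-avoiding e[par/x], as defined in Step 1."""
    if isinstance(e, Var):
        return par if e.name == x else e
    if isinstance(e, Lam):
        if e.param == x:
            return e
        if e.param not in free_vars(par):
            return Lam(e.param, subst(e.body, par, x))
        z = fresh(free_vars(e.body) | free_vars(par) | {x})
        return Lam(z, subst(subst(e.body, Var(z), e.param), par, x))
    return App(subst(e.fn, par, x), subst(e.arg, par, x))

def step_n(e: Term) -> Optional[Term]:
    """One ⇒^n step, or None when no normal rule applies."""
    if isinstance(e, App):
        if isinstance(e.fn, Lam):
            return subst(e.fn.body, e.arg, e.fn.param)   # (TFun): parameter passed unreduced
        fn = step_n(e.fn)
        if fn is not None:
            return App(fn, e.arg)                        # (TApp): reduce the left-hand side
    return None

def trace(e: Term, step: Callable[[Term], Optional[Term]], max_steps: int = 100) -> List[str]:
    """Apply `step` until no rule applies (or max_steps is reached); record each expression."""
    out = [show(e)]
    for _ in range(max_steps):
        nxt = step(e)
        if nxt is None:
            break
        e = nxt
        out.append(show(e))
    return out

x, y = Var("x"), Var("y")
ident_x, ident_y = Lam("x", x), Lam("y", y)
example = App(App(Lam("x", App(x, x)), ident_y), App(ident_x, ident_x))
for line in trace(example, step_n):
    print(line)
```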

A fundamental property of normal evaluation is that, for each $\lambda$-expression $e$, if there exists an irreducible expression $e'$ such that $e\Rightarrow^* e'$ (i.e. $e$ reduces to $e'$ via some arbitrary sequence of reductions), then $e\Rightarrow^n e'$ (i.e. by applying the normal evaluation strategy, we will obtain $e'$).
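The non-terminating example from Step 2 shows this property in action. The Python sketch below runs both strategies with a step budget; run is a hypothetical helper, and the budget of 100 steps is arbitrary. As before, we read the premise of $(TFun)$ in the applicative strategy as "the parameter can no longer be reduced". The normal strategy reaches $z$ in one step. The applicative strategy keeps reducing $(\lambda y.(y\;y)\;\lambda y.(y\;y))$ to itself until the budget runs out.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set, Union

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: "Term"

@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"

Term = Union[Var, Lam, App]

def free_vars(e: Term) -> Set[str]:
    if isinstance(e, Var):
        return {e.name}
    if isinstance(e, Lam):
        return free_vars(e.body) - {e.param}
    return free_vars(e.fn) | free_vars(e.arg)

def fresh(avoid: Set[str]) -> str:
    name, i = "z", 0
    while name in avoid:
        i += 1
        name = f"z{i}"
    return name

def subst(e: Term, par: Term, x: str) -> Term:
    """Capture-avoiding e[par/x], as defined in Step 1."""
    if isinstance(e, Var):
        return par if e.name == x else e
    if isinstance(e, Lam):
        if e.param == x:
            return e
        if e.param not in free_vars(par):
            return Lam(e.param, subst(e.body, par, x))
        z = fresh(free_vars(e.body) | free_vars(par) | {x})
        return Lam(z, subst(subst(e.body, Var(z), e.param), par, x))
    return App(subst(e.fn, par, x), subst(e.arg, par, x))

def step_a(e: Term) -> Optional[Term]:
    """Applicative strategy: (TApp1), (TFun), (TApp2)."""
    if isinstance(e, App):
        if isinstance(e.fn, Lam):
            arg = step_a(e.arg)
            return App(e.fn, arg) if arg is not None else subst(e.fn.body, e.arg, e.fn.param)
        fn = step_a(e.fn)
        return App(fn, e.arg) if fn is not None else None
    return None

def step_n(e: Term) -> Optional[Term]:
    """Normal strategy: (TFun), (TApp)."""
    if isinstance(e, App):
        if isinstance(e.fn, Lam):
            return subst(e.fn.body, e.arg, e.fn.param)
        fn = step_n(e.fn)
        return App(fn, e.arg) if fn is not None else None
    return None

def run(e: Term, step: Callable[[Term], Optional[Term]], max_steps: int = 100) -> Optional[Term]:
    """Reduce until no rule applies; return None if the step budget is exhausted."""
    for _ in range(max_steps):
        nxt = step(e)
        if nxt is None:
            return e
        e = nxt
    return None

x, y, z = Var("x"), Var("y"), Var("z")
# (λx.z (λx.(x x) λy.(y y)))
e = App(Lam("x", z), App(Lam("x", App(x, x)), Lam("y", App(y, y))))
print(run(e, step_n))    # Var(name='z')
print(run(e, step_a))    # None: the budget ran out
```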

Lazy evaluation

Consider the following example:

$$ (\lambda x.(x\;x)\;(\lambda y.y\;\lambda y.y))$$

By applying the normal evaluation strategy, we obtain:

$$ ((\lambda y.y\;\lambda y.y)\;(\lambda y.y\;\lambda y.y)) \Rightarrow^n $$
$$ (\lambda y.y\;(\lambda y.y\;\lambda y.y)) \Rightarrow^n $$
$$ (\lambda y.y\;\lambda y.y) \Rightarrow^n $$
$$ \lambda y.y $$

The careful reader will observe that $(\lambda y.y\;\lambda y.y)$, i.e. the parameter of the outer function, is evaluated twice, which is inefficient.

The lazy evaluation strategy improves on this very issue. We informally illustrate Haskell's lazy evaluation on this example. Note, however, that lazy evaluation is a programming-language-related strategy, and cannot be properly defined in the Lambda Calculus.

We start off with: $$ (\lambda x.(x\;x)\;(\lambda y.y\;\lambda y.y))$$

In Haskell, the parameter $par = (\lambda y.y\;\lambda y.y)$ is a thunk, i.e. a memory object which contains:

the expression itself, here $(\lambda y.y\;\lambda y.y)$;
a value, initially unknown (?).

In the above evaluation, we apply the normal strategy and obtain: $$ (par\;par)$$

Only here (in Haskell), the occurrences of $par$ are not independent copies, as in the Lambda Calculus: they point to the same thunk. Now, $par$ is not a function, but it can be evaluated. The result is $\lambda y.y$, and at this point this value is stored in the thunk. Since the value is the identity function, the application yields:

$$ par $$

but now $par$ has already been evaluated, hence the final result is immediately $\lambda y.y$, with no further (re-)evaluation of the inner expression.
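The effect of sharing can be simulated in Python, a strict language, by passing parameters as zero-argument functions. This is only an illustration of the idea, not a description of how Haskell implements thunks. The Thunk class, the counter evaluations, and the functions par and body are all hypothetical. Under call-by-name, which corresponds to the normal strategy, each use of x re-runs the parameter. With a single shared Thunk, the parameter is evaluated only once.

```python
from typing import Any, Callable

class Thunk:
    """A memory object holding an unevaluated expression and, once forced, its value."""
    def __init__(self, compute: Callable[[], Any]):
        self.compute = compute     # the expression itself
        self.done = False          # value initially unknown
        self.value: Any = None

    def force(self) -> Any:
        if not self.done:          # evaluate only on first use...
            self.value = self.compute()
            self.done = True
        return self.value          # ...afterwards, reuse the stored value

evaluations = 0

def par() -> Callable[[Any], Any]:
    """Stands for the parameter (λy.y λy.y); evaluating it yields λy.y."""
    global evaluations
    evaluations += 1
    return lambda delayed: delayed   # λy.y: returns its argument, still unevaluated

def body(x: Callable[[], Any]) -> Any:
    """Stands for λx.(x x); calling x() evaluates the parameter."""
    return x()(x)

# Normal strategy (call-by-name): each occurrence of x re-evaluates the parameter.
evaluations = 0
result = body(par)()     # the final () evaluates the returned parameter
by_name = evaluations

# Lazy evaluation (call-by-need): both occurrences of x share one thunk.
evaluations = 0
shared = Thunk(par)
result = body(shared.force)()
by_need = evaluations

print(by_name, by_need)  # 2 1
```

In both cases, result is the identity function, matching $\lambda y.y$. Only the number of evaluations of the parameter differs.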