Types in functional programming

Strong vs weak typing

Consider the following example from JavaScript, where the operator + is applied to arguments of different types. The results are given below:

[] + []  =  ""
[] + {}  =  "[object Object]"
{} + []  =  0
{} + {}  =  NaN

where [] is the empty array and {} is the empty object. The results are quite surprising and unpredictable. To explain them, we need to look at how the JavaScript interpreter proceeds. In JavaScript, there is a distinction between primitive and non-primitive values. Arrays (like []) and objects (like {}) are non-primitive. Since the operation + is performed on primitive values, JavaScript attempts to convert its operands to primitives. Pseudocode for this conversion is shown below:

     Convert(value) {
        if (value.valueOf() is a primitive) return it          // first, try conversion via valueOf
        else if (value.toString() is a primitive) return it    // otherwise, try conversion to string
        else error
     }

The conversion of [] to string yields the empty string, while the conversion of {} to string yields "[object Object]". This explains the first two results.

For the latter two results, we must know that {}, when it occurs at the beginning of a statement, is parsed as a code block rather than an object literal, and this is the case here. Hence, the JavaScript interpreter sees:

+ []
+ {}

where + is interpreted as a unary operator (JavaScript overloads + with such a unary version). Without going into more detail, +x behaves like a conversion of x to Number. Thus, [] converted to Number is 0, while {} converted to Number is NaN (not a number).

Moral

The JavaScript treatment of + is not important in itself, and it may even have advantages for the programmer (although our example suggests otherwise). However, we can note that allowing operands of any kind/type for +:

  • complicates the semantics of the language (the programmer needs to know about conversion to primitives and conversion to numbers)
  • reduces signalled errors (no JavaScript error was raised above), but makes programs more prone to bugs
  • makes programs more expressive (+ can be used in several ways), but sometimes harder to use.

In programming language design, there is a fundamental tension between expressiveness and typing. Typing is a means for enforcing constraints on what is deemed a correct program. It may be the case that conceptually correct programs are not accepted as correct from a typing perspective (this situation does occur, if seldom, in Haskell).
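As a concrete illustration (our example, not part of the original discussion), consider self-application. It is conceptually meaningful, and runs fine in a weakly-typed language such as Racket, yet Haskell rejects it at compile time:

-- rejected by GHC: the type of x would have to unify with a function
-- type that takes x itself as argument, i.e. an infinite type
-- (the error is an "occurs check" failure; wording varies by GHC version)
selfApply = \x -> x x

This is precisely a case where a conceptually sensible program fails the typing discipline.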

A strongly-typed programming language is stricter with respect to typing constraints. In functional programming, the languages:

  • Haskell
  • Scala

are considered strongly-typed. In imperative / OOP programming, the languages:

  • Java
  • C++
  • Scala

are considered strongly-typed.

A weakly-typed programming language is more relaxed w.r.t. typing constraints. For instance, in functional programming:

  • Racket (Lisp)
  • Clojure

are considered weakly-typed. In imperative / OOP programming, the languages:

  • C
  • Python
  • PHP
  • JavaScript

(and especially the latter) are considered weakly-typed. In these languages, types are usually reduced to primitive constructs (programmers cannot create new types), or the type construction mechanism is very simplistic. For instance, in Racket (formerly known as PLT Scheme), which is weakly-typed:

  • lists can hold values of any types (e.g. '(1 #t "String")), which means that the type for lists is primitive (not composed), and is simply list.
  • functions can return values of different types - the type for functions is also primitive: #procedure.

However, type verification is not absent in weakly-typed languages, including Racket/Scheme. For instance, the call (+ 1 '()) will produce an error since the plus operator is called on values with invalid types.

We shall discuss Scheme/Racket in more detail later. It is worth noting that there exist extensions of Racket (e.g. Typed Racket) which allow programmers to define and compose types to some extent.
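By contrast (our illustration, using Haskell), a strongly-typed language parameterizes the list type by its element type, so the heterogeneous Racket list above has no direct Haskell counterpart:

homogeneous :: [Integer]
homogeneous = [1, 2, 3]      -- fine: all elements have type Integer

-- rejected by GHC: True and "String" cannot inhabit a list of numbers
-- heterogeneous = [1, True, "String"]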

The weakly-typed vs strongly-typed classification is not rigid, and is subject to debate and discussion. There is no objective right answer. For instance, here, the language C is viewed as weakly-typed. We illustrate this with a small motivating example:

#include <stdlib.h>

int f (int x) {
    if (x != 0)
        return 1;
    return malloc(100);   /* a pointer, where an int is expected */
}

In principle, the function f can return an integer or a pointer to an object of any type, and this is allowed by the compiler (which merely issues a warning). Compared to, e.g., Java, this makes the C type system more relaxed.

There are valid arguments for considering C as strongly-typed and, as said before, there is no right answer.
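For contrast (our sketch, not part of the original text), a strongly-typed functional language forces both possible outcomes into a single declared type, for instance via a sum type such as Haskell's Either:

-- hypothetical Haskell analogue: the two outcomes are wrapped
-- in one explicitly declared result type
f :: Int -> Either Int [Char]
f x = if x /= 0 then Left 1 else Right "freshly allocated object"

The price is an explicit declaration; the benefit is that every caller is forced to handle both cases.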

Compile-time vs runtime typing

This classification is made w.r.t. the moment when type checking (and type inference) occurs:

  • during the compilation of a program
  • at runtime

The former is also called static typing, while the latter - dynamic typing. In the literature, static and dynamic typing are also used with other meanings, hence, here, we prefer the terms compile-time and runtime.

The imperative/OOP languages:

  • Java
  • Scala
  • C/C++

perform compile-time type checking, as do the functional languages:

  • Haskell
  • Scala

The imperative/OOP languages:

  • Python
  • PHP
  • JavaScript

perform runtime type checking, as do the functional languages:

  • Scheme/Racket (Lisp)
  • Clojure

Compile-time type checking is preferred for strongly-typed languages: the complexity of type verification is delegated to the compiler. Conversely, in weakly-typed languages, type verification is simpler, hence it can be performed by the interpreter, at runtime. Sometimes, a compiler may be absent. This is not a golden rule but merely an observation.

While runtime type checking is simpler to deploy, it has the disadvantage of not capturing all typing bugs. Consider the following program in Racket:

(define f (lambda (x) (if x 1 (+ 1 '()))))
(f #t)

The function receives a value x which is expected to be a boolean. Running the program above produces no error, even though (+ 1 '()) is an incorrectly-typed function call: in the execution of (f #t), the else branch of the function is never reached, hence no type verification is performed on it.

Hence, runtime type checking only catches bugs on the current program execution trace.
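By contrast (our illustration), the direct Haskell transcription of this function is rejected at compile time, even though the offending branch would never be executed:

-- rejected by GHC: (1 + []) does not type-check, regardless of
-- whether the else branch is ever reached
-- f = \x -> if x then 1 else 1 + []
-- main = print (f True)

Compile-time type checking examines all branches of a program, not just those on the current execution trace.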

Haskell implements the Hindley-Milner type inference algorithm. In what follows, we present a simplified, easier-to-follow, but incomplete algorithm, which serves as an illustration of the main concepts underlying the original one.

Intro

Consider the following expressions, and their types:

\x -> x + 1 :: Integer -> Integer
\x -> if x then 0 else 1 :: Bool -> Integer
zipWith (:) :: [a] -> [[a]] -> [[a]]
\f -> (f 1) + (f 2) :: (Integer -> Integer) -> Integer
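These types can be checked in GHCi with the :t command (our addition; GHC actually reports more general, class-constrained types, which default to the ones above, and the type-variable names may differ by version):

Prelude> :t zipWith (:)
zipWith (:) :: [a] -> [[a]] -> [[a]]
Prelude> :t \x -> if x then 0 else 1
\x -> if x then 0 else 1 :: Num p => Bool -> p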

We can see via the above example that types are constructed according to the following grammar:

type ::= <const_type> | <type_var> | (<type>) | <type> -> <type> | [<type>]

This grammar only tells half the story regarding Haskell typing; however, for the purposes of this lecture, this view suffices. According to the above grammar, types can be:

  • constant types (e.g. Integer, String)
  • type variables (e.g. a - which are usually used to designate any possible type, as in [a])
  • function types (e.g. Integer → Integer or (Integer → Integer) if this type appears in a larger type expression)
  • list types (e.g. [a]).
  • any combination of the above rules.
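This grammar can be encoded directly as a Haskell data type (a sketch; the constructor names are of our choosing):

data Type = TConst String      -- constant types, e.g. TConst "Integer"
          | TVar String        -- type variables, e.g. TVar "a"
          | TFun Type Type     -- function types t1 -> t2
          | TList Type         -- list types [t]
          deriving (Show, Eq)

For instance, [a] -> Integer is encoded as TFun (TList (TVar "a")) (TConst "Integer"). Note that parenthesised types need no dedicated constructor: grouping is captured by the tree structure itself.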

Expression trees

We assume each Haskell expression is constructed via the following construction rules:

  • functional application (e.g. take 2 [1,2,3])
  • function definition (e.g. \x→[x+1])

We note that many Haskell definitions can be seen as such. For instance:

g f = (f 1) + 1

can be seen as:

g = \f -> (f 1) + 1

Hence, we can take any Haskell expression, and construct a tree, in which each node represents a construction rule, and children represent sub-expressions:

  • for functional applications, the children are the function name and the parameters
  • for function definition, the children are the variable/variables and the function body.

For example, consider the expression tree for the function g shown previously (we use tabs to illustrate parent/child relationship):

\f -> (f 1) + 1
  f
  (f 1) + 1
    (+)
    (f 1)
       f 
       1
    1

In what follows, we shall use expression trees to perform type inference.
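Expression trees can likewise be encoded as a Haskell data type (again a sketch, with names of our choosing):

data Expr = Var String      -- variables and named functions, e.g. f, (+)
          | Lit Integer     -- integer literals
          | App Expr Expr   -- functional application (curried)
          | Lam String Expr -- function definition \x -> e
          deriving Show

-- the expression tree of g shown above:
g :: Expr
g = Lam "f" (App (App (Var "+") (App (Var "f") (Lit 1))) (Lit 1))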

Typing rules

We introduce the following typing rules:

Rule (TVar)

If v is bound to a constant expression e of type ct, then v :: ct

Rule (TFun)

If x :: t1 and e :: t2 then \x → e :: t1 → t2

Rule (TApp)

If f :: t1 → t2 and e :: t1 then (f e) :: t2

The above rule can be naturally generalised:

If f :: t1 → t2 → … → tn → t and e1 :: t1, …, en :: tn then (f e1 … en) :: t

In what follows, we will use these rules to make judgements on our types. These rules have a twofold usage:

  • deduce the type of an expression, based on existing knowledge
  • make hypotheses regarding the type of an expression (we shall not focus on this aspect in the presentation)
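A few concrete instances of these rules, checkable in GHCi (our examples):

-- (TVar): pi is bound in the Prelude to a floating-point constant,
--         so pi :: Double (after defaulting)
-- (TFun): if x :: Bool then not x :: Bool, hence \x -> not x :: Bool -> Bool
-- (TApp): not :: Bool -> Bool and True :: Bool, hence (not True) :: Bool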

Type inference stage 1: Expression tree construction

Type inference for an expression e can be seen as having two stages. In the first stage, we:

  • construct the expression tree of e
  • make hypotheses regarding the types of yet untyped expressions (e.g. variables).

We illustrate the first stage on the previous definition of g:

\f -> (f 1) + 1 :: ?
  f :: tf (here we introduce tf as the type of f. This is a type hypothesis)
  (f 1) + 1 :: ?
    (+) :: ?
    (f 1) :: ?
       f :: t1 -> t2 (this is another hypothesis, stemming from the fact that f is applied to 1)
       1 :: ?
    1 :: ?

Type inference stage 2: Rule application

In this stage, we start from the previously-built tree, and:

  • apply typing rules to deduce types for new sub-expressions
  • perform type unification: this is one aspect that we shall not elaborate on in this lecture.

This is equivalent to a bottom-up tree traversal: We start from the leaves, and progress to the root (i.e. the expression to be typed).

Without delving into details, type unification is an important ingredient, because it allows us to infer the most general type of an expression. Consider the following Haskell expression: \f x → (f x, f 1), which defines a function that takes another function f and a value x, and returns a pair: the first element of the pair is the application f x, while the second is f 1:

  • initially, we do not know what f is, hence it has the most general type (say) tf - it can be anything;
  • judging by the application f x, we deduce that f must be a function, of type a→b, where x::a
  • judging by the application f 1, we deduce that a must be an Integer

The unification process combines the information collected so far:

  • tf must unify (coincide with) a → b
  • a must unify with Integer

The final type for the expression is: (Integer → b) → Integer → (b, b)
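We can cross-check this with GHCi (our addition; GHC reports a class-constrained type whose constrained variable defaults to Integer, and the exact type-variable names may differ by version):

Prelude> :t \f x -> (f x, f 1)
\f x -> (f x, f 1) :: Num t => (t -> b) -> t -> (b, b)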

We illustrate the second stage of the type inference on the same example:

\f -> (f 1) + 1 :: tf -> Integer
  f :: tf (via (TFun))
  (f 1) + 1 :: Integer (via (TApp))
    (+) :: Integer -> Integer -> Integer (this we know from Prelude, after the type synthesis of (+), also t2 must unify with Integer)
    (f 1) :: t2 (via (TApp); also, t1 must unify with Integer)
       f :: t1 -> t2
       1 :: Integer (via (TVar), from Prelude)
    1 :: Integer (via (TVar), from Prelude)

The (pseudo)-algorithmic procedure concludes with the following answer:

  • g :: tf → Integer, where
    • tf unifies with t1 → t2
    • t2 unifies with Integer
    • t1 unifies with Integer

After unification, the result is shown to the programmer: g :: (Integer → Integer) → Integer
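To give a flavour of the unification step itself, here is a minimal sketch in Haskell (our own code, not GHC's actual implementation, repeating the hypothetical Type declaration from earlier so the snippet is self-contained):

import qualified Data.Map as M

data Type = TConst String | TVar String | TFun Type Type | TList Type
  deriving (Show, Eq)

-- a substitution maps type-variable names to types
type Subst = M.Map String Type

apply :: Subst -> Type -> Type
apply s t@(TVar v) = M.findWithDefault t v s
apply s (TFun a b) = TFun (apply s a) (apply s b)
apply s (TList a)  = TList (apply s a)
apply _ t          = t

-- most general unifier of two types, if one exists
unify :: Type -> Type -> Maybe Subst
unify (TConst a) (TConst b) | a == b = Just M.empty
unify (TVar v) t                     = bind v t
unify t (TVar v)                     = bind v t
unify (TList a) (TList b)            = unify a b
unify (TFun a1 r1) (TFun a2 r2)      = do
  s1 <- unify a1 a2
  s2 <- unify (apply s1 r1) (apply s1 r2)
  Just (M.map (apply s2) s1 `M.union` s2)
unify _ _                            = Nothing

bind :: String -> Type -> Maybe Subst
bind v t
  | t == TVar v = Just M.empty
  | occurs v t  = Nothing                    -- occurs check
  | otherwise   = Just (M.singleton v t)

occurs :: String -> Type -> Bool
occurs v (TVar u)   = v == u
occurs v (TFun a b) = occurs v a || occurs v b
occurs v (TList a)  = occurs v a
occurs _ _          = False

For the example above, unify (TVar "tf") (TFun (TConst "Integer") (TConst "Integer")) yields the substitution binding tf to Integer -> Integer, which is exactly the step that turns g :: tf -> Integer into g :: (Integer -> Integer) -> Integer.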

Exercises. Find the type of the following expressions, by applying the type synthesis pseudo-algorithm:

  • map (\x→[x+1])
  • \f x→ if x then f x else x
  • g f x = x && (f x)
  • g f = f (g f)