====== Types in functional programming ======

===== Typing in programming languages =====

==== Strong vs weak typing ====

Consider the following example from Javascript, where the operator ''+'' is applied on arguments of different types. The results are given below:

  [] + [] = ""
  [] + {} = "[object Object]"
  {} + [] = 0
  {} + {} = NaN

where ''[]'' is the empty array and ''{}'' is the empty object. The results are quite surprising, and unpredictable. To explain them, we need to look at the implementation of the Javascript interpreter.

In Javascript, there is a distinction between **primitive** and **non-primitive** values. Arrays (like ''[]'') and objects (like ''{}'') are considered non-primitive. Since the operation ''+'' is performed on primitive values, Javascript attempts to **convert** each operand to a primitive. Pseudocode for this conversion is shown below:

  Convert(value) {
    if (value.valueOf() is a primitive) return it       // does conversion to a number work?
    else if (value.toString() is a primitive) return it // otherwise convert to a string, if possible
    else error
  }

The conversion of ''[]'' to a string yields the empty string, while the conversion of ''{}'' to a string yields ''"[object Object]"''. This explains the first two results. For the latter two, we must know that ''{}'' can also be interpreted as a code block, and this is the case here. When ''{}'' appears first, the Javascript interpreter sees an empty block followed by:

  + []
  + {}

where ''+'' is interpreted as a **unary** operator (Javascript overloads ''+'' in this way). Without going into more detail, ''+ x'' behaves like a //conversion to Number// of ''x''. Thus, ''[]'' converted to ''Number'' is ''0'', while ''{}'' converted to ''Number'' is ''NaN'' (not a number).

=== Moral ===

The Javascript treatment of ''+'' is unimportant by itself, and it has some advantages for the programmer (although our example shows otherwise).
However, we can note that allowing any **kind/type** of operand for ''+'':
  * **complicates the semantics** of the language (the programmer needs to know about conversion to primitives and conversion to numbers);
  * **reduces errors** (no Javascript error was signalled above), but **makes programs more prone to bugs**;
  * makes a program **more expressive** (''+'' can be used in several ways), but sometimes difficult to use.

In programming language design, there is a fundamental tension between **expressiveness** and **typing**. Typing is a means for **enforcing constraints on what is deemed a //correct program//**. It may be the case that //conceptually correct// programs are not accepted as **correct** from a typing perspective (this situation seldom occurs in Haskell).

A **strongly-typed** programming language is stricter with respect to **typing constraints**. In functional programming, the languages:
  * Haskell
  * Scala
are considered **strongly-typed**. In imperative / OOP programming, the languages:
  * Java
  * C++
  * Scala
are considered **strongly-typed**.

A **weakly-typed** programming language is more relaxed w.r.t. **typing constraints**. For instance, in functional programming:
  * Racket (Lisp)
  * Clojure
are considered **weakly-typed**. In imperative / OOP programming, the languages:
  * C
  * Python
  * PHP
  * Javascript
(and especially the latter) are considered weakly-typed.

In the latter languages, types are usually reduced to primitive constructs (programmers cannot create new types), or the type construction procedure is very simplistic. For instance, in ''Racket'' (formerly known as ''PLT Scheme''), which is weakly-typed:
  * lists can hold values of any types (e.g. '''(1 #t "String")''), which means that the type for lists is **primitive** (not composed), and is simply ''list'';
  * functions can return values of different types - the type for functions is also **primitive** - ''#procedure''.
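By contrast, in a strongly-typed language like Haskell, lists are homogeneous: a Racket-style list such as '''(1 #t "String")'' is rejected at compile time, and must instead be encoded through a user-defined type. A minimal sketch (the type ''Value'' and its constructors are our own, illustrative names):

```haskell
-- [1, True, "String"] does not compile in Haskell: all elements must share one type.
-- A heterogeneous list must be encoded explicitly, via a new (composed) type:
data Value = I Integer | B Bool | S String
  deriving Show

mixed :: [Value]
mixed = [I 1, B True, S "String"]

main :: IO ()
main = print (length mixed)
```

Note that the encoding forces the programmer to say, at the type level, exactly which kinds of values may appear in the list — precisely the information that Racket's primitive ''list'' type leaves implicit.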
However, type verification is not absent in weakly-typed languages, including Racket/Scheme. For instance, the call ''(+ 1 '())'' will produce an error, since the plus operator is called on values with invalid types. We shall discuss Scheme/Racket in more detail later. It is worth noting that in Racket there exist extensions (Typed Racket) which allow programmers to define and compose types to some extent.

The **weakly-typed** vs **strongly-typed** classification is not rigid, and is subject to debate and discussion. There is no objective //right answer//. For instance, here, the language ''C'' is viewed as weakly-typed. We illustrate this with a small motivating example:

  int f (int x) {
      if (x != 0)
          return 1;
      return malloc(100);
  }

In principle, the function ''f'' can return an integer, or a pointer to an object of any type, and this is allowed by the compiler (which does issue a warning). Compared to, e.g., Java, this makes the ''C'' type system more relaxed. There are also valid arguments for considering ''C'' strongly-typed and, as said before, there is no right answer.

==== Compile-time vs runtime typing ====

This classification is done w.r.t. **the moment** when type checking occurs:
  * during the **compilation** of a program
  * at **runtime**

The former is also called **static typing**, while the latter - **dynamic typing**. In the literature, **static** and **dynamic** typing are also used with other meanings; hence, here, we prefer the terms **compile-time** and **runtime**.

The imperative/OOP languages:
  * Java
  * Scala
  * C/C++
perform compile-time type checking, as do the functional languages:
  * Haskell
  * Scala

The imperative/OOP languages:
  * Python
  * PHP
  * Javascript
perform runtime type checking, as do the functional languages:
  * Scheme/Racket (Lisp)
  * Clojure

Compile-time type checking is preferred for strongly-typed languages: the complexity of type verification is delegated to the compiler.
Conversely, in weakly-typed languages, type verification is simpler, hence it can be performed by the interpreter, at runtime. Sometimes, a compiler may be absent altogether. This is not a golden rule, but merely an observation.

While runtime type checking is simpler to deploy, it has the disadvantage of not capturing all typing bugs. Consider the following program in Racket:

  (define f
    (lambda (x)
      (if x
          1
          (+ 1 '()))))
  (f #t)

The function receives a value ''x'' which must be a boolean. Running the program above produces no error, even though ''(+ 1 '())'' is an incorrectly-typed function call. The reason is that, in the execution of the program (i.e. ''(f #t)''), the else branch of the function is not reached, hence no type verification is performed on it. Hence, runtime type checking only catches bugs **on the current program execution trace**.

===== Typing in Haskell =====

===== Type inference in Haskell =====

Haskell implements the [[https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_system | Hindley-Milner type inference algorithm]]. In what follows, we present a simplified, easier-to-follow, but incomplete algorithm, which serves as an illustration of the main concepts underlying the original one.

==== Intro ====

Consider the following expressions, and their types:

  \x -> x + 1               :: Integer -> Integer
  \x -> if x then 0 else 1  :: Bool -> Integer
  zipWith (:)               :: [a] -> [[a]] -> [[a]]
  \f -> (f 1) + (f 2)       :: (Integer -> Integer) -> Integer

We can see via the above examples that types are constructed according to the following grammar:

  type ::= <constant type> | <type variable> | (type) | type -> type | [type]

This grammar only tells half the story regarding Haskell typing; however, for the purposes of this lecture, this view suffices. According to the above grammar, types can be:
  * **constant types** (e.g. ''Integer'', ''String'')
  * **type variables** (e.g. ''a'' - which are usually used to designate **any** possible type, as in ''[a]'')
  * **function types** (e.g.
''Integer -> Integer'', or ''(Integer -> Integer)'' if this type appears in a larger type expression)
  * **list types** (e.g. ''[a]'')
  * any combination of the above rules.

==== Expression trees ====

We assume each Haskell expression is constructed via the following //construction rules//:
  * **functional application** (e.g. ''take 2 [1,2,3]'')
  * **function definition** (e.g. ''\x -> [x+1]'')

We note that many Haskell definitions can be seen as such. For instance:

  g f = (f 1) + 1

can be seen as:

  g = \f -> (f 1) + 1

Hence, we can take any Haskell expression and construct a **tree**, in which each node represents a **construction rule**, and its children represent sub-expressions:
  * for functional applications, the children are the **function name** and **the parameters**;
  * for function definitions, the children are **the variable(s)** and **the function body**.

For example, consider the expression tree for the function ''g'' shown previously (we use indentation to illustrate the parent/child relationship):

  \f -> (f 1) + 1
      f
      (f 1) + 1
          (+)
          (f 1)
              f
              1
          1

In what follows, we shall use expression trees to perform type inference.

==== Typing rules ====

We introduce the following **typing rules**:

=== Rule (TVar) ===

If ''v'' is bound to a constant expression ''e'' of type ''ct'', then ''v :: ct''

=== Rule (TFun) ===

If ''x :: t1'' and ''e :: t2'', then ''\x -> e :: t1 -> t2''

=== Rule (TApp) ===

If ''f :: t1 -> t2'' and ''e :: t1'', then ''(f e) :: t2''

The above rule can be naturally generalised: if ''f :: t1 -> t2 -> ... -> tn -> t'' and ''e1 :: t1'', ..., ''en :: tn'', then ''(f e1 ... en) :: t''

In what follows, we will use these rules to make judgements on our types.
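Rule (TApp) can be observed at work on the partial application ''zipWith (:)'' from the Intro: since ''zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]'' and ''(:) :: a -> [a] -> [a]'', unifying the first parameter of ''zipWith'' with the type of ''(:)'' forces ''b = [a]'' and ''c = [a]''. The annotation below (specialised to ''Integer'' for readability, with ''consEach'' an illustrative name of our own) is accepted by GHC:

```haskell
-- (TApp): zipWith has consumed its first argument (:), leaving a
-- two-argument function; b and c have been unified with [Integer].
consEach :: [Integer] -> [[Integer]] -> [[Integer]]
consEach = zipWith (:)

main :: IO ()
main = print (consEach [1, 2] [[10], [20]])  -- prints [[1,10],[2,20]]
```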
These rules have a twofold usage:
  * **deduce** the type of an expression, based on //existing knowledge//;
  * **make hypotheses** regarding the type of an expression (we shall not focus on this aspect in the presentation).

==== Type inference stage 1: Expression tree construction ====

Type inference for an expression ''e'' can be seen as having two stages. In the first stage, we:
  * construct the expression tree of ''e'';
  * make **hypotheses** regarding the types of **yet-untyped** expressions (e.g. variables).

We illustrate the first stage on the previous definition of ''g'':

  \f -> (f 1) + 1 :: ?
      f :: tf               (here we introduce tf as the type of f; this is a type hypothesis)
      (f 1) + 1 :: ?
          (+) :: ?
          (f 1) :: ?
              f :: t1 -> t2 (another hypothesis, stemming from the fact that f is applied on 1)
              1 :: ?
          1 :: ?

==== Type inference stage 2: Rule application ====

In this stage, we start from the previously-built tree, and:
  * **apply typing rules** to deduce types for new sub-expressions;
  * **perform type unification** (one aspect that we shall not elaborate on in this lecture).

This is equivalent to a **bottom-up** tree traversal: we start from the leaves, and progress to the root (i.e. the expression to be typed). Without delving into details, **type unification** is an important ingredient, because it allows us to infer **the most general** type of an expression.
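The "most general type" remark can be checked in GHC on the running example ''g f = (f 1) + 1'': full Hindley-Milner actually infers the polymorphic type ''(Num a, Num b) => (a -> b) -> b'', of which the ''Integer''-only type used in this lecture's simplified algorithm is an instance, so the annotation below compiles:

```haskell
-- The simplified algorithm of this lecture yields (Integer -> Integer) -> Integer;
-- GHC's inference yields the more general (Num a, Num b) => (a -> b) -> b,
-- which can be specialised to the annotation below.
g :: (Integer -> Integer) -> Integer
g f = (f 1) + 1

main :: IO ()
main = print (g (\x -> x * 10))  -- (1 * 10) + 1, prints 11
```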
Consider the following Haskell expression: ''\f x -> (f x, f 1)'', which defines a function that takes another function ''f'' and a value ''x'', and returns a pair: the first element of the pair is the application ''f x'', while the second is ''f 1'':
  * initially, we do not know what ''f'' is, hence it has **the most general type** (say ''tf'') - it can be anything;
  * judging by the application ''f x'', we deduce that ''f'' must be a function, of type ''a -> b'', where ''x :: a'';
  * judging by the application ''f 1'', we deduce that ''a'' must be ''Integer''.

The unification process combines the information collected so far:
  * ''tf'' must unify with (coincide with) ''a -> b'';
  * ''a'' must unify with ''Integer''.

The final type for the expression is: ''(Integer -> b) -> Integer -> (b, b)''

We illustrate the second stage of the type inference on the same example:

  \f -> (f 1) + 1 :: tf -> Integer             (via (TFun))
      f :: tf
      (f 1) + 1 :: Integer                     (via (TApp))
          (+) :: Integer -> Integer -> Integer (known from the Prelude, after the type synthesis of (+); also, t2 must unify with Integer)
          (f 1) :: t2                          (via (TApp); also, t1 must unify with Integer)
              f :: t1 -> t2
              1 :: Integer                     (via (TVar), from the Prelude)
          1 :: Integer                         (via (TVar), from the Prelude)

The (pseudo-)algorithmic procedure concludes with the following answer:
  * ''g :: tf -> Integer'', where
  * ''tf'' unifies with ''t1 -> t2''
  * ''t2'' unifies with ''Integer''
  * ''t1'' unifies with ''Integer''

After unification, the result shown to the programmer is:

  g :: (Integer -> Integer) -> Integer

Exercises. Find the types of the following expressions, by applying the type synthesis pseudo-algorithm:
  * ''map (\x -> [x+1])''
  * ''\f x -> if x then f x else x''
  * ''g f x = x && (f x)''
  * ''g f = f (g f)''
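The two stages above can be put together into a toy inferencer. The sketch below is ours, not GHC's implementation: it skips the occurs check and let-polymorphism, and all names (''walk'', ''infer'', etc.) are invented for illustration — but it reproduces the lecture's worked example for ''g = \f -> (f 1) + 1'':

```haskell
import qualified Data.Map as M

-- Types, as in the grammar: constants, variables (numbered hypotheses t0, t1, ...),
-- and function types.
data Type = TCon String | TV Int | TFun Type Type deriving Eq

instance Show Type where
  show (TCon c)   = c
  show (TV n)     = "t" ++ show n
  show (TFun a b) = wrap a ++ " -> " ++ show b
    where wrap t@(TFun _ _) = "(" ++ show t ++ ")"
          wrap t            = show t

-- Expression trees: the two construction rules, plus variables and literals.
data Expr = Var String | Lit Integer | Lam String Expr | App Expr Expr

type Subst = M.Map Int Type

-- Apply a substitution, chasing bound type variables.
walk :: Subst -> Type -> Type
walk s (TV n)     = maybe (TV n) (walk s) (M.lookup n s)
walk s (TFun a b) = TFun (walk s a) (walk s b)
walk _ t          = t

-- Unification: make two types coincide, extending the substitution
-- (no occurs check, for brevity).
unify :: Type -> Type -> Subst -> Maybe Subst
unify a b s = go (walk s a) (walk s b)
  where
    go t1 t2 | t1 == t2            = Just s
    go (TV n) t                    = Just (M.insert n t s)
    go t (TV n)                    = Just (M.insert n t s)
    go (TFun a1 b1) (TFun a2 b2)   = unify a1 a2 s >>= unify b1 b2
    go _ _                         = Nothing

-- infer env expr fresh subst = (type, next fresh variable, extended subst)
infer :: M.Map String Type -> Expr -> Int -> Subst -> Maybe (Type, Int, Subst)
infer env e n s = case e of
  Lit _   -> Just (TCon "Integer", n, s)             -- (TVar), from the Prelude
  Var x   -> do t <- M.lookup x env                  -- (TVar), from the environment
                Just (t, n, s)
  Lam x b -> do let tv = TV n                        -- hypothesis: x :: tn
                (tb, n', s') <- infer (M.insert x tv env) b (n + 1) s
                Just (TFun tv tb, n', s')            -- (TFun)
  App f a -> do (tf, n1, s1) <- infer env f n s
                (ta, n2, s2) <- infer env a n1 s1
                let tr = TV n2                       -- hypothesis: result :: tn
                s3 <- unify tf (TFun ta tr) s2       -- (TApp) forces tf = ta -> tr
                Just (tr, n2 + 1, s3)

-- g = \f -> (f 1) + 1, with (+) applied explicitly
gExpr :: Expr
gExpr = Lam "f" (App (App (Var "+") (App (Var "f") (Lit 1))) (Lit 1))

main :: IO ()
main = do
  let int  = TCon "Integer"
      env0 = M.fromList [("+", TFun int (TFun int int))]
  case infer env0 gExpr 0 M.empty of
    Just (t, _, s) -> print (walk s t)  -- (Integer -> Integer) -> Integer
    Nothing        -> putStrLn "type error"
```

Tracing ''main'' mirrors the stage-2 derivation above: the hypothesis ''t0'' for ''f'' is unified first with ''Integer -> t1'' (from ''f 1'') and then ''t1'' with ''Integer'' (from ''(+)''), yielding ''(Integer -> Integer) -> Integer''.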