===== Nondeterministic Automata ===== ==== Motivation ==== In the previous lecture we have investigated the **semantics** of regular expressions and saw that how we can determine the language accepted by, e.g. $math[(A\cup B)(a\cup b)*(0 \cup 1)*]. However, it is not straightforward to **compute** whether a given word $math[w] is a member of $math[L(e)] and this is precisely the task of the **lexical stage**. In more formal terms, we have a //generator// - a means to construct a language from a regular expression, but we lack a means for //accepting// (words of) languages. ==== Nondeterministic automata ==== The key idea behind the previous algorithm can be generalised to **any** regular expression, and its associated code, written in the same style, yields a similar diagram. In practice, it is the diagram, i.e. the **nondeterministic finite automaton** (NFA), which helps us generate the code. $def[NFA] A **non-deterministic finite automaton** is a tuple $math[M=(K,\Sigma,\Delta,q_0,F)] where: * $math[K] is a finite set of **states** * $math[\Sigma] is an alphabet * $math[\Delta] is a **subset** of $math[K \times \Sigma^* \times K] and is called a **transition relation** * $math[q_0\in K] is **the initial state** * $math[F\subseteq K] is **the set of final states** $end As an example, consider: * $math[K=\{q_0,q_1,q_2\}] * $math[\Sigma=\{0,1\}] * $math[\Delta=\{(q_0,0,q_0),(q_0,1,q_0),(q_0,0,q_1),(q_1,1,q_2)\}] * $math[F = \{q_2\}] Notice that the NFA gets stuck for certain inputs, i.e. it **does not accept**. **Graphical notation** $def[Configuration] A **configuration** of an NFA, is a **member** of $math[K\times \Sigma^*]. $end Informally, configurations capture a **snapshot** of the execution of an NFA. The snapshot consists of the: * **current state** of the automaton and * **the rest of the word** from the input. For instance, $math[(q_0,0001)] is the **initial configuration** of the automaton from our example, on input $math[0001]. $def[Transition] We call $math[\vdash_M \subseteq (K\times \Sigma^*) \times (K\times\Sigma^*)] a **one-step** move relation of automaton $math[M]. The relation describes how the automaton **must behave** to reach one configuration from another. Formally: * $math[(q,w) \vdash_M (q',w')] if and only if there exists $math[u\in\Sigma^*], such that $math[w=uw'] ($math[u] is a prefix of $math[w]) and $math[(q,u,q')\in\Delta]: from state $math[q] on input $math[u] we reach state $math[q']. We call $math[\vdash_M^*], the **reflexive and transitive closure of** $math[\vdash_M], i.e. the **zero-or-more step(s)** move of automaton $math[M]. $end For instance, in our previous example, $math[(q_0,0001)\vdash_M(q_0,001)] and also $math[(q_0,0001)\vdash_M(q_1,001)]. At the same time, $math[(q_0,0001)\vdash_M^*(q_2,\epsilon)]. Can you figure out the sequence of steps? $prop[Acceptance] A word $math[w] is accepted by an NFA $math[M] iff $math[(q_0,w)\vdash_M^*(q,\epsilon)] and $math[q\in F]. In other words, after the word $math[w] was processed by the automaton, we reach a **final state**. $end Notice that the word $math[0001] is indeed accepted by the automaton $math[M] from our example. $def[Language accepted by an NFA] Given an NFA $math[M], we define $math[L(M) = \{w\mid w\text{ is accepted by} M\}] as the language **accepted** by $math[M]. We say $math[M] accepts the language $math[L(M)]. $end === Execution tree for Nondeterministic Finite Automata === Illustration of an AFN for $math[(A\cup B)(a\cup b)*(0 \cup 1)*]. There are two ways of writing this automaton: * one that follows exactly our previous algorithm sketch. * one that employs **epsilon transitions**. **Epsilon transitions** are a means for jumping from a state to another without consuming the input. It is a useful way of defining automata, because it empowers us to **combine** multiple automata procedures. ==== Nondeterminism as imperfect information ==== Notice that **nondeterminism** actually refers to our imperfect information regarding the current state of the automaton. **Nondeterminism** means that, after consuming some part (prefix) of a word, //several concrete states may be possible current states//. ==== From Regular Expressions to NFAs ==== While Regular Expressions are a natural instrument for declaring (or generating) tokens, NFAs are a **natural instrument for accepting** tokens (i.e. their respective language). The following theorem shows how this can be achieved. $justtheorem For every language $math[L(E)] defined by the regular expression $math[E], there exists an NFA $math[M], such that $math[L(M)=L(E)]. $end This theorem is particularly important, because it also provides an **algorithm** for constructing NFAs from regular expressions. $proof Let $math[E] be a regular expression. We construct an NFA, with: * **exactly one initial state**. * **exactly one final state**. * **no transitions from the final state**. The proof is by **induction** over the expression structure. **Basis case $math[E=\emptyset]** We construct the following automaton: {{:lfa:emptyset.jpg|}} It is clear that this automaton accepts no word, and obeys the three aforementioned conditions. **Basis case $math[E=\epsilon]** We construct the following automaton: {{:lfa:emptyword.jpg|}} hich only accepts the empty word. **Basis case $math[E=c]** where $math[c] is a symbol of the alphabet. We construct the following automaton: {{:lfa:char.jpg|}} Since regular expressions have three //inductive rules// for constructing regular expressions (union, concatenation and Kleene-star), we have to treat three induction steps: **Induction step $math[E=E_1E_2] (concatenation)** Suppose $math[E_1] and $math[E_2] are regular expressions for which NFAs can be built (**induction hypothesis**). We build the following NFA which accepts all words generated by the regular expression $math[E_1E_2]. {{:lfa:concat.jpg|}} **Induction step $math[E=E_1\cup E_2] (union)** Suppose $math[E_1] and $math[E_2] are regular expressions for which NFAs can be built (**induction hypothesis**). We build the following NFA which accepts all words generated by the regular expression $math[E_1\cup E_2]. {{:lfa:union.jpg|}} **Induction step $math[E^*] (union)** Suppose $math[E] is regular expression for which an NFA can be built (**induction hypothesis**). We build the following NFA which accepts all words generated by the regular expression $math[E*]. {{:lfa:kleene.jpg|}} $end We illustrate the algorithmic procedure on our regular expression $math[(A\cup B)(a\cup b)*(0 \cup 1)*]. The result is shown below: {{:lfa:slide4.jpg|}}