lfa:dfa [2020/10/19 15:36] (current) pdmatei

  
$def[DFA]
A **Deterministic Finite Automaton** is a tuple $math[(K,\Sigma,\delta,q_0,F)], where $math[K] is a set of states, $math[\Sigma] is the alphabet, $math[\delta : K \times \Sigma \rightarrow K] is a **total** transition function, $math[q_0 \in K] is the initial state, and $math[F \subseteq K] is the set of final states.
$end
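For instance, a DFA over $math[\Sigma = \{a,b\}] which accepts exactly the words containing an even number of $math[a]-s is $math[M = (\{q_0,q_1\},\{a,b\},\delta,q_0,\{q_0\})], where $math[\delta(q_0,a) = q_1], $math[\delta(q_1,a) = q_0] and $math[\delta(q_i,b) = q_i] for $math[i \in \{0,1\}]. Note that $math[\delta] is indeed total: every state has an outgoing transition for every symbol of the alphabet.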
  
We say a word $math[w] is **accepted** by a DFA $math[M] iff $math[(q_0,w)\vdash_M^*(q,\epsilon)] and $math[q\in F] ($math[q] is a final state).
$end
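As an illustration, take the (hypothetical) two-state DFA with $math[\delta(q_0,a)=q_0], $math[\delta(q_0,b)=q_1], $math[\delta(q_1,a)=q_0], $math[\delta(q_1,b)=q_1] and $math[F=\{q_1\}], which accepts the words ending in $math[b]. For $math[w = ab] we have $math[(q_0,ab) \vdash_M (q_0,b) \vdash_M (q_1,\epsilon)], and since $math[q_1 \in F], the word is accepted.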
  
  
 $end
  
===== Conclusion =====

We have shown so far that the problem $math[w\in L(e)] can be solved algorithmically by checking $math[w \in L(D)], where $math[D] is the DFA obtained via the subset construction from $math[M], and $math[M] is the NFA obtained directly from $math[e].

The algorithmic procedure for checking $math[w \in L(D)] is quite straightforward, and is shown below:
<code haskell>
-- states are encoded as Ints; delta is the transition function
-- (Nothing encodes a missing transition, i.e. a sink) and fin is
-- the list of final states
data DFA = DFA { delta :: Int -> Char -> Maybe Int, fin :: [Int] }

-- '!' is used as an end-of-input marker
check :: String -> DFA -> Bool
check w dfa = chk (w ++ "!") 0
  where
    chk (x:xs) state
      | x == '!'                       = state `elem` fin dfa
      | Just next <- delta dfa state x = chk xs next
      | otherwise                      = False
    chk [] _ = False   -- unreachable: the input always ends with '!'
</code>
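To see ''check'' in action, here is a small self-contained sketch with a hypothetical two-state DFA over $math[\{a,b\}] accepting the words that end in ''b'' (the ''DFA'' type and ''check'' are repeated so the snippet runs on its own):

<code haskell>
data DFA = DFA { delta :: Int -> Char -> Maybe Int, fin :: [Int] }

-- '!' is used as an end-of-input marker
check :: String -> DFA -> Bool
check w dfa = chk (w ++ "!") 0
  where
    chk (x:xs) state
      | x == '!'                       = state `elem` fin dfa
      | Just next <- delta dfa state x = chk xs next
      | otherwise                      = False
    chk [] _ = False

-- hypothetical example: state 1 is reached exactly after reading a 'b'
endsInB :: DFA
endsInB = DFA d [1]
  where
    d _ 'a' = Just 0
    d _ 'b' = Just 1
    d _ _   = Nothing
</code>

Here ''check "aab" endsInB'' evaluates to ''True'', while ''check "aba" endsInB'' evaluates to ''False''.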
===== Writing a lexical analyser from scratch =====

We now have all the necessary tools to implement a **lexical analyser**, or **scanner**, from scratch. We will implement such an analyser for the language IMP.
The input of the scanner consists of two parts:
  - the //spec//, containing a regular expression for each possible token which may appear in the input
  - the actual word to be scanned

Input 1. is specific to our language IMP, and is directly implemented in Haskell. It consists of a datatype for regular expressions, as well as a datatype describing each possible token. We also implement, for each token, a function ''String -> Token'' which builds the token once a matching substring is found. To make the code nicer, we include such functions in the ''DFA'' datatype.
<code haskell>
data RegExp = EmptyString | Atom Char | RegExp :| RegExp | RegExp :. RegExp | Kleene RegExp

-- one or more repetitions of e
plus :: RegExp -> RegExp
plus e = e :. Kleene e

-- (AUB)(aUb)*(0U1)*
example = ((Atom 'A') :| (Atom 'B')) :. (Kleene ((Atom 'a') :| (Atom 'b'))) :. (Kleene ((Atom '0') :| (Atom '1')))

data Token = If | Leq | Tru | Fals | .... | Var String | AVal Integer | BVal Bool | While | OpenPar | ...

ifToken = Atom 'i' :. Atom 'f'        -- we can build an auxiliary function for that
f_if :: String -> Token
f_if _ = If

varToken = plus (Atom 'a' :| Atom 'b')  -- [a,b]+
f_var :: String -> Token
f_var s = Var s

intToken = plus (Atom '0' :| Atom '1')
f_int :: String -> Token
f_int s = AVal (read s :: Integer)

data DFA = DFA { delta :: Int -> Char -> Maybe Int, fin :: [Int], getToken :: String -> Token }
</code>
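Before wiring such regular expressions into the NFA/DFA pipeline, it can be handy to sanity-check them directly. The following sketch does so using Brzozowski derivatives, a technique not otherwise used in this lecture; ''nullable'', ''deriv'', ''matches'' and the helper ''alt'' are names introduced here, and the ''RegExp'' definitions are repeated so the snippet runs on its own:

<code haskell>
data RegExp = EmptyString | Atom Char | RegExp :| RegExp | RegExp :. RegExp | Kleene RegExp

plus :: RegExp -> RegExp
plus e = e :. Kleene e

varToken, intToken :: RegExp
varToken = plus (Atom 'a' :| Atom 'b')  -- [a,b]+
intToken = plus (Atom '0' :| Atom '1')  -- [0,1]+

-- does the expression accept the empty word?
nullable :: RegExp -> Bool
nullable EmptyString = True
nullable (Atom _)    = False
nullable (e :| f)    = nullable e || nullable f
nullable (e :. f)    = nullable e && nullable f
nullable (Kleene _)  = True

-- Brzozowski derivative: L(deriv c e) = { w | c:w is in L(e) };
-- Nothing stands for the empty language, which RegExp cannot express
deriv :: Char -> RegExp -> Maybe RegExp
deriv _ EmptyString = Nothing
deriv c (Atom a)    = if a == c then Just EmptyString else Nothing
deriv c (e :| f)    = alt (deriv c e) (deriv c f)
deriv c (e :. f)
  | nullable e      = alt (fmap (:. f) (deriv c e)) (deriv c f)
  | otherwise       = fmap (:. f) (deriv c e)
deriv c (Kleene e)  = fmap (:. Kleene e) (deriv c e)

-- union of two possibly-empty languages
alt :: Maybe RegExp -> Maybe RegExp -> Maybe RegExp
alt (Just e) (Just f) = Just (e :| f)
alt (Just e) Nothing  = Just e
alt Nothing  mf       = mf

-- match a whole word by repeated derivation
matches :: RegExp -> String -> Bool
matches e []     = nullable e
matches e (c:cs) = maybe False (`matches` cs) (deriv c e)
</code>

For instance, ''matches varToken "abba"'' is ''True'', while ''matches varToken ""'' is ''False'' (''plus'' requires at least one repetition).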
We also recall some of our previously-defined procedures, already shown in the previous lectures:

<code haskell>
-- converts a regular expression to an NFA
toNFA :: RegExp -> NFA
-- converts an NFA to a DFA
subset :: NFA -> DFA
-- checks if a word is accepted by the DFA
check :: String -> DFA -> Bool

-- takes a regular expression and its token function and builds the DFA
convert :: RegExp -> (String -> Token) -> DFA
</code>
The logic of the scanner is as follows. While processing the input, the scanner will maintain:
  * the rest of the (yet-unprocessed) input (i.e. ''(x:xs)'')
  * the word which has been read so far, but whose token was not yet identified (i.e. ''crt'')
  * a list of //configurations//, i.e. pairs //(current state, automaton)//, one for each regular expression which may still be matched in the input
  * a list of tokens which were found so far
The function responsible for scanning is:
<code haskell>
lex :: String -> String -> [Config] -> [Token] -> [Token]
</code>
In the initial phase, all regular expressions are converted to DFAs, and each DFA is placed in its initial configuration. Then:
  * whenever the input is consumed (''x == '!''') and **all** DFAs are in their initial state (''null (filter (\(s,_) -> s /= 0) cfgs)''), return the list of tokens;
  * whenever some DFA is in a final state, take the **first** such DFA (''(a:_) <- [a | (s,a) <- cfgs, s `elem` fin a]''), build its respective token from the scanned word (''getToken a crt'') and add it to the list of tokens. The search process continues after:
      * resetting all configurations to the initial ones
      * resetting the scanned (but unmatched) current word
  * whenever no DFA is in a final state, we simply //move// each DFA to its successor state, and rule out configurations where a sink state was reached;
  * whenever no configuration remains, no regular expression matches the current input, and scanning stops by returning the empty list of tokens.
<code haskell>
type Config = (Int, DFA)

regularExpressions :: [(RegExp, String -> Token)]
regularExpressions = ...

dfas :: [DFA]
dfas = map (\(e,f) -> convert e f) regularExpressions

lexical :: String -> [Token]
lexical w = lex (w ++ "!") "" (map (\a -> (0,a)) dfas) []
  where
    lex :: String -> String -> [Config] -> [Token] -> [Token]
    lex (x:xs) crt cfgs tokens
      -- the input ended and all DFAs are in their initial state;
      -- the tokens were accumulated in reverse order
      | x == '!' && null (filter (\(s,_) -> s /= 0) cfgs) = reverse tokens

      -- we found a DFA which accepted; push the token of the first such DFA
      | (a:_) <- [a | (s,a) <- cfgs, s `elem` fin a]
          = lex (x:xs) "" (map (\a -> (0,a)) dfas) (getToken a crt : tokens)

      -- if no continuing configuration exists, fail
      | null cfgs = []

      -- proceed with the next symbol
      | otherwise = lex xs (crt ++ [x]) [(s',a) | (s,a) <- cfgs, Just s' <- [delta a s x]] tokens

    -- unreachable: the input always ends with '!'
    lex [] _ _ _ = []
</code>
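As a quick sanity check, here is a self-contained instantiation of the scanner logic described above, with two hand-written DFAs standing in for ''convert varToken f_var'' and ''convert intToken f_int'' (the transition tables are made up for the example, and the scanner code is repeated so the snippet runs on its own):

<code haskell>
data Token = Var String | AVal Integer deriving (Eq, Show)

data DFA = DFA { delta :: Int -> Char -> Maybe Int, fin :: [Int], getToken :: String -> Token }

type Config = (Int, DFA)

-- hand-written DFA for [a,b]+ (what convert varToken f_var would produce)
varDFA :: DFA
varDFA = DFA d [1] Var
  where d _ c | c `elem` "ab" = Just 1
        d _ _                 = Nothing

-- hand-written DFA for [0,1]+ (what convert intToken f_int would produce)
intDFA :: DFA
intDFA = DFA d [1] (\s -> AVal (read s))
  where d _ c | c `elem` "01" = Just 1
        d _ _                 = Nothing

dfas :: [DFA]
dfas = [varDFA, intDFA]

-- the scanner, following the logic described above
lexical :: String -> [Token]
lexical w = lex (w ++ "!") "" (map (\a -> (0,a)) dfas) []
  where
    lex (x:xs) crt cfgs tokens
      | x == '!' && null (filter (\(s,_) -> s /= 0) cfgs) = reverse tokens
      | (a:_) <- [a | (s,a) <- cfgs, s `elem` fin a]
          = lex (x:xs) "" (map (\a -> (0,a)) dfas) (getToken a crt : tokens)
      | null cfgs = []
      | otherwise = lex xs (crt ++ [x]) [(s',a) | (s,a) <- cfgs, Just s' <- [delta a s x]] tokens
    lex [] _ _ _ = []
</code>

For instance, ''lexical "a0"'' evaluates to ''[Var "a", AVal 0]''. Note that, as specified, a token is emitted as soon as //some// DFA accepts, so ''lexical "ab"'' yields the two one-letter tokens ''[Var "a", Var "b"]'' rather than ''[Var "ab"]''; a production scanner would instead apply the //maximal munch// rule and keep reading while some DFA can still advance.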