lfa:dfa [2020/10/19 15:36] (current) pdmatei
We say a word $math[w] is **accepted** by a DFA $math[M] iff $math[(q_0,w)\vdash_M^*(q,\epsilon)] and $math[q\in F] ($math[q] is a final state).
$end
$end
===== Conclusion =====

We have shown so far that the problem $math[w\in L(e)] can be algorithmically solved by checking $math[w \in L(D)], where $math[D] is the DFA obtained via subset construction from an NFA $math[M], and $math[M] is obtained directly from the regular expression $math[e].

The algorithmic procedure for deciding $math[w \in L(D)] is quite straightforward, and is shown below:
<code haskell>
data DFA = DFA {delta :: Int -> Char -> Maybe Int, fin :: [Int] }

check :: String -> DFA -> Bool
check w dfa = chk (w ++ "!") dfa 0
  where
    chk (x:xs) dfa state
      | (x == '!') && state `elem` fin dfa = True
      | Just next <- delta dfa state x     = chk xs dfa next
      | otherwise                          = False
    chk [] _ _ = False
</code>
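As a sanity check, the procedure can be run on a small hand-built automaton. The DFA below is a hypothetical example (it accepts exactly the word ''ab''), and the ''DFA'' type and ''check'' are repeated so that the snippet is self-contained:

<code haskell>
data DFA = DFA { delta :: Int -> Char -> Maybe Int, fin :: [Int] }

check :: String -> DFA -> Bool
check w dfa = chk (w ++ "!") dfa 0
  where
    chk (x:xs) dfa state
      | (x == '!') && state `elem` fin dfa = True
      | Just next <- delta dfa state x     = chk xs dfa next
      | otherwise                          = False
    chk [] _ _ = False

-- hypothetical DFA accepting exactly "ab": 0 -a-> 1 -b-> 2 (final)
dfaAB :: DFA
dfaAB = DFA d [2]
  where
    d 0 'a' = Just 1
    d 1 'b' = Just 2
    d _ _   = Nothing
</code>

For instance, ''check "ab" dfaAB'' yields ''True'', while ''check "a" dfaAB'' and ''check "ba" dfaAB'' yield ''False''.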

===== Writing a lexical analyser from scratch =====

We now have all the necessary tools to implement a **lexical analyser**, or **scanner**, from scratch. We will implement such an analyser for the language IMP.
The input of the scanner consists of two parts:
  - the //spec//, containing a regular expression for each possible word which may appear in the input
  - the actual word to be scanned

Input 1. is specific to our language IMP, and is directly implemented in Haskell. It consists of a datatype for regular expressions, as well as a datatype describing each possible token. For each kind of token, we also implement a function ''String -> Token'', which builds the token once a matching substring is found. To make the code nicer, we include such functions in the ''DFA'' datatype.

<code haskell>
data RegExp = EmptyString | Atom Char | RegExp :| RegExp | RegExp :. RegExp | Kleene RegExp

plus :: RegExp -> RegExp
plus e = e :. (Kleene e)

-- (AUB)(aUb)*(0U1)*
example = ((Atom 'A') :| (Atom 'B')) :. (Kleene ((Atom 'a') :| (Atom 'b'))) :. (Kleene ((Atom '0') :| (Atom '1')))

data Token = If | Leq | Tru | Fals | .... | Var String | AVal Integer | BVal Bool | While | OpenPar | ...

ifToken = (Atom 'i') :. (Atom 'f')   -- we can build an auxiliary function for that
f_if :: String -> Token
f_if _ = If

varToken = plus ((Atom 'a') :| (Atom 'b'))   -- [a,b]+
f_var :: String -> Token
f_var s = Var s

intToken = plus ((Atom '0') :| (Atom '1'))
f_int :: String -> Token
f_int s = AVal ((read s) :: Integer)

data DFA = DFA {delta :: Int -> Char -> Maybe Int, fin :: [Int], getToken :: String -> Token }
</code>
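The semantics of ''RegExp'' can be sanity-checked with a naive matcher, before any automata are built. The functions ''match'' and ''matches'' below are hypothetical helpers, not part of the pipeline above; the relevant definitions are repeated so the snippet is self-contained:

<code haskell>
data RegExp = EmptyString | Atom Char | RegExp :| RegExp | RegExp :. RegExp | Kleene RegExp

plus :: RegExp -> RegExp
plus e = e :. (Kleene e)

varToken, intToken :: RegExp
varToken = plus ((Atom 'a') :| (Atom 'b'))   -- [a,b]+
intToken = plus ((Atom '0') :| (Atom '1'))   -- [0,1]+

-- match e s returns every suffix of s left over after matching a prefix against e
match :: RegExp -> String -> [String]
match EmptyString s           = [s]
match (Atom c) (x:xs) | c == x = [xs]
match (Atom _) _              = []
match (e1 :| e2) s            = match e1 s ++ match e2 s
match (e1 :. e2) s            = concatMap (match e2) (match e1 s)
match (Kleene e) s            =
  -- zero repetitions, or one input-consuming repetition followed by Kleene again
  s : concatMap (match (Kleene e)) [r | r <- match e s, length r < length s]

-- e matches s exactly when the empty suffix is among the results
matches :: RegExp -> String -> Bool
matches e s = "" `elem` match e s
</code>

This matcher is exponential in the worst case and only serves to test small examples, e.g. ''matches varToken "abba"'' yields ''True'' and ''matches intToken "ab"'' yields ''False''.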

We also recall some procedures defined in the previous lectures:
<code haskell>
-- converts a regular expression to an NFA
toNFA :: RegExp -> NFA
-- converts an NFA to a DFA
subset :: NFA -> DFA
-- checks if a word is accepted by the DFA
check :: String -> DFA -> Bool

-- takes a regular expression and its token function and builds the DFA
convert :: RegExp -> (String -> Token) -> DFA
</code>

The logic of the scanner is as follows. While processing the input, the scanner maintains:
  * the rest of the (yet-unprocessed) input (i.e. ''(x:xs)'')
  * the word which has been read so far, but whose token has not yet been identified (i.e. ''crt'')
  * a list of //configurations//, i.e. pairs //(current state, automaton)//, one for each regular expression which may be matched in the input
  * the list of tokens found so far

The function responsible for scanning is:
<code haskell>
lex :: String -> String -> [Config] -> [Token] -> [Token]
</code>

In the initial phase, all regular expressions are converted to DFAs, and each DFA is placed in its initial configuration. Then:
  * whenever the input is consumed (''x == '!''') and **all** DFAs are in their initial state (''null (filter (\(s,_) -> s /= 0) cfgs)''), return the list of tokens;
  * whenever some DFA is in a final state, take the **first** such DFA (''(a:_) <- [a | (s,a) <- cfgs, s `elem` fin a]''), build its respective token from the scanned word (''getToken a crt'') and add it to the list of tokens. The search process continues after:
    * resetting all configurations to the initial ones;
    * resetting the scanned (but unmatched) current word;
  * whenever no DFA is in a final state, we simply //move// each DFA to its successor state, and rule out configurations where a sink state was reached;
  * whenever no configuration remains, no regular expression matches the current input, and scanning stops by returning the empty list of tokens.

<code haskell>
type Config = (Int, DFA)

regularExpressions :: [(RegExp, String -> Token)]
regularExpressions = ...

dfas :: [DFA]
dfas = map (\(e,f) -> convert e f) regularExpressions

lexical :: String -> [Token]
lexical w = lex (w ++ "!") "" (map (\a -> (0,a)) dfas) []
  where lex :: String -> String -> [Config] -> [Token] -> [Token]
        lex (x:xs) crt cfgs tokens
          -- no continuing configuration exists: fail
          | null cfgs = []

          -- the input ended, and all DFAs are in their initial state
          -- (the tokens were accumulated in reverse)
          | (x == '!') && null (filter (\(s,_) -> s /= 0) cfgs) = reverse tokens

          -- we found a DFA which accepted; push the token of the first such DFA
          | (a:_) <- [a | (s,a) <- cfgs, s `elem` fin a] =
              lex (x:xs) "" (map (\a -> (0,a)) dfas) (getToken a crt : tokens)

          -- proceed with the next symbol, dropping configurations stuck in a sink
          | otherwise = lex xs (crt ++ [x]) [(s',a) | (s,a) <- cfgs, Just s' <- [delta a s x]] tokens
        lex [] _ _ _ = []
</code>
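Since ''convert'', ''toNFA'' and ''subset'' are not spelled out here, the scanner above cannot be run as-is. The sketch below replaces them with two hand-built DFAs (hypothetical automata for ''[ab]+'' identifiers and ''[01]+'' integers) so the logic of ''lex'' can be tested end-to-end. Note that, as written, ''lex'' emits a token as soon as the **first** final state is reached, not on the longest match:

<code haskell>
data Token = Var String | AVal Integer deriving (Show, Eq)

data DFA = DFA { delta    :: Int -> Char -> Maybe Int
               , fin      :: [Int]
               , getToken :: String -> Token }

type Config = (Int, DFA)

-- hypothetical hand-built DFA for [ab]+ : any 'a' or 'b' leads to the (final) state 1
varDFA :: DFA
varDFA = DFA d [1] Var
  where d _ c = if c `elem` "ab" then Just 1 else Nothing

-- hypothetical hand-built DFA for [01]+
intDFA :: DFA
intDFA = DFA d [1] (\s -> AVal (read s))
  where d _ c = if c `elem` "01" then Just 1 else Nothing

dfas :: [DFA]
dfas = [varDFA, intDFA]

lexical :: String -> [Token]
lexical w = lex (w ++ "!") "" (map (\d -> (0, d)) dfas) []
  where
    lex :: String -> String -> [Config] -> [Token] -> [Token]
    lex (x:xs) crt cfgs tokens
      -- no surviving configuration: no regular expression matches
      | null cfgs = []
      -- input consumed and every DFA is back in its initial state: done
      | x == '!' && null (filter (\(s,_) -> s /= 0) cfgs) = reverse tokens
      -- some DFA accepts the word read so far: emit its token and restart
      | (a:_) <- [a | (s, a) <- cfgs, s `elem` fin a] =
          lex (x:xs) "" (map (\d -> (0, d)) dfas) (getToken a crt : tokens)
      -- otherwise advance every DFA, dropping those that reach a sink
      | otherwise = lex xs (crt ++ [x]) [(s', a) | (s, a) <- cfgs, Just s' <- [delta a s x]] tokens
    lex [] _ _ _ = []
</code>

For example, ''lexical "ab01"'' produces ''[Var "a", Var "b", AVal 0, AVal 1]'': each single letter already puts ''varDFA'' in a final state, so ''Var "ab"'' is never formed.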