lfa:dfa [2020/10/19 15:36] (current) pdmatei
We say a word $math[w] is **accepted** by a DFA $math[M] iff $math[(q_0,w)\vdash_M^*(q,\epsilon)] and $math[q\in F] ($math[q] is a final state).
$end
$end
===== Conclusion =====

We have shown so far that the problem $math[w\in L(e)] can be algorithmically solved by checking $math[w \in L(D)], where $math[D] is the DFA obtained via subset construction from an NFA $math[M], and $math[M] is obtained directly from the regular expression $math[e].

The algorithmic procedure for deciding $math[w \in L(D)] is quite straightforward, and is shown below:
<code haskell>
data DFA = DFA {delta :: Int -> Char -> Maybe Int, fin :: [Int] }

check :: String -> DFA -> Bool
check w dfa = chk (w ++ "!") dfa 0
  where
    chk (x:xs) dfa state
      | (x == '!') && state `elem` fin dfa = True
      | Just next <- delta dfa state x     = chk xs dfa next
      | otherwise                          = False
    chk [] _ _ = False
</code>
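As a sanity check, the procedure can be run on a small hand-built automaton. The DFA below is a hypothetical example (it accepts exactly the word ''ab''), and the ''DFA'' type and ''check'' are repeated so that the snippet is self-contained:

<code haskell>
data DFA = DFA { delta :: Int -> Char -> Maybe Int, fin :: [Int] }

check :: String -> DFA -> Bool
check w dfa = chk (w ++ "!") dfa 0
  where
    chk (x:xs) dfa state
      | (x == '!') && state `elem` fin dfa = True
      | Just next <- delta dfa state x     = chk xs dfa next
      | otherwise                          = False
    chk [] _ _ = False

-- hypothetical DFA accepting exactly "ab": 0 -a-> 1 -b-> 2 (final)
dfaAB :: DFA
dfaAB = DFA d [2]
  where
    d 0 'a' = Just 1
    d 1 'b' = Just 2
    d _ _   = Nothing
</code>

For instance, ''check "ab" dfaAB'' yields ''True'', while ''check "a" dfaAB'' and ''check "ba" dfaAB'' yield ''False''.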

===== Writing a lexical analyser from scratch =====

We now have all the necessary tools to implement a **lexical analyser**, or **scanner**, from scratch. We will implement such an analyser for the language IMP.
The input of the scanner consists of two parts:
  - the //spec//, containing a regular expression for each possible word which may appear in the input
  - the actual word to be scanned

Input 1. is specific to our language IMP, and is directly implemented in Haskell. It consists of a datatype for regular expressions, as well as a datatype describing each possible token. For each kind of token, we also implement a function ''String -> Token'', which builds the token once a matching substring is found. To make the code nicer, we include such functions in the ''DFA'' datatype.

<code haskell>
data RegExp = EmptyString | Atom Char | RegExp :| RegExp | RegExp :. RegExp | Kleene RegExp

plus :: RegExp -> RegExp
plus e = e :. (Kleene e)

-- (AUB)(aUb)*(0U1)*
example = ((Atom 'A') :| (Atom 'B')) :. (Kleene ((Atom 'a') :| (Atom 'b'))) :. (Kleene ((Atom '0') :| (Atom '1')))

data Token = If | Leq | Tru | Fals | .... | Var String | AVal Integer | BVal Bool | While | OpenPar | ...

ifToken = (Atom 'i') :. (Atom 'f')   -- we can build an auxiliary function for that
f_if :: String -> Token
f_if _ = If

varToken = plus ((Atom 'a') :| (Atom 'b'))   -- [a,b]+
f_var :: String -> Token
f_var s = Var s

intToken = plus ((Atom '0') :| (Atom '1'))
f_int :: String -> Token
f_int s = AVal ((read s) :: Integer)

data DFA = DFA {delta :: Int -> Char -> Maybe Int, fin :: [Int], getToken :: String -> Token }
</code>
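The semantics of ''RegExp'' can be sanity-checked with a naive matcher, before any automata are built. The functions ''match'' and ''matches'' below are hypothetical helpers, not part of the pipeline above; the relevant definitions are repeated so the snippet is self-contained:

<code haskell>
data RegExp = EmptyString | Atom Char | RegExp :| RegExp | RegExp :. RegExp | Kleene RegExp

plus :: RegExp -> RegExp
plus e = e :. (Kleene e)

varToken, intToken :: RegExp
varToken = plus ((Atom 'a') :| (Atom 'b'))   -- [a,b]+
intToken = plus ((Atom '0') :| (Atom '1'))   -- [0,1]+

-- match e s returns every suffix of s left over after matching a prefix against e
match :: RegExp -> String -> [String]
match EmptyString s           = [s]
match (Atom c) (x:xs) | c == x = [xs]
match (Atom _) _              = []
match (e1 :| e2) s            = match e1 s ++ match e2 s
match (e1 :. e2) s            = concatMap (match e2) (match e1 s)
match (Kleene e) s            =
  -- zero repetitions, or one input-consuming repetition followed by Kleene again
  s : concatMap (match (Kleene e)) [r | r <- match e s, length r < length s]

-- e matches s exactly when the empty suffix is among the results
matches :: RegExp -> String -> Bool
matches e s = "" `elem` match e s
</code>

This matcher is exponential in the worst case and only serves to test small examples, e.g. ''matches varToken "abba"'' yields ''True'' and ''matches intToken "ab"'' yields ''False''.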

We also recall some procedures defined in the previous lectures:
<code haskell>
-- converts a regular expression to an NFA
toNFA :: RegExp -> NFA
-- converts an NFA to a DFA
subset :: NFA -> DFA
-- checks if a word is accepted by the DFA
check :: String -> DFA -> Bool

-- takes a regular expression and its token function and builds the DFA
convert :: RegExp -> (String -> Token) -> DFA
</code>

The logic of the scanner is as follows. While processing the input, the scanner maintains:
  * the rest of the (yet-unprocessed) input (i.e. ''(x:xs)'')
  * the word which has been read so far, but whose token has not yet been identified (i.e. ''crt'')
  * a list of //configurations//, i.e. pairs //(current state, automaton)//, one for each regular expression which may be matched in the input
  * the list of tokens found so far

The function responsible for scanning is:
<code haskell>
lex :: String -> String -> [Config] -> [Token] -> [Token]
</code>

In the initial phase, all regular expressions are converted to DFAs, and each DFA is placed in its initial configuration. Then:
  * whenever the input is consumed (''x == '!''') and **all** DFAs are in their initial state (''null (filter (\(s,_) -> s /= 0) cfgs)''), return the list of tokens;
  * whenever some DFA is in a final state, take the **first** such DFA (''(a:_) <- [a | (s,a) <- cfgs, s `elem` fin a]''), build its respective token from the scanned word (''getToken a crt'') and add it to the list of tokens. The search process continues after:
    * resetting all configurations to the initial ones;
    * resetting the scanned (but unmatched) current word;
  * whenever no DFA is in a final state, we simply //move// each DFA to its successor state, and rule out configurations where a sink state was reached;
  * whenever no configuration remains, no regular expression matches the current input, and scanning stops by returning the empty list of tokens.

<code haskell>
type Config = (Int, DFA)

regularExpressions :: [(RegExp, String -> Token)]
regularExpressions = ...

dfas :: [DFA]
dfas = map (\(e,f) -> convert e f) regularExpressions

lexical :: String -> [Token]
lexical w = lex (w ++ "!") "" (map (\a -> (0,a)) dfas) []
  where lex :: String -> String -> [Config] -> [Token] -> [Token]
        lex (x:xs) crt cfgs tokens
          -- no continuing configuration exists: fail
          | null cfgs = []

          -- the input ended, and all DFAs are in their initial state
          -- (the tokens were accumulated in reverse)
          | (x == '!') && null (filter (\(s,_) -> s /= 0) cfgs) = reverse tokens

          -- we found a DFA which accepted; push the token of the first such DFA
          | (a:_) <- [a | (s,a) <- cfgs, s `elem` fin a] =
              lex (x:xs) "" (map (\a -> (0,a)) dfas) (getToken a crt : tokens)

          -- proceed with the next symbol, dropping configurations stuck in a sink
          | otherwise = lex xs (crt ++ [x]) [(s',a) | (s,a) <- cfgs, Just s' <- [delta a s x]] tokens
        lex [] _ _ _ = []
</code>
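Since ''convert'', ''toNFA'' and ''subset'' are not spelled out here, the scanner above cannot be run as-is. The sketch below replaces them with two hand-built DFAs (hypothetical automata for ''[ab]+'' identifiers and ''[01]+'' integers) so the logic of ''lex'' can be tested end-to-end. Note that, as written, ''lex'' emits a token as soon as the **first** final state is reached, not on the longest match:

<code haskell>
data Token = Var String | AVal Integer deriving (Show, Eq)

data DFA = DFA { delta    :: Int -> Char -> Maybe Int
               , fin      :: [Int]
               , getToken :: String -> Token }

type Config = (Int, DFA)

-- hypothetical hand-built DFA for [ab]+ : any 'a' or 'b' leads to the (final) state 1
varDFA :: DFA
varDFA = DFA d [1] Var
  where d _ c = if c `elem` "ab" then Just 1 else Nothing

-- hypothetical hand-built DFA for [01]+
intDFA :: DFA
intDFA = DFA d [1] (\s -> AVal (read s))
  where d _ c = if c `elem` "01" then Just 1 else Nothing

dfas :: [DFA]
dfas = [varDFA, intDFA]

lexical :: String -> [Token]
lexical w = lex (w ++ "!") "" (map (\d -> (0, d)) dfas) []
  where
    lex :: String -> String -> [Config] -> [Token] -> [Token]
    lex (x:xs) crt cfgs tokens
      -- no surviving configuration: no regular expression matches
      | null cfgs = []
      -- input consumed and every DFA is back in its initial state: done
      | x == '!' && null (filter (\(s,_) -> s /= 0) cfgs) = reverse tokens
      -- some DFA accepts the word read so far: emit its token and restart
      | (a:_) <- [a | (s, a) <- cfgs, s `elem` fin a] =
          lex (x:xs) "" (map (\d -> (0, d)) dfas) (getToken a crt : tokens)
      -- otherwise advance every DFA, dropping those that reach a sink
      | otherwise = lex xs (crt ++ [x]) [(s', a) | (s, a) <- cfgs, Just s' <- [delta a s x]] tokens
    lex [] _ _ _ = []
</code>

For example, ''lexical "ab01"'' produces ''[Var "a", Var "b", AVal 0, AVal 1]'': each single letter already puts ''varDFA'' in a final state, so ''Var "ab"'' is never formed.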