Differences
This shows you the differences between two versions of the page.
lfa:nfa [2018/10/03 11:24] pdmatei |
lfa:nfa [2020/10/19 15:38] (current) pdmatei |
||
---|---|---|---|
Line 7: | Line 7: | ||
In more formal terms, we have a //generator// - a means to construct a language from a regular expression, but we lack a means for //accepting// (words of) languages. | In more formal terms, we have a //generator// - a means to construct a language from a regular expression, but we lack a means for //accepting// (words of) languages. | ||
- | We shall informally illustrate an algorithm for verifying the membership $math[w \in L((A\cup B)(a\cup b)*(0 \cup 1)*)], in Haskell: | ||
- | |||
- | <code haskell> | ||
- | check ('A':xs) = check1 xs | ||
- | check ('B':xs) = check1 xs | ||
- | check _ = False | ||
- | |||
- | check1 ('a':xs) = check1 xs | ||
- | check1 ('b':xs) = check1 xs | ||
- | check1 ('0':xs) = check2 xs | ||
- | check1 ('1':xs) = check2 xs | ||
- | check1 [] = True | ||
- | check1 _ = False | ||
- | |||
- | check2 ('0':xs) = check2 xs | ||
- | check2 ('1':xs) = check2 xs | ||
- | check2 [] = True | ||
- | check2 _ = False | ||
- | </code> | ||
- | |||
- | The algorithm proceeds in **three stages**: | ||
- | * in the first stage, we check whether ''A'' or ''B'' is encountered; if so, we move on to the second stage, otherwise we reject the string; | ||
- | * in the second stage, we check whether ''a'', ''b'', ''0'' or ''1'' is encountered; if ''a'' or ''b'' is found, we continue inspection in the second stage; if ''0'' or ''1'' is found, we continue inspection in the third stage; finally, if the string terminates, we report true; | ||
- | * in the third stage, we look for binary digits in a similar way: ''0'' or ''1'' keeps us in the third stage, and we report true if the string terminates. | ||
- | |||
- | The same strategy can be written in a more elegant way as: | ||
- | <code haskell> | ||
- | -- assumes that '!' does not occur in w, since it is appended as an end marker | ||
- | check w = chk (w ++ "!") [0] | ||
- |   where chk (x:xs) set | ||
- |           | x `elem` ['A', 'B'] && 0 `elem` set = chk xs [1,2,3] | ||
- |           | x `elem` ['a', 'b'] && 1 `elem` set = chk xs [1,2,3] | ||
- |           | x `elem` ['0', '1'] && 2 `elem` set = chk xs [2,3] | ||
- |           | x == '!' && 3 `elem` set = True | ||
- |           | otherwise = False | ||
- | </code> | ||
- | |||
- | Here, we have introduced the symbol ''!'' to mark the string termination, and thus make the whole code nicer to write. We have also made the //stage idea// explicit. The procedure ''chk'' maintains a set of //stages// or //states//: | ||
- | * $math[0\in set] indicates that we are in the initial stage, where we are looking for ''A'' or ''B'' | ||
- | * $math[1\in set] indicates that we have read a sequence of alphabetic symbols: ''a''s, ''b''s may follow | ||
- | * $math[2\in set] indicates that the sequence of alphabetic symbols has ended; only ''0''s or ''1''s may follow; | ||
- | * $math[3\in set] indicates that the string may also terminate at any time - ''3'' is an //end-stage//. | ||
- | |||
- | We start in the initial stage. Whenever a symbol is read, the set of possible stages is updated: for instance, when ''0'' or ''1'' is read, only stages ''2'' and ''3'' remain possible. | ||
- | |||
- | The idea behind our code could be expressed as the following diagram: | ||
- | {{:lfa:example.png|}} | ||
- | where | ||
- | * each node is a **state**, which indicates the current stage in the recognition of the input word; | ||
- | * each arrow is a **transition** which takes the recognition process from one stage to another; | ||
- | * here, $math[Q_0] is the initial state, $math[Q'] is the state from which any lower-case alphanumeric symbol in the alphabet may follow, and $math[Q''] is the state from which only numerics are accepted. | ||
- | |||
- | The string can terminate successfully in both $math[Q'] and $math[Q''], which is shown via double circles. | ||
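The diagram can also be written down directly as a transition function. The following is only a sketch: the constructor names ''Q0'', ''Q1'', ''Q2'' (standing for $math[Q_0], $math[Q'], $math[Q'']) and the functions ''step'', ''accepting'' and ''accepts'' are hypothetical names introduced here, and the accepting states are assumed to be $math[Q'] and $math[Q''], as suggested by ''check1 [] = True'' and ''check2 [] = True'' in the first listing.

```haskell
import Control.Monad (foldM)

-- Hypothetical names: Q0, Q1, Q2 stand for Q_0, Q' and Q'' in the diagram.
data State = Q0 | Q1 | Q2 deriving (Eq, Show)

-- One transition; Nothing means that no arrow exists, so the word is rejected.
step :: State -> Char -> Maybe State
step Q0 c | c `elem` "AB" = Just Q1
step Q1 c | c `elem` "ab" = Just Q1
          | c `elem` "01" = Just Q2
step Q2 c | c `elem` "01" = Just Q2
step _  _ = Nothing

-- Assumption: the double-circled states are Q' and Q''.
accepting :: State -> Bool
accepting s = s `elem` [Q1, Q2]

-- Run the automaton from Q0 over the whole word, then test the reached state.
accepts :: String -> Bool
accepts w = maybe False accepting (foldM step Q0 w)
```

With these definitions, ''accepts "Aab01"'' holds, while ''accepts "A0a"'' does not, because no arrow labelled ''a'' leaves the third stage.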
==== Nondeterministic automata ==== | ==== Nondeterministic automata ==== | ||
Line 195: | Line 143: | ||
{{:lfa:slide4.jpg|}} | {{:lfa:slide4.jpg|}} | ||
- | |||
- | From the proof, a naive algorithm can be easily implemented. We illustrate it in Haskell: | ||
- | <code haskell> | ||
- | data RegExp = EmptyString | | ||
- | Atom Char | | ||
- | RegExp :| RegExp | | ||
- | RegExp :. RegExp | | ||
- | Kleene RegExp deriving Show | ||
- | |||
- | data NFA = NFA {delta :: [(Int,Char,Int)], fin :: [Int]} deriving Show | ||
- | </code> | ||
- | We begin with a list-based representation of the transition function $math[\delta]. We assume that the symbol ''e'' is reserved for the empty string and, as the code below shows, that the initial state of every NFA we build is ''0''. | ||
- | |||
- | <code haskell> | ||
- | -- the strategy is to increment each state by i | ||
- | relabel :: Int -> NFA -> NFA | ||
- | relabel i (NFA delta fin) = NFA (map (\(s,c,s')->(s+i,c,s'+i)) delta) (map (+i) fin) | ||
- | </code> | ||
- | |||
- | Since we have chosen to represent states as integers, we use a re-labelling function to ensure uniqueness. Re-labelling relies on state increment. For instance, by calling ''relabel (f1+1) n'', we ensure that the NFA ''n'' will have the initial state equal to ''f1+1''. Note that ''f1'' is the final state of the first NFA and, in our code, also its largest state label, so the states of the two automata cannot overlap. | ||
- | |||
- | |||
- | <code haskell> | ||
- | toNFA :: RegExp -> NFA | ||
- | toNFA EmptyString = NFA [(0,'e',1)] [1] | ||
- | toNFA (Atom c) = NFA [(0,c,1)] [1] | ||
- | toNFA (e :. e') = let NFA delta1 [f1] = toNFA e | ||
- |                       NFA delta2 [f2] = relabel (f1+1) (toNFA e') | ||
- |                   in NFA (delta1++delta2++[(f1,'e',f1+1)]) [f2] | ||
- | toNFA (e :| e') = let NFA delta1 [f1] = relabel 1 (toNFA e) | ||
- |                       NFA delta2 [f2] = relabel (f1+1) (toNFA e') | ||
- |                   in NFA (delta1 ++ delta2 ++[(0,'e',1), | ||
- |                                               (0,'e',f1+1), | ||
- |                                               (f1,'e',f2+1), | ||
- |                                               (f2,'e',f2+1)]) [f2+1] | ||
- | -- a fresh initial state 0 and a fresh final state f+1 are added, so that the loop | ||
- | -- cannot be re-entered from an enclosing expression (e.g. in (a*b)*) | ||
- | toNFA (Kleene e) = let NFA delta [f] = relabel 1 (toNFA e) | ||
- |                    in NFA (delta++[(0,'e',1),(f,'e',f+1),(0,'e',f+1),(f,'e',1)]) [f+1] | ||
- | </code> | ||
- | |||
- | Apart from relabelling, the code follows exactly the steps from the proof. | ||
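To see such an NFA at work, one can simulate it on a word by tracking the set of states reachable so far. The following is a minimal sketch, not part of the original page: the helper names ''closure'', ''move'', ''acceptsNFA'' and ''example'' are hypothetical. It assumes, as above, that ''e'' labels empty-string transitions (so input words must not contain ''e'') and that the initial state is ''0''. The ''NFA'' type is repeated so that the block is self-contained.

```haskell
import Data.List (nub)

data NFA = NFA {delta :: [(Int,Char,Int)], fin :: [Int]} deriving Show

-- All states reachable from ss using only 'e' (empty-string) transitions.
closure :: NFA -> [Int] -> [Int]
closure n ss
  | all (`elem` ss) next = ss
  | otherwise            = closure n (nub (ss ++ next))
  where next = [t | (s,'e',t) <- delta n, s `elem` ss]

-- States reachable from ss by consuming the symbol c exactly once.
move :: NFA -> Char -> [Int] -> [Int]
move n c ss = nub [t | (s,c',t) <- delta n, c' == c, s `elem` ss]

-- Assumption: the initial state is 0 and the word does not contain 'e'.
acceptsNFA :: NFA -> String -> Bool
acceptsNFA n w = any (`elem` fin n) (foldl stepSet (closure n [0]) w)
  where stepSet ss c = closure n (move n c ss)

-- The automaton produced by the concatenation case for Atom 'a' :. Atom 'b'.
example :: NFA
example = NFA [(0,'a',1),(2,'b',3),(1,'e',2)] [3]
```

Here ''acceptsNFA example "ab"'' holds, while ''acceptsNFA example "a"'' does not, since state ''3'' is not reached after reading ''a'' alone.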
- | |||
- | |||