Differences

This shows you the differences between two versions of the page.

lfa:2024:lab03 [2024/10/20 11:51]
cata_chiru created
lfa:2024:lab03 [2024/10/22 08:13] (current)
cata_chiru
Line 1: Line 1:
 ====== 3. Regular Expressions ======
  
-===== 3.1. Regex practice =====
 +===== 3.1. Natural Language / DFA $ \rightarrow $ Regex conversion =====
  
-For each of the exercises from DFA practice write a regex describing the same language.
 +For each of the exercises from DFA Seminary 1 write a regex describing the same language.
  
 +**3.1.1.** $ L=\{w \in \{0,1\}^* \text{ | w contains an odd number of ones} \} $
  
-<hidden 2.2.1.>
 +{{:lfa:2022:lfa2022_lab2_ex1.png?400|}} 
 + 
 +<​hidden ​3.1.1.>
 $math[0^*10^*(10^*10^*)^*]
 </hidden>
  
  
-<hidden 2.2.2.>
 +**3.1.2.** The language of binary words which contain **exactly** two ones 
 + 
 +{{:​lfa:​2022:​lfa2022_lab2_ex2.png?​550|}} 
 + 
 +<​hidden ​3.1.2.>
 $math[0^*10^*10^*]
 </hidden>
  
 +**3.1.3.** The language of binary words which encode odd numbers (the last digit is the least significant)
  
-<hidden 2.2.3.>
 +{{:lfa:2022:lfa2022_lab2_ex3.png?300|}} 
 + 
 +<​hidden ​3.1.3.>
 $math[(0 \cup 1)^*1]
 </hidden>
  
  
-<hidden 2.2.4.>
-Creating the regex directly is not easy, but the pattern can be derived by studying the DFA and trying to 'replace' states with more complex transitions. For a step-by-step description of the method, wait until the lecture/lab about the DFA to regex transformation algorithm.
 +**3.1.4.** The set of all binary strings having the substring 00101 
 + 
 +{{:​lfa:​2022:​ex_8_dfa.png?​400|}} 
 + 
 +<​hidden ​3.1.4.>​ 
 +$math[(0 \cup 1)^*00101(0 \cup 1)^*] 
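The four answers in this section can be sanity-checked by brute force. The sketch below is only an illustration: it assumes Python's `re` syntax (where $ \cup $ is written `|`), and the names `words`, `CHECKS` and `mismatches` are made up for this example. It compares each regex with a direct test of the property on every binary word up to a chosen length. An empty list of mismatches is evidence, not a proof.

```python
import re
from itertools import product

def words(alphabet, max_len):
    """Yield every word over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

# exercise -> (regex in Python syntax, predicate describing the language)
CHECKS = {
    "3.1.1": (r"0*10*(10*10*)*", lambda w: w.count("1") % 2 == 1),
    "3.1.2": (r"0*10*10*", lambda w: w.count("1") == 2),
    "3.1.3": (r"(0|1)*1", lambda w: w.endswith("1")),
    "3.1.4": (r"(0|1)*00101(0|1)*", lambda w: "00101" in w),
}

def mismatches(regex, predicate, max_len=10):
    """Words on which the regex and the predicate disagree."""
    pattern = re.compile(regex)
    return [w for w in words("01", max_len)
            if bool(pattern.fullmatch(w)) != predicate(w)]

for name, (regex, predicate) in CHECKS.items():
    print(name, "mismatches:", mismatches(regex, predicate))
```

`fullmatch` is used because a regex in the formal sense must describe the whole word, not just a part of it.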
 +</​hidden>​ 
 + 
 +===== 3.2. Formation rules (concatenation, union, Kleene star) ===== 
 + 
 +**3.2.1.** 
 + 
 +$math[A=\{ 0^{2k} \mid k \geq 1 \}] 
 + 
 +$ B = \{0, \epsilon \}$ 
 +\\ 
 +$ AB = ? $ 
 +\\ 
 + 
 +<hidden 3.2.1> 
 +A = {00, 0000, 000000, ...} 
 + 
 +AB = {00, 00**0**, 0000, 0000**0**, ...} <- each word of AB is a word of A followed by a word of B, one concatenation for every pair in the cartesian product A × B. 
 + 
 +  * where the words in the language that have an even length are obtained by combining a word from A with the word ε from B 
 +  * and those with an odd length are obtained by combining a word from A with the word 0 from B 
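The concatenation above can be reproduced on a finite sample of A. This is a minimal sketch: the helper name `concat` and the truncation of A at k = 4 are assumptions made only so that the sets stay finite.

```python
def concat(X, Y):
    """Language concatenation: every word of X followed by every word of Y."""
    return {x + y for x in X for y in Y}

# Finite sample of A = { 0^(2k) | k >= 1 }, truncated at k = 4 (assumption).
A = {"0" * (2 * k) for k in range(1, 5)}
B = {"0", ""}  # the empty string plays the role of ε

AB = concat(A, B)
print(sorted(AB, key=len))
```

Every length from 2 up to the truncation bound shows up, which matches the observation that AB contains both the even-length and the odd-length words.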
 +</​hidden>​ 
 + 
 + 
 + 
 +**3.2.2.** 
 + 
 +$math[A = \{ 0^n 1^n \mid n \geq 1 \}] 
 +\\ 
 +$ B = \{ 1^n \mid n \geq 1 \} $ 
 +\\ 
 +$ AB = ? $ \\ $ BA = ? $ 
 + 
 +<hidden 3.2.2> 
 +A is the language of words made of a block of zeros followed by a block of ones of the same length (the same value of n is used for both blocks). 
 + 
 +The n in the definition of B is completely **unrelated** to the n used to define A. 
 + 
 +So B is the language of non-empty words made only of ones, i.e. B = L(11*). 
 + 
 +A = {01, 0011, 000111, 00001111 ...} 
 + 
 +B = {1, 11, 111 ...} 
 + 
 +AB = {01**1**, 0011**1**, 000111**1**, ..., 01**11**, 0011**11**, 000111**11**, ... } 
 + 
 +BA = {**1**01, **1**0011, **1**000111,​ ..., **11**01, **11**0011, ..., **111**01, **111**0011 ...} 
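To see that the order of concatenation matters, AB and BA can be computed on truncated samples. The bound `N` and the helper `concat` are assumptions of this sketch, not part of the exercise.

```python
def concat(X, Y):
    """Language concatenation: every word of X followed by every word of Y."""
    return {x + y for x in X for y in Y}

N = 3  # truncation bound (assumption), keeps both samples finite
A = {"0" * n + "1" * n for n in range(1, N + 1)}  # 0^n 1^n
B = {"1" * n for n in range(1, N + 1)}            # 1^n, independent n

AB = concat(A, B)
BA = concat(B, A)

by_length = lambda w: (len(w), w)
print("AB:", sorted(AB, key=by_length))
print("BA:", sorted(BA, key=by_length))
print("common words:", AB & BA)
```

Every word of AB starts with 0 and every word of BA starts with 1, so the two samples have no word in common.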
 +</​hidden>​ 
 + 
 +**3.2.3** 
 + 
 +$math[A = \{ 0^n 1^n 0^m \mid m \geq n  \geq 1 \}] 
 +\\ 
 +$ B = \{ 0^n \mid n \geq 1 \} $ 
 +\\ 
 +$ AB = ? $ \\ $ BA = ? $ 
 + 
 + 
 +<hidden 3.2.3> 
 +$math[AB = \{ 0^n 1^n 0^{m+k} \mid m \geq n \geq 1, k \geq 1 \}]. So $math[AB = \{ 0^n 1^n 0^j \mid j > n \geq 1 \}], a proper subset of A: words such as 010, where the final block of zeros is exactly as long as the first one, are in A but not in AB. 
 + 
 +Note that the n in the definition of language A **is different** from the n in the definition of B; they are **independent** when used in defining different sets/languages. However, when n is used several times in the definition of one language, such as the 2 times it appears in language A, it is **the same** value. 
 + 
 +$math[BA = \{ 0^{(n+k)} 1^n 0^m \mid m \geq n \geq 1, k \geq 1 \}]. Equivalently: $math[BA = \{0^x 1^y 0^z \mid x > y \geq 1 \text{ and } z \geq y \}] 
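A quick way to test any claimed equality or inclusion between A, AB and BA is to compare them on all short words. The sketch below assumes membership tests written directly from the definitions; the function names are hypothetical.

```python
import re
from itertools import product

def in_A(w):
    """w in A = { 0^n 1^n 0^m | m >= n >= 1 }."""
    m = re.fullmatch(r"(0+)(1+)(0+)", w)
    if m is None:
        return False
    zeros, ones, tail = (len(g) for g in m.groups())
    return zeros == ones <= tail

def in_B(w):
    """w in B = { 0^n | n >= 1 }."""
    return re.fullmatch(r"0+", w) is not None

def in_concat(w, in_X, in_Y):
    """w in XY iff some split w = u v has u in X and v in Y."""
    return any(in_X(w[:i]) and in_Y(w[i:]) for i in range(len(w) + 1))

def words(max_len):
    """All binary words of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            yield "".join(letters)

in_A_not_AB = [w for w in words(8) if in_A(w) and not in_concat(w, in_A, in_B)]
in_AB_not_A = [w for w in words(8) if in_concat(w, in_A, in_B) and not in_A(w)]
print("in A but not in AB:", in_A_not_AB)
print("in AB but not in A:", in_AB_not_A)
```

The same `in_concat(w, in_B, in_A)` call gives a membership test for BA, which can be compared against any closed-form description of that language.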
 +</​hidden>​ 
 + 
 +**3.2.4.** 
 + 
 +$ A = ∅ $ 
 +\\ 
 +$ B = \{ 1^n \mid n \geq 1 \} $ 
 +\\ 
 +$ AB = ? $ 
 +\\ 
 +$ A^* = ? $ 
 +\\ 
 +$ B^* = ? $ 
 +\\ 
 + 
 + 
 +<hidden 3.2.4> 
 + 
 +AB = ∅  (because A is empty, so the cartesian product leads to an empty set) 
 + 
 +A* = {ε} (epsilon is always part of Kleene star) 
 + 
 +B* = {ε} U B U BB U BBB U ... = {ε} U {$ 1^n \mid n \geq 1 $} = {$ 1^n \mid n \geq 0 $} (epsilon is always part of Kleene star, and concatenating words of B only yields longer sequences of ones) 
 +So basically B* = L(1*) 
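The three results can be checked on finite slices of the languages. This is a sketch under assumptions: `star` only returns the words of L* up to a length bound, and B is truncated at n = 5.

```python
def concat(X, Y):
    """Language concatenation: every word of X followed by every word of Y."""
    return {x + y for x in X for y in Y}

def star(L, max_len):
    """All words of L* with length <= max_len (a finite slice of Kleene star)."""
    result = {""}    # ε is always in L*
    frontier = {""}
    while frontier:
        # Extend the newest words by one more word of L, keep only new ones.
        frontier = {w for w in concat(frontier, L) if len(w) <= max_len} - result
        result |= frontier
    return result

A = set()                                # the empty language
B = {"1" * n for n in range(1, 6)}       # sample of { 1^n | n >= 1 }

print("AB =", concat(A, B))
print("A* =", star(A, 5))
print("B* up to length 5 =", sorted(star(B, 5), key=len))
```

Note the difference between the empty language and the language containing only the empty word: concatenating with the first gives nothing, while its star still contains ε.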
 + 
 +</​hidden>​ 
 + 
 + 
 +===== 3.3 Regex Equivalence ===== 
 + 
 +Are the following pairs of regexes equivalent? 
 + 
 +** 3.3.1 ** 
 + 
 +$ E1 = ab|a|b $ 
 +\\ 
 +$ E2 = (a|\epsilon)(b|\epsilon) $ 
 + 
 + 
 +<hidden 3.3.1><​note important>​ 
 +We can observe that E2 accepts ε while E1 does not, so they are not equivalent. 
 +\\ 
 +Another approach is to compute the language of each expression (since both are finite) and check whether the two languages are equal: L(E1) = {ab, a, b} and L(E2) = {ab, a, b, ε}. 
 +</​note></​hidden>​ 
 + 
 + 
 +** 3.3.2 ** 
 + 
 +$ E1 = a(b|c)(d|e)|abb|abc $ 
 +\\ 
 +$ E2 = ab(b|c|d|e)|acd|ace $ 
 + 
 + 
 +<hidden 3.3.2><​note important>​ 
 +Since both E1 and E2 have a finite language, we can just compute both languages and check whether they are equal. 
 +Both languages are L = {abb, abc, abd, abe, acd, ace}, therefore the expressions are equivalent. 
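For finite languages the comparison can be automated. The sketch below assumes Python's `re` syntax and a hypothetical helper `language` that lists every word over the alphabet, up to a length bound, that the regex matches in full. The bound 3 is enough here because both expressions only describe words of length 3.

```python
import re
from itertools import product

def language(regex, alphabet, max_len):
    """Set of words of length <= max_len fully matched by `regex`."""
    pattern = re.compile(regex)
    found = set()
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if pattern.fullmatch(w):
                found.add(w)
    return found

E1 = r"a(b|c)(d|e)|abb|abc"
E2 = r"ab(b|c|d|e)|acd|ace"

L1 = language(E1, "abcde", 3)
L2 = language(E2, "abcde", 3)
print(sorted(L1))
print("equal:", L1 == L2)
```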
 +</​note></​hidden>​ 
 + 
 + 
 +** 3.3.3 ** 
 + 
 +$ E1 = (a\mid b)^*aa^* \mid \epsilon $ 
 +\\ 
 +$ E2 = (a\mid ba)^*(b\mid ba)^* $ 
 + 
 + 
 +<hidden 3.3.3><​note important>​ 
 +Both E1 and E2 have an infinite language, so listing all their words and comparing them is not an option. 
 + 
 +We can see that, for example, E2 accepts b while E1 does not (every non-empty word of E1 ends in a), so the expressions are not equivalent. 
 + 
 +Fun fact: E1 was proposed by a student as a solution for 3.4.1 some years ago, while E2 is the actual solution. 
 +</​note></​hidden>​ 
 + 
 + 
 +** 3.3.4 ** 
 + 
 +$ E1 = ((ab^*a)^+b)^* $ 
 +\\ 
 +$ E2 = (a(b\mid aa)^*ab)^* $ 
 + 
 + 
 +<hidden 3.3.4><​note important>​ 
 +Both E1 and E2 have an infinite language, so listing all their words and comparing them is not an option. 
 + 
 +We can try looking for words that are accepted by one and not by the other, but we can't easily find such words. ! This does not mean they are equivalent ! 
 + 
 +We should transform each expression into its minimal DFA and check if they are the same (number of states, transitions, alphabet, initial/final states) (renaming of states might be needed). 
 + 
 +Plot twist: They are the same. 
 + 
 +The purpose of this exercise is to understand how to approach regex equivalence, not how to solve this given comparison per se. 
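Looking for a distinguishing word can be mechanised as a bounded search. This is only a sketch: `first_difference` is a hypothetical helper, the expressions are rewritten in Python's `re` syntax, and a search up to a fixed length can refute equivalence but can never prove it (that still requires the minimal DFA comparison).

```python
import re
from itertools import product

def first_difference(r1, r2, alphabet, max_len):
    """Shortest word (length <= max_len) accepted by exactly one regex, else None."""
    p1, p2 = re.compile(r1), re.compile(r2)
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if bool(p1.fullmatch(w)) != bool(p2.fullmatch(w)):
                return w
    return None

# Exercise 3.3.3: "X | ε" is written "(X)?" in Python syntax.
E1_333 = r"((a|b)*aa*)?"
E2_333 = r"(a|ba)*(b|ba)*"
print("3.3.3:", first_difference(E1_333, E2_333, "ab", 8))

# Exercise 3.3.4
E1_334 = r"((ab*a)+b)*"
E2_334 = r"(a(b|aa)*ab)*"
print("3.3.4:", first_difference(E1_334, E2_334, "ab", 10))
```

A result of `None` only means that no counterexample exists up to the chosen length.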
 +</​note></​hidden>​ 
 + 
 + 
 +===== 3.4. Writing Regular Expressions ===== 
 + 
 +**3.4.0.** Write a regular expression for the language of arithmetic expressions containing +, * and numbers. 
 +**Hint:** you can abbreviate $ 0 \cup 1 \cup ... \cup 9 $ by $ [0-9] $ 
 + 
 + 
 +<hidden 3.4.0> 
 +We start by defining the regex for a number: 
 + 
 +  * a number can be a digit from 0 to 9 => [0-9] 
 +  * a number can have several digits, but the first one can't be 0 => [1-9][0-9]* 
 +  * so we have either one of these options: [0-9] U [1-9][0-9]* 
 +  * But we can write it in a more concise way: 0 U [1-9][0-9]* <= a number of 1 or more digits that doesn't start with 0, or 0 itself 
 + 
 +Having decided on the regex for the number, we can write a regex for expressions 
 + 
 +> (0 U [1-9][0-9]*) ( ('​+'​ U '​*'​) (0 U [1-9][0-9]*) )* 
 + 
 +This can be understood with a clearer/​intuitive notation, which however is **not exactly formally correct** (not a regex): 
 + 
 +> number ( ('​+'​ U '​*'​) number )* 
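The informal "number" notation corresponds to building the regex from named pieces. A minimal sketch in Python's `re` syntax, where the constant names are chosen for this example and $ \cup $ becomes `|`:

```python
import re

NUMBER = r"(0|[1-9][0-9]*)"   # 0 itself, or digits without a leading zero
OPERATOR = r"[+*]"            # '+' U '*'
EXPRESSION = re.compile(NUMBER + "(" + OPERATOR + NUMBER + ")*")

for text in ["0", "12+7*300", "1+", "007", "3**4", ""]:
    print(repr(text), bool(EXPRESSION.fullmatch(text)))
```

Substituting the definition of `NUMBER` into `EXPRESSION` gives back a single, formally correct regex. The named version is just easier to read.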
 + 
 +</​hidden>​ 
 + 
 + 
 +**3.4.1.** Write a regular expression for $ L = \{ \omega \in \{0,1\}^* \mid \text{EVERY sequence of two or more consecutive zeros appears before ANY sequence of two or more consecutive ones} \} $ 
 + 
 + 
 +<hidden 3.4.1> 
 + 
 +> (1 U ε) ( 0 0* (1 U ε) )*  (0 U ε) ( 1 1* (0 U ε) )* 
 + 
 +The left part can start with either 1 or 0. After that it can contain sequences of zeros of any length, but no sequence of ones longer than 1. 
 + 
 +This way we make sure that any sequence of 2 or more zeros precedes any sequence of 2 or more ones. 
 + 
 +Then we repeat the same logic for the right part with the roles of 0 and 1 swapped, making sure no sequence of 2 or more zeros appears on this side. 
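The regex can be compared with the property on all short words. This sketch makes two assumptions: the regex is rewritten in Python syntax, with (1 U ε) written as `1?`, and the property is read as "no occurrence of 11 starts before an occurrence of 00".

```python
import re
from itertools import product

# (1 U ε)(00*(1 U ε))* (0 U ε)(11*(0 U ε))* in Python syntax
REGEX = re.compile(r"1?(00*1?)*0?(11*0?)*")

def zeros_before_ones(w):
    """True iff no occurrence of 11 starts before an occurrence of 00."""
    if "00" not in w or "11" not in w:
        return True
    return w.rfind("00") < w.find("11")

def mismatches(max_len):
    """Binary words on which the regex and the property disagree."""
    bad = []
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            w = "".join(letters)
            if bool(REGEX.fullmatch(w)) != zeros_before_ones(w):
                bad.append(w)
    return bad

print(mismatches(10))
```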
 + 
 +</​hidden>​ 
 + 
 + 
 +**3.4.2.** 
 +Find a regular expression for the set of all binary strings with the property that none of its prefixes has two more 0's than 1's nor two more 1's than 0's. 
 + 
 +<hidden 3.4.2> 
 + 
 +The given property implies that the word is composed of repeated sequences of '10' and '01' (so, splitting the word into consecutive pairs, the two symbols of each pair are different), possibly followed by a final 1 or 0: 
 + 
 +After any prefix of even length with an equal number of 0's and 1's (including $\epsilon$), you can either finish the word, add a single character and then finish the word (both of these keeping the rule true), or add at least 2 characters. In the last case, if the two characters are equal, the prefix ending at these two characters breaks the rule, so they must be different and we are back at a prefix with an equal number of 0's and 1's. 
 + 
 +$(01\cup10)^*(0\cup1\cup\epsilon)$ 
 + 
 +</hidden>​ 
 + 
 +**3.4.3.** Write a regular expression which generates the language accepted by A. Then try to find the simplest and easiest to understand way to write it. 
 + 
 +{{:​lfa:​graf1.png?​400|}} 
 + 
 + 
 +<hidden 3.4.3> 
 + 
 +Looking at the DFA we can tell state 3 is a sink state. We can simplify the DFA's drawing by ignoring it. 
 +Let's see what words are accepted: 
 +  * ab*ab (when we don't loop on state 1) 
 +  * ab*(<a way to leave state 2 and return back to it>)*ab => ab*(aab*)*ab 
 +  * anything that repeats the previous expression several times 
 +  * ε (the initial state is also a final state) 
 +  * from the previous 2 observations => we can use Kleene star 
 + So, the regex is: 
 +> ( ab*(aab*)*ab )* 
 + 
 +<​note>​ 
 + 
 +For now, we will try to determine the equivalence between a regex and a DFA intuitively. But this is not a rigorous approach. 
 + 
 +We will learn later (and you can review this exercise with the future knowledge) that once we find a regex intuitively, we should check that the DFA and the regex are actually equivalent, by transforming the regex into an NFA and then checking for non-distinguishable states. Alternatively, we can use a DFA to regex conversion algorithm. (Ask your TA about this if you want more details right now, or wait until you're actually learning for the exam and reviewing all the courses.) 
 + 
 +</​note>​ 
 + 
 +</​hidden>​ 
 + 
 + 
 +==== Conclusion ==== 
 +<hidden Conclusion ><​note important>​ 
 +What have we learned today? 
 + 
 +**Plural of regex is regrets** 
 +</​note></​hidden>​
  
  
-$math[((1(01^*0)^*1)\cup0)^*]