3. Regular expressions


$ A=\{ 0^{2k} \mid k \geq 1 \}$

$ B = \{0, \epsilon \}$
$ AB = ? $

A = {00, 0000, 000000, …}

AB = {00, 000, 0000, 00000, …} ← this is the concatenation of the sets (languages) A and B: for every pair in the cartesian product A × B, the word from A is followed by the word from B.

  • where the words in the language that have an even length are obtained by combining a word from A with the word ε from B
  • and those with an odd length are obtained by combining a word from A with the word 0 from B
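The construction above can be tried out on a finite slice of A. The following is only a minimal sketch: the helper name concat and the bound k ≤ 3 are assumptions made for illustration, and the empty string stands in for ε.

```python
def concat(A, B):
    """Concatenation of two finite languages: each word of A followed by each word of B."""
    return {a + b for a in A for b in B}

# Finite slice of A = {0^(2k) | k >= 1}; the bound k <= 3 is arbitrary.
A = {"0" * (2 * k) for k in range(1, 4)}
B = {"0", ""}  # "" plays the role of epsilon

AB = sorted(concat(A, B), key=len)
print(AB)
```

Even lengths come from pairing a word of A with ε, odd lengths from pairing it with 0.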


$ A = \{ 0^n 1^n \mid n \geq 1 \}$
$ B = \{ 1^n \mid n \geq 1 \} $
$ AB = ? $
$ BA = ? $

A is the language of words made of n zeros followed by n ones: each word starts with zero, ends with one, and has as many ones as zeros (the same value of n is used for both).

The n in the definition of B is completely unrelated to the n used to define A.

So, B is the language of words made only of ones, with a length of at least 1, so basically B = L(11*).

A = {01, 0011, 000111, 00001111 …}

B = {1, 11, 111 …}

AB = {011, 00111, 0001111, …, 0111, 001111, 00011111, 0000111111, … }, that is, $ AB = \{ 0^n 1^m \mid m > n \geq 1 \} $

BA = {101, 10011, 1000111, …, 1101, 110011, …, 11101, 1110011 …}


$ A = \emptyset $
$ B = \{ 1^n \mid n \geq 1 \} $
$ AB = ? $
$ A^* = ? $
$ B^* = ? $

AB = ∅ (because A is empty, there is no word of A to concatenate with a word of B)

A* = {ε} (epsilon is always part of Kleene star)

B* = {ε} U B U BB U BBB U … = {$ 1^n \mid n \geq 0 $} (epsilon is always part of Kleene star, and concatenating words of B only produces longer sequences of ones). So basically B* = L(1*).
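A bounded version of the Kleene star can be computed for finite languages, which makes both answers easy to check. This is a sketch under assumptions: the function name star and the length bound max_len are hypothetical, and B is cut off at n ≤ 3.

```python
def star(L, max_len):
    """Words of L* with length <= max_len, for a finite language L."""
    result = {""}      # epsilon is in L* even when L is empty
    frontier = {""}
    while frontier:
        # Extend each newly found word by one more word of L.
        frontier = {w + x for w in frontier for x in L
                    if len(w + x) <= max_len} - result
        result |= frontier
    return result

A = set()                            # the empty language
B = {"1" * n for n in range(1, 4)}   # finite slice of {1^n | n >= 1}

print(star(A, 5))
print(sorted(star(B, 5), key=len))
```

The loop stops once no new word fits under the bound, so it terminates even if L contains the empty word.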


$ A = \{ 0^n 1^n 0^m \mid m \geq n \geq 1 \}$
$ B = \{ 0^n \mid n \geq 1 \} $
$ AB = ? $
$ BA = ? $

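$ AB = \{ 0^n 1^n 0^{m+k} \mid m \geq n \geq 1, k \geq 1 \}$ . So $ AB \subsetneq A $: every word of AB is also in A, but the words of A that have exactly as many trailing zeros as ones (such as 010) are not in AB, because $ m + k \geq n + 1 $.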

Note that the n in the definition of language A is different from the n in the definition of B: they are independent because they are used to define different sets/languages. However, when n is used several times in the definition of one language, such as the 2 times it appears in language A, it denotes the same value.

$ BA = \{ 0^{(n+k)} 1^n 0^m \mid m \geq n \geq 1, k \geq 1 \}$ . Equivalently: $ BA = \{0^x 1^y 0^z \mid x > y \geq 1 \text{ and } z \geq y \}$
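Membership in A, AB and BA can be tested directly on small words, which helps when comparing these languages. The sketch below is only illustrative: the predicate names in_A, in_AB and in_BA are assumptions, and a word is considered to be in a concatenation if some split point works.

```python
import re

def in_A(w):
    """w in A = {0^n 1^n 0^m | m >= n >= 1}."""
    m = re.fullmatch(r"(0+)(1+)(0+)", w)
    return bool(m) and len(m[1]) == len(m[2]) <= len(m[3])

def in_AB(w):
    """Some split w = u v with u in A and v in B = {0^k | k >= 1}."""
    return any(in_A(w[:i]) and re.fullmatch(r"0+", w[i:])
               for i in range(1, len(w)))

def in_BA(w):
    """Some split w = u v with u in B and v in A."""
    return any(re.fullmatch(r"0+", w[:i]) and in_A(w[i:])
               for i in range(1, len(w)))

for w in ["010", "0100", "0010"]:
    print(w, in_A(w), in_AB(w), in_BA(w))
```

The word 010 is in A but in neither AB nor BA: AB needs at least one extra trailing zero and BA at least one extra leading zero.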

3.2.1. Write a regular expression for the language of arithmetic expressions containing +, * and numbers. Hint: you can abbreviate $ 0 \cup 1 \cup \ldots \cup 9 $ by $ [0-9] $

We start by defining the regex for a number:

  • a number can be a digit from 0 to 9 ⇒ [0-9]
  • a number can have several digits, but the first one can't be 0 ⇒ [1-9][0-9]*
  • so we have either one of these options: [0-9] U [1-9][0-9]*
  • But we can write it in a more concise way: 0 U [1-9][0-9]* ⇐ a number of 1 or more digits that doesn't start with 0, or 0 itself

Having decided on the regex for a number, we can write a regex for expressions:

(0 U [1-9][0-9]*) ( ('+' U '*') (0 U [1-9][0-9]*) )*

The same regex is easier to read in the following shorthand, where number abbreviates 0 U [1-9][0-9]* (so, strictly speaking, the shorthand itself is not a regex):

number ( ('+' U '*') number )*
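The shorthand maps naturally onto Python's re syntax, where | plays the role of U. This is a minimal sketch, with the names NUMBER, EXPR and is_expression chosen only for illustration.

```python
import re

# 0 U [1-9][0-9]*: a single 0, or digits that do not start with 0.
NUMBER = r"(?:0|[1-9][0-9]*)"
# number ( ('+' U '*') number )*
EXPR = re.compile(rf"{NUMBER}(?:[+*]{NUMBER})*")

def is_expression(s):
    """True if the whole string s is an arithmetic expression."""
    return EXPR.fullmatch(s) is not None

for s in ["12+3*0", "7", "012", "1+", "1++2"]:
    print(s, is_expression(s))
```

fullmatch is used so that the whole string has to match, not just a prefix.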

3.2.2. Write a regular expression for $ L = \{ \omega \text{ in } \text{{0,1}} ^* \text{ | EVERY sequence of two or more consecutive zeros appears before ANY sequence of two or more consecutive ones} \} $

(1 U ε) ( 0 0* (1 U ε) )* (0 U ε) ( 1 1* (0 U ε) )*

The first half, (1 U ε) ( 0 0* (1 U ε) )*, can start with either 1 or 0. It can contain sequences of zeros of any length, but no sequence of ones with a length bigger than 1.

This way we make sure that any sequence of 2 or more zeros precedes any sequence of 2 or more ones.

The second half, (0 U ε) ( 1 1* (0 U ε) )*, repeats the same logic on the right with the roles of 0 and 1 swapped, making sure no sequence of 2 or more zeros appears on this side.
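A brute-force comparison over all short binary words is a cheap way to gain confidence in such a regex. The sketch below assumes a direct restatement of the property (the last occurrence of 00 starts before the first occurrence of 11) and writes the regex in Python syntax, with x? standing for (x U ε). The bound of 8 symbols is arbitrary, and agreement up to that bound is evidence, not a proof.

```python
import re
from itertools import product

# (1 U eps)(00*(1 U eps))* (0 U eps)(11*(0 U eps))* in Python syntax
REGEX = re.compile(r"1?(?:0+1?)*0?(?:1+0?)*")

def in_L(w):
    """Every occurrence of 00 starts before every occurrence of 11."""
    last_00, first_11 = w.rfind("00"), w.find("11")
    return last_00 == -1 or first_11 == -1 or last_00 < first_11

# Words on which the regex and the property disagree.
mismatches = [w for n in range(9)
              for w in map("".join, product("01", repeat=n))
              if in_L(w) != (REGEX.fullmatch(w) is not None)]
print(mismatches)
```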

3.2.3. Write a DFA for $ L(( 10 \cup 0) ^* ( 1 \cup \epsilon )) $
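One possible DFA can be sketched as a transition table. It rests on the observation that this regex describes exactly the binary words that do not contain 11, since every 1 except possibly the last must be followed by a 0. The state names q0, q1 and dead are hypothetical, and this is not necessarily the automaton the exercise has in mind.

```python
# q0: start state, the last symbol read was not 1
# q1: the last symbol read was 1
# dead: the word contained 11 (sink state)
DELTA = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q0", ("q1", "1"): "dead",
    ("dead", "0"): "dead", ("dead", "1"): "dead",
}
FINAL = {"q0", "q1"}

def accepts(w):
    """Run the DFA on w and report whether it ends in a final state."""
    state = "q0"
    for c in w:
        state = DELTA[(state, c)]
    return state in FINAL

for w in ["", "1", "0101", "1001", "0110"]:
    print(repr(w), accepts(w))
```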

3.2.4. Write a regular expression which generates the accepted language of A. Then try to find the simplest and easiest to understand way to write it.

Looking at the DFA, we can tell that state 3 is a sink state, so we can ignore it when reading the drawing. Let's see what words are accepted:

  • ab*ab (when we don't loop on state 1)
  • ab*(<a way to leave state 2 and return back to it>)*ab ⇒ ab*(aab*)*ab
  • anything that repeats the previous expression several times
  • ε (the initial state is also a final state)
  • from the previous 2 observations ⇒ we can use Kleene star

So, the regex is:

( ab*(aab*)*ab )*

For now, we determine the equivalence between a regex and a DFA intuitively, but this is not a rigorous approach.

We will learn later (and you can review this exercise with the future knowledge) that once we find a regex intuitively, we should check that the DFA and the regex are actually equivalent, either by transforming the regex into an NFA and then checking for indistinguishable states, or by using a DFA-to-regex conversion algorithm. (Ask your TA about this if you want more details right now, or wait until you are studying for the exam and reviewing all the courses.)

3.2.5. Find a regular expression for the set of all binary strings with the property that none of its prefixes has two more 0's than 1's nor two more 1's than 0's.

The given property implies that the word is composed of repeated sequences of '10' and '01' (so, counting positions from 0, elements at odd positions are different from their left neighbour), possibly followed by a final 1 or 0:

(01 U 10)* (0 U 1 U ε)

After any prefix of even length with an equal number of 0's and 1's (including $\epsilon$), you can either finish the word, add a single character and then finish the word (both of these keep the rule true), or add at least 2 characters. In the last case the next two characters must differ: if they are equal, the prefix ending at these two characters has two more 0's than 1's (or two more 1's than 0's) and breaks the rule.
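The reasoning above corresponds to a regex such as (01 U 10)* (0 U 1 U ε), and it can be checked mechanically against the prefix property on all short words. This is a sketch: the function name balanced_prefixes and the length bound are assumptions, and agreement up to the bound is only supporting evidence.

```python
import re
from itertools import product

REGEX = re.compile(r"(?:01|10)*[01]?")

def balanced_prefixes(w):
    """No prefix has two more 0's than 1's, nor two more 1's than 0's."""
    diff = 0  # number of 1's minus number of 0's in the current prefix
    for c in w:
        diff += 1 if c == "1" else -1
        if abs(diff) >= 2:
            return False
    return True

ok = all(balanced_prefixes(w) == (REGEX.fullmatch(w) is not None)
         for n in range(11)
         for w in map("".join, product("01", repeat=n)))
print(ok)
```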


Are the following pairs of regexes equivalent?


$ E1 = ab|a|b $
$ E2 = (a|\epsilon)(b|\epsilon) $

We can observe that E2 accepts ε, while E1 does not, so they are not equivalent.
Another approach is to compute the language of each expression (since both are finite) and check whether the two languages are equal: L(E1) = {a, b, ab}, while L(E2) = {ε, a, b, ab}.


$ E1 = a(b|c)(d|e)|abb|abc $
$ E2 = ab(b|c|d|e)|acd|ace $

Since both E1 and E2 have a finite language, we can just compute both languages and check whether they are equal. Both expressions generate L = {abb, abc, abd, abe, acd, ace}, therefore they are equivalent.
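For finite languages this computation is easy to automate. The following sketch uses a hypothetical helper cat that concatenates finite sets of words. The regexes are translated by hand into set operations, so the translation itself is an assumption to double-check.

```python
from itertools import product

def cat(*langs):
    """Concatenate finite languages, each given as a set of strings."""
    return {"".join(parts) for parts in product(*langs)}

# E1 = a(b|c)(d|e) | abb | abc
L1 = cat({"a"}, {"b", "c"}, {"d", "e"}) | {"abb", "abc"}
# E2 = ab(b|c|d|e) | acd | ace
L2 = cat({"ab"}, {"b", "c", "d", "e"}) | {"acd", "ace"}
print(sorted(L1), L1 == L2)

# First pair: ab|a|b versus (a|eps)(b|eps), with "" standing for epsilon.
P1 = {"ab", "a", "b"}
P2 = cat({"a", ""}, {"b", ""})
print(sorted(P2 - P1))  # words accepted only by the second expression
```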


$ E1 = (a\mid b)^*aa^* \mid \epsilon $
$ E2 = (a\mid ba)^*(b\mid ba)^* $

Both E1 and E2 have an infinite language, so enumerating and comparing all their words is not an option.

We can see that, for example, E2 accepts b, while E1 does not accept it (every non-empty word of E1 ends in a), so the expressions are not equivalent.
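Looking for such a distinguishing word can be automated by trying all words in order of length. This is a sketch with assumed names: first_difference and its max_len parameter are hypothetical, and the regexes are rewritten in Python syntax with an optional group standing for the "| ε" alternative.

```python
import re
from itertools import product

def first_difference(r1, r2, alphabet="ab", max_len=6):
    """Shortest word (up to max_len) matched by exactly one pattern, or None."""
    p1, p2 = re.compile(r1), re.compile(r2)
    for n in range(max_len + 1):
        for w in map("".join, product(alphabet, repeat=n)):
            if (p1.fullmatch(w) is None) != (p2.fullmatch(w) is None):
                return w
    return None

E1 = r"(?:(?:a|b)*aa*)?"    # (a|b)*aa* | epsilon
E2 = r"(?:a|ba)*(?:b|ba)*"
print(first_difference(E1, E2))
```

A result of None only means that no difference exists up to max_len. It does not prove that two expressions are equivalent.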

Fun fact: E1 was proposed by a student as a solution for 3.2.2 last year, while E2 is the actual solution.


$ E1 = ((ab^*a)^+b)^* $
$ E2 = (a(b\mid aa)^*ab)^* $

Both E1 and E2 have an infinite language, so enumerating and comparing all their words is not an option.

We can try looking for words that are accepted by one and not by the other, but we can't easily find such words. ! This does not mean they are equivalent !

We should transform each expression into its minimal DFA and check if the two DFAs are the same (number of states, transitions, alphabet, initial/final states) (renaming of states might be needed).

Plot twist: They are the same.
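
A short algebraic argument also supports this. It is offered only as a sketch, not as the intended solution method, and it assumes two standard regular-expression identities that are not introduced in this section: the sliding rule $ x(yx)^* = (xy)^*x $ and the denesting rule $ (x \mid y)^* = x^*(yx^*)^* $.

$ (ab^*a)^+ = ab^* \, a(ab^*a)^* $ (unfolding the +)
$ = ab^* \, (aab^*)^* a $ (sliding rule with $ x = a $, $ y = ab^* $)
$ = a (b \mid aa)^* a $ (denesting rule with $ x = b $, $ y = aa $)

Hence $ E1 = ((ab^*a)^+b)^* = (a(b\mid aa)^*ab)^* = E2 $.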

The purpose of this exercise is to understand how to approach regex equivalence, not how to solve this given comparison per se.


What have we learned today?

