[go: up one dir, main page]

0% found this document useful (0 votes)
13 views9 pages

A1 Solutions

Download as pdf or txt
Download as pdf or txt
Download as pdf or txt
You are on page 1/ 9

comp2022/2922 Assignment 1 (90 marks) – Regular Languages s2 2023

This assignment is due in Week 5 on Sunday 3 September, 11:59pm.


All work must be done individually without consulting anyone else’s solutions in accordance
with the University’s “Academic Dishonesty and Plagiarism” policies.
For clarifications and more details on all aspects of this assignment (e.g., level of justification
expected, late penalties, repeated submissions, what to do if you are stuck, etc.) you are expected
to regularly monitor the Ed Forum post “Assignment FAQ”.

The submission portal on GS, and an Ed post explaining it, will open in Week 4. Here is some
early information.
The submission formats for regular expressions and automata are detailed here:
Regex submission format
DFA/NFA submission formats

For Problems 1 through 4 you will submit files. The file names must be:
problem 1 1.txt
problem 1 2.txt
problem 2 1.txt
problem 2 2.txt
problem 3 1 re.txt
problem 3 1 dfa.txt
problem 3 2 re.txt
problem 3 2 dfa.txt
problem 3 3 re.txt
problem 3 3 dfa.txt
problem 4 1 nfa.txt
problem 4 1 x.txt
problem 4 2 nfa.txt

The submission process for Problem 5 will be discussed in the forthcoming Ed post.

Problem 1. (10 marks)


For each of the following regular expressions over Σ = { a, b}, state 5 strings that
are in the language of the regular expression, and 5 strings that are not. (You
score max(0, 5 - num errors) points per part)

1. ( a∗ b)∗ | a∗
2. (Σ∗ aaΣ∗ )|(Σ∗ bbΣ∗ )|( ab)∗ |(ba)∗ | a|b

The first line of each answer file should contain a comma separated sequence of
five strings that are in the language, and the second line should contain a comma
separated sequence of five strings that are not in the language. For example, if
the regular expression was b∗ , an example of a correct text file would be:

epsilon, b, bb, bbb, bbbb


a, aa, aaa, aaaa, aaaaa

1
comp2022/2922 Assignment 1 (90 marks) – Regular Languages s2 2023

Solution 1. For instance:


1.

a, aa, aaa, aaaa, aaaaa


ba, baa, baaa, baaaa, baaaaa

2.

aa, aab, aabb, aabbb, aabbbb


aba, ababa, abababa, ababababa, abababababa

Problem 2. (10 marks)


For each of the following automata over Σ = { a, b}, state 5 strings that are in
the language of the automaton, and 5 strings that are not. (You score max(0, 5 -
num errors) points per part)

Sigma = a b
Q = q0 q1 q2 q3
start = q0
F = q2
q0 a q1
q0 b q2
q1 a q0
q1 b q0
q2 a q3
q2 b q3
q3 a q3
q3 b q3

Sigma = a b
Q = q0 q1 q2
start = q0
F = q1 q2
q0 a q1
q0 b q1
q0 a q2
q0 b q2
q1 a q2
q1 b q2
q2 a q2
q2 b q0

2
comp2022/2922 Assignment 1 (90 marks) – Regular Languages s2 2023

Solution 2. For instance:


1.

b, aab, aaaab, aaaaaab, aaaaaaaab


a, aa, aaa, aaaa, aaaaa

2.

a, aa, aaa, aaaa, aaaaa


aab, aabaab, aabaabaab, aabaabaabaab, aabaabaabaab

Problem 3. (30 marks)


For each of the following three languages over Σ = { a, b}, provide a regular
expression and a DFA for the language. For full marks, your automata should
have at most 10 states. Partial marks may be awarded for larger automata. (5
marks per regular expression and 5 marks per DFA)

1. The set of strings that contain abbab, in that order, but not necessarily con-
secutively.
Said differently, the set of strings such that one can delete some amount of
letters, possibly zero, to obtain abbab.
For example, aaababaababba is in the language, while bbbababbbaa is not.
2. The set of strings of length 4 that contain exactly two bs.
For example, abba is in the language, while abbb and abb are not.
3. The set of strings that contain exactly three as and an even number of bs.
For example, abababb is in the language, while ababab is not.

Solution 3. Here are DFAs for each language (where rejecting sinks are not
drawn):

1. b a a b a a, b

q0 a q1 b q2 b q3 a q4 b q5

To see that this DFA M is correct we reason as follows. Let L be the language
described in the question and M the drawn automaton. We must show that
L = L( M). To do this we will reason that L( M ) ⊆ L and then reason that
L ⊆ L( M ). To see that L( M) ⊆ L, note that the only way to get to the
final state is to read abbab in that order, possibly with some other letters
inbetween. To see that L ⊆ L( M), consider a string u ∈ L. Intuitively, the
DFA finds the first subsequence of the form abbab in u. To see what ’first’

3
comp2022/2922 Assignment 1 (90 marks) – Regular Languages s2 2023

means, note that every subsequence of the form abbab is identified by the
sequence of positions of its letters in u. For instance if u = baabbbaba then
baabbbaba(1, 3, 4, 6, 7)
baabbbaba(1, 3, 5, 6, 7)
baabbbaba(1, 4, 5, 6, 7)
baabbbaba(2, 3, 4, 6, 7)
baabbbaba(2, 3, 5, 6, 7)
baabbbaba(2, 4, 5, 6, 7)

These sequences are written in dictionary order, e.g., (1, 3, 4, 6, 7) comes


before (1, 3, 5, 6, 7) in the dictionary order, which comes before (1, 4, 5, 6, 7)
etc. Now, look at the underlined string whose sequence is the first in the
dictionary order. The underlining shows the first (in the dictionary order)
subsequence abbab in the string u. The automaton changes state when it
reads the underlined strings.
2. The idea is that the x-axis records the number of a’s, and the y-axis the
number of b’s.

q0,0 a q1,0 a q2,0

b b b

q1,1 a q2,1 a q3,1

b b b

q2,2 a q3,2 a q4,2

3. The idea is that the x-axis records the number of a’s, and the y-axis the num-
ber of b’s modulo 2. q a q a q a q
0 1 2 3

b b b b b b b b

q0,1 a q1,1 a q2,1 a q3,1

Here are regular expressions for each language:

4
comp2022/2922 Assignment 1 (90 marks) – Regular Languages s2 2023

1. ( a|b)∗ a( a|b)∗ b( a|b)∗ b( a|b)∗ a( a|b)∗ b( a|b)∗


2. ( aabb)|( abab)|(baab)|( abba)|(baba)|(bbaa)
3. Use the facts that ’even plus even = even’, ’even plus odd = odd’, and ’odd
plus odd = even’. Let E = (bb)∗ and O = b(bb)∗ .

( EaEaEaE)|(OaOaEaE)|(OaEaOaE)|(OaEaEaO)|
( EaOaOaE)|( EaOaEaO)|( EaEaOaO)|(OaOaOaO)

A slightly more concise expression can be constructed based on the idea


that if and only if the string is in the language, then the substrings to the
left and right of the central a have one a each and they either both contain
an odd number of bs or they both have an even number of bs. We can then
construct our answer R as follows:

Xodd = b(bb)∗ a(bb)∗ | (bb)∗ a(bb)∗ b ∵ odd+even=odd, even+odd=odd




Xeven = (bb)∗ a(bb)∗ | b(bb)∗ a(bb)∗ b ∵ even+even=even, odd+odd=even




R = Xodd aXodd | Xeven aXeven


   
= b(bb)∗ a(bb)∗ | (bb)∗ a(bb)∗ b a b(bb)∗ a(bb)∗ | (bb)∗ a(bb)∗ b |
   
(bb)∗ a(bb)∗ | b(bb)∗ a(bb)∗ b a (bb)∗ a(bb)∗ | b(bb)∗ a(bb)∗ b

An even more concise answer:


 
(bb)∗ ( a | bab)(bb)∗ a(bb)∗ ( a | bab) | ( ab|ba)(bb)∗ a(bb)∗ ( ab | ba) (bb)∗

Problem 4. (20 marks)


Always eager to help students with their assignments, the new teaching assis-
tant Chat2022 introduces a “clipping” mechanism for determinising NFAs. It’s
like trimming a bonsai – cutting away the superfluous to reveal a clean, aesthetic
structure. But here’s the catch: just as shortcuts might miss out on scenic routes,
“clipping” might not always capture the true essence of an NFA. While some
students happily accept Chat2022’s clipped solutions, others notice discrepan-
cies in behaviour. Your assignment: delve deep into Chat2022’s method.
In this problem, we only consider automata without epsilon transitions.
A clipping of an NFA N is a DFA obtained by the following process: for each
state-letter pair (q, x ) where q ∈ Q, x ∈ Σ, if |δ(q, x )| > 1, choose one element of
it to be δ(q, x ) and ignore the rest. Note that an NFA may have many possible
clippings.
For example, here is an NFA N:

5
comp2022/2922 Assignment 1 (90 marks) – Regular Languages s2 2023

a a, b
a, b
q0 q2
b
a, b
a

q1

There are two pairs (q, x ) where |δ(q, x )| > 1, i.e., (q0 , a) and (q2 , b). Hence there
are 6 possible clippings of N, corresponding to fixing one of the 3 choices for
(q0 , a) and one of the 2 choices for (q2 , b).

1. Consider the following claim: “For every epsilon-free NFA N and string x,
the NFA N accepts x if and only if some clipping of N accepts x.” This claim
is false. Show that it is false by providing a counterexample. (10 marks)
(Aside: It turns out that deciding whether N accepts x is in P, while deciding
whether some clipping of N accepts x is NP-complete, which you will learn
about later in the course. Hence, if the claim were true, it would imply that
P = NP.)

An automaton A over the alphabet Σ is universal if it accepts every string, that


is, if L( A) = Σ∗ .

2. Consider the following claim: “For every epsilon-free NFA N, the NFA N is
universal if and only if some clipping of N is universal.” This claim is also
false. Show that it is false by providing a counterexample. (10 marks)

In both of these questions, each counterexample should:

• be over the alphabet Σ = { a, b},


• have no more than 10 states, and
• have no more than 10 clippings.

Solution 4.

a a
1. The following automaton accepts x = aa (since q0 − → q0 − → q1 is an accept-
ing run), but no clipping of it accepts x. In particular, in any clipping, either
a a a
q0 −
→ q0 or q0 −→ q1 must be removed. If q0 − → q0 is removed, then the only
a
run on aa ends at q0 and so x is rejected. If q0 −→ q1 is removed, then there
is no run on aa and so x is rejected.

6
comp2022/2922 Assignment 1 (90 marks) – Regular Languages s2 2023

q0 a q1

2. The following automaton over Σ = { a, b} is universal, but no clipping of


x
it is universal. It is universal, since ϵ is trivially accepted, q0 −
→ q1 is an
x1 x2
accepting run for x ∈ Σ, and for any x1 x2 . . . xn where n ≥ 2, q0 − → q2 −→
x3 xn a
q3 −→ . . . −→ q3 is an accepting run. Now in any clipping, either q0 −→ q1 or
a a
q0 −→ q2 must be removed. If q0 − → q1 is removed, then the only run on a
a
ends at q2 and so a is rejected. If q0 −
→ q2 is removed, then there is no run
on aa and so aa is rejected. In either case, the clipping is not universal.
a, b
q0 q1

a, b a, b

a, b
q2 q3

Problem 5. (20 marks)


You work at AutoGenoma, a facility that designs automata for pattern matching
DNA strings. They use an in-house variant of Nucleic Acid Notation. This
includes the letters A, C, G and T which represent DNA bases, as well as a new
letter X. A string that contains the letter X represents strings in which every
occurrence of X is replaced by the same base letter. For instance, the string
XXCXT represents the strings AACAT, CCCCT, GGCGT and TTCTT, but not
AACTT (since X needs to be replaced by the same base letter), and not AXCAT
(since every occurrence of X must be replaced). On the other hand, the string
AACAT is represented by XXCXT, AAXAT and AACAX.
Your job is to convert an automaton M over { A, C, G, T } into an automaton M′
that accepts exactly the strings over { A, C, G, T, X } that represent some string
accepted by M. We say that M′ represents M. For instance, if M accepts AACAT
then M′ must accept all of the following strings:

• AACAT (zero occurrences of X),


• XACAT, AXCAT, AAXAT, AACXT, AACAX (one occurrence of X),
• AXCXT, XACXT, XXCAT (two occurrences of X),
• XXCXT (three occurrences of X);

7
comp2022/2922 Assignment 1 (90 marks) – Regular Languages s2 2023

and if M′ accepts XXCXT then M must accept at least one of the following
strings: AACAT, CCCCT, GGCGT, TTCTT.
For example, suppose M is the following automaton:
A, C, G, T

q0 A q1 T q2

Then the following automaton represents M:


A, C, G, T, X

A T, X
q0 q1 q2

X T

q3

A, C, G, T, X

1. Provide a program to convert an NFA M over alphabet { A, C, G, T } into an


NFA M′ over alphabet { A, C, G, T, X } that represents M. (10 marks)
2. Provide a program to convert a DFA M over alphabet { A, C, G, T } into a
DFA M′ over alphabet { A, C, G, T, X } that represents M. (10 marks)

The conversions should ensure that the number of states in M′ is polynomial in


the number of states of M.

Solution 5. We use the following constructions. Note that others exist, e.g. we
could use the construction for (2) in (1).

1. For the NFA version, the idea is to form an NFA as the union of four NFAs
that decide which of A, C, G, T the X’s should be replaced with. The union
is done with the union construction for NFAs, i.e., by adding a new initial
state and epsilon transitions to the initial states of each of the four NFAs.
Here are the details. For every state qi in M, create four states in M′ :
Aqi , Cqi , Gqi , Tqi each indicating that the X in the string should be sub-
stituted with A, C, G, T respectively. Also create a new initial state q0′ for
M′ .
For each transition q j ∈ δ(qi , Y ) in M (here Y ∈ { A, C, G, T, ϵ}), we add
a transition for each of A, C, G, T variants, i.e. Zq j ∈ δ′ ( Zqi , Y ) for each

8
comp2022/2922 Assignment 1 (90 marks) – Regular Languages s2 2023

Z ∈ { A, C, G, T }. If Y is not ϵ, we also add a transition Yq j ∈ δ′ (Yqi , X ),


where Yqi ∈ { Aqi , Cqi , Gqi , Tqi }.
Let the initial state of M be q0 , then we add an epsilon transition from q0′ to
each variant of Aq0 , Cq0 , Gq0 , Tq0 .
Finally, for each final state qi ∈ F for M, we add Aqi , Cqi , Gqi , Tqi to F ′ .
We claim that this construction yields an automaton with exactly the correct
language.
First, consider any string accepted by M′ . Then there must be an accepting
c1 ck
run q0′ − Yqi , where Y is one of A, C, T, G and Yqi ∈ F ′ . But
ϵ
→ Yq0 − → ... −

this means that the string, after replacing X 7→ Y, must have been accepted
by M, since removing the first state and transition in that run and replacing
Yqi with qi , and X with Y in all transitions in this run yields an accepting
run for M.
Second, consider any string s accepted by M, and let s′ represent s, i.e., ei-
ther s′ = s or s′ is formed from s by choosing Y ∈ { A, C, T, G } and replacing
some occurrence of Y in s by X. We now argue that s′ is accepted by M′ . If
s′ = s (i.e., does not contain X), then each of the four copies will accept s′ .
Otherwise, we can transform an accepting run of M on s into an accepting
run of M′ on s′ as follows: add the transition q0′ −
ϵ
→ Yq0 to the start of the
run and convert all occurrences of qi to Yqi .
As such, we conclude that the construction is correct. This construction
takes 4n + 1 states, where n is the number of states of M.
2. For the DFA version, we implement a similar idea, except a state of the DFA
stores 4 pieces of information as a 4-tuple of states of the given DFA, where
the first element is the state the DFA would have been in if X was parsed
as A, the second if X was parsed as C, and so on. More explicitly, we create
a state (qi , q j , qk , ql ) for M′ for each tuple of 4 states from M, as well as an
error state qerr . Our initial state is then (q0 , q0 , q0 , q0 ).
For each state in M′ , (qi , q j , qk , ql ), we now add the transition
δ′ ((qi , q j , qk , ql ), Y ) = (δ(qi , Y ), δ(q j , Y ), δ(qk , Y ), δ(ql , Y )), for each Y ∈
{ A, C, T, G }. We also let δ(qerr , Y ) = qerr for all Y. Furthermore, we add
the transition δ′ ((qi , q j , qk , ql ), X ) = (δ(qi , A), δ(q j , C ), δ(qk , G ), δ(ql , T )).
Finally, (qi , q j , qk , ql ) ∈ F ′ if any of qi , q j , qk , ql ∈ F. This is correct using an
analogous argument. This construction takes n4 + 1 states.

You might also like