[go: up one dir, main page]

0% found this document useful (0 votes)
68 views53 pages

Languages Strings

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
68 views53 pages

Languages Strings

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 53

Theory of Computation

Automata, Computability, & Complexity


by Elaine Rich
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

1
Why Study the Theory of
Computation?
Implementations come and go.

Chapter 1 2
IBM 7090 Programming in the 1950’s
ENTRY SXA 4,RETURN
LDQ X
FMP A
FAD B Ax2 + Bx +C
XCA
FMP X
FAD C
STO RESULT
RETURN TRA 0
A BSS 1
B BSS 1
C BSS 1
X BSS 1
TEMP BSS 1
STORE BSS 1
END 3
Programming in the 1970’s
IBM 360 JCL (Job Control Language)
//MYJOB JOB (COMPRESS),
'VOLKER BANDKE',CLASS=P,COND=(0,NE)
//BACKUP EXEC PGM=IEBCOPY
//SYSPRINT DD SYSOUT=*
//SYSUT1 DD DISP=SHR,DSN=MY.IMPORTNT.PDS
//SYSUT2 DD DISP=(,CATLG),
DSN=MY.IMPORTNT.PDS.BACKUP,
// UNIT=3350,VOL=SER=DISK01,
// DCB=MY.IMPORTNT.PDS,
SPACE=(CYL,(10,10,20))
//COMPRESS EXEC PGM=IEBCOPY
//SYSPRINT DD SYSOUT=*
//MYPDS DD DISP=OLD,DSN=*.BACKUP.SYSUT1
//SYSIN DD *
COPY INDD=MYPDS,OUTDD=MYPDS
//DELETE2 EXEC PGM=IEFBR14
//BACKPDS DD DISP=(OLD,DELETE,DELETE),
DSN=MY.IMPORTNT.PDS.BACKUP
4
Guruhood

IBM’s APL Language – Returns 1 if the


largest value in a 3 element vector is
greater than the sum of the other 2 and
Returns 0 otherwise
APL was very powerful for processing
arrays & vectors

5
Why study this?
Science of Computing
•Mathematical Properties (problems &
algorithms) having nothing to do with
current technology or languages
•E.G. Alan Turing – died 1954
•Provides Abstract Structures
•Defines Provable Limits
– Like “Big Oh”
6
Goals
Study Principles of Problems themselves
•Does a solution exist?
– If not, is there a restricted variation?
•Can solution be implemented in fixed
memory?
•Is Solution efficient?
– Growth of time & memory with problem size?
•Are there equivalent groups of problems?
7
Applications of the Theory
• Programming languages, compilers, & context-free
grammars.
• FSMs (finite state machines) for parity checkers,
vending machines, communication protocols, &
building security devices.
• Interactive games as nondeterministic FSMs.
• Natural languages are mostly context-free. Speech
understanding systems use probabilistic FSMs.
• Computational Biology: DNA & proteins are strings.
• The undecidability of a simple security model.
• Artificial Intelligence: the undecidability of first-order
logic.
8
Languages and Strings

This is one of MOST important chapters.


It includes the TERMINOLOGY in this course.

Chapter 2
9
Let's Look at Some Problems
int alpha, beta;
alpha = 3;
beta = (2 + 5) / 10;
(1) Lexical analysis: Scan the program; break it into variable
names, numbers, etc.
(2) Parsing: Create a tree that corresponds to the sequence of
operations to be executed, e.g.,
/

+ 10

2 5
(3) Optimization: Recognize, can skip first assignment since value
is never used; can precompute the arithmetic expression, since it
contains only constants.
(4) Termination: Determine if program is guaranteed to halt.
(5) Interpretation: Figure out what (if anything) useful program does.
10
Side Note:
What is a Good Definition?

A good definition consists of 2 components


•Category of term being defined
•Description of what distinguishes this
particular element of the category from
others

11
Definition Example
• Define automobile
• A mode of transportation – NO
• A vehicle with 4 wheels that allows a person to
travel from on location to another – NO
• A mode of transportation by which I get to MSU
– NO
• All the statements are true so Why Not?
• Which on is the best? Why?

12
Definition Example
• Define automobile
• A personal motorized vehicle with 4 wheels that
allows a person to drive himself and others
from on location to another
– This is pretty good but does it completely distinguish
an automobile from other similar vehicles?
– What about a go cart? A pick-up truck?

WHAT IT THE POINT OF THIS??


13
A Framework for Analyzing
Problems
We need a single framework in which we can
analyze a very diverse set of problems.

The framework is

Language Recognition

*A language is a (possibly infinite) set of finite


length strings over a finite alphabet.

NOTE: Pay particular attention to use of finite & infinite in all definitions!
14
Alphabet - 
• An alphabet is a non-empty, finite set of
characters/symbols
– Use  to denote an alphabet

• Examples
 = { a, b }
 = { 0, 1, 2 }
 = { a, b, c,…z, A, B, … Z }
 = { #, $, *, @, & }
15
Strings
• A string is a finite sequence, possibly
empty, of characters drawn from some
alphabet .
  is the empty string
• * is the set of all possible strings over an
alphabet .

16
Example Alphabets & Strings

Alphabet name Alphabet symbols Example strings


The lower case {a, b, c, …, z} , aabbcg, aaaaa
English alphabet

The binary {0, 1} , 0, 001100,11


alphabet

A star alphabet { ,  ,  , , , } , , 


A music
alphabet {, , , , , , } , 

17
Functions on Strings
Length:
• |s| is the length of string s
• |s| is the number of characters in string s.

| | = 0
|1001101| = 7

#c(s) is defined as the number of times that c occurs in s.

#a(abbaaa) = 4.

18
More Functions on Strings
Concatenation: the concatenation of 2 strings s
and t is the string formed by appending t to s; written
as s||t or more commonly, st

Example:
If x = good and y = bye, then xy = goodbye
and yx = byegood

•Note that |xy| = |x| + |y| -- Is it always??


• is the identity for concatenation of strings. So,
x (x  =  x = x)
•Concatenation is associative. So,
s, t, w ((st)w = s(tw))
19
More Functions on Strings
Replication: For each string w and each natural
number k, the string w k is:

w0=
w k+1 = w k w

Examples:
a3 = aaa
(bye)2 = byebye
a0b3 = bbb
b2y2e2 = ??

Natural Numbers {0,1,2,…}


20
More Functions on Strings
Reverse: For each string w, w R is defined as:

if |w| = 0 then w R = w = 

if |w| = 1 then w R = w

if |w| > 1 then:


a   (u  * (w = ua))
So define w R = a u R
OR

if |w| > 1 then:


a   & u  * ϶ w = ua
So define w R = a u R
21
Proof is by simple induction
Concatenation & Reverse of Strings

Theorem: If w and x are strings, then (w x)R = x R w R.

Example:

(nametag)R = (tag)R (name)R = gateman

In this course, will not use this too much!


Not responsible for inductive proof on next slide.

22
Concatenation & Reverse of Strings
Proof: By induction on |x|:

|x| = 0: Then x = , and (wx)R = (w )R = (w)R =  wR = R wR = xR wR.

n  0 (((|x| = n)  ((w x)R = xR wR)) 


((|x| = n + 1)  ((w x)R = xR wR))):

Consider any string x, where |x| = n + 1. Then x = u a for some


character a and |u| = n. So:

(w x)R = (w (u a))R rewrite x as ua


= ((w u) a)R associativity of concatenation
= a (w u)R definition of reversal
= a (uR wR) induction hypothesis
= (a uR) wR associativity of concatenation
= (ua)R wR definition of reversal
= x R wR rewrite ua as x
23
Relations on Strings - Substrings
o Substring: string s is a substring of string t if s
occurs contiguously in t
o Every string is a substring of itself
o  is a substring of every string
o Proper Substring: s is a proper substring of t
iff s ≠ t
o Suppose t = aabbcc.
o Substrings: , a, aa, ab, bbcc, b, c, aabbcc
o Proper substrings?
o Others? 24
The Prefix Relations
s is a prefix of t iff x  * (t = sx).

s is a proper prefix of t iff s is a prefix of t and s  t.

Examples:

The prefixes of abba are: , a, ab, abb, abba.


The proper prefixes of abba are: , a, ab, abb.

•Every string is a prefix of itself.


• is a prefix of every string.

25
The Suffix Relations

s is a suffix of t iff x  * (t = xs).

s is a proper suffix of t iff s is a suffix of t and s  t.

Examples:

The suffixes of abba are: , a, ba, bba, abba.


The proper suffixes of abba are: , a, ba, bba.

•Every string is a suffix of itself.


• is a suffix of every string.

26
Defining a Language
A language is a (finite or infinite) set of strings over a (finite)
alphabet .

Examples: Let  = {a, b}

Some languages over :


={} // the empty language, no strings
{} // language contains only the empty string
{a, b}
{, a, aa, aaa, aaaa, aaaaa}

27
Defining a Language
Two ways to define a language via a
Machine = Automaton
AKA – Computer Program
•Recognizer
•Generator
Which do we want? Why?

28
*
 * is defined as the set of all possible
strings that can be formed from the
alphabet *
 * is a language

 * contains an infinite number of strings


 * is countably infinite

29
* Example
Let  = {a, b}
* = {, a, b,aa,ab,ba,bb,aaa,aab,… }

Later, we will spend some more time


studying *.

30
Defining Languages
Remember we are defining a set
Set Notation:
L = { w  * | description of w}
L = { w  {a,b,c}* | description of w}

•“description of w” can take many forms but must


be precise
•Notation can vary, but must precisely define
31
Example Language Definitions

L = {x  {a, b}* | all a’s precede all b’s}

•aab,aaabb, and aabbb are in L.


•aba, ba, and abc are not in L.
•What about , a, aa, and bb?

L = {x : y  {a, b}* | x = ya}


•Give an English description.

32
Example Language Definitions

Let  = {a, b}
•L = { w  * : |w| < 5}
•L = { w  * | w begins with b}
•L = { w  * | #b(w) = 2}
•L = { w  * | each a is followed by
exactly 2 b’s}
•L = { w  * | w does not begin with a}

33
The Perils of Using English

L = {x#y: x, y  {0, 1, 2, 3, 4, 5, 6, 7, 8,
9}* and, when x & y are viewed as
decimal representations of natural
numbers, square(x) = y}.

Examples:
3#9, 12#144
3#8, 12, 12#12#12
#

34
A Halting Problem Language

L = {w | w is a C++ program that halts on


all inputs}
• Well specified.
• Can we decide what strings it contains?
•Do we want a generator or recognizer?

35
More Examples

What strings are in the following languages?

L = {w  {a, b}*: no prefix of w contains b}

L = {w  {a, b}*: no prefix of w starts with a}

L = {w  {a, b}*: every prefix of w starts with a}

L = {an : n  0}

L = {ba2n : n  0}

L = {bnan : n  0}
36
Enumeration

Enumeration: to list all strings in a language (set)

• Arbitrary order

• More useful: lexicographic order


• Shortest first
• Within a length, dictionary order
• Define linear order of arbitrary symbols

37
Lexicographic Enumeration
{w  {a, b}* : |w| is even}
{, aa, ab, bb, aaaa, aaab, …}

What string is next?


How many strings of length 4?
How many strings of length 6?

38
Cardinality of a Language

•Cardinality of a Language: the number of strings


in the language
•| L |
•Smallest language over any  is , with
cardinality 0.
•The largest is *.
• Is this true?
• How big is it?
•Can a language be uncountable?
39
Countably Infinite
Theorem: If    then * is countably infinite.
Proof: The elements of * can be lexicographically
enumerated by the following procedure:
• Enumerate all strings of length 0, then length 1,
then length 2, and so forth.
• Within the strings of a given length, enumerate
them in dictionary order.
This enumeration is infinite since there is no longest
string in *. Since there exists an infinite enumeration of
*, it is countably infinite.

40
How Many Languages Are There?

Theorem: If    then the set of languages


over  is uncountably infinite (uncountable).

Proof: The set of languages defined on  is


P(*). * is countably infinite. By Theorem A.4,
if S is a countably infinite set, P(S) is
uncountably infinite. So P(*) is uncountably
infinite.

What does this mean?!?!?!


41
Functions on Languages

Set (Language) functions


Have the traditional meaning
• Union
• Intersection
• Complement
• Difference

Language functions
• Concatenation
• Kleene star
42
Concatenation of Languages
If L1 and L2 are languages over :

L1L2 = {w : s  L1 & t  L2 ϶ w = st }

Examples:
L1 = {cat, dog}
L2 = {apple, pear}
L1 L2 ={catapple, catpear, dogapple, dogpear}
L2 L1 ={applecat,appledog,pearcat,peardog}

43
Concatenation of Languages
{} is the identity for concatenation:

L{} = {}L = L

 is a zero for concatenation:

L=L=

44
Concatenating Languages Defined
Using Variables

The scope of any variable used in an expression that


invokes replication will be taken to be the entire
expression.

L1 = {an: n  0}
L2 = {bn : n  0}

L1 L2 = {anbm : n, m  0}
L1L2  {anbn : n  0}

45
*
Kleene Star
L* - language consisting of 0 or more concatenations of
strings from L
L* = {}  {w  * : w = w1 w2 … wk, k  1 &
w1, w2, … wk  L}
Examples:
L = {dog, cat, fish}
L* = {, dog, cat, fish, dogdog, dogcat,
dogfish,fishcatfish,fishdogdogfishcat, …}
~~~~~~~~~~~~
L1 = a* L2 = b*
What is a*? b*?
L1 L2 =
L2 L1 =
46
L1 L1 =
The + Operator
L+ = language consisting of 1 or more
concatenations of strings from L

L+ = L L*

L+ = L* - {} iff   L
Explain this definition!!
When is   L+?
47
Closure
• A set S is closed under the operation @ if
for every element x & y in S, x@y is also
an element of S
• A set S is closed under the operation @ if
for every element x  S & y  S, x@y  S
• Examples

48
Semantics: Assigning Meaning to Strings

When is the meaning of a string important?

A semantic interpretation function assigns


meanings to the strings of a language.

Can be very complex.

Example from English:

I brogelled the yourtish.


He’s all thumbs.
49
Uniqueness???
• Chocolate, please.
• I’d like chocolate.
• I’ll have chocolate today.
• I guess I’ll have chocolate.

They all have the same meaning!

50
Uniqueness???
Hand
•Give me a hand.
•I smashed my hand in the door.
•Please hand me that book.

These all have different meanings!!!

51
Uniqueness in CS???
• int x = 4; x++;
• int x = 4; ++x;
• int x = 4; x = x + 1;
• int x = 4; x=x - -1;
• int x = 5;

These all have the same result/meaning!

52
Semantic Interpretation
Functions
For formal languages:
• Programming languages
• Network protocol languages
• Database query languages
• HTML
• BNF
For other kinds of “natural” languages:
• DNA
Other computing
• Genetic algorithm solutions
• Other problem solutions 53

You might also like