Rule Generator
Dr. M. Moshiul Hoque
CSE, CUET
Grammar of a Language
• A grammar captures the legal structure in a
language and thus allows a sentence to be
analyzed.
• To do this we must know the rules of how
language is organized & have an algorithm to
analyze language given those rules.
• The grammar of a natural language is more complex than that of a programming language
• Rules of syntax specify the possible organizations
of words in sentences, and allow us to determine
a particular sentence's structure (or possible
structures) and hence help to determine its
meaning.
Designing the Grammar Rules
• The most common way to represent a grammar is as a set of
production rules that say how the parts of speech can be
put together to make grammatical or “well-formed”
sentences.
➢ The rules state how the basic components of symbol strings,
the symbols themselves, can be aggregated into phrases, and
how these phrases can in turn be aggregated
into sentences.
➢ Two kinds of grammar are used: context-free grammars (CFG) & context-sensitive grammars (CSG)
• Phrasal Categories: noun phrase, verb phrase, & adjective
phrase;
• Lexical categories: noun, verb, adjective, adverb and many
others.
Phrase Structure (PS) Grammar
• A PS grammar is a set of rewrite
rules for generating sentences.
• The grammar starts from an
initial symbol or initial string, # sentence #.
➢The initial symbol is followed by a set of
sentence-forming rules from which sentences
can be generated mechanically.
➢A PS grammar automatically assigns a structural
description to each sentence it generates.
PS Grammar
• Sentences are made up of words, traditionally
categorized into parts of speech or categories
including nouns, verbs, adjectives, adverbs and
prepositions (normally abbreviated to N, V, A,
ADV and P).
• A grammar of a language is a set of rules
that say how these parts of speech can be
put together to make grammatical, or
“well-formed”, sentences.
PS Grammar
• cheleti vhat khaechhe (ছেলেটি ভাত খেয়েছে)
➢S → NP + VP
➢cheleti (ছেলেটি) + vhat khaechhe (ভাত খেয়েছে)
❑NP → N + DET
❑chele (ছেলে) + ti (টি)
PS Grammar
• The determiner can be omitted:
• rahim vhat khaechhe (রহিম ভাত খেয়েছে)
• Symbols used: Sentence (S), Noun phrase (NP), Verb phrase
(VP), Auxiliary (AUX), Determiner (DET), Noun
(N), Verb form (VF), Verb root (VR)
• A structure can be visualized as a labeled bracketing of a
string of words, or as a tree diagram
PS Grammar
• (a) cheleti vhat khaechhe (ছেলেটি ভাত খেয়েছে)
• (b) [S [NP [N chele] [DET ti]] [VP [NP [N vhat]] [VF [VR kha] [AUX echhe]]]].
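The labeled bracketing in (b) is just a linear printout of a tree. A minimal Python sketch of that correspondence follows; the nested-tuple representation and the function names `bracket` and `leaves` are illustrative assumptions, not part of the slides.

```python
# A tree is either a word (str) or a tuple (label, child, child, ...).
# This representation is an assumption made for illustration.
TREE = (
    "S",
    ("NP", ("N", "chele"), ("DET", "ti")),
    ("VP", ("NP", ("N", "vhat")), ("VF", ("VR", "kha"), ("AUX", "echhe"))),
)

def bracket(tree):
    """Return the labeled bracketing of a tree as a string."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return "[" + label + " " + " ".join(bracket(c) for c in children) + "]"

def leaves(tree):
    """Return the words (and morphemes) at the leaves, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in leaves(child)]

print(bracket(TREE))
print(leaves(TREE))
```

Each tuple corresponds to one node of the tree diagram, so the same data can be drawn as a tree or printed as brackets.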
PS Grammar
• For convenience, linguists often use a special
notation to write out grammar rules.
• In this notation, a rule consists of a left-hand
side (LHS) and a right-hand side (RHS)
connected by an arrow (→)
• Head of the production → Body of the
production
PS Grammar
• S → NP VP
• VP → NP VF
• NP → N (DET)
• VF → VR AUX
• N → chele | boi | ami | kobita
• VR → kor | por | kha
• AUX → che | i | echhe
• DET → ti
• Each node in the tree corresponds to the LHS of a particular
rule, while the daughters of each node correspond to the RHS
of that rule.
• If the RHS has two constituents, as in NP → N (DET) when DET is present, there will be
two branches & two daughters; if there are three constituents, there
will be three branches & three daughters, and so on.
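The rules above can be applied mechanically to enumerate every sentence they license. The sketch below is one possible encoding, not the lecture's own implementation: the dictionary name `GRAMMAR` and the function `expand` are assumptions, and the optional (DET) is written out as two alternative bodies for NP.

```python
from itertools import product

# Head -> list of alternative bodies. NP -> N (DET) is expanded into
# the two alternatives [N] and [N, DET].
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["NP", "VF"]],
    "NP": [["N"], ["N", "DET"]],
    "VF": [["VR", "AUX"]],
    "N": [["chele"], ["boi"], ["ami"], ["kobita"]],
    "VR": [["kor"], ["por"], ["kha"]],
    "AUX": [["che"], ["i"], ["echhe"]],
    "DET": [["ti"]],
}

def expand(symbol):
    """Return every terminal string (as a tuple) derivable from symbol."""
    if symbol not in GRAMMAR:          # a terminal rewrites to itself
        return [(symbol,)]
    results = []
    for body in GRAMMAR[symbol]:
        parts = [expand(s) for s in body]
        for combo in product(*parts):  # one choice per body symbol
            results.append(tuple(w for part in combo for w in part))
    return results

sentences = expand("S")
print(len(sentences))
print(("chele", "ti", "boi", "por", "che") in sentences)
```

Because nothing in these rules ties the auxiliary to the subject, the enumeration also contains strings in which the two do not agree.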
Context-Free Grammars (CFG’s)
• A → B
• A can only be a single non-terminal
• B can be any number of terminals & non-terminals.
• Properties :
✓ A LHS & RHS separated by an arrow.
✓ One symbol only on the LHS .
✓ At least one symbol on the RHS.
✓ Symbols on the LHS of rules are always non-terminals
(i.e. they never appear as leaves on trees).
✓ Symbols on the RHS may be either terminals or non-
terminals.
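The properties listed above can be checked rule by rule. The following is a small sketch under assumed conventions: a rule is a pair of tuples (head, body), `NONTERMINALS` holds the category symbols of the Bangla fragment used earlier, and the function name `violations` is hypothetical.

```python
# Category symbols of the earlier PS grammar (assumed to be the full set).
NONTERMINALS = {"S", "NP", "VP", "VF", "N", "VR", "AUX", "DET"}

def violations(head, body, nonterminals=NONTERMINALS):
    """List the context-free properties that the rule head -> body breaks."""
    problems = []
    if len(head) != 1:
        problems.append("LHS must be exactly one symbol")
    if any(sym not in nonterminals for sym in head):
        problems.append("LHS symbols must be non-terminals")
    if len(body) < 1:
        problems.append("RHS needs at least one symbol")
    return problems

print(violations(("S",), ("NP", "VP")))          # a well-formed CFG rule
print(violations(("NP", "VP"), ("VP", "NP")))    # two symbols on the LHS
print(violations(("VP",), ()))                   # empty RHS
```

An empty list means the rule satisfies all of the listed properties.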
CFG’s for Bangla
Simple sentences
Context-Sensitive Grammars (CSG’s)
• A CSG is a set of PS rules in
which at least one PS rule is context-sensitive.
• The application of such a rule is restricted by its
context: the rule is activated only if certain
conditions on the surrounding symbols are
fulfilled.
Context-Sensitive Grammars (CSG’s)
• CSG’s have the added restriction that the
length of the string on the RHS of the rewrite
rule must be at least as long as the string on
the LHS.
• xyz → xwz
➢y must be a single non-terminal symbol
➢w must be a nonempty string
➢x and z are the context (either may be empty)
Context-Sensitive Grammars (CSG’s)
• S → aS
• S → aAB
• AB → BA
• aA → ab
• aA → aa
❑capitalized letters: non-terminals
❑lower case letters: terminals.
❑There is an agreement between subject & verb
forms
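The two conditions stated for CSG rules (the RHS is at least as long as the LHS, and the rule has the shape xyz → xwz) can be tested on the sample rules. This is a sketch under the convention given above that each character is one symbol, upper case for non-terminals; the function names are assumptions.

```python
RULES = [("S", "aS"), ("S", "aAB"), ("AB", "BA"), ("aA", "ab"), ("aA", "aa")]

def is_noncontracting(lhs, rhs):
    """Length restriction: the RHS is at least as long as the LHS."""
    return len(rhs) >= len(lhs)

def context_split(lhs, rhs):
    """Try to read lhs -> rhs as x y z -> x w z.

    Returns (x, y, w, z) with y a single non-terminal and w nonempty,
    or None when the rule does not have that shape.
    """
    for i, y in enumerate(lhs):
        if not y.isupper():
            continue
        x, z = lhs[:i], lhs[i + 1:]
        if (len(rhs) >= len(x) + len(z) + 1
                and rhs.startswith(x) and rhs.endswith(z)):
            return x, y, rhs[len(x):len(rhs) - len(z)], z
    return None

for lhs, rhs in RULES:
    print(lhs, "->", rhs, is_noncontracting(lhs, rhs), context_split(lhs, rhs))
```

All five rules satisfy the length restriction. AB → BA does not fit the xyz → xwz shape as written, which shows that the length condition is the weaker of the two tests.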
CSG’s for Bangla
• The components of the language are terminal
symbols and non-terminals.
• The highest-level non-terminal is the
sentence, represented by the symbol S.
• Each non-terminal can be defined as a
sequence of other symbols, either non-
terminal ones or terminal ones or both.
CSG’s for Bangla
• It is not possible to represent the agreement
of person + class between subject and verb
form with CFGs.
• So context-sensitive PS rules
must be written to analyze all sorts of
sentences in Bangla.
• In CSGs, auxiliaries depend on subjects,
i.e., there is an agreement of person + class
between the auxiliary and the subject.
Agreements
• In every language, there is an agreement
between the subject & verb form.
• In English, subject and verbs have to agree in
person and number.
• Similarly, there is also an agreement of person
+ class between verb form and subject in
Bangla.
• The auxiliary changes for tense, aspect and
person + class.
Agreements
• tini boi porchen (তিনি বই পড়ছেন)
• VP → boi porchen (বই পড়ছেন)
• VF → porchen (পড়ছেন)
• VR → por (পড়)
• AUX → chen (ছেন)
• AUX → Aspect + Tense + Concord
• Chen = Ch (ছ) + Q + en (এন)
= Continuous + Present + Concord of third person
honorific for Present tense
Agreements
• In Bangla, the person + class of AUX follows the person +
class of the subject NP.
▪ If the subject is apani (আপনি), then the person + class of the auxiliary
should be en (এন)
▪ For tumi (তুমি), it should be o (ও) or e (এ)
• apani boi porchen (আপনি বই পড়ছেন)
❑ [apani (আপনি) + boi (বই) + por (পড়) + chh (ছ) + en (এন)]
• But it is not possible to say:
• apani boi porcho (আপনি বই পড়ছো)
• [apani (আপনি) + boi (বই) + por (পড়) + chh (ছ) + o (ও)],
❑ because o (ও) does not agree with the subject apani
(আপনি)
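The agreement constraint can be sketched as a lookup from subject to the concord endings it allows. Only the three subjects mentioned in the slides are included; the table name `CONCORD` and the function `agrees` are assumptions, and a full table would need the tense-specific endings as well.

```python
# Subject -> concord endings it accepts, taken from the examples above.
CONCORD = {
    "apani": {"en"},      # apani ... por + chh + en
    "tini": {"en"},       # tini boi porchen (third person honorific)
    "tumi": {"o", "e"},
}

def agrees(subject, concord):
    """True if the concord ending matches the subject's person + class."""
    return concord in CONCORD.get(subject, set())

print(agrees("apani", "en"))   # apani boi porchen
print(agrees("apani", "o"))    # apani boi porcho
print(agrees("tumi", "o"))
```

A context-free rule such as VF → VR AUX has no access to the subject, so a check of this kind has to be supplied by context-sensitive rules.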
Agreement of person + class with the verb
auxiliary and subject for Present Tense.
Agreement of person + class with the verb
auxiliary and subject for Past Tense
Agreement of person + class with verb
auxiliary and subject for Future Tense
Agreement of person + class with verb
auxiliary and subject for Habitual Past
Subcategorization & Updating the
Grammar Rules
• Subcategorization expresses the constraints
that a predicate (a verb, for now) places on the
number and type of the arguments it
takes.
• The inflection of a Bangla verb, called AUX, can take
different forms, so the verb auxiliary is
subcategorized according to the
tense and the person + class of the verb.
Updated CSG Rules
Types of Sentences in Bangla
▪Each sentence type is defined in terms of clauses.
▪A clause is a subpart of a Bangla sentence that has a
meaning.
(i) Principal Clause or Independent Clause
(ii) Subordinate Clause or Dependent Clause.
ami jani je tumi ashbe (আমি জানি যে তুমি আসবে),
Principal Clause: ami jani (আমি জানি)
Subordinate Clause: je tumi ashbe (যে তুমি আসবে).
Structure of the Simple Sentence
• A simple sentence is formed by an
independent clause or principal clause.
• A principal clause in Bangla can be rewritten
as (or decomposes into, or consists of) an NP
followed by a VP
“se kall ashbe (সে কাল আসবে)” or “rahim eshkole jae
(রহিম স্কুলে যায়)”
Structure of the Complex Sentence
• Formed by one principal clause & one or more
subordinate clauses.
• “ami jani je tumi ashbe (আমি জানি যে তুমি আসবে)”.
• Principal Clause: ami jani (আমি জানি)
Subordinate Clause: je tumi ashbe (যে তুমি
আসবে)
• The clauses “ami jani (আমি জানি)” & “je tumi
ashbe (যে তুমি আসবে)” each have the form of a simple sentence, but the 2nd
part is subordinate to the 1st part.
Another example..
• “ami jani je tumi ashbe jehutu gorojta tumari
(আমি জানি যে তুমি আসবে যেহেতু গরজটা তোমারই)”.
➢ami jani (আমি জানি) is the principal clause
➢“je tumi ashbe (যে তুমি আসবে)” and “jehutu
gorojta tumari (যেহেতু গরজটা তোমারই)” are the
two subordinate clauses of the 1st part.
Subordinator/subordinator
complement
• Clauses are combined with a subordinator
and/or the corresponding subordinator
complement
• The word placed before the
subordinate clause is called the subordinator
• The word placed before the principal
clause is called the subordinator complement.
Subordinator/subordinator
complement
• jadi tumi poro tahole tumi pass koribe (যদি তুমি পড়
তাহলে তুমি পাশ করবে)
• Subordinate Clause: jadi tumi poro (যদি তুমি পড়)
• Principal Clause: “tahole tumi pass koribe
(তাহলে তুমি পাশ করবে)”.
• Subordinator: “jadi (যদি)”
• Subordinator Complement: “tahole (তাহলে)”.
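The split shown above can be sketched as a lookup of the subordinator and its complement in the word sequence. Only the jadi / tahole pair from this slide is encoded; the table name `PAIRS`, the function `split_complex`, and the assumption that the subordinator is the first word are all illustrative choices.

```python
# Subordinator -> subordinator complement. Only the pair given in the
# example is listed; other pairs would be added the same way.
PAIRS = {"jadi": "tahole"}

def split_complex(sentence):
    """Split a romanized complex sentence into its two clauses.

    Assumes the subordinator is the first word and the complement opens
    the principal clause. Returns None if no known pair is found.
    """
    words = sentence.split()
    if not words or words[0] not in PAIRS:
        return None
    complement = PAIRS[words[0]]
    if complement not in words:
        return None
    k = words.index(complement)
    return {
        "subordinator": words[0],
        "complement": complement,
        "subordinate": " ".join(words[:k]),
        "principal": " ".join(words[k:]),
    }

print(split_complex("jadi tumi poro tahole tumi pass koribe"))
```

Each of the two clauses returned can then be analyzed with the simple-sentence rules.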
Some subordinators and their
respective subordinator complements
A set of rules for parsing complex
sentence.
Structure of the Compound Sentence
• Formed by two or more principal clauses joined
by an indeclinable or connective.
Connectives or Indeclinable
• Indeclinable means unchanging: the
word keeps the same form wherever it appears in a sentence.
• An indeclinable is the part of speech
➢whose form never changes in the
sentence,
➢which makes the sentence more meaningful, and
➢which is used as a connective between parts
of speech, clauses or sentences.
Bangla: o (ও), ebong (এবং), noile (নইলে), kingtu
(কিন্তু), fale (ফলে), notuba (নতুবা), ar (আর),
sutorang (সুতরাং), etc.
Decomposition Technique for
Compound Sentence
• To analyze a compound sentence, first
break the complete sentence into its
independent clauses, then analyze each
independent clause separately.
➢ If an independent clause has the form of a simple sentence,
analyze that clause as a simple sentence
➢ If an independent clause has the form of a complex
sentence, analyze that clause as a complex
sentence
An Example
• “tomora manush noo, ar jader chalaoo taraoo manush
noo (তোমরা মানুষ নও, আর যাদের চালাও তারাও মানুষ নয়)”.
• It is a compound sentence & it consists of two
independent clauses:
➢ (i) tomora manush noo (তোমরা মানুষ নও)
➢ (ii) jader chalaoo taraoo manush noo (যাদের চালাও তারাও
মানুষ নয়).
• These two independent clauses are combined by the
connective “ar (আর)”.
• The 1st independent clause is a simple sentence
• The 2nd independent clause is a complex sentence.
• Further, the 2nd independent clause is divided into
two clauses.
• (i) Principal clause: taraoo manush noo (তারাও
মানুষ নয়)
• (ii) Subordinate clause: jader chalaoo (যাদের
চালাও). These two clauses are simple
sentences.
• First the two independent clauses are separated;
the first one is then analyzed as a simple sentence & the 2nd
one as a complex sentence
• Then the two clauses under the complex sentence
are analyzed separately as simple sentences.
Simple + Simple
• “se soth kingtu tar bhai osoth (সে সৎ কিন্তু তার ভাই
অসৎ)”.
❑se soth (সে সৎ) and
❑tar bhai osoth (তার ভাই অসৎ) are principal
clauses or simple sentences
❑kingtu (কিন্তু) : a connective or indeclinable
Simple + Complex
• “se soth kingtu je bondhuti tar sange eshechilo
se osoth (সে সৎ কিন্তু যে বন্ধুটি তার সঙ্গে এসেছিল সে
অসৎ)”.
• se soth (সে সৎ) is a simple sentence
• “je bondhuti tar sange eshechilo se osoth (যে
বন্ধুটি তার সঙ্গে এসেছিল সে অসৎ)” is a complex
sentence.
Complex + Complex
• “jadi jante chaoo se keno asheni tahole bolbo ami
jani na, ar jadi jante chaoo ami keno jaini tahole
bolbo amar echhe holona tai (যদি জানতে চাও সে কেন
আসেনি তাহলে বলবো আমি জানিনা, আর যদি জানতে চাও আমি
কেন জানিনি তাহলে বলবো আমার ইচ্ছে হলোনা তাই)”.
• The two complex sentences are connected by the
connective “ar (আর)”.
• A compound sentence can be constructed as simple + simple,
simple + complex, complex + simple, complex +
complex etc.
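The decomposition technique for compound sentences can be sketched as follows: split the romanized sentence at connectives, then label each clause as simple or complex. The connective list comes from the slide on indeclinables; the set `SUBORDINATORS` and the rule "a clause containing a subordinator is complex" are simplifying assumptions made only for this illustration.

```python
CONNECTIVES = {"o", "ebong", "noile", "kingtu", "fale", "notuba", "ar", "sutorang"}
# Assumed marker words, collected from the examples in these slides.
SUBORDINATORS = {"je", "jadi", "jader", "jehutu"}

def decompose(sentence):
    """Split at connectives and classify each independent clause."""
    clauses, current = [], []
    for word in sentence.replace(",", " ").split():
        if word in CONNECTIVES:
            if current:
                clauses.append(current)
                current = []
        else:
            current.append(word)
    if current:
        clauses.append(current)
    return [
        (" ".join(c), "complex" if any(w in SUBORDINATORS for w in c) else "simple")
        for c in clauses
    ]

print(decompose("se soth kingtu tar bhai osoth"))
print(decompose("se soth kingtu je bondhuti tar sange eshechilo se osoth"))
```

A clause labeled complex would then be split again into its principal and subordinate clauses before the simple-sentence rules are applied.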
A set of rules for compound sentence
SR: Simple Sentence (CFG)
SR: Simple Sentence (CSG)
SR: Complex Sentence (CFG)
SR: Complex Sentence (CSG)
SR: Compound Sentences (CFG)
SR: Compound Sentences (CSG)
Limitations of Phrase Structure (PS)
Rules
• CFGs and CSGs are both sets of phrase
structure (PS) rewrite rules.
• PS rules alone cannot
generate all the sentence patterns of a language.
• PS rules are used to produce the deep structure of
sentences
• Other rules are needed alongside the PS rules to
produce the surface structures of sentences
Failure to Indicate the Discontinuous
Constituent
• CFG rules cannot represent the tree structure
of discontinuous constituents, because they are based
only on immediate constituent analysis
(a) Se hese uthlo (সে হেসে উঠলো)
(b) Uthlo se hese (উঠলো সে হেসে)
Failure to Indicate Structural
Similarity
• Immediate-constituent PS grammars assign
structurally dissimilar descriptions to closely related sentences
• (a) Ami tomake chai (আমি তোমাকে চাই)
• (b) Chai tomake ami (চাই তোমাকে আমি)
Ami tomake chai (আমি তোমাকে চাই)
The rules generate only this one sentence;
the rules for (a) cannot generate (b)
Chai tomake ami (চাই তোমাকে আমি)
The SR is not correct because the CFG gives
the structural description of (b), which is
ungrammatical.
Failure to Solve the Problem of
Syntactic Ambiguity
• When a phrase or sentence can have more
than one structure, it is said to be syntactically
or structurally ambiguous.
• Ambiguity occurs when the structure of a sentence
allows more than one meaning.
• When a structure in a sentence can be interpreted
in different ways, extracting the correct
meaning becomes a
problem.
An Example..
• (i) Purono boier dokan (পুরনো বইয়ের দোকান).
• (j) ((Purono boier) dokan) ((পুরনো বইয়ের) দোকান).
• (k) (Purono (boier dokan)) (পুরনো (বইয়ের দোকান)).
❑NP: purono boier dokan (পুরনো বইয়ের দোকান) has
two meanings:
✓The shop sells old books
✓The shop is old
❑Finding the meaning of a sentence is complicated
by the problem of ambiguity: there are often several
different possible meanings
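The two readings in (j) and (k) are the two ways of grouping three words into binary constituents. A short sketch that enumerates such groupings is given below; the function name `bracketings` is an assumption, and the code only lists candidate structures without choosing between them.

```python
def bracketings(words):
    """Return every binary bracketing of a sequence of words."""
    if len(words) == 1:
        return [words[0]]
    results = []
    for i in range(1, len(words)):            # split point
        for left in bracketings(words[:i]):
            for right in bracketings(words[i:]):
                results.append(f"({left} {right})")
    return results

for structure in bracketings(["purono", "boier", "dokan"]):
    print(structure)
```

The number of candidate structures grows quickly with phrase length, which is one reason structural ambiguity is hard to resolve from constituent structure alone.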
Another example..
• Aakaser moto nil somudhro dhulche (আকাশের
মতো নীল সমুদ্র দোলছে).
Three possible meanings:
▪The sea is as blue as the sky and is moving
▪The sea is blue, like the sky, and is moving
▪As the sky moves, so the blue sea is moving
Immediate-constituent PS rules
cannot resolve the ambiguity of
these sentences.
Assignments
• 1. Ekti Chele Ekti mojaor boi porche
• 2. CUET is the best university in Bangladesh
• 3. Pakhigulo khub sundor lagche
• 4. Robibar Eid sutrang versity bondho thakbe
• 5. Rahim, Karim ebong tader baba bajare gelo
• 6. The user should clean the printer by the good
brush
➢ Draw the parse tree (CFG + CSG) for each sentence & represent the
sentence in list form or labeled bracketing (CFG)
• Submission Deadline: Date of NLP Exam
Thanks