Grammatical Methods in Computer Vision: An Overview
Technical Report
Nov 29, 2004
Abstract
We review various methods and applications that have used grammars for solving inference problems
in computer vision and pattern recognition. Grammars have been useful because they are intuitively simple
to understand and have very elegant representations. Their ability to model semantic interpretations
of patterns, both spatial and temporal, has made them extremely popular in the research community.
In this paper, we attempt to give an overview of the syntactic methods that exist in the literature and how
they have been used as tools for pattern modeling and recognition. We also describe several practical
applications that have used them with great success.
1 Introduction
Grammars have been useful for providing structural descriptions of a variety of objects such as plants, buildings
and English sentences, and for associating semantic information with the resulting models. They have been
most widely used by the natural language processing community. However, they have also been applied to
many pattern recognition and computer vision problems. Some of the areas include character recognition
[K.H. Lee and Kashyap, 1988], recognition of handwritten mathematical formulae [Miller and Viola, 1998]
and more recently, activity recognition [Minnen et al., 2003].
Grammars consist of a collection of basic primitives and a set of rules which compose patterns out
of these primitives. Thus, collections of objects which exhibit structural regularity can be modeled using
grammars. For example, the set $\{010, 0110, 01110, 011110, \ldots\}$ is the set of sequences over $\{0, 1\}$
which consist of a run of 1s bounded by two 0s. Much more complicated sets, which exhibit regularity
in higher dimensions including time, can also be modeled using grammars. Roughly speaking, grammars generate
sub-patterns out of basic primitives, and then generate bigger or more complicated patterns from these sub-patterns.
This generation is controlled by the grammar rules.
There are several issues in the modeling of objects or events using grammars. The first issue is the
choice of basic primitives and the appropriate grammar type. The choice of primitives is determined by
how easily basic features can be extracted from the data, while the choice of grammar type is determined
by the complexity of the desired structure. One might wish for a scenario in which the extraction
of basic primitives and the choice of the grammar are completely independent. However, Narasimhan, in Chapter
1 of [Kaneff, 1969], argues that this is not possible. According to him, the inherent complexity of the
problem is divided between the basic primitives and the actual grammar, and thus the two choices cannot be made
independently. Related to this point is the syntax-semantics tradeoff, which states that one may choose to
keep a very simple grammar and embed much of the semantic interpretation in it, or instead choose a
highly complicated grammar with simpler and more tractable interpretations. This tradeoff is a crucial
decision that researchers have to make before designing a solution.
The second important issue is noise handling. Inputs in computer vision applications are very noisy,
and formal grammars are exact theoretical constructs that do not by themselves account for these perturbations
in the data. Thus, researchers have developed grammatical models and algorithms that are robust to noise. These two issues are
central to defining grammars and using them for performing inference tasks.
In this paper, we survey the various forms of grammars that have been developed and used by the
research community. We also attempt to provide scenarios where one form may prove more appropriate than
another form, depending on the characteristics of the problem under consideration. The paper is divided into
5 sections. In section 2, we discuss string grammars, their different extensions, and some applications in which
they have been used to describe line drawings and to recognize activities. Probabilistic information
can be associated with grammars, and so we explore stochastic grammars in section 3. We then move on
to higher dimensional grammars and discuss various forms which have been developed over the past 30
years in section 4. We finally conclude the paper in section 5. Most of the mathematical descriptions have
been taken from [Bunke and Sanfeliu, 1990].
2 String Grammars
String grammars form the basis of formal language theory. These grammars are defined over a vocabulary
(a finite set of symbols). Each grammar represents a set of (finite) sequences over this vocabulary. For example,
the set $\{01, 0011, 000111, 00001111, \ldots\}$ consists of all strings over the vocabulary $\{0, 1\}$
which are formed by concatenating a sequence of 0s with a sequence of 1s such that the lengths of the two
sequences are the same. These sequences are generated using a collection of production rules. These rules
define a set of substring replacement policies which give the final string certain properties, like the one
described in the example above.
The set of symbols belonging to the grammar vocabulary correspond to the basic primitives, while the
production rules correspond to the structural relationships that exist in the set defined by the grammar.
Since a sequence is a one-dimensional concept, the above schema is not very suitable for describing 2D
or 3D objects. However, certain extensions of these grammars have led to their use in computer vision
and pattern recognition applications. In this section, we first provide some formal definitions and examples
of string grammars. We then review the extensions of these grammars and how they have been useful in
practical applications.
2.1 Definitions
A grammar is a four-tuple $G = (N, T, P, S)$, where
N is a finite set of non-terminal symbols,
T is a finite set of terminal symbols,
P is a finite set of productions, or rewriting rules, and
$S \in N$ is the initial or starting symbol.
It is required that $N \cap T = \emptyset$. Let $V = N \cup T$. Each production $p \in P$ is of the form $\alpha \to \beta$,
where $\alpha$ and $\beta$ are called the left-hand and right-hand side respectively, with $\alpha \in V^{+}$ (the set of finite, non-zero
length sequences over $V$) and $\beta \in V^{*}$ (the set of finite length sequences over $V$).
To illustrate this definition, consider the grammar $G = (N, T, P, S)$
where $N = \{S, A, B\}$, $T = \{a, b, c\}$ and
$P = \{\, S \to cAb,\; A \to aBa,\; B \to aBa,\; B \to cb \,\}$.
This grammar generates the language
$L(G) = \{\, c\,a^{n}\,cb\,a^{n}\,b \mid n \ge 1 \,\}$
For example, for generating $caacbaab$, i.e., for $n = 2$, the following sequence of substring replacements is
applied:
$S \Rightarrow cAb \Rightarrow caBab \Rightarrow caaBaab \Rightarrow caacbaab$
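To make the rewriting mechanics concrete, here is a minimal Python sketch (our own illustration, not taken from the surveyed literature) that stores the four rules of this example grammar as plain string pairs and reproduces the derivation above by leftmost replacement.

# A minimal sketch of representing the example grammar and reproducing the
# derivation S => cAb => caBab => caaBaab => caacbaab.  The representation
# (rules as string pairs, leftmost replacement) is purely illustrative.

RULES = [
    ("S", "cAb"),   # S -> cAb
    ("A", "aBa"),   # A -> aBa
    ("B", "aBa"),   # B -> aBa
    ("B", "cb"),    # B -> cb
]

def apply_rule(sentential_form: str, lhs: str, rhs: str) -> str:
    """Replace the leftmost occurrence of lhs by rhs."""
    assert lhs in sentential_form, f"{lhs} does not occur in {sentential_form}"
    return sentential_form.replace(lhs, rhs, 1)

def derive(rule_indices):
    """Apply a sequence of rules (indices into RULES) starting from S."""
    form = "S"
    steps = [form]
    for i in rule_indices:
        lhs, rhs = RULES[i]
        form = apply_rule(form, lhs, rhs)
        steps.append(form)
    return steps

if __name__ == "__main__":
    # Derivation for n = 2: S -> cAb, A -> aBa, B -> aBa, B -> cb.
    print(" => ".join(derive([0, 1, 2, 3])))   # S => cAb => caBab => caaBaab => caacbaab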
Depending on the restrictions placed on the productions, grammars are commonly divided into four classes (the Chomsky hierarchy):
1. Unrestricted grammars (type 0) have no constraints other than the definition of a grammar described above.
2. Context-sensitive grammars (type 1) require that no production shortens the sentential form, i.e., $|\alpha| \le |\beta|$ for every production $\alpha \to \beta$.
3. Context-free grammars (type 2) require the left-hand side of every production to be a single non-terminal.
4. Regular grammars (type 3) allow only productions of the form $A \to aB$ or $A \to a$, with $A, B \in N$ and $a \in T$.
Semantic information can be attached to a grammar by associating attributes with its symbols and semantic rules with its productions. To illustrate, consider the grammar $G_{Bin} = (N, T, P, S)$
where $N = \{S\}$, $T = \{0, 1\}$ and $P = \{\, S \to S0,\; S \to S1,\; S \to 0,\; S \to 1 \,\}$.
$G_{Bin}$ generates all sequences of 0s and 1s. Now, if we want to associate the language of $G_{Bin}$ with
the set of whole numbers, then the semantics that we choose for terminals and non-terminals would be their
actual value (an attribute named value). Thus, the symbol 0 will be mapped to the whole number zero, and the symbol
1 will be mapped to the whole number one. In the following discussion, we will call the actual whole numbers
by their English names. Also, for a production $S \to S0$, the first (left-hand side) non-terminal will be referred to as $S[0]$ while
the second (right-hand side) non-terminal will be referred to as $S[1]$, so that we know which attribute value we are referring
to. Thus, we have
value(0) = zero
value(1) = one
We now need to associate semantics with the production rules. The semantic rules relate the attribute values
of the left-hand side and the right-hand side of a production rule. It is through these rules that information
or meaning can propagate from the start symbol to the terminals and vice versa. For instance, the semantic
rules for the first and third productions would be
value($S[0]$) = two $\cdot$ value($S[1]$) + zero (for $S \to S0$)
value($S$) = zero (for $S \to 0$)
where the array index refers to the appropriate non-terminal ($S$ of the left-hand side or right-hand side) in
the production. Now, while deriving the string 10, we have the sequence
$S \Rightarrow S0 \Rightarrow 10$
and the attribute evaluation
value($S[0]$) = two $\cdot$ value($S[1]$) + zero = two $\cdot$ one + zero = two.
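The following Python sketch (ours, with illustrative names) evaluates the synthesized value attribute of $G_{Bin}$ by mirroring the semantic rules above, assuming the binary-numeral semantics described in the text.

# A minimal sketch of evaluating the synthesized "value" attribute for G_Bin.
# The recursion mirrors the semantic rules: for S -> S0 we have
# value(S[0]) = 2 * value(S[1]) + 0, for S -> S1 we have
# value(S[0]) = 2 * value(S[1]) + 1, and S -> 0 / S -> 1 give the terminal values.

def value(binary_string: str) -> int:
    """Attribute evaluation driven by the (unique) parse of the string."""
    if binary_string in ("0", "1"):          # productions S -> 0 and S -> 1
        return int(binary_string)
    # Production S -> S d, where d is the last digit of the string.
    prefix, last_digit = binary_string[:-1], int(binary_string[-1])
    return 2 * value(prefix) + last_digit    # value(S[0]) = 2 * value(S[1]) + d

if __name__ == "__main__":
    print(value("10"))    # two, as in the derivation S => S0 => 10
    print(value("1011"))  # eleven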
2.3 Parsing
Grammars are used in computer vision and pattern recognition for performing inference tasks, i.e., finding
out whether the given data satisfies certain structural characteristics. Given the description of a grammar
and an input string, determining whether the string belongs to the language generated by the grammar is the
key question in this inference task. This task is performed by a parser. A parser takes as input a sequence
of symbols, and returns a parse tree which contains the details of the derivation of that string. If the string
is not part of the language, then no such tree can be generated and the parser can detect this. Designing
efficient and robust parsers is very important for performing pattern recognition tasks using grammars.
As the expressiveness of the grammar increases, the design of the corresponding parser becomes
more and more difficult. Most parsing methods assume a noise-free stream of input symbols. However,
error-correcting parsing methods have also been devised. Much research has also been done on parsing
higher dimensional structures such as graphs and trees [Fu and Shi, 1983]. More detailed
descriptions of different parsers and issues in parsing can be found in [Aho et al., 1988] and [Fu, 1974].
2.4 Grammar Learning
A grammar representing a pattern class can be learned from a set of examples of that class. This is a
typical machine learning problem: from a (possibly large) set of positive and negative examples, one attempts to
learn the syntactic structure inherent in the class. Inference of grammars has proved to be a very difficult problem
when applied to practical situations. Since the number of examples is always finite, generalization is an
important issue in these learning algorithms (otherwise overfitting will lead to inaccurate models). Most
applications propose grammatical models rather than learning those models, and then apply them to
the test examples.
Much work has been done in developing inference techniques for various forms of grammar.
For surveys of the different methods, see [Fu, 1974] and [Angluin and Smith, 1983]. Among the projects which
have used these inference techniques is fingerprint classification using tree grammars [Moayer and Fu, 1975]
(higher dimensional grammars such as tree grammars are discussed in Section 4). In this work,
the "interesting" components of the picture, which contained many repetitive structures, were identified. Grammars
were inferred for each such component and the different grammars were then merged.
A similar (top-down) approach was used in the work on texture synthesis by [Lu and Fu, 1979], where
textures in images were modeled using stochastic tree grammars. The problem was broken into simpler
problems by decomposing the tree structure into a number of strings. Each such string class corresponded
to a finite state grammar and was learnt independently. The resulting grammars were then recomposed to
output one tree grammar.
A pattern recognition system for use in robotics was developed by [Ouriachi, 1980], wherein the structure of
different machine components was inferred from image features such as edges. A polygonal approximation of
these features is translated into sentences, and a finite state grammar is learnt.
More recent work in this area is based on inferring bidimensional objects [Sainz and Sanfeliu, 1996]. A
grammar formalism developed by the same authors, ARE (Augmented Regular Expressions), was used to
describe two-dimensional traffic signals. The grammar itself was learnt, and its performance was shown to
be tolerant to noise.
2.5 Syntax-Directed Translations
2.5.1 Definitions
The following definition has been taken from [Bunke and Sanfeliu, 1990]. A translation from language $L_1$ to
language $L_2$ is a relation $M$ in $L_1 \times L_2$, where $xMy$ means "output string $y$ in $L_2$ is a translation of input
string $x$ from $L_1$". A syntax-directed translation schema is a generalization of a context-free grammar, say $G = (N, T, P, S)$,
that combines the input generation and the corresponding output generation into one single rule. Thus the rules
are of the form $A \to \alpha, \beta$, where $\beta$ is a translated version of $\alpha$. There needs to be some structural similarity
(in terms of the positions and types of non-terminals) between $\alpha$ and $\beta$, so that the entire generation process
remains essentially the same for both channels. Moreover, associated non-terminals in the two channels
must be rewritten simultaneously.
To illustrate these translations, consider the schema
$S \to aSc,\ aSc$
$S \to b,\ b$
$S \to a,\ b$
The language that is of interest is given by the output channel. This language is $L = \{\, a^{n} b c^{n} \mid n \ge 1 \,\}$.
However, the input channel observes this data and sometimes misreads $b$ as $a$. In that case, the translation
schema given above performs the necessary correction and restores the correct string.
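A hand-written Python sketch of this particular schema is shown below. It is only an illustration of the idea, not a general syntax-directed translation engine, and it assumes exactly the three rules listed above.

# A minimal sketch of the translation schema above: the input channel accepts
# strings of the form a^n x c^n with x in {a, b} (a possibly misread b in the
# middle), and the output channel emits the corrected string a^n b c^n.

def translate(s: str) -> str:
    """Return the output-channel string for an input-channel string s."""
    if s in ("a", "b"):                              # rules S -> (a, b) and S -> (b, b)
        return "b"
    if len(s) >= 3 and s[0] == "a" and s[-1] == "c":
        return "a" + translate(s[1:-1]) + "c"        # rule S -> (aSc, aSc)
    raise ValueError(f"{s!r} is not in the input language")

if __name__ == "__main__":
    print(translate("aabcc"))   # 'aabcc' (already correct)
    print(translate("aaacc"))   # 'aabcc' (misread b restored)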
[Figure: PDL primitives with head (h) and tail (t) attachment points, and the composition operators a + b, a − b, a × b and a * b.]
Based on this work, many languages based on line drawings were introduced. Some of
these are discussed in the next couple of sections of this paper.
[Figure: PDL primitives a, b, c, d; sub-pattern P1 = [(b + c) * a]; sub-pattern P2 = [d + (a + (~d))]; main pattern P = P1 * P2.]
applied without the left-to-right restriction. However, these algorithms take into consideration all possible
constructions and are space-inefficient. Miller [Miller, 1974] developed a local parsing scheme as an
improvement over this. This scheme was modified and adapted in the MIRABELLE system, where an alternating
bottom-up and top-down parser was developed. Using this parser, hand drawings of common tools
such as hammers, and sketches of houses, were interpreted.
PDL patterns always have only two attaching points, which limits their use in defining more complicated
drawings. This led to the development of plex structures, described in the next section.
[Figure: example structures composed with the operators hor(1, 2) and ver(1, 2, 3).]
using a dimension grammar (first defined by [Dori and Pnueli, 1988]); the extracted set of primitive
components is then parsed by this grammar. The underlying structure is that of a plex grammar. The
parser used also handles noise, which may be present due to the digitization process. Parsers for plex
languages are more complex than those for string languages; such parsers were proposed and developed
by [Fu and Shi, 1983] and [Flasinski, 1989].
3 Stochastic Grammars
Grammars can be extended to incorporate probabilistic information. This information allows modeling of
a probability distribution over the space of patterns. Given a set of such probability distributions over
different grammar classes, it is easy to classify new patterns: we compute, for each class, the
probability of the pattern belonging to that class (by computing the parse tree), and then choose the class
with the highest value. Even if an exact parse is not possible, one can perform an approximate parse and
obtain the probabilities. Probabilistic information allows us to use statistical techniques not only for finding
best matches, but also for performing robust feature detection guided by the grammar structure.
3.1 Definitions
A stochastic grammar is obtained from the characteristic (base) grammar by associating a probability with
every rule, such that the probabilities of rules with identical left-hand sides sum to 1. Intuitively,
looking at the generating process of a string, at any stage the probability of a rewriting rule firing for a given
left-hand side sub-string (a non-terminal in the case of a context-free grammar) is given by the discrete
distribution over the available choices; the distribution for the $i$-th left-hand side (say $\alpha$) is specified by the $p_{ij}$,
where $j$ indexes the productions whose left-hand side is $\alpha$.
The following definition is taken from [Bunke and Sanfeliu, 1990]. For simplicity of notation, we will
consider context-free grammars; analogous definitions of stochastic grammars (used by [Minnen et al., 2003])
can be provided for other grammar forms too. A generic production $\alpha \to \beta$ will now have the more specific
form $A_i \to X_{ij}$, where the $X_{ij}$ are strings over terminals and non-terminals. Consider the grammar
$G = (N, T, P, S)$
where, for notational convenience, we have
$N = \{A_1, A_2, \ldots, A_m\}$ as the set of non-terminals, and the set of productions as
$A_1 \to X_{11} \mid X_{12} \mid \cdots \mid X_{1 n_1}$
$\ldots$
$A_m \to X_{m1} \mid X_{m2} \mid \cdots \mid X_{m n_m}$, where $X_{ij} \in (N \cup T)^{*}$.
A stochastic (context-free) grammar is a four-tuple $G_s = (N, T, P_s, S)$, where $N$, $T$ and $S$ are the same as in the
base context-free grammar (non-terminals, terminals and initial symbol), and $P_s$ is the set of productions, of
the form
$(A_i \to X_{ij},\ p_{ij})$
such that $A_i \in N$, $X_{ij} \in (N \cup T)^{*}$ and $\sum_{j=1}^{n_i} p_{ij} = 1$ for $i = 1, \ldots, m$, where $n_i$ denotes the
number of productions with left-hand side $A_i$.
The probability of a string $x$ with respect to $G_s$ is then defined as follows.
1. If $x \in L(G)$ is unambiguous and has a derivation $S = w_0 \Rightarrow^{r_1} w_1 \Rightarrow^{r_2} \cdots \Rightarrow^{r_k} w_k = x$, where $r_1, r_2, \ldots, r_k \in P$, then the probability of $x$ with respect to $G_s$ is given by
$p(x \mid G_s) = \prod_{i=1}^{k} p(r_i)$
This definition alone does not ensure that the function $p(x \mid G_s)$ is a probability
distribution. This problem of grammar consistency is described further in [Fu, 1974].
2. If $x \in L(G)$ is ambiguous and has $l$ different derivation trees with corresponding probabilities
$p_1(x \mid G_s), p_2(x \mid G_s), \ldots, p_l(x \mid G_s)$,
then the probability of $x$ with respect to $G_s$ is given by
$p(x \mid G_s) = \sum_{i=1}^{l} p_i(x \mid G_s)$
Thus, to get the probability of an unambiguous string x, we simply multiply the probabilities of the produc-
tions that are applied during its derivation. If there are multiple ways of deriving x, then for each derivation,
we compute the probability and sum them up.
Example: Consider the grammar $G = (N, T, P, S)$
where $N = \{S, A\}$, $T = \{a\}$, $P = \{\, S \to aA,\; S \to aaA,\; A \to aa,\; A \to a \,\}$ with the rule probabilities
being $0.5, 0.5, 0.3, 0.7$ respectively. Then the probabilities of the strings contained
in $L(G) = \{aa, aaa, aaaa\}$ are $0.35$, $0.5$ and $0.15$ respectively.
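This small Python sketch (ours, purely for illustration) verifies the example by enumerating the four possible derivations of this grammar and summing the probabilities per generated string.

# Brute-force check of the worked example: S -> aA (0.5), S -> aaA (0.5),
# A -> aa (0.3), A -> a (0.7).  Each derivation picks one S-rule and one A-rule.

from collections import defaultdict
from itertools import product

S_RULES = [("aA", 0.5), ("aaA", 0.5)]
A_RULES = [("aa", 0.3), ("a", 0.7)]

def string_probabilities():
    probs = defaultdict(float)
    for (s_rhs, p_s), (a_rhs, p_a) in product(S_RULES, A_RULES):
        string = s_rhs.replace("A", a_rhs)   # expand the single non-terminal A
        probs[string] += p_s * p_a           # sum over distinct derivations
    return dict(probs)

if __name__ == "__main__":
    print(string_probabilities())   # {'aaa': 0.5, 'aa': 0.35, 'aaaa': 0.15}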
Now let us consider the problem of classifying a given string $x$ into one of $n$ classes (taken from
[Bunke and Sanfeliu, 1990]). The classes are each described by a stochastic grammar $G_s^i$. Let the prior
probabilities of the pattern classes be denoted by $p(G_s^i)$. Then, if the probability of the string is denoted by
$p(x)$, we have by Bayes' formula
$p(G_s^i \mid x) = \dfrac{p(x \mid G_s^i)\, p(G_s^i)}{p(x)}$
where $p(x \mid G_s^i)$ is computed as described above. The expression $p(G_s^i \mid x)$ is called the posterior probability of
$G_s^i$. Thus, if $x$ can be generated by more than one grammar $G_s^j$, $j = 1, \ldots, n$, a meaningful classification is
obtained by
$\arg\max_{G_s^j}\ p(G_s^j \mid x)$
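A minimal Python sketch of this decision rule follows; the grammar names, likelihoods and priors are made-up numbers used purely for illustration.

# Bayesian classification over stochastic grammars: pick the grammar with the
# highest posterior p(G_i | x), proportional to p(x | G_i) * p(G_i).

def classify(likelihoods: dict, priors: dict) -> str:
    """Return argmax_i p(x | G_i) * p(G_i) over the grammar names."""
    return max(likelihoods, key=lambda g: likelihoods[g] * priors[g])

if __name__ == "__main__":
    likelihoods = {"G1": 0.020, "G2": 0.035}   # p(x | G_i), hypothetical values
    priors      = {"G1": 0.70,  "G2": 0.30}    # p(G_i), hypothetical values
    print(classify(likelihoods, priors))       # 'G1' (0.014 vs 0.0105)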
3.3.1 Inside algorithm
The inside algorithm is a dynamic programming procedure for computing the probability of a string, and
the most likely parse of a particular string, given a stochastic context-free grammar. The number of parse
trees for a particular string may be exponential, and thus computing all of them is not feasible.
Instead, by following a bottom-up approach of parsing smaller strings and storing the results, the
computation of the probabilities and most likely parse can be done in polynomial time. Material in this
section has been taken from [Manning and Schutze, 2001].
We first describe some notation. The string that we are parsing is denoted by $w = w_0 w_1 \cdots w_n$, and
a particular substring $w_p \cdots w_q$ is denoted by $w_{pq}$. The grammar, as usual, is denoted by
$G = (N, T, P, S)$, where the set of non-terminals is indexed by $j$, as $N^j$, with $N^1$ denoting the initial symbol. The
inside probability $\beta_j(p, q) = P(w_{pq} \mid N^j_{pq}, G)$ is the probability of generating $w_{pq} = w_p w_{p+1} \cdots w_q$ starting
from $N^j_{pq}$, where the subscript denotes the positions in the string.
Now, if we want to compute the probability of a string $w$ being generated by a stochastic grammar, we
perform the following dynamic programming procedure, also known as the inside algorithm. We can assume
that the grammar is in Chomsky Normal Form, i.e., all productions are of the form $A \to BC$ or $A \to a$, where
$A$, $B$ and $C$ are non-terminals and $a$ is a terminal. Thus, in the present notation, the productions are of the
form $N^i \to N^j N^k$ or $N^i \to w_l$.
1. Base Case: For each substring $w_k$ of length 1, compute the inside probabilities $\beta_j(k, k)$ of non-terminal $N^j$ generating $w_k$.
2. Induction: For finding $\beta_j(p, q)$, where $p < q$, we consider all possibilities of splitting $w_{pq}$ into $w_{pd}$ and
$w_{(d+1)q}$, and recursively compute the corresponding inside probabilities $\beta_r(p, d)$ and $\beta_s(d+1, q)$. Since
we are using a bottom-up approach, these probabilities will already be stored in our tables. We also
have to consider all possible productions applicable at $N^j$ when analyzing the possibilities of
splitting $w_{pq}$. More formally, we have to compute
$\beta_j(p, q) = \sum_{r, s} \sum_{d=p}^{q-1} P(N^j \to N^r N^s)\, \beta_r(p, d)\, \beta_s(d+1, q)$
By computing $\beta_1(0, n)$, we can obtain the probability of the string $w$ being generated by $G$.
A similar technique can be used for finding the most likely parse of a string. Here we store
two quantities for each substring $w_{pq}$ and non-terminal $N^j$. The first is $\delta_j(p, q)$, the
highest inside probability of a parse of the subtree rooted at $N^j_{pq}$. The second is $\psi_j(p, q)$, which stores the rule (and split point) that led to
the most probable parse from non-terminal $N^j$. A similar bottom-up algorithm can then be followed:
1. Base Case: For each substring of length 1, compute $\delta_i(p, p) = P(N^i \to w_p)$.
2. Induction: For generating $w_{pq}$, the highest probability parse is computed using
$\delta_i(p, q) = \max_{1 \le j, k \le n,\; p \le r < q} P(N^i \to N^j N^k)\, \delta_j(p, r)\, \delta_k(r+1, q)$
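Below is a minimal Python sketch of the inside computation for a toy grammar in Chomsky Normal Form; the grammar and its probabilities are invented for illustration. The most likely parse ($\delta$, $\psi$) follows the same recursion with a max in place of the sum.

# Inside algorithm for a toy SCFG in Chomsky Normal Form (illustrative only):
#   S -> A B (1.0)    B -> A B (0.6) | b (0.4)    A -> a (1.0)
from collections import defaultdict

BINARY = {("S", "A", "B"): 1.0, ("B", "A", "B"): 0.6}    # P(N^j -> N^r N^s)
LEXICAL = {("A", "a"): 1.0, ("B", "b"): 0.4}             # P(N^j -> terminal)

def inside_probability(words, start="S"):
    """Compute beta_start(0, n-1) = P(words | G) by the inside algorithm."""
    n = len(words)
    beta = defaultdict(float)            # beta[(j, p, q)] corresponds to beta_j(p, q)
    # Base case: spans of length 1 are covered by lexical rules N^j -> w_k.
    for k, w in enumerate(words):
        for (j, term), prob in LEXICAL.items():
            if term == w:
                beta[(j, k, k)] += prob
    # Induction: longer spans, every split point d and every binary rule.
    for span in range(2, n + 1):
        for p in range(0, n - span + 1):
            q = p + span - 1
            for (j, r, s), prob in BINARY.items():
                for d in range(p, q):
                    beta[(j, p, q)] += prob * beta[(r, p, d)] * beta[(s, d + 1, q)]
    return beta[(start, 0, n - 1)]

if __name__ == "__main__":
    print(inside_probability(["a", "a", "b"]))   # approximately 0.24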
3.3.2 Parsing for noisy inputs
When the input stream is noisy, we do not consider the actual string, but a set of beliefs about the correctness
of a particular symbol at each position. Material in this section has been taken from [Bunke and Sanfeliu, 1990].
Thus we consider a sequence of vectors $(c_1(x_1), c_1(x_2), \ldots), \ldots, (c_m(x_1), c_m(x_2), \ldots)$, where $c_i(x_j)$
is a measure of certainty that the symbol $x_j \in T$ is correct at position $i$. For a particular input string
$x = x_{i_1} x_{i_2} \cdots x_{i_m}$, we define the certainty by
$c(x) = \sum_{j=1}^{m} c_j(x_{i_j})$
The certainties are defined on the basis of the actual input string observed as well as the belief about the
particular string positions. The parsing procedure then determines the string $x$ which is compatible
with the given grammar and has maximum certainty. A fast and simple algorithm for solving this problem for
regular grammars was given in [Bunke et al., 1984]. Its extension to context-free grammars was given in
[Bunke and Pasche, 1988].
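As an illustration of this idea (and not the specific algorithm of [Bunke et al., 1984]), the following Python sketch finds the maximum-certainty string of a given length accepted by a regular grammar modeled as a finite automaton, using a Viterbi-style dynamic program. The automaton and the certainty values are invented.

# Maximum-certainty decoding over a finite automaton: among all strings of the
# input's length accepted by the automaton, find the one maximizing the sum of
# per-position certainties c_i(symbol).

def max_certainty_string(certainties, transitions, start, accepting):
    """certainties[i][symbol] = c_i(symbol); transitions[(state, symbol)] = next state."""
    # best[state] = (total certainty, decoded string) of the best path ending there.
    best = {start: (0.0, "")}
    for c_i in certainties:                       # one position at a time
        new_best = {}
        for state, (score, string) in best.items():
            for symbol, certainty in c_i.items():
                nxt = transitions.get((state, symbol))
                if nxt is None:
                    continue                      # symbol not allowed in this state
                cand = (score + certainty, string + symbol)
                if nxt not in new_best or cand > new_best[nxt]:
                    new_best[nxt] = cand
        best = new_best
    accepted = [v for s, v in best.items() if s in accepting]
    return max(accepted) if accepted else None

if __name__ == "__main__":
    # Automaton for the language 0 1* 0 (zero or more 1s bounded by two 0s).
    trans = {("q0", "0"): "q1", ("q1", "1"): "q1", ("q1", "0"): "q2"}
    # Certainty vectors for a noisy 4-symbol input; the third position is ambiguous.
    certs = [{"0": 0.9, "1": 0.1}, {"0": 0.2, "1": 0.8},
             {"0": 0.55, "1": 0.45}, {"0": 0.7, "1": 0.3}]
    print(max_certainty_string(certs, trans, "q0", {"q2"}))   # about (2.85, '0110')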
4 Higher Dimensional Grammars
4.1 Parallel Grammars
In a parallel grammar, rules are applied to all applicable places in the sentential form simultaneously rather than one at a time, and the languages generated by sequential and parallel application of the same rules are in general
not the same. Consider, for example, the context-free grammar denoted by the rules $S \to SS$, $S \to a$. The
languages accepted under sequential and parallel rewriting are, respectively,
$L(G) = \{\, a^{n} \mid n \ge 1 \,\}$
$L(G') = \{\, a^{2^{n}} \mid n \ge 0 \,\}$
In fact, the language accepted by $G'$ is not context-free, even though the underlying grammar
is context-free. In the case of regular grammars, an intermediate string always contains a single non-terminal
and hence there is no difference between $G'$ and $G$.
One problem with parallel grammars is that for a rule $\alpha \to \beta$, instances of $\alpha$ are allowed to overlap.
So if there exists a length-decreasing rule $XXX \to ZZ$, then starting from $X^{m}$, a parallel grammar derives
$Z^{2(m-2)}$. A more important problem is that parsing and generation are no longer inverses of each other. For
instance, applying $XX \to ZZ$ to $XXX$ in parallel yields $ZZZZ$, but applying $ZZ \to XX$ to $ZZZZ$ in parallel
yields $XXXXXX$. Thus parsing is an issue in parallel grammars.
Parallel grammars have been widely applied in computer graphics applications in the form of L-systems.
We discuss some of the applications of L-systems in the next section.
4.1.1 L-systems
First introduced by Aristid Lindenmayer as a mathematical theory to explain the growth and development of multicellular
organisms, L-systems have since made a big impact in simulation and modeling. They were developed
by [Prusinkiewicz and Lindenmayer, 1990] into a formalism capable of expressing plant models, and are now
used for various other applications in computer graphics.
L-systems are formal grammars which generate strings. These grammars are, in general, context-sensitive
parallel grammars. The rules that generate the strings, and the symbols themselves, have particular
semantic and visual interpretations based on the problem being modeled. The string of symbols has
traditionally been interpreted as a sequence of commands given to a turtle; the motion exhibited
by the turtle then describes the end result. Application of rules in L-systems occurs in a parallel fashion,
thereby resembling natural subdivision processes in cellular organisms.
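The following Python sketch shows the parallel rewriting step on Lindenmayer's classic two-symbol "algae" system; it is only an illustration, and the turtle interpretation mentioned above would then map each symbol of the resulting string to a drawing command.

# Parallel rewriting in an L-system: every symbol of the current string is
# rewritten simultaneously; symbols without a production are copied unchanged.

RULES = {"A": "AB", "B": "A"}   # Lindenmayer's "algae" rules (illustrative)

def rewrite(s: str) -> str:
    """One parallel derivation step: rewrite every symbol at once."""
    return "".join(RULES.get(symbol, symbol) for symbol in s)

def derive(axiom: str, steps: int):
    strings = [axiom]
    for _ in range(steps):
        strings.append(rewrite(strings[-1]))
    return strings

if __name__ == "__main__":
    for s in derive("A", 5):
        print(s)   # A, AB, ABA, ABAAB, ABAABABA, ...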
Applications of L-systems in computer graphics include the modeling of plants [Kurth, 1994]. In this work,
parameterized L-systems were used not only for simulating differential plant growth but also for modeling
interactions with the external environment and other plant structures. Another important application was in the
artificial generation of random cities [Parish and Muller, 2001]. In this work, open L-systems were used
for modeling street maps and building structures, by generating templates and then having them interact with
external functions.
Array grammars extend string grammars from one-dimensional sequences of symbols to two-dimensional arrays.
The definitions here are taken from [Bunke and Sanfeliu, 1990]. The entire "string" is modeled as a function
from the set of (integer-coordinate) lattice points to the set of terminal symbols plus the blank symbol (#).
An example of an array string is shown in the figure. The notion of connectedness in these strings is defined
on the basis of the coordinates: two points $(i, j)$ and $(h, k)$ are said to be neighbors (and hence connected) if
$|i - h| + |j - k| = 1$
An array $\Sigma$ is called connected if, for all non-# symbols $A$, $B$ in $\Sigma$, there exists a sequence $A = A_0, A_1, \ldots, A_n = B$ of non-# symbols in $\Sigma$ (a "path") such that $A_i$ is a neighbor of $A_{i+1}$.
The definition of an array grammar keeps all the non-# symbols connected. If this restriction were not imposed, the distance between disconnected
components could become arbitrarily large, and modeling the resultant string using local computations
would not be feasible. Thus, there are some restrictions on the way production rules are defined. More particularly,
the rules follow the structure of an isometric grammar: for any rule $\alpha \to \beta$, the "size"
of $\alpha$ and $\beta$ must be the same. This looks like a severe restriction, since the strings generated would then always
be of a fixed length (that of the start configuration), and thus the number of strings would always be finite. However,
the language of an isometric string grammar $G$ is defined to be the set of terminal strings $\tau$ such that the
infinite string $\#^{\infty} \tau \#^{\infty}$ can be derived in $G$ from the infinite initial string $\#^{\infty} S \#^{\infty}$. Hence, by incorporating rules
containing the # symbol, strings of arbitrary length can be generated.
Thus, in an array grammar, one starts with an infinite two-dimensional array consisting of #'s, with the
initial symbol located at one place. Each rule of the grammar replaces one sub-array with another. The rules
satisfy the isometry constraint since, otherwise, it is not clear how one could replace a bigger array with a
smaller array and keep the resulting embedding connected. Similar problems arise when a smaller
array replaces a bigger array in a production.
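As a small illustration of the connectedness condition above, the following Python sketch checks whether all non-# cells of an array form a single connected component under the stated neighborhood; the representation (a list of strings) is ours and purely illustrative.

# Check the connectedness of an array string: '#' denotes the blank symbol,
# and two cells are neighbors if they differ by one in exactly one coordinate.

from collections import deque

def is_connected(array):
    """array: list of equal-length strings, '#' meaning blank."""
    cells = {(i, j) for i, row in enumerate(array)
                    for j, sym in enumerate(row) if sym != "#"}
    if not cells:
        return True
    start = next(iter(cells))
    seen, queue = {start}, deque([start])
    while queue:
        i, j = queue.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):   # 4-neighborhood
            nb = (i + di, j + dj)
            if nb in cells and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return seen == cells          # every non-# cell reachable from 'start'

if __name__ == "__main__":
    print(is_connected(["#a#", "#ab", "###"]))   # True
    print(is_connected(["a##", "##b", "###"]))   # False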
Figure 6: Example of a graph grammar and a graph generated from this grammar [Bunke and Sanfeliu, 1990]
These grammars defined graphs which, through some re-numbering procedures, could be directly translated into
string grammars; string parsing techniques were therefore easily applicable in this case.
An important special case of graph grammars are tree grammars. Here, in the rules $\alpha \to \beta$, $\alpha$ and $\beta$
are labeled sub-trees. We apply a rule to a tree $\sigma$ by replacing some occurrence of $\alpha$, as a subtree of $\sigma$,
by $\beta$. In this case, the connections to be made are not ambiguous, since $\beta$ is simply attached to the parent
of $\alpha$. Examples of graph and tree grammars are depicted in Figures 6 and 7. Tree grammars are more tractable
than graph grammars since the replacement rule substitutes entire subtrees; as a result, parsing of such
grammars becomes simpler. Efficient tree parsers based on tree automata exist in the literature. A very
good discussion on matching tree structures is available in Chapter 6 of [Bunke and Sanfeliu, 1990].
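The following Python sketch (ours, with an ad hoc nested-tuple representation of labeled trees) shows the basic replacement step of a tree-grammar rule: an occurrence of the left-hand-side subtree is replaced by the right-hand side and re-attached to the parent.

# Tree rewriting: replace the first occurrence of the subtree `lhs` by `rhs`.
# Trees are nested tuples (label, child, child, ...); leaves are plain strings.

def replace_subtree(tree, lhs, rhs):
    """Replace the first occurrence of subtree `lhs` in `tree` by `rhs`."""
    if tree == lhs:
        return rhs
    if isinstance(tree, tuple):
        label, *children = tree
        new_children, replaced = [], False
        for child in children:
            if not replaced:
                new_child = replace_subtree(child, lhs, rhs)
                replaced = new_child != child
                new_children.append(new_child)
            else:
                new_children.append(child)
        return (label, *new_children)
    return tree

if __name__ == "__main__":
    tree = ("S", ("A", "a"), ("B", "b"))
    # Rule: A(a)  ->  A(a, a); the result stays attached to the parent S.
    print(replace_subtree(tree, ("A", "a"), ("A", "a", "a")))
    # ('S', ('A', 'a', 'a'), ('B', 'b'))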
Figure 7: Example of a tree grammar and a tree generated from this grammar
Rule No. U (success go-to field) W (failure go-to field)
(2) (6) None
(3) (7) None
(4) (8) None
(5) (9) None
(6) or (7) (2), (3), (4) or (5) None
(8) or (9) None None
The language generated by this grammar is $\{\, \sigma\sigma \mid \sigma \in \{a, b\}^{*} \,\}$. No context-free grammar can generate
this set. Thus, using programmed grammars, one can retain the simplicity of the underlying rules while
still modeling more complicated features. These languages have been used for character recognition
tasks [K.H. Lee and Kashyap, 1988]. In this particular work, attribute-dependent regular programmed
languages were used to build a model for Hangul, the Korean alphabet.
Figure 8: Layout of the historic Mughal Gardens
Figure 9: Example of a shape grammar and a shape generated from this grammar
Figure 10: An example of Chinese lattice patterns and the four rules of the ice-ray grammar
[Koning and Eizenberg, 1981] defined a parametric shape grammar that could generate the compositional
forms and specify the functional zones of Frank Lloyd Wright’s prairie-style houses. Application of shape
grammars in product design was demonstrated by [Agarwal and Cagan, 1998]. Using labeled shape gram-
mars, connections between different parts of the product and their functional implications were considered.
5 Conclusions
In this paper, we have tried to cover most of the important grammar formalisms that have been used in
the computer vision and pattern recognition community. Looking at the series of developments in the usage
of grammars, we do not see a very clear trend. One plausible explanation is that theoretical developments in
grammatical techniques suitable for pattern analysis tasks did not take place in parallel with the applications.
Most of the techniques described were studied in the 1970s; in the application domain, these concepts were
reused with extensions suited to the problem at hand.
As aptly pointed out in [Tombre, 1996], syntactic methods lack generality. An approach used for one
application is not easily adaptable to other applications. The inability to handle the primitives and the semantic
part of grammatical methods in an independent, modular way has further led to application-specific algorithms
for inference and generation. A clear theory of how a grammar can be learnt from real world data exists only
in a rather abstract sense, confined mainly to the domain of string and graph grammars. Moreover, the
domains where different types of grammars are applied are too varied for such a unified theory to exist.
Thus we conclude that grammatical methods are useful when the problem under consideration has pat-
terns and relationships among sub-patterns. However, recognizing the existence of patterns does not guar-
antee a direct syntactic solution since the grammatical methods that exist as of today are not generic enough.
The robustness and the feasibility of the solution would depend a lot on the actual problem.
References
[Agarwal and Cagan, 1998] Agarwal, M. and Cagan, J. (1998). A blend of different tastes: The language
of coffee makers. Environment and Planning B: Planning and Design, 25(2):205–226.
[Aho et al., 1988] Aho, A., Sethi, R., and Ullman, J. (1988). Compilers, principles, techniques and tools.
Addison-Wesley, Reading, MA.
[Aho and Ullman, 1972] Aho, A. and Ullman, J. (1972). The Theory of Parsing, Translation and Compiling,
Vol. 1: Parsing. Prentice-Hall, Englewood Cliffs, NJ.
[Angluin and Smith, 1983] Angluin, D. and Smith, C. (1983). Inductive inference: Theory and methods.
ACM Computing Surveys, 15:741–765.
[Brand, 1996] Brand, M. (1996). Understanding manipulation in video. In 2nd International Workshop on
Automatic Face and Gesture Recognition, pages 94–99.
[Brayer, 1977] Brayer, J. (1977). Parsing of web grammars. In IEEE Workshop on data description and
management, Long Beach, CA.
[Brayer and Fu, 1975] Brayer, J. and Fu, K. (1975). Web grammars and their application to pattern recog-
nition. Technical Report TR-EE 75-1, Purdue University.
[Bunke et al., 1984] Bunke, H., Grebner, K., and Sagarer, G. (1984). Syntactic analysis of noisy input
strings with an application to the analysis of heart-volume curves. Proc. 7th ICPR, pages 1145–1147.
[Bunke and Pasche, 1988] Bunke, H. and Pasche, D. (1988). A new syntactic parsing method and its ap-
plication to the tracking of noisy contours in images. Signal Processing IV : Theories and Applications,
pages 1201–1204.
[Bunke and Sanfeliu, 1990] Bunke, H. and Sanfeliu, A. (1990). Syntactic and Structural Pattern Recogni-
tion : Theory and Applications. World Scientific.
[Collin and Colnet, 1994] Collin, S. and Colnet, D. (1994). Syntactic analysis of technical drawing dimen-
sions. International Journal of Pattern Recognition and Artificial Intelligence, 8:1131–1148.
[Dori and Pnueli, 1988] Dori, D. and Pnueli, A. (1988). The grammar of dimensions in machine drawings.
CVGIP:Graphical Models and Image Processing, 42:1–18.
[Fan and Fu, 1979] Fan, T. and Fu, K. (1979). A syntactic approach to time-varying image analysis.
CVGIP:Graphical Models and Image Processing, 11:138–149.
[Flasinski, 1989] Flasinski, M. (1989). Characteristics of ednlc-graph grammar for syntactic pattern recog-
nition. CVGIP:Graphical Models and Image Processing, 47:1–21.
[Fu, 1974] Fu, K. (1974). Syntactic Methods in Pattern Recognition. Academic Press, New York and
London.
[Fu and Fan, 1986] Fu, K. and Fan, T. (1986). Tree translation and its application to time-varying image
analysis problem. IEEE Trans. SMC, 12:856–867.
[Fu and Shi, 1983] Fu, K. and Shi, Q. (1983). Parsing and translation of attributed expansive graph lan-
guages for scene analysis. IEEE Transactions on PAMI, 5:472–485.
[Ivanov and Bobick, 2000] Ivanov, Y. and Bobick, A. (2000). Recognition of visual activities and interac-
tions by stochastic parsing. PAMI, 22(8):852–872.
[J. Yamato and Ishii, 1992] Yamato, J., Ohya, J., and Ishii, K. (1992). Recognizing human action in time-
sequential images using hidden Markov model. In IEEE Conf. on Computer Vision and Pattern Recogni-
tion (CVPR), pages 379–385.
[K.H. Lee and Kashyap, 1988] K.H. Lee, K. E. and Kashyap, R. (1988). Character recognition using at-
tributed grammar. IEEE.
[Kirsch, 1964] Kirsch, R. (1964). Computer interpretation of English text and picture patterns. IEEE Trans.
EC, 13:363–376.
[Knuth, 1968] Knuth, D. (1968). Semantics of context-free languages. Math. Sys. Theory, 2:127–146.
[Koning and Eizenberg, 1981] Koning, H. and Eizenberg, J. (1981). The language of the prairie: Frank
Lloyd Wright’s prairie houses. Environment and Planning B, 8:295–323.
[Kurth, 1994] Kurth, W. (1994). Growth grammar interpreter grogra 2.4: A software tool for the 3-
dimensional interpretation of stochastic, sensitive growth grammars in the context of plant modelling.
introduction and reference manual. Berichte des Forschungszentrums Waldökosysteme der Universität
Göttingen, Ser. B, 38.
[Lin and Fu, 1984] Lin, W. and Fu, K. (1984). A syntactic approach to 3d object representation. IEEE
Trans. PAMI, 6:351–364.
[Lu and Fu, 1979] Lu, S. and Fu, K. (1979). Stochastic tree grammar inference for texture synthesis and
discrimination. CGIP, 9:234–245.
[Manning and Schutze, 2001] Manning, C. and Schutze, H. (2001). Foundations of Statistical Natural Lan-
guage Processing. MIT Press.
[Masini and Mohr, 1983] Masini, G. and Mohr, R. (1983). Mirabelle, a system for structural analysis of
line drawings. Pattern Recognition, 16:363–372.
[Miller and Viola, 1998] Miller, E. and Viola, P. (1998). Ambiguity and constraint in mathematical expres-
sion recognition. In AAAI Nat. Conf. on Artificial Intelligence.
[Miller, 1974] Miller, P. (1974). A locally organized parser for spoken input. Comm. ACM, 17:621–630.
[Miller and Shaw, 1968] Miller, W. and Shaw, A. (1968). Linguistic methods in picture processing - a
survey. In Proc. AFIPS 1968 Fall Joint Computer Conference, volume 33, pages 279–290.
[Minnen et al., 2003] Minnen, D., Essa, I., and Starner, T. (2003). Expectation grammars: Leveraging high-
level expectations for activity recognition. In IEEE Conf. on Computer Vision and Pattern Recognition
(CVPR).
[Moayer and Fu, 1975] Moayer, B. and Fu, K. (1975). A tree system approach for fingerprint pattern recog-
nition. IEEE Trans. on Computers, 25.
[Ouriachi, 1980] Ouriachi, K. (1980). Processus de reconnaissance des formes applicable a un assemblage
automatique. These de 3eme cycle, Universite des Sciences et Techniques de Lille.
[Parish and Muller, 2001] Parish, Y. and Muller, P. (2001). Procedural modeling of cities. SIGGRAPH,
pages 301–308.
[Pavlidis, 1972] Pavlidis, T. (1972). Grammatical and graph theoretical analysis of pictures. Graphic Lan-
guages.
[Prusinkiewicz and Lindenmayer, 1990] Prusinkiewicz, P. and Lindenmayer, A. (1990). The Algorithmic
Beauty of Plants. Springer-Verlag.
[Rabiner and Juang, 1986] Rabiner, L. and Juang, B. (1986). An introduction to hidden Markov models.
IEEE ASSP Magazine.
[Sainz and Sanfeliu, 1996] Sainz, M. and Sanfeliu, A. (1996). Learning bidimensional context dependent
models using a context-sensitive language. In 13th International Conference on Pattern Recognition,
Viena, Austria.
[Stiny, 1975] Stiny, G. (1975). Pictorial and Formal Aspects of Shape and Shape Grammars. Basel:
Birkhauser.
[Stiny, 1977] Stiny, G. (1977). Ice-ray: A note on the generation of Chinese lattice designs. Environment
and Planning B: Planning and Design, 4:89–98.
[Thomason, 1975] Thomason, M. (1975). Stochastic sdts for correction of errors in context-free languages.
IEEE Trans. Comp., 24:1211–1216.
[Tombre, 1996] Tombre, K. (1996). Structural and syntactic methods in line drawing analysis: To what
extent do they work? In SSPR, pages 310–321.
[Ullmann, 1976] Ullmann, J. (1976). An algorithm for subgraph isomorphism. Journal of the ACM, 23:31–
42.