Pattern Recognition
Perceptron ("single-layer perceptron")
• simple NN for linearly separable patterns only
• error-correcting learning (a minimal sketch in code follows):
  y(n) = sgn(wᵀ(n) x(n))
  w(n+1) = w(n) + [d(n) − y(n)] x(n)
  d(n) = +1 if x(n) ∈ C1
  d(n) = −1 if x(n) ∈ C2
Linear and nonlinear separability
• 2 regions separated by a hyperplane:
  ∑_{i=1}^{p} wᵢ xᵢ − θ = 0
[Figure: separating boundary (hyperplane) between Class C1 and Class C2]
Linear separability – 2D inputs, 2 classes (a small numeric check follows)
Linear and nonlinear separability
[Figure: left – linearly separable patterns of classes C1 and C2 (decision surfaces are hyperplanes); right – nonlinearly separable patterns]
Linear and nonlinear separability
• Minsky and Papert, 1969
• The perceptron is not able to separate even the XOR (exclusive or) logical function!

  A  B  A XOR B
  0  0  0
  0  1  1
  1  0  1
  1  1  0

• NNs were dead…
• 1986: PDP group, MLP, backpropagation algorithm (see the sketch below)
• Good illustration of an ANN for data classification:
  http://playground.tensorflow.org
Typical application areas (pattern recognition)
• Machine vision
• Character recognition (OCR)
• Computer-aided diagnosis
• Speech recognition
• Face recognition
• Biometrics
• Image database retrieval
• Data mining
• Bioinformatics

The task:
• Assign unknown objects – patterns – into the correct class. This is known as classification.
o Features: These are measurable quantities obtained from
the patterns, and the classification task is based on their
respective values.
o Feature vectors: A number of features
  x₁, ..., x_p
  constitute the feature vector
  x = [x₁, ..., x_p]ᵀ ∈ ℝᵖ
Feature vectors are treated as random vectors.
An example: medical image classification
[Figure: a plot of the mean value of the object intensities versus their standard deviation]
[Figure: feature space with regions for class A and class B; mean and standard deviation are the features]
• To which class does a new image (*) belong? With greater probability to A.
• The classifier divides the feature space with its decision boundary into the regions of classes A and B.
• Here it is a linear decision boundary (a straight line) obtained from the training patterns.
• The classifier consists of a set of functions whose values, computed at x, determine the class to which the corresponding pattern belongs.
• Classification system overview:
  patterns → sensor → feature generation → feature selection → classifier design → system evaluation
Pattern, pattern recognition

Watanabe defines a pattern as "opposite of a chaos; it is an entity, vaguely defined, that could be given a name."
• For example, a pattern could be a fingerprint image, a handwritten cursive word, a human face, or a speech signal.

Given a pattern, its recognition/classification may consist of one of the following two tasks:
• 1) supervised classification (e.g., discriminant analysis), in which the input pattern is identified as a member of a predefined class,
• 2) unsupervised classification (e.g., clustering), in which the pattern is assigned to a hitherto unknown class.
Design of a pattern recognition system

The design of a pattern recognition system essentially involves the following three aspects:
1) data acquisition and preprocessing,
2) data representation,
3) decision making.

Learning from a set of examples (training set) is an important and desired attribute of most pattern recognition systems.
Approaches for pattern recognition

• The four best-known approaches for pattern recognition are:
1) template matching,
2) statistical classification,
3) syntactic or structural matching,
4) neural networks.

These models are not necessarily independent, and sometimes the same pattern recognition method exists with different interpretations. Attempts have been made to design hybrid systems.
1) Template Matching
Matching is a generic operation in pattern recognition which is used to determine
the similarity between two entities (points, curves, or shapes) of the same type.
In template matching, a template (typically, a 2D shape) or a prototype of the
pattern to be recognized is available.
• The pattern to be recognized is matched against the stored template while taking into account all
allowable pose (translation and rotation) and scale changes.
• The similarity measure, often a correlation, may be optimized based on the available training set. Often, the template itself is learned from the training set. (A translation-only correlation sketch follows below.)
Disadvantages
• Matching fails if the patterns are distorted due to the imaging process, viewpoint change, or large intraclass variations among the patterns.
2) Statistical Classification
Each pattern is represented in terms of p features or measurements and is viewed as
a point in a p-dimensional space.
The goal is to choose those features that allow pattern vectors belonging to different
categories to occupy compact and disjoint regions in a p-dimensional feature space.
Given a set of training patterns from each class, the objective is to establish decision
boundaries in the feature space which separate patterns belonging to different
classes.
• In the statistical decision-theoretic approach, the decision boundaries are determined by the probability distributions of the patterns belonging to each class, which must either be specified or learned (a minimal sketch follows below).
• One can also take a discriminant-analysis-based approach to classification: first a parametric form of the decision boundary (e.g., linear or quadratic) is specified; then the best decision boundary of the specified form is found based on the classification of the training patterns.
3) Syntactic (Structural) Matching
• hierarchical perspective where a pattern is viewed as being composed of simple subpatterns which
are themselves built from yet simpler subpatterns
• The simplest/elementary subpatterns to be recognized are called primitives and the given complex
pattern is represented in terms of the interrelationships between these primitives.
• In syntactic pattern recognition, a formal analogy is drawn between the structure of patterns and
the syntax of a language.
– The patterns are viewed as sentences belonging to a language, primitives are viewed as the
alphabet of the language, and the sentences are generated according to a grammar. Thus, a
large collection of complex patterns can be described by a small number of primitives and
grammatical rules. The grammar for each pattern class must be inferred from the available
training samples.
– Suitable if the patterns have a definite structure which can be captured in terms of a set of rules,
• such as EKG waveforms, textured images, and shape analysis of contours.
– Difficulties primarily have to do with the segmentation of noisy patterns (to detect the primitives) and the inference of the grammar from the training data. A toy sketch of the approach follows.
4) Neural Networks
• massive parallel computing systems
• huge number of simple processors (neurons)
• huge number of connections
• learning, generalization, adaptability, fault tolerance, distributed representation
• artificial neuron
• the ability to learn the complex relationships between inputs and outputs
• nonlinearity
• simple learning algorithms
• nonlinear algorithms for
  – feature extraction
  – classification
• well suited to hardware implementation
• boom of deep networks
Pattern Recognition Models
STATISTICAL PATTERN RECOGNITION
Statistical pattern recognition
• input pattern: p-dimensional input vector
• pattern represented by m features (m-dimensional feature
vector)
• determination of decision boundaries uses known methods
of statistical decision theory
• Two modes of operation of the pattern recognition system:
– training (learning)
– classification (testing)
Model of statistical pattern recognition
• preprocessing
• dimensionality reduction
  – "the curse of dimensionality": the number of training points needed grows exponentially with the feature dimension
  – "peaking phenomenon": for a fixed training set, adding features can degrade classifier performance (the number of training points is small relative to the number of features)
• feature extraction
• feature selection
• classification
Pattern recognition systems
• One-stage system, direct classification:
  input (x₁, ..., x_p) → classifier → decision
• Two-stage system:
  input (x₁, ..., x_p) → feature extraction → (y₁, ..., y_m) → classifier → decision
Pattern recognition systems
• One-stage system, direct classification:
  pattern (dim = p) → preprocessing (dim = p) → classification → class
  (many methods from the 1st part of the semester ☺)
• Two-stage system (feature extraction + classification):
  pattern (dim = p) → preprocessing (dim = p) → feature extraction → feature selection (dim = m, m < p) → classification → class
  (many methods from the 1st part of the semester ☺)
FEATURE EXTRACTION
Feature extraction
• feature – the most important property of the data
• transformation: data space (dim p) → feature space (dim m)
  – m = p: decorrelation
  – m < p: dimensionality reduction
❖ Principal Components Analysis (the Karhunen–Loève transform):
➢ The goal: Given an original set of m measurements x ∈ ℝᵐ, compute y ∈ ℝᵐ,
  y = Aᵀ x
for an orthogonal A, so that the elements of y are optimally mutually uncorrelated. That is,
  E[y(i) y(j)] = 0, i ≠ j
➢ Sketch of the proof:
  R_y = E[y yᵀ] = E[Aᵀ x xᵀ A] = Aᵀ R_x A
• If A is chosen so that its columns aᵢ are the orthogonal eigenvectors of R_x, then
  R_y = Aᵀ R_x A = Λ
where Λ is diagonal, with the respective eigenvalues λᵢ as its elements.
• Observe that this is a sufficient condition but not
necessary. It imposes a specific orthogonal structure on
A.
➢ Properties of the solution
• Mean-square-error approximation. Due to the orthogonality of A:
  x = ∑_{i=0}^{m−1} y(i) aᵢ,  y(i) = aᵢᵀ x
− Define
  x̂ = ∑_{i=0}^{ℓ−1} y(i) aᵢ
− The Karhunen–Loève transform minimizes the square error:
  E[‖x − x̂‖²] = E[‖∑_{i=ℓ}^{m−1} y(i) aᵢ‖²]
− The error is:
  E[‖x − x̂‖²] = ∑_{i=ℓ}^{m−1} λᵢ
It can be also shown that this is the minimum mean
square error compared to any other representation of x
by an ℓ-dimensional vector.
− In other words, x̂ is the projection of x onto the subspace spanned by the ℓ principal eigenvectors. However, for pattern recognition this is not always the best solution.
Example: computation of PCA (KLT)
PCA (KLT)
• Lindsay I. Smith: A Tutorial on Principal Components Analysis, 2002, http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
CLASSIFICATION
Statistical pattern recognition – the decision-making process
• classifier design – the approach to classification depends on what information we have about the class-conditional densities:
  – if all densities are known (but this is rare):
    • Bayes' optimal rule
  – if the densities are known, but some of their parameters are not:
    • parametric decision problem – the unknown parameters are estimated
  – if the densities are not known:
    • nonparametric problem
    • either the probability densities are estimated (e.g., Parzen windows; a sketch follows below),
    • or the decision boundary is constructed directly from the training data,
      • e.g., the k-nearest-neighbor rule
    • the MLP is also a nonparametric method for constructing a decision boundary
Various approaches to statistical pattern recognition
Frequently used classifiers
Illustration of decision boundaries
• Data: [figure]
• Resulting decision boundaries: [figure]
K-NN RULE (k-Nearest Neighbour Rule)
❖ The Nearest Neighbor Rule
➢ Out of the N training vectors, identify the k nearest ones to x.
➢ Out of these k, identify the number kᵢ that belong to class ωᵢ.
➢ Assign x → ωᵢ if kᵢ > k_j, ∀ j ≠ i.
➢ The simplest version: k = 1 !!!
➢ For large N this is not bad. It can be shown that, if P_B is the optimal Bayesian error probability, then
  P_B ≤ P_NN ≤ P_B (2 − (M/(M−1)) P_B) ≤ 2 P_B
where M is the number of classes.
P_B – probability of error of the optimal Bayes classifier
➢ P_B ≤ P_kNN ≤ P_B + √(2 P_NN / k)
➢ k → ∞ ⇒ P_kNN → P_B
➢ For small P_B:
  P_NN ≅ 2 P_B
  P_3NN ≅ P_B + 3 (P_B)²
(A brute-force k-NN sketch follows.)
❖ Voronoi tessellation
  Rᵢ = {x : d(x, xᵢ) < d(x, x_j), j ≠ i}
Voronoi tessellation
◼ [Figures: 2D case and 3D case]
kNN
◼ nice presentation:
http://courses.cs.tamu.edu/rgutier/cs790_w02/l8.pdf
Other nice solutions:
◼ ensemble learning
◼ SVM (support vector machine)
◼ neural networks
Other nice solutions: ensemble learning
Other nice solution: SVM
◼ SVM
❑ https://www.youtube.com/watch?v=9NrALgHFwTo
❑ https://www.youtube.com/watch?v=ndNE8he7Nnk
Other nice solutions: NN
◼ Once again: good illustration of an ANN for data
classification ☺
❑ http://playground.tensorflow.org