Chapter 4
Pattern Recognition as a Machine Learning Application
Pattern recognition is a branch of machine learning that emphasizes the recognition of patterns or regularities in data for a given scenario. Although it is a subdivision of machine learning, it is a distinct field of study in its own right. Pattern recognition can be either "supervised," where previously known patterns are found in the given data, or "unsupervised," where entirely new patterns are discovered.
Examples:
1. Identifying handwritten characters, e.g. CAPTCHAs, which discriminate humans from
computers
2. Detecting text or face regions in images
Pattern Class: A category determined by some common attributes shared by its members.
Pattern: The description of any member of a category, representing its pattern class.
When a set of patterns falling into disjoint classes is available, it is desired to categorize
these patterns into their respective classes through the use of some automatic device.
The basic functions of a pattern recognition system are to detect and extract
common features from the patterns describing the objects that belong to the same pattern
class, and to recognize this pattern in any new environment and classify it as a member of
one of the pattern classes under consideration.
Example: Classifying Salmon and Sea Bass
Preprocessing: Used to reduce data complexity and/or variation; applied before
feature extraction to permit or simplify feature computations. Sometimes involves other
PR algorithms (e.g. segmentation).
Feature Selection: Choosing from available features those to be used in our
classification model. Ideally, these:
• Discriminate well between classes
• Are simple and efficient to compute
Feature Extraction: Computing features for inputs at run-time
Unsupervised Classification / Clustering Algorithm:
Classification procedures that use labeled samples to train the classifier are said
to be supervised.
Sometimes we do not have the training data. Classification procedures which use
only unlabeled samples are said to be unsupervised.
An unsupervised procedure must use the unlabeled input data both to estimate the
parameter values for the classification problem at hand and to classify
the data.
K – Means Clustering Algorithm:
o Partitional clustering approach
o Each cluster is associated with a centroid (center point)
o Each point is assigned to the cluster with the closest centroid
o Number of clusters K must be specified
o Initial centroids are often chosen randomly, so the clusters produced vary from
one run to another
o The centroid is (typically) the mean of the points in the cluster.
o ‘Closeness’ is measured by Euclidean distance, cosine similarity,
correlation, etc.
o K-means will converge for common similarity measures mentioned above.
o Most of the convergence happens in the first few iterations.
- Often the stopping condition is changed to ‘Until relatively few
points change clusters’
Algorithm:
1. Select K points as the initial centroids.
2. Repeat:
   a. Form K clusters by assigning each point to its closest centroid.
   b. Recompute the centroid of each cluster.
3. Until the centroids no longer change.
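The loop described above can be sketched as a short Python program (a minimal illustration; the helper names, sample data, and random seed are assumptions, not from the text):

```python
import random

def euclidean(p, q):
    # Euclidean distance: square root of the sum of squared differences
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def centroid(points):
    # Component-wise mean of a list of n-dimensional points
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def k_means(points, k, max_iters=100, seed=0):
    random.seed(seed)
    centroids = random.sample(points, k)  # initial centroids chosen randomly
    for _ in range(max_iters):
        # Assign each point to the cluster with the closest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: euclidean(p, centroids[j]))
            clusters[i].append(p)
        # Recompute each centroid as the mean of its cluster's points
        new_centroids = [centroid(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # converged: centroids unchanged
            break
        centroids = new_centroids
    return centroids, clusters
```

For two well-separated groups of points, the loop typically converges within the first few iterations, matching the observation above.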
Euclidean Distance:
The Euclidean distance between two points p = (p1, ..., pn) and q = (q1, ..., qn) is
d(p, q) = sqrt((p1 - q1)^2 + ... + (pn - qn)^2)
A simple example: the distance between the origin (0, 0) and the point (3, 4) is
sqrt(3^2 + 4^2) = sqrt(25) = 5.
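This worked example can be checked directly in Python (using the standard-library `math.dist`, available in Python 3.8+):

```python
import math

# Distance from the origin (0, 0) to the point (3, 4)
d = math.dist((0, 0), (3, 4))  # sqrt(3**2 + 4**2)
print(d)                       # 5.0
```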
Update Centroid: The centroid of k n-dimensional points x1, ..., xk is their
component-wise mean:
c = (1/k) (x1 + x2 + ... + xk)
Example: the centroid of the three 2D points (2, 4), (5, 2) and (8, 9) is
((2 + 5 + 8)/3, (4 + 2 + 9)/3) = (5, 5).
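The centroid computation for this example can be sketched as follows (variable names are illustrative):

```python
# Centroid of the 2D points (2, 4), (5, 2), (8, 9): component-wise mean
points = [(2, 4), (5, 2), (8, 9)]
cx = sum(p[0] for p in points) / len(points)  # (2 + 5 + 8) / 3 = 5.0
cy = sum(p[1] for p in points) / len(points)  # (4 + 2 + 9) / 3 = 5.0
print((cx, cy))                               # (5.0, 5.0)
```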
Select three initial centroids. Assign each point to the nearest centroid and
re-compute the centroids; repeat. K-means terminates once the centroids converge
to fixed points and no longer change.
Evaluating K-means Clusters:
The most common measure is the Sum of Squared Error (SSE).
For each point, the error is its distance to its cluster's centroid.
To get the SSE, we square these errors and sum them over all clusters:
SSE = sum over i = 1..K of sum over x in Ci of dist(mi, x)^2
where x is a data point in cluster Ci and mi is the representative point for cluster Ci
- it can be shown that the mi minimizing SSE corresponds to the center (mean) of the cluster
Given two clusterings, we can choose the one with the smallest error.
One easy way to reduce SSE is to increase K, the number of clusters; however,
a good clustering with smaller K can have a lower SSE than a poor clustering with
higher K.
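The SSE measure can be sketched in a few lines of Python (the clusters and centroids below are made-up illustration data, not from the text):

```python
# SSE: sum of squared distances from each point x to its cluster's centroid mi
def sse(clusters, centroids):
    total = 0.0
    for points, m in zip(clusters, centroids):
        for x in points:
            total += sum((a - b) ** 2 for a, b in zip(x, m))  # dist(mi, x)^2
    return total

clusters = [[(0, 0), (2, 0)], [(10, 10), (10, 12)]]
centroids = [(1, 0), (10, 11)]   # the mean of each cluster
print(sse(clusters, centroids))  # 1 + 1 + 1 + 1 = 4.0
```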