Pattern Recognition,
Artificial Neural Networks,
and Machine Learning
Yuan-Fang Wang
Department of Computer Science
University of California
Santa Barbara, CA 93106, USA
“Pattern Recognition”
What is a Pattern?
PR , ANN, & ML 2
DNA patterns (sequences over A, G, C, T, e.g., AGCTCGAT); protein patterns (sequences over the 20 amino acids)
Faces
Fingerprints
Other Patterns
Insurance, credit card applications
applicants are characterized by a pattern
# of accidents, make of car, model year,
income, # of dependents, credit worthiness,
mortgage amount
Dating services
Age, hobbies, income, etc. establish your
“desirability”
Other Patterns
Web documents
Keyword-based description (e.g., documents
containing war, Baghdad, Hussein are different
from those containing football, NFL, AFL,
draft, quarterbacks)
Intrusion detection
Usage and connection patterns
Cancer detection
Image features for tumors, patient age,
treatment option, etc.
Other Patterns
Housing market
Location, size, year, school district
University ranking
Student population, student-faculty ratio,
scholarship opportunities, location, faculty research
grants, etc.
Too many to list; see, e.g.,
http://www.ics.uci.edu/~mlearn/MLSummary.html
What is a pattern?
A pattern is a set of objects, processes or
events which consist of both deterministic
and stochastic components
A pattern is a record of certain dynamic
processes influenced both by deterministic
and stochastic factors
What is a Pattern? (cont.)
Patterns lie on a spectrum between the completely regular and deterministic (e.g., crystal structure) and the completely random (e.g., white noise); constellation patterns, texture patterns, EKG patterns, etc., fall in between.
What is Pattern Recognition?
Classifies “patterns” into “classes”
Patterns (x)
have “measurements”, “traits”, or “features”
Classes (ω_i)
likelihood (a priori probability P(ω_i))
class-conditional density p(x|ω_i)
Classifier (f(x) → ω_i)
An example
four coin classes: penny, nickel, dime, and quarter
measurements: weight, color, size, etc.
Assign a coin to a class based on its size, weight, etc.
We use P to denote a probability mass function (discrete) and
p to denote a probability density function (continuous)
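The coin example can be sketched as a minimal classifier f(x) → ω_i; the per-class prototype measurements (weight in grams, diameter in mm) below are hypothetical, not real US Mint specifications:

```python
# Minimal nearest-prototype coin classifier (all numbers hypothetical).
# Each class omega_i is represented by a prototype (weight, diameter).
prototypes = {
    "penny":   (2.5, 19.0),
    "nickel":  (5.0, 21.2),
    "dime":    (2.3, 17.9),
    "quarter": (5.7, 24.3),
}

def classify(x):
    """f(x) -> omega_i: pick the class whose prototype is closest to x."""
    return min(prototypes,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, prototypes[c])))

print(classify((5.6, 24.0)))  # close to the quarter prototype -> "quarter"
```

A real system would replace the hand-picked prototypes with means learned from training samples.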
An Example
Many visual inspection systems are like this:
Circuit board, fruit, OCR, etc.
Another Example
Features
The intrinsic traits or characteristics that tell
one pattern (object) apart from another
Feature extraction and representation allow
Focus on relevant, distinguishing parts of a pattern
Data reduction and abstraction
Detection vs. Description
Detection: something happened
Heard noise
Saw something interesting
Non-flat signals
Description: what has happened?
Gun shot, talking, laughing, crying, etc.
Lines, corners, textures
Mouse, cat, dog, bike, etc.
Feature Selection
More an art than a science
Effectiveness criteria:
(Class histograms of population vs. size:)
Size alone is not effective
(Class histograms over perimeter and over compactness:)
Perimeter is not effective
Discrimination is accomplished by compactness alone
(Scatter plot of elongatedness vs. compactness:)
The two feature values are correlated; only one of them
is needed
(Decision boundaries: too simple (underfitting) vs. too complicated (overfitting))
Optimal tradeoff between performance and generalization
Importance of Features
Cannot be overstated
We usually don’t know which to select,
what they represent, and how to tune them
(face, gait recognition, tumor detection, etc.)
Classification and regression schemes are
mostly trying to make the best of whatever
features are available
Features
One is usually not descriptive (no silver
bullet)
Many (shotgun approach) can actually hurt
Many problems:
Relevance
Dimensionality
Co-dependency
Time- and space-varying characteristics
Accuracy
Uncertainty and error
Missing values
Feature Selection (cont.)
Q: How to decide if a feature is effective?
A: Through a training phase
Training on typical samples and typical features
to discover
Whether features are effective
Whether there is any redundancy
The typical cluster shape (e.g., Gaussian)
Decision boundaries between samples
Cluster centers of particular samples
Etc.
Classifiers
Decide ω_i if g_i(x) > g_j(x) for all j ≠ i
g_i(x) = P(ω_i) if no measurements are made
g_i(x) = P(ω_i|x) to minimize the misclassification rate
g_i(x) = −R(α_i|x) to minimize the associated risk
(Architecture: x feeds discriminant functions g_1(x), g_2(x), …, g_n(x); a max unit makes the decision)
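A sketch of the max-discriminant architecture, using g_i(x) = p(x|ω_i)P(ω_i) with assumed Gaussian class-conditionals; the priors and (μ_i, σ_i) values are hypothetical:

```python
import math

def gaussian(x, mu, sigma):
    """1-D Gaussian density p(x | omega_i)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

priors = [0.7, 0.3]                  # P(omega_1), P(omega_2) -- hypothetical
params = [(0.0, 1.0), (3.0, 1.0)]    # (mu_i, sigma_i) per class -- hypothetical

def decide(x):
    """Decide omega_i with the largest discriminant g_i(x) = p(x|omega_i) P(omega_i)."""
    g = [gaussian(x, mu, s) * P for (mu, s), P in zip(params, priors)]
    return max(range(len(g)), key=lambda i: g[i])

print(decide(0.5), decide(2.8))  # 0 1
```

Swapping in a different g_i (posterior, negative risk) changes the decision criterion without changing the max-unit architecture.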
Traditional Pattern Recognition
Parametric methods
Based on class samples exhibiting a certain
parametric distribution (e.g., Gaussian)
Learn the parameters through training
Density methods
Does not enforce a parametric form
Learn the density function directly
Decision boundary methods
Learn the separation in the feature space
Parametric Methods
(Four panels I–IV: class histograms of population vs. size, each fit with a parametric density)
p(x) = 1 / ((2π)^(n/2) σ^n) · exp(−|x − μ|² / (2σ²))
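For the n = 1 case of the density above, the parameters μ and σ² can be learned from training samples with the usual maximum-likelihood estimates; the sample values here are hypothetical:

```python
import math

samples = [4.8, 5.1, 5.0, 4.9, 5.2]   # hypothetical training measurements for one class

# Maximum-likelihood estimates of the Gaussian parameters
mu = sum(samples) / len(samples)
sigma2 = sum((s - mu) ** 2 for s in samples) / len(samples)

def density(x):
    """p(x) = exp(-|x - mu|^2 / (2 sigma^2)) / ((2 pi)^(1/2) sigma), with n = 1."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

print(round(mu, 2))  # 5.0
```

The fitted density then serves as the class-conditional p(x|ω_i) in the classifiers that follow.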
Density Methods
Feature space
d-dimensional (d = the number of features)
populated with features from training samples
(axes f_1, f_2, …, f_d)
Decision Boundary Methods
A new sample “?” in the feature space (f_1, f_2, …, f_d) is classified by
• Decision surfaces: which side of the learned surface it falls on
• Cluster centers: which learned center it lies closest to
“Modern” vs “Traditional”
Pattern Recognition
Traditional: hand-crafted features; simple and low-level (a concatenation of numbers or traits); syntactic; feature detection and description are separate tasks from classifier design
Modern: automatically learned features; hierarchical and complex; semantic; feature detection and description are jointly optimized with the classifier
Traditional Features
Modern Features
Mathematical Foundation
No matter what methods or techniques you
use, the underlying mathematical principle
is quite simple:
Bayesian theory is the foundation
Review: Bayes Rule
Forward (synthesis) route:
From class to sample in a class
Grammar rules to sentences
Markov chain (or HMM) to pronunciation
Texture rules (primitive + repetition) to textures
Backward (analysis) route:
From sample to class ID
A sentence parsed by a grammar
An utterance is “congratulations” (not “constitution”)
Brick wall vs. plaid shirt
Review: Bayes Rule
Backward is always harder
Because the interpretation is not unique
Presence of x has multiple possible interpretations (ω_1, ω_2, ω_3, …, ω_n)
The simplest example
Two classes: pennies and dimes
No measurements
Classification:
based on the a prior probabilities
Error rate:
Decide ω_1 if P(ω_1) > P(ω_2)
Decide ω_2 if P(ω_1) < P(ω_2)
Decide ω_1 or ω_2 otherwise
Error rate: min(P(ω_1), P(ω_2))
A slightly more complicated example
Two classes: pennies and dimes
A measurement x is made (e.g. weight)
Classification
based on the a posterior probabilities with
Bayes rule
Decide ω_1 if P(ω_1|x) > P(ω_2|x)
Decide ω_2 if P(ω_1|x) < P(ω_2|x)
Decide ω_1 or ω_2 otherwise
(The measurement x falls in either the ω_1 or the ω_2 decision region)
P(ω_i|x) = p(x, ω_i) / p(x) = p(x|ω_i) P(ω_i) / p(x)
(Three plots against weight: the class-conditional densities p(x|ω_1) and p(x|ω_2); the densities scaled by the priors, p(x|ω_1)P(ω_1) and p(x|ω_2)P(ω_2), whose sum is the evidence p(x); and the resulting posteriors P(ω_1|x) and P(ω_2|x))
Why Both?
Why both p(x|ω_i) and P(ω_i)?
In the daytime, when some animal runs in front of
you on the bike path, you know exactly what it
is (p(x|ω) is sufficient)
At night, when some animal runs in front of
you on the bike path, you can hardly distinguish
the shape (p(x|ω) is low for all cases), but you
know it is probably a squirrel, not a lion,
because of P(ω)
Essence
Turn a backward (analysis) problem into
several forward (synthesis) problems
Or analysis-by-synthesis
Whichever model has the highest likelihood of
synthesizing the outcome wins
The formula is not mathematically provable
Error rate
Determined by
The likelihood of a class
The likelihood of measuring x in a class
error rate = min(P(ω_1|x), P(ω_2|x)) = (1/p(x)) · min(p(x|ω_1)P(ω_1), p(x|ω_2)P(ω_2))
Error Rate (cont.)
Bayes Decision Rule minimizes the average
error rate:
error = ∫ p(error|x) p(x) dx
p(error|x) = Σ_{i ≠ i*(x)} P(ω_i|x) = 1 − P(ω_{i*(x)}|x)
where i*(x) = argmax_i P(ω_i|x)
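The average error integral can be illustrated on a discretized x, summing p(error|x) p(x) with p(error|x) = 1 − max_i P(ω_i|x); all numbers are hypothetical:

```python
# Each entry: (p(x), [P(omega_1 | x), P(omega_2 | x)]) at a discretized x value.
table = [
    (0.5, [0.9, 0.1]),
    (0.3, [0.6, 0.4]),
    (0.2, [0.2, 0.8]),
]

# error = sum_x p(error | x) p(x), with p(error | x) = 1 - max_i P(omega_i | x)
error = sum(p_x * (1 - max(post)) for p_x, post in table)
print(round(error, 2))  # 0.5*0.1 + 0.3*0.4 + 0.2*0.2 = 0.21
```

No other decision rule can do better at any x, which is why the Bayes rule minimizes the average error.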
Various types of errors
Precision vs. Recall
A very common measure used in the PR and
IR communities
As one goes up, the other HAS to go down
A range of options (Receiver operating
characteristic curves)
Area under the curve
as a goodness measure
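Precision and recall computed from raw decision counts; the confusion counts below are hypothetical:

```python
# Hypothetical confusion counts from a detector's output.
tp, fp, fn = 40, 10, 20   # true positives, false positives, false negatives

precision = tp / (tp + fp)   # of everything we flagged, how much was right
recall    = tp / (tp + fn)   # of everything real, how much did we find

print(precision)  # 0.8
```

Raising the detection threshold typically trades false positives for false negatives, moving precision up and recall down along the operating curve.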
Various ways to measure error rates
Training error
Test error
Empirical error
Some under your control (training and test)
Some not (empirical error)
How: n-fold validation
Why: Overfitting and underfitting problems
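A minimal sketch of n-fold validation (the splitting scheme is one common choice, not the only one): partition the data into n folds, train on n−1 of them, and test on the held-out fold, rotating so every fold is tested once:

```python
def n_fold_splits(data, n):
    """Yield (train, test) pairs; each fold serves as the test set exactly once."""
    folds = [data[i::n] for i in range(n)]          # round-robin split into n folds
    for i in range(n):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

data = list(range(10))
for train, test in n_fold_splits(data, 5):
    assert len(train) == 8 and len(test) == 2       # 5 folds of 2 samples each
```

Averaging the test error over the n rotations gives a less optimistic estimate than the training error alone.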
An even more complicated example
Two classes: pennies or dimes
A measurement x is made
Risk associated with making a wrong decision
Based on the a posterior probabilities with
Bayesian risk
R(α_1|x) = λ_11 P(ω_1|x) + λ_12 P(ω_2|x)
R(α_2|x) = λ_21 P(ω_1|x) + λ_22 P(ω_2|x)
λ_ij: the loss of taking action α_i in state ω_j
R(α_i|x): the conditional risk of action α_i given x
(Diagram: states ω_1, ω_2 give rise to an observation, from which decision 1 or decision 2 is made. Mis-classification is a math problem; mis-interpretation is a human-factors problem)
PR , ANN, & ML 60
(Diagram: every state/decision pairing is possible)
Incorrect decisions incur domain-specific cost
An even more complicated example
R(used as pennies | x) =
r(pennies used as pennies) · P(pennies | x) +
r(dimes used as pennies) · P(dimes | x)
R(used as dimes | x) =
r(pennies used as dimes) · P(pennies | x) +
r(dimes used as dimes) · P(dimes | x)
where P(pennies | x) ∝ p(x|pennies)P(pennies) and P(dimes | x) ∝ p(x|dimes)P(dimes)
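The minimum-risk decision for the penny/dime example, as a sketch; the loss matrix and posteriors are hypothetical:

```python
# loss[i][j] = lambda_ij: loss of action alpha_i (use as class i) in true state omega_j.
loss = [[0.0, 1.0],    # using a dime as a penny loses little  -- hypothetical
        [5.0, 0.0]]    # using a penny as a dime costs more    -- hypothetical

posterior = [0.7, 0.3]  # P(penny | x), P(dime | x) -- hypothetical

# R(alpha_i | x) = sum_j lambda_ij * P(omega_j | x); take the minimum-risk action.
risks = [sum(l * p for l, p in zip(row, posterior)) for row in loss]
action = min(range(2), key=lambda i: risks[i])
print(action)  # 0: treat the coin as a penny (risk 0.3 vs. 3.5)
```

With a zero-one loss (λ_ij = 0 if i = j, else 1), this rule reduces to picking the class with the largest posterior.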
A more credible example
R(call FD|smoke) =
r(call,fire)*P(fire|smoke) +
r(call, no fire)*P(no fire|smoke)
False positive
R(no call FD|smoke)=
r(no call, no fire)*P(no fire|smoke) +
r(no call, fire)*P(fire|smoke)
False negative
The risk associated with false negative is
much higher than that of false positive
A more credible example
R(attack|battlefield intelligence) =
r(attack,<50%)*P(<50%|intelligence) +
r(attack,>50%)*P(>50%|intelligence)
False positive
R(no attack|battlefield intelligence)=
r(no attack, >50%)*P(>50%|intelligence) +
r(no attack, <50%)*P(<50%|intelligence)
False negative
Bayesian Risk
Determined by
likelihood of a class
likelihood of measuring x in a class
the risk of making a wrong action
Classification
Bayesian risk should be minimized:
min(R(α_1|x), R(α_2|x)), or
min(λ_11 P(ω_1|x) + λ_12 P(ω_2|x), λ_21 P(ω_1|x) + λ_22 P(ω_2|x)), or
decide ω_1 if R(α_1|x) < R(α_2|x), i.e., if
(λ_21 − λ_11) P(ω_1|x) > (λ_12 − λ_22) P(ω_2|x)
Bayesian Risk (cont.)
Again, decisions depend on
likelihood of a class
likelihood of observation of x in a class
Modified by some positive risk factors
Why?
Because in the real world, it might not be the
misclassification rate that matters but the
action you take:
(λ_21 − λ_11) P(ω_1|x) vs. (λ_12 − λ_22) P(ω_2|x)
Other generalizations
Multiple classes
n classes, with Σ_{i=1}^{n} P(ω_i) = 1
Multiple measurements
x is a vector instead of a scalar
Non-numeric measurements
Actions vs. decisions
Correlated vs. independent events
speech signals and images
Training allowed or not
Time-varying behaviors
Difficulties
What features to use
How many features (the curse of
dimensionality)
The a priori probability P(ω_i)
The class-conditional density p(x|ω_i)
The a posteriori probability P(ω_i|x)
Typical Approaches
Supervised (with tagged samples x):
parameters of a probability function (e.g., Gaussian: p(x|ω_i) = N(μ_i, Σ_i))
density functions (w/o assuming any parametric forms)
decision boundaries (assuming classes are indeed separable)
Unsupervised (w/o tagged samples x):
minimum distance
hierarchical clustering
Reinforced (with hints)
Told right or wrong, but not the correct answer
Learning with a critic (not a teacher as in supervised)
Pattern Recognition
Uncorrelated events:
Supervised learning: parametric, density, decision boundary
Unsupervised clustering: minimum distance, hierarchical clustering
Correlated events:
Hidden Markov models
Applications
DNA sequence
Lie detectors
Handwritten digit recognition
Classification based on smell
Web document classification and search engine
Defect detection
Texture classification
Image database retrieval
Face recognition
etc.
Other formulations
We talked about 1/3 of the scenarios – that
of classification (discrete)
Regression – continuous
Extrapolation and interpolation
Clustering
Similarity
Abnormality detection
Concept drift (discovery), etc.
Classification vs. Regression
Classification:
Large vs. small hints on category
Absolute values do not matter as much (can actually hurt)
Normalization is often necessary
Fitting order stays low
Regression:
Large means large, small means small
Absolute values matter
Fitting orders matter
Recent Development
Data can be “massaged”: surprisingly,
massaging the data and using simple
classifiers works better than massaging the
classifiers and using simple data (for simple
problems & small data sets)
A hard-to-visualize concept:
Transforming data into a higher-dimensional space
(e.g., an infinite-dimensional one) has a tendency to
separate the data and increase the error margin
The concept behind SVMs and, later, kernel methods
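A toy illustration of the lifting idea (the data and the map φ are made up for illustration): two classes that no single threshold separates on the line become linearly separable after mapping to a higher-dimensional space:

```python
# Class A sits between the two halves of class B on the real line:
class_a = [-1.0, 0.0, 1.0]
class_b = [-3.0, -2.5, 2.5, 3.0]

# No single threshold on x separates them, but phi(x) = (x, x^2) does:
phi = lambda x: (x, x * x)

# In the lifted space, the horizontal line x2 = 2 separates the two classes.
assert all(phi(x)[1] < 2 for x in class_a)
assert all(phi(x)[1] > 2 for x in class_b)
print("separable after lifting")
```

Kernel methods exploit this without ever computing φ(x) explicitly, evaluating only inner products in the lifted space.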
More Recent Development
Think about fitting linear data with a model
Linear, quadratic, cubic, etc.
The higher the order, the better the fit
n data points can be perfectly fit by an (n−1)th-order
polynomial
However
Overfitting is likely
No ability to extrapolate
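The overfitting point can be demonstrated directly (hypothetical near-linear data): a degree-4 polynomial fits 5 points exactly but extrapolates far worse than a simple line:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + np.array([0.1, -0.1, 0.05, -0.05, 0.1])   # linear trend + small noise

exact = np.polyfit(x, y, deg=4)   # 5 points, degree 4: interpolates the data exactly
line  = np.polyfit(x, y, deg=1)   # degree 1: small residual, sane extrapolation

print(np.allclose(np.polyval(exact, x), y))  # True: perfect fit at the samples
# At x = 6 the true trend gives y = 12; compare extrapolation errors:
print(abs(np.polyval(exact, 6.0) - 12), abs(np.polyval(line, 6.0) - 12))
```

The high-order fit chases the noise, so its extrapolation error at x = 6 is orders of magnitude larger than the linear fit's.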
“Massage” the classifiers (using deep networks)
Feature detection and description, and
classification, are jointly optimized