
Lecture Slides for

INTRODUCTION TO MACHINE LEARNING
3RD EDITION
ETHEM ALPAYDIN
© The MIT Press, 2014

alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml3e
CHAPTER 8: NONPARAMETRIC METHODS
Nonparametric Estimation

- Parametric (single global model), semiparametric (small number of local models)
- Nonparametric: similar inputs have similar outputs
- Functions (pdf, discriminant, regression) change smoothly
- Keep the training data; "let the data speak for itself"
- Given x, find a small number of closest training instances and interpolate from these
- Aka lazy / memory-based / case-based / instance-based learning
Density Estimation

- Given the training set X = \{x^t\}_{t=1}^{N} drawn iid from p(x)
- Divide the data into bins of size h
- Histogram:
  \hat{p}(x) = \frac{\#\{x^t \text{ in the same bin as } x\}}{Nh}
- Naive estimator:
  \hat{p}(x) = \frac{\#\{x - h < x^t \le x + h\}}{2Nh}
  or, equivalently,
  \hat{p}(x) = \frac{1}{Nh}\sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right),
  \qquad w(u) = \begin{cases} 1/2 & \text{if } |u| < 1 \\ 0 & \text{otherwise} \end{cases}
Kernel Estimator

- Kernel function, e.g., Gaussian kernel:
  K(u) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{u^2}{2}\right)
- Kernel estimator (Parzen windows):
  \hat{p}(x) = \frac{1}{Nh}\sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right)


k-Nearest Neighbor Estimator

- Instead of fixing the bin width h and counting the number of instances that fall inside, fix the number of instances (neighbors) k and use the bin width they require:
  \hat{p}(x) = \frac{k}{2N d_k(x)}
  where d_k(x) is the distance to the k-th closest instance to x
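A minimal sketch of the k-NN density estimate for one-dimensional data; the toy data and the choice k = 10 are illustrative.

import numpy as np

def knn_density(x, X, k):
    """k-NN estimate p_hat(x) = k / (2 * N * d_k(x))."""
    X = np.asarray(X)
    d_k = np.sort(np.abs(X - x))[k - 1]   # distance to the k-th closest instance
    return k / (2.0 * len(X) * d_k)

rng = np.random.default_rng(2)
X_train = rng.normal(size=200)
print(knn_density(0.0, X_train, k=10))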
Multivariate Data

- Kernel density estimator:
  \hat{p}(x) = \frac{1}{Nh^d}\sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right)
- Multivariate Gaussian kernel
  spheric:
  K(u) = \left(\frac{1}{\sqrt{2\pi}}\right)^{d} \exp\!\left(-\frac{\|u\|^2}{2}\right)
  ellipsoid:
  K(u) = \frac{1}{(2\pi)^{d/2}\,|S|^{1/2}} \exp\!\left(-\frac{1}{2} u^T S^{-1} u\right)
Nonparametric Classification

- Estimate p(x|C_i) and use Bayes' rule
- Kernel estimator:
  \hat{p}(x|C_i) = \frac{1}{N_i h^d}\sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right) r_i^t,
  \qquad \hat{P}(C_i) = \frac{N_i}{N}
  g_i(x) = \hat{p}(x|C_i)\,\hat{P}(C_i) = \frac{1}{Nh^d}\sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right) r_i^t
  where r_i^t = 1 if x^t \in C_i and 0 otherwise
- k-NN estimator:
  \hat{p}(x|C_i) = \frac{k_i}{N_i V^k(x)},
  \qquad \hat{p}(x) = \frac{k}{N V^k(x)},
  \qquad \hat{P}(C_i|x) = \frac{\hat{p}(x|C_i)\,\hat{P}(C_i)}{\hat{p}(x)} = \frac{k_i}{k}
  where k_i is the number of the k nearest neighbors of x that belong to C_i and V^k(x) is the volume they cover
Condensed Nearest Neighbor

- Time/space complexity of k-NN is O(N)
- Find a subset Z of X that is small and is accurate in classifying X (Hart, 1968):
  E'(Z|X) = E(X|Z) + \lambda |Z|
Condensed Nearest Neighbor

- Incremental algorithm: add an instance to the subset Z only if it is needed, i.e., if it is misclassified by the instances already in Z (a minimal sketch follows)
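A minimal sketch of the condensing idea, assuming a 1-NN rule and random presentation order; the initialization and stopping details are illustrative choices, not Hart's exact procedure.

import numpy as np

def condensed_nn(X, y, passes=5, seed=0):
    """Keep an instance only if the current subset Z misclassifies it with 1-NN."""
    rng = np.random.default_rng(seed)
    Z = [0]                                        # start from an arbitrary instance
    for _ in range(passes):
        changed = False
        for t in rng.permutation(len(X)):
            nearest = np.argmin(((X[t] - X[Z]) ** 2).sum(axis=1))
            if y[Z][nearest] != y[t]:              # misclassified by Z: add it
                Z.append(t)
                changed = True
        if not changed:                            # Z now classifies all of X correctly
            break
    return X[Z], y[Z]

rng = np.random.default_rng(5)
X_train = rng.normal(size=(200, 2))
y_train = (X_train[:, 0] > 0).astype(int)
Z_X, Z_y = condensed_nn(X_train, y_train)
print(len(Z_X), "of", len(X_train), "instances kept")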


Distance-based Classification

- Find a distance function D(x^r, x^s) such that if x^r and x^s belong to the same class, the distance is small, and if they belong to different classes, the distance is large
- Assume a parametric model and learn its parameters using data, e.g., a Mahalanobis-like distance parameterized by a matrix M: D(x^r, x^s | M) = (x^r - x^s)^T M (x^r - x^s)
Learning a Distance Function

- The three-way relationship between distances, dimensionality reduction, and feature extraction:
  M = L^T L, where M is d x d and L is k x d
- Similarity-based representation using similarity scores
- Large-margin nearest neighbor (chapter 13)
- Euclidean distance (circle) is not suitable; Mahalanobis distance using an M (ellipse) is suitable. After the data is projected along L, Euclidean distance can be used.
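A small numerical check of the relationship stated above: with M = L^T L, the distance (x - y)^T M (x - y) equals the squared Euclidean distance between the projections Lx and Ly. The matrices and vectors below are randomly generated for illustration.

import numpy as np

rng = np.random.default_rng(6)
d, k = 5, 2
L = rng.normal(size=(k, d))                 # k x d projection matrix
M = L.T @ L                                 # d x d, positive semidefinite

x, y = rng.normal(size=d), rng.normal(size=d)
dist_M = (x - y) @ M @ (x - y)              # quadratic-form distance under M
dist_L = np.sum((L @ x - L @ y) ** 2)       # Euclidean distance after projecting along L
print(np.isclose(dist_M, dist_L))           # True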

Outlier Detection

- Find outlier/novelty points
- Not a two-class problem, because outliers are very few, of many types, and seldom labeled
- Instead, a one-class classification problem: find instances that have low probability
- In the nonparametric case: find instances far away from other instances
Local Outlier Factor
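A minimal sketch of a simplified local-outlier-factor-style score, assuming the ratio form: the distance to an instance's k-th neighbor divided by the average such distance among its k neighbors, so that values well above 1 flag instances far from other instances. This is an illustration in the spirit of LOF, not its exact definition; the toy data and k are made up.

import numpy as np

def knn_distances(X, k):
    """For each instance, return the distance to its k-th neighbor and the neighbor indices."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))   # pairwise distances
    order = np.argsort(D, axis=1)
    neighbors = order[:, 1:k + 1]                                      # column 0 is the point itself
    d_k = D[np.arange(len(X)), neighbors[:, -1]]
    return d_k, neighbors

def outlier_factor(X, k=5):
    """Simplified LOF-style score: d_k(x) divided by the mean d_k of x's k neighbors."""
    d_k, neighbors = knn_distances(np.asarray(X, dtype=float), k)
    return d_k / d_k[neighbors].mean(axis=1)

rng = np.random.default_rng(7)
X_train = np.vstack([rng.normal(size=(100, 2)), [[6.0, 6.0]]])         # one planted outlier
print(outlier_factor(X_train, k=5)[-1])                                # clearly greater than 1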
Nonparametric Regression

- Aka smoothing models
- Regressogram:
  \hat{g}(x) = \frac{\sum_{t=1}^{N} b(x, x^t)\, r^t}{\sum_{t=1}^{N} b(x, x^t)}
  where
  b(x, x^t) = \begin{cases} 1 & \text{if } x^t \text{ is in the same bin with } x \\ 0 & \text{otherwise} \end{cases}
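A minimal sketch of the regressogram, assuming bins of width h anchored at an origin of 0; the toy data and bin width are made up for illustration.

import numpy as np

def regressogram(x, X, r, h, origin=0.0):
    """g_hat(x) = average of r_t over the x_t that fall in the same bin as x."""
    X, r = np.asarray(X), np.asarray(r)
    same_bin = np.floor((X - origin) / h) == np.floor((x - origin) / h)
    if not same_bin.any():
        return np.nan                      # empty bin: the estimate is undefined there
    return r[same_bin].mean()

rng = np.random.default_rng(8)
X_train = rng.uniform(0.0, 2.0 * np.pi, 200)
r_train = np.sin(X_train) + rng.normal(scale=0.2, size=200)
print(regressogram(1.0, X_train, r_train, h=0.5))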
Running Mean/Kernel Smoother

- Running mean smoother:
  \hat{g}(x) = \frac{\sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right)},
  \qquad w(u) = \begin{cases} 1 & \text{if } |u| < 1 \\ 0 & \text{otherwise} \end{cases}
- Kernel smoother:
  \hat{g}(x) = \frac{\sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right)}
  where K(\cdot) is Gaussian
- Additive models (Hastie and Tibshirani, 1990)
- Running line smoother
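Minimal sketches of the two smoothers above, with the kernel smoother in its Nadaraya-Watson form using a Gaussian kernel; the toy data and bandwidths are made up for illustration.

import numpy as np

def running_mean_smoother(x, X, r, h):
    """Average of r_t over instances with |x - x_t| < h (hard window w)."""
    X, r = np.asarray(X), np.asarray(r)
    w = np.abs(x - X) < h
    return r[w].mean() if w.any() else np.nan

def kernel_smoother(x, X, r, h):
    """g_hat(x) = sum_t K((x - x_t)/h) r_t / sum_t K((x - x_t)/h), Gaussian K."""
    X, r = np.asarray(X), np.asarray(r)
    K = np.exp(-0.5 * ((x - X) / h) ** 2)
    return (K * r).sum() / K.sum()

rng = np.random.default_rng(9)
X_train = rng.uniform(0.0, 2.0 * np.pi, 200)
r_train = np.sin(X_train) + rng.normal(scale=0.2, size=200)
print(running_mean_smoother(1.0, X_train, r_train, h=0.3),
      kernel_smoother(1.0, X_train, r_train, h=0.3))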
How to Choose k or h ?

- When k or h is small, single instances matter; bias is small, variance is large (undersmoothing): high complexity
- As k or h increases, we average over more instances; variance decreases but bias increases (oversmoothing): low complexity
- Cross-validation is used to fine-tune k or h.
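A minimal sketch of choosing h by cross-validation for the kernel smoother of the previous slide; the candidate grid, fold count, and toy data are made up for illustration.

import numpy as np

def kernel_smoother(x, X, r, h):
    K = np.exp(-0.5 * ((x - X) / h) ** 2)          # Gaussian kernel weights
    return (K * r).sum() / K.sum()

def cv_error(h, X, r, n_folds=5, seed=0):
    """Mean squared validation error of the kernel smoother for a given h."""
    idx = np.random.default_rng(seed).permutation(len(X))
    errs = []
    for val in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, val)
        preds = np.array([kernel_smoother(x, X[train], r[train], h) for x in X[val]])
        errs.append(np.mean((preds - r[val]) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(10)
X_train = rng.uniform(0.0, 2.0 * np.pi, 200)
r_train = np.sin(X_train) + rng.normal(scale=0.2, size=200)
candidates = [0.05, 0.1, 0.2, 0.4, 0.8]
best_h = min(candidates, key=lambda h: cv_error(h, X_train, r_train))
print("selected h:", best_h)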