Machine learning
Instance Based Learning
Hamid Beigy
Sharif University of Technology
November 14, 2021
Table of contents
1. Introduction
2. Nearest neighbor algorithms
3. Distance-weighted nearest neighbor algorithms
4. Locally weighted regression
5. Finding KNN(x) efficiently
6. Reading
Introduction
Introduction
1. The methods described before, such as decision trees, first find a hypothesis and then use this hypothesis to classify new test examples.
2. These methods are called eager learning methods.
3. Instance-based learning algorithms such as k-NN store all of the training examples and classify a new example x by finding the training example (x_i, y_i) that is nearest to x according to some distance metric.
4. Instance-based classifiers do not explicitly compute decision boundaries; however, the boundaries form a subset of the Voronoi diagram of the training data.
Nearest neighbor algorithms
Nearest neighbor algorithms
1. Fix k ≥ 1 and let a labeled sample be given,
S = \{(x_1, t_1), \dots, (x_N, t_N)\},
where t_i ∈ {0, 1}. For every test example x, k-NN returns the hypothesis h defined by (see the sketch below)
h(x) = \mathbb{1}\left[\sum_{i:\, t_i = 1} w_i > \sum_{i:\, t_i = 0} w_i\right],
where the weights w_1, \dots, w_N are chosen such that w_i = 1/k if x_i is among the k nearest neighbors of x and w_i = 0 otherwise.
2. The decision boundaries form a subset of the Voronoi diagram of the training data.
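A minimal sketch of this voting rule (not from the slides; NumPy and Euclidean distance assumed, function names illustrative):

```python
import numpy as np

def knn_classify(X_train, t_train, x_query, k=3):
    """Plain k-NN vote: returns 1 iff the k nearest neighbors
    contain more examples with t_i = 1 than with t_i = 0."""
    # Euclidean distances from the query to every stored example.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training examples.
    nearest = np.argsort(dists)[:k]
    # Each of the k neighbors gets weight 1/k; every other example gets weight 0.
    votes_for_1 = np.sum(t_train[nearest] == 1) / k
    votes_for_0 = np.sum(t_train[nearest] == 0) / k
    return int(votes_for_1 > votes_for_0)

# Toy usage
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
t = np.array([0, 0, 1, 1])
print(knn_classify(X, t, np.array([0.95, 0.9]), k=3))  # -> 1
```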
Nearest neighbor algorithms
1. The k-NN algorithm only requires
- An integer k.
- A set of labeled examples S.
- A metric to measure closeness.
2. For all points x, y, z, a metric d must satisfy the following properties (checked numerically in the sketch below).
- Non-negativity: d(x, y) ≥ 0.
- Reflexivity: d(x, y) = 0 ⇔ x = y.
- Symmetry: d(x, y) = d(y, x).
- Triangle inequality: d(x, y) + d(y, z) ≥ d(x, z).
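An illustrative check of these four axioms for the Euclidean distance on a few random points (the choice of metric and the random data are assumptions for the example):

```python
import numpy as np

def euclidean(x, y):
    return np.linalg.norm(x - y)

rng = np.random.default_rng(0)
x, y, z = rng.normal(size=(3, 5))        # three random 5-dimensional points

assert euclidean(x, y) >= 0                                   # non-negativity
assert np.isclose(euclidean(x, x), 0)                         # reflexivity: d(x, x) = 0
assert np.isclose(euclidean(x, y), euclidean(y, x))           # symmetry
assert euclidean(x, y) + euclidean(y, z) >= euclidean(x, z)   # triangle inequality
print("Euclidean distance satisfies the metric axioms on this sample.")
```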
Distance functions
1. The Minkowski distance for D-dimensional examples is the L_p norm:
L_p(x, y) = \left( \sum_{i=1}^{D} |x_i - y_i|^p \right)^{1/p}
2. The Euclidean distance is the L_2 norm:
L_2(x, y) = \left( \sum_{i=1}^{D} |x_i - y_i|^2 \right)^{1/2}
3. The Manhattan or city-block distance is the L_1 norm:
L_1(x, y) = \sum_{i=1}^{D} |x_i - y_i|
4. The L_\infty norm is the maximum of the distances along the individual coordinate axes:
L_\infty(x, y) = \max_i |x_i - y_i|
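A sketch of these distance functions in Python (NumPy assumed; names are illustrative):

```python
import numpy as np

def minkowski(x, y, p):
    """L_p distance: (sum_i |x_i - y_i|^p)^(1/p)."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def euclidean(x, y):        # L_2
    return minkowski(x, y, 2)

def manhattan(x, y):        # L_1 (city block)
    return np.sum(np.abs(x - y))

def chebyshev(x, y):        # L_inf: maximum coordinate-wise difference
    return np.max(np.abs(x - y))

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 0.0, 3.5])
print(euclidean(x, y), manhattan(x, y), chebyshev(x, y))
# -> 2.291..., 3.5, 2.0
```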
Nearest neighbor algorithm for regression
1. The k-NN algorithm can be adapted to approximate a continuous-valued target function.
2. We compute the mean of the k nearest training examples rather than taking a majority vote (see the sketch below):
\hat{f}(x) = \frac{\sum_{i=1}^{k} f(x_i)}{k}
3. The effect of k on the performance of the algorithm.
[Figure: k-NN behavior for regression around a query point, for different values of k. Pictures are taken from P. Rai's slides.]
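A minimal sketch of this averaging rule, assuming NumPy and Euclidean distance (names are illustrative):

```python
import numpy as np

def knn_regress(X_train, f_train, x_query, k=3):
    """k-NN regression: average of the target values of the k nearest neighbors."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    return f_train[nearest].mean()

# Toy usage on a noiseless sine curve (illustrative data)
X = np.linspace(0, 1, 11).reshape(-1, 1)
f = np.sin(2 * np.pi * X).ravel()
print(knn_regress(X, f, np.array([0.33]), k=3))
```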
Nearest neighbor algorithms
1. The k-NN algorithm is a lazy learning algorithm.
- It defers finding a hypothesis until a test example x arrives.
- For a test example x, it uses the stored training data directly.
- It discards the found hypothesis and any intermediate results afterwards.
2. This strategy is opposed to an eager learning algorithm, which
- finds a hypothesis h using the training set, and
- uses the found hypothesis h to classify every test example x.
3. Trade-offs
- During the training phase, lazy algorithms have lower computational costs than eager algorithms.
- During the testing phase, lazy algorithms have greater storage requirements and higher computational costs.
4. What is the inductive bias of k-NN?
Properties of nearest neighbor algorithms
1. Advantages
- Analytically tractable.
- Simple implementation.
- Uses local information, which results in highly adaptive behavior.
- Parallel implementation is very easy.
- Nearly optimal in the large-sample limit (N → ∞):
E(Bayes) ≤ E(NN) ≤ 2 × E(Bayes).
2. Disadvantages
- Large storage requirements.
- High computational cost during testing.
- Highly susceptible to irrelevant features.
3. Large values of k
- Result in smoother decision boundaries.
- Provide more accurate probabilistic information.
4. But large values of k also
- Increase the computational cost.
- Destroy the locality of the estimation.
Distance-weighted nearest neighbor algorithms
Distance-weighted nearest neighbor algorithms
1. One refinement of k-NN is to weight the contribution of each of the k neighbors according to its distance to the query point x (see the sketch below).
2. For two-class classification,
h(x) = \mathbb{1}\left[\sum_{i:\, t_i = 1} w_i > \sum_{i:\, t_i = 0} w_i\right],
where
w_i = \frac{1}{d(x, x_i)^2}
3. For C-class classification,
h(x) = \operatorname*{argmax}_{c \in C} \sum_{i=1}^{k} w_i \, \delta(c, t_i)
4. For regression,
\hat{f}(x) = \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}
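A sketch of the distance-weighted rules above, assuming NumPy, Euclidean distance, and a small epsilon to guard against zero distances (names are illustrative):

```python
import numpy as np

def weighted_knn_regress(X_train, f_train, x_query, k=5, eps=1e-12):
    """Distance-weighted k-NN regression with w_i = 1 / d(x, x_i)^2."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    w = 1.0 / (dists[nearest] ** 2 + eps)   # eps avoids division by zero when d = 0
    return np.sum(w * f_train[nearest]) / np.sum(w)

def weighted_knn_classify(X_train, t_train, x_query, classes, k=5, eps=1e-12):
    """Distance-weighted k-NN classification: argmax_c sum_i w_i * delta(c, t_i)."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    w = 1.0 / (dists[nearest] ** 2 + eps)
    scores = {c: np.sum(w[t_train[nearest] == c]) for c in classes}
    return max(scores, key=scores.get)
```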
Locally weighted regression
Locally weighted regression
1. In locally weighted regression (LWR), we use a linear model to do the local approximation \hat{f}:
\hat{f}(x) = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_D x_D
2. Suppose we aim to minimize the total squared error:
E = \frac{1}{2} \sum_{x \in S} (f(x) - \hat{f}(x))^2
3. Using gradient descent,
\Delta w_j = \eta \sum_{x \in S} (f(x) - \hat{f}(x)) \, x_j
where η is a small number (the learning rate).
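A minimal gradient-descent sketch for this global linear fit (illustrative names; the bias term is handled by prepending x_0 = 1, and the step is rescaled by 1/N purely for numerical stability):

```python
import numpy as np

def fit_linear_gd(X, f, eta=0.1, n_iters=5000):
    """Batch gradient descent for f_hat(x) = w0 + w1*x1 + ... + wD*xD,
    minimizing E = 1/2 * sum_x (f(x) - f_hat(x))^2."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend x_0 = 1 for w_0
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iters):
        residual = f - Xb @ w                       # f(x) - f_hat(x) for every x in S
        w += (eta / len(f)) * Xb.T @ residual       # Delta w_j = eta * sum_x residual * x_j
        # (the 1/N factor only rescales eta so the step stays stable)
    return w

# Toy usage: recover w0 = 1, w1 = 2 from noiseless data
X = np.linspace(0, 1, 20).reshape(-1, 1)
f = 1.0 + 2.0 * X.ravel()
print(fit_linear_gd(X, f))   # approximately [1.0, 2.0]
```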
Locally weighted regression i
1. How shall we modify this procedure to derive a local approximation rather than a global one?
2. The simplest way is to redefine the error criterion E to emphasize fitting the local training examples.
3. Three possible criteria are given below. Note that we write the error E(x_q) to emphasize the fact that the error is now defined as a function of the query point x_q.
- Minimize the squared error over just the k nearest neighbors:
E_1(x_q) = \frac{1}{2} \sum_{x \in KNN(x_q)} (f(x) - \hat{f}(x))^2
- Minimize the squared error over the entire set S of training examples, while weighting the error of each training example by some decreasing function K of its distance from x_q:
E_2(x_q) = \frac{1}{2} \sum_{x \in S} (f(x) - \hat{f}(x))^2 \, K(d(x_q, x))
- Combine 1 and 2:
E_3(x_q) = \frac{1}{2} \sum_{x \in KNN(x_q)} (f(x) - \hat{f}(x))^2 \, K(d(x_q, x))
Locally weighted regression ii
4. If we choose criterion (3) and re-derive the gradient descent rule, we obtain (a code sketch follows this slide)
\Delta w_j = \eta \sum_{x \in KNN(x_q)} K(d(x_q, x)) \, (f(x) - \hat{f}(x)) \, x_j
where η is a small number (the learning rate).
5. Criterion (2) is perhaps the most esthetically pleasing because it allows every training
example to have an impact on the classification of xq .
6. However, this approach requires computation that grows linearly with the number of
training examples.
7. Criterion (3) is a good approximation to criterion (2) and has the advantage that
computational cost is independent of the total number of training examples; its cost
depends only on the number k of neighbors considered.
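A sketch of criterion (3) at a single query point, assuming NumPy, a Gaussian kernel for K, and gradient descent on the kernel-weighted error over the k nearest neighbors only (names, kernel width, and learning rate are illustrative choices, not prescribed by the slides):

```python
import numpy as np

def gaussian_kernel(d, width=0.2):
    return np.exp(-(d ** 2) / (2 * width ** 2))

def lwr_predict(X, f, x_query, k=8, eta=0.1, n_iters=2000, width=0.2):
    """Locally weighted regression at x_q using criterion (3): gradient descent
    on the kernel-weighted squared error over the k nearest neighbors only."""
    dists = np.linalg.norm(X - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    Xn = np.hstack([np.ones((k, 1)), X[nearest]])   # local design matrix with x_0 = 1
    fn = f[nearest]
    Kw = gaussian_kernel(dists[nearest], width)     # K(d(x_q, x)) for each neighbor
    w = np.zeros(Xn.shape[1])
    for _ in range(n_iters):
        residual = fn - Xn @ w
        w += (eta / k) * Xn.T @ (Kw * residual)     # Delta w_j = eta * sum K * residual * x_j
    xq = np.concatenate([[1.0], np.atleast_1d(x_query)])
    return xq @ w

# Toy usage on a noisy sine curve (illustrative data)
rng = np.random.default_rng(0)
X = np.linspace(0, 1, 50).reshape(-1, 1)
f = np.sin(2 * np.pi * X).ravel() + 0.1 * rng.normal(size=50)
print(lwr_predict(X, f, np.array([0.3]), k=8))      # roughly sin(0.6*pi) ~ 0.95
```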
Finding KNN(x) efficiently
" the
Bucketing (a.k.a Elias’s algorithm) [Welch 1971] cell exceeds the distance to the closest point already visited
k-d trees [Bentley, 1975; Friedman et al, 1977]
Finding
nearest
!
neighbor KNN(x)
"
Bucketing
efficiently
" In the Bucketing algorithm, the space is divided into identical cells and for
each cell the data points inside it are stored in a list
" The cells are examined in order of increasing distance from the query point
and for each cell the distance is computed between its internal data points
"
1. How efficiently find KNN(x)?
and the query point
The search terminates when the distance from the query point to the cell
2. Tree-based data structures: pre-processing.
exceeds the distance to the closest point already visited
! k-d trees
" A k-d3.
tree Often kd-trees
is a generalization of a binary(k-dimensional trees) used in applications.
search tree in high dimensions
! Each internal node in a k-d tree is associated with a hyper-rectangle and a hyper-plane orthogonal to one of the
4.coordinate
!
A kd-tree
axis is a generalization of binary tree in high dimensions
The hyper-plane splits the hyper-rectangle into two parts, which are associated with the child nodes
The partitioning process goes on until the number of data points in the hyper-rectangle falls below some given threshold
"
!
4.1 Each internal node is associated with a hyper-rectangle and the hyper-plans is orthogonal to
The effect of a k-d tree is to partition the (multi-dimensional) sample space according to the underlying
distribution of the data, the partitioning being finer in regions where the density of data points is higher
one of its coordinates.
For a given query point, the algorithm works by first descending the the tree to find the data points lying in the cell that
!
contains the query point
Then 4.2
! The
it examines hyper-plan
surrounding splits
cells if they overlap the ballthe hyper-rectangle
centered to data
at the query point and the closest two pointparts,
so far which are associated with the child
nodes.
Introduction to Pattern Analysis KD-tree construction
4.3 The partitioning goes on until the number of data points in the hyper-plane falls below some
Ricardo Gutierrez-Osuna
19
Texas A&M University
high dimensions given threshold.
er-rectangle and a
hich are associated
X Y
ata points in the .15 .1
.03 .55
ulti-dimensional) .95 .1
on of the data,
y of data points ... ...
ing the the tree to
oint
ntered at the 5. Splitting
query
Start with a list of n-dimensional points
order : Widest first
6. Splitting value : Median
7. Stop condition : fewer than a threshold or box hit some minimum width.
13/18
kd-tree
1. Initial data set: start with a list of n-dimensional points, e.g.
X     Y
.15   .1
.03   .55
.95   .1
...   ...
2. After the first split (on X > .5):
No branch:  (.15, .1), (.03, .55), ...
Yes branch: (.95, .1), ...
3. Consider each group separately and possibly split again.
kd-tree
1. After the second split:
X > .5
No  -> Y > .5
         No:  (.15, .1), ...
         Yes: (.03, .55), ...
Yes -> (.95, .1), ...
2. Final split: consider each group separately and possibly split again (along the same or a different dimension).
3. Keep one additional piece of information at each node: the median value of the split dimension for its points.
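A compact construction sketch following this recipe (widest dimension first, median splitting value, leaf-size stop condition); the node class and names are illustrative, not a reference implementation:

```python
import numpy as np

class KDNode:
    def __init__(self, points=None, dim=None, median=None, left=None, right=None):
        self.points = points      # only leaves store points
        self.dim = dim            # splitting dimension (widest spread)
        self.median = median      # splitting value (median along that dimension)
        self.left = left          # points with x[dim] <= median
        self.right = right        # points with x[dim] > median

def build_kdtree(points, leaf_size=5):
    if len(points) <= leaf_size:                    # stop condition: few enough points
        return KDNode(points=points)
    spread = points.max(axis=0) - points.min(axis=0)
    dim = int(np.argmax(spread))                    # splitting order: widest dimension first
    median = float(np.median(points[:, dim]))       # splitting value: the median
    left_mask = points[:, dim] <= median
    if left_mask.all() or not left_mask.any():      # degenerate split -> make a leaf
        return KDNode(points=points)
    return KDNode(dim=dim, median=median,
                  left=build_kdtree(points[left_mask], leaf_size),
                  right=build_kdtree(points[~left_mask], leaf_size))

rng = np.random.default_rng(0)
tree = build_kdtree(rng.random((200, 2)), leaf_size=10)
```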
Nearest neighbor with kd-tree
1. Traverse the tree looking for the nearest neighbor of the query point.
2. Examine nearby points first: explore the branch of the tree that is closest to the query point first.
3. The algorithm first descends the tree to find the data points lying in the cell that contains the query point; it then examines surrounding cells if they overlap the ball centered at the query point with radius equal to the distance to the closest data point found so far.
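In practice one usually relies on a library kd-tree. The sketch below assumes SciPy is available and uses scipy.spatial.KDTree, whose query method performs this kind of descend-then-backtrack nearest-neighbor search:

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)
X = rng.random((1000, 3))                 # 1000 stored examples in 3 dimensions
tree = KDTree(X)                          # pre-processing: build the kd-tree once

x_query = np.array([0.5, 0.5, 0.5])
dists, idx = tree.query(x_query, k=5)     # KNN(x): the 5 nearest neighbors of the query
print(idx, dists)

# Brute-force check that the tree returns the true nearest neighbors
brute = np.argsort(np.linalg.norm(X - x_query, axis=1))[:5]
assert set(idx) == set(brute)
```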
Reading
Readings
1. Chapter 8 of the book Machine Learning (Mitchell 1997).
References i
Mitchell, Tom M. (1997). Machine Learning. McGraw-Hill.
Questions?