Machine Learning
LECTURE – 03
K-NEAREST-NEIGHBOR ALGORITHM
K Nearest Neighbor Classification
K-Nearest Neighbors, or KNN, is one of the simplest machine learning algorithms:
◦ easy to implement
◦ easy to understand
It is a supervised machine learning algorithm. This means we need a labeled reference dataset in order to predict the category/group of a future data point.
Once the KNN model is trained on the reference dataset, it classifies a new data point based on the points (neighbors) in that dataset that are most similar to it. It uses these neighbors to make an “educated guess” as to which category the new data point belongs to.
The KNN algorithm simply says: “I am as good as my K nearest neighbors.”
Why Is KNN Called A “Lazy Learner” Or A “Lazy Algorithm”?
KNN is called a lazy learner because when we supply training data to this algorithm, the algorithm does not really train itself at all. KNN does not learn any discriminative function from the training data; it simply memorizes the entire training dataset instead. There is essentially no training time in KNN.
K Nearest Neighbor Classification (Lazy Learner)
Each time a new data point comes in and we want to make a
prediction, the KNN algorithm will search for the nearest
neighbors in the entire training set.
Hence it is the prediction step, not the training step, that is time-consuming and computationally expensive.
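This lazy prediction step can be sketched as a brute-force search over the whole training set. A minimal sketch, assuming Euclidean distance and a majority vote (the function name and signature are illustrative, not from the slides):

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Predict the class of `query` by a majority vote among its
    k nearest training points (brute-force search, no training step)."""
    # Compute the Euclidean distance from the query to every training point.
    distances = [
        (math.dist(query, point), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Sort by distance and keep only the k closest neighbours.
    neighbours = sorted(distances)[:k]
    # The majority class among the k neighbours decides the prediction.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

Note that every call scans the entire training set, which is exactly why prediction, rather than training, is the expensive phase for KNN.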
KNN CLASSIFICATION APPROACH
KNN CLASSIFICATION ALGORITHM
The table above represents our data set. We have two columns: Brightness and Saturation. Each row in the table has a class of either Red or Blue.
Before we introduce a new data entry, let's assume the value
of K is 5.
How to Calculate Euclidean Distance in the K-Nearest Neighbors
Algorithm
Here's the new data entry:
To determine its class, we have to calculate the distance from the new entry to each of the other entries in the data set using the Euclidean distance formula.
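For two points (x1, y1) and (x2, y2), the Euclidean distance is sqrt((x1 - x2)^2 + (y1 - y2)^2). A small sketch of the calculation; the sample coordinates are made up for illustration, since the slide's table values are not reproduced here:

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance: the square root of the sum of the
    squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical points: distance between (20, 35) and (40, 20)
# = sqrt((20-40)^2 + (35-20)^2) = sqrt(400 + 225) = 25.0
print(euclidean_distance((20, 35), (40, 20)))
```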
Here's what the table will look like after all the
distances have been calculated:
Since we chose 5 as the value of K, we'll only
consider the closest five records. That is:
As you can see above, the majority class within the 5
nearest neighbors to the new entry is Red. Therefore,
we'll classify the new entry as Red.
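The whole walk-through above can be reproduced end to end. Since the slide's actual table values are not included here, the Brightness/Saturation numbers below are invented purely for illustration:

```python
import math
from collections import Counter

# Hypothetical (Brightness, Saturation) rows with their classes --
# illustrative stand-ins for the table on the slide.
data = [
    ((40, 20), "Red"),
    ((50, 50), "Blue"),
    ((60, 90), "Blue"),
    ((10, 25), "Red"),
    ((70, 70), "Blue"),
    ((60, 10), "Red"),
    ((25, 80), "Blue"),
]
new_entry = (20, 35)
k = 5

# Distance from the new entry to every row, sorted nearest first.
ranked = sorted(
    (math.dist(point, new_entry), label) for point, label in data
)
# Majority vote among the 5 nearest neighbours.
votes = Counter(label for _, label in ranked[:k])
print(votes.most_common(1)[0][0])  # -> Red (3 of the 5 nearest are Red)
```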
KNN - EXAMPLE
HOW TO SELECT THE VALUE OF K
There is no single correct way of choosing the value of K, but here are some common conventions to keep in mind:
◦ Choosing a very low value (such as K = 1) will most likely lead to noisy, inaccurate predictions.
◦ A commonly used value of K is 5.
◦ For binary classification, use an odd value of K so that majority votes cannot tie.
HOW TO SELECT THE VALUE OF K
Since there is no way to determine the best value of K in advance, a common approach is to try several values and evaluate each one on the training data:
For each of the N training examples:
◦ Find its K nearest neighbours (excluding the example itself)
◦ Make a classification based on these K neighbours
◦ Record whether the classification is correct
Output the average classification error over all N examples.
Use the K that gives the lowest average error over the N training examples.
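The selection procedure above amounts to a leave-one-out loop over the training set. A minimal sketch (the function names and candidate K values are illustrative, not from the slides):

```python
import math
from collections import Counter

def loo_error(points, labels, k):
    """Leave-one-out error: classify each training example from its
    k nearest neighbours among the *other* examples."""
    errors = 0
    for i, (p, true_label) in enumerate(zip(points, labels)):
        # Rank every other training example by distance to p.
        ranked = sorted(
            (math.dist(p, q), lab)
            for j, (q, lab) in enumerate(zip(points, labels)) if j != i
        )
        # Majority vote among the k nearest of the remaining examples.
        votes = Counter(lab for _, lab in ranked[:k])
        if votes.most_common(1)[0][0] != true_label:
            errors += 1
    return errors / len(points)

def best_k(points, labels, candidates=(1, 3, 5, 7, 9)):
    """Return the candidate K with the lowest leave-one-out error."""
    return min(candidates, key=lambda k: loo_error(points, labels, k))
```

Excluding the example itself from its own neighbour list matters: otherwise every point's nearest neighbour is itself and K = 1 always wins trivially.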
PROS AND CONS OF KNN
Pros
◦ Simple to understand and easy to implement
◦ No training phase; new reference data can be added at any time
◦ Naturally handles multi-class problems
Cons
◦ Prediction is slow and memory-hungry on large datasets
◦ Sensitive to feature scaling and to irrelevant features
◦ Performance depends heavily on the choice of K
THANK YOU!