A Complete Guide to K-Nearest-Neighbors with Applications in Python and R
https://kevinzakka.github.io/2016/07/13/k-nearest-neighbor/
Characteristics
- Supervised learning: the training dataset contains pairs (x, y), and the goal is to infer y for new
samples that contain only x
- Non-parametric: no explicit assumption is made about the functional form of h(x), the estimator of y
Mechanics
Classify an observation by a majority vote among the K instances in the training sample that are
closest to it.
Euclidean distance: d(x_1, x_2) = \sqrt{(x_{11} - x_{21})^2 + \cdots + (x_{1p} - x_{2p})^2}
Other distance metrics can also be used (e.g., Manhattan or Minkowski distance).
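As a quick illustration of the formula, the distance between two made-up feature vectors can be
computed in R (the vectors here are placeholders, not data from the guide):

# Two hypothetical feature vectors of equal length p
x1 <- c(5.1, 3.5, 1.4, 0.2)
x2 <- c(6.7, 3.0, 5.2, 2.3)

# Euclidean distance: square root of the sum of squared coordinate differences
sqrt(sum((x1 - x2)^2))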
Steps to classify an observation (a from-scratch R sketch follows this list):
1. Run through the whole training set, computing the distance d between x and each training observation
2. Find the K observations with the smallest d (choosing K odd helps prevent ties)
3. Find the conditional probability for each class: P(Y = j \mid X = x) = \frac{1}{K} \sum_{i \in \mathcal{N}_K(x)} \mathbb{1}\{y_i = j\}, where \mathcal{N}_K(x) is the set of the K nearest neighbors of x
4. The observation is assigned to the class with the highest probability
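The sketch below implements these four steps directly. It assumes a numeric training matrix (or data
frame) train_X, a factor of training labels train_y, and a single new observation x; these names, and
the function name knn_classify, are placeholders chosen for this example.

# Classify a single observation x by the K-nearest-neighbors rule (from scratch)
knn_classify <- function(x, train_X, train_y, K = 3) {
  # Step 1: Euclidean distance from x to every training observation
  d <- sqrt(colSums((t(train_X) - x)^2))
  # Step 2: indices of the K closest training observations
  nearest <- order(d)[1:K]
  # Step 3: conditional probability of each class among the K neighbors
  probs <- table(train_y[nearest]) / K
  # Step 4: return the class with the highest estimated probability
  names(probs)[which.max(probs)]
}

# Example call (iris is used only for illustration):
# knn_classify(unlist(iris[1, 1:4]), iris[-1, 1:4], iris$Species[-1], K = 5)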
R code
library(class)
# placeholder names: train_X/test_X are feature matrices, train_y is a factor of training labels
pred <- knn(train = train_X, test = test_X, cl = train_y, k = K)
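A self-contained usage sketch follows; the random iris train/test split is only for illustration, and
any labeled numeric dataset would work the same way.

library(class)

# Illustration on the built-in iris data: random 100/50 train/test split
set.seed(1)
idx     <- sample(nrow(iris), 100)
train_X <- iris[idx, 1:4]
train_y <- iris$Species[idx]
test_X  <- iris[-idx, 1:4]
test_y  <- iris$Species[-idx]

# Classify each test row by a majority vote of its 5 nearest training rows
pred <- knn(train = train_X, test = test_X, cl = train_y, k = 5)

# Proportion of test observations classified correctly
mean(pred == test_y)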