K Nearest Neighbor Algorithm: Fundamentals and Applications
By Fouad Sabry
About this ebook
What Is K Nearest Neighbor Algorithm
The k-nearest neighbors technique, also known as k-NN, is a non-parametric supervised learning method initially developed in 1951 by Evelyn Fix and Joseph Hodges in the field of statistics and later expanded by Thomas Cover. It has applications in both regression and classification. In both scenarios, the input consists of the k training examples in a data set that are closest to the query point. The output depends on whether k-NN is used for classification or regression. In k-NN classification, the output is a class membership: an object is classified by a plurality vote of its neighbors and assigned to the class most common among its k nearest neighbors (where k is a positive integer, typically small). If k is equal to one, the object is simply assigned to the class of its single nearest neighbor. In k-NN regression, the output is the value of a property of the object, computed as the average of the values of its k nearest neighbors. If k is equal to one, the output is simply the value of the single nearest neighbor.
How You Will Benefit
(I) Insights and validations about the following topics:
Chapter 1: K-nearest neighbors algorithm
Chapter 2: Supervised learning
Chapter 3: Pattern recognition
Chapter 4: Curse of dimensionality
Chapter 5: Nearest neighbor search
Chapter 6: Cluster analysis
Chapter 7: Kernel method
Chapter 8: Large margin nearest neighbor
Chapter 9: Structured kNN
Chapter 10: Weak supervision
(II) Answering the public's top questions about the k nearest neighbor algorithm.
(III) Real-world examples of the use of the k nearest neighbor algorithm in many fields.
(IV) 17 appendices explaining, briefly, 266 emerging technologies in each industry, to give a 360-degree understanding of the technologies surrounding the k nearest neighbor algorithm.
Who This Book Is For
Professionals, undergraduate and graduate students, enthusiasts, hobbyists, and anyone who wants to go beyond basic knowledge of the k nearest neighbor algorithm.
Related to K Nearest Neighbor Algorithm
Titles in the series (100)
Restricted Boltzmann Machine: Fundamentals and Applications for Unlocking the Hidden Layers of Artificial Intelligence
Mathematical Equality: Fundamentals and Applications
Hierarchical Control System: Fundamentals and Applications
Artificial Neural Networks: Fundamentals and Applications for Decoding the Mysteries of Neural Computation
Hebbian Learning: Fundamentals and Applications for Uniting Memory and Learning
Subsumption Architecture: Fundamentals and Applications for Behavior Based Robotics and Reactive Control
Alternating Decision Tree: Fundamentals and Applications
Convolutional Neural Networks: Fundamentals and Applications for Analyzing Visual Imagery
Hybrid Neural Networks: Fundamentals and Applications for Interacting Biological Neural Networks with Artificial Neuronal Models
Multilayer Perceptron: Fundamentals and Applications for Decoding Neural Networks
Radial Basis Networks: Fundamentals and Applications for The Activation Functions of Artificial Neural Networks
Artificial Immune Systems: Fundamentals and Applications
Backpropagation: Fundamentals and Applications for Preparing Data for Training in Deep Learning
Feedforward Neural Networks: Fundamentals and Applications for The Architecture of Thinking Machines and Neural Webs
Long Short Term Memory: Fundamentals and Applications for Sequence Prediction
Perceptrons: Fundamentals and Applications for The Neural Building Block
Attractor Networks: Fundamentals and Applications in Computational Neuroscience
Hopfield Networks: Fundamentals and Applications of The Neural Network That Stores Memories
Recurrent Neural Networks: Fundamentals and Applications from Simple to Gated Architectures
Bio Inspired Computing: Fundamentals and Applications for Biological Inspiration in the Digital World
Nouvelle Artificial Intelligence: Fundamentals and Applications for Producing Robots With Intelligence Levels Similar to Insects
Group Method of Data Handling: Fundamentals and Applications for Predictive Modeling and Data Analysis
Situated Artificial Intelligence: Fundamentals and Applications for Integrating Intelligence With Action
Neuroevolution: Fundamentals and Applications for Surpassing Human Intelligence with Neuroevolution
Statistical Classification: Fundamentals and Applications
Competitive Learning: Fundamentals and Applications for Reinforcement Learning through Competition
Fuzzy Set Theory: Fundamentals and Applications
Non Monotonic Logic: Fundamentals and Applications
Multi Agent System: Fundamentals and Applications
Kernel Methods: Fundamentals and Applications
Book preview
K Nearest Neighbor Algorithm - Fouad Sabry
Chapter 1: k-nearest neighbors algorithm
Evelyn Fix and Joseph Hodges created the k-nearest neighbors algorithm (k-NN) in statistics as a non-parametric supervised learning technique in 1951; Thomas Cover later expanded on it. It is used for both classification and regression. In both cases, the input consists of the k training examples in a data set that are closest to the query point. The output depends on whether k-NN is used for classification or regression:
The result of k-NN classification is a class membership. An object is classified by a plurality vote of its neighbors and assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, the object is simply assigned to the class of its one nearest neighbor.
The output of k-NN regression is the object's property value, computed as the average of the values of its k nearest neighbors. If k = 1, the output is simply the value of the one nearest neighbor.
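As a concrete illustration of both modes, the following minimal Python sketch implements a brute-force version of the algorithm; the function name knn_predict, the toy data, and the choice of Euclidean distance are illustrative assumptions rather than the book's own code.

```python
# Minimal brute-force k-NN sketch (illustrative only): classification by
# plurality vote, regression by averaging the neighbors' values.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3, task="classification"):
    # Euclidean distance from the query point to every stored training point.
    distances = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(distances)[:k]          # indices of the k closest samples
    neighbor_values = y_train[nearest]
    if task == "classification":
        # Plurality vote among the k nearest neighbors.
        return Counter(neighbor_values.tolist()).most_common(1)[0][0]
    # Regression: average of the k nearest neighbors' values.
    return float(np.mean(neighbor_values))

# Toy usage with two well-separated 2-D classes.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [4.1, 3.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.1, 1.0]), k=3))  # expected: 0
```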
k-NN is a form of lazy learning: the function is only approximated locally, and all computation is deferred until the function is evaluated. Because the algorithm relies on distances for classification, normalizing the training data can significantly improve its accuracy when the features represent different physical units or come in vastly different scales.
Whether k-NN is used for classification or regression, the neighbors are taken from a set of objects for which the class or object property value is known. This set can be thought of as the algorithm's training set, although no explicit training step is required.
The k-NN algorithm has the feature of being sensitive to the local structure of the data.
Suppose we have pairs $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$ taking values in $\mathbb{R}^d \times \{1, 2\}$, where $Y$ is the class label of $X$, so that $X \mid Y = r \sim P_r$ for $r = 1, 2$ (with class-conditional distributions $P_r$). Given some norm $\|\cdot\|$ on $\mathbb{R}^d$ and a point $x \in \mathbb{R}^d$, let $(X_{(1)}, Y_{(1)}), \dots, (X_{(n)}, Y_{(n)})$ be a reordering of the training data such that
$$\|X_{(1)} - x\| \leq \dots \leq \|X_{(n)} - x\|.$$
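A tiny Python sketch of this reordering, with purely illustrative data, might look as follows:

```python
# Sort the training pairs (X_i, Y_i) by their distance ||X_i - x|| to a query x.
import numpy as np

X = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])   # training points X_1, ..., X_n
Y = np.array([1, 2, 1])                               # class labels Y_1, ..., Y_n
x = np.array([0.5, 0.5])                              # query point

order = np.argsort(np.linalg.norm(X - x, axis=1))     # indices giving X_(1), ..., X_(n)
X_sorted, Y_sorted = X[order], Y[order]               # the reordered pairs
```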
The training examples are vectors with class labels in a multidimensional feature space. The algorithm's training phase consists solely of storing the training samples' feature vectors and class labels.
In the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test point) is classified by assigning it the label that is most frequent among the k training samples nearest to that query point.
A commonly used distance metric for continuous variables is Euclidean distance. For discrete variables, such as in text classification, another metric can be used, such as the overlap metric (or Hamming distance). In the context of gene expression microarray data, for example, k-NN has been employed with correlation coefficients, such as Pearson and Spearman, as the metric. Often, the classification accuracy of k-NN can be improved significantly if the distance metric is learned with specialized algorithms such as Large Margin Nearest Neighbor or Neighbourhood Components Analysis.
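As a hedged illustration, the snippet below computes the two metrics named above using SciPy's distance functions (assuming SciPy is available); note that SciPy's hamming returns the fraction, not the count, of mismatched positions.

```python
from scipy.spatial.distance import euclidean, hamming

# Euclidean distance for continuous feature vectors.
print(euclidean([1.0, 2.0, 3.0], [2.0, 2.0, 5.0]))   # sqrt(1 + 0 + 4) ≈ 2.236

# Hamming (overlap) distance for discrete feature vectors, reported as the
# fraction of positions that differ.
print(hamming([1, 0, 1, 1], [1, 1, 1, 0]))            # 2 of 4 positions differ -> 0.5
```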
A drawback of the basic majority-voting classification occurs when the class distribution is skewed: examples of a more frequent class tend to dominate the prediction of a new example, simply because they tend to be common among the k nearest neighbors due to their large number. One way to overcome this problem is to weight the classification by taking into account the distance from the test point to each of its k nearest neighbors: the class (or value, in regression problems) of each of the k nearest points is multiplied by a weight proportional to the inverse of its distance to the test point, as sketched below. Another way to deal with skew is abstraction in the data representation. For example, in a self-organizing map (SOM), each node is a representative (a center) of a cluster of similar points, regardless of their density in the original training data; k-NN can then be applied to the SOM.
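The inverse-distance weighting described above can be sketched as follows (the helper name weighted_knn_classify and the small epsilon guard are illustrative assumptions):

```python
# Distance-weighted k-NN vote: nearer neighbors count more, which reduces the
# influence of a dominant but distant majority class.
import numpy as np
from collections import defaultdict

def weighted_knn_classify(X_train, y_train, x_query, k=5, eps=1e-12):
    distances = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = defaultdict(float)
    for i in nearest:
        # Weight each neighbor's vote by 1/distance; eps avoids division by zero.
        votes[y_train[i]] += 1.0 / (distances[i] + eps)
    return max(votes, key=votes.get)
```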
The best choice of k depends on the data; generally, larger values of k reduce the effect of noise on the classification but make the boundaries between classes less distinct. A good k can be selected by various heuristic techniques (see hyperparameter optimization), for instance cross-validation, as sketched below. The special case where the class is predicted to be the class of the closest training sample (i.e. when k = 1) is called the nearest neighbor algorithm.
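One widely used heuristic is to pick k by cross-validation; the sketch below assumes scikit-learn is installed and uses its built-in Iris data purely as an example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = {}
for k in range(1, 16, 2):                      # odd k helps avoid tied votes
    clf = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(clf, X, y, cv=5).mean()  # 5-fold CV accuracy
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```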
The accuracy of the k-NN algorithm can be severely degraded by the presence of noisy or irrelevant features, or when the feature scales are not consistent with their importance. Much research effort has gone into selecting or scaling features to improve classification; a particularly popular approach is the use of evolutionary algorithms to optimize feature scaling.
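A common remedy, sketched here under the assumption that scikit-learn is available, is to standardize the features before the distance computation:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize each feature to zero mean and unit variance, then apply k-NN,
# so that no single large-scale feature dominates the distance.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
# model.fit(X_train, y_train) and model.predict(X_test) with your own data.
```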
The most intuitive nearest neighbor classifier is the one-nearest-neighbor classifier, which assigns a point $x$ to the class of its closest neighbor in the feature space, that is, $C_n^{1nn}(x) = Y_{(1)}$.
As the size of the training data set approaches infinity, the one-nearest-neighbor classifier guarantees an error rate no worse than twice the Bayes error rate (the minimum achievable error rate given the distribution of the data).
The k-nearest-neighbor classifier can be viewed as assigning the k nearest neighbors a weight $1/k$ and all other neighbors a weight of $0$. This can be generalized to weighted nearest neighbor classifiers, in which the $i$th nearest neighbor is assigned a weight $w_{ni}$, with $\sum_{i=1}^{n} w_{ni} = 1$. An analogous result on the strong consistency of weighted nearest neighbor classifiers also holds. Let $C_n^{wnn}$ denote the weighted nearest neighbor classifier with weights $\{w_{ni}\}_{i=1}^{n}$.
Subject to regularity conditions on the class distributions (which, in asymptotic theory, are assumptions needed to distinguish parameters according to certain criteria), the excess risk has the following asymptotic expansion:
$$\mathcal{R}_{\mathcal{R}}(C_n^{wnn}) - \mathcal{R}_{\mathcal{R}}(C^{\mathrm{Bayes}}) = \left(B_1 s_n^2 + B_2 t_n^2\right)\{1 + o(1)\},$$
for constants $B_1$ and $B_2$, where $s_n^2 = \sum_{i=1}^{n} w_{ni}^2$ and
$$t_n = n^{-2/d} \sum_{i=1}^{n} w_{ni}\left\{i^{1+2/d} - (i-1)^{1+2/d}\right\}.$$
The optimal weighting scheme $\{w_{ni}^{*}\}_{i=1}^{n}$, which balances the two terms in the display above, is given as follows: set $k^{*} = \lfloor B n^{\frac{4}{d+4}} \rfloor$,
$$w_{ni}^{*} = \frac{1}{k^{*}}\left[1 + \frac{d}{2} - \frac{d}{2 {k^{*}}^{2/d}}\left\{i^{1+2/d} - (i-1)^{1+2/d}\right\}\right] \quad \text{for } i = 1, 2, \dots, k^{*},$$
and $w_{ni}^{*} = 0$ for $i = k^{*}+1, \dots, n$.
With optimal weights, the dominant term in the asymptotic expansion of the excess risk is $\mathcal{O}\!\left(n^{-\frac{4}{d+4}}\right)$.
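The following Python sketch evaluates this weighting scheme numerically; the function name optimal_weights and the choice B = 1 are illustrative assumptions.

```python
import numpy as np

def optimal_weights(n, d, B=1.0):
    # k* = floor(B * n^{4/(d+4)}) and the weights w*_{ni} from the display above.
    k_star = int(np.floor(B * n ** (4.0 / (d + 4))))
    i = np.arange(1, n + 1, dtype=float)
    bracket = i ** (1 + 2.0 / d) - (i - 1) ** (1 + 2.0 / d)
    w = np.zeros(n)
    w[:k_star] = (1.0 / k_star) * (
        1 + d / 2.0 - (d / (2.0 * k_star ** (2.0 / d))) * bracket[:k_star]
    )
    return w, k_star

w, k_star = optimal_weights(n=1000, d=5)
print(k_star, w.sum())       # the weights telescope to sum exactly to 1
print((w ** 2).sum())        # s_n^2, the first term in the expansion above
```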
Similar results hold when using a bagged nearest neighbor classifier.
k-NN is a special case of a variable-bandwidth, kernel density "balloon" estimator with a uniform kernel.
The naive version of the algorithm is easy to implement by computing the distances from the test example to all stored examples, but it is computationally intensive for large training sets. Using an approximate nearest neighbor search algorithm makes k-NN computationally tractable even for large data sets. Many nearest neighbor search algorithms have been proposed over the years; these generally seek to reduce the number of distance evaluations actually performed.
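As an illustration of trading brute force for a spatial index, the sketch below builds a k-d tree with SciPy's cKDTree (an exact search structure; approximate-search libraries follow a similar build-then-query pattern). The random data is purely illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.normal(size=(10_000, 8))        # 10,000 stored examples in 8 dimensions

tree = cKDTree(X_train)                       # build the index once
dist, idx = tree.query(rng.normal(size=8), k=5)   # 5 nearest neighbors of one query
print(idx, dist)
```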
k-NN has strong consistency results. As the amount of data approaches infinity, the two-class k-NN algorithm is guaranteed to yield an error rate no worse than twice the Bayes error rate (the minimum achievable error rate given the distribution of the data).
For multi-class k-NN classification, Cover and Hart (1967) prove an upper bound on the error rate of
$$R^{*} \ \leq \ R_{k\mathrm{NN}} \ \leq \ R^{*}\left(2 - \frac{M R^{*}}{M-1}\right),$$
where $R^{*}$ is the Bayes error rate (the minimal error rate possible), $R_{k\mathrm{NN}}$ is the k-NN error rate, and $M$ is the number of classes in the problem. For $M = 2$, as the Bayesian error rate $R^{*}$ approaches zero, this limit reduces to "not more than twice the Bayesian error rate".
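As a quick numerical check of the bound, take M = 2 classes and a hypothetical Bayes error rate R* = 0.05 (a value chosen only for illustration):

```python
M, R_star = 2, 0.05
upper = R_star * (2 - M * R_star / (M - 1))
print(upper)    # 0.095, which is below twice the Bayes error rate (0.10)
```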
Many results have been established on the error rate of k-nearest-neighbor classifiers.