K-nearest neighbors (KNN) is one of the simplest Machine Learning algorithms. It is a supervised learning algorithm which can be used for both classification and regression.
Let us understand this algorithm with a classification problem. For simplicity of visualization, we'll assume that our input data is 2 dimensional. We have two possible classes - green and red.
Let's plot our training data in feature space.
There is no explicit training phase in KNN! In other words, for classifying new data points, we'll directly use our dataset (in some sense, the dataset is the model).
To classify a new data point, we find the k points in the training data closest to it and predict whichever class is most common among those k points (in effect, the neighbors vote). Here "closest" is defined by a suitable distance metric, such as Euclidean distance; other distance metrics are discussed below.
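The two building blocks described above, a distance metric and a nearest-neighbor lookup, can be sketched in a few lines. This is a minimal illustration using Euclidean distance (the function and variable names are my own, not from the article):

```python
import math

def euclidean(a, b):
    # Straight-line (Euclidean) distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(train_points, query, k):
    # Sort training points by distance to the query and keep the k closest
    return sorted(train_points, key=lambda p: euclidean(p, query))[:k]

print(euclidean((0, 0), (3, 4)))                        # → 5.0
print(k_nearest([(0, 0), (1, 1), (5, 5)], (0, 0), 2))   # → [(0, 0), (1, 1)]
```

Sorting the whole dataset is O(n log n) per query; it keeps the sketch short, but real implementations typically use a heap or a spatial index such as a k-d tree.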
For example, to classify the blue point shown in the following figure, we consider its k nearest data points and assign the majority class.
If k = 3, we get two green neighbors and one red neighbor. Hence, we'll predict green for the new point.
Here's another example: let us move the new (blue) point to the position shown below.
If we take k = 5, we get four red neighbors and one green neighbor. Hence, the new point will be classified as red.
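Putting the distance computation and the majority vote together gives a complete KNN classifier. Below is a minimal sketch; the toy 2D dataset is illustrative and only loosely mirrors the green/red setup in the figures, not their exact coordinates:

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    # train: list of ((x, y), label) pairs
    # Take the k training points nearest to the query...
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # ...and return the label that is most common among them (the "vote")
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Illustrative toy data (assumed, not taken from the article's figures)
train = [((1, 2), "green"), ((2, 3), "green"), ((3, 1), "green"),
         ((6, 5), "red"), ((7, 7), "red"), ((6, 6), "red")]

print(knn_classify(train, (2, 2), 3))  # → green
print(knn_classify(train, (6, 6), 5))  # → red
```

Note that with an even k, ties are possible; `Counter.most_common` breaks them arbitrarily here, which is one reason odd values of k are often preferred for binary classification.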
In the case of regression (when the target variable is a real value), we take the average of the target values of the k nearest neighbors.
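For regression, only the final step changes: instead of a vote, we average the neighbors' target values. A minimal sketch with made-up data:

```python
import math

def knn_regress(train, query, k):
    # train: list of ((x, y), target_value) pairs
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Predict the mean target value of the k nearest neighbors
    return sum(target for _, target in neighbors) / k

# Illustrative toy data (assumed for this example)
train = [((0, 0), 1.0), ((1, 0), 2.0), ((0, 1), 3.0), ((5, 5), 10.0)]

print(knn_regress(train, (0, 0), 3))  # → 2.0  (mean of 1.0, 2.0, 3.0)
```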