Machine Learning Series - Nearest Neighbor Classifier


lazy learning and eager learning

General classifiers, such as decision trees and support vector machines, begin learning a mapping from input attributes to class labels as soon as training data becomes available. Such strategies are called eager learning methods. In contrast, lazy learning algorithms defer modeling the training data until a test example actually needs to be classified. An extreme example of lazy learning is the Rote classifier, which memorizes the entire training set and classifies a test example only when it exactly matches some training example. The obvious flaw of this algorithm is that many test examples cannot be classified at all, because no training example matches them.
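To make the idea concrete, here is a minimal sketch of such a Rote classifier in Python; the class and method names are illustrative, not from any library:

```python
# A minimal sketch of the Rote classifier described above; the class and
# method names are illustrative, not from any library.
class RoteClassifier:
    def fit(self, examples):
        # "Training" is simply memorizing the entire training set.
        self.memory = {tuple(x): y for x, y in examples}

    def predict(self, x):
        # Classify only if the test example exactly matches a stored
        # training example; otherwise it cannot be classified.
        return self.memory.get(tuple(x))

clf = RoteClassifier()
clf.fit([((1, 0), "spam"), ((0, 1), "ham")])
print(clf.predict((1, 0)))  # spam -- exact match found
print(clf.predict((1, 1)))  # None -- no training example matches
```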

nearest neighbor classifier

A slight improvement on the Rote classifier, making it more flexible, is to find all training examples whose attributes are close to those of the test example. These training examples are called nearest neighbors, and they can be used to determine the test example's class label. The underlying principle is the idiom "birds of a feather flock together". The nearest neighbor classifier treats each training example as a point in a d-dimensional space, where d is the number of attributes. Given a test example z, it computes the proximity of z to each training example, finds the k closest training examples, and assigns to z the class label that occurs most frequently among those k training examples.
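A minimal sketch of this procedure, assuming Euclidean distance as the proximity measure and a list of (point, label) pairs as training data (both are illustrative choices, not prescribed by the text):

```python
import math
from collections import Counter

def knn_classify(train, z, k):
    """Classify test example z by majority vote among its k nearest
    training examples. train is a list of (point, label) pairs; the
    Euclidean metric and all names here are illustrative assumptions."""
    # Proximity of z to every training point
    dists = [(math.dist(x, z), y) for x, y in train]
    # Keep the k closest training examples
    neighbors = sorted(dists, key=lambda t: t[0])[:k]
    # Assign z the class label occurring most often among the k neighbors
    votes = Counter(y for _, y in neighbors)
    return votes.most_common(1)[0][0]
```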

Clearly, the value of k has a great influence on the classification result. If k is too small, the nearest neighbor classifier is susceptible to overfitting caused by noise in the training data; if k is too large, the classifier may misclassify the test example, because the nearest neighbor list can include data points that lie far away from it.
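A small made-up example using the `knn_classify` sketch above illustrates the first failure mode: a single noisy training point right next to the test example flips the k = 1 prediction, while k = 5 smooths it out.

```python
# The data here are invented purely for illustration.
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.9), "A"), ((0.9, 1.1), "A"),
    ((3.0, 3.0), "B"), ((3.1, 2.9), "B"),
    ((1.03, 1.0), "B"),  # a noisy/mislabeled point right next to z
]
z = (1.05, 1.0)
print(knn_classify(train, z, k=1))  # B -- fits the single noisy neighbor
print(knn_classify(train, z, k=5))  # A -- majority vote smooths the noise
```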

To reduce the impact of the choice of k, one approach is to weight each neighbor x_i's vote by its distance to z: w_i = 1 / d(x_i, z)^2, so that nearest neighbors that are farther away have relatively little influence on the classification result.
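A sketch of this distance-weighted variant, under the same illustrative assumptions as the earlier example (Euclidean distance, invented names); a tiny epsilon guards against division by zero when a neighbor coincides with z:

```python
import math
from collections import defaultdict

def weighted_knn_classify(train, z, k):
    """Distance-weighted variant of knn_classify: each of the k nearest
    neighbors x_i votes with weight w_i = 1 / d(x_i, z)^2."""
    dists = [(math.dist(x, z), y) for x, y in train]
    neighbors = sorted(dists, key=lambda t: t[0])[:k]
    votes = defaultdict(float)
    for d, y in neighbors:
        # Tiny epsilon avoids division by zero when z coincides with x_i
        votes[y] += 1.0 / (d * d + 1e-12)
    # The label with the largest total weight wins
    return max(votes, key=votes.get)
```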

Pros and cons of nearest neighbor classifiers

Advantages of nearest neighbor classifiers

  1. There is no need to build an explicit model from the training data.
  2. Nearest neighbor classifiers can generate decision boundaries of any shape.

Disadvantages of nearest neighbor classifiers

  1. Classification is based on local information, which makes it susceptible to noise.
  2. The training set often needs to be preprocessed before it can be used, e.g. by rescaling attributes (see the sketch after this list).
  3. Classifying a test example is relatively slow, because its proximity to every training example must be computed.
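On disadvantage 2: preprocessing often means rescaling attributes so that one attribute with a large numeric range does not dominate the distance computation. A minimal sketch of min-max scaling, one common (illustrative, not prescribed by the text) choice:

```python
# Rescale each attribute to [0, 1] so attributes with large ranges
# (e.g., income in dollars vs. age in years) don't dominate the distance.
def min_max_scale(points):
    lo = [min(col) for col in zip(*points)]
    hi = [max(col) for col in zip(*points)]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0
              for v, l, h in zip(p, lo, hi))
        for p in points
    ]

print(min_max_scale([(20, 20000), (40, 80000), (60, 50000)]))
# [(0.0, 0.0), (0.5, 1.0), (1.0, 0.5)]
```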
