Machine Learning (3): The KNN Algorithm


Algorithm theory

  • K-nearest neighbors (KNN) is a basic machine learning algorithm. The idea behind the name is that each sample can be represented by its k closest neighbors.
  • KNN differs between regression and classification only in the final decision step: for classification prediction, it generally uses the majority voting method; for regression prediction, it generally uses the averaging method.

Algorithm steps

  1. From the training set, obtain the K samples closest to the sample to be predicted;
  2. Predict the target attribute of the current sample from the target attribute values of the K samples obtained in step 1 (a minimal sketch of both steps follows this list).
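
The following is a minimal brute-force sketch of these two steps in Python; the function name, the toy data, and the helper logic are illustrative assumptions rather than code from the original post. It covers both decision rules mentioned above: majority vote for classification and the average for regression.

```python
# Minimal brute-force KNN sketch (illustrative, not from the original post).
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3, task="classification"):
    """Predict the target of sample x from its k nearest training samples."""
    # Step 1: Euclidean distance from x to every training sample.
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of the k closest samples
    neighbor_targets = y_train[nearest]
    if task == "classification":
        # Step 2 (classification): majority vote among the k neighbors.
        return Counter(neighbor_targets).most_common(1)[0][0]
    # Step 2 (regression): average of the k neighbors' values.
    return neighbor_targets.mean()

# Toy usage with made-up data: two clusters, two labels.
X = np.array([[1.0, 1.0], [1.2, 0.8], [6.0, 6.0], [5.8, 6.2]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([1.1, 0.9]), k=3))  # -> 0
```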

The three elements of KNN

  1. Choice of K : K is generally given a small initial value based on the sample distribution, and a more appropriate value is then selected by cross validation (see the sketch after this list). When K is small, the neighborhood used for prediction is small; the training error decreases, but the model becomes more complex and easily overfits. When K is large, the neighborhood used for prediction is larger; the training error increases, while the model becomes simpler and easily underfits.
  2. Distance metric : usually the Euclidean distance;
  3. Decision rule : in the classification model, mainly the majority voting method or the weighted majority voting method; in the regression model, mainly the averaging method or the weighted averaging method.
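
A common way to pick K in practice is cross validation over a few candidate values. Here is a hedged scikit-learn sketch; the iris dataset, the candidate values, and cv=5 are arbitrary choices for illustration, not part of the original post.

```python
# Choosing K by 5-fold cross validation (illustrative).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in [1, 3, 5, 7, 9]:
    model = KNeighborsClassifier(n_neighbors=k)  # Euclidean distance by default
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"k={k}: mean accuracy {score:.3f}")
```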

KNN algorithm implementation

  1. Brute force (brute) : compute the distances from the sample to be predicted to all samples in the training set, then take the K smallest distances to obtain the K nearest neighbors. The disadvantage is that when the number of features or the number of samples is large, the algorithm is inefficient;
  2. KD tree (kd_tree) : first build a model from the training data by constructing a KD tree, then query the built model for the neighbors of the sample (a comparison sketch follows this list).
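
In scikit-learn, both implementations are available through the `algorithm` parameter of the neighbor estimators; the synthetic data below is made up for illustration.

```python
# Comparing the brute-force and KD-tree backends (illustrative data).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.random((1000, 3))             # 1000 samples, 3 features
y = (X[:, 0] > 0.5).astype(int)       # synthetic labels

for algo in ["brute", "kd_tree"]:
    clf = KNeighborsClassifier(n_neighbors=5, algorithm=algo).fit(X, y)
    print(algo, clf.predict([[0.7, 0.2, 0.9]]))
```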

Construction of the KD Tree

To build a KD tree from m samples with n-dimensional features, compute the variance of each of the n features; the feature with the largest variance, say the k-th dimension, is used as the splitting dimension of the root node. For this feature, the median of its values is selected as the division point: samples whose value is smaller than the median go into the left subtree, and samples whose value is greater than or equal to it go into the right subtree. The left and right subtrees are split in the same way, again finding the feature with the largest variance for the root of each subtree, and the KD tree is produced recursively.
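
Here is a minimal construction sketch following the description above; the function and the data are illustrative assumptions, and ties around the median are handled loosely for brevity.

```python
# Recursive KD-tree construction sketch (illustrative).
import numpy as np

def build_kd_tree(points):
    """Build a KD tree as nested dicts from an (m, n) array of samples."""
    if len(points) == 0:
        return None
    # Splitting dimension: the feature with the largest variance.
    dim = int(np.argmax(points.var(axis=0)))
    # Sort on that feature and split at the median sample.
    points = points[np.argsort(points[:, dim])]
    mid = len(points) // 2
    return {
        "point": points[mid],                      # median sample at this node
        "dim": dim,
        "left": build_kd_tree(points[:mid]),       # values below the median
        "right": build_kd_tree(points[mid + 1:]),  # values at or above it
    }

tree = build_kd_tree(np.array([[2, 3], [5, 4], [9, 6], [4, 7], [8, 1], [7, 2]]))
print(tree["dim"], tree["point"])
```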

Nearest-neighbor search in the KD tree

Once the KD tree is generated, it can be used to predict the target points in the test set. For a target point, we first find the leaf node in the KD tree that contains the target point. With the target point as the center and the distance from the target point to the instance in that leaf node as the radius, we get a hypersphere; the nearest neighbor must lie inside this hypersphere. We then return to the leaf node's parent and check whether the hyperrectangle of its other child node intersects the hypersphere. If they intersect, we go into that child node to look for a closer neighbor, updating the nearest neighbor if one is found. If they do not intersect, we simply return to the parent's parent and continue searching the other subtree. When we back up to the root node, the algorithm ends, and the nearest neighbor saved at that point is the final nearest neighbor.
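
Rather than re-implementing the backtracking search, here is a hedged usage example with SciPy's KD tree, which performs the leaf descent and backtracking described above internally; the training points and the query point are made up.

```python
# Nearest-neighbor query on a KD tree via SciPy (illustrative data).
import numpy as np
from scipy.spatial import KDTree

train = np.array([[2, 3], [5, 4], [9, 6], [4, 7], [8, 1], [7, 2]])
tree = KDTree(train)

dist, idx = tree.query([3, 4], k=1)   # distance and index of the nearest sample
print(train[idx], dist)               # -> [2 3] 1.414...
```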

Origin www.cnblogs.com/tankeyin/p/12129363.html