sklearn.neighbors: nearest neighbors

First, an overview: sklearn.neighbors provides five main types of nearest neighbor models:

1. k-nearest neighbor model

neighbors.KNeighborsClassifier: k-nearest neighbor classification

neighbors.KNeighborsRegressor: k-nearest neighbor regression

2. Radius (R) nearest neighbor model, which uses all neighbors within a fixed radius R

neighbors.RadiusNeighborsClassifier: R nearest neighbor classification

neighbors.RadiusNeighborsRegressor: R nearest neighbor regression

3. The nearest centroid classification model

neighbors.NearestCentroid

4. Kernel density estimation model

neighbors.KernelDensity

5. Local Outlier Factor (LOF) unsupervised outlier detection

neighbors.LocalOutlierFactor
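As a quick orientation, the five model families can be tried side by side. This is a minimal sketch, assuming scikit-learn and NumPy are installed; the toy data, the radius of 1.5, the bandwidth of 0.5 and the appended outlier at 20.0 are all made up for illustration.

```python
import numpy as np
from sklearn.neighbors import (
    KNeighborsClassifier,
    RadiusNeighborsClassifier,
    NearestCentroid,
    KernelDensity,
    LocalOutlierFactor,
)

# Made-up toy data: two well-separated 1-D clusters.
X = np.array([[0.0], [0.5], [1.0], [5.0], [5.5], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])
query = np.array([[0.8], [5.2]])

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)        # k nearest neighbors
rnn = RadiusNeighborsClassifier(radius=1.5).fit(X, y)      # neighbors within radius R
centroid = NearestCentroid().fit(X, y)                     # nearest class centroid
kde = KernelDensity(bandwidth=0.5).fit(X)                  # unsupervised: uses X only

print(knn.predict(query))        # [0 1]
print(rnn.predict(query))        # [0 1]
print(centroid.predict(query))   # [0 1]
print(kde.score_samples(query))  # log-density at each query point

# LOF: append an obvious outlier; fit_predict marks outliers with -1, inliers with 1.
X_lof = np.vstack([X, [[20.0]]])
labels = LocalOutlierFactor(n_neighbors=2).fit_predict(X_lof)
print(labels[-1])  # -1
```

The classifiers need labels y, while KernelDensity and LocalOutlierFactor work on X alone.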

In addition, the module provides supporting classes and functions that these models build on:

1. neighbors.NearestNeighbors: unsupervised learner for nearest neighbor searches

2. neighbors.BallTree: ball tree data structure

3. neighbors.KDTree: k-d tree data structure

4. neighbors.DistanceMetric: distance metrics (from scikit-learn 1.0 on, this class lives in sklearn.metrics.DistanceMetric)

5. neighbors.kneighbors_graph: function that builds the k-nearest neighbor graph as a sparse matrix

6. neighbors.radius_neighbors_graph: function that builds the R nearest neighbor graph as a sparse matrix
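The supporting building blocks can also be used directly, without any classifier. A minimal sketch, assuming scikit-learn and NumPy; the four 2-D points and the query are invented so that no distance ties occur.

```python
import numpy as np
from sklearn.neighbors import (
    NearestNeighbors, BallTree, KDTree,
    kneighbors_graph, radius_neighbors_graph,
)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
query = np.array([[0.2, 0.1]])

# Unsupervised neighbor search: no labels are needed.
nn = NearestNeighbors(n_neighbors=2).fit(X)
_, ind_nn = nn.kneighbors(query)

# The tree structures can be built and queried on their own.
_, ind_ball = BallTree(X, leaf_size=2).query(query, k=2)
_, ind_kd = KDTree(X, leaf_size=2).query(query, k=2)
print(ind_nn, ind_ball, ind_kd)  # [[0 1]] [[0 1]] [[0 1]]

# The graph functions return scipy sparse matrices in CSR format.
A_knn = kneighbors_graph(X, n_neighbors=1, mode="connectivity", include_self=False)
A_rad = radius_neighbors_graph(X, radius=1.5, mode="connectivity", include_self=False)
print(A_knn.toarray())  # row i has a 1 in the column of point i's nearest neighbor
print(A_rad.toarray())  # only points 0 and 1 are within 1.5 of each other
```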

2. The k-nearest neighbor model

1. k-nearest neighbor classification: neighbors.KNeighborsClassifier

Each query point is assigned the class that wins a (possibly weighted) majority vote among its k nearest neighbors in the training data.

Model parameters:

　　　　n_neighbors : int, optional (default = 5)

　　　　　　The number of neighbors k.

　　　　weights : str or callable, optional (default = ‘uniform’)

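　　　　　　Weight function that determines how much each of the k neighbors contributes to the vote:

　　　　　　● 'uniform': all neighbors have the same weight

　　　　　　● 'distance': each neighbor is weighted by the inverse of its distance, so closer neighbors have more influence

　　　　　　● [callable]: a user-defined function that takes an array of distances and returns an array of weights of the same shape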
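The three weighting options can change the predicted class for the same query. A minimal sketch, assuming scikit-learn and NumPy; gaussian_weights is a hypothetical custom weight function written for this illustration, and the data are made up so that the vote flips.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def gaussian_weights(distances):
    # Hypothetical custom weight function: receives the (n_queries, k) array
    # of neighbor distances and returns weights of the same shape.
    return np.exp(-distances ** 2)

# Made-up 1-D data: three points of class 0 and one of class 1.
X = np.array([[0.0], [1.0], [2.0], [4.0]])
y = np.array([0, 0, 0, 1])
query = np.array([[3.5]])  # its 3 nearest points are 4.0, 2.0 and 1.0

preds = {}
for w in ["uniform", "distance", gaussian_weights]:
    clf = KNeighborsClassifier(n_neighbors=3, weights=w).fit(X, y)
    name = w if isinstance(w, str) else w.__name__
    preds[name] = int(clf.predict(query)[0])
    print(name, preds[name])
# uniform 0           (two class-0 votes beat one class-1 vote)
# distance 1          (1/0.5 = 2.0 outweighs 1/1.5 + 1/2.5, about 1.07)
# gaussian_weights 1
```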

　　　　algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional (default = ‘auto’)

　　　　　　Algorithm used to compute the nearest neighbors:

　　　　　　● 'ball_tree' uses the BallTree data structure

　　　　　　● 'kd_tree' uses the KDTree data structure

　　　　　　● 'brute' uses brute-force search

　　　　　　● 'auto' tries to choose the most appropriate algorithm based on the data passed to fit
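All four options perform an exact search, so they should return the same neighbors and differ only in speed and memory use. A minimal sketch to check this, assuming scikit-learn and NumPy; the random data (seed 0) is made up, and being continuous it makes distance ties practically impossible.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)
query = rng.normal(size=(5, 3))

results = {}
for algo in ["auto", "ball_tree", "kd_tree", "brute"]:
    clf = KNeighborsClassifier(n_neighbors=5, algorithm=algo).fit(X, y)
    _, ind = clf.kneighbors(query)
    results[algo] = ind
    print(algo, ind[0])

# Exact search: every algorithm should find the same neighbor indices.
print(all(np.array_equal(results["brute"], r) for r in results.values()))
```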

　　　　leaf_size : int, optional (default = 30)

　　　　　　leaf_size is passed to BallTree or KDTree. It affects the speed of building and querying the tree, as well as the memory required to store it; the optimal value depends on the problem. As a rule of thumb, a larger leaf_size makes the tree faster to build but queries slower, while a smaller leaf_size makes the tree slower to build but queries faster. It has no effect when algorithm='brute'.

　　　　p : integer, optional (default = 2)

　　　　　　Power parameter for the Minkowski metric. p = 1 is equivalent to Manhattan distance and p = 2 to Euclidean distance. Other distance metrics do not use this parameter.
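The Minkowski distance is d(u, v) = (sum_i |u_i - v_i|^p)^(1/p), so p controls which familiar distance you get. A minimal sketch, assuming scikit-learn and NumPy; the helper minkowski is written here only to make the formula explicit and is not part of scikit-learn.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def minkowski(u, v, p):
    # Hypothetical helper: d(u, v) = (sum_i |u_i - v_i|^p)^(1/p)
    return float(np.sum(np.abs(u - v) ** p) ** (1.0 / p))

a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])
print(minkowski(a, b, 1), minkowski(a, b, 2))  # 7.0 5.0 (Manhattan, Euclidean)

# The same distances from scikit-learn with the default metric='minkowski'.
sk = {}
for p in (1, 2):
    nn = NearestNeighbors(n_neighbors=1, p=p).fit([b])
    dist, _ = nn.kneighbors([a])
    sk[p] = float(dist[0, 0])
print(sk)  # approximately {1: 7.0, 2: 5.0}
```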

　　　　metric : string or callable, default 'minkowski'

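　　　　　　The distance metric used to compute nearest neighbors. With the default p = 2, 'minkowski' is the Euclidean distance. Other examples:

　　　　　　● 'euclidean': Euclidean distance

　　　　　　● 'manhattan': Manhattan distance

　　　　　　A callable can also be passed. For the full list of metrics, refer to neighbors.DistanceMetric.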
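Changing the metric can change which point counts as "nearest". A minimal sketch, assuming scikit-learn and NumPy; the two training points were chosen by hand so that the metrics disagree, and chebyshev is a hypothetical user-defined callable that should match the built-in 'chebyshev' string.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def chebyshev(u, v):
    # Hypothetical user-defined metric: the largest coordinate difference.
    return np.max(np.abs(u - v))

X = np.array([[3.0, 0.0], [2.0, 2.0]])  # class 0 and class 1
y = np.array([0, 1])
query = np.array([[0.0, 0.0]])

preds = {}
for metric in ["euclidean", "manhattan", "chebyshev", chebyshev]:
    clf = KNeighborsClassifier(n_neighbors=1, metric=metric).fit(X, y)
    name = metric if isinstance(metric, str) else "callable_" + metric.__name__
    preds[name] = int(clf.predict(query)[0])
    print(name, preds[name])
# euclidean 1          (3.0 vs about 2.83)
# manhattan 0          (3.0 vs 4.0)
# chebyshev 1          (3.0 vs 2.0)
# callable_chebyshev 1
```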

　　　　metric_params : dict, optional (default = None)

　　　　　　Additional keyword arguments for the metric function, passed as a dictionary.
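With a callable metric, the entries of metric_params are forwarded to it as keyword arguments. A minimal sketch, assuming scikit-learn and NumPy; weighted_manhattan and its feature-weight argument w are hypothetical, and algorithm='brute' is set explicitly to keep the example on the simplest code path.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def weighted_manhattan(u, v, w):
    # Hypothetical custom metric: feature-weighted L1 distance.
    return np.sum(w * np.abs(u - v))

X = np.array([[2.0, 0.0], [0.0, 1.0]])  # class 0 and class 1
y = np.array([0, 1])
query = np.array([[0.0, 0.0]])

preds = {}
for w in ([1.0, 1.0], [1.0, 10.0]):
    clf = KNeighborsClassifier(
        n_neighbors=1,
        metric=weighted_manhattan,
        metric_params={"w": np.array(w)},  # passed on as w=...
        algorithm="brute",
    ).fit(X, y)
    preds[tuple(w)] = int(clf.predict(query)[0])
    print(w, preds[tuple(w)])
# [1.0, 1.0] 1   (distances 2 vs 1)
# [1.0, 10.0] 0  (distances 2 vs 10: the second feature now dominates)
```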

　　　　n_jobs : int, optional (default = 1)

　　　　　　The number of parallel jobs to run for the nearest neighbor search. If -1, the number of parallel jobs is set to the number of CPU cores. Does not affect the fit method.

Model methods:

fit(X, y): fit the model using X as training data and y as target values

Parameters: X : {array-like, sparse matrix, BallTree, KDTree}

　　　　y : {array-like, sparse matrix}

get_params(deep=True): get the parameters of the model

Parameters: deep : boolean, optional

　　　　　　If True, also return the parameters of contained sub-objects that are estimators.
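The effect of deep is easiest to see on an estimator that contains other estimators, such as a Pipeline. A minimal sketch, assuming scikit-learn; the step names "scale" and "knn" are arbitrary labels chosen for this illustration.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

clf = KNeighborsClassifier(n_neighbors=3, weights="distance")
params = clf.get_params()
print(params["n_neighbors"], params["weights"])  # 3 distance

pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier(n_neighbors=3))])
deep = pipe.get_params(deep=True)      # includes nested keys such as "knn__n_neighbors"
shallow = pipe.get_params(deep=False)  # only the pipeline's own parameters
print(deep["knn__n_neighbors"])        # 3
print("knn__n_neighbors" in shallow)   # False
```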

kneighbors(X=None, n_neighbors=None, return_distance=True)

Find the k nearest neighbors of a point or set of points. For each query point it returns the distances to its neighbors and the indices of those neighbors in the training data.

Parameters: X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == ‘precomputed’

　　　　　　The query point or points. If not provided, the neighbors of each training point seen by fit are returned; in this case a query point is not considered its own neighbor.

　　　　n_neighbors : int

　　　　　　Number of neighbors to find (defaults to the n_neighbors value passed to the constructor).

　　　　return_distance : boolean, optional. Defaults to True.

　　　　　　If False, no distance will be returned.

Returns:　dist : array

　　　　　　Distances from each query point to its neighbors. Returned only if return_distance=True.

　　　　ind : array

　　　　　　Indices of the nearest neighbors in the training data.
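A small worked case of kneighbors, including the X=None behavior. A minimal sketch, assuming scikit-learn and NumPy; the four 1-D training points are made up so the distances can be checked by hand.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0.0], [1.0], [3.0], [7.0]])
y = np.array([0, 0, 1, 1])
clf = KNeighborsClassifier(n_neighbors=2).fit(X, y)

dist, ind = clf.kneighbors([[2.5]])
print(dist)  # [[0.5 1.5]]  distances to 3.0 and 1.0
print(ind)   # [[2 1]]      their row indices in X

ind_only = clf.kneighbors([[2.5]], return_distance=False)
print(ind_only)  # [[2 1]]

# X=None: neighbors of each training point, excluding the point itself.
_, ind_train = clf.kneighbors()
print(ind_train)  # rows: [1 2], [0 2], [1 0], [2 1]
```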

kneighbors_graph(X=None, n_neighbors=None, mode='connectivity')

Compute the graph of k-neighbors for the points in X, returned as a sparse matrix.

Parameters: X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == ‘precomputed’

　　　　　　The query point or points. If not provided, the neighbors of each training point seen by fit are returned; in this case a query point is not considered its own neighbor.

　　　　n_neighbors : int

　　　　　　Number of neighbors for each query point (defaults to the n_neighbors value passed to the constructor).

　　　　mode : {‘connectivity’, ‘distance’}, optional

　　　　　　Type of matrix to return:

　　　　　　● 'connectivity': a matrix of ones and zeros

　　　　　　● 'distance': the entries are the distances between points, measured with the model's metric (Euclidean by default)

Returns: A : sparse matrix in CSR format, shape = [n_samples, n_samples_fit]

　　　　　　Each row corresponds to a query point and each column to a training sample. With mode='connectivity' an entry of 1 marks a neighbor; with mode='distance' the entry is the distance to that neighbor. Non-neighbors are 0 in both cases. Call the toarray() method to view the matrix as a dense array.
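The two modes of kneighbors_graph on the same queries. A minimal sketch, assuming scikit-learn and NumPy; the training points and the two query points are made up for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0.0], [1.0], [3.0], [7.0]])
y = np.array([0, 0, 1, 1])
clf = KNeighborsClassifier(n_neighbors=2).fit(X, y)

queries = [[2.5], [6.0]]
A_conn = clf.kneighbors_graph(queries, mode="connectivity")  # CSR matrix, shape (2, 4)
A_dist = clf.kneighbors_graph(queries, mode="distance")

print(A_conn.toarray())  # row 0: neighbors 1 and 2; row 1: neighbors 2 and 3
print(A_dist.toarray())  # same positions, holding the distances instead of 1s
```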

predict(X)

Predict the class labels for the provided data.

Parameters: X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == ‘precomputed’

　　　　　　Test samples.

Returns: y : array of shape [n_samples] or [n_samples, n_outputs]

　　　　　　Predicted class label for each test sample.
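A fit-then-predict round trip with a majority vote that can be checked by hand. A minimal sketch, assuming scikit-learn and NumPy; the labels "a" and "b" and the 1-D points are made up, and string labels are used to show they are supported as well.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0.0], [1.0], [2.0], [6.0], [7.0], [8.0]])
y = np.array(["a", "a", "a", "b", "b", "b"])
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)

pred = clf.predict([[1.5], [6.5], [3.9]])
print(pred)  # ['a' 'b' 'a']
# For 3.9 the three nearest points are 2.0, 6.0 and 1.0, giving votes a, b, a.
```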

predict_proba(X)

Return probability estimates for the test data: the probability that each sample belongs to each class.

Parameters: X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == ‘precomputed’

　　　　　　Test samples.

Returns: p : array of shape = [n_samples, n_classes], or a list of n_outputs such arrays

　　　　　　Each row corresponds to a test sample and each column to a class, in the order given by the classes_ attribute.
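With uniform weights, each probability is simply the fraction of the k neighbors that belong to that class. A minimal sketch, assuming scikit-learn and NumPy, reusing made-up 1-D data with labels "a" and "b".

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0.0], [1.0], [2.0], [6.0], [7.0], [8.0]])
y = np.array(["a", "a", "a", "b", "b", "b"])
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)

proba = clf.predict_proba([[1.5], [3.9]])
print(clf.classes_)  # ['a' 'b']  column order of proba
print(proba)         # rows approximately [1, 0] and [0.667, 0.333]

# The most probable class matches predict().
print(clf.classes_[proba.argmax(axis=1)])  # ['a' 'a']
```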

score(X, y, sample_weight=None)

Return the mean accuracy of the model's predictions on the given test data and labels.

Parameters: X : array-like, shape = (n_samples, n_features)

　　　　　　Test samples.

　　　　y : array-like, shape = (n_samples) or (n_samples, n_outputs)

　　　　　　True labels for X.

　　　　sample_weight : array-like, shape = [n_samples], optional

　　　　　　Sample weights.

Returns: score : float

　　　　　　Mean accuracy of self.predict(X) with respect to y.
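score is the fraction of correct predictions, so it can be reproduced from predict. A minimal sketch, assuming scikit-learn and NumPy; the bundled iris dataset, the 30% test split and random_state=0 are arbitrary choices for illustration, so the exact accuracy is not stated here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
acc = clf.score(X_test, y_test)                   # mean accuracy
manual = np.mean(clf.predict(X_test) == y_test)   # same quantity computed by hand
print(acc, manual)
```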

set_params(**params)

Set the parameters of the model by passing them as keyword arguments; a dictionary of parameters can be unpacked with **. Returns the estimator itself.
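Both calling styles of set_params side by side. A minimal sketch, assuming scikit-learn; the parameter values are arbitrary.

```python
from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier()
clf.set_params(n_neighbors=7, weights="distance")  # keyword arguments

new_params = {"p": 1, "leaf_size": 10}
returned = clf.set_params(**new_params)            # unpack a dictionary

print(clf.n_neighbors, clf.weights, clf.p, clf.leaf_size)  # 7 distance 1 10
print(returned is clf)  # True: set_params returns the estimator itself
```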

2. k-nearest neighbor regression: neighbors.KNeighborsRegressor

Regression based on k-nearest neighbors. The prediction for a query point is the average of the target values of its k nearest neighbors (a distance-weighted average when weights='distance').
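With uniform weights the regression prediction equals the plain mean of the neighbors' targets. A minimal sketch, assuming scikit-learn and NumPy; the data, including the far-away point at 10.0, are made up so that the neighbor set is obvious.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[0.0], [1.0], [2.0], [10.0]])
y = np.array([1.0, 2.0, 6.0, 100.0])
reg = KNeighborsRegressor(n_neighbors=3).fit(X, y)

query = np.array([[1.2]])
pred = reg.predict(query)

# Recompute by hand: the 3 nearest points are 1.0, 2.0 and 0.0 (targets 2, 6, 1).
_, ind = reg.kneighbors(query)
manual = y[ind].mean(axis=1)
print(pred, manual)  # [3.] [3.]
```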

Its parameters and methods are essentially the same as those of k-nearest neighbor classification. The differences are:

1. There is no predict_proba(X) method.

2. score(X, y, sample_weight=None) returns the coefficient of determination R^2 rather than accuracy:

R^2 = 1 - u/v, where u = sum((y_true - y_pred)^2) is the residual sum of squares and v = sum((y_true - mean(y_true))^2) is the total sum of squares. The best possible score is 1.0, and the score can be negative because a model can be arbitrarily worse than always predicting the mean.
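The R^2 formula can be checked directly against score and sklearn.metrics.r2_score. A minimal sketch, assuming scikit-learn and NumPy; the noise-free sine samples and k = 2 are made up for illustration, so no specific score value is claimed.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.neighbors import KNeighborsRegressor

# Made-up data: noise-free samples of a sine curve.
X_train = np.linspace(0, 10, 21).reshape(-1, 1)
y_train = np.sin(X_train).ravel()
X_test = np.array([[0.25], [3.3], [7.7], [9.1]])
y_test = np.sin(X_test).ravel()

reg = KNeighborsRegressor(n_neighbors=2).fit(X_train, y_train)
y_pred = reg.predict(X_test)

u = ((y_test - y_pred) ** 2).sum()          # residual sum of squares
v = ((y_test - y_test.mean()) ** 2).sum()   # total sum of squares
manual_r2 = 1 - u / v
print(reg.score(X_test, y_test), manual_r2, r2_score(y_test, y_pred))

# Predictions worse than the mean give a negative score.
bad = r2_score(y_test, -y_test)
print(bad < 0)  # True
```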