sklearn.neighbors nearest neighbors
1. The five main types of nearest neighbor models
1. k-nearest neighbor model
neighbors.KNeighborsClassifier k nearest neighbor classification
neighbors.KNeighborsRegressor k nearest neighbor regression
2. R nearest neighbor model
neighbors.RadiusNeighborsClassifier R nearest neighbor classification
neighbors.RadiusNeighborsRegressor R nearest neighbor regression
3. The nearest centroid classification model
neighbors.NearestCentroid
4. Kernel Density Model
neighbors.KernelDensity
5. LOF Unsupervised Outlier Detection
neighbors.LocalOutlierFactor
In addition, the following classes and functions support these models:
1. neighbors.NearestNeighbors unsupervised learner that implements nearest neighbor search
2. neighbors.BallTree ball tree data structure
3. neighbors.KDTree KD tree data structure
4. neighbors.DistanceMetric distance metric
5. neighbors.kneighbors_graph function that builds the k nearest neighbor graph (as a matrix)
6. neighbors.radius_neighbors_graph function that builds the R nearest neighbor graph (as a matrix)
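As a rough illustration of how these supporting pieces relate, the sketch below runs the same 2-nearest-neighbor query once through NearestNeighbors and once directly against a KDTree. The four training points and the query value 2.2 are made up for this example.

```python
import numpy as np
from sklearn.neighbors import KDTree, NearestNeighbors

# Toy data (hypothetical): four points on a line.
X = np.array([[0.0], [1.0], [3.0], [7.0]])
query = np.array([[2.2]])

# Unsupervised neighbor search through the estimator interface.
nn = NearestNeighbors(n_neighbors=2).fit(X)
dist, ind = nn.kneighbors(query)

# The same query issued directly against the tree data structure.
tree = KDTree(X, leaf_size=30)
tree_dist, tree_ind = tree.query(query, k=2)

print(ind, dist)            # indices into X and the matching distances
print(tree_ind, tree_dist)  # the tree reports the same neighbors
```

Both calls return the neighbors sorted from nearest to farthest, so the point 3.0 (index 2) comes before the point 1.0 (index 1).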
2. The k-nearest neighbor model
1. k nearest neighbors classification neighbors.KNeighborsClassifier
Classify according to the majority vote of k-nearest neighbors
Model parameters :
n_neighbors : int, optional (default = 5)
k value
weights : str or callable, optional (default = ‘uniform’)
Weighted contribution of k-nearest neighbors
● 'uniform' : all neighbors have the same weight
● 'distance': The weight is the inverse of the distance. The closer the point is, the greater the weight.
● [callable] : a user-defined function that takes an array of distances and returns an array of the same shape containing the weights
algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional
Algorithm for Computing Nearest Neighbors
● 'ball_tree' uses the BallTree class data structure
● 'kd_tree' uses the data structure of the KDTree class
● 'brute' uses brute force search
● 'auto' automatically selects the appropriate algorithm
leaf_size : int, optional (default = 30)
The leaf_size parameter is passed to the BallTree or KDTree class. It affects the speed of building the tree and of querying it, as well as the memory required to store the tree. The optimal value depends on the actual problem. Usually, the larger the leaf_size, the faster the tree is built but the slower the queries; conversely, the smaller the leaf_size, the slower the tree is built but the faster the queries. With brute-force search, this parameter has no effect and does not need to be set.
p : integer, optional (default = 2)
The power parameter p of the Minkowski distance metric. Other distance metrics may not need this parameter. With p=1 the Minkowski distance is equivalent to the Manhattan distance, and with p=2 it is equivalent to the Euclidean distance.
metric : string or callable, default 'minkowski'
The distance metric used to compute nearest neighbors. For example:
● 'euclidean' Euclidean distance
● 'manhattan' Manhattan distance
For details, refer to neighbors.DistanceMetric
metric_params : dict, optional (default = None)
The parameters required for the distance metric used. Parameters are passed as a dictionary.
n_jobs : int, optional (default = 1)
The number of parallel jobs to run for the nearest neighbor search. If -1, the number of parallel jobs is set to the number of CPU cores. Does not affect the fit method.
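To make the effect of weights and p more concrete, here is a minimal sketch on made-up toy data. The training points, labels and query values are assumptions chosen so that the two weighting schemes disagree; they do not come from any real dataset.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy 1-D training set (hypothetical): one class-1 point very close to the
# query and two class-0 points farther away.
X_train = np.array([[0.0], [2.0], [3.0]])
y_train = np.array([1, 0, 0])
query = np.array([[0.1]])

# weights='uniform': each of the 3 neighbors has one vote, so class 0 wins 2 to 1.
uniform_knn = KNeighborsClassifier(n_neighbors=3, weights="uniform").fit(X_train, y_train)
uniform_pred = uniform_knn.predict(query)

# weights='distance': votes are weighted by 1/distance, so the close class-1
# point (1/0.1) outweighs the two class-0 points (1/1.9 + 1/2.9).
distance_knn = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X_train, y_train)
distance_pred = distance_knn.predict(query)

# Minkowski metric: p=1 behaves like Manhattan, p=2 like Euclidean.
X2 = np.array([[0.0, 0.0], [3.0, 4.0]])
y2 = np.array([0, 1])
origin = np.array([[0.0, 0.0]])
d_p1, _ = KNeighborsClassifier(n_neighbors=2, p=1).fit(X2, y2).kneighbors(origin)
d_p2, _ = KNeighborsClassifier(n_neighbors=2, p=2).fit(X2, y2).kneighbors(origin)

print(uniform_pred, distance_pred)
print(d_p1, d_p2)  # distance from the origin to (3, 4): 7 with p=1, 5 with p=2
```

Which weighting is preferable depends on the data; the example only shows that the choice can change the prediction.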
Model method :
fit ( X , y ) fit the data, learn the model
Parameters: X : {array-like, sparse matrix, BallTree, KDTree}
y : {array-like, sparse matrix}
get_params ( deep=True ) get the setting parameters of the model
Parameters: deep : boolean, optional
If True, the parameters of the model and the sub-object model will be returned
kneighbors(X=None, n_neighbors=None, return_distance=True)
Find the k nearest neighbors of a point or set of points. For each query point, returns the distances to its nearest neighbors and the indices of those neighbors in the training data.
Parameters: X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == ‘precomputed’
The point to query. If not provided, it is the set of points learned when the fit method is used. In this case, the query point is not considered to be its own neighbor.
n_neighbors : int
k value (defaults to the k value of the model)
return_distance : boolean, optional. Defaults to True.
If False, no distance will be returned.
Returns: dist : array
The distance between the nearest neighbor and the query point. Returned only if return_distance=True.
ind : array
Indices of the nearest neighbors in the training data.
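The following sketch shows the three ways of calling kneighbors described above: with query points, with return_distance=False, and with no X at all. The data is a made-up one-dimensional example.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training data (hypothetical).
X_train = np.array([[0.0], [1.0], [3.0], [7.0]])
y_train = np.array([0, 0, 1, 1])
knn = KNeighborsClassifier(n_neighbors=2).fit(X_train, y_train)

# Query a new point: distances and indices into X_train, nearest first.
dist, ind = knn.kneighbors(np.array([[2.2]]))

# Indices only.
ind_only = knn.kneighbors(np.array([[2.2]]), return_distance=False)

# No X: the training points are queried against themselves, and each point
# is excluded from its own neighbor list.
self_dist, self_ind = knn.kneighbors(n_neighbors=1)

print(ind, dist)
print(self_ind.ravel())
```

Note that n_neighbors passed to the method overrides the k value of the model for that call only.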
kneighbors_graph(X=None, n_neighbors=None, mode='connectivity')
Computes the graph of k nearest neighbors for the points in X and returns it as a sparse matrix.
Parameters: X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == ‘precomputed’
The point to query. If not provided, it is the set of points learned when the fit method is used. In this case, the query point is not considered to be its own neighbor.
n_neighbors : int
k value (defaults to the k value of the model)
mode : {‘connectivity’, ‘distance’}, optional
Type of the returned matrix.
● 'connectivity' : returns a matrix of 0s and 1s
● 'distance': returns a matrix of distances between neighbors, computed with the model's metric (Euclidean distance under the default settings)
Returns: A : sparse matrix in CSR format, shape = [n_samples, n_samples_fit]
The result is stored in CSR format, so call its toarray method to view the matrix directly. Each row corresponds to one query sample passed to the method, and each column corresponds to one sample of the training data. If mode is 'connectivity', an entry of 1 marks a nearest neighbor; if mode is 'distance', the entry is the distance to that nearest neighbor.
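A small sketch of both modes, using made-up data and k=1 so that every row of the graph has exactly one non-zero entry:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training data (hypothetical).
X_train = np.array([[0.0], [1.0], [3.0], [7.0]])
y_train = np.array([0, 0, 1, 1])
knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

query = np.array([[0.4], [6.0]])

# One row per query point, one column per training sample.
conn = knn.kneighbors_graph(query, mode="connectivity")
dist_graph = knn.kneighbors_graph(query, mode="distance")

# The result is sparse; toarray() gives a dense view for inspection.
print(conn.toarray())
print(dist_graph.toarray())
```

Here the first query point is linked to training sample 0 and the second to training sample 3; in 'distance' mode the same positions hold 0.4 and 1.0 instead of 1.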
predict(X)
Predict the class of unknown data.
Parameters: X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == ‘precomputed’
Test data
Returns: y : array of shape [n_samples] or [n_samples, n_outputs]
Predicted class labels for the test data
predict_proba(X)
Returns the probability that the unknown data belongs to each class
Parameters: X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == ‘precomputed’
Test data
Returns: p : array of shape = [n_samples, n_classes], or a list of n_outputs
Each row corresponds to one sample of the test data, and each column holds the probability of one class. Classes are ordered as in the model's classes_ attribute.
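As an illustration of predict and predict_proba, the sketch below uses made-up data with k=3 and uniform weights, so each class probability is simply the fraction of the three neighbors that belong to that class.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training data (hypothetical): two classes on a line.
X_train = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
y_train = np.array([0, 0, 1, 1, 1])
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

X_test = np.array([[0.5], [10.5]])

pred = knn.predict(X_test)         # majority class among the 3 neighbors
proba = knn.predict_proba(X_test)  # fraction of neighbors in each class

print(knn.classes_)  # column order of proba
print(pred)
print(proba)
```

For the first test point the neighbors are the training points 0.0, 1.0 and 2.0 (classes 0, 0, 1), giving probabilities of 2/3 and 1/3; for the second, all three neighbors belong to class 1.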
score(X, y, sample_weight=None)
Returns the accuracy of the model's predictions on the test data
Parameters: X : array-like, shape = (n_samples, n_features)
Test data
y : array-like, shape = (n_samples) or (n_samples, n_outputs)
True labels for X
sample_weight : array-like, shape = [n_samples], optional
Sample weights
Returns: score : float
Mean accuracy of the predictions on X with respect to y
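The score method can be checked against a manual accuracy computation. The sketch below does this on made-up data; the test labels are chosen so that one of the three predictions is wrong.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training data (hypothetical).
X_train = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
y_train = np.array([0, 0, 1, 1, 1])
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# Test set (hypothetical): the last label disagrees with the model's prediction.
X_test = np.array([[0.5], [10.5], [1.5]])
y_test = np.array([0, 1, 1])

accuracy = knn.score(X_test, y_test)

# The same quantity computed by hand: fraction of correct predictions.
manual_accuracy = np.mean(knn.predict(X_test) == y_test)

print(accuracy, manual_accuracy)
```

Two of the three test points are classified correctly, so both values come out to 2/3.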
set_params(**params)
Sets the parameter values of the model. Parameters are passed as keyword arguments, so a dictionary of parameters can be passed by unpacking it with **.
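A short sketch of get_params and set_params working together; the specific values set here are arbitrary and only serve as an example.

```python
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier()

# get_params returns the current settings as a dictionary.
params = knn.get_params()
default_k = params["n_neighbors"]

# set_params accepts keyword arguments ...
knn.set_params(n_neighbors=3, weights="distance")

# ... so a dictionary can be passed by unpacking it.
knn.set_params(**{"leaf_size": 40})

print(default_k)
print(knn.get_params()["n_neighbors"], knn.weights, knn.leaf_size)
```

Because set_params returns the estimator itself, calls can also be chained, for example `knn.set_params(n_neighbors=3).fit(X, y)`.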
2. k nearest neighbor regression neighbors.KNeighborsRegressor
Regression based on k-nearest neighbors. The predicted value is the average of the target values of the k nearest neighbors (a weighted average if weights is not 'uniform').
The parameters and methods are essentially the same as those of k-nearest neighbor classification. The differences are:
1. There is no predict_proba ( X ) method
2. score ( X , y , sample_weight=None ) returns not the accuracy, but the coefficient of determination R^2 of the prediction:
R^2 = 1 - u/v, where u is the residual sum of squares (the sum of squared differences between the true and predicted values) and v is the total sum of squares (the sum of squared differences between the true values and their mean). The best possible score is 1, and the score can be negative when the model fits worse than always predicting the mean.
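To tie the formula to the API, the sketch below fits a KNeighborsRegressor on made-up data and recomputes the score by hand from u and v. The variable names u and v follow the description above; the data values are assumptions for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy training data (hypothetical): y equals x.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 1.0, 2.0, 3.0])
reg = KNeighborsRegressor(n_neighbors=2).fit(X_train, y_train)

X_test = np.array([[0.4], [2.6]])
y_test = np.array([0.4, 2.6])

# Each prediction is the mean target of the 2 nearest training points.
pred = reg.predict(X_test)

# Coefficient of determination computed by hand.
u = np.sum((y_test - pred) ** 2)           # residual sum of squares
v = np.sum((y_test - y_test.mean()) ** 2)  # total sum of squares
manual_r2 = 1 - u / v

r2 = reg.score(X_test, y_test)

print(pred)
print(r2, manual_r2)
print(hasattr(reg, "predict_proba"))  # the regressor has no predict_proba
```

The predictions are 0.5 and 2.5, so u = 0.02 and v = 2.42, which gives a score of roughly 0.992.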