[Machine Learning] Radius Neighbors Classifier (rNN, radius nearest neighbors)


1. Introduction to the Radius Neighbors Classifier

The Radius Neighbors Classifier is a classification machine learning algorithm.

It is an extension of the k-nearest neighbors (kNN) algorithm. To predict a new example, it uses all training examples within a given radius of that example, rather than its k closest neighbors.

As a result, the radius-based approach to selecting neighbors is more appropriate for sparse data. It prevents examples that are far away in the feature space from contributing to a prediction.


2. The Radius Neighbors Classifier Algorithm

Radius Neighbors is a classification algorithm based on the k-nearest neighbors algorithm, or kNN. kNN "trains" by simply storing the entire training dataset. At prediction time, it locates the k closest training examples for each new example. It then assigns the new example the mode (most common) class label among those k neighbors.
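
To make the kNN baseline concrete, here is a minimal from-scratch sketch in Python with NumPy. It assumes Euclidean distance and a plain majority vote. The function name `knn_predict` and the toy data are hypothetical, not taken from the original article.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Hypothetical helper: majority vote among the k nearest training examples."""
    # "Training" is just keeping X_train and y_train; all work happens at prediction time.
    distances = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distance to every stored example
    nearest = np.argsort(distances)[:k]                   # indices of the k closest examples
    votes = Counter(y_train[nearest].tolist())            # count class labels among them
    return votes.most_common(1)[0][0]                     # mode of the k labels

# Hypothetical toy data with two well-separated classes.
X_train = np.array([[0.1, 0.1], [0.2, 0.1], [0.15, 0.2], [0.9, 0.9], [0.8, 0.95]])
y_train = np.array([0, 0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.2]), k=3))  # → 0
```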

The Radius Neighbors Classifier is similar to kNN in that training involves storing the entire training dataset. What differs is how the training dataset is used at prediction time.

Instead of locating the k nearest neighbors, the Radius Neighbors Classifier locates all training examples that lie within a given radius of the new example. These radius neighbors are then used to make the prediction for the new example.
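
The same idea with a radius instead of a fixed k can be sketched as follows. `radius_neighbors_predict` is a hypothetical helper that assumes Euclidean distance and an unweighted majority vote. Returning `None` when no training example lies inside the radius is an assumption of this sketch, not a rule of the algorithm.

```python
import numpy as np
from collections import Counter

def radius_neighbors_predict(X_train, y_train, x_new, radius=0.3):
    """Hypothetical helper: majority vote among ALL training examples within `radius`.

    Returns None when the radius contains no training example (a choice made
    for this sketch; real implementations need some fallback).
    """
    distances = np.linalg.norm(X_train - x_new, axis=1)
    inside = distances <= radius          # every example in the radius, not a fixed count
    if not inside.any():
        return None
    votes = Counter(y_train[inside].tolist())
    return votes.most_common(1)[0][0]

# Hypothetical toy data with two well-separated classes.
X_train = np.array([[0.1, 0.1], [0.2, 0.1], [0.15, 0.2], [0.9, 0.9], [0.8, 0.95]])
y_train = np.array([0, 0, 0, 1, 1])
print(radius_neighbors_predict(X_train, y_train, np.array([0.2, 0.2])))   # 3 neighbors vote → 0
print(radius_neighbors_predict(X_train, y_train, np.array([0.85, 0.9])))  # 2 neighbors vote → 1
print(radius_neighbors_predict(X_train, y_train, np.array([0.5, 0.5])))   # no neighbors → None
```

Unlike kNN, the number of voting neighbors changes from query to query. It can also be zero, which is why an implementation must decide how to handle such points.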

The radius is defined in the feature space. This generally assumes that the input variables are numeric and scaled to the range 0-1 (e.g. normalized), so that a single radius value is equally meaningful along every feature.
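
Normalization to the 0-1 range can be sketched as min-max scaling fitted on the training data only. The helper `min_max_scale` and the data below are hypothetical. scikit-learn's `MinMaxScaler` performs the same per-feature transformation.

```python
import numpy as np

def min_max_scale(X_train, X_new):
    """Hypothetical helper: rescale each feature to [0, 1] using training min/max only."""
    X_train = np.asarray(X_train, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    lo = X_train.min(axis=0)
    span = X_train.max(axis=0) - lo
    span[span == 0] = 1.0  # constant features: avoid dividing by zero
    return (X_train - lo) / span, (X_new - lo) / span

# Hypothetical data: feature 2 has a much larger range than feature 1, so unscaled
# distances (and therefore any fixed radius) would be dominated by feature 2.
X_train = [[0, 100], [10, 300], [5, 200]]
X_new = [[20, 100]]
X_train_scaled, X_new_scaled = min_max_scale(X_train, X_new)
print(X_train_scaled)  # rows [0, 0], [1, 1], [0.5, 0.5]
print(X_new_scaled)    # row [2, 0]: new data can fall outside [0, 1]
```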

The radius-based approach to locating neighbors suits datasets where the contribution of neighbors should be proportional to the density of examples in the feature space.

Given a fixed radius, dense regions of the feature space contribute more information and sparse regions contribute less. The sparse case is where the method pays off: examples very far from the new example in feature space are kept from contributing to the prediction.
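
A small simulation illustrates how a fixed radius collects many neighbors in a dense region and few in a sparse one. The data, the radius of 0.1, and the helper `count_within` are hypothetical choices for illustration.

```python
import numpy as np

def count_within(X, x_new, radius):
    """Hypothetical helper: number of rows of X within `radius` (Euclidean) of x_new."""
    return int((np.linalg.norm(X - x_new, axis=1) <= radius).sum())

rng = np.random.default_rng(0)
# Hypothetical 2-D data: a dense cluster around (0.25, 0.25) and
# a sparse uniform scatter over [0.5, 1.0] x [0.5, 1.0].
dense = rng.normal(loc=0.25, scale=0.03, size=(200, 2))
sparse = rng.uniform(low=0.5, high=1.0, size=(20, 2))
X = np.vstack([dense, sparse])

radius = 0.1
n_dense = count_within(X, np.array([0.25, 0.25]), radius)   # nearly all 200 cluster points
n_sparse = count_within(X, np.array([0.75, 0.75]), radius)  # only a handful, possibly zero
print(n_dense, n_sparse)
```

With kNN, both queries would use exactly k neighbors. The query in the sparse region would therefore have to reach far outside its own neighborhood to find them.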

As such, the Radius Neighbors Classifier may be more appropriate for prediction problems where the feature space contains sparse regions.

Because the radius is fixed across all dimensions of the feature space, the method becomes less effective as the number of input features increases. Examples spread further and further apart as dimensions are added. This property is known as the curse of dimensionality.
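
The effect of a fixed radius in higher dimensions can be checked numerically. The sketch below draws points uniformly in the unit hypercube and measures the fraction that fall within a radius of 0.5 of its center. Uniform data, the radius, and the helper name `fraction_within_radius` are assumptions made for illustration.

```python
import numpy as np

def fraction_within_radius(d, radius=0.5, n=10_000, seed=42):
    """Hypothetical helper: share of uniform points in [0, 1]^d within `radius` of the center."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n, d))
    center = np.full(d, 0.5)
    return float((np.linalg.norm(X - center, axis=1) <= radius).mean())

for d in (1, 2, 5, 10, 20):
    print(f"d={d:2d}: fraction within radius = {fraction_within_radius(d):.4f}")
```

In one dimension every point is inside the radius, but the fraction shrinks rapidly as dimensions are added. A radius that works well for a few features may therefore find almost no neighbors when there are many.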


3. Radius Neighbors Classifier With Scikit-Learn

The Radius Neighbors Classifier is available in the scikit-learn Python machine learning library via the RadiusNeighborsClassifier class in the sklearn.neighbors module.
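
A minimal usage sketch follows. It assumes scikit-learn 0.22 or later, which added the `outlier_label="most_frequent"` option. The synthetic dataset and the radius value are illustrative assumptions, not tuned settings. By default, `RadiusNeighborsClassifier` raises an error when a test point has no training example within the radius. Setting `outlier_label="most_frequent"` assigns such points the most frequent class in the training labels instead.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import RadiusNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic binary classification problem (parameters chosen only for illustration).
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=2, n_redundant=0, random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

model = make_pipeline(
    MinMaxScaler(),  # scale every feature to [0, 1] using the training data
    RadiusNeighborsClassifier(
        radius=0.5,                     # hypothetical value; tune it for real data
        weights="uniform",              # every neighbor inside the radius gets one vote
        outlier_label="most_frequent",  # label for test points with no neighbors in the radius
    ),
)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

The radius is the key hyperparameter, so in practice it is usually tuned. One option is `GridSearchCV` over the pipeline parameter `radiusneighborsclassifier__radius`.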


References

  1. Radius Neighbors Classifier Algorithm With Python


Reposted from blog.csdn.net/weixin_44211968/article/details/128446738