KNN Machine Learning Classification Algorithm (Part 1)

K-Nearest Neighbor (KNN)

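The k-nearest-neighbor algorithm is one of the simplest machine learning algorithms. It works as follows:
(1) Compute the distance from the object to be classified to every object in the training set.
(2) Sort the training objects by that distance.
(3) Select the k training objects closest to the test object as its neighbors.
(4) Count how often each category occurs among those k neighbors.
(5) The most frequent category among the k neighbors is the predicted category of the test object.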
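
As a rough illustration, the five steps above can be sketched in a few lines of NumPy. The function name knn_classify and the toy data X_toy / y_toy are made up for this example and are not part of the original code; Euclidean distance is assumed as the distance measure.

```python
import numpy as np
from collections import Counter


def knn_classify(x, X_train, y_train, k):
    """Hypothetical helper: classify one sample x by majority vote of its k nearest neighbors."""
    # (1) Euclidean distance from x to every training sample (assumed metric)
    distances = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
    # (2) Indices of the training samples, sorted from nearest to farthest
    order = np.argsort(distances)
    # (3) Labels of the k nearest training samples
    neighbor_labels = y_train[order[:k]]
    # (4) Count how often each label occurs among the k neighbors
    votes = Counter(neighbor_labels.tolist())
    # (5) The most frequent label is the prediction
    return votes.most_common(1)[0][0], votes


# Toy data invented for this sketch: two well-separated groups
X_toy = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
                  [6.0, 6.0], [7.0, 6.0], [6.0, 7.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

label, votes = knn_classify(np.array([5.0, 5.0]), X_toy, y_toy, k=3)
print(label, dict(votes))  # 1 {1: 3}
```

The query point (5, 5) lies next to the second group, so all three of its nearest neighbors carry label 1.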

Python implementation

1. Implementing the KNN algorithm yourself

import numpy as np
from math import sqrt
from collections import Counter

# Define the classifier
class kNNClassifier:

    def __init__(self, k):
        """Initialize the classifier."""
        assert k >= 1, "k must be valid"
        self.k = k
        self._X_train = None
        self._y_train = None

    def fit(self, X_train, y_train):
        """Fit the kNN classifier on the training data X_train and y_train."""
        assert X_train.shape[0] == y_train.shape[0], \
            "the size of X_train must be equal to the size of y_train"
        assert self.k <= X_train.shape[0], \
            "the size of X_train must be at least k"
        self._X_train = X_train
        self._y_train = y_train
        return self

    def predict(self, X_predict):
        """Given the samples X_predict, return a vector of predicted labels."""
        assert self._X_train is not None and self._y_train is not None, \
            "must fit before predict!"
        assert X_predict.shape[1] == self._X_train.shape[1], \
            "the feature number of X_predict must be equal to X_train"
        y_predict = [self._predict(x) for x in X_predict]
        return np.array(y_predict)

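    def _predict(self, x):
        """Predict the label of a single sample x."""
        # Euclidean distance from x to every training sample
        distances = [sqrt(np.sum((x_train - x) ** 2)) for x_train in self._X_train]
        # Indices of the training samples, nearest first
        nearest = np.argsort(distances)
        # Keep only the labels of the k nearest neighbors
        topK_y = [self._y_train[i] for i in nearest[:self.k]]
        # Majority vote among those k labels
        votes = Counter(topK_y)
        return votes.most_common(1)[0][0]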

    def score(self, X_test, y_test):
        """Predict X_test and compute the accuracy against the true labels y_test."""
        y_predict = self.predict(X_test)
        return self.accuracy_score(y_test, y_predict)

    @staticmethod
    def accuracy_score(y_true, y_predict):
        """Compute the accuracy between y_true and y_predict."""
        assert y_true.shape[0] == y_predict.shape[0], \
            "the size of y_true must be equal to the size of y_predict"
        return np.sum(y_true == y_predict) / len(y_true)

    def __repr__(self):
        return "kNN(k=%d)" % self.k

raw_data_X = [[3.393533211, 2.331273381],
              [3.110073483, 1.781539638],
              [1.343853454, 3.368312451],
              [3.582294121, 4.679917921],
              [2.280362211, 2.866990212],
              [7.423436752, 4.685324231],
              [5.745231231, 3.532131321],
              [9.172112222, 2.511113104],
              [7.927841231, 3.421455345],
              [7.939831414, 0.791631213]
             ]
raw_data_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
# Build the training set
trainX = np.array(raw_data_X)
trainY = np.array(raw_data_y)
# Sample to predict
x1 = np.array([8.093607318,3.365731514])

knn_clf = kNNClassifier(k=6)
knn_clf.fit(trainX, trainY)
predict_X = x1.reshape(1,-1)
predict_Y = knn_clf.predict(predict_X)
print(predict_Y)

2. Calling the sklearn library

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

raw_data_X = [[3.393533211, 2.331273381],
              [3.110073483, 1.781539638],
              [1.343853454, 3.368312451],
              [3.582294121, 4.679917921],
              [2.280362211, 2.866990212],
              [7.423436752, 4.685324231],
              [5.745231231, 3.532131321],
              [9.172112222, 2.511113104],
              [7.927841231, 3.421455345],
              [7.939831414, 0.791631213]
             ]
raw_data_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
# Build the training set
X_train = np.array(raw_data_X)
y_train = np.array(raw_data_y)

x = np.array([8.093607318, 3.365731514])

# Create a KNeighborsClassifier instance
kNN_classifier = KNeighborsClassifier(n_neighbors=6)
# fit() trains in place: the model is stored in kNN_classifier (fit also returns that same instance)
kNN_classifier.fit(X_train, y_train)
# predict() expects a 2D array of shape (n_samples, n_features), not a 1D array
y_predict = kNN_classifier.predict(x.reshape(1,-1))
print(y_predict)
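
The original does not show where random_state = 20 was used. A common place for that argument is sklearn's train_test_split, so the sketch below assumes that usage: it splits the same ten points into a training and a test part and scores the classifier on the held-out part. The values test_size=3 and n_neighbors=3 are arbitrary choices for this example, not settings from the original.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[3.393533211, 2.331273381],
              [3.110073483, 1.781539638],
              [1.343853454, 3.368312451],
              [3.582294121, 4.679917921],
              [2.280362211, 2.866990212],
              [7.423436752, 4.685324231],
              [5.745231231, 3.532131321],
              [9.172112222, 2.511113104],
              [7.927841231, 3.421455345],
              [7.939831414, 0.791631213]])
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# Assumption: random_state is passed to train_test_split.
# The same seed gives the same shuffle, so both calls produce an identical split.
X_tr_a, X_te_a, y_tr_a, y_te_a = train_test_split(X, y, test_size=3, random_state=20)
X_tr_b, X_te_b, y_tr_b, y_te_b = train_test_split(X, y, test_size=3, random_state=20)
same_split = np.array_equal(X_te_a, X_te_b) and np.array_equal(y_te_a, y_te_b)
print(same_split)

# Train on 7 samples, then measure accuracy on the 3 held-out samples
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_tr_a, y_tr_a)
acc = clf.score(X_te_a, y_te_a)
print(acc)
```

Fixing the seed makes the split reproducible, so accuracy numbers from different runs can be compared.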

The related code has been uploaded to GitHub and is intended for learning and reference only.

Questions encountered:
1. Euclidean distance
2. The effect of random_state = 20
3. The meaning of reshape(1, -1)
4. ** 2 means raising to the power of 2 (squaring)
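
Questions 1, 3 and 4 can be checked directly in NumPy. The vectors a and b below are made-up values chosen so the arithmetic is easy to follow; the snippet is only a small sketch of what the expressions in the code above compute.

```python
import numpy as np
from math import sqrt

# Two hypothetical 2-D points
a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

# ** 2 squares every element of the array
diff = b - a           # [3., 4.]
squared = diff ** 2    # [9., 16.]

# Euclidean distance: square root of the sum of squared differences
dist = sqrt(np.sum(squared))
print(dist)  # 5.0

# The same distance via NumPy's built-in vector norm
dist_np = np.linalg.norm(b - a)

# reshape(1, -1): one row, and -1 lets NumPy infer the number of columns,
# turning a 1D array of shape (2,) into a 2D array of shape (1, 2)
row = a.reshape(1, -1)
print(a.shape, row.shape)  # (2,) (1, 2)
```

The reshape is needed because predict works on a batch of samples, one per row, so a single sample has to be passed as a one-row matrix.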

Reposted from: blog.csdn.net/lhxsir/article/details/102999579