KNN machine learning classification algorithm (Part 2)

KNN machine learning classification algorithm (Part 1) covered the basic concepts of KNN and a basic Python implementation, but not how to judge whether a model is good or bad. This part uses a training data set and a test data set to evaluate the model, with classification accuracy as the metric.
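As a quick recap of that basic implementation, here is a minimal sketch of the KNN prediction step; the function name knn_predict and the use of Euclidean distance are my own illustrative choices, not the code from Part 1.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Euclidean distance from the query point x to every training sample
    distances = np.sqrt(np.sum((X_train - x) ** 2, axis=1))
    # indices of the k nearest training samples
    nearest = np.argsort(distances)[:k]
    # majority vote among their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]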
(a) Splitting the data set with a custom method (data_shuffer.py)

import numpy as np
from sklearn import datasets
import matplotlib.pyplot as plt

iris = datasets.load_iris()
X = iris.data
y = iris.target
print(X.shape)
# (150, 4)
# --------- Method 1 ---------
# Use the concatenate function to join features and labels; the arrays passed in
# must have matching shapes, so the label vector is reshaped into a column first.
# axis=1 joins them column-wise, i.e. y becomes an extra column next to X.
tempConcat = np.concatenate((X, y.reshape(-1,1)), axis=1)
# After joining, shuffle the rows in place
np.random.shuffle(tempConcat)
# Then use split to separate the shuffled array back into features (first 4 columns) and labels
shuffle_X,shuffle_y = np.split(tempConcat, [4], axis=1)
# Set the proportion of the data reserved for the test set
test_ratio = 0.2
test_size = int(len(X) * test_ratio)
X_train = shuffle_X[test_size:]
y_train = shuffle_y[test_size:]
X_test = shuffle_X[:test_size]
y_test = shuffle_y[:test_size]
# --------- Method 2 ---------
# permutation(len(X)) returns a new array of the numbers 0..len(X)-1 in shuffled order.
# Note that its elements are not the data themselves but shuffled indices.
shuffle_index = np.random.permutation(len(X))
# Specify the proportion of the test data
test_ratio = 0.2
test_size = int(len(X) * test_ratio)
test_index = shuffle_index[:test_size]
train_index = shuffle_index[test_size:]
X_train = X[train_index]
X_test = X[test_index]
y_train = y[train_index]
y_test = y[test_index]
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
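Either method can be wrapped into a reusable helper. Below is a minimal sketch based on method 2; the function name and the seed parameter are my own additions, mirroring the interface sklearn provides.

import numpy as np

def train_test_split(X, y, test_ratio=0.2, seed=None):
    # optionally fix the random seed so the split is reproducible
    if seed is not None:
        np.random.seed(seed)
    # shuffled indices, as in method 2 above
    shuffle_index = np.random.permutation(len(X))
    test_size = int(len(X) * test_ratio)
    test_index = shuffle_index[:test_size]
    train_index = shuffle_index[test_size:]
    return X[train_index], X[test_index], y[train_index], y[test_index]

# usage:
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_ratio=0.2, seed=666)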

(b) The classification accuracy metric, using sklearn

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits()
X = digits.data
y = digits.target
# reserve 20% of the data as a test set; random_state fixes the split for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=666)
# fit a KNN classifier with k=3 and report its accuracy on the test set
knn_clf = KNeighborsClassifier(n_neighbors=3)
knn_clf.fit(X_train, y_train)
y_predict = knn_clf.predict(X_test)
print(accuracy_score(y_test, y_predict))
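For intuition, accuracy_score is simply the fraction of test samples whose predicted label matches the true label. A minimal equivalent (the helper name accuracy is my own):

import numpy as np

def accuracy(y_true, y_predict):
    # fraction of predictions that match the true labels
    return np.sum(y_true == y_predict) / len(y_true)

# accuracy(y_test, y_predict) should give the same value as accuracy_score above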

(c) Searching for a good value of k

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn import datasets
digits = datasets.load_digits()
X = digits.data
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=666)

# Track the best score, initialised to 0.0, and the best k, initialised to -1
best_score = 0.0
best_k = -1
for k in range(1, 11):  # tentatively search k in the range 1 to 10
    knn_clf = KNeighborsClassifier(n_neighbors=k)
    knn_clf.fit(X_train, y_train)
    score = knn_clf.score(X_test, y_test)
    if score > best_score:
        best_k = k
        best_score = score
print("best_k = ", best_k)
print("best_score = ", best_score)

The relevant code has been uploaded to GitHub, for learning and reference only.


Source: blog.csdn.net/lhxsir/article/details/103113093