1. Learning Objectives

2. Key Concepts

3. Extended Exercises

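Learning Objectives

Understand the theory behind the K-nearest neighbors classifier
Master the use of the K-nearest neighbors classifier
Understand the pitfalls of the K-nearest neighbors classifier and how to address them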

Key Concepts

Theory of the K-Nearest Neighbors Classifier

Reference: A Complete Guide to K-Nearest Neighbors with Applications in Python and R, Kevin Zakka's Blog

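The K-Nearest Neighbors Algorithm

Assumption: two points that are close to each other in the predictor space should have similar responses.

Steps:
1. Compute the distance from every training point to the target point x, and sort the training points by that distance.
2. Take the k nearest points; call this set A.
3. For each class j (the set of all classes / labels is denoted C), compute P(y = j | X = x) = (1/k) · Σ_{i ∈ A} I(y^(i) = j), where I(·) is the indicator function; in other words, the fraction of the k neighbors whose label is j.
4. Predict the class j with the largest P(y = j | X = x).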
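The four steps above can be sketched directly in NumPy. This is a minimal illustration, assuming Euclidean distance and a tiny made-up dataset; the helper name `knn_predict_one` is hypothetical and not part of the original material.

```python
import numpy as np


def knn_predict_one(X, y, x, k):
    """Hypothetical helper: classify a single point x with plain KNN.

    Returns (predicted_label, {class j: estimated P(y=j | X=x)}).
    """
    # Step 1: Euclidean distance from every training point to x, then sort
    distances = np.sqrt(((X - x) ** 2).sum(axis=1))
    order = np.argsort(distances, kind="stable")
    # Step 2: indices of the k nearest points (the set A)
    A = order[:k]
    # Step 3: P(y=j | X=x) = fraction of the k neighbors whose label is j,
    # computed for every class j in C
    C = np.unique(y)
    probs = {int(j): float(np.mean(y[A] == j)) for j in C}
    # Step 4: predict the class with the largest estimated probability
    pred = max(probs, key=probs.get)
    return pred, probs


# Made-up toy data: three class-0 points near the origin, two class-1 points far away
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6]], dtype=float)
y = np.array([0, 0, 0, 1, 1])

pred, probs = knn_predict_one(X, y, np.array([0.2, 0.2]), k=4)
print(pred, probs)  # 0 {0: 0.75, 1: 0.25}
```

With k = 4 the neighbor set A contains the three class-0 points plus the nearer class-1 point, so the estimated probabilities are 3/4 and 1/4.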
Using the K-Nearest Neighbors Classifier

sklearn.neighbors.KNeighborsClassifier
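A minimal usage sketch of `KNeighborsClassifier`, assuming made-up toy data (the arrays below are illustrative, not from the original). `fit` stores the training set, `predict` returns the majority-vote label, and `predict_proba` returns, for each class, the fraction of the k nearest neighbors with that label.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up toy data: class 0 near the origin, class 1 far away
X_toy = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6]], dtype=float)
y_toy = np.array([0, 0, 0, 1, 1])

# k = 3 neighbors; the default metric (Minkowski with p=2) is Euclidean distance
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_toy, y_toy)

query = np.array([[0.2, 0.2], [5.0, 5.5]])
print(clf.predict(query))        # [0 1]
print(clf.predict_proba(query))  # per-class fraction of the 3 nearest neighbors
```

For the second query, two of the three nearest neighbors belong to class 1, so its predicted probabilities are 1/3 and 2/3.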

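Pitfalls of the K-Nearest Neighbors Classifier and How to Address Them
1. Scale: features with larger numeric ranges dominate the distance computation.
  
  Solution: normalize the features, or change the distance metric
2. Skewed Class Distribution: the most frequent class tends to dominate the k nearest neighbors.
  
  Solution: Voted KNN
3. High Dimension: in high-dimensional spaces, distances between points become less informative (the curse of dimensionality).
  
  Solution: dimension reduction, feature selection
4. Very Long Testing Time: a brute-force prediction computes the distance from the query to every training point.
  
  Solution: K-D Tree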
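Several of these remedies correspond to scikit-learn options. The sketch below is one possible combination, not the original author's recipe: `StandardScaler` for scale, `PCA` as one form of dimension reduction, `weights="distance"` as a distance-weighted vote (whether this matches what the outline calls "Voted KNN" is an assumption), and `algorithm="kd_tree"` to answer queries with a K-D tree. The choices of 2 PCA components, k = 5, and a 70/30 split are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = make_pipeline(
    StandardScaler(),          # 1. Scale: zero mean and unit variance per feature
    PCA(n_components=2),       # 3. High dimension: project onto 2 principal components
    KNeighborsClassifier(
        n_neighbors=5,
        weights="distance",    # 2. Closer neighbors get larger votes (weight 1 / distance)
        algorithm="kd_tree",   # 4. Testing time: query a K-D tree instead of brute force
    ),
)
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
```

Putting the scaler and PCA inside the pipeline means their statistics are learned from the training split only, so the test split does not leak into preprocessing.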

Extended Exercises

In [1]:

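import numpy as np
from sklearn import datasets

# Load Iris and keep only the first two features (sepal length, sepal width)
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target
# Keep only the last two classes (labels 1 and 2), dropping class 0
X = X[y != 0, :2]
y = y[y != 0]
n_sample = len(X)
# Shuffle the samples reproducibly
np.random.seed(0)
order = np.random.permutation(n_sample)
X = X[order]
y = y[order]
# First 90% of the shuffled samples for training, remaining 10% for testing
X_train = X[:int(.9 * n_sample)]
y_train = y[:int(.9 * n_sample)]
X_test = X[int(.9 * n_sample):]
y_test = y[int(.9 * n_sample):]

# Values of k compared in the exercises below
list_n_neighbors = [1, 5, 10, 15, 20]
print('y_test {}'.format(y_test))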

y_test [2 1 1 2 1 2 2 2 1 1]

Exercise 1

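Load the first two features of the Iris dataset, keeping only the samples from the last two classes. Train K-nearest neighbors classifiers with n_neighbors set to 1, 5, 10, 15, and 20 on 90% of the data, use each one to predict the remaining 10%, and print the predictions.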

In [2]:

# your code here

1 [1 2 1 2 1 1 2 2 2 2]
5 [2 2 1 2 1 2 2 1 2 1]
10 [2 2 1 2 1 2 2 1 1 1]
15 [2 2 2 2 1 2 2 1 2 1]
20 [2 2 2 2 1 1 2 1 2 1]
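One possible solution sketch for Exercise 1, using `KNeighborsClassifier` with its default settings. It repeats the data preparation from `In [1]` so that it runs on its own. Whether every printed vector matches the listing above exactly can depend on how ties between equally distant neighbors are resolved.

```python
import numpy as np
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

# Same preparation as In [1]: first two features, classes 1 and 2, 90/10 split
iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target
X, y = X[y != 0], y[y != 0]
n_sample = len(X)
np.random.seed(0)
order = np.random.permutation(n_sample)
X, y = X[order], y[order]
n_train = int(.9 * n_sample)
X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:], y[n_train:]

list_n_neighbors = [1, 5, 10, 15, 20]
predictions = {}  # n_neighbors -> predicted labels for X_test
for n_neighbors in list_n_neighbors:
    clf = KNeighborsClassifier(n_neighbors=n_neighbors)
    clf.fit(X_train, y_train)
    predictions[n_neighbors] = clf.predict(X_test)
    print('{} {}'.format(n_neighbors, predictions[n_neighbors]))
```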

Exercise 2

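For each of the 5 K-nearest neighbors classifiers from the previous exercise, compute and print the accuracy and the confusion matrix of its predictions against y_test.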

In [3]:

# your code here

n_neighbors: 1 accuracy: 0.5
[[2 3]
 [2 3]]
n_neighbors: 5 accuracy: 0.7
[[3 2]
 [1 4]]
n_neighbors: 10 accuracy: 0.8
[[4 1]
 [1 4]]
n_neighbors: 15 accuracy: 0.6
[[2 3]
 [1 4]]
n_neighbors: 20 accuracy: 0.5
[[2 3]
 [2 3]]
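A sketch for Exercise 2 using `sklearn.metrics.accuracy_score` and `sklearn.metrics.confusion_matrix`. To stay self-contained, it hard-codes `y_test` and the five prediction vectors printed under Exercise 1 instead of recomputing them. In each confusion matrix, rows are true labels and columns are predicted labels, both in sorted order (1, 2).

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# y_test and the Exercise 1 predictions, copied from the printed outputs above
y_test = np.array([2, 1, 1, 2, 1, 2, 2, 2, 1, 1])
predictions = {
    1:  np.array([1, 2, 1, 2, 1, 1, 2, 2, 2, 2]),
    5:  np.array([2, 2, 1, 2, 1, 2, 2, 1, 2, 1]),
    10: np.array([2, 2, 1, 2, 1, 2, 2, 1, 1, 1]),
    15: np.array([2, 2, 2, 2, 1, 2, 2, 1, 2, 1]),
    20: np.array([2, 2, 2, 2, 1, 1, 2, 1, 2, 1]),
}

for n_neighbors, y_pred in predictions.items():
    acc = accuracy_score(y_test, y_pred)   # fraction of correct predictions
    cm = confusion_matrix(y_test, y_pred)  # rows: true label, columns: predicted label
    print('n_neighbors: {} accuracy: {}'.format(n_neighbors, acc))
    print(cm)
```

With these inputs, the accuracies and matrices agree with the listing above; for example, n_neighbors = 10 gets 8 of 10 test points right.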

Exercise 3

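Write a KNN function with the following requirements:

Input: X, y, X_test, n_neighbors
Output: y_test (the predicted labels for X_test)
Use Euclidean distance as the distance function. When distances are tied, prefer the data point that appears earlier (smaller row index).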

In [4]:

def KNN(X, y, X_test, n_neighbors=1):
    # your code here
    return None


for n_neighbors in list_n_neighbors:
    y_pred = KNN(X_train, y_train, X_test, n_neighbors)
    print('{} {}'.format(n_neighbors, y_pred))

1 [1 2 1 2 1 1 2 2 2 2]
5 [2 2 1 2 1 2 2 1 1 1]
10 [2 2 1 2 1 2 2 1 1 1]
15 [2 2 2 2 1 2 2 1 2 1]
20 [2 2 2 2 1 1 2 1 2 1]
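A sketch of one way to fill in the `KNN` stub. A stable sort (`np.argsort(..., kind='stable')`) keeps equally distant rows in their original order, which implements the "earlier row wins" rule. The exercise does not say how to break a tie in the vote itself (possible when k is even); this sketch assumes the smallest label wins, which is an assumption and not part of the exercise.

```python
import numpy as np


def KNN(X, y, X_test, n_neighbors=1):
    """Predict a label for each row of X_test by majority vote of its
    n_neighbors nearest training rows under Euclidean distance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_test = np.asarray(X_test, dtype=float)
    y_pred = np.empty(len(X_test), dtype=y.dtype)
    for t, x in enumerate(X_test):
        # Euclidean distance from x to every training row
        distances = np.sqrt(((X - x) ** 2).sum(axis=1))
        # Stable sort: rows with equal distance keep their original order,
        # so the earlier (smaller-index) row is chosen first
        nearest = np.argsort(distances, kind='stable')[:n_neighbors]
        # Majority vote: np.unique returns sorted labels and np.argmax picks
        # the first maximum, so a tied vote goes to the smallest label (assumption)
        labels, counts = np.unique(y[nearest], return_counts=True)
        y_pred[t] = labels[np.argmax(counts)]
    return y_pred


# Tiny made-up example: both training rows are at distance 1 from the origin,
# so with k=1 the earlier row (label 7) wins the distance tie
print(KNN([[1, 0], [-1, 0]], [7, 3], [[0, 0]], n_neighbors=1))  # [7]
```

Replacing the stub with this definition and rerunning the loop in `In [4]` prints one prediction vector per value of `n_neighbors`.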

Lesson 8 Outline, 科小神 Growth Plan
