Outline
KNN classifies a sample by measuring the distance between its feature values and those of labeled samples. The name k-nearest neighbors means that each sample can be represented by its k closest neighbors: a new sample is assigned the majority class among them. KNN cannot predict with 100% accuracy; adding informative feature dimensions can improve prediction performance.
k-nearest neighbor algorithm principle
Advantages: high accuracy, insensitive to outliers, no assumptions about the input data
Disadvantages: high time complexity (O(n) distance computations per query) and high space complexity, since all training samples must be kept in memory
Applicable data types: numeric and nominal
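The principle can be sketched in a few lines of NumPy. This is only an illustrative sketch, not how scikit-learn implements it: the function name `knn_predict`, the choice of Euclidean distance, and the toy data are all assumptions made for the example.

```python
from collections import Counter

import numpy as np


def knn_predict(x_train, y_train, query, k=3):
    """Classify one query point by majority vote among its k nearest samples.

    Hypothetical helper written for illustration only.
    """
    x_train = np.asarray(x_train, dtype=float)
    query = np.asarray(query, dtype=float)
    # Euclidean distance from the query to every training sample: O(n) work
    distances = np.sqrt(((x_train - query) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote over the labels of those neighbors
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data, made up for this example
points = [[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]]
labels = ["A", "A", "B", "B"]
result = knn_predict(points, labels, [0.1, 0.2], k=3)
print(result)  # B: two of the three nearest samples are labeled "B"
```

Every query scans all training samples, which is where the O(n) cost per prediction comes from.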
KNN simple application
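from sklearn.neighbors import KNeighborsClassifier  # KNN classifier

# Each sample has three numeric features; labels: '男' = male, '女' = female
x_train = [[182, 78, 45], [162, 48, 40], [187, 50, 41],
           [178, 65, 44], [158, 50, 38], [185, 67, 44]]
y_train = ['男', '女', '男', '男', '女', '男']
knn = KNeighborsClassifier(n_neighbors=3)  # KNN object
knn.fit(x_train, y_train)  # fit the model on the training data
Test_data = [[182, 75, 44], [159, 49, 40]]
print(knn.predict(Test_data))  # predict the class of each test sample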
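To see why each test sample gets its label, the fitted classifier's `kneighbors` method returns the distances and training-set indices of the nearest neighbors. The sketch below repeats the training data so that it runs on its own; the variable names `dist`, `idx` and `pred` are illustrative.

```python
from sklearn.neighbors import KNeighborsClassifier

# Same data as in the simple application ('男' = male, '女' = female)
x_train = [[182, 78, 45], [162, 48, 40], [187, 50, 41],
           [178, 65, 44], [158, 50, 38], [185, 67, 44]]
y_train = ['男', '女', '男', '男', '女', '男']
test_data = [[182, 75, 44], [159, 49, 40]]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(x_train, y_train)

# Distances to, and indices of, the 3 nearest training samples per test sample
dist, idx = knn.kneighbors(test_data)
pred = knn.predict(test_data)

for sample, neighbors, label in zip(test_data, idx, pred):
    neighbor_labels = [y_train[i] for i in neighbors]
    print(sample, "neighbors:", neighbor_labels, "->", label)
```

For the second test sample, two of the three neighbors are labeled '女', so the majority vote yields '女' even though one neighbor is '男'.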
KNN classification (sklearn) and regression
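Regression fits a curve to the data points; the best fit is the one with the smallest residuals. KNN regression predicts a value by averaging the targets of the k nearest neighbors.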
import numpy as np
from sklearn.neighbors import KNeighborsClassifier  # KNN classifier
from sklearn.neighbors import KNeighborsRegressor
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import datasets

iris = datasets.load_iris()
x = iris.data[:, :2]  # keep only the first two features so the result can be plotted
y = iris.target
print(x)
print(y)
# K=15: vote among the 15 nearest points
K = 15
# step size of the grid along the x and y axes
h = 0.02
cmap_light = ListedColormap(["#FF0FFF", "#0FFFFF", "#00F0FF"])
cmap_bold = ListedColormap(["#FF0000", "#00FF00", "#0000FF"])
myknn = KNeighborsClassifier(n_neighbors=K)
myknn.fit(x, y)
# plotting range: pad the data by 1 on every side
xmin, xmax = x[:, 0].min() - 1, x[:, 0].max() + 1
ymin, ymax = x[:, 1].min() - 1, x[:, 1].max() + 1
# build the grid
xx, yy = np.meshgrid(np.arange(xmin, xmax, h),
                     np.arange(ymin, ymax, h))
# predict the class of every grid point
z = myknn.predict(np.c_[xx.ravel(), yy.ravel()])
z = z.reshape(xx.shape)
# background color shows the predicted class regions
plt.pcolormesh(xx, yy, z, cmap=cmap_light)
# scatter plot of the training samples
plt.scatter(x[:, 0], x[:, 1], c=y, cmap=cmap_bold)
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.title("classification")
# plt.show()
plt.savefig("show.png")
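# KNN regression
plt.figure()  # start a new figure so the curve is not drawn over the previous plot
np.random.seed(0)
x = np.sort(5 * np.random.rand(40, 1), axis=0)
y = np.sin(x).ravel()
y[::5] += 1 * (0.5 - np.random.rand(8))  # add noise to every fifth target
T = np.linspace(0, 5, 100)[:, np.newaxis]
# print(T)
knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(x, y)
newy = knn.predict(T)
plt.scatter(x, y, c="k", label="data")
plt.plot(T, newy, c="b", label="predict")
plt.legend()
plt.savefig("show_regression.png")  # separate file so the earlier image is not overwritten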
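With the default uniform weights, the value predicted by `KNeighborsRegressor` is the mean of the targets of the k nearest training samples. The short check below, on a made-up one-dimensional data set, recomputes that mean by hand; the names `reg`, `query` and `manual` are illustrative.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Small made-up data set: y = x squared at five points
x = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([0.0, 1.0, 4.0, 9.0, 16.0])

reg = KNeighborsRegressor(n_neighbors=3)  # default weights="uniform"
reg.fit(x, y)

query = np.array([[2.2]])
pred = reg.predict(query)[0]

# Recompute by hand: average the targets of the 3 nearest samples
_, idx = reg.kneighbors(query)
manual = y[idx[0]].mean()
print(pred, manual)  # both are (4 + 9 + 1) / 3
```

The three samples nearest to 2.2 are x = 2, 3 and 1, so the prediction is the average of their targets rather than a value read off a fitted formula.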
Save model
import joblib  # sklearn.externals.joblib was removed from scikit-learn; use the standalone joblib package
joblib.dump(knn, r"path/1.m")  # save the model
mynewknn = joblib.load(r"path/1.m")  # load the model
Summary
KNN gives poor results on imbalanced sample data and needs to be improved for that case. One improvement is to assign weights to the k nearest neighbors, so that the closer a neighbor is to the test sample, the greater its weight. KNN is also time consuming: a brute-force query costs O(n) distance computations, so it is generally suited to small data sets. For large data sets, the data can be organized in a tree to speed up queries; kd-tree and ball-tree are commonly used.
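In scikit-learn, both ideas are exposed as constructor parameters: `weights="distance"` weights each neighbor by the inverse of its distance, and `algorithm="kd_tree"` or `algorithm="ball_tree"` selects a tree-based neighbor search. The sketch below uses a made-up imbalanced data set to show how distance weighting can change the result; the data and variable names are illustrative only.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up imbalanced data: class 0 has four samples, class 1 only two
x = np.array([[0.0], [0.2], [0.4], [0.6], [2.0], [2.1]])
y = np.array([0, 0, 0, 0, 1, 1])
query = np.array([[1.9]])  # lies right next to the two class-1 samples

# Plain majority vote: 3 of the 5 nearest neighbors belong to class 0
uniform = KNeighborsClassifier(n_neighbors=5, weights="uniform")
uniform.fit(x, y)

# Inverse-distance weighting with a kd-tree for the neighbor search
weighted = KNeighborsClassifier(n_neighbors=5, weights="distance",
                                algorithm="kd_tree")
weighted.fit(x, y)

uniform_pred = uniform.predict(query)[0]
weighted_pred = weighted.predict(query)[0]
print(uniform_pred, weighted_pred)  # 0 with uniform votes, 1 with distance weights
```

On a data set this small the tree brings no speed benefit; the `algorithm` argument is shown only to illustrate where the choice is made.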