K-nearest neighbor (KNN) algorithm in machine learning

Outline

KNN classifies a sample by measuring the distances between its feature values and those of the training samples. The name "k-nearest neighbors" means that each sample is represented by its k closest neighbors. The KNN algorithm cannot guarantee 100% correct predictions; adding informative feature dimensions can improve prediction performance.
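To make the idea concrete, here is a minimal from-scratch sketch of the classification step (Euclidean distance plus a majority vote). The helper knn_predict and the toy arrays are illustrative only; the examples below use the sklearn implementation instead.

import numpy as np
from collections import Counter

def knn_predict(x_train, y_train, query, k=3):
    # Euclidean distance from the query point to every training sample
    dists = np.linalg.norm(np.asarray(x_train) - np.asarray(query), axis=1)
    # indices of the k closest training samples
    nearest = np.argsort(dists)[:k]
    # majority vote among the labels of the k nearest neighbors
    return Counter(np.asarray(y_train)[nearest]).most_common(1)[0][0]

x_train = [[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]]
y_train = ['A', 'A', 'B', 'B']
print(knn_predict(x_train, y_train, [0.1, 0.2], k=3))  # -> 'B'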

K-nearest neighbor algorithm principle

Advantages: high accuracy, insensitive to outliers, makes no assumptions about the input data
Disadvantages: high time complexity O(n), high space complexity
Applicable data types: numeric and nominal

A simple KNN application

from sklearn.neighbors import KNeighborsClassifier  # KNN classifier

x_train=[[182,78,45],[162,48,40],[187,50,41],
         [178,65,44],[158,50,38],[185,67,44]]
y_train=['男','女','男','男','女','男']  # labels: 男 = male, 女 = female

knn=KNeighborsClassifier(n_neighbors=3)  # KNN object with k=3
knn.fit(x_train,y_train)                 # fit the model on the training data

test_data=[[182,75,44],[159,49,40]]
print(knn.predict(test_data))            # predict the class of each test sample

KNN classification and regression with sklearn

Regression fits a curve to the data points; the best fit is the one with the smallest residuals (the differences between the observed values and the fitted values).
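As a minimal illustration of how a KNN regressor forms a prediction: with the default uniform weights, the predicted value is simply the mean of the k nearest training targets. The helper knn_regress and the toy data below are illustrative, not part of the sklearn example that follows.

import numpy as np

def knn_regress(x_train, y_train, query, k=3):
    # distance from the query to every (one-dimensional) training point
    dists = np.abs(np.asarray(x_train) - query)
    nearest = np.argsort(dists)[:k]
    # uniform-weight KNN regression: average the k nearest targets
    return np.asarray(y_train)[nearest].mean()

x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [0.0, 0.8, 0.9, 0.1, -0.7]            # roughly sin(x)
print(knn_regress(x_train, y_train, 1.5, k=2))  # -> 0.85, mean of the two nearest targets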

import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier  # KNN classifier
from sklearn.neighbors import KNeighborsRegressor
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import datasets

iris = datasets.load_iris()
x=iris.data[:,:2]
y=iris.target
print(x)
print(y)
# K=15: classify each point using its 15 nearest neighbors
K=15
# step size of the mesh along the x and y axes
h=0.02
cmap_light=ListedColormap(["#FF0FFF","#0FFFFF","#00F0FF"])
cmap_bold=ListedColormap(["#FF0000","#00FF00","#0000FF"])
myknn=KNeighborsClassifier(n_neighbors=K)
myknn.fit(x,y)

# determine the plotting range
xmin,xmax=x[:,0].min()-1,x[:,0].max()+1
ymin,ymax=x[:,1].min()-1,x[:,1].max()+1
# build the mesh grid
xx,yy=np.meshgrid(np.arange(xmin,xmax,h),
                  np.arange(ymin,ymax,h))
# predict the class for every grid point
z=myknn.predict(np.c_[xx.ravel(),yy.ravel()])
z=z.reshape(xx.shape)

# background colors (decision regions)
plt.pcolormesh(xx,yy,z,cmap=cmap_light)

# scatter plot of the training samples
plt.scatter(x[:,0],x[:,1],c=y,cmap=cmap_bold)
plt.xlim(xx.min(),xx.max())
plt.ylim(yy.min(),yy.max())
plt.title("fenlei")
# plt.show()
plt.savefig("show.png")


# KNN regression
np.random.seed(0)
x = np.sort(5*np.random.rand(40,1),axis=0)
y=np.sin(x).ravel()

y[::5]+=1*(0.5-np.random.rand(8))  # add noise to every 5th target value

T=np.linspace(0,5,100)[:,np.newaxis]
# print(T)

knn=KNeighborsRegressor(n_neighbors=5)
knn.fit(x,y)
newy=knn.predict(T)

plt.figure()  # start a new figure so the regression plot is not drawn over the classification plot
plt.scatter(x,y,c="k",label="data")
plt.plot(T,newy,c="b",label="prediction")
plt.legend()
plt.savefig("regression.png")

Save model

import joblib  # in newer scikit-learn versions, sklearn.externals.joblib has been removed; use joblib directly
joblib.dump(knn,r"path/1.m")       # save the trained model
mynewknn=joblib.load(r"path/1.m")  # load the model back

Summary

KNN performs poorly on imbalanced sample data and needs to be improved for that case. One improvement is to weight the K nearest neighbors by distance, so that the closer a neighbor is to the test sample, the larger its weight. KNN is also time-consuming, with time complexity O(n), so it is generally suitable for small datasets; when the amount of data is large, the data can be organized in a tree structure to speed up the neighbor search, commonly a kd-tree or a ball-tree.
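Both of these improvements map directly onto parameters of sklearn's KNeighborsClassifier. The snippet below is a brief, illustrative sketch of how they might be set; the parameter values are examples, not recommendations from the original post.

from sklearn.neighbors import KNeighborsClassifier

# weight neighbors by the inverse of their distance, so closer samples count more,
# and use a kd-tree to speed up the neighbor search on larger datasets
knn = KNeighborsClassifier(n_neighbors=5, weights="distance", algorithm="kd_tree")
# algorithm can also be "ball_tree", "brute", or "auto" (the default)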


Origin www.cnblogs.com/focusTech/p/12299749.html