1. The KNN algorithm, also known as the k-nearest neighbor algorithm, is a classification algorithm.
2. The core idea of the KNN algorithm: for a new sample in the feature space, look at its k nearest samples; whichever class most of those neighbors belong to, the new sample is assigned to that class as well.
Here we measure the distance between samples using the Euclidean distance formula.
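As a quick illustration (a minimal sketch of my own, with made-up vectors a and b that are not part of the walkthrough below), the Euclidean distance between two feature vectors can be computed with NumPy like this:
import numpy as np
a = np.array([1.0, 2.0])  # hypothetical sample
b = np.array([4.0, 6.0])  # hypothetical sample
d = np.sqrt(np.sum((a - b) ** 2))  # sqrt((1-4)^2 + (2-6)^2) = 5.0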
import math
import numpy as np
from sklearn import datasets
import matplotlib.pyplot as plt
raw_data_X = [[3.393533211, 2.331273381],
[3.110073483, 1.781539638],
[1.343808831, 3.368360954],
[3.582294042, 4.679179110],
[2.280362439, 2.866990263],
[7.423436942, 4.696522875],
[5.745051997, 3.533989803],
[9.172168622, 2.511101045],
[7.792783481, 3.424088941],
[7.939820817, 0.791637231]
]
raw_data_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
X_train = np.array(raw_data_X)
y_train = np.array(raw_data_y)
Above, we first establish the raw data and then convert it into NumPy arrays.
plt.scatter(X_train[y_train == 0,0],X_train[y_train == 0,1],color = 'g')
plt.scatter(X_train[y_train == 1,0],X_train[y_train == 1,1],color = 'r')
plt.show()
Then we use matplotlib to draw the corresponding scatter plot, with class 0 in green and class 1 in red.
Next, we insert a new point to be classified.
x = np.array([8.093607318, 3.365731514])
plt.scatter(X_train[y_train == 0,0],X_train[y_train == 0,1],color = 'g')
plt.scatter(X_train[y_train == 1,0],X_train[y_train == 1,1],color = 'r')
plt.scatter(x[0],x[1],color = 'b')
plt.show()
Using the Euclidean distance formula, we now determine which category the new point belongs to. We take its six nearest neighbors (k = 6) as the basis for the decision.
distances = []
for x_train in X_train:
    # Euclidean distance between each training sample and the new point x
    d = math.sqrt(np.sum((x_train - x) ** 2))
    distances.append(d)
# Indices of the training samples sorted from nearest to farthest
nearest = np.argsort(distances)
# Labels of the six nearest neighbors (k = 6)
c = [y_train[k] for k in nearest[:6]]
Printing c shows that most of the six nearest neighbors are labeled 1, so the new data point is classified as class 1.
To count the votes among these labels, we can also use the collections module:
from collections import Counter
vote = Counter(c)
vote.most_common()
vote.most_common() outputs each label together with its vote count, sorted from most to least frequent; the top label, 1, is the predicted class for the new point.
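As a final step (a small sketch of my own, not part of the original walkthrough), the predicted label can be pulled directly out of the Counter:
# most_common(1) returns a list containing only the most frequent (label, count) pair
predict_y = vote.most_common(1)[0][0]
print(predict_y)  # expected to print 1 for this example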