Getting Started with Machine Learning (1): The k-Nearest Neighbors (KNN) Algorithm

The k-nearest neighbor algorithm can be described as one of the most basic and simplest machine learning algorithms. It can be regarded as a machine learning algorithm that needs no model to be built at all: given a training set, the result can be computed directly from it. So how exactly does the KNN algorithm work?

Let's take determining the nature of a tumor as an example. Suppose we are given the following training set:
raw_data_x = [[3.39353321, 2.33127338],
              [3.21137674, 1.72371642],
              [1.34657826, 3.34687858],
              [3.58467832, 4.66841268],
              [2.23684678, 2.86473684],
              [7.45723474, 4.68964178],
              [5.76727424, 3.46826478],
              [9.13414325, 2.57938724],
              [7.64781628, 3.46783268],
              [7.93974896, 0.79486123]]

Each row represents the features of one patient's case: the first value is the size of the tumor, and the second is the time at which the tumor was discovered.

We represent the nature of each patient's tumor with 0 and 1, where 0 stands for benign and 1 for malignant. So we have:

raw_data_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

Then we can draw a scatter plot (red points are malignant, green points are benign):

[Figure: scatter plot of the training data; red = malignant, green = benign]

Now we want to use the kNN algorithm to determine the nature of the tumor for a patient with this set of data:

exam_x = np.array([8.09632767, 3.64782264])

We set k = 3. To decide whether the point to be classified (shown as a blue dot) belongs with the red points or the green points, we look at the k = 3 nearest points around it:

[Figure: the blue point to be classified and its 3 nearest neighbors]

Then we compute the distance between the point to be classified and its neighboring points. Since the training set has n features, the data may be multi-dimensional, so we use the Euclidean distance:

d(a, b) = sqrt((a_1 - b_1)^2 + (a_2 - b_2)^2 + ... + (a_n - b_n)^2)
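
As a quick illustration (a small sketch that is not part of the program below), the Euclidean distance between the point to be classified and one of the training points can be computed with NumPy like this:

import numpy as np

exam_x = np.array([8.09632767, 3.64782264])
neighbor = np.array([7.64781628, 3.46783268])   # one of the malignant training points

# square the difference of each feature, sum them, then take the square root
dist = np.sqrt(np.sum((exam_x - neighbor) ** 2))
print(dist)   # the same value as np.linalg.norm(exam_x - neighbor)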

Finally, we simply look at which category the majority of these k most similar samples belong to; the point to be classified most likely belongs to that same category.
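
For example, if the labels of the 3 nearest neighbors were 1, 1 and 0 (hypothetical values for this sketch), the vote could be taken with collections.Counter, just as the full program below does:

from collections import Counter

top_y = [1, 1, 0]                   # labels of the k nearest neighbors (made up for illustration)
votes = Counter(top_y)              # Counter({1: 2, 0: 1})
print(votes.most_common(1)[0][0])   # prints 1: the majority class wins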

The whole process can be expressed in our program as:

import numpy as np
import matplotlib.pyplot as plt
from math import sqrt
from collections import Counter

raw_data_x = [[3.39353321, 2.33127338],
              [3.21137674, 1.72371642],
              [1.34657826, 3.34687858],
              [3.58467832, 4.66841268],
              [2.23684678, 2.86473684],
              [7.45723474, 4.68964178],
              [5.76727424, 3.46826478],
              [9.13414325, 2.57938724],
              [7.64781628, 3.46783268],
              [7.93974896, 0.79486123]]
raw_data_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

x = np.array(raw_data_x)
y = np.array(raw_data_y)
exam_x = np.array([8.09632767, 3.64782264])

plt.scatter(x[y == 0, 0], x[y == 0, 1], color="g")  # benign samples in green
plt.scatter(x[y == 1, 0], x[y == 1, 1], color="r")  # malignant samples in red
plt.scatter(exam_x[0], exam_x[1], color="b")        # point to be classified in blue

k = 6  # the figure above used k = 3 for illustration; here we use 6, matching the sklearn example below
# Euclidean distance from each training point to the point to be classified
distances = [sqrt(np.sum((x_i - exam_x) ** 2)) for x_i in x]
# indices of the training points sorted by distance, nearest first
nearest = np.argsort(distances)
# labels of the k nearest neighbors
top_y = [y[i] for i in nearest[:k]]
# vote: the most common label among the k neighbors wins
votes = Counter(top_y)

predict_y = votes.most_common(1)[0][0]
print(predict_y)
plt.show()

In Python, there are already third-party libraries that encapsulate the KNN algorithm nicely for us, in a strict object-oriented style.

For the example above, we can write:

from sklearn.neighbors import KNeighborsClassifier

kNN_classifier = KNeighborsClassifier(n_neighbors=6)
kNN_classifier.fit(x, y)                  # "train": KNN simply stores the data set
x_predict = exam_x.reshape(1, -1)         # sklearn expects a 2D array of samples
y_predict = kNN_classifier.predict(x_predict)
print(y_predict)


The final value of y_predict is 1; that is, according to our kNN calculation, the tumor of the patient to be classified is more likely to be malignant.
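
If you want to double-check which training samples the classifier actually looked at, scikit-learn's kneighbors method returns the distances and indices of the nearest neighbors. A quick sketch, reusing kNN_classifier and exam_x from above:

# distances to, and indices of, the 6 nearest training points for exam_x
distances, indices = kNN_classifier.kneighbors(exam_x.reshape(1, -1))
print(indices)     # row indices into x / raw_data_y
print(distances)   # Euclidean distances, matching the manual calculation above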

I hope this is helpful to readers. If you like it, feel free to follow my official account, where I post my study notes, so we can learn together!


Origin blog.csdn.net/Rosen_er/article/details/104266430