<<Getting Started with Machine Learning from Zero>> Encapsulating code as functions in PyCharm and calling it from Jupyter Notebook, taking the KNN algorithm (k-nearest neighbors) as an example

1. Main content of the article

This post explains how to encapsulate code as functions in PyCharm and call it from a Jupyter Notebook, using the KNN algorithm as the running example. The goal is to convey these techniques in plain, easy-to-understand language; reading the whole post takes about 5 minutes.
Note: the content of this post is based on the machine learning course taught by teacher liuyubobobo, plus my own summary and thinking. For the more detailed original videos, please search for teacher bobo's machine learning course.

2. Functional encapsulation code

The previous article explained the KNN algorithm ( KNN algorithm blog address ) and implemented its core logic in Python inside a Jupyter Notebook. To make that logic easy to reuse later, we now move it into a separate file, written with an object-oriented mindset, and wrap the KNN core algorithm in a function (method) defined in that file.

2.1 Define classes and functions (also called methods) in PyCharm

If you look closely, a file created inside Jupyter Notebook gets the suffix ipynb, but the file we need here must have the suffix py. So the general idea is: define the file and its functions in PyCharm, then use Jupyter Notebook's Upload button to put the file onto the Jupyter server, completing the encapsulation of the code.

2.2 Pycharm specific operation

Let's take the core KNN code from the previous post as the example. In PyCharm, create a new file named knn.py, define a function (method) named kNN_classify in it, and move the KNN core algorithm code into that method to complete the encapsulation (screenshot omitted).

2.3 JupyterNoteBook specific operation

In the directory that contains the main notebook file, create a new folder named kNN_function, enter it, and use the Upload button to locate the knn.py file we wrote earlier and upload it to the Jupyter server. The specific steps are as follows:

Create a new folder (screenshot omitted).
Rename the folder to kNN_function (screenshot omitted).
Enter the folder, locate knn.py through the Upload button, and upload it to the Jupyter Notebook server (screenshots omitted).
At this point, knn.py has been uploaded to the Jupyter server.
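As a quick sanity check (my own addition, not part of the original post), you can confirm from a notebook cell that the upload landed where the %run call in the next section expects it:

import os

# Assumes the notebook's working directory is the folder that contains
# kNN_function/; after a successful upload this should print True.
print(os.path.exists("kNN_function/knn.py"))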

2.4 Call the knn.py file in Jupyter Notebook

In the main notebook we load knn.py and then call its kNN_classify method, reproducing the result from the previous post. The full code is as follows:

import numpy as np 
import matplotlib.pyplot as plt

raw_data_X = [[3.393533211, 2.331273381],
              [3.110073483, 1.781539638],
              [1.343808831, 3.368360954],
              [3.582294042, 4.679179110],
              [2.280362439, 2.866990263],
              [7.423436942, 4.696522875],
              [5.745051997, 3.533989803],
              [9.172168622, 2.511101045],
              [7.792783481, 3.424088941],
              [7.939820817, 0.791637231]
             ]
raw_data_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

X_train = np.array(raw_data_X)
y_train = np.array(raw_data_y)
x = np.array([8.093607318, 3.365731514])

%run kNN_function/knn.py # load knn.py into the notebook namespace
predict_y = kNN_classify(6, X_train, y_train, x) # call the kNN_classify method

predict_y
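A note on the %run line: it executes knn.py inside the notebook's namespace, which is why kNN_classify can be called directly afterwards. As an alternative (my own sketch, not from the original post), the file can also be imported as a regular module, provided Python treats kNN_function as a package (adding an empty __init__.py to the folder makes this explicit):

from kNN_function.knn import kNN_classify

predict_y = kNN_classify(6, X_train, y_train, x)
predict_y  # with this toy data, most of the 6 nearest neighbours belong to class 1, so this is 1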

       Code in the knn.py file:

import numpy as np
from math import sqrt
from collections import Counter


def kNN_classify(k, X_train, y_train, x):
    assert 1 <= k <= X_train.shape[0], "k must be valid"
    assert X_train.shape[0] == y_train.shape[0], \
        "the size of X_train must equal to the size of y_train"
    assert X_train.shape[1] == x.shape[0], \
        "the feature number of x must be equal to X_train"

    # Euclidean distance from x to every training sample
    distances = [sqrt(np.sum((x_train - x) ** 2)) for x_train in X_train]
    # indices of the training samples sorted by distance (closest first)
    nearest = np.argsort(distances)
    # labels of the k nearest neighbours
    topK_y = [y_train[i] for i in nearest[:k]]
    # majority vote among those k labels
    votes = Counter(topK_y)

    return votes.most_common(1)[0][0]
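Optionally (my own suggestion, not part of the original knn.py), a small self-test block at the bottom of the file lets you run it directly in PyCharm as well; the demo data below is made up purely for illustration:

if __name__ == "__main__":
    # tiny made-up dataset: two class-0 points near (1, 1) and two class-1 points near (8, 8)
    X_demo = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [8.2, 7.9]])
    y_demo = np.array([0, 0, 1, 1])
    # the query point sits next to the class-1 cluster, so this prints 1
    print(kNN_classify(3, X_demo, y_demo, np.array([7.5, 8.1])))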

3. Summary

This post showed how to encapsulate code as functions in PyCharm and call it on the Jupyter server. Functional encapsulation makes code much easier to manage. The next post will introduce the KNN algorithm as encapsulated in Scikit-Learn, and re-implement a simple Scikit-Learn-style underlying KNN through the same kind of functional encapsulation. If there is any mistake, please point it out, thank you.
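As a small preview of that next post (a minimal sketch, assuming scikit-learn is installed; the variable name knn_clf is my own), the same prediction with Scikit-Learn's built-in KNN classifier looks roughly like this:

from sklearn.neighbors import KNeighborsClassifier

knn_clf = KNeighborsClassifier(n_neighbors=6)  # k = 6, as above
knn_clf.fit(X_train, y_train)
knn_clf.predict(x.reshape(1, -1))              # predict expects a 2-D array of samples; returns array([1]) here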

Origin: blog.csdn.net/qq_32575047/article/details/123030054