Learn to build a neural network for classification and prediction in three days (TensorFlow)


Foreword

This article is for readers who are interested in deep learning and want to get started, or who want to add simple classification and prediction features to their own projects. Comments and suggestions from experts in the AI field are also welcome. It goes from the preparatory work (setting up the environment), through the basic computation process, to building and optimizing a network. Finally, it summarizes a six-step method so that you can quickly build your own neural network for classification tasks.
Some readers will ask: why three days instead of one? Can you learn programming without writing any code? Learning to program without typing code is like learning to drive without touching the steering wheel. If I told you that you could learn to drive in a day without touching the steering wheel, wouldn't that be nonsense?


Preparation

Anaconda installation

Official website: https://www.anaconda.com/
Download and install the Python 3.8 version that matches your operating system.

TensorFlow installation

1. In the Start menu, select Anaconda Powershell Prompt to open a command line.
2. Enter the command conda create -n tf2.1 python=3.8 and press Enter to create a new environment named tf2.1.
3. After the environment is created, enter conda activate tf2.1 and press Enter to switch to it. The environment name in the brackets at the start of the prompt changes to tf2.1.
4. Enter conda install cudatoolkit=10.1 and press Enter to install version 10.1 of NVIDIA's CUDA toolkit. When asked whether to install the dependent packages, enter y and press Enter to continue. This step may fail several times because of network problems. You can re-run the command a few times, or switch to a different installation source (mirror), which is simple to do.
5. After that installation completes, enter conda install cudnn=7.6 and press Enter to install version 7.6 of cuDNN, NVIDIA's deep learning library.
6. It does not matter if the two NVIDIA packages in steps 4 and 5 fail to install, and you may not need them at all: they are only used for GPU acceleration. If you have no GPU or your network connection is poor, skip straight to this step. Enter the command pip install tensorflow==2.1 to install TensorFlow 2.1. Note that TensorFlow 2.1 was not published for Python 3.8, so if pip reports that it cannot find a matching version, recreate the environment with python=3.7.

PyCharm installation

Official website: https://www.jetbrains.com/pycharm/
Choose a version: we use the free edition, which is entirely sufficient. If you can get it for free, get it for free; a penny saved is a penny earned.
After installing PyCharm, create a new project and import the tf2.1 environment that we just created with Anaconda into PyCharm. If you do not know how to import an environment, search for "pycharm environment configuration" on CSDN; there are many detailed articles.


1. Calculation of Neural Networks (Day 1)

On the first day, let's do something simple and easy to understand. Let's feel the magic of neural networks through a simple example of iris classification.

1. Basic process

Prepare data: collect a large amount of "feature/label" data
Build the network: build the basic structure of the neural network
Optimize parameters: train the network to obtain the best parameters, so that the predicted values get closer to the true values
Apply the network: save the trained network as a model; when new data is passed in, it gives the prediction result

2. Data introduction (iris data set)

We use the most classic data set, the iris data set; a classic never goes out of date.
A brief introduction to the structure of the iris data set:
Irises are generally divided into three categories, based on sepal length, sepal width, petal length, and petal width. The attributes used for classification, here these four, are the features we train on. The three categories they separate (Iris setosa, Iris versicolor, and Iris virginica) are called labels.
In the iris data set, four feature values correspond to one label value. The feature values are the sepal length, sepal width, petal length, and petal width, such as [5.8, 4.0, 1.2, 0.2], and the label value is the corresponding category. We encode the three categories as 0, 1, and 2, so the iris data set is a collection of many records with this 4+1 structure.
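
To make the 4+1 structure concrete, here is a minimal sketch of a single record in plain Python. The feature values are the ones quoted above; the label 0 and the mapping of 0, 1, 2 to species names are assumptions made for illustration.

```python
# One iris record: four feature values plus one integer label (the "4+1" structure).
# The feature values come from the example in the text; the label 0 is assumed.
FEATURE_NAMES = ["sepal length", "sepal width", "petal length", "petal width"]
LABEL_NAMES = {0: "Iris setosa", 1: "Iris versicolor", 2: "Iris virginica"}

features = [5.8, 4.0, 1.2, 0.2]
label = 0

record = (features, label)

for name, value in zip(FEATURE_NAMES, features):
    print(f"{name}: {value}")
print("label:", label, "->", LABEL_NAMES[label])
```

The whole data set is simply many such records, which is why it can be stored as a feature matrix with four columns and a label vector of the same length.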

3. Basic concepts (if you don’t understand, you can skip it temporarily)

w, b: the trainable parameters; w is called the weight and b the bias. Predicted value matrix = feature matrix * w + b
Loss function (loss): the gap between the predicted value and the standard answer (the label). The loss function measures how good w and b are: when its output is smallest, that is, when the difference between the predicted value and the true value is smallest, w and b are at their optimal values.
Gradient: the vector of partial derivatives of a function with respect to each parameter. The function's value decreases in the direction opposite to the gradient, which is the direction of gradient descent.
Gradient descent method: search for the minimum of the loss function along the direction of gradient descent to obtain the optimal parameters.
The purpose of gradient descent: to find a set of parameters w and b that minimizes the loss function.
Learning rate (lr): how far the parameters move each time the gradient is applied. If the learning rate is too small, the parameters update very slowly; if it is too large, the parameters may oscillate back and forth around the minimum and fail to converge. The right learning rate can only be found by trying. I like to shrink or enlarge it by a factor of three each time.
Backpropagation: working from back to front, compute the partial derivative of the loss function with respect to the parameters of each layer, layer by layer, and update all parameters iteratively.
W after update = W before update - learning rate × partial derivative of the loss function with respect to W before update
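
The update rule can be tried out on a one-parameter problem. The sketch below is plain Python, not TensorFlow; the toy loss (w + 1)^2, the starting value 5.0 and both learning rates are assumptions chosen so the behaviour is easy to follow.

```python
# Gradient descent on a toy loss: loss(w) = (w + 1)^2, whose minimum is at w = -1.
# The loss, the starting value and the learning rates are illustrative assumptions.

def loss_fn(w):
    return (w + 1.0) ** 2

def grad_fn(w):
    # d/dw (w + 1)^2 = 2 * (w + 1)
    return 2.0 * (w + 1.0)

def gradient_descent(w, lr, steps):
    history = [w]
    for _ in range(steps):
        w = w - lr * grad_fn(w)   # w after update = w before update - lr * gradient
        history.append(w)
    return history

good = gradient_descent(w=5.0, lr=0.2, steps=40)
print("lr=0.2 final w:", good[-1], "loss:", loss_fn(good[-1]))

too_large = gradient_descent(w=5.0, lr=1.0, steps=4)
print("lr=1.0 path:", too_large)
```

With lr=0.2 the distance to the minimum shrinks by a constant factor on every step, while with lr=1.0 the parameter jumps back and forth across the minimum without getting closer, which is the oscillation described above.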

4. Basic usage of TensorFlow (no need to memorize; you can look things up at any time, so just skim it to get an impression)

Note: this briefly introduces some functions that will be used later. The descriptions are very short; if anything is unclear, refer to the TensorFlow documentation (https://tensorflow.google.cn/api_docs/python/tf).
Create a tensor: tf.constant(tensor content, dtype=data type (optional))
Convert numpy data to a tensor: tf.convert_to_tensor(data name, dtype=data type (optional))
Create a tensor of all 0s, all 1s, or a specified value: tf.zeros(dimensions) tf.ones(dimensions) tf.fill(dimensions, specified value)
Generate normally distributed random numbers: tf.random.normal(dimensions, mean=mean, stddev=standard deviation)
Generate random numbers from a truncated normal distribution: tf.random.truncated_normal(dimensions, mean=mean, stddev=standard deviation)
Generate uniformly distributed random numbers: tf.random.uniform(dimensions, minval=minimum, maxval=maximum)
Cast a tensor to a given data type: tf.cast(tensor name, dtype=data type)
Compute the minimum and maximum of the elements of a tensor: tf.reduce_min(tensor name) tf.reduce_max(tensor name)
Understanding axis: in a two-dimensional tensor, axis=0 means operating across rows (down each column) and axis=1 means operating across columns (along each row); if axis is not specified, all elements take part in the calculation
Compute the mean of a tensor along the specified dimension: tf.reduce_mean(tensor name, axis=operating axis)
Compute the sum of a tensor along the specified dimension: tf.reduce_sum(tensor name, axis=operating axis)
Mark a variable as trainable: tf.Variable(initial value)
The four arithmetic operations: tf.add(tensor 1, tensor 2) tf.subtract(tensor 1, tensor 2) tf.multiply(tensor 1, tensor 2) tf.divide(tensor 1, tensor 2)
Square, nth power, and square root of a tensor: tf.square(tensor name) tf.pow(tensor name, n) tf.sqrt(tensor name)
Multiply two matrices: tf.matmul(matrix 1, matrix 2)
Slice the first dimension of the input tensors to build feature/label pairs: tf.data.Dataset.from_tensor_slices((input features, labels))
Record the calculation inside a with structure, then use gradient to compute the gradient of a tensor:
with tf.GradientTape() as tape:
            calculation process
grad = tape.gradient(function, variable to differentiate with respect to)
Traverse each element as an (index, element) pair: enumerate(iterable object)
Convert labels to one-hot form: tf.one_hot(data to be converted, depth=number of categories)
Convert the output into a probability distribution: tf.nn.softmax(output result)
Decrement a parameter in place: w.assign_sub(amount to subtract)
Return the index of the maximum value of a tensor along the specified dimension: tf.argmax(tensor name, axis=operating axis)
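
The axis argument, one-hot encoding, softmax and argmax are easier to remember after seeing the arithmetic once. The sketch below re-creates them in plain Python on a tiny 2 x 3 example. These helper functions are stand-ins written for illustration and are not TensorFlow's implementation; the input numbers are made up.

```python
import math

# Plain-Python stand-ins for a few of the calls listed above, on a 2 x 3 "tensor".
# They only mimic the arithmetic; they are not TensorFlow's implementation.
x = [[1.0, 2.0, 3.0],
     [2.0, 2.0, 3.0]]

def reduce_mean(rows, axis=None):
    if axis is None:                      # all elements take part
        flat = [v for row in rows for v in row]
        return sum(flat) / len(flat)
    if axis == 0:                         # collapse the rows: one value per column
        return [sum(col) / len(col) for col in zip(*rows)]
    if axis == 1:                         # collapse the columns: one value per row
        return [sum(row) / len(row) for row in rows]
    raise ValueError("this sketch only handles axis None, 0 or 1")

def one_hot(labels, depth):
    return [[1.0 if i == label else 0.0 for i in range(depth)] for label in labels]

def softmax(row):
    shifted = [v - max(row) for v in row]   # subtract the max for numerical stability
    exps = [math.exp(v) for v in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(row):
    return max(range(len(row)), key=lambda i: row[i])

print(reduce_mean(x))            # mean of all six elements
print(reduce_mean(x, axis=0))    # per-column means
print(reduce_mean(x, axis=1))    # per-row means
print(one_hot([1, 0, 2], depth=3))
probs = softmax([1.01, 2.01, -0.66])
print(probs, argmax(probs))
```

Note that softmax keeps the ordering of its inputs, so argmax of the probabilities is the same as argmax of the raw outputs.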

5. Code (write by hand, write by hand, write by hand, say important things three times)

# -*- coding: UTF-8 -*-
# Use the iris data set to implement forward propagation and backpropagation, and visualize the loss curve

# Import the required modules
import tensorflow as tf
from sklearn import datasets
from matplotlib import pyplot as plt
import numpy as np

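# Load the data: input features and labels
x_data = datasets.load_iris().data
y_data = datasets.load_iris().target

# Shuffle the data (the original data is in order; not shuffling it would hurt accuracy)
# seed: the random seed, an integer; once set, the same random numbers are generated on every run (for teaching, so that everyone gets the same results)
np.random.seed(116)  # use the same seed for both shuffles so that features and labels stay paired
np.random.shuffle(x_data)
np.random.seed(116)
np.random.shuffle(y_data)
tf.random.set_seed(116)

# Split the shuffled data into a training set (first 120 rows) and a test set (last 30 rows)
x_train = x_data[:-30]
y_train = y_data[:-30]
x_test = x_data[-30:]
y_test = y_data[-30:]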

# Cast x to float32; otherwise the matrix multiplication below raises an error because the data types do not match
x_train = tf.cast(x_train, tf.float32)
x_test = tf.cast(x_test, tf.float32)

# from_tensor_slices pairs each row of input features with its label; batch(32) then groups the pairs into batches of 32
train_db = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(32)
test_db = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(32)

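# Create the network parameters: 4 input features, so the input layer has 4 nodes; 3 classes, so the output layer has 3 neurons
# tf.Variable() marks the parameters as trainable
# A seed makes the random initial values the same on every run (for teaching, so that everyone gets the same results; omit the seed in real use)
w1 = tf.Variable(tf.random.truncated_normal([4, 3], stddev=0.1, seed=1))
b1 = tf.Variable(tf.random.truncated_normal([3], stddev=0.1, seed=1))

lr = 0.1  # learning rate
train_loss_results = []  # loss of each epoch, recorded for plotting the loss curve later
test_acc = []  # accuracy of each epoch, recorded for plotting the accuracy curve later
epoch = 500  # number of epochs
loss_all = 0  # each epoch has 4 steps; loss_all accumulates the 4 losses they produce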

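# Training
for epoch in range(epoch):  # dataset-level loop: each epoch goes through the data set once
    for step, (x_train, y_train) in enumerate(train_db):  # batch-level loop: each step processes one batch
        with tf.GradientTape() as tape:  # the with block records the operations needed for the gradients
            y = tf.matmul(x_train, w1) + b1  # the network's multiply-add operation
            y = tf.nn.softmax(y)  # turn the output y into a probability distribution (same scale as the one-hot labels, so they can be subtracted to get the loss)
            y_ = tf.one_hot(y_train, depth=3)  # convert the labels to one-hot format for computing the loss and accuracy
            loss = tf.reduce_mean(tf.square(y_ - y))  # mean squared error loss: mse = mean((y_ - y)^2)
            loss_all += loss.numpy()  # accumulate the loss of each step so the epoch average can be computed later, which gives a more accurate loss
        # Compute the gradient of the loss with respect to each parameter
        grads = tape.gradient(loss, [w1, b1])

        # Gradient update: w1 = w1 - lr * w1_grad    b = b - lr * b_grad
        w1.assign_sub(lr * grads[0])  # update w1 in place
        b1.assign_sub(lr * grads[1])  # update b1 in place

    # Print the loss after each epoch
    print("Epoch {}, loss: {}".format(epoch, loss_all/4))
    train_loss_results.append(loss_all / 4)  # record the average loss over the 4 steps
    loss_all = 0  # reset loss_all for the next epoch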

    # Testing
    # total_correct is the number of correctly predicted samples and total_number the total number of test samples; both start at 0
    total_correct, total_number = 0, 0
    for x_test, y_test in test_db:
        # Predict with the updated parameters
        y = tf.matmul(x_test, w1) + b1
        y = tf.nn.softmax(y)
        pred = tf.argmax(y, axis=1)  # index of the largest value in y, i.e. the predicted class
        # Cast pred to the data type of y_test
        pred = tf.cast(pred, dtype=y_test.dtype)
        # correct is 1 where the prediction is right and 0 otherwise; cast the bool result to int
        correct = tf.cast(tf.equal(pred, y_test), dtype=tf.int32)
        # Sum the correct predictions in this batch
        correct = tf.reduce_sum(correct)
        # Add up the correct predictions over all batches
        total_correct += int(correct)
        # total_number is the total number of test samples, i.e. the number of rows of x_test; shape[0] returns the number of rows
        total_number += x_test.shape[0]
    # Overall accuracy is total_correct / total_number
    acc = total_correct / total_number
    test_acc.append(acc)
    print("Test_acc:", acc)
    print("--------------------------")

# Plot the loss curve
plt.title('Loss Function Curve')  # figure title
plt.xlabel('Epoch')  # x-axis label
plt.ylabel('Loss')  # y-axis label
plt.plot(train_loss_results, label="$Loss$")  # plot the values in train_loss_results point by point and connect them; the legend label is Loss
plt.legend()  # draw the legend
plt.show()  # show the figure

# Plot the accuracy curve
plt.title('Acc Curve')  # figure title
plt.xlabel('Epoch')  # x-axis label
plt.ylabel('Acc')  # y-axis label
plt.plot(test_acc, label="$Accuracy$")  # plot the values in test_acc point by point and connect them; the legend label is Accuracy
plt.legend()
plt.show()
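
One detail in the script that is easy to miss is why the seed is set twice before the two shuffles. The sketch below reproduces the idea with the standard library's random module instead of NumPy (an assumption made to keep it dependency-free); the toy features and labels are made up.

```python
import random

# Toy data: feature rows and labels that belong together (made-up values).
features = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1], [4.0, 4.1]]
labels = [0, 1, 2, 3, 4]

# Remember the original pairing so it can be checked after shuffling.
pair_before = {tuple(f): l for f, l in zip(features, labels)}

# Reseeding with the same value before each shuffle replays the same permutation,
# because both lists have the same length.
random.seed(116)
random.shuffle(features)
random.seed(116)
random.shuffle(labels)

pair_after = {tuple(f): l for f, l in zip(features, labels)}
print("pairs still aligned:", pair_after == pair_before)
```

If the second seed call were left out, the two lists would be permuted differently and every feature row could end up with the wrong label.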


2. Optimization of Neural Networks (Day 2)

1. Exponential decay learning rate (automatically adjust the learning rate during training)

Exponentially decayed learning rate = initial learning rate × decay rate ^ (current epoch / number of epochs between decays)

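LR_BASE = 0.2  # initial learning rate
LR_DECAY = 0.99  # learning rate decay rate
LR_STEP = 1  # how many rounds of BATCH_SIZE are fed in before the learning rate is updated
epoch = 100  # number of iterations

for epoch in range(epoch):  # top-level loop over the data set, epoch times; in this example the only parameter is a single w, initialized to the constant 5, and the loop runs 100 times
    lr = LR_BASE * LR_DECAY ** (epoch / LR_STEP)
    pass  # the training step that uses lr goes here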
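
To get a feel for how quickly the rate shrinks, the formula can be evaluated on its own. This is a small sketch in plain Python using the constants from the snippet above; the helper name decayed_lr and the epochs printed are arbitrary choices.

```python
LR_BASE = 0.2    # initial learning rate (value from the snippet above)
LR_DECAY = 0.99  # decay rate
LR_STEP = 1      # epochs between decays

def decayed_lr(epoch):
    # initial learning rate * decay rate ^ (current epoch / epochs between decays)
    return LR_BASE * LR_DECAY ** (epoch / LR_STEP)

for epoch in (0, 1, 10, 50, 99):
    print(epoch, round(decayed_lr(epoch), 5))
```

Because the exponent uses true division, the rate changes smoothly every epoch; using integer division (epoch // LR_STEP) instead would make it drop in steps once LR_STEP is larger than 1.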

2. Activation function (fine-tuning the output of the network to make the output value meet the requirements)

Sigmoid function: tf.nn.sigmoid(x)
Tanh function: tf.math.tanh(x)
Relu function: tf.nn.relu(x)
Leaky Relu function: tf.nn.leaky_relu(x)

Suggestions for beginners: prefer the relu function; set the learning rate to a small value; standardize the input features so that they have a mean of 0; and center the initial parameters, that is, draw the random parameters from a normal distribution with mean 0 and standard deviation sqrt(2 / number of input features of the current layer).
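
For reference, the four activation functions and the suggested initial standard deviation are written out below in plain Python for scalar inputs. This is a sketch of the formulas, not of TensorFlow's code; the alpha of 0.2 is assumed to match the default of tf.nn.leaky_relu, and initial_stddev is a hypothetical helper name.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.2):
    # alpha is the slope for negative inputs; 0.2 is assumed here to match
    # the default of tf.nn.leaky_relu
    return x if x > 0 else alpha * x

def initial_stddev(n_inputs):
    # standard deviation suggested above: sqrt(2 / number of inputs of the layer)
    return math.sqrt(2.0 / n_inputs)

for x in (-2.0, 0.0, 2.0):
    print(x, sigmoid(x), tanh(x), relu(x), leaky_relu(x))
print("stddev for a layer with 4 inputs:", initial_stddev(4))
```

The printout shows the main difference between the functions: sigmoid and tanh saturate for large inputs, relu is exactly zero for negative inputs, and leaky relu keeps a small slope there.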

3. Loss function (measure network effect, backpropagation derivation optimization parameters)

Loss function: the gap between the predicted value and the true value

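import tensorflow as tf

# Same label [1, 0], two different predictions: the prediction closer to the label gives the smaller loss
loss_ce1 = tf.losses.categorical_crossentropy([1, 0], [0.6, 0.4])
loss_ce2 = tf.losses.categorical_crossentropy([1, 0], [0.8, 0.2])
print("loss_ce1:", loss_ce1)
print("loss_ce2:", loss_ce2)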
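
The two values printed above can be checked by hand. The following sketch computes cross entropy directly from its definition in plain Python; it leaves out the clipping and normalization that the Keras function applies internally, so treat it as an illustration of the formula only.

```python
import math

def categorical_crossentropy(y_true, y_pred):
    # H(y_true, y_pred) = -sum(y_true_i * ln(y_pred_i))
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

loss_ce1 = categorical_crossentropy([1, 0], [0.6, 0.4])
loss_ce2 = categorical_crossentropy([1, 0], [0.8, 0.2])
print("loss_ce1:", loss_ce1)
print("loss_ce2:", loss_ce2)
```

Only the probability assigned to the true class contributes, so the results are -ln(0.6) and -ln(0.8): the more confident correct prediction has the smaller loss.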

4. Regularization alleviates overfitting (prevents overfitting and improves the generalization of the network)

Regularization adds a measure of model complexity to the loss function: a weighted penalty on w, which weakens the influence of noise in the training data.

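# ------- regularization term -------
# w1 and w2 are the weight matrices of the network; REGULARIZER is the weight of the penalty (a hyperparameter you choose)
loss_regularization = []
loss_regularization.append(tf.nn.l2_loss(w1))
loss_regularization.append(tf.nn.l2_loss(w2))
loss_regularization = tf.reduce_sum(loss_regularization)
loss = loss + REGULARIZER * loss_regularization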
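
The arithmetic behind the regularization term is short enough to do by hand. In the sketch below, l2_loss is a plain-Python stand-in for tf.nn.l2_loss, which computes sum(w ** 2) / 2; the weight values, the penalty weight 0.03 and the data loss 0.40 are all hypothetical numbers picked to keep the sums simple.

```python
def l2_loss(weights):
    # tf.nn.l2_loss computes sum(w ** 2) / 2 over all elements
    return sum(w * w for row in weights for w in row) / 2.0

# Hypothetical weights and values, chosen only to make the arithmetic easy to follow.
w1 = [[1.0, -2.0], [0.5, 0.0]]
w2 = [[3.0], [-1.0]]
REGULARIZER = 0.03   # assumed penalty weight
data_loss = 0.40     # assumed loss on the training data before regularization

loss_regularization = l2_loss(w1) + l2_loss(w2)
loss = data_loss + REGULARIZER * loss_regularization
print(loss_regularization, loss)
```

Large weights contribute quadratically to the penalty, so minimizing the combined loss pushes the network toward smaller weights.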

5. The optimizer updates network parameters (parameter update method)

SGD

# Gradient update: w1 = w1 - lr * w1_grad    b = b - lr * b_grad
w1.assign_sub(lr * grads[0])  # update w1 in place
b1.assign_sub(lr * grads[1])  # update b1 in place

SGDM

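# sgd-momentum (m_w and m_b hold the running first moments and are assumed to start at 0; beta is the momentum coefficient)
m_w = beta * m_w + (1 - beta) * grads[0]
m_b = beta * m_b + (1 - beta) * grads[1]
w1.assign_sub(lr * m_w)
b1.assign_sub(lr * m_b)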

Adagrad

# adagrad (v_w and v_b accumulate the squared gradients and are assumed to start at 0)
v_w += tf.square(grads[0])
v_b += tf.square(grads[1])
w1.assign_sub(lr * grads[0] / tf.sqrt(v_w))
b1.assign_sub(lr * grads[1] / tf.sqrt(v_b))

RMSProp

# rmsprop
v_w = beta * v_w + (1 - beta) * tf.square(grads[0])
v_b = beta * v_b + (1 - beta) * tf.square(grads[1])
w1.assign_sub(lr * grads[0] / tf.sqrt(v_w))
b1.assign_sub(lr * grads[1] / tf.sqrt(v_b))

Adam

# adam (global_step counts the batches processed so far; it must start at 1, otherwise the bias correction divides by zero)
m_w = beta1 * m_w + (1 - beta1) * grads[0]
m_b = beta1 * m_b + (1 - beta1) * grads[1]
v_w = beta2 * v_w + (1 - beta2) * tf.square(grads[0])
v_b = beta2 * v_b + (1 - beta2) * tf.square(grads[1])

# bias-corrected first and second moments
m_w_correction = m_w / (1 - tf.pow(beta1, int(global_step)))
m_b_correction = m_b / (1 - tf.pow(beta1, int(global_step)))
v_w_correction = v_w / (1 - tf.pow(beta2, int(global_step)))
v_b_correction = v_b / (1 - tf.pow(beta2, int(global_step)))

w1.assign_sub(lr * m_w_correction / tf.sqrt(v_w_correction))
b1.assign_sub(lr * m_b_correction / tf.sqrt(v_b_correction))
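
The five update rules differ only in how they turn a gradient into a step, which is easier to see with a single scalar parameter. The sketch below applies scalar versions of the rules above to a toy loss in plain Python. The loss (w + 1)^2, the starting point, the step count and every hyperparameter value are assumptions chosen for illustration, and like the snippets above it omits the small epsilon that library implementations add to the denominator.

```python
import math

# Scalar versions of the five update rules, applied to loss(w) = (w + 1)^2.
# The loss, starting point and all hyperparameter values are illustrative assumptions.

def grad(w):
    return 2.0 * (w + 1.0)

def run(update, steps=300, w=5.0):
    state = {}                          # per-optimizer memory (moments)
    for step in range(1, steps + 1):    # step starts at 1 for Adam's bias correction
        w = update(w, grad(w), state, step)
    return w

def sgd(w, g, state, step, lr=0.1):
    return w - lr * g

def sgdm(w, g, state, step, lr=0.1, beta=0.9):
    state["m"] = beta * state.get("m", 0.0) + (1 - beta) * g
    return w - lr * state["m"]

def adagrad(w, g, state, step, lr=1.0):
    state["v"] = state.get("v", 0.0) + g * g
    return w - lr * g / math.sqrt(state["v"])

def rmsprop(w, g, state, step, lr=0.1, beta=0.9):
    state["v"] = beta * state.get("v", 0.0) + (1 - beta) * g * g
    return w - lr * g / math.sqrt(state["v"])

def adam(w, g, state, step, lr=0.1, beta1=0.9, beta2=0.999):
    state["m"] = beta1 * state.get("m", 0.0) + (1 - beta1) * g
    state["v"] = beta2 * state.get("v", 0.0) + (1 - beta2) * g * g
    m_hat = state["m"] / (1 - beta1 ** step)   # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** step)   # bias-corrected second moment
    return w - lr * m_hat / math.sqrt(v_hat)

for name, rule in [("SGD", sgd), ("SGDM", sgdm), ("Adagrad", adagrad),
                   ("RMSProp", rmsprop), ("Adam", adam)]:
    print(name, run(rule))
```

All five end up close to the minimum at w = -1 on this easy problem; the differences between them matter more on real networks, where gradients vary a lot between parameters and between batches.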

3. The Six-Step Method for Building a Neural Network (Day 3)

All the basic concepts and usage above serve one purpose: to let you build a neural network quickly with the six-step method while understanding what it does. Now for the most important and most interesting part: building a neural network with the six-step method.

1. Without further ado, here is the six-step method

Step one, import the libraries: import
Step two, prepare the training set and test set: train, test
Step three, define the model structure: model = tf.keras.models.Sequential
Step four, configure the optimizer and loss function: model.compile
Step five, train the model: model.fit
Step six, print the network structure: model.summary

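2. Step three: the network structure layers

The layers are passed as a list to tf.keras.models.Sequential. The example in the Code section uses a single tf.keras.layers.Dense layer with 3 neurons, softmax activation and an L2 kernel regularizer.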

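3. Step four: the optimizer and loss function

model.compile takes the optimizer, the loss function and the metrics. The example in the Code section uses SGD, SparseCategoricalCrossentropy and sparse_categorical_accuracy.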

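4. Step five: training the model

model.fit takes the training features and labels, followed by batch_size, epochs, validation_split and validation_freq, as in the Code section.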

5. Code

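# Step 1: import the libraries
import tensorflow as tf
from sklearn import datasets
import numpy as np

# Step 2: load the data (validation_split in model.fit holds out part of it as the test set)
x_train = datasets.load_iris().data
y_train = datasets.load_iris().target

# Shuffle features and labels with the same seed so that they stay paired
np.random.seed(116)
np.random.shuffle(x_train)
np.random.seed(116)
np.random.shuffle(y_train)
tf.random.set_seed(116)

# Step 3: define the model structure
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(3, activation='softmax', kernel_regularizer=tf.keras.regularizers.l2())
])

# Step 4: configure the optimizer, loss function and metric
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])

# Step 5: train; 20% of the data is held out for validation, which runs every 20 epochs
model.fit(x_train, y_train, batch_size=32, epochs=500, validation_split=0.2, validation_freq=20)

# Step 6: print the network structure
model.summary()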
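
The loss and metric named in model.compile both work on integer labels and softmax outputs. The sketch below shows in plain Python what they compute for three samples; the probabilities and labels are made up, and the functions are written from the definitions, not taken from Keras.

```python
import math

# Made-up softmax outputs for three samples (each row sums to 1) and their integer labels.
y_pred = [[0.7, 0.2, 0.1],
          [0.1, 0.3, 0.6],
          [0.3, 0.4, 0.3]]
y_true = [0, 2, 0]

def sparse_categorical_crossentropy(labels, probs):
    # mean over samples of -ln(probability assigned to the true class)
    return sum(-math.log(p[t]) for t, p in zip(labels, probs)) / len(labels)

def sparse_categorical_accuracy(labels, probs):
    # fraction of samples whose highest-probability class equals the label
    hits = sum(1 for t, p in zip(labels, probs) if p.index(max(p)) == t)
    return hits / len(labels)

print("loss:", sparse_categorical_crossentropy(y_true, y_pred))
print("accuracy:", sparse_categorical_accuracy(y_true, y_pred))
```

The "sparse" in both names refers to the labels being plain class indices (0, 1, 2) instead of one-hot vectors, and from_logits=False tells the loss that the model output has already gone through softmax.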

6. Supplement

In step three, the network structure can also be defined as a class.

Code

import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras import Model
from sklearn import datasets
import numpy as np

x_train = datasets.load_iris().data
y_train = datasets.load_iris().target

np.random.seed(116)
np.random.shuffle(x_train)
np.random.seed(116)
np.random.shuffle(y_train)
tf.random.set_seed(116)

class IrisModel(Model):
    def __init__(self):
        super(IrisModel, self).__init__()
        # define the layers of the network
        self.d1 = Dense(3, activation='softmax', kernel_regularizer=tf.keras.regularizers.l2())

    def call(self, x):
        # forward pass: apply the layers to the input
        y = self.d1(x)
        return y

model = IrisModel()

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])

model.fit(x_train, y_train, batch_size=32, epochs=500, validation_split=0.2, validation_freq=20)
model.summary()
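
Both scripts rely on validation_split=0.2 instead of splitting the data by hand. Keras documents that the validation data is taken from the last samples of the arrays passed in, before any shuffling by fit, which is one reason the scripts shuffle the ordered iris data first. A rough sketch of that split in plain Python follows; split_for_validation is a hypothetical helper and the rounding in Keras itself may differ in edge cases.

```python
def split_for_validation(samples, validation_split):
    # The last fraction of the samples, in their current order, is held out.
    n_val = int(len(samples) * validation_split)
    split_at = len(samples) - n_val
    return samples[:split_at], samples[split_at:]

indices = list(range(150))   # stand-ins for the 150 iris rows
train_part, val_part = split_for_validation(indices, validation_split=0.2)
print(len(train_part), len(val_part), val_part[0])
```

With 150 rows this gives the same 120/30 split that the Day 1 script made by slicing, and validation_freq=20 means the held-out part is only evaluated every 20 epochs.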



Summary

I am lazy, so the summary is omitted here. If you have any comments, suggestions or questions, feel free to leave them in the comment area.

Origin blog.csdn.net/qq_45904885/article/details/119386226