[TensorFlow] Using a class to build a neural network structure


Code

import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras import Model
from sklearn import datasets
import numpy as np

x_train = datasets.load_iris().data
y_train = datasets.load_iris().target

np.random.seed(116)
np.random.shuffle(x_train)
np.random.seed(116)
np.random.shuffle(y_train)
tf.random.set_seed(116)
# The only part that differs from the previous post: define a class IrisModel that inherits from TensorFlow's Model class
# Two methods. __init__: the super() call inherits from the parent class (fixed boilerplate syntax); d1 in self.d1 is the name of this network layer
# Every layer defined here is prefixed with self., so the structure blocks defined in __init__ can be used by the call method
# call: applies the layers defined in __init__ to the input x and returns the prediction y
class IrisModel(Model):
    def __init__(self):
        super(IrisModel, self).__init__()
        self.d1 = Dense(3, activation='softmax', kernel_regularizer=tf.keras.regularizers.l2())

    def call(self, x):
        y = self.d1(x)
        return y

model = IrisModel()

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])

model.fit(x_train, y_train, batch_size=32, epochs=500, validation_split=0.2, validation_freq=20)
model.summary()
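
A minimal usage sketch (my own addition, not from the original post): after training, the model can be used to predict a sample, here simply the first row of x_train.

# Illustrative only: predict the class probabilities of a single iris sample
sample = x_train[:1]                       # one sample with 4 features
probs = model.predict(sample)              # softmax output, shape (1, 3)
print(probs, np.argmax(probs, axis=1))     # probabilities and the predicted class index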


from_logits explained

  • 1. Logits appear often in TensorFlow; the name comes from the logistic function and refers to the model's raw, un-activated outputs (note that logistic regression is a classification method, not regression — a regression problem predicts a continuous value).
    from_logits indicates whether the model output has already been passed through a logistic-type function; common ones are Sigmoid and Softmax. In this iris classification example the output layer uses Softmax, so the outputs are already probabilities and from_logits can stay at its default value of False (see the first sketch after this list).
  • 2. Logistic regression is a classification method, not regression; a regression problem predicts a continuous value.
    Regression problems generally use MSE as the loss function, binary classification generally uses BCE (Binary Cross-Entropy),
    and multi-class classification generally uses CCE (Categorical Cross-Entropy). BCE and CCE differ slightly in their formulas but are the same in essence, so CE (cross-entropy) can be used as a collective term (see the second sketch after this list).
    In classification problems the model's prediction output is generally passed through a Sigmoid or Softmax activation first.
  • 3. Common activation functions include Sigmoid, Tanh, ReLU, and Softmax.
    ReLU can generally be used as the nonlinearity after each hidden layer or the output layer.
    If Sigmoid or Tanh is used as a hidden-layer activation, both can cause the vanishing-gradient problem, so ReLU is generally recommended for hidden layers and Sigmoid and Tanh are not.
    Sigmoid is generally used only in the output layer, not in hidden layers: when the network is deep, using Sigmoid as the hidden-layer nonlinearity easily makes the gradient vanish and the network hard to update, whereas ReLU largely avoids this problem. Softmax is likewise generally used in the output layer.
    For example, in a binary-classification logistic regression model a Sigmoid is usually added at the end of the output layer, so the output lies between 0 and 1 and represents the two classes.
    In a multi-class logistic regression model a Softmax is usually added at the end of the output layer, so all outputs sum to 1, i.e. the predicted probabilities of the different classes sum to 1.
    To see why Sigmoid vanishes through multiple hidden layers during backpropagation: the maximum of the Sigmoid derivative is f'(0) = 0.25, so every time the gradient passes back through a Sigmoid hidden layer it is multiplied by at most 0.25.
    With five Sigmoid hidden layers, the Sigmoid derivatives alone scale the gradient by at most 0.25^5, which makes the weights of the first few layers very hard to update (a quick numeric check follows after this list).
    Tanh has a similar problem, which is why most hidden layers use ReLU; ReLU is also much faster to compute than Sigmoid or Tanh.
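
To make point 1 concrete, here is a minimal sketch (my own example, not from the original post) showing that a probabilized Softmax output with from_logits=False and a raw logit output with from_logits=True give the same SparseCategoricalCrossentropy loss:

# Raw model outputs (logits) versus probabilized outputs
logits = tf.constant([[2.0, 1.0, 0.1]])
probs = tf.nn.softmax(logits)
y_true = tf.constant([0])

loss_probs = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)(y_true, probs)
loss_logits = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)(y_true, logits)
print(float(loss_probs), float(loss_logits))   # both are about 0.417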
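
For point 2, a small sketch (again my own example) showing that BCE and CCE compute the same cross-entropy value on a two-class problem, which is why CE works as a collective term:

# Two samples, true classes 0 and 1, predicted P(class 1) = 0.1 and 0.9
bce = tf.keras.losses.BinaryCrossentropy()(
    tf.constant([0.0, 1.0]), tf.constant([0.1, 0.9]))

# The same predictions written as per-class probability distributions
cce = tf.keras.losses.CategoricalCrossentropy()(
    tf.constant([[1.0, 0.0], [0.0, 1.0]]), tf.constant([[0.9, 0.1], [0.1, 0.9]]))

print(float(bce), float(cce))   # both are about 0.105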
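
For point 3, a quick numeric check of the attenuation argument: the Sigmoid derivative peaks at 0.25, so the Sigmoid derivatives of five hidden layers scale the backpropagated gradient by at most 0.25^5:

# Sigmoid derivative: f'(x) = f(x) * (1 - f(x)), maximal at x = 0 where f(0) = 0.5
max_grad = 0.5 * (1 - 0.5)     # = 0.25
print(max_grad ** 5)            # about 0.00098 — roughly a 1000x attenuation over 5 layers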

Summary

This article uses a class to encapsulate the desired network structure, defining each layer in __init__ and applying it in call.

Origin blog.csdn.net/qq_45746168/article/details/128038462