1. Foreword
Earlier we used TensorFlow to build a weather forecast. The results were not very good, but we did get the whole process running end to end. Building on that, this article briefly introduces some of TensorFlow's parameters and explains the basic meaning and usage of each one, based on the API documentation.
2. Build the model again
Let's delete all the redundant code from last time and do a simple round of model training and prediction.
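The listing for this step is not reproduced here. As a rough stand-in, the following is a minimal sketch of what a stripped-down train-and-predict script could look like. It assumes random synthetic data in place of the 'training set.csv' file, and the final Dense(1) output layer, the epoch count, and the batch size are illustrative choices rather than the original settings.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Synthetic stand-in for the weather data (assumption): 200 samples, 5 features.
rng = np.random.default_rng(0)
features = rng.random((200, 5)).astype("float32")
labels = features.sum(axis=1, keepdims=True)  # a simple target to learn

# Build the network by stacking layers in order.
model1 = tf.keras.Sequential()
model1.add(layers.Dense(16))
model1.add(layers.Dense(1))  # one output value per sample (assumed)

# Configure the optimizer and the loss, then train.
model1.compile(optimizer=tf.keras.optimizers.SGD(0.001),
               loss="mean_squared_error")
history = model1.fit(features, labels, epochs=5, batch_size=32, verbose=0)

# Predict for the first three samples.
predictions = model1.predict(features[:3], verbose=0)
print(predictions.shape)
```

The three places where this sketch can be tuned (the layers, the compile call, and the fit call) are exactly the three groups of parameters discussed in the rest of the article.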
3. Parameters that can be modified
The main things we can modify in this code are the network model, the network configuration, and the training method. Let's look at them one by one.
1. Network model construction
1)tf.keras.Sequential()
This statement instantiates a model named model1. Layers are then stacked into model1 in order to build the neural network (the model1.add calls that follow).
2)model1.add(layers.Dense(16))
This statement adds one neural network layer to model1; the kind of layer is defined by layers.Dense.
3)layers.Dense(16)
This statement creates a layer made up of 16 neurons. Dense means "a conventional fully connected NN layer", which is the most common and widely used kind of layer. Besides Dense there are other layer types; the more common ones are (reference link: https://www.bbsmax.com/A/D854PnYW5E/):
Dense layer: fully connected layer.
Activation layer: applies an activation function to the output of a layer.
Dropout layer: applies Dropout to its input. During training, Dropout randomly disconnects input neurons with a given probability (rate) at each parameter update; it is used to prevent overfitting.
Flatten layer: "flattens" the input, that is, turns a multi-dimensional input into a one-dimensional one. It is often used in the transition from convolutional layers to fully connected layers. Flatten does not affect the batch size.
Reshape layer: converts the input to a specified shape.
Permute layer: rearranges the dimensions of the input according to a given pattern, for example when RNN and CNN networks need to be connected. Rearranging here means swapping axes of the input.
.... (there are many others not listed)
At present, we don't need to understand the specific usage of these layers, just know that there are many types of neural network structures. For now, we can just use Dense.
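For the curious, the effect of a few of these layers on the shape of the data can be seen in a short sketch. It assumes TensorFlow 2.x running in eager mode, and the input tensor is an arbitrary made-up batch, not data from this article.

```python
import tensorflow as tf
from tensorflow.keras import layers

# A batch of 2 samples, each shaped (3, 4). The values are arbitrary.
x = tf.reshape(tf.range(24, dtype=tf.float32), (2, 3, 4))

flat = layers.Flatten()(x)            # (2, 3, 4) -> (2, 12); batch size unchanged
reshaped = layers.Reshape((6, 2))(x)  # (2, 3, 4) -> (2, 6, 2)
permuted = layers.Permute((2, 1))(x)  # swap the two non-batch axes -> (2, 4, 3)
dropped = layers.Dropout(0.5)(x, training=False)  # Dropout is inactive outside training

print(flat.shape, reshaped.shape, permuted.shape, dropped.shape)
```

In every case the first dimension (the batch size, 2 here) stays the same; only the per-sample shape changes.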
4) Dense parameters
units | Positive integer, the dimensionality of the output space. It can be understood as the number of neurons.
activation | The activation function to use. If nothing is specified, no activation is applied (i.e. "linear" activation: a(x) = x). Activation functions are very useful, and we will talk about them in detail below.
use_bias | Whether the layer uses a bias vector, that is, whether the output of the Dense layer includes b. The default is True; setting it to False may be necessary in some special scenarios.
kernel_initializer | Sets how the weights of the layer are initialized. The default, glorot_uniform, is the Glorot (Xavier) uniform initializer. There are other initialization methods, which are described later.
bias_initializer | Initializer for the bias vector, that is, how b is initialized. Note that b is also part of the neural network. The default, zeros, sets the initial value to 0. As with kernel_initializer, there are many initialization methods, described later.
kernel_regularizer | Regularizer function applied to the kernel weight matrix. During optimization, a regularizer adds a penalty on the parameters of the layer or on the output of the layer. The purpose is to prevent overfitting.
bias_regularizer | Regularizer function applied to the bias vector. It is the penalty imposed on b, also used to prevent overfitting.
activity_regularizer | Regularizer function applied to the layer output. It is the penalty imposed on the output of the layer (after activation), again to prevent overfitting.
kernel_constraint | Constraint function applied to the kernel weight matrix. Simply put, it limits the parameter values in the neural network to a certain range, again to prevent overfitting, but it is rarely used.
bias_constraint | Constraint function applied to the bias vector. As above, and likewise not commonly used.
5) Dense parameters and the structure of the neural network
Read on its own, the explanation of the Dense parameters above will most likely be confusing, but it becomes much easier with a little understanding of the internal structure of a neural network. Let me try to describe those internals in a few words and see whether that helps.
(It is recommended to read this together with a blog post I wrote a long time ago: https://mbb.eet-china.com/blog/3887969-408491.html)
In simple terms, the neural network is roughly composed of 4 functions:
Propagation function, activation function, backpropagation function, loss function
The core functions to be realized by these 4 functions include:
Forward prediction, output mapping, reverse optimization, loss calculation
Express it with a formula:
Y = w*f(X)+b
Here X is our input and Y is the prediction. We use label_Y for the "label", the Y value that is known to be correct.
Training, then, works like this:
- Start with random values for w and b, then feed in X to calculate Y (forward prediction)
- The calculated Y cannot be used directly; the activation function is applied to it to get the actual Y value (activation function)
- Compare label_Y with Y to get the loss (comparison with the label)
- Optimize w and b according to the loss (backpropagation)
After feeding in a large number of X and label_Y pairs, w and b are adjusted to more suitable values. These values of w and b are our trained model: when we need a prediction, we input an X1 and get a suitable Y1.
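The training steps above can be sketched in a few lines of plain Python. This is a toy illustration under simplifying assumptions, not what TensorFlow does internally: there is a single w and a single b, no activation function (a linear output), the loss is mean squared error, and the data, learning rate, and step count are made up for the example.

```python
# Toy data generated from y = 2*x + 1, so training should recover w near 2, b near 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
label_ys = [2.0 * x + 1.0 for x in xs]

w, b = 0.0, 0.0          # initial guess (a real framework would randomize w)
learning_rate = 0.05

for step in range(2000):
    # Forward prediction: compute Y from the current w and b.
    ys = [w * x + b for x in xs]
    # Loss: mean squared error between the predictions and the labels.
    errors = [y - t for y, t in zip(ys, label_ys)]
    loss = sum(e * e for e in errors) / len(xs)
    # Backpropagation: gradients of the loss with respect to w and b.
    grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = sum(2 * e for e in errors) / len(xs)
    # Optimization: move w and b a small step against the gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b, loss)  # w ends up close to 2, b close to 1, loss close to 0
```

The final w and b are the "trained model": plugging a new X1 into w * X1 + b gives the prediction Y1.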
So, by analogy:
- features = pd.read_csv('training set.csv') imports the training data X
- model1 = tf.keras.Sequential(); model1.add(layers.Dense(16)) creates and initializes w and b. Note that w and b are not single numbers but sets of numbers.
- labels_avg = np.array(features[avg]) stores the "label", that is, label_Y
- The training performed by model1.fit() is propagation function + activation function + comparison with the label + backpropagation
- The purpose of backpropagation is to update w and b (optimizing the parameters).
Then let's take a look at the parameters of Dense to understand it better.
units | Sets the number of neurons, and with it part of the shape of w and b. The shape of w is determined by the number of neurons together with the size of the input to the layer (the number of input features, or the number of neurons in the previous layer); b has one value per neuron.
activation | Sets the activation function. After Y is calculated through w*f(X)+b, the activation function is a further operation performed on Y; this parameter selects which operation is used.
use_bias | Sets whether b is used. If not, the formula becomes w*f(X).
kernel_initializer | Sets the initial values of w, for example all 0, random numbers, or random numbers drawn from a normal distribution.
bias_initializer | Sets the initial values of b.
kernel_regularizer | Sets the regularization function applied to w. To keep w from overfitting (being tuned too closely to the training data), a penalty on the values of w is added to the loss during optimization, which discourages extreme values.
bias_regularizer | Sets the regularization function applied to b. It prevents overfitting on the same principle as above.
activity_regularizer | Sets the regularization function applied to Y. It prevents overfitting on the same principle as above.
kernel_constraint | Limits the values of w to a certain range, again to prevent overfitting. Rarely used.
bias_constraint | Limits the values of b to a certain range. As above, not commonly used.
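To make the link between these parameters and w and b concrete, here is a plain-Python sketch of a single fully connected layer. It assumes the common convention in which the activation is applied to the weighted sum plus the bias, and the helper name dense_forward, the sizes, and the input values are all made up for illustration.

```python
import random

def dense_forward(x, w, b=None, activation=None):
    """Compute one fully connected layer: activation(x*w + b) per neuron."""
    units = len(w[0])
    out = []
    for j in range(units):
        total = sum(x[i] * w[i][j] for i in range(len(x)))
        if b is not None:            # use_bias=True adds b[j]
            total += b[j]
        if activation is not None:   # activation=None means "linear"
            total = activation(total)
        out.append(total)
    return out

def relu(v):
    return max(0.0, v)

n_inputs, units = 3, 4
random.seed(0)
# Plays the role of kernel_initializer: uniform random values in [-0.5, 0.5].
w = [[random.uniform(-0.5, 0.5) for _ in range(units)] for _ in range(n_inputs)]
# Plays the role of bias_initializer='zeros'.
b = [0.0] * units

x = [1.0, 2.0, 3.0]
y = dense_forward(x, w, b, activation=relu)
y_no_bias = dense_forward(x, w, None, activation=relu)  # like use_bias=False

print(len(w), len(w[0]), len(b), len(y))
```

With 3 inputs and units = 4, w holds 3 x 4 numbers and b holds 4, which is why changing units changes how many parameters the layer has to learn.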
6) Optional parameters of Dense
units (number of neurons) | Positive integer. For example, 10 means this layer uses 10 neurons. Usage: model1.add(layers.Dense(10))
activation (activation function) | Usage: model1.add(layers.Dense(10, activation='relu')). Options: activation='softmax', activation='softplus', activation='softsign', activation='relu', activation='tanh', activation='sigmoid', activation='hard_sigmoid', activation='linear'. There is little point in memorizing what each one means; just try them one by one. The most commonly used is relu.
use_bias (whether to use b) | The default is True. Usage: model1.add(layers.Dense(10, use_bias=True))
kernel_initializer (initial value of w) | Usage: model1.add(layers.Dense(10, kernel_initializer='glorot_uniform')). Options: 'zeros' (all 0), 'ones' (all 1), 'constant' (a fixed value), 'random_uniform' (uniform distribution), 'random_normal' (normal distribution), 'truncated_normal' (truncated normal distribution), 'identity' (identity matrix), 'orthogonal' (orthogonal matrix), 'glorot_normal' (Glorot normal initialization, i.e. Xavier normal), 'glorot_uniform' (Glorot uniform initialization). Apart from all 0 and all 1, I don't really understand the rest; just fill them in and try them out.
bias_initializer (initial value of b) | Usage: model1.add(layers.Dense(10, bias_initializer='zeros')). The available options are the same as for kernel_initializer.
kernel_regularizer (regularization applied to w) | Usage: model1.add(layers.Dense(10, kernel_regularizer=regularizers.l2(0.01))). Options: regularizers.l2(0.01), regularizers.l1(0.01).
bias_regularizer (regularization applied to b) | Same as above.
activity_regularizer (regularization applied to Y) | Same as above.
kernel_constraint (limit on the values of w) | Rarely used, so not covered here.
bias_constraint (limit on the values of b) | Rarely used, so not covered here.
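Several of these options can be combined in a single Dense call. The sketch below is one possible combination, not a recommended configuration; the input size of 5 features, given through tf.keras.Input, is an assumption made so that the weights are created immediately and can be inspected.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

model1 = tf.keras.Sequential()
model1.add(tf.keras.Input(shape=(5,)))  # assumed: 5 input features
model1.add(layers.Dense(
    10,                                        # units: 10 neurons
    activation="relu",                         # operation applied to the layer output
    use_bias=True,                             # keep b
    kernel_initializer="random_uniform",       # initial values of w
    bias_initializer="zeros",                  # initial values of b
    kernel_regularizer=regularizers.l2(0.01),  # penalty on w
))
model1.add(layers.Dense(1))

# w has shape (inputs, units); b has one value per neuron.
kernel, bias = model1.layers[0].get_weights()
print(kernel.shape, bias.shape)
```

Printing the shapes confirms the earlier point: w is a 5 x 10 set of numbers and b is a set of 10 numbers.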
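7) Examples of modifying Dense parameters
A. No extra parameters
Training result:
B. Add the relu activation function
Training result: looks about the same
C. Add the softmax activation function
Training result: worse. So the activation function has to be chosen to suit the task, not picked blindly.
D. Do not use b
Training result: completely unusable
E. Set the initial value of w to random_uniform
Training result: no obvious effect either
You can try the rest yourself; I'm tired.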
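A likely reason why softmax did so badly in example C can be seen by computing a few activation functions by hand. The sketch below uses plain Python and made-up raw values; it is an illustration of the formulas, not the TensorFlow implementation.

```python
import math

def relu(v):
    return max(0.0, v)

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def softmax(values):
    # Subtract the maximum first for numerical stability.
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

raw = [-2.0, 0.5, 30.0]  # made-up raw layer outputs; 30.0 could be a temperature

relu_out = [relu(v) for v in raw]        # negatives become 0, the rest pass through
sigmoid_out = [sigmoid(v) for v in raw]  # each value squeezed into (0, 1)
softmax_out = softmax(raw)               # values turned into shares that sum to 1

print(relu_out)
print(sigmoid_out)
print(softmax_out, sum(softmax_out))
```

relu keeps a value like 30.0 intact, while softmax turns the outputs into probabilities between 0 and 1. That suits classification, but a layer with softmax cannot pass a number like a temperature through unchanged.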
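2. Network configuration
1)model1.compile()
This sets the optimizer, the loss function, and the evaluation metrics.
optimizer (the optimizer) | Usage: model1.compile(optimizer=tf.keras.optimizers.SGD(0.001), loss='mean_squared_error'). Options: optimizers.SGD(0.001), optimizers.Adagrad(0.001), optimizers.Adadelta(0.001), optimizers.Adam(0.001). Here 0.001 is the learning rate. Simply put, this value is the size of the step taken each time w and b are optimized; neither too large nor too small works well.
loss (loss function) | Usage: model1.compile(optimizer=tf.keras.optimizers.SGD(0.001), loss='mean_squared_error'). Options: loss='mean_squared_error' (mean squared error), loss='mean_absolute_error' (mean absolute error), loss='mean_absolute_percentage_error' (mean absolute percentage error), loss='mean_squared_logarithmic_error' (mean squared logarithmic error), loss='kullback_leibler_divergence' (Kullback-Leibler divergence). The purpose of the loss function is to compute the quantity that the model should seek to minimize during training.
metrics (evaluation function) | Usage: model1.compile(optimizer=tf.keras.optimizers.SGD(0.001), loss='mean_squared_error', metrics=['accuracy']). Options: metrics=['accuracy'], metrics=['sparse_accuracy'], metrics=['sparse_categorical_accuracy']. The evaluation function is used to assess the performance of the model being trained. Note that accuracy-style metrics are meant for classification; for a regression task such as temperature prediction, an error metric like 'mean_absolute_error' is more informative.
loss_weights, sample_weight_mode, weighted_metrics, target_tensors | These 4 parameters are used very rarely and I could not find much material on them, so let's ignore them.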
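Why the learning rate should be neither too large nor too small can be shown with a tiny plain-Python experiment. It is a sketch under simple assumptions: plain gradient descent on the made-up loss w**2 (minimum at w = 0), with a hypothetical helper named minimize, rather than any of the Keras optimizers.

```python
def minimize(learning_rate, steps=20, w=1.0):
    """Run gradient descent on loss(w) = w**2, whose minimum is at w = 0."""
    for _ in range(steps):
        grad = 2.0 * w             # derivative of w**2
        w -= learning_rate * grad  # the kind of update an optimizer performs
    return w

too_small = minimize(0.001)  # barely moves away from the start value 1.0
reasonable = minimize(0.1)   # gets close to the minimum at 0
too_large = minimize(1.5)    # overshoots further on every step and blows up

print(too_small, reasonable, too_large)
```

After the same 20 steps, the small rate has hardly made progress, the middle one is near the minimum, and the large one has moved far away from it.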
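2) Examples of modifying compile parameters
A. SGD
Training result:
B. Adagrad
Training result:
C. Adadelta
Training result:
D. Adam
Training result:
We won't demonstrate the loss options; feel free to experiment with them yourself.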
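For reference, the first four loss options listed earlier can be written out by hand. This is a plain-Python sketch of the usual textbook formulas on made-up numbers; the Keras versions add safeguards (such as a small epsilon against division by zero) that are left out here.

```python
import math

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_percentage_error(y_true, y_pred):
    # Assumes no label is zero.
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_logarithmic_error(y_true, y_pred):
    # Assumes non-negative values.
    return sum((math.log1p(t) - math.log1p(p)) ** 2
               for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [20.0, 25.0, 30.0]  # made-up labels
y_pred = [22.0, 24.0, 27.0]  # made-up predictions

mse = mean_squared_error(y_true, y_pred)               # (4 + 1 + 9) / 3
mae = mean_absolute_error(y_true, y_pred)              # (2 + 1 + 3) / 3
mape = mean_absolute_percentage_error(y_true, y_pred)  # average error in percent
msle = mean_squared_logarithmic_error(y_true, y_pred)

print(mse, mae, mape, msle)
```

All four measure how far the predictions are from the labels; they differ in how strongly large errors and large values are weighted.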
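3. Training function configuration
x | Input data.
y | Labels, that is, the correct answers in the training data.
batch_size | The number of samples used in each training step, usually set to 32, 64, 128 or similar. For example, with 10000 samples and batch_size set to 64, the data is split into 157 groups (156 full groups of 64 plus a final group of 16), and each group is fed in and trained on together. The main purpose is to speed up training; going through the samples one by one would take far too long.
epochs | The total number of training passes. For example, if we set 50, the input data is run through the model 50 times. Within each pass, w and b are optimized once per batch.
verbose | Log display: 0 prints no log output to standard output, 1 shows a progress bar, 2 prints one line per epoch.
callbacks | Callback functions, for tasks we want to run at the end of training, an epoch, or a batch, such as saving model checkpoints or writing logs. We don't need them for now, so no more on this.
validation_split | A float between 0 and 1 that specifies the fraction of the training set to be used as the validation set.
validation_data | A tuple of the form (X, y) that specifies the validation set. This parameter overrides validation_split: instead of using part of the training data as the validation set, the validation set is supplied separately from outside. The validation set actually has no effect on training; it just makes it easy to see how well training is going.
shuffle | Whether to randomly shuffle the order of the input samples during training.
class_weight, sample_weight, initial_epoch | Rarely used.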
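The arithmetic behind batch_size, epochs, and validation_split is easy to check in plain Python. The sketch uses the numbers from the batch_size example (10000 samples, batches of 64, 50 epochs); the validation_split value of 0.2 is an assumed example. With 10000 samples and batches of 64 there are 156 full batches plus one final batch of 16 samples.

```python
import math

n_samples, batch_size, epochs = 10000, 64, 50

# Batches per epoch: 156 full batches plus one smaller final batch.
steps_per_epoch = math.ceil(n_samples / batch_size)
last_batch_size = n_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs  # w and b are updated once per batch

# validation_split=0.2 holds out the last 20% of the samples for validation.
validation_split = 0.2
n_train = int(n_samples * (1 - validation_split))
n_val = n_samples - n_train
steps_with_validation = math.ceil(n_train / batch_size)

print(steps_per_epoch, last_batch_size, total_updates)  # 157 16 7850
print(n_train, n_val, steps_with_validation)            # 8000 2000 125
```

So 50 epochs means far more than 50 optimizations of w and b, and setting validation_split reduces the number of samples (and batches) actually used for training.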
4. Saving and loading the model
Once training is done, how do we save the model? It is very simple.
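The original listing for this step is not reproduced here. One common way to do it is model.save() together with tf.keras.models.load_model(), sketched below under some assumptions: a small stand-in model replaces the trained model1, the file name model1.keras is hypothetical, and a reasonably recent TensorFlow 2.x is assumed for the .keras file format.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# A small stand-in model (assumption); in the article this would be the trained model1.
model1 = tf.keras.Sequential()
model1.add(tf.keras.Input(shape=(5,)))
model1.add(layers.Dense(16, activation="relu"))
model1.add(layers.Dense(1))
model1.compile(optimizer=tf.keras.optimizers.Adam(0.001),
               loss="mean_squared_error")

sample = np.ones((2, 5), dtype="float32")
before = model1.predict(sample, verbose=0)

# Save the model to disk, then load it back as a new model object.
path = os.path.join(tempfile.mkdtemp(), "model1.keras")  # hypothetical file name
model1.save(path)
restored = tf.keras.models.load_model(path)

# The restored model should give the same predictions as the original.
after = restored.predict(sample, verbose=0)
print(np.allclose(before, after))
```

Once loaded, the restored model can be used for prediction directly, without training again.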
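5. Review
In this article we briefly introduced the main methods and parameters used in TensorFlow; you can try swapping them out one by one yourself. We also demonstrated how to save and load a model.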