Deep Learning with Grandpa 2: Basic Usage of TensorFlow

1. Foreword

Earlier we used TensorFlow to make a weather forecast. Although the results were not great, we ran through the whole process. In this article, building on that, we briefly introduce TensorFlow's main parameters and use the interface files to get a feel for what each parameter means and how it is used.

2. Building the model again

Let's delete all the redundant code from before and do a simple round of model training and prediction.
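A minimal sketch of such a run, assuming the weather-style CSV from the previous article (the file name, the avg label column, and the layer sizes here are illustrative assumptions):

    import numpy as np
    import pandas as pd
    import tensorflow as tf
    from tensorflow.keras import layers

    # Load the training data; 'avg' is assumed to be the label column.
    features = pd.read_csv('training set.csv')
    labels_avg = np.array(features['avg'])
    inputs = np.array(features.drop('avg', axis=1))

    # Build the network: one 16-neuron hidden layer, one output value.
    model1 = tf.keras.Sequential()
    model1.add(layers.Dense(16))
    model1.add(layers.Dense(1))

    # Configure the optimizer and loss, then train.
    model1.compile(optimizer=tf.keras.optimizers.SGD(0.001),
                   loss='mean_squared_error')
    model1.fit(inputs, labels_avg, epochs=50, batch_size=64)

    # Predict to see the trained model in action.
    predictions = model1.predict(inputs)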

 

 

 

3. Parameters that can be modified

The main things we can modify in this code are: the network model itself, the network configuration, and the training method. Let's look at them one by one.

 

1. Network model construction

1)tf.keras.Sequential()

This line instantiates a model, model1; layers are then stacked into model1 in order to build the neural network (the model1.add calls later).

2)model1.add(layers.Dense(16))

This line adds one neural-network layer to model1; what kind of layer it is gets defined by layers.Dense.

3)layers.Dense(16)

This creates a layer made up of 16 neurons, where Dense means "a regular fully connected NN layer", which is also the most conventional and commonly used kind of layer. Since there is Dense, there are naturally other layer types as well; the more common ones are (reference link: https://www.bbsmax.com/A/D854PnYW5E/):

Dense layer: fully connected layer

Activation layer: The activation layer applies an activation function to the output of a layer

Dropout layer: Dropout is applied to the input data. Dropout will randomly disconnect the input neurons with a certain probability (rate) every time the parameters are updated during the training process, and the Dropout layer is used to prevent overfitting.

Flatten layer: The Flatten layer is used to "flatten" the input, that is, to make the multi-dimensional input one-dimensional, and is often used in the transition from the convolutional layer to the fully connected layer. Flatten does not affect the batch size.

Reshape layer: The Reshape layer is used to convert the input shape to a specific shape

Permute layer: The Permute layer rearranges the input dimensions according to a given pattern. For example, this layer may be used when an RNN and a CNN need to be connected. The so-called rearrangement is, for instance, swapping two axes.

.... (there are many others not listed)

At present, we don't need to learn the specific usage of each of these layers; just know that many types of neural-network layers exist. For now, Dense is enough for us. A toy example of stacking a few of them follows.
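A sketch of how these layers stack (not from the original post; the layer sizes are arbitrary, and Reshape((4, 4)) assumes the 16 outputs of the preceding Dense layer):

    import tensorflow as tf
    from tensorflow.keras import layers

    model = tf.keras.Sequential()
    model.add(layers.Dense(16))            # fully connected layer
    model.add(layers.Activation('relu'))   # apply an activation to the previous output
    model.add(layers.Dropout(0.5))         # randomly drop half the inputs while training
    model.add(layers.Reshape((4, 4)))      # reshape the 16 values into 4x4
    model.add(layers.Flatten())            # flatten back to one dimension
    model.add(layers.Dense(1))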

4) Dense parameters
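For reference, the Dense constructor and its default values look like this, matching the parameter list below:

    tf.keras.layers.Dense(
        units,                               # number of neurons
        activation=None,
        use_bias=True,
        kernel_initializer='glorot_uniform',
        bias_initializer='zeros',
        kernel_regularizer=None,
        bias_regularizer=None,
        activity_regularizer=None,
        kernel_constraint=None,
        bias_constraint=None,
    )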

 

units

A positive integer: the dimensionality of the output space. It can be understood as the number of neurons.

activation

The activation function to use. If nothing is specified, no activation is applied (i.e. "linear" activation: a(x) = x). Activation functions are very useful, and we will discuss them in detail below.

use_bias

Whether the layer uses a bias vector, i.e., whether the layer's output includes the b term. The default is True; some special scenarios may require setting it to False.

kernel_initializer

This sets how the weights inside the layer are initialized. The default, glorot_uniform, is the Glorot uniform-distribution initializer. There are naturally other initialization methods, described in detail later.

bias_initializer

Initializer for the bias vector, i.e., how b is initialized; this b is also part of the network's parameters. The default, zeros, initializes it to 0. Like kernel_initializer it has many options, described in detail later.

kernel_regularizer

Regularizer function to apply to the kernel weight matrix. The regularization term adds a penalty term to the parameters of the layer or the activation value of the layer during the optimization process. The purpose is to prevent overfitting.

bias_regularizer

Regularizer function to apply to the bias vector. It is the penalty term imposed on b, which is also used to prevent overfitting.

activity_regularizer

Regularizer function to apply to the layer's output. It is a penalty term imposed on the output (the activated Y), again to prevent overfitting.

kernel_constraint

Constraint function to apply to the kernel weight matrix. Simply put, it limits the parameter values in the network to a certain range, also to prevent overfitting, but it is rarely used.

bias_constraint

Constraint function to apply to the bias vector. As above, it is still not commonly used.

5) Dense parameters and the structure of the neural network

If you only read the explanations of the Dense parameters above, you will most likely be confused, but with a little understanding of the internal structure of a neural network it becomes much easier. Let me try to describe the internals in a few words and see whether that helps.

(It is recommended to read this together with a blog post I wrote a while ago: https://mbb.eet-china.com/blog/3887969-408491.html)

In simple terms, the neural network is roughly composed of 4 functions:

Propagation function, activation function, backpropagation function, loss function

The core functions to be realized by these 4 functions include:

Forward prediction, output mapping, reverse optimization, loss calculation

Express it with a formula:

Y = f(w·X + b)

Here X is our input, Y is the prediction result, and f is the activation function; we use label_Y to denote the "label", i.e., the Y value that should theoretically come out.

So the so-called training is:

  1. Randomly assign values to w and b, then input X and compute w·X + b (forward prediction)
  2. The raw result cannot be used directly; we apply the activation function f to it to get the final Y (activation function)
  3. Compare label_Y with Y to get the loss (comparison with the label)
  4. Optimize w and b according to the loss (backpropagation)

When we feed in a large number of X and label_Y values, w and b get adjusted to more appropriate values. Those values of w and b are our trained model, so that when we need to predict, we can input some X1 and get a reasonable Y1.
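To make the four steps concrete, here is a toy sketch of a single training iteration in plain NumPy (an illustration of the idea only, not what TensorFlow literally executes; all the numbers are made up):

    import numpy as np

    def relu(z):
        return np.maximum(0, z)

    # Step 0: random initial w and b (what the initializers do).
    w = np.random.uniform(-1, 1, size=1)
    b = np.zeros(1)
    learning_rate = 0.001

    X = np.array([2.0])
    label_Y = np.array([5.0])

    z = w * X + b                       # 1. forward prediction
    Y = relu(z)                         # 2. activation function
    loss = np.mean((Y - label_Y) ** 2)  # 3. loss against the label
    grad_Y = 2 * (Y - label_Y)          # 4. backpropagation:
    grad_z = grad_Y * (z > 0)           #    chain rule through relu
    w = w - learning_rate * grad_z * X  #    update w...
    b = b - learning_rate * grad_z      #    ...and b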

 

So, by analogy:

  1. features = pd.read_csv('training set.csv') imports the training data X
  2. model1 = tf.keras.Sequential(); model1.add(layers.Dense(16)) generates and initializes w and b. Note that w and b are not single numbers but arrays of numbers.
  3. labels_avg = np.array(features['avg']) stores the "label", i.e., label_Y
  4. The training performed by model1.fit() is the propagation function + activation function + comparison with the label + backpropagation
  5. The purpose of backpropagation is to update w and b (parameter optimization).

Now let's look at the parameters of Dense again; they should be easier to understand.

units

Sets the number of neurons, and with it (in part) the dimensions of w and b; the dimensions of w and b are determined jointly by the number of neurons and the number of layers.

activation

Sets the activation function. After w·X + b is computed, the activation function f is the operation applied to that result; this parameter chooses which operation is used.

use_bias

Sets whether to use b; if not, the formula becomes Y = f(w·X)

kernel_initializer

Sets how w is initialized, for example all zeros, random numbers, or random numbers drawn from a normal distribution.

bias_initializer

Set the initialization value of b

kernel_regularizer

Sets the regularization function applied to w. After each round of prediction and optimization, a penalty term is applied so that w does not overfit (i.e., so its values are not tuned too tightly to the training data).

bias_regularizer

Sets the regularization function to apply to b. To prevent overfitting, the principle is the same as above.

activity_regularizer

Sets the regularization function to apply to Y. To prevent overfitting, the principle is the same as above.

kernel_constraint

Limits the value of w to a certain range, again to prevent overfitting; rarely used.

bias_constraint

Limits the value of b to a certain range; as above, not commonly used.

6) Option values for the Dense parameters

units

number of neurons

A positive integer. For example, 10 means this layer uses 10 neurons.

Usage: model1.add(layers.Dense(10))

activation

activation function

Usage: model1.add(layers.Dense(10,activation='relu'))

activation='softmax'

activation='softplus'

activation='softsign'

activation='relu'

activation='tanh'

activation='sigmoid'

activation='hard_sigmoid'

activation='linear'

It's not much use memorizing what each of these means; you can try them one by one. The most commonly used is relu.
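If you do want to compare them empirically, a sketch that loops over the options (the model shape is a placeholder, and the fit call assumes the inputs and labels_avg from the earlier sketch):

    import tensorflow as tf
    from tensorflow.keras import layers

    for act in ['relu', 'softmax', 'softplus', 'softsign',
                'tanh', 'sigmoid', 'hard_sigmoid', 'linear']:
        model = tf.keras.Sequential([
            layers.Dense(16, activation=act),
            layers.Dense(1),
        ])
        model.compile(optimizer=tf.keras.optimizers.SGD(0.001),
                      loss='mean_squared_error')
        # model.fit(inputs, labels_avg, epochs=50)  # plug in your data here
        print(act, 'model built')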

use_bias

whether to use b

The default is True

Usage: model1.add(layers.Dense(10,use_bias=True))

kernel_initializer

initial value of w

Usage: model1.add(layers.Dense(10,kernel_initializer='glorot_uniform'))

kernel_initializer='zeros': all-zero initialization

kernel_initializer='ones': all-one initialization

kernel_initializer='constant': initialize to a fixed value

kernel_initializer='random_uniform': uniform distribution

kernel_initializer='random_normal': normal distribution

kernel_initializer='truncated_normal': truncated normal distribution

kernel_initializer='identity': identity matrix

kernel_initializer='orthogonal': orthogonal matrix

kernel_initializer='glorot_normal': Glorot normal initialization, i.e., Xavier

kernel_initializer='glorot_uniform': Glorot uniform distribution

Apart from all-zeros and all-ones I don't really understand the rest; anyway, you can fill one in and try it out.

bias_initializer

the initialization value of b

Usage: model1.add(layers.Dense(10,bias_initializer='zeros'))

The options are the same as for kernel_initializer

kernel_regularizer

Regularization applied to w

Usage: model1.add(layers.Dense(10,kernel_regularizer=regularizers.l2(0.01)))

kernel_regularizer=regularizers.l2(0.01)

kernel_regularizer=regularizers.l1(0.01)
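Note that regularizers has to be imported for the usage above to run; a complete sketch:

    import tensorflow as tf
    from tensorflow.keras import layers, regularizers

    model1 = tf.keras.Sequential()
    # Penalize large weights with an L2 term (0.01 is the penalty strength).
    model1.add(layers.Dense(10, kernel_regularizer=regularizers.l2(0.01)))
    # L1 regularization is the other common choice.
    model1.add(layers.Dense(10, kernel_regularizer=regularizers.l1(0.01)))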

bias_regularizer

Regularization applied to b

Same as above

activity_regularizer

Regularization applied to Y

Same as above

kernel_constraint

Limits the value of w

Generally not used, so I won't go into it

bias_constraint

Limits the value of b

Generally not used, so I won't go into it

7) Examples of modifying Dense parameters

A. With no extra parameters

Training result:

B. Adding the relu activation function

Training result: the difference looks small

C. Adding the softmax activation function

Training result: even worse. So the activation function has to be chosen appropriately, not picked blindly.

D. Without b

Training result: completely unusable

E. Setting the initializer of w to random_uniform

Training result: no obvious effect either

You can try the rest yourselves; I'm tired.

2. Configuring the network

1)model1.compile()

Its job is to set the optimizer, the loss function, and the accuracy metric.

 

 

optimizer

Sets the optimizer

Usage:

model1.compile(
    optimizer=tf.keras.optimizers.SGD(0.001),
    loss='mean_squared_error'
)

optimizers.SGD(0.001)

optimizers.Adagrad(0.001)

optimizers.Adadelta(0.001)

optimizers.Adam(0.001)

Here 0.001 is the learning rate. Simply put, this number is the step size by which w and b are adjusted in each optimization step; too large or too small are both inappropriate.
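In sketch form, each plain SGD update is roughly the following (a simplification; real optimizers like Adam add momentum and per-parameter scaling on top):

    def sgd_step(param, grad, learning_rate=0.001):
        # The learning rate scales how far each step moves the parameter.
        return param - learning_rate * grad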

loss

Loss function

Usage:

model1.compile(
    optimizer=tf.keras.optimizers.SGD(0.001),
    loss='mean_squared_error'
)

loss='mean_squared_error' # mean squared error

loss='mean_absolute_error' # mean absolute error

loss='mean_absolute_percentage_error' # mean absolute percentage error

loss='mean_squared_logarithmic_error' # mean squared logarithmic error

loss='kullback_leibler_divergence' # KL divergence

The purpose of the loss function is to compute the quantity that the model should seek to minimize during training.
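For example, mean squared error is just the averaged squared gap between predictions and labels; a sketch of computing it by hand:

    import numpy as np

    def mean_squared_error(label_Y, Y):
        # The quantity training tries to minimize: average squared
        # difference between the labels and the predictions.
        return np.mean((np.array(label_Y) - np.array(Y)) ** 2)

    print(mean_squared_error([1.0, 2.0], [1.5, 1.0]))  # 0.625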

metrics

Metric function

Usage:

model1.compile(
    optimizer=tf.keras.optimizers.SGD(0.001),
    loss='mean_squared_error',
    metrics=['accuracy']
)

metrics=['accuracy']

metrics=['sparse_accuracy']

metrics=['sparse_categorical_accuracy']

Metric functions are used to evaluate the performance of the model currently being trained.

loss_weights

sample_weight_mode

weighted_metrics

target_tensors

These four parameters are rarely used and there is little material about them, so we will skip them.

2) Examples of modifying compile parameters

A. SGD

Training result:

B. Adagrad

Training result:

C. Adadelta

Training result:

D. Adam

Training result:

I won't demonstrate the loss options; feel free to change them and play around.

3. Configuring the training function

 

x

The input data

y

The labels, i.e., the correct answers in the training data

batch_size

The amount of data used per training step, usually set to something like 32, 64, or 128. For example, with 10,000 samples and batch_size set to 64, the data is split into about 157 batches, and each batch is fed in and trained together. The main point is to speed training up; going one sample at a time would take far too long.

epochs

The total number of training passes. If we set 50, the input data is run through the model 50 times, i.e., the model is optimized over 50 passes.

verbose

Log display: 0 outputs no log information to the standard output stream, 1 outputs a progress bar, and 2 outputs one line per epoch.

callbacks

Callback functions, for tasks we may want to run at the end of each training step/epoch/batch, such as checkpointing the model or writing logs. We don't need them yet, so I won't go into detail.

validation_split

A float between 0 and 1, specifying the fraction of the training data to set aside as a validation set.

validation_data

A tuple of the form (X, y) used as an explicit validation set. This parameter overrides validation_split; instead of carving the validation set out of the training data, it is supplied separately. The validation set does not affect training; it just lets you see how well training is going.

shuffle

Whether to randomly shuffle the order of the input samples during training.

class_weight

sample_weight

initial_epoch

These three parameters are rarely used, so we will skip them.
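Putting the common parameters together, a sketch of a fit() call (inputs and labels_avg are assumed from the earlier sketch):

    history = model1.fit(
        inputs, labels_avg,        # x and y
        batch_size=64,             # train 64 samples at a time
        epochs=50,                 # go through the whole dataset 50 times
        verbose=1,                 # show a progress bar
        validation_split=0.2,      # hold out 20% of the data for validation
        shuffle=True,              # shuffle samples each epoch
    )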

4. Saving and loading the model

Once we have trained a model, how do we save it? It's very simple.
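Assuming the model1 and inputs from earlier, the standard Keras calls look like this (the file name is an arbitrary choice):

    # Save the trained model (structure + weights) in one file.
    model1.save('model1.h5')

    # Load it back later and predict without retraining.
    model2 = tf.keras.models.load_model('model1.h5')
    predictions = model2.predict(inputs)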

 

5. Review

In this article we gave a brief introduction to the main methods and parameters used in TensorFlow; you can swap them in and out one by one and experiment. We also demonstrated saving and loading a model.
