Deep learning best practices

  • This article was first published on the public account RAIS; you are welcome to follow it.

Best practices are, by definition, the best ways of doing something. Of course, "best" means best in the vast majority of cases, not one hundred percent of them; we do not need to get tangled up in that. What we need to remember is this: the methods below are very good practices in real deep learning work.

Callback mechanism

If you have read this far, I have reason to believe that you are a programmer with some development experience who understands program design; in that case, callbacks are certainly no strangers to you. A callback is similar to the observer design pattern: I hand you a task to carry out and then go on with my own work, and when you finish running, whether the result is good or bad, you must tell me the result. That is what a callback means.

In our previous deep learning examples there was always a parameter called epochs, the number of training iterations of the network model. At the beginning we would give it a large value, so that in the process of training the network and tuning its parameters the model was guaranteed to eventually reach an over-fitting state; only that way could we learn under what circumstances the network was optimal. In most of the earlier examples this was reasonable and workable, because our networks trained quickly and the data sets were not so large that training could not finish in an acceptable time, so it did not affect anything. But there are exceptions, such as the recurrent neural networks mentioned earlier, whose training time may make you a bit impatient. This exposes the problem, and we need to solve it.

We could inspect the data from the training process, find where over-fitting starts, set the number of epochs to that critical point, and retrain the network. Can that idea be improved so that the training process judges for itself when over-fitting begins (or when the monitored data stops changing significantly) and then stops automatically? Naturally, a callback is the best way to handle this, and nearly every deep learning framework now provides such a mechanism. Specifically, how is it done? Sticking with Keras as our example, here are three common callbacks; there are subtle differences between frameworks, so pay attention when using them:

# Define which callbacks we need
callbacks_list = [
    # Stop training when the monitored metric no longer improves
    keras.callbacks.EarlyStopping(
        monitor='acc',
        patience=1,
    ),
    # Save the model after every epoch;
    # monitor: if the monitored metric has not improved, do not save
    # (the direction is inferred automatically; it can also be set with the mode argument)
    # With save_best_only=True:
    #   val_loss: 'min' is inferred
    #   val_acc: 'max' is inferred
    keras.callbacks.ModelCheckpoint(
        filepath='point_model.h5',
        monitor='val_loss',
        save_best_only=True,
    ),
    # Reduce the learning rate when the monitored metric stops improving
    keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.1,    # shrink the learning rate 10x; 2x or 10x are common values
        patience=10,   # after 10 epochs with no improvement
    ),
    # You can, of course, also define your own as subclasses of keras.callbacks.Callback
]
# x_train / y_train / x_val / y_val are placeholders for your own data;
# validation data is needed because val_loss is being monitored
model.fit(x_train, y_train,
          epochs=100,
          validation_data=(x_val, y_val),
          callbacks=callbacks_list)
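
As the last comment above hints, you can also write your own callback by subclassing keras.callbacks.Callback. A minimal sketch (the class name and what it prints are my own illustrative choices, not from the original article):

import keras

# A tiny custom callback: print the loss at the end of every epoch
class LossLogger(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        print('epoch %d done, loss: %.4f' % (epoch + 1, logs.get('loss', float('nan'))))

# Use it exactly like the built-in callbacks:
# model.fit(..., callbacks=[LossLogger()])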

TensorBoard

We mentioned TensorBoard in the earlier article on handwritten-digit recognition with neural networks. There we said that we know how to train a network model, but we do not know exactly how that network runs; for that we need TensorBoard to visualize it, and we can view the result in the browser, similar to this:

[Figure: TensorBoard dashboard views in the browser]

The code is almost the same as the callback code above:

import keras

callback_list = [
    keras.callbacks.TensorBoard(
        log_dir='./logs',
        histogram_freq=1,    # record activation histograms every epoch
        embeddings_freq=1,   # record embedding data every epoch
    )
]
model.fit(train_images, train_labels, epochs=5, batch_size=128, callbacks=callback_list)
# In the shell: tensorboard --logdir=logs
# In the browser: http://localhost:6006/

I believe that if you now go back and look at the earlier article on exporting data, it will feel different; that was not an article devoted to TensorBoard, but I suggest you read it again. Of course, there is a small episode: the latest official Keras release is not compatible with TensorFlow for this callback and will raise an error. It has already been fixed; the relevant GitHub issue is here: https://bit.ly/2QDtXqN (changed on February 20), and I believe a new release containing this fix will come out shortly.

Next, we discuss how to improve the performance of deep learning models.

Using advanced design patterns

Design patterns are often among the best practices in ordinary programming, and in deep learning the design patterns are somewhat different. Here we cover three of them: residual connections, batch normalization, and depthwise separable convolution.

Residual connection

[Figure: diagram of a residual connection]

Using the functional API described earlier, the output of an earlier layer is fed as an additional input to some later layers, which addresses the representational bottleneck and the vanishing-gradient problem.
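
A minimal sketch of a residual connection with the Keras functional API (the input shape and layer sizes here are illustrative assumptions, not from the article):

from keras import layers
from keras.layers import Input

x = Input(shape=(64, 64, 128))                                   # assumed input shape
y = layers.Conv2D(128, 3, activation='relu', padding='same')(x)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)
# add the earlier output back onto the later output: the residual connection
y = layers.add([y, x])

Note that padding='same' keeps the spatial shape unchanged, so the earlier output and the later output can be added element by element.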

Batch normalization

Batch normalization is a kind of normalization. The normalization we saw earlier was: subtract the mean from all the data, then divide by the standard deviation, so that the data has mean 0 and standard deviation 1. Does that remind you of the standard normal distribution (a computer is just mathematics made concrete)? Right, that is exactly the idea!
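
In code, that earlier kind of normalization is just this (a NumPy sketch; the data array is a hypothetical example):

import numpy as np

data = np.random.random((1000, 16))   # hypothetical data: 1000 samples, 16 features
data -= data.mean(axis=0)             # mean 0 for every feature
data /= data.std(axis=0)              # standard deviation 1 for every feature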

Batch normalization works by keeping the distribution of each layer's inputs consistent while a convolutional neural network trains. Why the effect is so good has its own paper on the underlying principle, which we will discuss another time. What we do know: the training process converges noticeably faster, the classification accuracy improves, and because it is less demanding about the learning rate, tuning the parameters becomes easier too. In short, it is simply pleasant to use.

One thing I must point out: especially for deeper networks, batch normalization helps gradients propagate. The corresponding Keras layer is BatchNormalization, used after a convolutional layer or a dense layer. Sample code:

conv_model.add(layers.Conv2D(32, 3, activation='relu'))
conv_model.add(layers.BatchNormalization())

dense_model.add(layers.Dense(32, activation='relu'))
dense_model.add(layers.BatchNormalization())

Depthwise separable convolution

A depthwise separable convolution (SeparableConv2D) performs a spatial convolution on each input channel separately, then mixes the output channels with a pointwise convolution. In some cases it can replace a Conv2D layer; because it has fewer parameters and fewer floating-point operations, it is more efficient and gives a lighter model.

[Figure: how a depthwise separable convolution works]

The following is an example of image classification code:

from keras import layers
from keras.models import Sequential

model = Sequential()
model.add(layers.SeparableConv2D(32, 3, activation='relu', input_shape=(64, 64, 3,)))
model.add(layers.SeparableConv2D(64, 3, activation='relu'))
model.add(layers.MaxPooling2D(2))
model.add(layers.SeparableConv2D(64, 3, activation='relu'))
model.add(layers.SeparableConv2D(128, 3, activation='relu'))
model.add(layers.MaxPooling2D(2))
model.add(layers.SeparableConv2D(64, 3, activation='relu'))
model.add(layers.SeparableConv2D(128, 3, activation='relu'))
# drastically reduces the amount of computation
model.add(layers.GlobalAveragePooling2D())
model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
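
To make "fewer parameters" concrete, compare the two layer types directly. With an assumed 64-channel input (an illustrative shape, not from the article), a 3×3 Conv2D producing 128 channels needs 3·3·64·128 weights plus 128 biases, 73,856 parameters, while the separable version needs only a 3·3·64 depthwise kernel plus a 64·128 pointwise kernel plus 128 biases, 8,896 parameters:

from keras import layers
from keras.models import Sequential

conv = Sequential([layers.Conv2D(128, 3, input_shape=(64, 64, 64))])
sep = Sequential([layers.SeparableConv2D(128, 3, input_shape=(64, 64, 64))])
print(conv.count_params())   # 73856 = 3*3*64*128 + 128
print(sep.count_params())    # 8896  = 3*3*64 + 64*128 + 128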

Hyperparameter optimization

When we build a network model we often make decisions by the seat of our pants: how many layers to stack, which activation function to use, whether batch normalization is needed, and so on. These are called hyperparameters. What we usually call parameters are the weights of the model's internal nodes, learned during training; those are not what this section is about.

We know that adjusting overly complex parameters by hand is far too much work, which is exactly why we humans let the machine adjust them, using feedback signals as the basis for the adjustment. By the same reasoning, shouldn't hyperparameters also be handed over to the machine? After all, when a programmer has to do such a thing repeatedly, the programmer writes code to automate it. So the conclusion is that, one way or another, this should be done by the machine.

However, there is no particularly good unified method for this kind of hyperparameter optimization. The general process is: read (or automatically generate) a set of hyperparameters, build and train a model with them, and save the result; then train the model again with another set of hyperparameters and keep whichever result is better. Repeat this, and after a certain amount of time take the final set of hyperparameters as the best one. Note that the hyperparameters are actually tuned against the validation set, so this also counts as a kind of over-fitting, which needs attention.
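
A minimal sketch of this loop as a random search; build_model, the parameter choices, and the data names here are hypothetical placeholders, not code from the article:

import random

best_acc, best_params = 0.0, None
for _ in range(20):                        # try 20 random configurations
    params = {
        'units': random.choice([16, 32, 64]),
        'lr': random.choice([1e-2, 1e-3, 1e-4]),
    }
    model = build_model(**params)          # hypothetical: builds and compiles a model
    model.fit(x_train, y_train, epochs=10, verbose=0)
    _, acc = model.evaluate(x_val, y_val, verbose=0)  # note: tuned on the validation set
    if acc > best_acc:
        best_acc, best_params = acc, params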

Of course, there are algorithms for this kind of hyperparameter tuning; a typical one is Bayesian optimization. It may not be effective when the situation is too complicated, and it is only one approach; this is a big topic that we will discuss another time.

Model ensembling

Note first that this approach is generally effective in competitions or production environments, but plays a limited role in research. What is model ensembling? It means taking the same data set, training different network models on it, each with reasonably good results, and combining them with weights; the combined outcome may be better than each model on its own.

How to explain this? To give a not-so-precise example from image classification: some models pay more attention to lines, others focus on color. Put together, the ensemble attends to both the lines and the color, so the final result will be better. In other words, different models focus on different aspects of the data, and analyzing the data more fully and from more angles naturally gives a better result. Of course, this also requires that every model in the ensemble is good; otherwise one bad model will drag the whole ensemble down, which is easy to understand.
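
A sketch of the weighted combination of predictions; the four models and the specific weights are hypothetical, and in practice the weights are chosen according to how well each model does on the validation set:

# four hypothetical trained models predicting on the same validation data
preds_a = model_a.predict(x_val)
preds_b = model_b.predict(x_val)
preds_c = model_c.predict(x_val)
preds_d = model_d.predict(x_val)

# weighted average; the weights sum to 1 and reward the better models
final_preds = 0.5 * preds_a + 0.25 * preds_b + 0.15 * preds_c + 0.1 * preds_d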

To sum up

Deep learning is a big subject, and there are not actually that many best practices. What is presented here are methods that are effective with high probability, some mature and some not; a few are still at the development stage. All the details in the middle of the neural-network training process are hard to state definitively, so only what works is best; mature or immature, they are all worth the discussion here. And with that, this article ends.

  • This article was first published on the public account RAIS; you are welcome to follow it.

Originally published at www.cnblogs.com/renyuzhuo/p/12547422.html