Transfer learning: fine-tuning parameters and restoring part of a checkpoint

Reference: Transfer Learning -- Fine-tune

 

First, transfer learning

Transfer learning means taking the parameters of an already-trained model and migrating them into a new model to help train it.

Training and prediction:
A deep learning model has two stages: training and prediction.
Training follows one of two strategies: either build a model from scratch and train it yourself, or start from a pre-trained model.
Prediction is comparatively simple: use the already-trained model to make predictions on a data set directly.

Advantages:

1) Standing on the shoulders of giants: a model that others have spent a great deal of effort training will, with high probability, be stronger than one you build from scratch, so there is no need to reinvent the wheel.
2) Training cost can be very low: if you use the feature-vector extraction way of doing transfer learning, the later training stage is very cheap; a CPU handles it without stress, and no dedicated deep learning machine is needed.
3) Suitable for small data sets: when your own data set is very small (a few thousand images), training a large neural network with tens of millions of parameters from scratch is unrealistic, because the larger the model, the more data it needs, and overfitting cannot be avoided. If you still want to use the feature extraction power of such a very large neural network, transfer learning is the only option.

Ways of doing transfer learning:

1, Transfer Learning: freeze all the convolutional layers of the pre-trained model and train only your own customized fully connected layers.

2, Extract Feature Vector: first compute the feature vectors produced by the pre-trained model's convolutional layers for all training and test data, then set the pre-trained model aside and train only a small customized fully connected network on those vectors (a sketch of this approach follows the note below).
3, Fine-tune: freeze part of the pre-trained model's convolutional layers (usually the ones near the input) and train the remaining convolutional layers (usually the ones near the output) together with the fully connected layers.
* Note: the questions transfer learning cares about are: what exactly is the "knowledge", and how can previously acquired "knowledge" be used better. There can be many ways and means, e.g. SVM, Bayesian methods, CNN and so on.

Fine-tune is just one of those means; it describes the fine adjustment commonly done in the later stage of transfer learning.
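For example, approach 2 can be as simple as running the frozen convolutional base once over all the data and then training a cheap classifier on the stored vectors. Below is a minimal sketch of that idea, assuming the TF-Slim ResNet-50 model and checkpoint used later in this post; the data arrays, the checkpoint path, and the use of a logistic regression in place of the "small fully connected network" are illustrative assumptions, not code from the original.

import tensorflow as tf
import tensorflow.contrib.slim as slim
import slim.nets.resnet_v1 as resnet_v1  # from the TF-Slim model library
from sklearn.linear_model import LogisticRegression

images = tf.placeholder(tf.float32, [None, 224, 224, 3])
with slim.arg_scope(resnet_v1.resnet_arg_scope()):
    # num_classes=None drops the original classification head; with global pooling
    # the output is a [batch, 1, 1, 2048] feature map.
    net, _ = resnet_v1.resnet_v1_50(images, num_classes=None, is_training=False)
features_op = tf.squeeze(net, axis=[1, 2])

with tf.Session() as sess:
    saver = tf.train.Saver(slim.get_model_variables('resnet_v1_50'))
    saver.restore(sess, 'pretrain/resnet_v1_50.ckpt')   # assumed checkpoint path
    # train_images / train_labels are assumed, already-preprocessed numpy arrays
    feats = sess.run(features_op, feed_dict={images: train_images})

# The convolutional base is no longer needed; train a cheap classifier on the vectors.
clf = LogisticRegression(max_iter=1000).fit(feats, train_labels)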

Comparison of the three transfer learning approaches

1, the models obtained with the first and second approaches are essentially the same after training, but the second is far cheaper computationally than the first.
2, the third approach complements the first two and can further improve model performance. Note, however, that it does not always actually improve the model.
In essence: all three transfer learning approaches aim to make the pre-trained model able to do recognition work on the new data set, so that the feature extraction capacity of the original pre-trained model is fully released and used. On that basis, however, if you want the model to reach a lower loss, transfer learning alone is not enough; it also depends on the model structure and on the richness of the new data set.

 Second, the experiment: try fine-tuning the model to further improve its performance

1, the effect of fine-tuning:

After getting a new data set, first preprocess it in the same way the pre-trained model was trained, then usually use method one or method two above to test how the pre-trained model performs on the new data. If it performs well, you can try fine-tuning: unlock more convolutional layers and continue training.

But do not expect a qualitative leap. In addition, if performance is poor because the new data set differs too much from the original data set, you can on the one hand consider retraining from scratch, and on the other hand consider unlocking comparatively more layers for training.
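In the slim-based code later in this post, "unlocking more layers" simply means widening the scope prefixes that are put into the trainable set while still restoring those layers from the checkpoint. A hedged example: the block name below is the one used by the TF-Slim resnet_v1_50 implementation, but it is worth verifying the exact names with the checkpoint inspection script shown further down.

# Fine-tune the last residual block together with the new head; everything else stays frozen.
CHECKPOINT_EXCLUDE_SCOPES = 'logits'              # block4 is still restored from the checkpoint
TRAINABLE_SCOPES = 'resnet_v1_50/block4,logits'   # but it is now allowed to be updated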

2, fine-tuning strategies for different data sets

Data set 1: small amount of data, but very high similarity to the original data

In this case, all we need to do is modify the final softmax layer, i.e. the output categories of the last layer (method one).

Data set 2: small amount of data, low similarity to the original data

In this case we can freeze the initial layers of the pre-trained model (say the first k layers) and retrain the remaining (n-k) layers. Because the new data set has low similarity, it is worthwhile to retrain the higher layers on the new data set (method three).

Data set 3: large amount of data, low similarity to the original data

In this case, because we have a large data set, training the neural network will be effective. However, our data is very different from the data used to train the pre-trained model, so predictions made with the pre-trained model would not be effective. Therefore it is best to train the neural network from scratch on your own data (training from scratch).

Data set 4: large amount of data, high similarity to the original data

This is the ideal situation. Here the pre-trained model should be the most effective. The best approach is to keep the model architecture and the initial weights of the pre-trained model, and then retrain the whole model starting from those pre-trained weights.
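The four cases can be summarized as a small helper, purely for illustration; the notions of "small" and "similar" are judgment calls, not thresholds given in the post.

def choose_strategy(data_is_small, data_is_similar):
    """Map the four data-set cases above to a transfer learning strategy."""
    if data_is_small and data_is_similar:
        return 'method one: retrain only the final softmax / output layer'
    if data_is_small and not data_is_similar:
        return 'method three: freeze the first k layers, retrain the remaining n-k layers'
    if not data_is_small and not data_is_similar:
        return 'train from scratch on the new data'
    return 'keep the architecture, initialize from the pre-trained weights, retrain the whole model'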

3, notes on fine-tuning
1) A common practice is to cut off the last layer of the pre-trained network (the softmax layer) and replace it with a new softmax layer that matches our own problem.
2) Use a smaller learning rate to train the network.
3) If the data set is very small, we train only the last layer; if the data set is of moderate size, freezing the weights of the earlier layers of the pre-trained network is also common practice.
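Point 2) is often implemented by giving the pre-trained layers a much smaller learning rate than the freshly initialized head. One way to do that in TF 1.x is two optimizers, each restricted to its own variable list; a rough sketch under the scope names used later in this post, assuming a loss tensor already exists:

# Hypothetical split: tiny learning rate for the restored ResNet variables,
# a larger one for the new head under the 'logits' scope.
base_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'resnet_v1_50')
head_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'logits')
base_op = tf.train.GradientDescentOptimizer(1e-5).minimize(loss, var_list=base_vars)
head_op = tf.train.GradientDescentOptimizer(1e-3).minimize(loss, var_list=head_vars)
train_op = tf.group(base_op, head_op)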

Note: the core of a convolutional neural network is:
(1) the shallow convolutional layers extract basic features, such as edges and contours;
(2) the deep convolutional layers extract abstract features, such as an entire face;
(3) the fully connected layers classify based on combinations of the extracted features.

4, concrete experimental steps

1, download the pre-trained model

2, preprocessing: preprocess the data in the same way as the original pre-trained model. When using a pre-trained model you must make sure the data is as close as possible to the data the model was originally trained on, to maximize the model's ability to transfer its knowledge (see the preprocessing sketch after this list).

3, base model and custom model: build the same model as the one used in pre-training, plus your own customized layers.

4, inspect the checkpoint to get the names of the nodes to restore and to freeze

5, during training, set up restoring the pre-trained weights and the list of tensors that stay fixed
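For step 2, the TF-Slim ResNet-v1 checkpoints were trained with VGG-style preprocessing (resize, then subtract the per-channel ImageNet means), so new images should be treated the same way. A minimal sketch of that idea; the mean values are the commonly used ImageNet RGB means, and if the official preprocessing code from the slim model library is available it is better to use that directly.

# VGG / ResNet-v1 style preprocessing: float32 image, resized, RGB channel means subtracted.
_R_MEAN, _G_MEAN, _B_MEAN = 123.68, 116.78, 103.94  # ImageNet channel means

def preprocess_like_pretraining(image, height=224, width=224):
    image = tf.cast(image, tf.float32)                    # expects an RGB image in [0, 255]
    image = tf.image.resize_images(image, [height, width])
    return image - tf.constant([_R_MEAN, _G_MEAN, _B_MEAN])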

Third, the code details

The base model and the custom model

import tensorflow as tf
import tensorflow.contrib.slim as slim
# resnet_v1 comes from the TF-Slim model library (the slim directory of tensorflow/models must be on the Python path)
import slim.nets.resnet_v1 as resnet_v1

# Define the model. The checkpoint only contains the parameters, not the graph,
# so the concrete structure of the model has to be specified here.
with slim.arg_scope(resnet_v1.resnet_arg_scope()):
    # logits is the network output and input_images is the input data;
    # num_classes=None tells the resnet model to disable its final output layer.
    logits, end_points = resnet_v1.resnet_v1_50(inputs=input_images, num_classes=None)

# Custom output layer
with tf.variable_scope("logits"):
    # The original model's output is 4-D ([batch, 1, 1, channels] for a 300*300*3 input
    # after global pooling); squeeze away dimensions 2 and 3 so that only the batch
    # dimension and the feature dimension remain.
    net = tf.squeeze(logits, axis=[1, 2])
    # Add a dropout layer
    net = slim.dropout(net, keep_prob=0.5, scope='dropout_scope')
    # Add a fully connected layer and specify the size of the final output
    logits = slim.fully_connected(net, num_outputs=labels_nums, scope='fc')
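The snippet above assumes that input_images and labels_nums already exist. For completeness, a sketch of what they could look like (these names and sizes are assumptions and would go before the model definition):

labels_nums = 5   # example: number of classes in the new data set
input_images = tf.placeholder(tf.float32, shape=[None, 300, 300, 3], name='input')
input_labels = tf.placeholder(tf.int32, shape=[None], name='label')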

Inspect the node names to restore and to freeze

look_checkpoint.py

import os
from tensorflow.python import pywrap_tensorflow

model_dir = os.getcwd()   # get the current working directory
print(model_dir)          # print the current path
checkpoint_path = r'G:\1-modelused\Siamese_Densenet_Single_Net\Output\640model\model3/model_epoch_20.ckpt'  # or model_dir + "\\ckpt_dir\\ckpt-model-100"
print(checkpoint_path)    # print the checkpoint path being read
# Read the parameters from the checkpoint file
reader = pywrap_tensorflow.NewCheckpointReader(checkpoint_path)
var_to_shape_map = reader.get_variable_to_shape_map()
# Print the variable names (and, if needed, the variable values)
for key in var_to_shape_map:
    # if key.startswith('DenseNet_121/AuxLogits'):
    #     print(1)
    #     print(key)
    print("tensor_name:", key)
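The same reader can also report shapes and stored values, which is handy when deciding which scopes to freeze. A small addition to the script above (same variable names):

for key in sorted(var_to_shape_map):
    print("tensor_name:", key, "shape:", var_to_shape_map[key])
    # reader.get_tensor(key) returns the stored value as a numpy array;
    # uncomment with care, large tensors produce a lot of output.
    # print(reader.get_tensor(key))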

Set up restoring the pre-trained weights and the list of tensors to train during training

# Checkpoint file with the pre-trained ResNet-50 parameters
CKPT_FILE = r'.\pretrain\resnet_v1_50.ckpt'
# Parameters that do NOT need to be loaded from the Google pre-trained model.
# Here it is the final fully connected layer, because its parameters have to be
# retrained for the new problem. The value is the prefix of those parameter names.
CHECKPOINT_EXCLUDE_SCOPES = 'logits'
# The parameter names of the network layers that need to be trained. During
# fine-tuning here, only the last fully connected layer is trained.
TRAINABLE_SCOPES = 'logits'

# Get all the parameters that need to be loaded from the Google pre-trained model
def get_tuned_variables():
    exclusions = [scope.strip() for scope in CHECKPOINT_EXCLUDE_SCOPES.split(',')]
    variables_to_restore = []
    # Enumerate all the model parameters and decide whether each one should be
    # excluded from loading
    for var in slim.get_model_variables():
        excluded = False
        for exclusion in exclusions:
            if var.op.name.startswith(exclusion):
                excluded = True
                break
        if not excluded:
            variables_to_restore.append(var)
    return variables_to_restore

# Get the list of all variables that need to be trained.
def get_trainable_variables():
    scopes = [scope.strip() for scope in TRAINABLE_SCOPES.split(',')]
    variables_to_train = []
    # Enumerate the prefixes of the parameters that need training and collect
    # all of those parameters by prefix.
    for scope in scopes:
        variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope)
        variables_to_train.extend(variables)
    return variables_to_train
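With get_trainable_variables() in place, the training step can be restricted to exactly those variables. A minimal single-learning-rate sketch (the loss, the learning rate, and the accuracy metric below are illustrative assumptions, built on the input_labels placeholder sketched earlier):

# Cross-entropy loss on the custom head, and a train op that only updates
# the variables selected by TRAINABLE_SCOPES.
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=input_labels, logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.0001).minimize(
    loss, var_list=get_trainable_variables())
accuracy = tf.reduce_mean(tf.cast(
    tf.equal(tf.argmax(logits, 1), tf.cast(input_labels, tf.int64)), tf.float32))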






# Define the function that loads the pre-trained model (the ResNet-50 checkpoint
# in CKPT_FILE; the tutorial this was adapted from used Inception-v3)
load_fn = slim.assign_from_checkpoint_fn(
    CKPT_FILE,
    get_tuned_variables(),
    ignore_missing_vars=True
)

saver = tf.train.Saver(max_to_keep=100)
max_acc = 0.0
with tf.Session() as sess:
    ckpt = tf.train.get_checkpoint_state('models/resnet_v1/')
    if ckpt and tf.train.checkpoint_exists(ckpt.model_checkpoint_path):
        # Resume from our own latest checkpoint if one already exists
        saver.restore(sess, ckpt.model_checkpoint_path)
    else:
        sess.run(tf.global_variables_initializer())
        sess.run(tf.local_variables_initializer())
        # Load the model that Google has already trained
        print('Loading tuned variables from %s' % CKPT_FILE)
        load_fn(sess)
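    # --- Hypothetical continuation, not from the original post ------------------
    # max_acc above suggests the original loop saved the checkpoint whenever the
    # validation accuracy improved. train_op, accuracy, next_batch, max_steps,
    # val_images and val_labels are assumed names from the sketches earlier.
    for step in range(max_steps):
        batch_images, batch_labels = next_batch()
        sess.run(train_op, feed_dict={input_images: batch_images,
                                      input_labels: batch_labels})
        if step % 100 == 0:
            val_acc = sess.run(accuracy, feed_dict={input_images: val_images,
                                                    input_labels: val_labels})
            if val_acc > max_acc:
                max_acc = val_acc
                saver.save(sess, 'models/resnet_v1/model.ckpt', global_step=step)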

 
