Object detection: a walkthrough of the source code

Main references
SSD based on VGG
SSD based on MobileNet
Pre-trained models based on slim

1 Build your own model

The first step is to understand the framework that a new slim-based model has to fit into. A model is defined as a class with five main parts:

  • preprocess: runs any preprocessing of the input values (e.g., scaling / shifting / reshaping) that is needed before the detector runs on an input image.
  • predict: produces "raw" prediction tensors that can be passed to the loss or postprocess functions. This is where the model structure comes in.
  • postprocess: converts the predicted output tensors into the final detection results.
  • loss: computes the loss tensors with respect to the groundtruth.
  • restore: loads a checkpoint into the TensorFlow graph.
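
To make the five parts concrete, here is a minimal sketch in plain Python. It is not the API's real base class: the class names, the simplified method signatures and the toy one-weight "detector" are all assumptions made for illustration, and real implementations operate on tensors rather than lists.

```python
from abc import ABC, abstractmethod


class DetectionModelSketch(ABC):
    """Simplified stand-in for the five-part interface (not the real base class)."""

    @abstractmethod
    def preprocess(self, inputs):
        """Scale / shift / reshape raw inputs before detection."""

    @abstractmethod
    def predict(self, preprocessed_inputs):
        """Return a dict of raw prediction values."""

    @abstractmethod
    def postprocess(self, prediction_dict):
        """Turn raw predictions into final detections."""

    @abstractmethod
    def loss(self, prediction_dict, groundtruth):
        """Return a scalar loss with respect to the groundtruth."""

    @abstractmethod
    def restore_map(self):
        """Return {checkpoint variable name: model variable} for restoring."""


class ToyModel(DetectionModelSketch):
    """Hypothetical one-weight 'detector' that works on a list of pixel values."""

    def __init__(self):
        self.weight = 0.5

    def preprocess(self, inputs):
        # Map pixel values from [0, 255] to [-1, 1].
        return [2.0 * x / 255.0 - 1.0 for x in inputs]

    def predict(self, preprocessed_inputs):
        return {"raw_scores": [self.weight * x for x in preprocessed_inputs]}

    def postprocess(self, prediction_dict):
        # Keep the indices whose raw score is positive.
        return [i for i, s in enumerate(prediction_dict["raw_scores"]) if s > 0]

    def loss(self, prediction_dict, groundtruth):
        scores = prediction_dict["raw_scores"]
        return sum((s - g) ** 2 for s, g in zip(scores, groundtruth)) / len(scores)

    def restore_map(self):
        return {"toy/weight": self.weight}


model = ToyModel()
predictions = model.predict(model.preprocess([0, 255]))
detections = model.postprocess(predictions)
```

The order of the calls (preprocess, then predict, then either postprocess or loss) is the part that carries over to the real model classes.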

2 Understanding the source code

2.1 Training Logic

The configuration file dataset/ssd_mobilenet_v1_pascal.config, under the /home/users/py3_project/models/research/object_detection directory, sets the model name and architecture, the model hyperparameters, the training hyperparameters, the data source files and so on.

Code 1, train.py: the entry point for model training

  • First, the logic of the training flow in train.py.
    train.py is the entry module for training and calls several other modules.
    One of its main jobs is to read the configuration file. train.py mainly uses three sections of it: model (the model architecture), train_config (the optimizer and the pre-trained model) and train_input_reader (the training input dataset). The main input argument is the configuration file dataset/ssd_mobilenet_v1_pascal.config, which contains these three sections.

The other main job of train.py is to call the training routine, which makes up most of train.py apart from the configuration handling above.

Code 2, trainer.py: the main part of model training; the concrete implementation calls other modules

This step walks through the training process in trainer.py from the beginning.

Model training is mainly done in trainer.py. Training uses transfer learning (a pre-trained model is partly reused and then fine-tuned) and the lightweight slim framework, mainly because slim ships with many models trained by Google, such as VGG and MobileNet, in a variety of architectures.

The main steps of transfer learning are: initialize the model with detection_model = create_model_fn(), then call detection_model.preprocess, detection_model.provide_groundtruth, detection_model.predict, detection_model.loss and detection_model.restore_map, set the optimizer, and start training.

First, the model class is brought into trainer.py (line 243 of the code); it is originally defined in train.py.
The model structure is determined by the arguments passed to model_builder.build. The imports in model_builder.py show two kinds of model together with their corresponding feature extractors, and model construction is mainly implemented there.
Then trainer.py computes total_loss and sets the optimizer; the concrete implementations again call other modules.

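Finally, slim.learning.train is used for training.
In that call, train_tensor computes the loss and applies the gradient operations, train_dir is the path where checkpoint files are stored, and init_fn applies the pre-trained model. Of the two kinds of pre-trained model (detection and classification), the detection model is used here: in dataset/ssd_mobilenet_v1_pascal.config the corresponding setting under train_config, from_detection_checkpoint, is true. Because the detection model needs no modification, this both improves detection accuracy and speeds up convergence of the whole model. An image classification model has a different number of classes and has to be retrained, so once num_classes under model in the configuration file has been changed, training can start.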

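2.2 Model construction

Finally, taking ssd_mobilenetV1 as an example, here are the key points of model construction in the source code.

Question 1: How does the model know that ssd_mobilenetV1 is being used?

Code 3, builders/model_builder.py: model initialization; reads the model's initialization hyperparameters and other information from the configuration file

Starting from the create_model_fn method in trainer.py, tracing back shows that it is defined in train.py, where its first argument is model_builder.build.
Following the code further into builders/model_builder.py shows that the choice depends on the model section of the configuration file. Here that section is ssd, and further down it contains type: 'ssd_mobilenet_v1', so what gets built is the ssd_mobilenetV1 model.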
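
The dispatch described in Question 1 can be sketched as follows. This is a simplified illustration, not the code of model_builder.py: the stub class, the registry dictionary, the plain-dict config and the value num_classes=20 are all hypothetical, and only the pattern (look up the feature extractor by its type string, then bind the config with functools.partial) follows the description above.

```python
import functools


class SSDMobileNetV1FeatureExtractorStub:
    """Hypothetical stand-in for the real feature extractor class."""

    def __init__(self, is_training):
        self.is_training = is_training


# Hypothetical registry: feature extractor `type` string -> class.
FEATURE_EXTRACTOR_CLASSES = {
    "ssd_mobilenet_v1": SSDMobileNetV1FeatureExtractorStub,
}


def build(model_config, is_training):
    """Pick the model pieces from a dict that mimics the `model` config section."""
    meta_architecture = next(iter(model_config))  # e.g. "ssd"
    if meta_architecture != "ssd":
        raise ValueError("Unknown meta architecture: " + meta_architecture)
    extractor_type = model_config["ssd"]["feature_extractor"]["type"]
    if extractor_type not in FEATURE_EXTRACTOR_CLASSES:
        raise ValueError("Unknown feature extractor: " + extractor_type)
    return FEATURE_EXTRACTOR_CLASSES[extractor_type](is_training)


# Plain-dict imitation of the `model` section (values are illustrative).
model_config = {"ssd": {"num_classes": 20,
                        "feature_extractor": {"type": "ssd_mobilenet_v1"}}}

# Bind the config once; the training code can later call the result with no arguments.
create_model_fn = functools.partial(build, model_config=model_config, is_training=True)
feature_extractor = create_model_fn()
```

Changing the type string in the config is then enough to select a different feature extractor, provided it is registered.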

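Question 2: Where is the main part of model construction?

Code 4, meta_architectures/ssd_meta_arch.py: construction of the model framework class, containing preprocess, predict, postprocess, loss, restore_map and so on; focus on the predict part

Code 5, models/ssd_mobilenet_v1_feature_extractor.py, and Code 6, models/feature_map_generators.py: generation of the feature_maps. The former mainly inherits the mobilenetv1 framework; the latter adds some convolutions and other operations on top of mobilenetv1 to build the actual feature_maps.

This happens in meta_architectures/ssd_meta_arch.py, which first defines the class and its parameters, including the basic preprocess, predict, postprocess, loss and restore_map operations. You can also see that the SSDMetaArch class inherits from model.DetectionModel.
The network structure of a model normally appears in the predict operation (line 343 of meta_architectures/ssd_meta_arch.py), so fine-tuning the model structure means modifying the predict code. Tracing from prediction_dict (usually the main output of predict) leads to _feature_extractor, which in turn is defined in models/ssd_mobilenet_v1_feature_extractor.py (traced from line 383).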
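The next part is the actual network structure of the model and deserves a close read. The trace above shows that the extract_features method in models/ssd_mobilenet_v1_feature_extractor.py is the main code for the network structure. from_layer presumably corresponds to the 6 layers on which ssd_mobilenetV1 extracts feature maps for prediction, matching num_layers in the configuration file. The model takes feature maps from layers 11 and 13 of mobilenet, which is consistent with the ssd_mobilenetV1 structure. use_depthwise controls whether depthwise (dw) convolution is used, the convolution structure popularized by mobilenet.
The configuration file shows that this part is the core (the feature map extraction layers). num_layers means that features are extracted from 6 layers, and the different aspect_ratios give boxes of different shapes, for 6 default boxes in total (the initial prediction boxes; aspect_ratios = 1.0 yields two default boxes and each of the others yields one). The box size is computed from the scale and the aspect ratio of each default box.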
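
As a minimal sketch of how default box sizes can be computed, the function below follows the scale rule from the SSD paper: scales are spaced linearly between a minimum and a maximum, each aspect ratio a gives a box of width s·sqrt(a) and height s/sqrt(a), and aspect ratio 1.0 gets one extra box at the geometric mean of the current and the next scale. The function name and the values min_scale=0.2 and max_scale=0.95 are assumptions for illustration; the Object Detection API's own anchor generator may differ in details such as how the lowest layer is handled.

```python
import math


def default_box_shapes(num_layers=6, min_scale=0.2, max_scale=0.95,
                       aspect_ratios=(1.0, 2.0, 0.5, 3.0, 1.0 / 3.0)):
    """Return, per feature layer, a list of (width, height) relative to the image."""
    scales = [min_scale + (max_scale - min_scale) * k / (num_layers - 1)
              for k in range(num_layers)]
    scales.append(1.0)  # only used to interpolate the extra box of the last layer
    layers = []
    for k in range(num_layers):
        boxes = [(scales[k] * math.sqrt(ratio), scales[k] / math.sqrt(ratio))
                 for ratio in aspect_ratios]
        if 1.0 in aspect_ratios:
            # Extra square box between this layer's scale and the next one.
            extra = math.sqrt(scales[k] * scales[k + 1])
            boxes.append((extra, extra))
        layers.append(boxes)
    return layers


layers = default_box_shapes()
for index, boxes in enumerate(layers):
    print(index, [(round(w, 3), round(h, 3)) for w, h in boxes])
```

With the five aspect ratios above, every layer gets six boxes, which matches the count given in the text. Passing a shorter aspect_ratios tuple shows how the number of boxes per location shrinks.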
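Continuing in models/ssd_mobilenet_v1_feature_extractor.py, this part calls the mobilenetv1 graph, and the final endpoint is Conv2d_13_pointwise, which is also the last node of the mobilenetv1 graph.
The details of feature_maps (line 125 of the code in the original screenshot) are handled by the multi_resolution_feature_maps method of models/feature_map_generators.py. It generates the feature extraction layers and contains several convolutions that are worth reading closely. Each additional feature layer is obtained by appending two ordinary convolution layers after the previous last layer, with kernel sizes 1×1 and 3×3 and a depth that differs each time; see the ssd_mobilenetv1 architecture for reference.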
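
The resulting stack of feature maps can be sketched with simple arithmetic. The function below is an illustration under stated assumptions, not the library code: it assumes a 300×300 input, that Conv2d_11_pointwise and Conv2d_13_pointwise have output strides 16 and 32 with depths 512 and 1024, and that each extra layer ends in a stride-2 convolution with SAME padding (so the spatial size becomes ceil(size / 2)). The layer_depth values are the ones quoted elsewhere in this article; the function name is hypothetical.

```python
import math


def ssd_feature_map_plan(image_size=300,
                         from_layer=("Conv2d_11_pointwise", "Conv2d_13_pointwise",
                                     "", "", "", ""),
                         layer_depth=(-1, -1, 512, 256, 256, 128)):
    """Return a list of (spatial size, depth) for each feature map."""
    # Assumed (output stride, depth) of the two backbone endpoints.
    backbone = {"Conv2d_11_pointwise": (16, 512),
                "Conv2d_13_pointwise": (32, 1024)}
    plan = []
    size = None
    for name, requested_depth in zip(from_layer, layer_depth):
        if name:
            # Taken directly from the backbone; a depth of -1 means "keep as is".
            stride, depth = backbone[name]
            size = math.ceil(image_size / stride)
        else:
            # Extra layer: 1x1 conv, then 3x3 conv with stride 2 (SAME padding).
            size = math.ceil(size / 2)
            depth = requested_depth
        plan.append((size, depth))
    return plan


plan = ssd_feature_map_plan()
print(plan)
```

Dropping the last entry of from_layer and layer_depth gives the five-layer variant discussed in the optimization section.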
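Code 7, core/box_predictor.py: after feature_maps has been built in the previous step, these layers are used for (box) regression and classification

Once feature_maps has been built, it is used for the more important operations, regression and classification, mainly in core/box_predictor.py (called from meta_architectures/ssd_meta_arch.py). Regression and classification are performed on top of the feature_maps layers; concretely, one more convolution is added after each feature layer.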
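
To give a feel for what that extra convolution has to output, here is a small arithmetic sketch. It assumes a convolutional predictor that emits 4 box coordinates per default box and one score per class plus a background class; the function names, the feature map sizes (19, 10, 5, 3, 2, 1, which assume a 300×300 input) and num_classes=20 are illustrative assumptions, not values read from the source code.

```python
def predictor_output_channels(boxes_per_location, num_classes, box_code_size=4):
    """Channels of the regression and classification convolutions for one layer."""
    box_channels = boxes_per_location * box_code_size
    class_channels = boxes_per_location * (num_classes + 1)  # +1 for background
    return box_channels, class_channels


def total_default_boxes(feature_map_sizes, boxes_per_location):
    """Total number of default boxes over all (square) feature maps."""
    return sum(size * size * count
               for size, count in zip(feature_map_sizes, boxes_per_location))


sizes = [19, 10, 5, 3, 2, 1]
box_channels, class_channels = predictor_output_channels(6, num_classes=20)
total_with_six_everywhere = total_default_boxes(sizes, [6] * 6)
total_with_three_on_first = total_default_boxes(sizes, [3, 6, 6, 6, 6, 6])
print(box_channels, class_channels)
print(total_with_six_everywhere, total_with_three_on_first)
```

The second total shows how much the box count drops if the first, largest feature map uses only three boxes per location.
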
Question 3: Can the model be trained from scratch?

Yes. To do so, remove the init_fn=init_fn argument from the slim.learning.train call (in trainer.py), so that the pre-trained checkpoint is not used.

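3 Model optimization

The sections above covered much of how the ssd model works. According to the material I consulted, optimizing an ssd model comes down to three main areas:

  • Data augmentation
    This part is already done: before images are fed to the model, augmented samples can be added to the training set, for example new samples produced by rotation, Gaussian blur or Gaussian noise.

  • Changing the number or position of the feature maps
    Printing the feature_maps extraction layers from the source code shows that the depths of the first two feature maps (512, 1024) correspond to the first two entries of layer_depth (-1, -1) in models/ssd_mobilenet_v1_feature_extractor.py, and the depths of the last four (512, 256, 256, 128) correspond to the last four entries of layer_depth (512, 256, 256, 128). They must stay consistent with the variables in the pre-trained model's checkpoint file.
    Number of feature maps: because my own training set is small, fewer feature layers are needed and some feature maps can be dropped. Besides changing num_layers in the configuration file, for example to 5, the model structure in models/ssd_mobilenet_v1_feature_extractor.py also has to be changed. A helper such as print_tensors_in_checkpoint_file (see section 4) can print the variables in a TensorFlow checkpoint to help with this. After I removed the last feature layer (marked with a red box in the original screenshot), the code still ran normally.

  • Changing the number of default boxes
    Because my own training set is small, the number of aspect_ratios in the configuration file can be reduced, for example by removing aspect_ratios = 3 and 1/3, which speeds up training. On my own dataset this change did not affect the final detection results.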
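
As one example of the data augmentation mentioned in the first bullet, the sketch below adds Gaussian noise to a grayscale image stored as a nested list. It is a plain-Python illustration only; the function name and the default sigma are assumptions, and a real pipeline would normally work on arrays with an image library.

```python
import random


def add_gaussian_noise(image, sigma=10.0, seed=None):
    """Return a noisy copy of a 2-D list of pixel values, clipped to [0, 255]."""
    rng = random.Random(seed)
    return [[min(255, max(0, round(pixel + rng.gauss(0.0, sigma))))
             for pixel in row]
            for row in image]


image = [[0, 128, 255],
         [64, 192, 32]]
noisy = add_gaussian_noise(image, sigma=10.0, seed=0)
print(noisy)
```

Noise and blur leave the groundtruth boxes unchanged, whereas rotation also requires transforming the box coordinates.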

Summary
Besides the methods above, the source code analysis shows that the feature_maps layers are central to the architecture, so the model can be changed by changing how feature_maps is built: for example by using depthwise (dw) convolution (the source code uses ordinary convolution), by adding or removing convolution layers, or by modifying part of the convolution layers inherited from mobilenetv1. Any of these gives a modified model, but the pre-trained model can then no longer be used.
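
Why structural changes clash with a pre-trained checkpoint can be illustrated with a small sketch: a variable can only be restored if the checkpoint holds a variable with the same name and the same shape. The function and the variable names below are hypothetical and the shapes are illustrative; the real training code works with TensorFlow variables rather than plain dictionaries.

```python
def variables_restorable_from_checkpoint(model_shapes, checkpoint_shapes):
    """Split model variables into those the checkpoint can fill and those it cannot."""
    restorable, skipped = {}, {}
    for name, shape in model_shapes.items():
        if checkpoint_shapes.get(name) == shape:
            restorable[name] = shape
        else:
            skipped[name] = shape
    return restorable, skipped


# Hypothetical checkpoint contents: variable name -> shape.
checkpoint_shapes = {
    "backbone/conv0/weights": (3, 3, 3, 32),
    "extra_layer_1/weights": (3, 3, 256, 512),
}
# Modified model: the extra layer now has depth 256 instead of 512.
model_shapes = {
    "backbone/conv0/weights": (3, 3, 3, 32),
    "extra_layer_1/weights": (3, 3, 256, 256),
}

restorable, skipped = variables_restorable_from_checkpoint(model_shapes, checkpoint_shapes)
print("restorable:", sorted(restorable))
print("skipped:", sorted(skipped))
```

Every skipped variable has to be trained from its random initialization, so the more of the structure is changed, the less of the pre-trained model carries over.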

4 Viewing the variables in a checkpoint

import tensorflow as tf  # uses the TensorFlow 1.x checkpoint reader API


def print_tensors_in_checkpoint_file(file_name, tensor_name):
  """Prints tensors in a checkpoint file.
  If no `tensor_name` is provided, prints the tensor names and shapes
  in the checkpoint file.
  If `tensor_name` is provided, prints the content of the tensor.
  Args:
    file_name: Name of the checkpoint file.
    tensor_name: Name of the tensor in the checkpoint file to print.
  """
  try:
    reader = tf.train.NewCheckpointReader(file_name)
    if not tensor_name:
      print(reader.debug_string().decode("utf-8"))
    else:
      print("tensor_name: ", tensor_name)
      print(reader.get_tensor(tensor_name))
  except Exception as e:  # pylint: disable=broad-except
    print(str(e))
    if "corrupted compressed block contents" in str(e):
      print("It's likely that your checkpoint file has been compressed "  
            "with SNAPPY.")

Called without a tensor_name, the function prints the names and shapes of the variables inside the checkpoint.

Origin blog.csdn.net/qq_29153321/article/details/103936938