tensorflow/models library DeepLabv3+ implementation (Part 2): training on the PASCAL VOC2012 dataset


For the DeepLabv3+ environment configuration, see the previous tutorial: environment configuration. Once the environment is set up, model training can begin. I chose to train on the PASCAL VOC2012 dataset.

1. Download the data set

Run the command on ubuntu:

#From the tensorflow/models/research/deeplab/datasets directory.
  sh download_and_convert_voc2012.sh

This script both downloads the dataset and converts it to TFRecord format. It creates a pascal_voc_seg folder in the models/research/deeplab/datasets directory, with the following structure:
(Figure: pascal_voc_seg directory structure, containing the raw VOCdevkit data and the converted tfrecord folder)
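
To confirm the conversion succeeded, you can check that the TFRecord shards were written. A minimal check, assuming the shard naming used by build_voc2012_data.py (the exact shard count may differ):

# From models/research/deeplab/datasets: verify the converted records exist.
ls pascal_voc_seg/tfrecord
# Expect shards such as train-00000-of-00004.tfrecord and val-00000-of-00004.tfrecord.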

2. Download the pre-trained model

If you don't know which pre-trained model to download, open the local_test_mobilenetv2.sh script and have a look. I initially used the wrong pre-trained model, which caused repeated training errors. The correct one is:

mobilenetv2_coco_voc_trainaug

After downloading, unzip it and save it to the local directory /home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/init_models (this directory needs to be created manually).
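
As a sketch, the download and extraction can be done as follows. The tarball name is the one listed for mobilenetv2_coco_voc_trainaug in the DeepLab model zoo at the time of writing; check g3doc/model_zoo.md if it has changed:

# From models/research/deeplab/datasets/pascal_voc_seg
mkdir -p init_models && cd init_models
wget http://download.tensorflow.org/models/deeplabv3_mnv2_pascal_train_aug_2018_01_29.tar.gz
tar -xzvf deeplabv3_mnv2_pascal_train_aug_2018_01_29.tar.gz
# Extracts to deeplabv3_mnv2_pascal_train_aug/, which contains model.ckpt-30000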

3. Training

Refer to the training command given in the local_test_mobilenetv2.sh script:

NUM_ITERATIONS=10
python "${WORK_DIR}"/train.py \
  --logtostderr \
  --train_split="trainval" \
  --model_variant="mobilenet_v2" \
  --output_stride=16 \
  --train_crop_size="513,513" \
  --train_batch_size=4 \
  --training_number_of_steps="${NUM_ITERATIONS}" \
  --fine_tune_batch_norm=true \
  --tf_initial_checkpoint="${INIT_FOLDER}/${CKPT_NAME}/model.ckpt-30000" \
  --train_logdir="${TRAIN_LOGDIR}" \
  --dataset_dir="${PASCAL_DATASET}"

The parameters I actually used for training:

# from models/research directory
python deeplab/train.py \
--logtostderr \
--train_split="train" \
--model_variant="mobilenet_v2" \
--output_stride=16 \
--train_crop_size="513,513" \
--train_batch_size=1 \
--training_number_of_steps=1000 \
--fine_tune_batch_norm=False \
--tf_initial_checkpoint="/home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/init_models/deeplabv3_mnv2_pascal_train_aug/model.ckpt-30000" \
--train_logdir="/home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/train" \
--dataset_dir="/home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/tfrecord"
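
While training runs, progress can be monitored with TensorBoard (standard TensorFlow tooling, not specific to this repo):

# Point TensorBoard at the training log directory to watch the loss curves.
tensorboard --logdir=/home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/train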

Parameter understanding

  • training_number_of_steps: the number of training iterations; since this is just a verification run, it is set to a small value, 1000
  • train_crop_size: the crop size of the training images. I did not change it; the rule for choosing it is described in https://www.jianshu.com/p/1a07990705ee as
    crop_size = output_stride * k + 1
    Some posts online say it can be reduced to 321 when memory is insufficient; I did not change it and training still completed normally. (A quick check of the rule follows this list.)
  • train_batch_size: the training batch size; limited by my hardware, it is set to 1
  • fine_tune_batch_norm: whether to fine-tune batch norm; since the batch size is 1, this is set to False
  • tf_initial_checkpoint: the initial checkpoint from the pre-trained model
  • train_logdir: the directory where the training weights are saved; note that it was created when the project directories were set up at the beginning
  • dataset_dir: the path to the dataset, i.e. the tfrecord directory created earlier
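
A quick sanity check of the crop-size rule with output_stride=16, where k is the free integer in the formula:

# crop_size = output_stride * k + 1 must hold for valid crop sizes.
echo $((16 * 20 + 1))   # 321
echo $((16 * 32 + 1))   # 513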

ERROR1

ValueError: Total size of new array must be unchanged for MobilenetV2/Conv/weights lh_shape: [(3, 3, 3, 16)], rh_shape: [(3, 3, 3, 32)]

This is caused by downloading the wrong pre-trained model: the checkpoint's weight shapes do not match the model_variant being trained. Replacing it with the correct checkpoint fixes the error.
After training, the model files are saved in the train folder of the training results. The folder contains:

  • checkpoint
  • graph.pbtxt
  • model.ckpt-0.data-00000-of-00002

4. Run eval.py

Same as with train.py, the parameters follow the official ones. The parameters I used are as follows:

python deeplab/eval.py \
--logtostderr \
--eval_split="val" \
--model_variant="mobilenet_v2" \
--eval_crop_size="513,513" \
--checkpoint_dir='/home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/train' \
--eval_logdir='/home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/eval' \
--dataset_dir='/home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/tfrecord' \
--max_number_of_evaluations=1

ERROR2
(Figure: error message) The error shows that memory is insufficient. ps aux showed the process occupying 330% of the CPU, so I killed it with kill -9 PID. Changing eval_crop_size to 321,321 produced another error; my guess is that train_crop_size and eval_crop_size have to match (to be verified). Since I had not changed the crop size during training, I had no choice but to keep 513. After that there was no error, but no mIoU result was printed either. Let's move on to the next step.
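
One possible explanation for the missing mIoU: eval.py records the metric as a TensorFlow summary rather than only printing it, so it can usually be read by pointing TensorBoard at the eval log directory (behaviour may differ across versions of the repo):

# The mIoU value (e.g. a miou_1.0 scalar) should appear in TensorBoard.
tensorboard --logdir=/home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/eval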

5. Run vis.py to visualize the results

My parameters:

python deeplab/vis.py \
--logtostderr \
--vis_split="val" \
--model_variant="mobilenet_v2" \
--vis_crop_size="513,513" \
--checkpoint_dir="/home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/train" \
--vis_logdir="/home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/vis" \
--dataset_dir="/home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/tfrecord" \
--max_number_of_iterations=1

Fortunately, this one runs, and output like the following is displayed when it finishes:
(Figure: vis.py completion output) The results can be seen in the vis folder.
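
On the version I used, vis.py writes the rendered predictions into a subfolder of vis_logdir; the folder name below comes from deeplab/vis.py and may differ in other versions:

# Input images and predicted segmentation maps are saved here as PNG files.
ls /home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/vis/segmentation_results | head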

  • First look at the example results given on the official website:
    (Figure: official example segmentation results)
  • The results from my own run:
    (Figures: segmentation results from my run)

6. Run export_model.py

My parameter settings:

# export_model
python deeplab/export_model.py \
--logtostderr \
--checkpoint_path="/home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/init_models/deeplabv3_mnv2_pascal_train_aug/model.ckpt-30000" \
--export_path="/home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/export/frozen_inference_graph.pb" \
--model_variant="mobilenet_v2" \
--num_classes=21 \
--crop_size=513 \
--crop_size=513 \
--inference_scales=1.0

This step should export the trained checkpoint as a frozen inference graph (to be verified).
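
One note: local_test_mobilenetv2.sh creates the export directory with mkdir -p before calling export_model.py, so it is worth doing the same here if the directory does not exist yet:

# Create the export directory before running export_model.py.
mkdir -p /home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/export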

Origin: blog.csdn.net/qq_43265072/article/details/105477047