DeepLabv3+ environment configuration is covered in the previous tutorial: environment configuration. Once the environment is set up, model training can begin. I chose to train on the PASCAL VOC 2012 dataset.
1. Download the data set
Run this command on Ubuntu:
#From the tensorflow/models/research/deeplab/datasets directory.
sh download_and_convert_voc2012.sh
This script downloads the dataset and converts it to TFRecord format. It creates a pascal_voc_seg folder under models/research/deeplab/datasets; the folder structure is as follows:
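For reference, here is a rough sketch of the layout the script produces (directory names taken from the datasets scripts; exact contents may differ slightly):

```text
pascal_voc_seg/
├── VOCdevkit/
│   └── VOC2012/
│       ├── JPEGImages/
│       ├── SegmentationClass/
│       ├── SegmentationClassRaw/
│       └── ...
└── tfrecord/
```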
2. Download the pre-trained model
If you don't know which pre-trained model to download, open the local_test_mobilenetv2.sh script and check. I initially used the wrong pre-trained model, which caused repeated training errors.
mobilenetv2_coco_voc_trainaug
After downloading, unzip it and save it to this local directory (it needs to be created manually): /home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/init_models
3. Training
Refer to the training command given in the local_test_mobilenetv2.sh script:
NUM_ITERATIONS=10
python "${WORK_DIR}"/train.py \
--logtostderr \
--train_split="trainval" \
--model_variant="mobilenet_v2" \
--output_stride=16 \
--train_crop_size="513,513" \
--train_batch_size=4 \
--training_number_of_steps="${NUM_ITERATIONS}" \
--fine_tune_batch_norm=true \
--tf_initial_checkpoint="${INIT_FOLDER}/${CKPT_NAME}/model.ckpt-30000" \
--train_logdir="${TRAIN_LOGDIR}" \
--dataset_dir="${PASCAL_DATASET}"
Parameter format of actual training:
# from models/research directory
python deeplab/train.py \
--logtostderr \
--train_split="train" \
--model_variant="mobilenet_v2" \
--output_stride=16 \
--train_crop_size="513,513" \ # posts online say to reduce this to 321 if memory is insufficient; I left it unchanged and training still completed
--train_batch_size=1 \
--training_number_of_steps=1000 \
--fine_tune_batch_norm=False \
--tf_initial_checkpoint="/home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/init_models/deeplabv3_mnv2_pascal_train_aug/model.ckpt-30000" \
--train_logdir="/home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/train" \
--dataset_dir="/home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/tfrecord"
Understanding the parameters
- training_number_of_steps: the number of training iterations; since this is only a verification run, I set it to a small value, 1000
- train_crop_size: the crop size of the training images. I did not change it; the rule for choosing it is described in this article: https://www.jianshu.com/p/1a07990705ee, namely crop_size = output_stride * k + 1
- train_batch_size: the training batch size; set to 1 because of hardware limitations
- fine_tune_batch_norm: whether to fine-tune batch norm; since the batch size is 1, set to False
- tf_initial_checkpoint: the pre-trained checkpoint used to initialize training
- train_logdir: the directory where training weights are saved; note that it was created when the project directories were set up at the beginning
- dataset_dir: the path to the dataset, i.e. the tfrecord directory created earlier
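The crop-size rule above can be sanity-checked with quick arithmetic. With output_stride=16, k=20 gives 321 and k=32 gives 513, exactly the two sizes discussed in this tutorial:

```shell
# crop_size = output_stride * k + 1, for any positive integer k.
# With output_stride=16, k=20 and k=32 give the two sizes used here.
output_stride=16
for k in 20 32; do
  echo "k=$k -> crop_size=$((output_stride * k + 1))"
done
# prints:
# k=20 -> crop_size=321
# k=32 -> crop_size=513
```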
ERROR1
ValueError: Total size of new array must be unchanged for MobilenetV2/Conv/weights lh_shape: [(3, 3, 3, 16)], rh_shape: [(3, 3, 3, 32)]
This is caused by downloading the wrong pre-trained model; replacing it with the correct one fixes the error.
After training, the model files are saved in the train folder of the training output. The folder contains:
- checkpoint
- graph.pbtxt
- model.ckpt-0.data-00000-of-00002
…
4. Run eval.py
As with train.py, the parameters follow the official script. The parameters I used are as follows:
python deeplab/eval.py \
--logtostderr \
--eval_split="val" \
--model_variant="mobilenet_v2" \
--eval_crop_size="513,513" \
--checkpoint_dir='/home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/train' \
--eval_logdir='/home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/eval' \
--dataset_dir='/home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/tfrecord' \
--max_number_of_evaluations=1
ERROR2
This indicates insufficient memory. ps aux showed the process using 330% of the CPU, so I killed it with kill -9 PID. Changing eval_crop_size to 321,321 also produced an error; I suspect train_crop_size and eval_crop_size need to match (to be verified). Since I had not changed the crop size during training, I had no choice but to keep 513; after that there was no error, but no mIoU result was printed either. Let's move on to the next step.
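The process hunt described above can be sketched as shell commands (the PID shown will differ on your machine):

```shell
# List any running eval.py processes; the [e] in the pattern stops grep
# from matching its own command line. Prints nothing if none are running.
ps aux | grep '[e]val.py' || true

# Then kill the runaway process using the PID from the second column:
# kill -9 <PID>
```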
5. Run vis.py to visualize the results
My parameters:
python deeplab/vis.py \
--logtostderr \
--vis_split="val" \
--model_variant="mobilenet_v2" \
--vis_crop_size="513,513" \
--checkpoint_dir="/home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/train" \
--vis_logdir="/home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/vis" \
--dataset_dir="/home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/tfrecord" \
--max_number_of_iterations=1
Fortunately this runs successfully. When it finishes, you can see the results in the vis folder.
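As a rough guide, the vis folder is laid out like this (file names are a sketch; the numbering depends on the dataset split):

```text
vis/
├── graph.pbtxt
└── segmentation_results/
    ├── 000000_image.png
    ├── 000000_prediction.png
    └── ...
```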
- First, the example results shown on the official website:
- The results from my own run:
6. Run export_model.py
My parameter settings:
# export_model
python deeplab/export_model.py \
--logtostderr \
--checkpoint_path="/home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/init_models/deeplabv3_mnv2_pascal_train_aug/model.ckpt-30000" \
--export_path="/home/hy/software/models/research/deeplab/datasets/pascal_voc_seg/exp/train_on_train_set/export/frozen_inference_graph.pb" \
--model_variant="mobilenet_v2" \
--num_classes=21 \
--crop_size=513 \
--crop_size=513 \
--inference_scales=1.0
This step exports the model: export_model.py freezes the checkpoint weights together with the inference graph into a single frozen_inference_graph.pb file.