TensorFlow models library source code, DeepLabv3+ implementation (3): re-dividing the VOC2012 data set into training and validation sets

- 1. Introduction to PASCAL VOC2012 Data Set
- 2. Randomly divide training set and validation set

1. Introduction to PASCAL VOC2012 Data Set

The previous posts only ran the code mechanically, without understanding what it actually does. Here we go back to where everything starts: the data set. There are many introductions to the VOC2012 data set online; this one is very detailed: "PASCAL-VOC2012 data set (vocdevkit, Vocbenchmark_release) introduced in detail".

VOCdevkit
    +VOC2012
        +Annotations (17125)  information saved as {id}.xml
        +ImageSets
            +Action (33)  human actions
            +Layout (3)  train.txt/val.txt/trainval.txt, human body parts
            +Main (63)  files named like {class}_val.txt
            +Segmentation (3)  train.txt etc., lists of images for semantic segmentation
        +JPEGImages (17125)  original images in jpg format
        +SegmentationClass (2913)  segmentation images in png format
        +SegmentationClassRaw (2913)
        +SegmentationObject (2913)  instance segmentation objects

The above is the folder structure of the data set; the number after each folder is the number of files it contains. The PASCAL VOC2012 data set has 17,125 images, but only 2,913 of them are used for semantic segmentation. In the Annotations folder, an image whose XML file has its segmented field set to 1 is one that is used for segmentation.
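As a rough illustration of the segmented field mentioned above, the sketch below parses a trimmed-down annotation and reads that field. The sample XML is hypothetical and much shorter than a real annotation file, and the helper name is_segmented is my own, not part of any VOC tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down annotation; real files carry many more fields.
SAMPLE_XML = """<annotation>
    <filename>2007_000032.jpg</filename>
    <segmented>1</segmented>
</annotation>"""


def is_segmented(xml_text):
    """Return True if the annotation's <segmented> field is 1."""
    root = ET.fromstring(xml_text)
    # A missing field is treated as "not segmented".
    return root.findtext('segmented', default='0').strip() == '1'


print(is_segmented(SAMPLE_XML))
```

For a file on disk, the same check could be applied to the text read from an {id}.xml file in the Annotations folder.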
For semantic segmentation, the files to pay attention to are the three txt files in the Segmentation folder: train.txt, val.txt and trainval.txt. The official deeplab setup uses 1,464 images for train, 1,449 for val and 2,913 for trainval. To re-divide the training and validation sets, modify the contents of these files. The following describes how.
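A quick way to confirm these counts on a local copy is to count the ids in each list file. This is only a minimal sketch: read_ids and summarize_splits are hypothetical helper names, and the path in the usage lines is the one used elsewhere in this post, which will differ on other machines.

```python
import os


def read_ids(list_file):
    """Read one image id per line, ignoring blank lines and surrounding whitespace."""
    with open(list_file, 'r') as f:
        return [line.strip() for line in f if line.strip()]


def summarize_splits(list_folder):
    """Return {split name: number of ids} for the list files that exist."""
    counts = {}
    for split in ('train', 'val', 'trainval'):
        path = os.path.join(list_folder, split + '.txt')
        if os.path.exists(path):
            counts[split] = len(read_ids(path))
    return counts


if __name__ == '__main__':
    # Path taken from this post; adjust it to the local data set location.
    folder = '/home/hy/document/dataset/VOCdevkit/VOC2012/ImageSets/Segmentation'
    print(summarize_splits(folder))
```

With the unmodified official files, the three counts should match the numbers quoted above.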

2. Randomly divide training set and validation set

Only the random sampling method is given here; k-fold cross-validation is left for a later attempt.

from __future__ import absolute_import, print_function
import os
import pandas as pd

path = '/home/hy/document/dataset/VOCdevkit/VOC2012/SegmentationClass'

lis = [i.split('.')[0] for i in os.listdir(path)]   # image ids: file names without the extension
df = pd.DataFrame(lis, columns=['name'])
temp1 = df.sample(n=1464)   
train = temp1['name'].values.tolist()
print(len(train))
with open('/home/hy/document/dataset/VOCdevkit/VOC2012/ImageSets/Segmentation/train.txt', 'w') as f:
    # One id per line, '\n' line endings, no trailing newline after the last id.
    f.write('\n'.join(train))

temp2 = df.drop(temp1.index).sample(n=583)   # sample val only from images not already in train
val = temp2['name'].values.tolist()
print(len(val))
with open('/home/hy/document/dataset/VOCdevkit/VOC2012/ImageSets/Segmentation/val.txt', 'w') as f:  # save as val.txt
    # One id per line, '\n' line endings, no trailing newline after the last id.
    f.write('\n'.join(val))

print(len(set(train) & set(val)))   # number of images that appear in both sets
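If a reproducible split is wanted, one option is to shuffle the ids once with a fixed seed and cut two non-overlapping slices. The sketch below uses only the standard library instead of pandas; split_ids and the placeholder ids are hypothetical and stand in for the 2,913 file names from SegmentationClass.

```python
import random


def split_ids(ids, n_train, n_val, seed=0):
    """Shuffle ids once and cut two non-overlapping slices."""
    if n_train + n_val > len(ids):
        raise ValueError('n_train + n_val exceeds the number of ids')
    shuffled = sorted(ids)  # sort first so the result depends only on the seed
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:n_train + n_val]


# Hypothetical ids standing in for the SegmentationClass file names.
ids = ['img_%04d' % i for i in range(2913)]
train, val = split_ids(ids, n_train=1464, n_val=583, seed=42)
print(len(train), len(val), len(set(train) & set(val)))  # 1464 583 0
```

Because both slices come from the same shuffled list, the overlap is zero by construction, and running the script again with the same seed gives the same train.txt and val.txt contents.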

Because evaluation on val ran out of memory, I reduced the val set to a smaller size. The next step is to generate the tfrecord files. I adapted the official build_data.py and build_voc2012_data.py code; only a few paths need to be changed for the conversion:

import math
import os.path
import sys
import tensorflow as tf
import collections
import six

root_path = '/home/hy/document/dataset/VOCdevkit/VOC2012/ImageSets/Segmentation'
output_dir = '/home/hy/document/dataset/tfrecord'
image_folder = '/home/hy/document/dataset/VOCdevkit/VOC2012/JPEGImages'
semantic_segmentation_folder = '/home/hy/document/dataset/VOCdevkit/VOC2012/SegmentationClassRaw'
_NUM_SHARDS = 4

FLAGS = tf.app.flags.FLAGS

tf.app.flags.DEFINE_enum('image_format', 'png', ['jpg', 'jpeg', 'png'],
                         'Image format.')

tf.app.flags.DEFINE_enum('label_format', 'png', ['png'],
                         'Segmentation label format.')

# A map from image format to expected data format.
_IMAGE_FORMAT_MAP = {
    'jpg': 'jpeg',
    'jpeg': 'jpeg',
    'png': 'png',
}


class ImageReader(object):
  """Helper class that provides TensorFlow image coding utilities."""

  def __init__(self, image_format='jpeg', channels=3):
    with tf.Graph().as_default():
      self._decode_data = tf.placeholder(dtype=tf.string)
      self._image_format = image_format
      self._session = tf.Session()
      if self._image_format in ('jpeg', 'jpg'):
        self._decode = tf.image.decode_jpeg(self._decode_data,
                                            channels=channels)
      elif self._image_format == 'png':
        self._decode = tf.image.decode_png(self._decode_data,
                                           channels=channels)

  def read_image_dims(self, image_data):
    image = self.decode_image(image_data)
    return image.shape[:2]

  def decode_image(self, image_data):
    image = self._session.run(self._decode,
                              feed_dict={self._decode_data: image_data})
    if len(image.shape) != 3 or image.shape[2] not in (1, 3):
      raise ValueError('The image channels not supported.')

    return image


def _int64_list_feature(values):
  if not isinstance(values, collections.Iterable):
    values = [values]

  return tf.train.Feature(int64_list=tf.train.Int64List(value=values))


def _bytes_list_feature(values):
  def norm2bytes(value):
    return value.encode() if isinstance(value, str) and six.PY3 else value

  return tf.train.Feature(
      bytes_list=tf.train.BytesList(value=[norm2bytes(values)]))


def image_seg_to_tfexample(image_data, filename, height, width, seg_data):
  return tf.train.Example(features=tf.train.Features(feature={
      'image/encoded': _bytes_list_feature(image_data),
      'image/filename': _bytes_list_feature(filename),
      'image/format': _bytes_list_feature(
          _IMAGE_FORMAT_MAP[FLAGS.image_format]),
      'image/height': _int64_list_feature(height),
      'image/width': _int64_list_feature(width),
      'image/channels': _int64_list_feature(3),
      'image/segmentation/class/encoded': (
          _bytes_list_feature(seg_data)),
      'image/segmentation/class/format': _bytes_list_feature(
          FLAGS.label_format),
  }))


def _convert_dataset(dataset_split):
  """Converts the specified dataset split to TFRecord format.

  Args:
    dataset_split: The dataset split (e.g., train, test).

  Raises:
    RuntimeError: If loaded image and label have different shape.
  """
  dataset = os.path.basename(dataset_split)[:-4]
  sys.stdout.write('Processing ' + dataset)
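  # strip() also drops a trailing '\r' in case the list file has Windows line endings.
  with open(dataset_split, 'r') as f:
    filenames = [x.strip() for x in f if x.strip()]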
  num_images = len(filenames)
  num_per_shard = int(math.ceil(num_images / float(_NUM_SHARDS)))

  image_reader = ImageReader('jpeg', channels=3)
  label_reader = ImageReader('png', channels=1)

  if not tf.gfile.Exists(output_dir):
      tf.gfile.MakeDirs(output_dir)

  for shard_id in range(_NUM_SHARDS):
    output_filename = os.path.join(output_dir,
                                   '%s-%05d-of-%05d.tfrecord' % (dataset, shard_id, _NUM_SHARDS))
    with tf.python_io.TFRecordWriter(output_filename) as tfrecord_writer:
      start_idx = shard_id * num_per_shard
      end_idx = min((shard_id + 1) * num_per_shard, num_images)
      for i in range(start_idx, end_idx):
        sys.stdout.write('\r>> Converting image %d/%d shard %d' % (
            i + 1, len(filenames), shard_id))
        sys.stdout.flush()
        # Read the image.
        image_filename = os.path.join(image_folder, filenames[i] + '.' + 'jpg')
        image_data = tf.gfile.FastGFile(image_filename, 'rb').read()
        height, width = image_reader.read_image_dims(image_data)
        # Read the semantic segmentation annotation.
        seg_filename = os.path.join(semantic_segmentation_folder,
                                    filenames[i] + '.' + FLAGS.label_format)
        seg_data = tf.gfile.FastGFile(seg_filename, 'rb').read()
        seg_height, seg_width = label_reader.read_image_dims(seg_data)
        if height != seg_height or width != seg_width:
          raise RuntimeError('Shape mismatched between image and label.')
        # Convert to tf example.
        example = image_seg_to_tfexample(
            image_data, filenames[i], height, width, seg_data)
        tfrecord_writer.write(example.SerializeToString())
    sys.stdout.write('\n')
    sys.stdout.flush()


if __name__ == '__main__':
    dataset_splits = tf.gfile.Glob(os.path.join(root_path, '*.txt'))
    for dataset_split in dataset_splits:
        _convert_dataset(dataset_split)
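The shard arithmetic in _convert_dataset can be checked in isolation. The sketch below repeats the same start/end computation in a small helper (shard_ranges is a hypothetical name) and applies it to the 583-image val set from this post.

```python
import math


def shard_ranges(num_images, num_shards):
    """Return the [start, end) index range written to each shard."""
    num_per_shard = int(math.ceil(num_images / float(num_shards)))
    return [(s * num_per_shard, min((s + 1) * num_per_shard, num_images))
            for s in range(num_shards)]


print(shard_ranges(583, 4))
# [(0, 146), (146, 292), (292, 438), (438, 583)]
```

So with 4 shards the first three val files hold 146 images each and the last one holds 145, while the 1,464 train images divide evenly into 366 per shard.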

[Figure: conversion result]
Then replace the official deeplab tfrecord files and Segmentation list files with the newly generated ones. I ran train.py again and then eval.py to evaluate the results.
A pitfall: eval.py did not output any mIoU result before. I found that the deeplab module I was running differs from the code in the latest version of the models repository, and the older eval.py apparently does not print the mIoU value. After copying the eval.py code from the latest version of deeplab, the value is output.
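For reference, mIoU (mean intersection over union) averages the per-class ratio TP / (TP + FP + FN). The sketch below computes it from a small confusion matrix in plain Python; it only illustrates the metric and is not the code that deeplab's eval.py runs. The function name and the 2-class matrix are made up for the example.

```python
def mean_iou(confusion):
    """confusion[i][j] = number of pixels with true class i predicted as class j."""
    ious = []
    for c in range(len(confusion)):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                   # true class c, predicted as something else
        fp = sum(row[c] for row in confusion) - tp    # predicted as c, actually something else
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / float(denom))
    return sum(ious) / len(ious) if ious else 0.0


# Made-up 2-class example: per-class IoU is 3/5 and 5/7.
print(mean_iou([[3, 1], [1, 5]]))
```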

-------------------------2020.4.24 update-------------------------
When converting to tfrecord files, there is no need to go to as much trouble as I did: just use build_voc2012_data.py directly and pass the paths as parameters.

# Convert to TFRecord
python build_voc2012_data.py --image_folder='/home/hy/document/dataset/VOCdevkit/VOC2012/JPEGImages' \
                             --semantic_segmentation_folder='/home/hy/document/dataset/VOCdevkit/VOC2012/SegmentationClassRaw' \
                             --list_folder='/home/hy/document/dataset/VOCdevkit/VOC2012/ImageSets/Segmentation' \
                             --output_dir=/home/hy/document/dataset/tfrecord
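Whichever conversion route is used, it may be worth checking first that every id in a re-divided list file has both a jpg image and a png label, since a missing file would stop the conversion partway through. The helper below is a minimal sketch with a hypothetical name (find_missing); it assumes the .jpg and .png extensions used in this post.

```python
import os


def find_missing(list_file, image_folder, label_folder):
    """Return the ids in list_file that lack a .jpg image or a .png label."""
    with open(list_file, 'r') as f:
        ids = [line.strip() for line in f if line.strip()]
    missing = []
    for image_id in ids:
        has_image = os.path.exists(os.path.join(image_folder, image_id + '.jpg'))
        has_label = os.path.exists(os.path.join(label_folder, image_id + '.png'))
        if not (has_image and has_label):
            missing.append(image_id)
    return missing
```

Calling it with train.txt or val.txt, the JPEGImages folder and the SegmentationClassRaw folder should return an empty list when the split is consistent with the files on disk.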