[MMSegmentation: training DeepLabV3] Loading and training on a custom dataset | RLE encoding to MMSegmentation | COCO to MMSegmentation

Table of contents

foreword

mmsegmentation download:

mmsegmentation official website:

environment configuration:

mmsegmentation dataset production 

Differences and connections between mask images, binarized images, and mmsegmentation annotation files:

SenseTime's annotation file format requirements:

Coco data set to mmsegmentation: (need to download pycocotools or pycocotools-windows)

rle encoding to mmsegmentation format annotation:

model training

config file naming rules:

Single machine single card training:

Single machine multi-card training:

Multi-machine multi-card training:


foreword

mmsegmentation download:

GitHub - open-mmlab/mmsegmentation: OpenMMLab Semantic Segmentation Toolbox and Benchmark. https://github.com/open-mmlab/mmsegmentation

mmsegmentation official website: Welcome to MMSegmentation documentation! - MMSegmentation 0.27.0 documentation https://mmsegmentation.readthedocs.io/zh_CN/latest/

environment configuration:

I trained and tested on both Windows 10 and Linux (CentOS 7).

Windows:

1 × RTX 2080 Ti

CUDA 11.4

torch 1.9.1

torchvision 0.10.1

mmcv-full 1.4.0

Linux:

3 × RTX 2080 Ti

CUDA 10.2

torch 1.7.0

torchvision 0.8.1

mmcv-full 1.6.1

Note: mmcv-full is the full build (with the compiled CUDA ops), while mmcv is the lite build. The mmcv version you install must match your PyTorch and CUDA versions.

Install MMCV - mmcv 1.6.1 documentation https://mmcv.readthedocs.io/zh_CN/latest/get_started/installation.html
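Before installing or upgrading mmcv, it can help to see which versions are already present in the environment. The snippet below is only a small sketch using the standard library; the helper name installed_versions and the default package list are my own choices, not part of any OpenMMLab tool.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchvision", "mmcv-full", "mmsegmentation")):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Compare the printed torch version (and its CUDA build) against the compatibility table on the MMCV installation page linked above.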


mmsegmentation dataset production 

Differences and connections between mask images, binarized images, and mmsegmentation annotation files:

Mask image: a mask is a graphics operation used to partially or completely hide parts of an object or element. Applying a mask to a graphic object has the effect of painting the object onto the background through the mask, so that parts of the object are completely or partially covered, while the image in the mask itself does not change.

Binarized image: image binarization sets the grayscale value of every pixel in the image to 0 or 255, which gives the whole image a stark black-and-white appearance.

Mmsegmentation annotation file: the annotation format that SenseTime's MMSegmentation uses to load semantic segmentation datasets.

Rle encoding: RLE (run-length encoding, also called run coding) is a method for encoding binary images. Each run of consecutive black or white pixels is represented by a codeword.

(From Baidu Encyclopedia.)

SenseTime's annotation file format requirements:

(1) data folder structure:

Note: seg_map_suffix is the file suffix of the annotation images, and img_suffix is the file suffix of the original images. Each annotation image is the mask for the corresponding original image, so the annotation images must match the originals in number and in shape.
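The "same number" requirement is easy to check before training. The sketch below pairs images and masks by file stem. The folder names img_dir/train and ann_dir/train follow the layout I assume from the official docs, and find_unpaired is a hypothetical helper; checking that the shapes match would additionally require opening each file.

```python
import tempfile
from pathlib import Path


def find_unpaired(img_dir, ann_dir, img_suffix=".jpg", seg_map_suffix=".png"):
    """Return (images without a mask, masks without an image) as sorted lists of stems."""
    def stems(folder, suffix):
        return {p.name[: -len(suffix)] for p in Path(folder).iterdir()
                if p.name.endswith(suffix)}

    images = stems(img_dir, img_suffix)
    masks = stems(ann_dir, seg_map_suffix)
    return sorted(images - masks), sorted(masks - images)


# Tiny demo on a throwaway directory tree (all file names are made up).
with tempfile.TemporaryDirectory() as root:
    img_dir = Path(root) / "img_dir" / "train"
    ann_dir = Path(root) / "ann_dir" / "train"
    img_dir.mkdir(parents=True)
    ann_dir.mkdir(parents=True)
    for name in ("a.jpg", "b.jpg"):
        (img_dir / name).touch()
    (ann_dir / "a.png").touch()
    missing_masks, missing_images = find_unpaired(img_dir, ann_dir)

print(missing_masks, missing_images)  # ['b'] []
```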

(2) Precautions for mask labeling of labeled images:

Note: taken from the official website.

Coco data set to mmsegmentation: (need to download pycocotools or pycocotools-windows)

We convert a standard COCO dataset into MMSegmentation annotations. The principle is as follows: each COCO annotation is turned into its RLE encoding, the RLE is decoded into a binary mask, the mask is multiplied by the annotation's category id, and the per-annotation masks are merged into a single image in which different pixel values represent different categories.
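The merge step can be illustrated on a toy example. This is a minimal sketch with two hand-written 2x3 binary masks; merge_masks is a hypothetical helper that only mirrors the multiply-then-merge idea described above, where a pixel claimed by an earlier annotation is not overwritten by a later one.

```python
import numpy as np


def merge_masks(binary_masks, category_ids):
    """Merge per-annotation binary masks into one label image (0 = background)."""
    output = np.zeros_like(binary_masks[0], dtype=np.uint8)
    for mask, category_id in zip(binary_masks, category_ids):
        labeled = mask.astype(np.uint8) * np.uint8(category_id)  # 1 -> category id
        labeled *= (output == 0)  # keep only pixels that are still unlabeled
        output += labeled
    return output


mask_a = np.array([[1, 1, 0],
                   [0, 0, 0]], dtype=np.uint8)
mask_b = np.array([[0, 1, 1],
                   [0, 0, 1]], dtype=np.uint8)

merged = merge_masks([mask_a, mask_b], [1, 2])
print(merged.tolist())  # [[1, 1, 2], [0, 0, 2]]
```

The overlapping pixel at row 0, column 1 keeps the value 1 because the first annotation reached it first.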

code:

import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def coco_to_mmsegmentation(
    annotations_file: str, output_annotations_file: str, output_masks_dir: str
):
    """Convert json in [segmentation format](https://gradiant.github.io/ai-dataset-template/supported_tasks/#segmentation) to txt in [mmsegmentation format](https://mmsegmentation.readthedocs.io/en/latest/tutorials/new_dataset.html#reorganize-dataset-to-existing-format).

    Args:
        annotations_file:
            path to json in [segmentation format](https://gradiant.github.io/ai-dataset-template/supported_tasks/#segmentation)
        output_annotations_file:
            path to write the txt in [mmsegmentation format](https://mmsegmentation.readthedocs.io/en/latest/tutorials/customize_datasets.html#customize-datasets-by-reorganizing-data)
        output_masks_dir:
            path where the masks generated from the annotations will be saved to.
            A single `{file_name}.png` mask will be generated for each image.
    """
    import cv2
    import numpy as np
    from pycocotools.coco import COCO

    Path(output_annotations_file).parent.mkdir(parents=True, exist_ok=True)
    Path(output_masks_dir).mkdir(parents=True, exist_ok=True)

    logger.info(f"Loading annotations from {annotations_file}")
    with open(annotations_file) as f:
        annotations = json.load(f)

    logger.info(f"Saving annotations to {output_annotations_file}")
    with open(output_annotations_file, "w") as f:
        for image in annotations["images"]:
            # Image path without its extension
            filename = Path(image["file_name"]).parent / Path(image["file_name"]).stem
            # Write one image path per line to the txt file
            f.write(str(filename))
            f.write("\n")

    logger.info(f"Saving masks to {output_masks_dir}")
    coco_annotations = COCO(annotations_file)
    for image_id, image_data in coco_annotations.imgs.items():

        filename = image_data["file_name"]

        # One image can have several annotations: look up their ids, then load them
        anns_ids = coco_annotations.getAnnIds(imgIds=image_id)
        image_annotations = coco_annotations.loadAnns(anns_ids)

        logger.info(f"Creating output mask for {filename}")
        # Single-channel (2-D) mask, initially all 0 (black, background)
        output_mask = np.zeros(
            (image_data["height"], image_data["width"]), dtype=np.uint8
        )
        # Go through every annotation that belongs to this image
        for image_annotation in image_annotations:
            # Each annotation is a dict with the keys 'id', 'image_id', 'category_id',
            # 'iscrowd', 'area', 'bbox', 'segmentation', 'width', 'height'
            category_id = image_annotation["category_id"]  # class id of this annotation
            try:
                category_mask = coco_annotations.annToMask(image_annotation)
            except Exception as e:
                logger.warning(e)
                logger.warning(f"Skipping {image_annotation}")
                continue
            # Turn the binary mask into a label mask (ids start at 1);
            # each annotation covers exactly one category
            category_mask *= category_id
            # Keep only pixels that are still 0 in output_mask; pixels that
            # were already labeled by an earlier annotation are zeroed out
            category_mask *= output_mask == 0
            output_mask += category_mask  # merge into the image-level mask

        output_filename = Path(output_masks_dir) / Path(filename).with_suffix(".png")
        output_filename.parent.mkdir(parents=True, exist_ok=True)

        logger.info(f"Writing mask to {output_filename}")
        cv2.imwrite(str(output_filename), output_mask)


if __name__ == "__main__":
    coco_to_mmsegmentation(r"steel_coco.json", 'steel_coco.txt', 'mask_ann')

Note: the COCO-to-MMSegmentation conversion still relies on RLE decoding internally. If you already have the binary mask for each image, you can convert it directly.
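One thing to watch: the converter writes each category_id into the mask as the pixel value, and COCO category ids are not always contiguous, while a segmentation head with N classes generally expects label values from 0 to N-1. A possible way to handle this, sketched under the assumption that 0 is reserved for background, is to map the ids to 1..N first. The helper build_label_map and the category names below are made up for illustration.

```python
def build_label_map(categories):
    """Map COCO category ids to contiguous labels 1..N (0 stays background)."""
    ids = sorted(category["id"] for category in categories)
    return {category_id: label for label, category_id in enumerate(ids, start=1)}


# Hypothetical "categories" section of a COCO json with gaps in the ids.
categories = [
    {"id": 1, "name": "scratch"},
    {"id": 3, "name": "dent"},
    {"id": 7, "name": "stain"},
]

label_map = build_label_map(categories)
print(label_map)  # {1: 1, 3: 2, 7: 3}
```

In the converter, the multiplication would then use label_map[category_id] instead of category_id.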

rle encoding to mmsegmentation format annotation:

Here we decode the RLE format directly, without the third-party pycocotools library.

code:

import csv
import os

import cv2
import numpy as np
from PIL import Image


def rle_decode(mask_rle: str = '', shape: tuple = (1400, 2100)):
    '''
    Decode rle encoded mask.

    :param mask_rle: run-length string formatted as "start length start length ..."
                     (starts are 1-based, pixels are numbered column by column)
    :param shape: (height, width) of array to return
    Returns numpy array, 1 - mask, 0 - background
    '''
    s = mask_rle.split()
    # Even-indexed tokens are the start positions of the runs and odd-indexed
    # tokens are the run lengths, e.g. "1 2 13 2" gives starts ['1', '13']
    # and lengths ['2', '2'].
    starts, lengths = [np.asarray(x, dtype=int) for x in (s[0:][::2], s[1:][::2])]
    starts -= 1  # convert the 1-based starts to 0-based indices
    ends = starts + lengths
    img = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for lo, hi in zip(starts, ends):  # restore every run as 1
        img[lo:hi] = 1
    return img.reshape(shape, order='F')  # column-major, matching the encoding


if __name__ == '__main__':
    os.makedirs('mask_ann3', exist_ok=True)

    with open(r'train.csv') as fp:
        fcsv = csv.reader(fp)

        file_mm_dict = {}  # image name without extension -> merged label mask
        for step, each_line in enumerate(fcsv):
            if step == 0:  # skip the header row
                continue

            key = each_line[0][:-4]  # file name without its 4-character extension
            category_id = int(each_line[1])
            # The image size is read from the PNG of the same name in mask_ann
            img_name = os.path.join('mask_ann', key + '.png')
            image = Image.open(img_name)

            if key not in file_mm_dict:
                file_mm_dict[key] = np.zeros((image.height, image.width), dtype=np.uint8)

            result = rle_decode(each_line[-1], (image.height, image.width))
            # Turn the binary mask into a label mask (ids start at 1); each row
            # covers exactly one category. NOTE: the factor 50 spreads the values
            # out (50, 100, ...) so the PNG is easy to inspect by eye, but it no
            # longer fits in uint8 for ids above 5; training labels normally
            # need the plain class index instead.
            result *= category_id * 50
            # Keep only pixels that are still 0 in the merged mask; pixels that
            # were already labeled by an earlier row are zeroed out
            result *= file_mm_dict[key] == 0
            file_mm_dict[key] += result  # merge into the image-level mask

        for key in file_mm_dict:
            cv2.imwrite(r'mask_ann3/' + key + '.png', file_mm_dict[key])
            print(key + '.png conversion finished')
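
A quick way to sanity-check the decoder is a round trip on a tiny mask. The sketch below re-implements the same decoding in compact form and adds a hypothetical rle_encode inverse (not part of the original script); both assume 1-based start positions and column-by-column pixel numbering, as in the script above.

```python
import numpy as np


def rle_decode(mask_rle, shape):
    """Decode "start length ..." pairs (1-based, column-major) into a 0/1 mask."""
    tokens = mask_rle.split()
    starts = np.asarray(tokens[0::2], dtype=int) - 1
    lengths = np.asarray(tokens[1::2], dtype=int)
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for start, length in zip(starts, lengths):
        flat[start:start + length] = 1
    return flat.reshape(shape, order="F")


def rle_encode(mask):
    """Inverse of rle_decode for a 0/1 mask."""
    pixels = np.concatenate([[0], mask.flatten(order="F"), [0]])
    runs = np.where(pixels[1:] != pixels[:-1])[0] + 1  # positions where the value flips
    runs[1::2] -= runs[::2]  # turn run end positions into run lengths
    return " ".join(str(x) for x in runs)


mask = rle_decode("1 2 13 2", (4, 4))
print(mask.tolist())     # [[1, 0, 0, 1], [1, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
print(rle_encode(mask))  # 1 2 13 2
```

Pixels 1-2 fall in the first column and pixels 13-14 in the fourth, which is why the ones appear in the top two rows of columns 0 and 3.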

model training

config file naming rules:

Single machine single card training:

The usual training command works as-is:

python .\tools\train.py configs/deeplabv3/deeplabv3_r50-d8_512x512_4x4_80k_coco-stuff164k.py --work-dir test_80k --gpus 1
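
The command above points at the stock COCO-Stuff config. For a dataset prepared as described earlier, a small config that inherits from it and overrides the dataset settings might look like the sketch below (MMSegmentation 0.x config style). The file location, data_root, folder names, and class count are all assumptions for illustration; the training pipeline is inherited unchanged from the base config.

```python
# Hypothetical config file, assumed to be saved next to the stock config
# in configs/deeplabv3/.
_base_ = "./deeplabv3_r50-d8_512x512_4x4_80k_coco-stuff164k.py"

num_classes = 5  # assumption: background plus 4 foreground classes

dataset_type = "CustomDataset"
data_root = "data/my_dataset"  # hypothetical dataset location

# Both heads of DeepLabV3 must predict the new number of classes.
model = dict(
    decode_head=dict(num_classes=num_classes),
    auxiliary_head=dict(num_classes=num_classes),
)

data = dict(
    train=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir="img_dir/train",
        ann_dir="ann_dir/train",
        img_suffix=".jpg",
        seg_map_suffix=".png",
        reduce_zero_label=False,  # keep 0 as a real label (background)
    ),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir="img_dir/val",
        ann_dir="ann_dir/val",
        img_suffix=".jpg",
        seg_map_suffix=".png",
        reduce_zero_label=False,
    ),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir="img_dir/val",
        ann_dir="ann_dir/val",
        img_suffix=".jpg",
        seg_map_suffix=".png",
        reduce_zero_label=False,
    ),
)
```

Pass this file to tools/train.py in place of the stock config path.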

Single machine multi-card training:

(1) On Windows, start training with python -m torch.distributed.launch and set the --launcher and --gpu-id arguments.

(2) On Linux, use tools/dist_train.sh, passing the config file and the number of GPUs.

Multi-machine multi-card training:

On Linux, use tools/slurm_train.sh.

Note: I ran into load imbalance across the GPUs during my own multi-card training, so I will not go into detail here.

See: Train a model — MMSegmentation 0.27.0 documentation

Origin blog.csdn.net/m0_61139217/article/details/126228866