Super Detailed! A Record of Training Mask R-CNN on Your Own Data

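0 Acknowledgements
1 Environment and source code
2 Building the dataset
3. Training
- 3.1 Download the pretrained weights
- 3.2 Training script
4. Testing the results
- 4.1 Visualization with TensorBoard
- 4.2 Visualizing the segmentation results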

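0 Acknowledgements

This post is mainly based on the blog post https://blog.csdn.net/doudou_here/article/details/87855273. Special thanks to its author for providing the training script, which saved me a lot of detours. If anything here infringes on anyone's rights, please let me know and I will remove it.
Besides the training steps themselves, I also include some code that makes building the dataset easier (the code is crude, but it does the basic job). Feel free to copy whatever you need.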

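1 Environment and source code

The original author's environment was Ubuntu 16 + tensorflow-gpu 1.4.0 + keras 2.1.0. Mine is slightly different, but in practice it works as well.
My environment is Ubuntu 18.04 + tensorflow-gpu 1.9.0 (1.4.0 seemed too old) + keras 2.1.2 (the USTC conda channel did not provide 2.1.0, so I used the closest version).
I have not tried running it on Windows, but reportedly it works there too.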

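Source code: https://github.com/matterport/Mask_RCNN.
For convenience, I also uploaded the code to Baidu Cloud. Link: https://pan.baidu.com/s/1dIpt_8NNXbT6rDNz1f5INQ, password: qwii

You also need the pretrained weights file mask_rcnn_coco.h5. I downloaded it from someone else's blog and no longer remember the source, so I uploaded it to Baidu Cloud as well. Link: https://pan.baidu.com/s/1VBIFgpX95FXi6-5u2SB2Cw, password: pn2f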

2 Building the dataset

2.1 Annotating the data with labelme

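Before annotating, I recommend converting all images to one uniform size and renaming them so that the file names are regular.
You can use the code below as a reference. (My first version produced twice as many images: 5 originals became 10, and sorted by name the first 5 and last 5 were identical. The cause is most likely that out_dir sat inside f_path, so os.walk also visited the images that had just been written. The version below skips the output folder.)

import os
from PIL import Image

def ResizeImage(f_path, out_dir, width, height, img_type):
    os.makedirs(out_dir, exist_ok=True)
    out_abs = os.path.abspath(out_dir)
    i = 0
    # os.walk() visits every file under f_path, sub-folders included
    for root, dirs, files in os.walk(f_path):
        # Skip the output folder when it lies inside f_path; otherwise the
        # images just written there are picked up and resized a second time
        if os.path.abspath(root) == out_abs:
            continue
        for name in sorted(files):
            # os.path.join() builds the full path of each file
            file = os.path.join(root, name)
            img = Image.open(file).convert('RGB')   # JPEG has no alpha channel
            out = img.resize((width, height), Image.LANCZOS)   # ANTIALIAS was removed in Pillow 10

            i += 1
            f_name = str(i) + '.jpeg'
            out_name = os.path.join(out_dir, f_name)
            out.save(out_name, img_type)

f_path = r'*********************'                         # directory containing the images
out_dir = r'****************************/out_dir'         # output directory
width = 384                       # width and height of the output images
height = 512
img_type = 'jpeg'                 # output image format

ResizeImage(f_path, out_dir, width, height, img_type)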

Next, annotate the images with labelme. If you have never used labelme, install it first:

sudo apt-get install python3-pyqt5
sudo pip3 install labelme

On Windows, use the following instead:

conda install pyqt
pip install labelme

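If the download is too slow, append the Tsinghua mirror to the pip command: -i https://pypi.tuna.tsinghua.edu.cn/simple
Installation is simple, and any problems are easy to solve with a quick search.
Then type the command "labelme" in a terminal (cmd on Windows) and the window opens.
In labelme, click Open Dir at the top left and select the out_dir you just wrote the images to. Click the Create button on the left to outline a region, then enter the label name (remember this name, because it is needed again later). Here I create a class called patrick_star. If you have more classes, you can add a second and a third in the same way.

Saving the annotation produces a .json file that records the position of each annotated point and its class. This json file then has to be converted into a mask file in png format.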
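Before converting anything, it can help to confirm that the label names in the json files are spelled the way you expect, since the same names are needed again in the training script. The sketch below counts labels across a folder of labelme json files. It relies only on the "shapes" list and the "label" field that labelme writes; count_labels is a hypothetical helper name, and the sample annotation is made up for the demo.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def count_labels(json_dir):
    """Count how many annotated shapes carry each label in a folder of labelme json files."""
    counts = Counter()
    for json_file in sorted(Path(json_dir).glob("*.json")):
        with open(json_file, encoding="utf-8") as f:
            data = json.load(f)
        # Each annotated region is one entry in the "shapes" list
        for shape in data.get("shapes", []):
            counts[shape["label"]] += 1
    return counts


if __name__ == "__main__":
    # Demo on a made-up annotation written to a temporary folder
    with tempfile.TemporaryDirectory() as tmp:
        sample = {"shapes": [{"label": "patrick_star",
                              "points": [[10, 10], [50, 10], [30, 40]]}]}
        Path(tmp, "1.json").write_text(json.dumps(sample), encoding="utf-8")
        print(dict(count_labels(tmp)))
```

A misspelled label shows up here as an extra key with a small count, which is easier to fix in labelme now than to debug during training.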

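2.2 Converting the json files into the mask files the model needs

This uses labelme's labelme_json_to_dataset command.
You can write a simple script to run it in batch. In the same directory as the json files, create a file named json2dataset.py with this content: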

import os
import subprocess

# Convert every labelme json file in the current directory
for name in sorted(os.listdir('./')):
    if name.endswith('.json'):   # skip this script and any other non-json files
        subprocess.run(['labelme_json_to_dataset', name])

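Then open a terminal in that directory and run the script with python.
After the conversion, you should see one output folder per json file:
(Figure: the generated json folders)
Each folder should contain these files:

(Figure: contents of one json folder)
Note that the 1.png shown here is originally named label.png; in section 2.3 we rename it to match its folder.
If the generated folders contain no .yaml file, don't worry, just follow the steps below.
First find the json_to_dataset.py script on your machine (it is part of the installed labelme package), open it, and make the following changes: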

# Add this import at the top
import yaml

# The main body of the code is omitted here
# Then add this part at the end of the body of main(), right before the __main__ block:
    logger.warning('info.yaml is being replaced by label_names.txt')
    info = dict(label_names=label_names)
    with open(osp.join(out_dir, 'info.yaml'), 'w') as f:
        yaml.safe_dump(info, f, default_flow_style=False)
    logger.info('Saved to: {}'.format(out_dir))

# The script's entry point follows
if __name__ == '__main__':
    main()

Run json_to_dataset again and the yaml file will be generated.

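2.3 Arranging the data in the layout the model expects

First create a folder named my_data, create four folders inside it, and store the following in each:

Folder	Contents
cv2_mask	the png label files from the folders generated by json_to_dataset
json	the json files produced by labelme
labelme_json	the folders generated by json_to_dataset
pic	the original images after size normalization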
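The four folders in the table can also be created from a script. This is a small sketch using only the standard library; make_dataset_dirs is a hypothetical helper name, and the demo writes into a temporary directory rather than a real project.

```python
import os
import tempfile

# Folder names taken from the table above
SUBFOLDERS = ("cv2_mask", "json", "labelme_json", "pic")


def make_dataset_dirs(root):
    """Create root and its four sub-folders; return the sub-folder paths."""
    paths = [os.path.join(root, name) for name in SUBFOLDERS]
    for path in paths:
        os.makedirs(path, exist_ok=True)   # no error if the folder already exists
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in make_dataset_dirs(os.path.join(tmp, "my_data")):
            print(path)
```

In a real project you would call make_dataset_dirs("my_data") once and then fill the folders as described in the rest of this section.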

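Don't put the images in yet. To match the path layout the training script expects by default, label.png first needs a simple rename (otherwise you would have to change the code, which is more trouble).
For example, if the json folder is called 1_json, its png must be renamed to 1.png. For batch processing, I wrote a simple script.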

# Rename label.png after its folder, e.g. 1_json/label.png -> 1_json/1.png
import os
for root, dirs, names in os.walk(r'******out_dir'):   # change to the directory that holds your json folders
    for dr in dirs:
        file_dir = os.path.join(root, dr)
        file = os.path.join(file_dir, 'label.png')
        new_name = dr.split('_')[0] + '.png'
        new_file_name = os.path.join(file_dir, new_name)
        if os.path.exists(file):   # skip folders that were already renamed
            os.rename(file, new_file_name)

The other three folders are easy to prepare. Copying files into cv2_mask one by one is tedious, so you can batch copy them like this:

import os
from shutil import copyfile

tar_root = r'******my_data/cv2_mask'      # destination path
for root, dirs, names in os.walk(r'******'):   # change to the directory that holds your json folders
    for dr in dirs:
        file_dir = os.path.join(root, dr)
        new_name = dr.split('_')[0] + '.png'
        # the label file renamed in the previous step, e.g. 1_json/1.png
        new_file_name = os.path.join(file_dir, new_name)
        print(new_file_name)

        tar_file = os.path.join(tar_root, new_name)
        copyfile(new_file_name, tar_file)

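That completes the data preparation. If you have downloaded the Mask R-CNN source code, you will find a directory called samples in it. The training script below reads its data from there, so copy the whole my_data folder into samples.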

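3. Training

3.1 Download the pretrained weights

The weights file mask_rcnn_coco.h5 is in the Baidu Cloud link near the top of this post; download it from there if you need it.
These weights were trained on the COCO dataset and are used here for transfer learning.
After downloading, just put the file in the project root directory Mask_RCNN-master.

3.2 Training script

Create a file named train.py in the Mask_RCNN-master directory and add the following content:
(To be clear up front: this code is adapted from someone else's; if it infringes, please tell me and I will remove it.)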

# -*- coding: utf-8 -*-
 
import os
import sys
import random
import math
import re
import time
import numpy as np
import cv2
# import matplotlib
# import matplotlib.pyplot as plt
import tensorflow as tf
from mrcnn.config import Config
# import utils
from mrcnn import model as modellib, utils
from mrcnn import visualize
import yaml
from mrcnn.model import log
from PIL import Image
 
# os.environ["CUDA_VISIBLE_DEVICES"] = "0"
# Root directory of the project
ROOT_DIR = os.getcwd()
 
# ROOT_DIR = os.path.abspath("../")
# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")
 
iter_num = 0
 
# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "mask_rcnn_coco.h5")
# Download COCO trained weights from Releases if needed
# if not os.path.exists(COCO_MODEL_PATH):
#     utils.download_trained_weights(COCO_MODEL_PATH)
 
 
class ShapesConfig(Config):
    """Configuration for training on the toy shapes dataset.
    Derives from the base Config class and overrides values specific
    to the toy shapes dataset.
    """
    # Give the configuration a recognizable name
    NAME = "shapes"
 
    # Train on 1 GPU with 1 image per GPU, so the batch size is 1
    # (batch size = GPU_COUNT * IMAGES_PER_GPU).
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
 
    # Number of classes (including background)
    NUM_CLASSES = 4 + 1  # background + 4 classes; I have 4 classes, hence 4 + 1
 
    # Use small images for faster training. Set the limits of the small side
    # the large side, and that determines the image shape.
    IMAGE_MIN_DIM = 256
    IMAGE_MAX_DIM = 1024
 
    # Use smaller anchors because our image and objects are small
    # RPN_ANCHOR_SCALES = (8 * 6, 16 * 6, 32 * 6, 64 * 6, 128 * 6)  # anchor side in pixels
    RPN_ANCHOR_SCALES = (16 * 6, 32 * 6, 64 * 6, 128 * 6, 256 * 6)    # the objects in my images are fairly large, so I made the anchors larger too
 
    # Reduce training ROIs per image because the images are small and have
    # few objects. Aim to allow ROI sampling to pick 33% positive ROIs.
    TRAIN_ROIS_PER_IMAGE = 100
 
    # Use a small epoch since the data is simple
    STEPS_PER_EPOCH = 50     # number of steps per epoch; best left unchanged
 
    # use small validation steps since the epoch is small
    VALIDATION_STEPS = 50
 
 
config = ShapesConfig()
config.display()
 
 
class DrugDataset(utils.Dataset):
    # Get the number of instances (objects) in the image
    def get_obj_index(self, image):
        n = np.max(image)
        return n
 
    # Parse the yaml file produced by labelme to get the label of each mask layer
    def from_yaml_get_class(self, image_id):
        info = self.image_info[image_id]
        with open(info['yaml_path']) as f:
            temp = yaml.safe_load(f)   # safe_load needs no Loader argument
            labels = temp['label_names']
            del labels[0]              # drop the first entry (the background label)
        return labels
 
    # Rewritten draw_mask
    def draw_mask(self, num_obj, mask, image, image_id):
        # print("draw_mask-->",image_id)
        # print("self.image_info",self.image_info)
        info = self.image_info[image_id]
        # print("info-->",info)
        # print("info[width]----->",info['width'],"-info[height]--->",info['height'])
        for index in range(num_obj):
            for i in range(info['width']):
                for j in range(info['height']):
                    # print("image_id-->",image_id,"-i--->",i,"-j--->",j)
                    # print("info[width]----->",info['width'],"-info[height]--->",info['height'])
                    at_pixel = image.getpixel((i, j))
                    if at_pixel == index + 1:
                        mask[j, i, index] = 1
        return mask
 
    # Rewritten load_shapes, which registers your own classes
    # and adds path, mask_path and yaml_path to self.image_info
    # dataset_root_path = "/dateset/"
    # img_floder = dataset_root_path + "rgb"
    # mask_floder = dataset_root_path + "mask"
    # dataset_root_path = "/tongue_dateset/"
    def load_shapes(self, count, img_floder, mask_floder, imglist, dataset_root_path):
        """Generate the requested number of synthetic images.
        count: number of images to generate.
        height, width: the size of the generated images.
        """
        # Add classes
        self.add_class("shapes", 1, "leibie1")
        self.add_class("shapes", 2, "leibie2")
        self.add_class("shapes", 3, "leibie3")
        self.add_class("shapes", 4, "leibie4")
 
        for i in range(count):
            # Get the image width and height
            print(i)
            filestr = imglist[i].split(".")[0]
            # print(imglist[i],"-->",cv_img.shape[1],"--->",cv_img.shape[0])
            # print("id-->", i, " imglist[", i, "]-->", imglist[i],"filestr-->",filestr)
            # filestr = filestr.split("_")[1]
            mask_path = mask_floder + "/" + filestr + ".png"
            yaml_path = dataset_root_path + "labelme_json/" + filestr + "_json/info.yaml"
            print(dataset_root_path + "labelme_json/" + filestr + "_json/img.png")
            cv_img = cv2.imread(dataset_root_path + "labelme_json/" + filestr + "_json/img.png")
            print(type(cv_img))
 
            self.add_image("shapes", image_id=i, path=img_floder + "/" + imglist[i],
                           width=cv_img.shape[1], height=cv_img.shape[0], mask_path=mask_path, yaml_path=yaml_path)
 
    # Rewritten load_mask
    def load_mask(self, image_id):
        """Generate instance masks for shapes of the given image ID.
        """
        global iter_num
        print("image_id", image_id)
        info = self.image_info[image_id]
        count = 1  # number of object
        img = Image.open(info['mask_path'])
        num_obj = self.get_obj_index(img)
        mask = np.zeros([info['height'], info['width'], num_obj], dtype=np.uint8)
        mask = self.draw_mask(num_obj, mask, img, image_id)
        occlusion = np.logical_not(mask[:, :, -1]).astype(np.uint8)
        for i in range(count - 2, -1, -1):
            mask[:, :, i] = mask[:, :, i] * occlusion
 
            occlusion = np.logical_and(occlusion, np.logical_not(mask[:, :, i]))
        labels = []
        labels = self.from_yaml_get_class(image_id)
        labels_form = []
        for i in range(len(labels)):
            if labels[i].find("leibie1") != -1:
                labels_form.append("leibie1")
            elif labels[i].find("leibie2") != -1:
                labels_form.append("leibie2")
            elif labels[i].find("leibie3") != -1:
                labels_form.append("leibie3")
            elif labels[i].find("leibie4") != -1:
                labels_form.append("leibie4")

        class_ids = np.array([self.class_names.index(s) for s in labels_form])
        return mask, class_ids.astype(np.int32)
 
'''
def get_ax(rows=1, cols=1, size=8):
    """Return a Matplotlib Axes array to be used in
    all visualizations in the notebook. Provide a
    central point to control graph sizes.
    Change the default size attribute to control the size
    of rendered images
    """
    _, ax = plt.subplots(rows, cols, figsize=(size * cols, size * rows))
    return ax
'''
 
# Basic settings
dataset_root_path = "samples/my_data/"    # path to your data
img_floder = dataset_root_path + "pic"
mask_floder = dataset_root_path + "cv2_mask"
# yaml_floder = dataset_root_path
imglist = os.listdir(img_floder)
count = len(imglist)
 
# Prepare the train and val datasets (this script loads the same images for both)
dataset_train = DrugDataset()
dataset_train.load_shapes(count, img_floder, mask_floder, imglist, dataset_root_path)
dataset_train.prepare()
 
# print("dataset_train-->",dataset_train._image_ids)
 
dataset_val = DrugDataset()
dataset_val.load_shapes(count, img_floder, mask_floder, imglist, dataset_root_path)
dataset_val.prepare()
 
# print("dataset_val-->",dataset_val._image_ids)
 
# Load and display random samples
# image_ids = np.random.choice(dataset_train.image_ids, 4)
# for image_id in image_ids:
#    image = dataset_train.load_image(image_id)
#    mask, class_ids = dataset_train.load_mask(image_id)
#    visualize.display_top_masks(image, mask, class_ids, dataset_train.class_names)
 
# Create model in training mode
model = modellib.MaskRCNN(mode="training", config=config,
                          model_dir=MODEL_DIR)
 
# Which weights to start with?
init_with = "coco"  # imagenet, coco, or last
 
if init_with == "imagenet":
    model.load_weights(model.get_imagenet_weights(), by_name=True)
elif init_with == "coco":
    # Load weights trained on MS COCO, but skip layers that
    # are different due to the different number of classes
    # See README for instructions to download the COCO weights
    # print(COCO_MODEL_PATH)
    model.load_weights(COCO_MODEL_PATH, by_name=True,
                       exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                                "mrcnn_bbox", "mrcnn_mask"])
elif init_with == "last":
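    # Load the last model you trained and continue training.
    # In current Mask_RCNN, find_last() returns the checkpoint path itself;
    # older releases returned a (dir, path) tuple, in which case use find_last()[1].
    model.load_weights(model.find_last(), by_name=True)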
 
# Train the head branches
# Passing layers="heads" freezes all layers except the head
# layers. You can also pass a regular expression to select
# which layers to train by name pattern.
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=10,
            layers='heads')         # freeze the other layers and train only the heads, up to epoch 10
 
# Fine tune all layers
# Passing layers="all" trains all layers. You can also
# pass a regular expression to select which layers to
# train by name pattern.
# Note: epochs is the cumulative total, not the count for this call,
# so 20 here means 10 more epochs after the 10 head epochs above.
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE / 10,
            epochs=20,
            layers="all")           # fine-tune all layers, up to epoch 20

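In the code above, these are the main places you need to change:
(I couldn't be bothered to look up the line numbers; just use Ctrl+F.)
1. NUM_CLASSES, the total number of classes including the background. If you only have one class, patrick_star, this is 2; if you have two classes, it is 3.
2. Add your classes

# Add classes
self.add_class("shapes", 1, "leibie1")

If you only have the class patrick_star, replace leibie1 here with patrick_star. The name must match the label name you used in labelme exactly, otherwise you will get an error.
Then, in the corresponding code further down,

        for i in range(len(labels)):
            if labels[i].find("leibie1") != -1:
                labels_form.append("leibie1")

replace leibie1 with the same name.
3. Change the dataset path dataset_root_path = "samples/my_data/"
4. Other settings
The batch size is GPU_COUNT * IMAGES_PER_GPU, which is 1 in the script above. That worked well for me, so there is no need to change it unless your dataset is especially large or small.
The number of epochs is set in the model.train calls at the very end of the script.
STEPS_PER_EPOCH is set near the top and defaults to 50. If you change it, a multiple of 50 is safest: as far as I can tell the logs are written every 100 steps, and when I set it to 30 TensorBoard saved nothing (this may have been my own mistake; I did not look into it closely). You can of course also change the model's saving parameters.
The anchor sizes (RPN_ANCHOR_SCALES) are set just above the step setting.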
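The link between the labelme label names, the class IDs and NUM_CLASSES in points 1 and 2 can be written down in a few lines. The sketch below is not part of train.py; CLASS_NAMES, match_class and to_class_ids are hypothetical names used only for illustration. It mirrors the labels[i].find(...) != -1 matching in load_mask, except that an unknown label raises an error instead of being skipped silently.

```python
# Hypothetical example: the label names you used in labelme
CLASS_NAMES = ["patrick_star"]

# Background is always class 0, so NUM_CLASSES is one more than your class count
NUM_CLASSES = 1 + len(CLASS_NAMES)


def match_class(label, class_names=CLASS_NAMES):
    """Return the first class name contained in a labelme label, or None."""
    for name in class_names:
        if name in label:          # same substring test as label.find(name) != -1
            return name
    return None


def to_class_ids(labels, class_names=CLASS_NAMES):
    """Map labelme labels to class IDs (1-based; 0 is the background)."""
    all_names = ["BG"] + list(class_names)
    ids = []
    for label in labels:
        name = match_class(label, class_names)
        if name is None:
            # Fail loudly: a skipped label would leave fewer IDs than mask layers
            raise ValueError("label %r matches none of %r" % (label, class_names))
        ids.append(all_names.index(name))
    return ids


if __name__ == "__main__":
    print(NUM_CLASSES)
    print(to_class_ids(["patrick_star", "patrick_star"]))
```

Because the test is a substring match, one class name that contains another (say leibie1 and leibie10) would be ambiguous; distinct names avoid that.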

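Then run it with python to start training. You will probably get a series of errors about missing modules; just install them as prompted.
If you get the error IndexError: boolean index did not match indexed array along dimension 0; dim
or another dimension-related error, then either your data is not prepared correctly (wrong names or paths, labelme class names that do not match the class names in the code, and so on), or the number of classes in the code is wrong, or there is a typo somewhere. Checking everything carefully should resolve it.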
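Wrong names and paths can be caught before training starts. The sketch below walks the pic folder and checks that each image has the three files load_shapes in train.py builds paths for: cv2_mask/N.png, labelme_json/N_json/info.yaml and labelme_json/N_json/img.png. find_missing_files is a hypothetical helper name, and the check covers file existence only, not label names.

```python
import os
import tempfile


def find_missing_files(dataset_root):
    """Return the files the training script would look for but that do not exist."""
    missing = []
    pic_dir = os.path.join(dataset_root, "pic")
    for name in sorted(os.listdir(pic_dir)):
        stem = name.split(".")[0]          # same rule as filestr in load_shapes
        json_dir = os.path.join(dataset_root, "labelme_json", stem + "_json")
        needed = [
            os.path.join(dataset_root, "cv2_mask", stem + ".png"),
            os.path.join(json_dir, "info.yaml"),
            os.path.join(json_dir, "img.png"),
        ]
        missing.extend(path for path in needed if not os.path.isfile(path))
    return missing


if __name__ == "__main__":
    # Demo on a temporary dataset that has an image and a mask but no json folder
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "pic"))
        os.makedirs(os.path.join(root, "cv2_mask"))
        open(os.path.join(root, "pic", "1.jpeg"), "w").close()
        open(os.path.join(root, "cv2_mask", "1.png"), "w").close()
        for path in find_missing_files(root):
            print("missing:", path)
```

For the layout used in this post you would call find_missing_files("samples/my_data") from the project root; an empty list means every image has its mask and yaml file.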

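Once training starts, the terminal should show the training log (the original post had a screenshot here).

Then just wait for training to finish.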
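If loading data feels slow, one likely reason is draw_mask in the training script: it reads each pixel with getpixel inside nested Python loops, once per instance. The same masks can be computed with NumPy broadcasting. This is a sketch under the same assumption the loop makes, namely that the label image stores 0 for background and k for the k-th instance; build_instance_masks is a hypothetical helper name, not part of Mask_RCNN.

```python
import numpy as np


def build_instance_masks(label_img):
    """Turn a 2-D label image into a (height, width, num_instances) uint8 mask stack.

    label_img: integer array, 0 = background, k = pixels of the k-th instance.
    """
    label = np.asarray(label_img)
    num_obj = int(label.max())
    ids = np.arange(1, num_obj + 1)
    # Broadcasting compares every pixel with every instance id in one step:
    # (H, W, 1) == (1, 1, N) gives an (H, W, N) boolean array
    return (label[:, :, None] == ids[None, None, :]).astype(np.uint8)


if __name__ == "__main__":
    demo = np.array([[0, 1, 1],
                     [2, 2, 0]])
    masks = build_instance_masks(demo)
    print(masks.shape)
    print(masks[:, :, 0])
```

Inside load_mask, this could stand in for the np.zeros plus draw_mask pair, for example build_instance_masks(np.array(Image.open(info['mask_path']))). That substitution is untested on the dataset from this post.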

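4. Testing the results

4.1 Visualization with TensorBoard

The trained model files are saved in the Mask_RCNN-master/logs directory.
Run tensorboard --logdir='logs/shape*****' on the command line to see how the losses change.
For example, this is my overall loss:
(Figure: loss)
You can see the loss is still falling after 20 epochs, which means it is worth training for more epochs.
The class loss shows this even more clearly:
(Figure: class_loss)
So keep training.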
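To continue training, or to test later, you need the path of the newest weights file under logs. The sketch below finds it with the standard library. It assumes the naming seen in this post (a run folder starting with shapes that contains files named mask_rcnn_shapes_00**.h5) and, as a further assumption, that run-folder suffixes and zero-padded epoch numbers sort in chronological order as plain strings. latest_checkpoint is a hypothetical helper name.

```python
import glob
import os
import tempfile


def latest_checkpoint(logs_dir, name="shapes"):
    """Return the path of the newest weights file under logs_dir, or None if there is none."""
    pattern = os.path.join(logs_dir, name + "*", "mask_rcnn_" + name + "_*.h5")
    candidates = glob.glob(pattern)
    if not candidates:
        return None
    # Assumption: comparing the paths as strings orders them by run, then by epoch
    return max(candidates)


if __name__ == "__main__":
    # Demo with empty placeholder files in a temporary logs folder
    with tempfile.TemporaryDirectory() as logs:
        run_dir = os.path.join(logs, "shapes_demo_run")
        os.makedirs(run_dir)
        for epoch in ("0001", "0002"):
            open(os.path.join(run_dir, "mask_rcnn_shapes_%s.h5" % epoch), "w").close()
        print(os.path.basename(latest_checkpoint(logs)))
```

The returned path can then be passed to model.load_weights(path, by_name=True) in place of the hard-coded COCO weights path.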

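4.2 Visualizing the segmentation results

The test code also comes from someone else; I will remove it if it infringes.
Create forecast.py in the project root and add the following content. I added comments to the parts that need changing: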

# -*- coding: utf-8 -*-
import os
import sys
import random
import math
import numpy as np
import skimage.io
import matplotlib
import matplotlib.pyplot as plt
import cv2
import time
from mrcnn.config import Config
from datetime import datetime
# Root directory of the project
ROOT_DIR = os.getcwd()
 
# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize
# Import COCO config
# sys.path.append(os.path.join(ROOT_DIR, "samples/coco/"))  # To find local version
# from samples.coco import coco
 
 
# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")
 
# Local path to trained weights file
COCO_MODEL_PATH = "*******/Mask_RCNN-master/logs/shapes*****/mask_rcnn_shapes_00**.h5"   # path to your trained model file
# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)
    print("cuiwei***********************")
 
# Directory of images to run detection on
IMAGE_DIR = os.path.join(ROOT_DIR, "images")
 
class ShapesConfig(Config):
    """Configuration for training on the toy shapes dataset.
    Derives from the base Config class and overrides values specific
    to the toy shapes dataset.
    """
    # Give the configuration a recognizable name
    NAME = "shapes"
 
    # Run on 1 GPU with 1 image per GPU, so the batch size is 1
    # (batch size = GPU_COUNT * IMAGES_PER_GPU).
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
 
    # Number of classes (including background)
    NUM_CLASSES = 4 + 1  # background + 4 classes; must match the training config
 
    # Use small images for faster training. Set the limits of the small side
    # the large side, and that determines the image shape.
    IMAGE_MIN_DIM = 320
    IMAGE_MAX_DIM = 384
 
    # Use smaller anchors because our image and objects are small
    RPN_ANCHOR_SCALES = (8 * 6, 16 * 6, 32 * 6, 64 * 6, 128 * 6)  # anchor side in pixels
 
    # Reduce training ROIs per image because the images are small and have
    # few objects. Aim to allow ROI sampling to pick 33% positive ROIs.
    TRAIN_ROIS_PER_IMAGE =100
 
    # Use a small epoch since the data is simple
    STEPS_PER_EPOCH = 50
 
    # use small validation steps since the epoch is small
    VALIDATION_STEPS = 50
 
#import train_tongue
#class InferenceConfig(coco.CocoConfig):
class InferenceConfig(ShapesConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
 
config = InferenceConfig()
 
# Create model object in inference mode.
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)
 
# Load the weights you trained (the variable just keeps the COCO_MODEL_PATH name)
model.load_weights(COCO_MODEL_PATH, by_name=True)
 
# Class names
# Index of the class in the list is its ID; 'BG' (background) is always 0.
# Change these to your own class names, in the same order as the add_class calls.
class_names = ['BG', 'leibie1', 'leibie2', 'leibie3', 'leibie4']
# Load the image to test
image = skimage.io.imread("img.png")      # the image you want to test
 
a=datetime.now()
# Run detection
results = model.detect([image], verbose=1)
b=datetime.now()
# Visualize results
print("shijian",(b-a).seconds)
r = results[0]
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],
                            class_names, r['scores'])

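You will see output like this:
(Figure: detection log) After a moment, the result image pops up.

If it gets stuck at Downloading pretrained model to *****, your model path is not set correctly; check it carefully.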
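One detail about the timing printed by forecast.py: (b-a).seconds holds only the whole-second part of the elapsed time, so a detection that takes 0.4 s prints 0. A small sketch of a finer-grained measurement with time.perf_counter follows; timed is a hypothetical helper name, and the demo times a built-in function instead of the model.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds as a float)."""
    start = time.perf_counter()          # monotonic, high-resolution clock
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    total, elapsed = timed(sum, range(1000000))
    print(total, "computed in %.4f s" % elapsed)
```

In forecast.py the call would look like results, elapsed = timed(model.detect, [image], verbose=1).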

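That should complete the run. The remaining tuning depends on your own case, so read the code carefully.
If you found this post helpful, remember to like, tip and bookmark it to support the author. See you next time.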
