matterport/Mask_RCNN: implementing your own segmentation project on Windows


I recently did a project that used Mask R-CNN to extract masks. The environment had to be deployed on Windows, and speed requirements were modest, so I am sharing the experience here.

Open questions

1. Only after installing VS2015 will cocoapi install correctly and Mask R-CNN run smoothly on Windows; exactly where VS2015 enters into this is still unclear to me.
2. With a limited amount of annotated data, how can we augment it so that the json files are transformed along with the images (see section 3.4)?
3. matterport/Mask_RCNN is not just slightly slower than maskrcnn_benchmark: the latter runs at about 10 frames/s, the former at about 2 s/frame (images around 750*500 pixels), a gap of nearly 20x. Possibly my GPU is not set up correctly.
4. skimage.io.imread("pic.jpg") and cv2.imread('pic.jpg') both return numpy ndarrays, but the channel order differs: skimage reads RGB while cv2 reads BGR, so an image loaded with cv2.imread shows shifted colors when passed to
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'], class_names, r['scores'])
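A one-line conversion avoids the color shift (a minimal sketch, assuming OpenCV is installed):

import cv2
image = cv2.imread("pic.jpg")                    # loads in BGR channel order
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)   # convert to RGB before visualizing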

1. Environment setup

Most deep-learning models today are developed on Linux and are not very friendly to Windows, which is where I hit a big pit. I had originally downloaded facebookresearch's official maskrcnn-benchmark, whose predecessor is facebookresearch's Detectron.

  • Detectron: built on Python 2 and Caffe2 under Linux, with NVIDIA CUDA 8.0 and cuDNN 6.0.21; outside that exact environment it is hard to keep under control, so I ruled it out.
  • maskrcnn-benchmark: runs on Python 3 and PyTorch 1.0 under Linux, with CUDA 9.0 and cuDNN 7. PyTorch 1.0, released in 2018, is billed as a merger of PyTorch and Caffe2 under Python: develop in PyTorch, deploy with Caffe2. The marketing was loud, but blindly chasing something this new means stepping into pits. The most painful part of maskrcnn-benchmark is
import maskrcnn_benchmark._C

failing to import ._C correctly.
I had deployed it successfully on Linux last year, but after my environment broke last month I could never reinstall it. Online answers all said the environment was misconfigured; after more than half a month of fiddling with no progress, I set it aside for now.

  • opencv/dnn: calls the Mask R-CNN TensorFlow model directly through OpenCV's dnn module, needing only TensorFlow's .pb (weights) and .pbtxt (graph configuration) files. It can be called from C++ in an engineering project, runs at roughly 2 s per image on an i5 CPU, and is very good to use (a rough loading sketch follows this list). Its biggest problem is training: I trained a .h5 with matterport/Mask_RCNN but could not work out how to convert the .h5 into the .pb and .pbtxt that OpenCV's C++ API can load, which is a pity. A network that cannot be trained is of limited use.
  • matterport/Mask_RCNN: finally, the matterport version of Mask_RCNN, built on Keras and TensorFlow, which is very simple and convenient. The drawback is that it runs slowly: processing one image on the GPU takes about the same 2 seconds as opencv/C++ on the CPU, which makes me seriously doubt the program is using GPU acceleration at all. But it can train on your own dataset and is easy to call from Python on Windows, which is a relief. Next comes the process of training on your own dataset.
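For reference, calling the TensorFlow Mask R-CNN through OpenCV's dnn module looks roughly like the sketch below (following OpenCV's Mask R-CNN sample; the file names are placeholders, substitute your own .pb/.pbtxt):

import cv2

# .pb holds the weights, .pbtxt the graph configuration (names are placeholders)
net = cv2.dnn.readNetFromTensorflow("frozen_inference_graph.pb",
                                    "mask_rcnn_config.pbtxt")
img = cv2.imread("pic.jpg")
net.setInput(cv2.dnn.blobFromImage(img, swapRB=True, crop=False))
# the Mask R-CNN graph exposes box detections and per-ROI masks
boxes, masks = net.forward(["detection_out_final", "detection_masks"])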

2. Getting Mask R-CNN running

On Windows you need VS2015 properly installed. I had previously installed VS2017, VS2012, and VS2015, then uninstalled and reinstalled them in various orders; after a few rounds of this my machine would no longer work at all. I tried a labmate's computer with a clean VS2015 environment, and Mask R-CNN ran fine there. The exact details are still being investigated.
With the environment in place, let's get Mask R-CNN running.
Below is the most stripped-down version; only two things need changing:

  • The model here was trained on the COCO dataset, so you must obtain mask_rcnn_coco.h5 and put it at COCO_MODEL_PATH
  • Point the path of the image to detect at: image = cv2.imread("images/262985539_1709e54576_z.jpg")
import cv2
import time
from mrcnn.config import Config
import mrcnn.model as modellib
from mrcnn import visualize

class CocoConfig(Config):
    """Configuration for training on MS COCO.
    Derives from the base Config class and overrides values specific
    to the COCO dataset.
    """
    NAME = "coco"
    IMAGES_PER_GPU = 1
    NUM_CLASSES = 1 + 80  # COCO has 80 classes

config = CocoConfig()

# Create model object in inference mode.
model = modellib.MaskRCNN(mode="inference", model_dir="", config=config)

# Local path to trained weights file
COCO_MODEL_PATH = "weight/mask_rcnn_coco.h5"
# Load weights trained on MS-COCO
model.load_weights(COCO_MODEL_PATH, by_name=True)

# COCO Class names
# Index of the class in the list is its ID. For example, to get ID of
# the teddy bear class, use: class_names.index('teddy bear')
class_names = ['BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
               'bus', 'train', 'truck', 'boat', 'traffic light',
               'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird',
               'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear',
               'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie',
               'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
               'kite', 'baseball bat', 'baseball glove', 'skateboard',
               'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
               'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
               'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
               'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',
               'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
               'keyboard', 'cell phone', 'microwave', 'oven', 'toaster',
               'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
               'teddy bear', 'hair drier', 'toothbrush']

image = cv2.imread("images/262985539_1709e54576_z.jpg")
# cv2.imread returns BGR; convert to RGB so the visualized colors are correct
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
start = time.time()  # time.clock() is deprecated (removed in Python 3.8)
results = model.detect([image], verbose=1)
r = results[0]
end = time.time()
print(end - start)
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],
                            class_names, r['scores'])
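On the 2 s/frame number (open question 3 above): the first detect() call also builds the TensorFlow graph, so timing a single call overstates the steady-state cost. A fairer measurement, sketched below, warms the model up first and checks whether TensorFlow sees the GPU at all (device_lib is the TF1-era API):

from tensorflow.python.client import device_lib
print([d.name for d in device_lib.list_local_devices()])  # a '/device:GPU:0' entry should appear

model.detect([image], verbose=0)  # warm-up call: builds the graph
start = time.time()
for _ in range(5):
    model.detect([image], verbose=0)
print("avg seconds/frame:", (time.time() - start) / 5)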

3. Building the training dataset

I mainly drew on two blog posts:
https://blog.csdn.net/doudou_here/article/details/87855273
https://blog.csdn.net/qq_29462849/article/details/81037343
Both bloggers write in great detail.
The workflow breaks down into the following steps:

  • 1. Install labelme and annotate your dataset, producing a json file for each image
  • 2. Use labelme's command-line tool to batch-convert the json files
  • 3. Assemble cv2_mask, json, labelme_json, and pic, after which training can start
  • 4. Data augmentation (still to do)

The details follow.

3.1 Problems annotating the dataset with labelme

When annotating my dataset with labelme, my project needed to instance-segment one class of object. At first I gave every instance the same label, and the result turned into semantic segmentation. Disaster. Different individuals within the same class must get distinct labels, e.g. person1, person2: different individuals within the one big class person.
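A quick sanity check can catch this mistake before training. The sketch below (the folder name is a placeholder) warns whenever every shape in a labelme json file carries the identical label:

import json
import os

label_dir = "labelme_dir"  # placeholder: folder holding the labelme .json files
for name in os.listdir(label_dir):
    if not name.endswith(".json"):
        continue
    with open(os.path.join(label_dir, name)) as f:
        labels = [s["label"] for s in json.load(f)["shapes"]]
    if len(labels) > 1 and len(set(labels)) == 1:
        print(name, "- all instances share the label", labels[0],
              "; rename them e.g. person1, person2 for instance segmentation")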

3.2 Batch-converting the json files with labelme's command-line tool

  • one: rename the files:
def rename():
    # Rename all the labelme json files to 0.json, 1.json, ...
    import os
    file_dir = "labelme_dir"  # folder holding the .json files
    i = 0
    for files in os.listdir(file_dir):
        os.rename(os.path.join(file_dir, files),
                  os.path.join(file_dir, str(i) + ".json"))
        i += 1
  • two: batch-process the json files from the command line (Windows cmd):
for /l %x in (0,1,114) do labelme_json_to_dataset %x.json
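If the cmd loop is inconvenient, the same batch conversion can be driven from Python (a sketch assuming labelme is on PATH and the files were renamed 0.json ... 114.json by rename() above):

import subprocess

for i in range(115):
    subprocess.run(["labelme_json_to_dataset", "%d.json" % i], check=True)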

3.3 Assemble cv2_mask, json, labelme_json, and pic, then train

Create the four folders used for training: cv2_mask, json, labelme_json, pic.
The key step is copying the mask out of each labelme_json subfolder into cv2_mask; note that every mask in labelme_json has the same file name, so it must be renamed while being moved.

def copyfile():
    # Distribute files into the folders:
    # cv2_mask: the mask of every image
    # json: the json files annotated by labelme
    # labelme_json: the output of labelme_json_to_dataset img.json
    # pic: the original images
    import shutil
    import os
    sourceDir = "labelme_json\\"
    cv2_maskDir = "cv2_mask\\"
    for i in range(115):
        shutil.copy(sourceDir + str(i) + "_json\\label.png", cv2_maskDir)
        # every copied mask is named label.png, so rename it to <i>.png
        os.rename(cv2_maskDir + "label.png", cv2_maskDir + str(i) + ".png")

3.4 Data augmentation (still to do)

Our hand-annotated data is limited, so we want to enlarge the dataset with rotation, added noise, and similar operations. The open question is how to make the json annotation files transform together with the images.
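One candidate approach, not used in this post, is the imgaug library, whose augmenters can transform an image together with the labelme polygon points so the new points can be written back into a json copy. A rough sketch, with file names as placeholders:

import json
import cv2
import imgaug.augmenters as iaa
from imgaug.augmentables.polys import Polygon, PolygonsOnImage

aug = iaa.Affine(rotate=15)  # example: rotate image and annotations together

with open("0.json") as f:    # placeholder labelme annotation
    data = json.load(f)
image = cv2.imread("0.jpg")  # placeholder image

polys = PolygonsOnImage([Polygon(s["points"]) for s in data["shapes"]],
                        shape=image.shape)
image_aug, polys_aug = aug(image=image, polygons=polys)

for shape, poly in zip(data["shapes"], polys_aug.polygons):
    shape["points"] = poly.exterior.tolist()  # transformed polygon back into the dict
data["imageData"] = None                      # force labelme to reload from imagePath
data["imagePath"] = "0_rot.jpg"
cv2.imwrite("0_rot.jpg", image_aug)
with open("0_rot.json", "w") as f:
    json.dump(data, f)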

4. Training on your own dataset

Changes needed:

  • 1. Use the pretrained mask_rcnn_coco.h5 model to initialize our weights
  • 2. Change the class count in the config: NUM_CLASSES = 1 + 1  # (background + number of classes in your data)
  • 3. Point the data paths at the dataset we built
  • 4. Change labels[i].find("face") != -1 to your own class name, and likewise labels_form.append("face"); in the script below the class is "shape". The two elif branches after it can be deleted.
    If you have more classes, just append further elif branches with labels_form.append("your class name")
  • 5. Finally, adjust the remaining config settings as needed
# -*- coding: utf-8 -*-
import os
import numpy as np
import cv2
import matplotlib.pyplot as plt
from mrcnn.config import Config
from mrcnn import model as modellib,utils
import yaml
from PIL import Image

#os.environ["CUDA_VISIBLE_DEVICES"] = "0"
# Root directory of the project
ROOT_DIR = os.getcwd()

#ROOT_DIR = os.path.abspath("../")
# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")

iter_num=0

# Local path to trained weights file
COCO_MODEL_PATH = os.path.join(ROOT_DIR, "weight/mask_rcnn_coco.h5")
# Download COCO trained weights from Releases if needed
if not os.path.exists(COCO_MODEL_PATH):
    utils.download_trained_weights(COCO_MODEL_PATH)


class ShapesConfig(Config):
    """Configuration for training on the toy shapes dataset.
    Derives from the base Config class and overrides values specific
    to the toy shapes dataset.
    """
    # Give the configuration a recognizable name
    NAME = "shapes"

    # Train on 1 GPU with 1 image per GPU.
    # Batch size = GPUs * images/GPU = 1.
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

    # Number of classes (including background)
    NUM_CLASSES = 1 + 1  # background + 1 class (shape)

    # Use small images for faster training. Set the limits of the small side
    # the large side, and that determines the image shape.
    IMAGE_MIN_DIM = 540
    IMAGE_MAX_DIM = 960

    # Use smaller anchors because our image and objects are small
    RPN_ANCHOR_SCALES = (8 * 6, 16 * 6, 32 * 6, 64 * 6, 128 * 6)  # anchor side in pixels

    # Reduce training ROIs per image because the images are small and have
    # few objects. Aim to allow ROI sampling to pick 33% positive ROIs.
    TRAIN_ROIS_PER_IMAGE = 100

    # Use a small epoch since the data is simple
    STEPS_PER_EPOCH = 20

    # use small validation steps since the epoch is small
    VALIDATION_STEPS = 50

    BACKBONE = "resnet50"


config = ShapesConfig()
config.display()

class DrugDataset(utils.Dataset):
    # Number of instances (objects) in the image
    def get_obj_index(self, image):
        n = np.max(image)
        return n

    # Parse the yaml file produced by labelme_json_to_dataset to get the
    # instance label corresponding to each mask layer
    def from_yaml_get_class(self, image_id):
        info = self.image_info[image_id]
        with open(info['yaml_path']) as f:
            # newer PyYAML releases require an explicit loader; safe_load covers this
            temp = yaml.safe_load(f.read())
            labels = temp['label_names']
            del labels[0]  # drop the background entry
        return labels

    # Rewritten draw_mask: pixel value index+1 in label.png marks instance index,
    # so those pixels are set in mask channel `index`.
    # (The per-pixel loop is slow; a vectorized numpy comparison would be faster.)
    def draw_mask(self, num_obj, mask, image, image_id):
        info = self.image_info[image_id]
        for index in range(num_obj):
            for i in range(info['width']):
                for j in range(info['height']):
                    at_pixel = image.getpixel((i, j))
                    if at_pixel == index + 1:
                        mask[j, i, index] = 1
        return mask

    # Rewritten load_shapes: registers our own classes (extend as needed) and
    # adds path, mask_path and yaml_path to self.image_info
    def load_shapes(self, count, img_floder, mask_floder, imglist, dataset_root_path):
        """Register `count` images, recording the path, mask_path and
        yaml_path of each.
        """
        # Add classes; extend in the same way for multiple object types
        self.add_class("shapes", 1, "shape")
        # self.add_class("shapes", 2, "person")
        # self.add_class("shapes", 3, "car")
        for i in range(count):
            filestr = imglist[i].split(".")[0]
            mask_path = mask_floder + "/" + filestr + ".png"
            yaml_path = dataset_root_path + "labelme_json/" + filestr + "_json/info.yaml"
            # read the image width and height from the png written by labelme_json_to_dataset
            root = dataset_root_path + "labelme_json/" + filestr + "_json/img.png"
            cv_img = cv2.imread(root)
            self.add_image("shapes", image_id=i, path=img_floder + "/" + imglist[i],
                           width=cv_img.shape[1], height=cv_img.shape[0],
                           mask_path=mask_path, yaml_path=yaml_path)
            print(root)
    # Rewritten load_mask
    def load_mask(self, image_id):
        """Generate instance masks for the given image ID."""
        global iter_num
        print("image_id", image_id)
        info = self.image_info[image_id]
        img = Image.open(info['mask_path'])
        num_obj = self.get_obj_index(img)  # number of instances in this image
        mask = np.zeros([info['height'], info['width'], num_obj], dtype=np.uint8)
        mask = self.draw_mask(num_obj, mask, img, image_id)
        # Handle occlusion between overlapping instances
        occlusion = np.logical_not(mask[:, :, -1]).astype(np.uint8)
        for i in range(num_obj - 2, -1, -1):
            mask[:, :, i] = mask[:, :, i] * occlusion
            occlusion = np.logical_and(occlusion, np.logical_not(mask[:, :, i]))
        labels = self.from_yaml_get_class(image_id)
        labels_form = []
        for i in range(len(labels)):
            if labels[i].find("shape") != -1:
                # print "box"
                labels_form.append("shape")
            # elif labels[i].find("person")!=-1:
            #     #print "column"
            #     labels_form.append("person")
            # elif labels[i].find("car")!=-1:
            #     #print "package"
            #     labels_form.append("car")
        class_ids = np.array([self.class_names.index(s) for s in labels_form])
        return mask, class_ids.astype(np.int32)

def get_ax(rows=1, cols=1, size=8):
    """Return a Matplotlib Axes array to be used in
    all visualizations in the notebook. Provide a
    central point to control graph sizes.

    Change the default size attribute to control the size
    of rendered images
    """
    _, ax = plt.subplots(rows, cols, figsize=(size * cols, size * rows))
    return ax

# Basic settings
dataset_root_path="train_data/"
img_floder = dataset_root_path + "pic"
mask_floder = dataset_root_path + "cv2_mask"
#yaml_floder = dataset_root_path
imglist = os.listdir(img_floder)
count = len(imglist)

# Prepare the train and val datasets
dataset_train = DrugDataset()
dataset_train.load_shapes(count, img_floder, mask_floder, imglist,dataset_root_path)
dataset_train.prepare()

#print("dataset_train-->",dataset_train._image_ids)

dataset_val = DrugDataset()
# NOTE: this reuses the first 7 training images as "validation";
# a proper split would load held-out images instead
dataset_val.load_shapes(7, img_floder, mask_floder, imglist, dataset_root_path)
dataset_val.prepare()

#print("dataset_val-->",dataset_val._image_ids)



# Create model in training mode
model = modellib.MaskRCNN(mode="training", config=config,
                          model_dir=MODEL_DIR)

# Which weights to start with?
init_with = "coco"  # imagenet, coco, or last

if init_with == "imagenet":
    model.load_weights(model.get_imagenet_weights(), by_name=True)
elif init_with == "coco":
    # Load weights trained on MS COCO, but skip layers that
    # are different due to the different number of classes
    # See README for instructions to download the COCO weights
    model.load_weights(COCO_MODEL_PATH, by_name=True,
                       exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                                "mrcnn_bbox", "mrcnn_mask"])
elif init_with == "last":
    # Load the last model you trained and continue training
    model.load_weights(model.find_last()[1], by_name=True)

# Train the head branches
# Passing layers="heads" freezes all layers except the head
# layers. You can also pass a regular expression to select
# which layers to train by name pattern.
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=20,
            layers='heads')

# Fine tune all layers
# Passing layers="all" trains all layers. You can also
# pass a regular expression to select which layers to
# train by name pattern.
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE / 10,
            epochs=40,
            layers="all")

5. Applying the trained weights in a real project

The most important point here: the config parameters must match the ones used during training.

# -*- coding: utf-8 -*-
import os
import sys
import random
import math
import numpy as np
import skimage.io
import matplotlib
import matplotlib.pyplot as plt
import cv2
import time
from mrcnn.config import Config
from datetime import datetime

# Root directory of the project
ROOT_DIR = os.getcwd()

# Import Mask RCNN
sys.path.append(ROOT_DIR)  # To find local version of the library
from mrcnn import utils
import mrcnn.model as modellib
from mrcnn import visualize

# Directory to save logs and trained model
MODEL_DIR = os.path.join(ROOT_DIR, "logs")

# Local path to your trained weights file (variable name kept from the COCO demo)
COCO_MODEL_PATH = os.path.join("logs/shapes20190424T1512/mask_rcnn_shapes_0040.h5")

# Directory of images to run detection on
IMAGE_DIR = os.path.join(ROOT_DIR, "images")

class ShapesConfig(Config):
    """Configuration for training on the toy shapes dataset.
    Derives from the base Config class and overrides values specific
    to the toy shapes dataset.
    """
    # Give the configuration a recognizable name
    NAME = "shapes"

    # Train on 1 GPU with 1 image per GPU.
    # Batch size = GPUs * images/GPU = 1.
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

    # Number of classes (including background)
    NUM_CLASSES = 1 + 1  # background + 1 class (shape)

    # Use small images for faster training. Set the limits of the small side
    # the large side, and that determines the image shape.
    IMAGE_MIN_DIM = 540
    IMAGE_MAX_DIM = 960

    # Use smaller anchors because our image and objects are small
    RPN_ANCHOR_SCALES = (8 * 6, 16 * 6, 32 * 6, 64 * 6, 128 * 6)  # anchor side in pixels

    # Reduce training ROIs per image because the images are small and have
    # few objects. Aim to allow ROI sampling to pick 33% positive ROIs.
    TRAIN_ROIS_PER_IMAGE = 100

    # Use a small epoch since the data is simple
    STEPS_PER_EPOCH = 20

    # use small validation steps since the epoch is small
    VALIDATION_STEPS = 50
    BACKBONE = "resnet50"

# import train_tongue
# class InferenceConfig(coco.CocoConfig):
class InferenceConfig(ShapesConfig):
    # Set batch size to 1 since we'll be running inference on
    # one image at a time. Batch size = GPU_COUNT * IMAGES_PER_GPU
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

config = InferenceConfig()

# Create model object in inference mode.
model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)

# Load the weights we trained above
model.load_weights(COCO_MODEL_PATH, by_name=True)

# COCO Class names
# Index of the class in the list is its ID. For example, to get ID of
# the teddy bear class, use: class_names.index('teddy bear')
class_names = ['BG', 'shape']

image = skimage.io.imread("images/img193.jpg")
a = datetime.now()
# Run detection
results = model.detect([image], verbose=1)
b = datetime.now()
print("shijian", (b - a).seconds)
# Visualize results
r = results[0]
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],
                            class_names, r['scores'])
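Since the point of the project was mask extraction, note that r['masks'] is an H×W×N boolean array with one channel per detected instance. Merging the channels gives a single binary mask (a minimal sketch continuing from the script above):

# r['masks'] has shape (H, W, N): one boolean channel per instance
combined = np.any(r['masks'], axis=-1).astype(np.uint8) * 255
cv2.imwrite("mask.png", combined)  # single 8-bit mask covering all instances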

At this point, a complete TensorFlow version of Mask R-CNN can be applied to your own project, provided your speed requirements are modest.
