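6.8 Mask R-CNN Segmentation Case Study

Learning objectives

  • Goals
    • Understand how a segmentation dataset is read and processed
  • Application
    • Read the label annotations of the dataset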

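6.8.1 Segmentation dataset: the balloon dataset

The balloon segmentation dataset is a small dataset for a segmentation task: separating balloons from the rest of an image or video. It is split into a training set and a validation set.

The directory layout is as follows:

  • The training and validation directories each contain a .json annotation file and the .jpg images
    • Most segmentation annotations of this kind are produced with the VIA (VGG Image Annotator) tool, which records the mask outline of each object in every image
    • Note: the annotations do not include bounding boxes; the boxes are computed later, on the fly, from the masks
      • Datasets such as VOC and COCO provide both kinds of annotation

The annotation format in the json file looks like this: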

{
# Annotations for the objects in the first image
"24631331976_defa3bb61f_k.jpg668058":{"fileref":"",
"size":668058,"filename":"24631331976_defa3bb61f_k.jpg",
"base64_img_data":"","file_attributes":{},
"regions":{"0":{"shape_attributes":{"name":"polygon","all_points_x":[916,913,905,889,868,836,809,792,789,784,777,769,767,777,786,791,769,739,714,678,645,615,595,583,580,584,595,614,645,676,716,769,815,849,875,900,916,916],"all_points_y":[515,583,616,656,696,737,753,767,777,785,785,778,768,766,760,755,755,743,728,702,670,629,588,539,500,458,425,394,360,342,329,331,347,371,398,442,504,515]},"region_attributes":{}}}},
# Annotations for the objects in the second image
"16335852991_f55de7958d_k.jpg1767935":{"fileref":"","size":1767935,"filename":"16335852991_f55de7958d_k.jpg","base64_img_data":"","file_attributes":{},
"regions":{
  "0":{"shape_attributes":{"name":"polygon","all_points_x":[588,617,649,673,692,708,722,730,737,718,706,699,697,676,650,613,580,552,534,520,513,513,521,526,541,560,588],"all_points_y":[173,168,172,182,197,216,237,260,283,312,341,367,390,369,349,337,337,347,361,332,296,266,243,225,205,187,173]},"region_attributes":{}},
  "1":{"shape_attributes":{"name":"polygon","all_points_x":[845,861,880,892,902,910,889,869,844,813,785,762,745,739,731,746,767,790,821,845],"all_points_y":[219,229,242,260,275,299,277,263,254,250,255,265,279,283,258,241,225,216,213,219]},"region_attributes":{}},
  "2":{"shape_attributes":{"name":"polygon","all_points_x":[931,928,920,913,897,872,840,811,789,768,754,730,726,724,718,698,698,707,721,734,746,769,794,822,845,865,889,910,921,929,931],"all_points_y":[378,402,435,454,475,460,450,449,450,460,469,489,486,459,426,390,367,335,306,290,278,261,252,250,254,261,277,299,323,354,378]},"region_attributes":{}},
  "3":{"shape_attributes":{"name":"polygon","all_points_x":[927,946,968,989,992,985,975,957,937,913,889,862,852,876,897,910,925,933,939,939,935,927,910,900,927],"all_points_y":[486,498,516,553,593,630,649,668,686,700,707,707,708,691,675,656,635,610,587,562,538,512,492,480,486]},"region_attributes":{}},
  "4":{"shape_attributes":{"name":"polygon","all_points_x":[704,692,690,691,699,711,723,742,766,785,807,839,865,887,904,923,933,939,939,931,920,905,885,861,839,808,786,769,754,748,746,738,738,729,722,718,704],"all_points_y":[664,631,604,580,545,521,498,480,461,452,449,449,457,469,484,506,532,565,584,620,643,662,682,701,713,719,723,728,733,731,738,737,729,720,708,690,664]},"region_attributes":{}},
  "5":{"shape_attributes":{"name":"polygon","all_points_x":[526,509,497,493,490,493,501,512,526,546,573,603,626,662,688,709,721,724,724,719,704,694,691,691,682,683,687,688,684,682,679,676,664,648,620,587,564,548,526],"all_points_y":[551,526,498,470,444,422,398,381,365,351,340,338,340,357,381,408,438,466,493,504,531,568,584,604,609,612,610,617,625,625,619,616,620,619,609,599,585,573,551]},"region_attributes":{}},
  "6":{"shape_attributes":{"name":"polygon","all_points_x":[594,579,567,563,564,568,579,605,631,656,671,676,682,684,687,687,684,684,691,691,694,702,711,719,722,729,737,738,746,749,756,765,757,728,714,683,654,623,594],"all_points_y":[735,712,691,659,631,612,596,605,613,621,618,616,625,625,616,612,612,608,605,619,637,656,678,692,706,719,727,737,739,731,734,730,741,762,766,772,769,757,735]},"region_attributes":{}}
}},
...
...
...

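Here, "all_points_x":[588,617,649,673,692,708,722,730,737,718,706,699,697,676,650,613,580,552,534,520,513,513,521,526,541,560,588],"all_points_y":[173,168,172,182,197,216,237,260,283,312,341,367,390,369,349,337,337,347,361,332,296,266,243,225,205,187,173]

are the x and y coordinates of the vertices of the polygon that outlines the annotated object, not of every pixel inside it. The first vertex is repeated at the end to close the outline.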
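As a minimal sketch of how these two lists fit together, the snippet below pairs all_points_x with all_points_y to recover the (x, y) vertices of each polygon. The annotation text, the file name example.jpg, and the helper polygon_vertices are made up for illustration and are not part of the project code.

```python
import json

# A tiny, made-up annotation in the same layout as the file shown above.
# The file name and coordinates are hypothetical.
raw = """
{
  "example.jpg1234": {
    "filename": "example.jpg",
    "size": 1234,
    "regions": {
      "0": {"shape_attributes": {"name": "polygon",
                                 "all_points_x": [10, 50, 50, 10],
                                 "all_points_y": [20, 20, 60, 60]},
            "region_attributes": {}}
    }
  }
}
"""

def polygon_vertices(annotations):
    """Map each file name to a list of polygons, each a list of (x, y) vertices."""
    result = {}
    for entry in annotations.values():
        polygons = []
        for region in entry["regions"].values():
            shape = region["shape_attributes"]
            # all_points_x[i] and all_points_y[i] together form the i-th vertex
            polygons.append(list(zip(shape["all_points_x"], shape["all_points_y"])))
        result[entry["filename"]] = polygons
    return result

vertices = polygon_vertices(json.loads(raw))
print(vertices["example.jpg"][0])
```

Note that the top-level key is the file name followed by the file size, so the sketch reads the "filename" field rather than parsing the key.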

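6.8.2 Model overview

This case study uses the Mask R-CNN model for segmentation, with a version of the source code that runs on TensorFlow 2.0.

Highly starred implementation on GitHub
  • Highly starred version: MaskRCNN
    • Implemented with TensorFlow and Keras; the code only runs on TensorFlow 1.x and needs both the keras and TensorFlow libraries installed

The version used here is a modification of that implementation that runs in a TensorFlow 2.0 environment.

Pretrained model: maskrcnn transfer-learning pretrained model

  • Several pretrained models can be used; the one provided here was trained on the ImageNet dataset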

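6.8.3 Project overview

1. Segmentation demo

Image results

[Figure: input image]

[Figure: segmentation result]

Note: here the segmented regions are displayed in their original colors and the rest of the image is converted to grayscale.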
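The effect described in the note can be sketched with NumPy alone. This is an illustration under assumptions, not the project's own code: the function name color_splash is chosen here for the example, and the grayscale is a plain channel average rather than a perceptual luminance conversion.

```python
import numpy as np

def color_splash(image, mask):
    """Keep masked pixels in color and turn the rest of the image gray.

    image: uint8 array [height, width, 3]
    mask:  bool array [height, width, instance count]
    """
    # Simple grayscale: average of the three channels, repeated to 3 channels
    gray = image.mean(axis=2, keepdims=True).astype(np.uint8)
    gray = np.repeat(gray, 3, axis=2)
    if mask.shape[-1] == 0:
        # No detected instance: the whole image becomes gray
        return gray
    # Merge all instance masks into a single [height, width, 1] layer
    any_instance = mask.any(axis=-1, keepdims=True)
    return np.where(any_instance, image, gray)

image = np.zeros((2, 2, 3), dtype=np.uint8)
image[..., 0] = 90            # every pixel is R=90, G=0, B=0
mask = np.zeros((2, 2, 1), dtype=bool)
mask[0, 0, 0] = True          # only the top-left pixel belongs to an object
splash = color_splash(image, mask)
```

Merging the instance masks first means overlapping objects are handled once, and the output keeps the dtype and shape of the input image.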

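2. Project modules

  • balloon_dataset: the project's dataset
  • logs: saved results of model training
  • mrcnn: model architecture and configuration code
  • balloon_main: model training and testing code

Images: images or videos used for testing, together with the output results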

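6.8.4 Implementing the training pipeline

  • Steps
    • 1. Read, process, and prepare the dataset
      • Use the Sequence-based dataset wrapper class from the maskrcnn source code
      • Implement reading of the label files
    • 2. Parse and modify the model configuration, load the pretrained model, build the model
      • Overview of the maskrcnn configuration
      • Walk through the source code used for the model files
    • 3. Implement model training
      • Overview of the training code wrapper
    • 4. Implement model testing
      • Process the prediction results for images

6.8.4.1 Reading and processing the dataset

The utils.py file in the maskrcnn source code defines the Dataset class, which contains the methods for loading a segmentation dataset and for storing it in a common format.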

class Dataset(object):
    """The base class for dataset classes.
    To use it, create a new class that adds functions specific to the dataset
    you want to use. For example:

    See COCODataset and ShapesDataset as examples.
    """

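You can load a dataset in any format by writing your own Dataset subclass.

The methods are explained below.

  • def add_class(self, source, class_id, class_name):

    • Adds class information, recorded in class_info; the background class is first by default
    • self.class_info = [{"source": "", "id": 0, "name": "BG"}]
  • def add_image(self, source, image_id, path, **kwargs):

    • Adds image information; each call appends a dict of the following form (plus any extra keyword arguments) to the self.image_info list
    • image_info = {"id": image_id, "source": source, "path": path}
  • def load_image(self, image_id): return image

    • Loads the image with the given id into a [H,W,3] numpy array and returns it
  • load_mask generates a bitmap mask for each object in the image by drawing its polygon.
    • Loads the masks for the given image id and returns the object masks as [height, width, instance count]
    • together with a 1D array of the object class ids
  • def prepare(self, class_map=None):
    • Prepares the Dataset object for use

There is also image_reference, which simply returns a string identifying the image, for debugging. In practice it just returns the path of the image file; the default is an empty string.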
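The bookkeeping behind these methods can be illustrated with a simplified stand-in. The class MiniDataset below is hypothetical and written only for this explanation; it is not the mrcnn implementation, and its prepare covers only the attributes read later in this section (num_classes, class_names, image_ids).

```python
class MiniDataset:
    """Simplified stand-in for the bookkeeping described above (not the mrcnn class)."""

    def __init__(self):
        self.image_info = []
        # The background class always occupies index 0
        self.class_info = [{"source": "", "id": 0, "name": "BG"}]

    def add_class(self, source, class_id, class_name):
        self.class_info.append({"source": source, "id": class_id, "name": class_name})

    def add_image(self, source, image_id, path, **kwargs):
        info = {"id": image_id, "source": source, "path": path}
        info.update(kwargs)  # extra fields such as width, height, polygons
        self.image_info.append(info)

    def prepare(self):
        # Derive the lookup lists that later code reads
        self.num_classes = len(self.class_info)
        self.class_names = [c["name"] for c in self.class_info]
        self.image_ids = list(range(len(self.image_info)))

dataset = MiniDataset()
dataset.add_class("balloon", 1, "balloon")
dataset.add_image("balloon", image_id="a.jpg", path="train/a.jpg", width=4, height=3)
dataset.prepare()
print(dataset.class_names, dataset.image_ids)
```

The point of the sketch is the split between the identifier passed to add_image (here a file name) and the integer positions in image_ids that the load methods take afterwards.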

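1. Using Dataset

  • Usage

As shown below, subclass Dataset and override load_mask, and define a method that reads our balloon data and adds it to image_info.

class CatsAndDogsDataset(Dataset):
  """
  """
  def load_cats_and_dogs(self):
    ...
  def load_mask(self, image_id):
    ...

Note: you can usually implement your own reading and processing methods or formats, and borrow convenient general-purpose tools where they exist.

  • For example, load_balloon reads the JSON file, extracts the annotations, and iteratively calls the internal add_class and add_image functions to build the dataset.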
  • load_mask: builds the bitmap masks of an image from the polygons stored for it

2. Loading and displaying the data

The following shows how we load the data and what the result looks like.

dataset = balloon.BalloonDataset()
# Load the classes and the per-image information
dataset.load_balloon(BALLOON_DIR, "train")
# Prepare the dataset for use
dataset.prepare()

print("Image Count: {}".format(len(dataset.image_ids)))
print("Class Count: {}".format(dataset.num_classes))
for i, info in enumerate(dataset.class_info):
    print("{:3}. {:50}".format(i, info['name']))

Image Count: 61
Class Count: 2
  0. BG
  1. balloon

Displaying a sample's masks

Use visualize.display_top_masks(image, mask, class_ids, dataset.class_names) from the model code:

image = dataset.load_image(image_id)
mask, class_ids = dataset.load_mask(image_id)
visualize.display_top_masks(image, mask, class_ids, dataset.class_names)

Displaying a sample's bboxes and masks

There are no bbox annotations, so utils.extract_bboxes computes the bbox positions from the masks of the image.

  • 1. utils.extract_bboxes(mask):
    • mask: [height, width, num_instances]. Mask pixels are either 1 or 0.
    • Returns: bbox array [num_instances, (y1, x1, y2, x2)].
image = dataset.load_image(image_id)
mask, class_ids = dataset.load_mask(image_id)
# Compute the bounding boxes
bbox = utils.extract_bboxes(mask)

print("image_id ", image_id, dataset.image_reference(image_id))
# log is the helper function from the model module
log("image", image)
log("mask", mask)
log("class_ids", class_ids)
log("bbox", bbox)

# Result
image_id  1 /deepmatter/libs/mask_rcnn/datasets/balloon/train/25899693952_7c8b8b9edc_k.jpg
image                    shape: (1365, 2048, 3)       min:    0.00000  max:  255.00000
mask                     shape: (1365, 2048, 1)       min:    0.00000  max:    1.00000
class_ids                shape: (1,)                  min:    1.00000  max:    1.00000
bbox                     shape: (1, 4)                min:  116.00000  max:  965.00000
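To make the mask-to-box step concrete, here is a minimal NumPy sketch of how a box can be derived from each instance mask. The function boxes_from_masks is written for this explanation and is not the library routine; treating y2 and x2 as exclusive upper bounds is an assumption of the sketch.

```python
import numpy as np

def boxes_from_masks(mask):
    """mask: [height, width, num_instances] -> int32 array [num_instances, (y1, x1, y2, x2)]."""
    boxes = np.zeros((mask.shape[-1], 4), dtype=np.int32)
    for i in range(mask.shape[-1]):
        m = mask[:, :, i]
        rows = np.where(m.any(axis=1))[0]   # rows containing at least one mask pixel
        cols = np.where(m.any(axis=0))[0]   # columns containing at least one mask pixel
        if rows.size:
            # y2 and x2 are exclusive here, hence the + 1
            boxes[i] = [rows[0], cols[0], rows[-1] + 1, cols[-1] + 1]
        # An empty mask keeps the all-zero box
    return boxes

mask = np.zeros((6, 8, 2), dtype=bool)
mask[2:4, 3:7, 0] = True      # first instance: rows 2-3, columns 3-6
boxes = boxes_from_masks(mask)  # second instance is left empty on purpose
```

Because the box is recomputed from the mask every time, any resizing or augmentation applied to the mask keeps the box consistent without separate bookkeeping.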

Display them with visualize.display_instances(image, bbox, mask, class_ids, dataset.class_names).

  • 2. Use modellib.load_image_gt, passing in the dataset, the config, and the image id
image, image_meta, class_ids, bbox, mask = modellib.load_image_gt(
    dataset, config, image_id, use_mini_mask=False)

log("image", image)
log("image_meta", image_meta)
log("class_ids", class_ids)
log("bbox", bbox)
log("mask", mask)

image                    shape: (1024, 1024, 3)       min:    0.00000  max:  255.00000
image_meta               shape: (10,)                 min:    0.00000  max: 1024.00000
class_ids                shape: (2,)                  min:    1.00000  max:    1.00000
bbox                     shape: (2, 4)                min:  181.00000  max: 1024.00000
mask                     shape: (1024, 1024, 2)       min:    0.00000  max:    1.00000
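The 1024 x 1024 shapes above suggest that the image and its masks are scaled and zero-padded to a square before being returned. The sketch below works through one plausible version of that computation. It is an assumption for illustration only: the function square_resize_plan and the default values min_dim=800 and max_dim=1024 are hypothetical and are not read from the project's configuration.

```python
def square_resize_plan(height, width, min_dim=800, max_dim=1024):
    """Return (scale, resized_shape, padding) for fitting an image into a square.

    min_dim and max_dim are assumed values, not read from the project config.
    padding is (top, bottom, left, right) in pixels.
    """
    # Scale up so the short side reaches min_dim, but never scale down here
    scale = max(1.0, min_dim / min(height, width))
    # If the long side would then exceed max_dim, scale so it fits exactly
    if round(max(height, width) * scale) > max_dim:
        scale = max_dim / max(height, width)
    new_h, new_w = round(height * scale), round(width * scale)
    # Center the resized image inside the max_dim x max_dim square
    top = (max_dim - new_h) // 2
    left = (max_dim - new_w) // 2
    padding = (top, max_dim - new_h - top, left, max_dim - new_w - left)
    return scale, (new_h, new_w), padding

scale, resized, padding = square_resize_plan(681, 1024)
print(scale, resized, padding)
```

Whatever the exact scheme, the same scale and padding have to be applied to the masks, and the boxes are then recomputed from the transformed masks.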

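6.8.4.2 Implementing the BalloonDataset class

  • Steps
    • Subclass the Dataset class
    • Implement the load_balloon method
    • Implement the load_mask method

Here we create a utils folder to hold the tools for reading the training dataset, and write a balloon_dataset.py file in it.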

import os
import json
import sys
import numpy as np
sys.path.append("../")
from mrcnn import utils, visualize
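# Import the submodules explicitly; a bare "import skimage" does not always load them
import skimage.io
import skimage.draw

class BalloonDataset(utils.Dataset):
    """Dataset class for the balloon segmentation data
    """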
    def load_balloon(self, dataset_dir, subset):
        pass

    def load_mask(self, image_id):
        pass

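1. Implementing the load_balloon method

  • Goal: add each image's id, path, height, width, and annotations to the self.image_info list
    • 1. Read the annotation json file
    • 2. Get the annotated regions
    • 3. For each image, store the information for its regions together with the image path, height and width, and filename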
    def load_balloon(self, dataset_dir, subset):
        """
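        Load one subset of the balloon dataset.
        :param dataset_dir: root directory of the dataset
        :param subset: which subset to load, "train" or "val"
        :return:
        """
        # Register the single class of this dataset
        self.add_class("balloon", 1, "balloon")

        # The subset must be either the training or the validation set
        assert subset in ["train", "val"]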
        dataset_dir = os.path.join(dataset_dir, subset)
        # Load annotations
        # { 'filename': '28503151_5b5b7ec140_b.jpg',
        #   'regions': {
        #       '0': {
        #           'region_attributes': {},
        #           'shape_attributes': {
        #               'all_points_x': [...],
        #               'all_points_y': [...],
        #               'name': 'polygon'}},
        #       ... more regions ...
        #   },
        # }
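        # Read the annotated regions
        with open(os.path.join(dataset_dir, "via_region_data.json")) as f:
            annotations = json.load(f)
        annotations = list(annotations.values())

        # Skip images that have no annotated regions
        annotations = [a for a in annotations if a['regions']]
        # Add each image together with its polygon coordinates
        for a in annotations:
            # The x, y coordinates of every polygon vertex are stored in shape_attributes.
            # 'regions' may be a dict or a list, so handle both layouts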
            if isinstance(a['regions'], dict):
                polygons = [r['shape_attributes'] for r in a['regions'].values()]
            else:
                polygons = [r['shape_attributes'] for r in a['regions']]

            # Read the image to get its height and width
            image_path = os.path.join(dataset_dir, a['filename'])
            image = skimage.io.imread(image_path)
            height, width = image.shape[:2]

            # Append the entry to the image_info list
            self.add_image(
                "balloon",
                image_id=a['filename'],
                path=image_path,
                width=width, height=height,
                polygons=polygons)
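The dict-or-list handling of regions can be isolated in a small helper, sketched below. The function region_polygons and the sample annotations are hypothetical. Filtering on the shape name is an addition of this sketch rather than something the code above does; it is meant to guard against non-polygon shapes, which would not carry all_points_x and all_points_y.

```python
def region_polygons(annotation):
    """Return the polygon shape_attributes of one image annotation.

    Accepts 'regions' either as a dict keyed by "0", "1", ... or as a list.
    """
    regions = annotation["regions"]
    if isinstance(regions, dict):
        regions = list(regions.values())
    shapes = [r["shape_attributes"] for r in regions]
    # Keep only polygon shapes (an extra guard added in this sketch)
    return [s for s in shapes if s.get("name") == "polygon"]

# Two made-up annotations describing the same triangle in both layouts
triangle = {"name": "polygon", "all_points_x": [0, 4, 0], "all_points_y": [0, 0, 4]}
dict_style = {"filename": "a.jpg",
              "regions": {"0": {"shape_attributes": triangle, "region_attributes": {}}}}
list_style = {"filename": "a.jpg",
              "regions": [{"shape_attributes": triangle, "region_attributes": {}},
                          {"shape_attributes": {"name": "rect"}, "region_attributes": {}}]}

print(region_polygons(dict_style) == region_polygons(list_style))
```

Keeping this logic in one function leaves the loading loop free to deal only with file paths and image sizes.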

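Note: the source code uses the skimage module extensively for reading and processing images

  • With PIL, Image.open returns an Image object whose mode is not necessarily RGB, so it has to be converted to RGB and then to an array

from PIL import Image

image = Image.open("")

image = image.convert('RGB')

arr = np.array(image)

  • skimage.io.imread() returns a numpy array directly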

2、实现load_mask方法

def load_mask(self, image_id):
    """Build the instance masks of an image and return them with their class ids
    :param image_id: image ID
    :return: masks: a bool array of shape [height, width, instance count], one mask per instance
    class_ids: 1D array with the class id of each instance mask
    """
    # If the image is not from the balloon dataset, delegate to the parent class
    image_info = self.image_info[image_id]
    if image_info["source"] != "balloon":
        return super().load_mask(image_id)

    # Convert the polygons to a bitmap mask of shape [height, width, instance_count]
    info = self.image_info[image_id]
    mask = np.zeros([info["height"], info["width"], len(info["polygons"])],
                    dtype=np.uint8)
    for i, p in enumerate(info["polygons"]):
        # Get the indexes of the pixels inside the polygon and set them to 1
        rr, cc = skimage.draw.polygon(p['all_points_y'], p['all_points_x'])
        mask[rr, cc, i] = 1

    # Return the masks [height, width, instance count] and the class id of each
    # instance; there is only one class, so every id is 1.
    # np.bool was removed in NumPy 1.24, so use the builtin bool
    return mask.astype(bool), np.ones([mask.shape[-1]], dtype=np.int32)
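To show what "pixels inside the polygon" means, the sketch below rasterizes one polygon with the even-odd (ray casting) rule using only NumPy. It is an illustration, not a replacement for the library call: the function polygon_to_mask is hypothetical, and its treatment of pixels that lie exactly on the outline may differ from skimage.draw.polygon.

```python
import numpy as np

def polygon_to_mask(xs, ys, height, width):
    """Rasterize one polygon into a bool mask [height, width] using the even-odd rule."""
    # Row and column index of every pixel, as [height, width] grids
    rows, cols = np.mgrid[0:height, 0:width]
    inside = np.zeros((height, width), dtype=bool)
    n = len(xs)
    for i in range(n):
        x1, y1 = xs[i], ys[i]
        x2, y2 = xs[(i + 1) % n], ys[(i + 1) % n]
        if y1 == y2:
            continue  # a horizontal (or zero-length) edge never crosses a horizontal ray
        # Does this edge span the pixel's row?
        spans = (y1 > rows) != (y2 > rows)
        # x position where the edge crosses that row
        x_cross = x1 + (rows - y1) * (x2 - x1) / (y2 - y1)
        # Toggle pixels whose rightward ray crosses the edge
        inside ^= spans & (cols < x_cross)
    return inside

# A made-up square with corners (1, 1) and (4, 4), listed as x and y vertex lists
xs, ys = [1, 4, 4, 1], [1, 1, 4, 4]
mask = polygon_to_mask(xs, ys, height=6, width=6)
print(mask.astype(np.uint8))
```

One mask like this per polygon, stacked along the last axis, gives the [height, width, instance count] array that load_mask returns. A repeated closing vertex, as in the annotation file, produces a zero-length edge that the loop skips.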

Testing the result

if __name__ == '__main__':
    dataset_train = BalloonDataset()
    dataset_train.load_balloon("../balloon_dataset/", "train")
    dataset_train.prepare()

The test above can be run as is.

Printing the result:

print("Image count: {}".format(len(dataset_train.image_ids)))
print("Class count: {}".format(dataset_train.num_classes))
for i, info in enumerate(dataset_train.class_info):
    print("{}. {}".format(i, info['name']))

1. Displaying the masks

# 1. Randomly pick an image and display its mask regions
image_id = np.random.choice(dataset_train.image_ids, 1)[0]
image = dataset_train.load_image(image_id)
mask, class_ids = dataset_train.load_mask(image_id)
visualize.display_top_masks(image, mask, class_ids, dataset_train.class_names)

2. Computing the bbox regions from the masks and displaying them

# Compute the bboxes
bbox = utils.extract_bboxes(mask)
from mrcnn.model import log
log("image", image)
log("mask", mask)
log("class_ids", class_ids)
log("bbox", bbox)
# Display the masks together with the bboxes
visualize.display_instances(image, bbox, mask, class_ids, dataset_train.class_names)

Result

Image count: 61
Class count: 2
0. BG
1. balloon
image                    shape: (681, 1024, 3)        min:    0.00000  max:  255.00000  uint8
mask                     shape: (681, 1024, 2)        min:    0.00000  max:    1.00000  bool
class_ids                shape: (2,)                  min:    1.00000  max:    1.00000  int32
bbox                     shape: (2, 4)                min:  191.00000  max: 1024.00000  int32