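6.8 Mask R-CNN Segmentation Case Study
Learning objectives
- Goals
- Understand how a segmentation dataset is read and processed
- Application
- Read the images and annotation labels of a dataset
6.8.1 Segmentation Dataset: the Balloon Dataset
The balloon dataset is a small dataset for a segmentation task: separating balloons from the rest of an image or video. It is split into a training set and a validation set.
The directory layout is as follows: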

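- Both the training and the validation folder contain a .json annotation file and the jpg images
- Most segmentation annotations of this kind are produced with the VIA tool, from which a mask for each image can be generated
- Note: the annotations here do not include bounding boxes; the boxes are generated dynamically from the masks later on
- Datasets such as VOC and COCO provide both kinds of annotation
The annotation format in the json file looks like this: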
{
# Object annotations for the first image
"24631331976_defa3bb61f_k.jpg668058":{"fileref":"",
"size":668058,"filename":"24631331976_defa3bb61f_k.jpg",
"base64_img_data":"","file_attributes":{},
"regions":{"0":{"shape_attributes":{"name":"polygon","all_points_x":[916,913,905,889,868,836,809,792,789,784,777,769,767,777,786,791,769,739,714,678,645,615,595,583,580,584,595,614,645,676,716,769,815,849,875,900,916,916],"all_points_y":[515,583,616,656,696,737,753,767,777,785,785,778,768,766,760,755,755,743,728,702,670,629,588,539,500,458,425,394,360,342,329,331,347,371,398,442,504,515]},"region_attributes":{}}}},
# Object annotations for the second image
"16335852991_f55de7958d_k.jpg1767935":{"fileref":"","size":1767935,"filename":"16335852991_f55de7958d_k.jpg","base64_img_data":"","file_attributes":{},
"regions":{
"0":{"shape_attributes":{"name":"polygon","all_points_x":[588,617,649,673,692,708,722,730,737,718,706,699,697,676,650,613,580,552,534,520,513,513,521,526,541,560,588],"all_points_y":[173,168,172,182,197,216,237,260,283,312,341,367,390,369,349,337,337,347,361,332,296,266,243,225,205,187,173]},"region_attributes":{}},
"1":{"shape_attributes":{"name":"polygon","all_points_x":[845,861,880,892,902,910,889,869,844,813,785,762,745,739,731,746,767,790,821,845],"all_points_y":[219,229,242,260,275,299,277,263,254,250,255,265,279,283,258,241,225,216,213,219]},"region_attributes":{}},
"2":{"shape_attributes":{"name":"polygon","all_points_x":[931,928,920,913,897,872,840,811,789,768,754,730,726,724,718,698,698,707,721,734,746,769,794,822,845,865,889,910,921,929,931],"all_points_y":[378,402,435,454,475,460,450,449,450,460,469,489,486,459,426,390,367,335,306,290,278,261,252,250,254,261,277,299,323,354,378]},"region_attributes":{}},
"3":{"shape_attributes":{"name":"polygon","all_points_x":[927,946,968,989,992,985,975,957,937,913,889,862,852,876,897,910,925,933,939,939,935,927,910,900,927],"all_points_y":[486,498,516,553,593,630,649,668,686,700,707,707,708,691,675,656,635,610,587,562,538,512,492,480,486]},"region_attributes":{}},
"4":{"shape_attributes":{"name":"polygon","all_points_x":[704,692,690,691,699,711,723,742,766,785,807,839,865,887,904,923,933,939,939,931,920,905,885,861,839,808,786,769,754,748,746,738,738,729,722,718,704],"all_points_y":[664,631,604,580,545,521,498,480,461,452,449,449,457,469,484,506,532,565,584,620,643,662,682,701,713,719,723,728,733,731,738,737,729,720,708,690,664]},"region_attributes":{}},
"5":{"shape_attributes":{"name":"polygon","all_points_x":[526,509,497,493,490,493,501,512,526,546,573,603,626,662,688,709,721,724,724,719,704,694,691,691,682,683,687,688,684,682,679,676,664,648,620,587,564,548,526],"all_points_y":[551,526,498,470,444,422,398,381,365,351,340,338,340,357,381,408,438,466,493,504,531,568,584,604,609,612,610,617,625,625,619,616,620,619,609,599,585,573,551]},"region_attributes":{}},
"6":{"shape_attributes":{"name":"polygon","all_points_x":[594,579,567,563,564,568,579,605,631,656,671,676,682,684,687,687,684,684,691,691,694,702,711,719,722,729,737,738,746,749,756,765,757,728,714,683,654,623,594],"all_points_y":[735,712,691,659,631,612,596,605,613,621,618,616,625,625,616,612,612,608,605,619,637,656,678,692,706,719,727,737,739,731,734,730,741,762,766,772,769,757,735]},"region_attributes":{}}
}},
...
...
...
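Here, "all_points_x":[588,617,649,673,692,708,722,730,737,718,706,699,697,676,650,613,580,552,534,520,513,513,521,526,541,560,588] and "all_points_y":[173,168,172,182,197,216,237,260,283,312,341,367,390,369,349,337,337,347,361,332,296,266,243,225,205,187,173]
hold the x and y coordinates of the vertices of the polygon drawn around the annotated object. They describe its outline, not every pixel inside it.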
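To make the structure of the annotation file concrete, here is a minimal sketch that pairs the x and y lists into polygon vertices. The entry in SAMPLE is a shortened, made-up example in the same layout as the file above, and polygon_vertices is a hypothetical helper name, not part of the maskrcnn source.

```python
import json

# A shortened, hypothetical VIA-style entry; real entries hold many more points.
SAMPLE = """
{
  "example.jpg1234": {
    "filename": "example.jpg",
    "size": 1234,
    "regions": {
      "0": {
        "shape_attributes": {
          "name": "polygon",
          "all_points_x": [10, 30, 30, 10],
          "all_points_y": [5, 5, 25, 25]
        },
        "region_attributes": {}
      }
    }
  }
}
"""


def polygon_vertices(via_json_text):
    """Map each filename to a list of polygons, each a list of (x, y) vertices."""
    annotations = json.loads(via_json_text)
    result = {}
    for entry in annotations.values():
        polygons = []
        for region in entry["regions"].values():
            shape = region["shape_attributes"]
            # The i-th x value and the i-th y value form one vertex
            polygons.append(list(zip(shape["all_points_x"], shape["all_points_y"])))
        result[entry["filename"]] = polygons
    return result


vertices = polygon_vertices(SAMPLE)
print(vertices["example.jpg"][0])
```

Each region contributes one polygon, so an image with seven balloons (like the second entry above) yields seven vertex lists.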
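6.8.2 Model Overview
This case study uses the Mask R-CNN model for segmentation, with the latest version of the source code, adapted for TensorFlow 2.0.
The highly starred GitHub implementation
- Highly starred version: MaskRCNN.
- It is implemented with TensorFlow and Keras; the code only runs on TensorFlow 1.x and needs both the keras and TensorFlow libraries installed
The version used here is a modification of that implementation that runs in a 2.0 environment.
Pretrained model download: maskrcnn transfer-learning pretrained model.
- Several pretrained models can be used; the one provided here was trained on the ImageNet dataset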

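6.8.3 Project Overview
1. Segmentation demo:
1. Image results
Input image

Segmentation result

Note: the segmented objects keep their original colors, while the rest of the image is converted to grayscale.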
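The effect described in the note can be sketched with NumPy alone. This is an illustrative version under stated assumptions, not the project's own code: color_splash is a hypothetical function name, the luminance weights are an assumed choice, and the mask is assumed to have the [height, width, instance count] layout used later in this section.

```python
import numpy as np


def color_splash(image, mask):
    """Keep masked pixels in color and turn everything else gray.

    image: uint8 array [H, W, 3]; mask: bool array [H, W, instance count].
    """
    # Luminance-weighted grayscale (assumed weights), repeated over 3 channels
    gray = image.astype(np.float64) @ np.array([0.2125, 0.7154, 0.0721])
    gray = np.repeat(gray[..., np.newaxis], 3, axis=-1).astype(np.uint8)
    if mask.shape[-1] == 0:
        # No instances detected: the whole image becomes gray
        return gray
    # Collapse all instances into a single [H, W, 1] foreground mask
    foreground = np.any(mask, axis=-1, keepdims=True)
    return np.where(foreground, image, gray).astype(np.uint8)


image = np.full((2, 2, 3), (200, 100, 50), dtype=np.uint8)
mask = np.zeros((2, 2, 1), dtype=bool)
mask[0, 0, 0] = True
splash = color_splash(image, mask)
```

Only the pixel at (0, 0) keeps its color here; every other pixel ends up with three equal channel values.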
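2. Project modules

- balloon_dataset: the project's dataset
- logs: saved training results
- mrcnn: model architecture and configuration code
- balloon_main: model training and testing code
The Images directory holds the test images or videos together with the output results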
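6.8.4 Implementing the Training Process
- Steps
- 1. Read, process and prepare the dataset
- Use the dataset class wrapped as a Sequence in the maskrcnn source
- Read the annotation (label) files
- 2. Parse and modify the model configuration, load the pretrained model, and build the model
- Introduction to the maskrcnn configuration
- Walk through the source code used to build the model
- 3. Implement model training
- Introduction to the wrapped training code
- 4. Implement model testing
- Process the prediction results for images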
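6.8.4.1 Reading and Processing the Dataset

The utils.py file in the maskrcnn source defines a Dataset class. It provides the methods for loading a segmentation dataset and defines the format in which the dataset information is stored.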
class Dataset(object):
    """The base class for dataset classes.
    To use it, create a new class that adds functions specific to the dataset
    you want to use. For example:
    See COCODataset and ShapesDataset as examples.
    """
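By writing your own Dataset subclass you can load a dataset stored in any format.
The methods are explained below
def add_class(self, source, class_id, class_name):
- Registers a class. The background class comes first by default; classes are recorded in class_info
- self.class_info = [{"source": "", "id": 0, "name": "BG"}]
def add_image(self, source, image_id, path, **kwargs):
- Registers an image by appending a dict to the self.image_info list
- image_info = {"id": image_id, "source": source, "path": path}, plus any extra keyword arguments
def load_image(self, image_id): return image
- Loads the image with the given id into a [H, W, 3] numpy array and returns it
def load_mask(self, image_id): return masks, class_ids
- Generates a bitmap mask for every object in the image by drawing its polygon
- Returns the object masks of the image as an array of shape [height, width, instance count]
- Also returns a 1D array with the class id of each object
def prepare(self, class_map=None):
- Prepares the Dataset for use
There is also image_reference, which simply returns a string that identifies the image for debugging, typically the path of the image file. By default it returns an empty string.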
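The bookkeeping behind add_class, add_image and prepare can be illustrated with a stripped-down stand-in. MiniDataset below is a hypothetical class written for this explanation; it only mirrors the fields described above and is not the Dataset class from utils.py, which does more work in prepare.

```python
import numpy as np


class MiniDataset:
    """Hypothetical, simplified stand-in for the Dataset bookkeeping."""

    def __init__(self):
        self.image_info = []
        # The background class always occupies index 0
        self.class_info = [{"source": "", "id": 0, "name": "BG"}]

    def add_class(self, source, class_id, class_name):
        self.class_info.append({"source": source, "id": class_id, "name": class_name})

    def add_image(self, source, image_id, path, **kwargs):
        info = {"id": image_id, "source": source, "path": path}
        info.update(kwargs)  # extra fields such as width, height, polygons
        self.image_info.append(info)

    def prepare(self):
        # Derive the lookup attributes that the rest of the code reads
        self.num_classes = len(self.class_info)
        self.class_names = [c["name"] for c in self.class_info]
        self.image_ids = np.arange(len(self.image_info))


dataset = MiniDataset()
dataset.add_class("balloon", 1, "balloon")
dataset.add_image("balloon", image_id="a.jpg", path="train/a.jpg", width=4, height=3)
dataset.prepare()
print(dataset.class_names, dataset.image_ids)
```

Note that after prepare the images are addressed by their integer position in image_info, while the original id (here the filename) stays available in the stored dict.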
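1. Using Dataset
- Usage
As shown below, subclass Dataset, override the load_mask method, and define a method that reads our balloon data and adds it to image_info
class CatsAndDogsDataset(Dataset):
    """
    """
    def load_cats_and_dogs(self):
        ...
    def load_mask(self, image_id):
        ...
Note: you can implement your own data loading logic and format, and borrow convenient general-purpose tools where they exist
- For example, load_balloon reads the JSON file, extracts the annotations, and repeatedly calls the internal add_class and add_image functions to build the dataset.
- load_mask: turns the polygons stored for an image into masks and class ids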
2. Loading the data and showing the results
Below is how we load the data, followed by the result
dataset = balloon.BalloonDataset()
# Register the classes and the per-image information
dataset.load_balloon(BALLOON_DIR, "train")
# Prepare the dataset for use
dataset.prepare()
print("Image Count: {}".format(len(dataset.image_ids)))
print("Class Count: {}".format(dataset.num_classes))
for i, info in enumerate(dataset.class_info):
    print("{:3}. {:50}".format(i, info['name']))
Image Count: 61
Class Count: 2
  0. BG
  1. balloon
Showing the masks of a sample
The model code provides visualize.display_top_masks(image, mask, class_ids, dataset.class_names) for this
# Pick one image id at random
image_id = np.random.choice(dataset.image_ids)
image = dataset.load_image(image_id)
mask, class_ids = dataset.load_mask(image_id)
visualize.display_top_masks(image, mask, class_ids, dataset.class_names)

Showing the bboxes and masks of a sample
There are no bbox annotations, so utils.extract_bboxes computes the bbox positions from the masks of the image
- 1. utils.extract_bboxes(mask):
- mask: [height, width, num_instances]. Mask pixels are either 1 or 0.
- Returns: bbox array [num_instances, (y1, x1, y2, x2)].
image = dataset.load_image(image_id)
mask, class_ids = dataset.load_mask(image_id)
# Compute the bounding boxes
bbox = utils.extract_bboxes(mask)
print("image_id ", image_id, dataset.image_reference(image_id))
# log is the helper defined in the model module
log("image", image)
log("mask", mask)
log("class_ids", class_ids)
log("bbox", bbox)
# Result
image_id 1 /deepmatter/libs/mask_rcnn/datasets/balloon/train/25899693952_7c8b8b9edc_k.jpg
image shape: (1365, 2048, 3) min: 0.00000 max: 255.00000
mask shape: (1365, 2048, 1) min: 0.00000 max: 1.00000
class_ids shape: (1,) min: 1.00000 max: 1.00000
bbox shape: (1, 4) min: 116.00000 max: 965.00000
The result can be displayed with visualize.display_instances(image, bbox, mask, class_ids, dataset.class_names)
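The idea of deriving a box from a mask can be sketched as follows. This is an independent illustration, not the source of utils.extract_bboxes: boxes_from_masks is a hypothetical name, and treating y2 and x2 as exclusive (one past the last mask pixel) is an assumption.

```python
import numpy as np


def boxes_from_masks(mask):
    """mask: [height, width, num_instances] -> boxes [num_instances, (y1, x1, y2, x2)]."""
    num_instances = mask.shape[-1]
    boxes = np.zeros((num_instances, 4), dtype=np.int32)
    for i in range(num_instances):
        m = mask[:, :, i]
        rows = np.where(np.any(m, axis=1))[0]  # rows that contain mask pixels
        cols = np.where(np.any(m, axis=0))[0]  # columns that contain mask pixels
        if rows.size:
            # Assumption: y2 and x2 are exclusive, hence the + 1
            boxes[i] = [rows[0], cols[0], rows[-1] + 1, cols[-1] + 1]
        # An empty mask keeps the all-zero box
    return boxes


mask = np.zeros((6, 8, 1), dtype=bool)
mask[2:4, 3:7, 0] = True
boxes = boxes_from_masks(mask)
print(boxes)
```

Because the boxes depend only on the masks, they stay consistent with the masks after any resizing or augmentation applied to both.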

- 2. Using modellib.load_image_gt: pass in the dataset, the config and the image id
image, image_meta, class_ids, bbox, mask = modellib.load_image_gt(
    dataset, config, image_id, use_mini_mask=False)
log("image", image)
log("image_meta", image_meta)
log("class_ids", class_ids)
log("bbox", bbox)
log("mask", mask)
image shape: (1024, 1024, 3) min: 0.00000 max: 255.00000
image_meta shape: (10,) min: 0.00000 max: 1024.00000
class_ids shape: (2,) min: 1.00000 max: 1.00000
bbox shape: (2, 4) min: 181.00000 max: 1024.00000
mask shape: (1024, 1024, 2) min: 0.00000 max: 1.00000

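6.8.4.2 Implementing the BalloonDataset
- Steps
- Subclass the Dataset class
- Implement the load_balloon method
- Implement the load_mask method
Here we create a utils folder to hold the tools for reading the training dataset, and write a balloon_dataset.py file in it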
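import os
import json
import sys
import numpy as np
# Import the submodules explicitly; a bare "import skimage" is not guaranteed to load them
import skimage.io
import skimage.draw
sys.path.append("../")
from mrcnn import utils, visualize


class BalloonDataset(utils.Dataset):
    """Dataset class for the balloon segmentation dataset
    """
    def load_balloon(self, dataset_dir, subset):
        pass

    def load_mask(self, image_id):
        pass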
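1. Implementing load_balloon
- Goal: add each image's id, path, height, width and annotation information to self.image_info
- 1. Read the json annotation file
- 2. Get the annotated regions
- 3. For each image, store the information for its regions together with the image path, height, width and filename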
def load_balloon(self, dataset_dir, subset):
    """
    Load the dataset
    :param dataset_dir: dataset directory
    :param subset: which subset to load, "train" or "val"
    :return:
    """
    # Register the dataset's class (there is only one: balloon)
    self.add_class("balloon", 1, "balloon")
    # The subset must be either the training or the validation set
    assert subset in ["train", "val"]
    dataset_dir = os.path.join(dataset_dir, subset)
    # Load annotations
    # { 'filename': '28503151_5b5b7ec140_b.jpg',
    #   'regions': {
    #       '0': {
    #           'region_attributes': {},
    #           'shape_attributes': {
    #               'all_points_x': [...],
    #               'all_points_y': [...],
    #               'name': 'polygon'}},
    #       ... more regions ...
    #   },
    # }
    # Read the annotated regions
    with open(os.path.join(dataset_dir, "via_region_data.json")) as f:
        annotations = json.load(f)
    annotations = list(annotations.values())
    # Skip images that have no annotated regions
    annotations = [a for a in annotations if a['regions']]
    # Add every image together with its polygon coordinates
    for a in annotations:
        # The x and y coordinates of each polygon's points are stored in shape_attributes
        # 'regions' may be a dict or a list, so handle both cases
        if isinstance(a['regions'], dict):
            polygons = [r['shape_attributes'] for r in a['regions'].values()]
        else:
            polygons = [r['shape_attributes'] for r in a['regions']]
        # Read the image to get its height and width
        image_path = os.path.join(dataset_dir, a['filename'])
        image = skimage.io.imread(image_path)
        height, width = image.shape[:2]
        # Add the image to image_info
        self.add_image(
            "balloon",
            image_id=a['filename'],
            path=image_path,
            width=width, height=height,
            polygons=polygons)
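Note: the source code relies heavily on the skimage module to read and process images
- With PIL, Image.open returns an Image object whose mode is not necessarily RGB, so it has to be converted to RGB and then to an array
from PIL import Image
image = Image.open("")
image = image.convert('RGB')
arr = np.array(image)
- skimage.io.imread() returns the image directly as an array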
2. Implementing load_mask
def load_mask(self, image_id):
    """Load the instance masks of an image and return them with their class ids
    :param image_id: image ID
    :return: masks: a bool array of shape [height, width, instance count], one mask per instance
             class_ids: 1D array of class ids
    """
    # If the image does not come from the balloon dataset, delegate to the parent class
    image_info = self.image_info[image_id]
    if image_info["source"] != "balloon":
        return super().load_mask(image_id)
    # Convert the polygon coordinates to a bitmap [height, width, instance_count]
    info = self.image_info[image_id]
    mask = np.zeros([info["height"], info["width"], len(info["polygons"])],
                    dtype=np.uint8)
    for i, p in enumerate(info["polygons"]):
        # Get indexes of pixels inside the polygon and set them to 1
        rr, cc = skimage.draw.polygon(p['all_points_y'], p['all_points_x'])
        mask[rr, cc, i] = 1
    # Return the masks [height, width, instance count] and an array of class ids.
    # There is only one class, so every instance gets class id 1.
    # np.bool was removed in NumPy 1.24; the builtin bool is equivalent here
    return mask.astype(bool), np.ones([mask.shape[-1]], dtype=np.int32)
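What "pixels inside the polygon" means can be illustrated without skimage. The sketch below swaps in a plain NumPy even-odd (ray casting) test as a stand-in for skimage.draw.polygon; polygon_to_mask is a hypothetical name, and pixels that lie exactly on the polygon boundary may be treated differently than skimage treats them.

```python
import numpy as np


def polygon_to_mask(ys, xs, height, width):
    """Rasterize a polygon into a bool mask using the even-odd rule."""
    ys = np.asarray(ys, dtype=float)
    xs = np.asarray(xs, dtype=float)
    rows, cols = np.mgrid[0:height, 0:width]
    inside = np.zeros((height, width), dtype=bool)
    n = len(xs)
    for i in range(n):
        x0, y0 = xs[i], ys[i]
        x1, y1 = xs[(i + 1) % n], ys[(i + 1) % n]
        if y0 == y1:
            continue  # a horizontal edge never crosses a horizontal ray
        # Does this edge span the row of each pixel?
        spans_row = (y0 > rows) != (y1 > rows)
        # x coordinate at which the edge crosses that row
        x_cross = x0 + (rows - y0) * (x1 - x0) / (y1 - y0)
        # Each edge crossed by a ray going right from the pixel flips the state
        inside ^= spans_row & (cols < x_cross)
    return inside


# A square with corners (y, x) = (1, 1), (1, 4), (4, 4), (4, 1) on a 6 x 6 grid
square = polygon_to_mask([1, 1, 4, 4], [1, 4, 4, 1], height=6, width=6)
print(square.astype(np.uint8))
```

In load_mask the same kind of bitmap is written into one channel of the mask array per polygon, which is what produces the [height, width, instance count] layout.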
Testing
if __name__ == '__main__':
    dataset_train = BalloonDataset()
    dataset_train.load_balloon("../balloon_dataset/", "train")
    dataset_train.prepare()
The same checks as above can now be run
Printing the results:
print("Image count: {}".format(len(dataset_train.image_ids)))
print("Class count: {}".format(dataset_train.num_classes))
for i, info in enumerate(dataset_train.class_info):
    print("{}. {}".format(i, info['name']))
1. Showing the masks
# 1. Pick an image at random and show its mask regions
image_id = np.random.choice(dataset_train.image_ids, 1)[0]
image = dataset_train.load_image(image_id)
mask, class_ids = dataset_train.load_mask(image_id)
visualize.display_top_masks(image, mask, class_ids, dataset_train.class_names)
2. Computing the bbox regions from the masks and showing them
# Compute the bboxes
bbox = utils.extract_bboxes(mask)
from mrcnn.model import log
log("image", image)
log("mask", mask)
log("class_ids", class_ids)
log("bbox", bbox)
# Show the masks together with the bboxes
visualize.display_instances(image, bbox, mask, class_ids, dataset_train.class_names)
Result
Image count: 61
Class count: 2
0. BG
1. balloon
image shape: (681, 1024, 3) min: 0.00000 max: 255.00000 uint8
mask shape: (681, 1024, 2) min: 0.00000 max: 1.00000 bool
class_ids shape: (2,) min: 1.00000 max: 1.00000 int32
bbox shape: (2, 4) min: 191.00000 max: 1024.00000 int32
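The summary lines above list the shape, value range and dtype of each array. A helper producing lines in that layout could look like the sketch below; summarize is a hypothetical function written for illustration and is not the log helper from mrcnn.model, whose exact spacing may differ.

```python
import numpy as np


def summarize(name, array):
    """Return a one-line summary: name, shape, min, max and dtype."""
    line = "{} shape: {}".format(name, array.shape)
    if array.size:
        # min and max only make sense for non-empty arrays
        line += " min: {:.5f} max: {:.5f}".format(float(array.min()), float(array.max()))
    return line + " {}".format(array.dtype)


mask = np.zeros((681, 1024, 2), dtype=bool)
mask[10, 10, 0] = True
print(summarize("mask", mask))
```

Reading such a line is a quick sanity check: the last dimension of mask, the length of class_ids and the first dimension of bbox should all equal the number of instances in the image.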