6.9 Mask R-CNN Segmentation Case Study 2
Learning Objectives
- Goals
  - Understand the Mask R-CNN source-code setup and training workflow
  - Understand how anchors are configured and computed in Mask R-CNN
  - Master the model training and prediction workflow
- Applications
  - Train a Mask R-CNN model on the balloon segmentation dataset
  - Run Mask R-CNN prediction on a given image or video
6.9.1 Project Steps
Steps:
- 1. Read, process, and prepare the dataset
  - Implement reading of the annotation files
- 2. Parse and modify the model configuration, load pre-trained weights, and build the model
  - How the Mask R-CNN source code wraps the dataset in a Keras Sequence class
  - Introduction to the Mask R-CNN configuration
  - Source-code walkthrough of the model-building process
- 3. Implement the model training process
  - Introduction to the training code wrappers
- 4. Implement the model testing process
  - Post-processing of image prediction results
6.9.1.1 Model Analysis and Training Workflow Implementation
- Step analysis
  - 1. Validate the command-line arguments
  - 2. Configure the model parameters and the dataset reading setup
  - 3. Create the model
  - 4. Implement the training/testing logic
The full implementation:
args = parser.parse_args()

# 1. Validate the command-line arguments
if args.command == "train":
    assert args.dataset, "--dataset directory must be provided for training"
elif args.command == "test":
    assert args.image or args.video, \
        "An image or a video must be provided for testing"

# 2. Configure the model parameters and the dataset reading setup
if args.command == "train":
    config = BalloonConfig()
else:
    # Inference configuration: set the batch size to 1,
    # since Batch size = GPU_COUNT * IMAGES_PER_GPU
    class InferenceConfig(BalloonConfig):
        GPU_COUNT = 1
        IMAGES_PER_GPU = 1
    config = InferenceConfig()
config.display()

# 3. Create the model
if args.command == "train":
    model = maskrcnn.MaskRCNN(mode="training", config=config,
                              model_dir=args.logs)
else:
    model = maskrcnn.MaskRCNN(mode="inference", config=config,
                              model_dir=args.logs)

# 4. Training/testing logic
if args.command == "train":
    # Choose and download the pre-trained weights
    if args.weights.lower() == "imagenet":
        weights_path = model.get_imagenet_weights()
    else:
        raise ValueError("Please provide a supported pre-trained weight type")
    # Load the pre-trained weights
    print("Loading weights ", weights_path)
    model.load_weights(weights_path, by_name=True)
    # Train
    train(model)
elif args.command == "test":
    model.load_weights(args.model, by_name=True)
    # Run detection
    detect_and_draw_segmentation(args, model)
else:
    print("'{}' is not a recognized command. "
          "Please use 'train' or 'test'".format(args.command))
6.9.1.2 Model Configuration File

- config.py is the model configuration file; we can modify it to match our training needs
- Mask R-CNN has a large number of parameters, so a single configuration class is a cleaner way to manage them
- Some of the important settings are introduced below:
  - 1. Training parameters
  - 2. Testing parameters
  - 3. Learning-rate related settings
class Config(object):
    """Base configuration class. For custom configurations, create a
    sub-class that inherits from this one and override properties
    that need to be changed.
    """
    # 1. Training configuration
    # Number of GPUs to use. When using only a CPU, this needs to be set to 1.
    # If set to more than 1, multiple GPUs compute in parallel; the source file
    # parallel_model.py implements multi-GPU computation with the TF 1.x API
    GPU_COUNT = 1

    # Number of images to train on each GPU.
    # A 12GB GPU can typically handle 2 images of 1024x1024px.
    # Adjust based on your GPU memory and image sizes. Use the highest
    # number that your GPU can handle for best performance.
    IMAGES_PER_GPU = 2

    # Number of training steps per epoch
    # This doesn't need to match the size of the training set. Tensorboard
    # updates are saved at the end of each epoch, so setting this to a
    # smaller number means getting more frequent TensorBoard updates.
    # Validation stats are also calculated at each epoch end and they
    # might take a while, so don't set this too small to avoid spending
    # a lot of time on validation stats.
    STEPS_PER_EPOCH = 1000

    # Number of validation steps to run at the end of every training epoch.
    # A bigger number improves accuracy of validation stats, but slows
    # down the training.
    VALIDATION_STEPS = 50

    # Backbone network architecture
    # Supported values are: resnet50, resnet101.
    # You can also provide a callable that should have the signature
    # of model.resnet_graph. If you do so, you need to supply a callable
    # to COMPUTE_BACKBONE_SHAPE as well
    BACKBONE = "resnet101"

    # Strides of each FPN pyramid level built on the resnet101 backbone
    BACKBONE_STRIDES = [4, 8, 16, 32, 64]

    # Size of the fully-connected layers in the classification graph
    FPN_CLASSIF_FC_LAYERS_SIZE = 1024

    # Size of the top-down layers used to build the feature pyramid
    TOP_DOWN_PYRAMID_SIZE = 256

    # Total number of classes (including background)
    NUM_CLASSES = 1  # Override in sub-classes

    # Length of square anchor side in pixels for the RPN
    RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512)

    # Ratios of anchors at each cell, used to set width/height
    # A value of 1 represents a square anchor, and 0.5 is a wide anchor
    RPN_ANCHOR_RATIOS = [0.5, 1, 2]

    # Anchor stride
    # If 1 then anchors are created for each cell in the backbone feature map.
    # If 2, then anchors are created for every other cell, and so on.
    RPN_ANCHOR_STRIDE = 1

    # NMS threshold used to filter RPN proposals; larger values keep more proposals
    # You can increase this during training to generate more proposals.
    RPN_NMS_THRESHOLD = 0.7

    # How many anchors per image to use for RPN training
    RPN_TRAIN_ANCHORS_PER_IMAGE = 256

    # Number of ROIs kept after tf.nn.top_k and before non-maximum suppression
    PRE_NMS_LIMIT = 6000

    # Number of ROIs kept after non-maximum suppression (training and inference)
    POST_NMS_ROIS_TRAINING = 2000
    POST_NMS_ROIS_INFERENCE = 1000

    # Input image resizing; the default "square" mode resizes to [max_dim, max_dim]
    # square: Resize and pad with zeros to get a square image
    # of size [max_dim, max_dim].
    IMAGE_RESIZE_MODE = "square"
    IMAGE_MIN_DIM = 800
    IMAGE_MAX_DIM = 1024

    # Number of ROIs per image to feed to the classifier/mask heads
    # The Mask RCNN paper uses 512 but often the RPN doesn't generate
    # enough positive proposals to fill this and keep a positive:negative
    # ratio of 1:3. You can increase the number of proposals by adjusting
    # the RPN NMS threshold.
    TRAIN_ROIS_PER_IMAGE = 200

    # Percent of positive ROIs used to train the classifier/mask heads
    ROI_POSITIVE_RATIO = 0.33

    # Pooled ROI sizes
    POOL_SIZE = 7
    MASK_POOL_SIZE = 14

    # Shape of the output mask
    # To change this you also need to change the neural network mask branch
    MASK_SHAPE = [28, 28]

    # Maximum number of ground-truth instances per image
    MAX_GT_INSTANCES = 100

    # 2. Detection configuration
    # Max number of final detections per image
    DETECTION_MAX_INSTANCES = 100

    # Minimum probability value to accept a detected instance
    # ROIs below this threshold are skipped
    DETECTION_MIN_CONFIDENCE = 0.7

    # Non-maximum suppression threshold for detection
    DETECTION_NMS_THRESHOLD = 0.3

    # 3. Learning-rate related settings
    # The Mask RCNN paper uses lr=0.02, but on TensorFlow it causes
    # weights to explode. Likely due to differences in optimizer
    # implementation.
    LEARNING_RATE = 0.001
    LEARNING_MOMENTUM = 0.9

    # Weight decay regularization
    WEIGHT_DECAY = 0.0001

    # Loss weights for more precise optimization.
    # Can be used for R-CNN training setup.
    LOSS_WEIGHTS = {
        "rpn_class_loss": 1.,
        "rpn_bbox_loss": 1.,
        "mrcnn_class_loss": 1.,
        "mrcnn_bbox_loss": 1.,
        "mrcnn_mask_loss": 1.
    }

    # Gradient norm clipping
    GRADIENT_CLIP_NORM = 5.0
The configuration class also provides a couple of helper methods; display() prints the model's current configuration:
def to_dict(self):
    return {a: getattr(self, a)
            for a in sorted(dir(self))
            if not a.startswith("__") and not callable(getattr(self, a))}

def display(self):
    """Display Configuration values."""
    print("\nConfigurations:")
    for key, val in self.to_dict().items():
        print(f"{key:30} {val}")
    print("\n")
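As a quick check of how the configuration class behaves, here is a minimal sketch that subclasses Config and prints it. It assumes the full config.py, whose __init__ also derives values such as BATCH_SIZE = IMAGES_PER_GPU * GPU_COUNT from the attributes above:

from mrcnn.config import Config

class DemoConfig(Config):
    NAME = "demo"
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2
    NUM_CLASSES = 1 + 1   # background + 1 class

config = DemoConfig()
config.display()           # prints every non-callable, non-dunder attribute
print(config.BATCH_SIZE)   # 2 = GPU_COUNT * IMAGES_PER_GPU (derived in __init__)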
6.9.1.3 Case Study: Writing the Balloon Dataset Configuration
In balloon_dataset we add the configuration our dataset needs, as follows:
from mrcnn.config import Config


class BalloonConfig(Config):
    """Inherits the Mask R-CNN model configuration
    and overrides the settings our training data needs
    """
    # Name of the configuration
    NAME = "balloon"
    IMAGES_PER_GPU = 2
    # Number of classes (including background): balloon + 1
    NUM_CLASSES = 1 + 1
    # Number of steps per epoch
    STEPS_PER_EPOCH = 100
    # Confidence threshold used to filter detections
    DETECTION_MIN_CONFIDENCE = 0.9
Then add the following configuration code to the training script balloon_main.py:
from utils.balloon_dataset import BalloonDataset, BalloonConfig

args = parser.parse_args()
# 1. Validate the command-line arguments
if args.command == "train":
    assert args.dataset, "--dataset directory must be provided for training"
elif args.command == "test":
    assert args.image or args.video, \
        "An image or a video must be provided for testing"

# 2. Configure the model parameters and the dataset reading setup
if args.command == "train":
    config = BalloonConfig()
else:
    # Inference configuration: set the batch size to 1,
    # since Batch size = GPU_COUNT * IMAGES_PER_GPU
    class InferenceConfig(BalloonConfig):
        GPU_COUNT = 1
        IMAGES_PER_GPU = 1
    config = InferenceConfig()
# Display the configuration
config.display()
The initial imports and command-line arguments are as follows:
import numpy as np
import skimage.draw
import argparse
from mrcnn import model as maskrcnn
from utils.balloon_dataset import BalloonDataset, BalloonConfig
import os

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

# Command-line arguments
parser = argparse.ArgumentParser(
    description='Mask R-CNN balloon segmentation training')
parser.add_argument("--command", type=str, default='test',
                    help="'train' or 'test': train the model or run testing")
parser.add_argument('--dataset', type=str, default='./balloon_data',
                    help='Balloon segmentation dataset directory')
parser.add_argument('--weights', type=str, default='imagenet',
                    help="Pre-trained weights: "
                         "imagenet:https://github.com/fchollet/"
                         "deep-learning-models/releases/"
                         "download/v0.2/resnet50_weights"
                         "_tf_dim_ordering_tf_kernels_notop.h5")
parser.add_argument('--logs', type=str, default='./logs/',
                    help='Log directory')
parser.add_argument('--image', type=str, default='./images/2917282960_06beee649a_b.jpg',
                    help='Path of the image to detect and segment')
parser.add_argument('--video', type=str, default='./images/v0200fd10000bq043q9pskdh7ri20vm0.MP4',
                    help='Path of the video to detect and segment')
parser.add_argument('--model', type=str, default='./logs/mask_rcnn_balloon.h5',
                    help='Trained model file to use for testing')
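As a quick illustration (not part of the project code), parsing an empty argument list shows the defaults above in action:

# Parse the defaults, equivalent to running the script with no arguments
demo_args = parser.parse_args([])
print(demo_args.command, demo_args.weights)   # test imagenet
print(demo_args.image)                        # ./images/2917282960_06beee649a_b.jpg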
6.9.1.4 Using the Model
Using the model is straightforward: import model and instantiate MaskRCNN, as shown below
from mrcnn import model as maskrcnn

# 1. Training mode, passing in the configuration and the model save directory
model = maskrcnn.MaskRCNN(mode="training", config=config,
                          model_dir=args.logs)
# 2. Inference mode, passing in the configuration and the model save directory
model = maskrcnn.MaskRCNN(mode="inference", config=config,
                          model_dir=args.logs)
The MaskRCNN class implements a full model-building pipeline. The main steps in the source code are:
- Model pipeline:
  - Build the inputs, build the GT, build the RPN outputs, and generate regions of interest through ProposalLayer (a class in the source code)
  - Compute the five losses and assemble the model inputs and outputs
1. Source-code walkthrough of model data reading and training
The training process is wrapped fairly deeply, so the relevant source code is explained here. The important functions are:
- self.train(): the model's training logic
  - 1. class DataGenerator(KU.Sequence): builds the generator in the data-preparation stage
    - Sets the RPN training target boxes
  - 2. self.compile: the model compilation stage
    - Sets up loss computation and regularization
  - 3. self.fit: training
During training we only need to call the train function of the Mask R-CNN model. Here we analyze the train function in the source code:
- 1. def train(self, train_dataset, val_dataset, learning_rate, epochs, layers, augmentation=None, custom_callbacks=None, no_augmentation_sources=None):
  - train_dataset, val_dataset: training and validation Dataset objects
  - learning_rate: learning rate
  - epochs: number of epochs
  - layers: which layers to train
    - heads: trains the RPN, classifier, and mask heads
  - custom_callbacks=None: custom training callbacks
Source-code analysis:
# 1. Select the layer names to train by regex, based on the passed-in argument
layer_regex = {
    # all layers but the backbone
    "heads": r"(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",
    # From a specific Resnet stage and up
    "3+": r"(res3.*)|(bn3.*)|(res4.*)|(bn4.*)|(res5.*)|(bn5.*)|(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",
    "4+": r"(res4.*)|(bn4.*)|(res5.*)|(bn5.*)|(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",
    "5+": r"(res5.*)|(bn5.*)|(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",
    # All layers
    "all": ".*",
}
if layers in layer_regex.keys():
    layers = layer_regex[layers]

# 2. Data generators; the DataGenerator class in model.py inherits keras.utils.Sequence
train_generator = DataGenerator(train_dataset, self.config, shuffle=True,
                                augmentation=augmentation)
val_generator = DataGenerator(val_dataset, self.config, shuffle=True)

# Create log_dir if it does not exist
if not os.path.exists(self.log_dir):
    os.makedirs(self.log_dir)

# 3. Callbacks
callbacks = [
    keras.callbacks.TensorBoard(log_dir=self.log_dir,
                                histogram_freq=0, write_graph=True, write_images=False),
    keras.callbacks.ModelCheckpoint(self.checkpoint_path,
                                    verbose=0, save_weights_only=True),
]
# Add custom callbacks to the list
if custom_callbacks:
    callbacks += custom_callbacks

# 4. Train: set the trainable layers and compile, then fit
log("\nStarting at epoch {}. LR={}\n".format(self.epoch, learning_rate))
log("Checkpoint Path: {}".format(self.checkpoint_path))
self.set_trainable(layers)
self.compile(learning_rate, self.config.LEARNING_MOMENTUM)

# Work-around for Windows: Keras fails on Windows when using
# multiprocessing workers. See discussion here:
# https://github.com/matterport/Mask_RCNN/issues/13#issuecomment-353124009
if os.name == 'nt':
    workers = 0
else:
    workers = multiprocessing.cpu_count()

self.keras_model.fit(
    train_generator,
    initial_epoch=self.epoch,
    epochs=epochs,
    steps_per_epoch=self.config.STEPS_PER_EPOCH,
    callbacks=callbacks,
    validation_data=val_generator,
    validation_steps=self.config.VALIDATION_STEPS,
    max_queue_size=100,
    workers=workers,
    use_multiprocessing=workers > 1,
)
self.epoch = max(self.epoch, epochs)
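To see how the layer_regex table selects trainable layers, the short sketch below matches a few typical layer names (example names only) against the "heads" pattern, the same way set_trainable does internally:

import re

pattern = r"(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)"   # the "heads" entry
for name in ["rpn_class_raw", "fpn_p2", "mrcnn_mask", "res4a_branch2a"]:
    print(name, bool(re.fullmatch(pattern, name)))
# rpn_class_raw True, fpn_p2 True, mrcnn_mask True, res4a_branch2a False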
2. class DataGenerator(KU.Sequence):
Constructor: def __init__(self, dataset, config, shuffle=True, augmentation=None, random_rois=0, detection_targets=False):
It wraps the passed-in Dataset object as a Keras Sequence that serves one batch at a time to the trainer.
(1) First, it generates the RPN anchor coordinates from the configuration (these are displayed again later):

self.backbone_shapes = compute_backbone_shapes(config, config.IMAGE_SHAPE)
self.anchors = utils.generate_pyramid_anchors(config.RPN_ANCHOR_SCALES,
                                              config.RPN_ANCHOR_RATIOS,
                                              self.backbone_shapes,
                                              config.BACKBONE_STRIDES,
                                              config.RPN_ANCHOR_STRIDE)

(2) def __getitem__(self, idx):
- Produces the batch with index idx.
- return inputs, outputs. Both return values are explained below.
Note: for normal training only inputs is needed; outputs is empty by default. If random_rois is given a value greater than 0, DataGenerator filters the proposals and also returns the RoI information needed by the second-stage Mask R-CNN heads, which during normal training is produced inside the network from the first-stage bboxes.
# The inputs list contains:
- images: [batch, H, W, C]
- image_meta: [batch, (meta data)] Image details.

meta = np.array(
    [image_id] +                  # size=1
    list(original_image_shape) +  # size=3
    list(image_shape) +           # size=3
    list(window) +                # size=4 (y1, x1, y2, x2) in image coordinates
    [scale] +                     # size=1
    list(active_class_ids)        # size=num_classes
)

- rpn_match: [batch, N] Integer (1=positive anchor, -1=negative, 0=neutral)  # positive/negative anchor labels
- rpn_bbox: [batch, N, (dy, dx, log(dh), log(dw))] Anchor bbox deltas  # offsets from anchors to GT
- gt_class_ids: [batch, MAX_GT_INSTANCES]  # GT class IDs
- gt_boxes: [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)]  # GT box locations
- gt_masks: [batch, height, width, MAX_GT_INSTANCES]  # GT mask targets. The height and width
  are those of the image unless use_mini_mask is True, in which
  case they are defined in MINI_MASK_SHAPE.

# outputs is empty by default; if random_rois > 0 it contains
outputs = [batch_mrcnn_class_ids, batch_mrcnn_bbox, batch_mrcnn_mask]
When building each batch, DataGenerator calls the functions below.
(3) build_rpn_targets(image_shape, anchors, gt_class_ids, gt_boxes, config): anchor -> bbox
- Matches the many RPN anchors against the GT boxes as positive/negative samples, converting corner coordinates (y1, x1, y2, x2) to center coordinates
- Then computes the anchor-to-GT deltas with the box transformation formulas
# Returns
anchors: [num_anchors, (y1, x1, y2, x2)]
gt_class_ids: [num_gt_boxes] Integer class IDs.
gt_boxes: [num_gt_boxes, (y1, x1, y2, x2)]
Returns:
rpn_match: [N] (int32) matches between anchors and GT boxes.
           1 = positive anchor, -1 = negative anchor, 0 = neutral
rpn_bbox: [N, (dy, dx, log(dh), log(dw))] Anchor bbox deltas.

# GT corner-to-center conversion
gt_h = gt[2] - gt[0]
gt_w = gt[3] - gt[1]
gt_center_y = gt[0] + 0.5 * gt_h
gt_center_x = gt[1] + 0.5 * gt_w
# Anchor corner-to-center conversion
a_h = a[2] - a[0]
a_w = a[3] - a[1]
a_center_y = a[0] + 0.5 * a_h
a_center_x = a[1] + 0.5 * a_w
# Store the deltas
rpn_bbox[ix] = [
    (gt_center_y - a_center_y) / a_h,
    (gt_center_x - a_center_x) / a_w,
    np.log(gt_h / a_h),
    np.log(gt_w / a_w),
]
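A small worked example (made-up boxes) of the delta formulas above:

import numpy as np

a = np.array([100., 100., 200., 200.])    # anchor (y1, x1, y2, x2): 100x100
gt = np.array([110., 120., 210., 240.])   # GT box: 100 high, 120 wide

gt_h, gt_w = gt[2] - gt[0], gt[3] - gt[1]
a_h, a_w = a[2] - a[0], a[3] - a[1]
dy = ((gt[0] + 0.5 * gt_h) - (a[0] + 0.5 * a_h)) / a_h   # 0.10
dx = ((gt[1] + 0.5 * gt_w) - (a[1] + 0.5 * a_w)) / a_w   # 0.30
dh, dw = np.log(gt_h / a_h), np.log(gt_w / a_w)          # 0.0, ~0.18
print(dy, dx, dh, dw)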
(4) build_detection_targets(rpn_rois, gt_class_ids, gt_boxes, gt_masks, config): called only when outputs is requested
Generate targets for training Stage 2 classifier and mask heads.
# This is not used in normal training. It's useful for debugging or to train
# the Mask RCNN heads without using the RPN head.
# Inputs:
rpn_rois: [N, (y1, x1, y2, x2)] proposal boxes.
gt_class_ids: [instance count] Integer class IDs
gt_boxes: [instance count, (y1, x1, y2, x2)]
gt_masks: [height, width, instance count] Ground truth masks. Can be full
          size or mini-masks.
# Returns: the regions of interest together with their target boxes and masks
# Only TRAIN_ROIS_PER_IMAGE ROIs per image are fed to the heads:
# Number of ROIs per image to feed to classifier/mask heads
# The Mask RCNN paper uses 512 but often the RPN doesn't generate
# enough positive proposals to fill this and keep a positive:negative
# ratio of 1:3. You can increase the number of proposals by adjusting
# the RPN NMS threshold.
# TRAIN_ROIS_PER_IMAGE = 200
rois: [TRAIN_ROIS_PER_IMAGE, (y1, x1, y2, x2)]
class_ids: [TRAIN_ROIS_PER_IMAGE]. Integer class IDs.
bboxes: [TRAIN_ROIS_PER_IMAGE, NUM_CLASSES, (y, x, log(h), log(w))]. Class-specific
        bbox refinements.
masks: [TRAIN_ROIS_PER_IMAGE, height, width, NUM_CLASSES). Class specific masks cropped
       to bbox boundaries and resized to neural network output size.
3. compile(self, learning_rate, momentum):
- Gets the model ready for training. Adds losses, regularization, and metrics. Then calls the Keras compile() function.
- It sets the training learning rate, adds the five losses together with L2 regularization to form the final objective, and calls the Keras compile() function internally (see the sketch after this list)
- loss_names = ["rpn_class_loss", "rpn_bbox_loss", "mrcnn_class_loss", "mrcnn_bbox_loss", "mrcnn_mask_loss"]
- 4. model.fit is then just a regular Keras training call
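A minimal sketch (not the library's actual compile code) of how the five losses are weighted by LOSS_WEIGHTS before being summed into the total objective:

loss_names = ["rpn_class_loss", "rpn_bbox_loss",
              "mrcnn_class_loss", "mrcnn_bbox_loss", "mrcnn_mask_loss"]
loss_values = {name: 1.0 for name in loss_names}   # placeholder per-loss values
total = sum(config.LOSS_WEIGHTS.get(name, 1.) * loss_values[name]
            for name in loss_names)
print(total)   # 5.0 when every weight and every loss equals 1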
6.9.1.5 Case Study: Analyzing the Network's Anchors and Target Values
To better understand the anchors used by the RPN in Mask R-CNN, we print and analyze them here, reusing the dataset methods from above. The test is run in the balloon_dataset.py file.
- 1. utils.generate_pyramid_anchors
  - Needs the RPN anchor scales, the RPN anchor width/height ratios, and a few other parameters
# A configuration is needed for the computation
# 3. Compute the anchors
config = BalloonConfig()
# Add a feature-map-shape attribute (set manually here so generate_pyramid_anchors
# can be used for this test; DataGenerator normally computes the shapes itself)
config.BACKBONE_SHAPES = [[256, 256], [128, 128], [64, 64], [32, 32], [16, 16]]
anchors = utils.generate_pyramid_anchors(config.RPN_ANCHOR_SCALES,
                                         config.RPN_ANCHOR_RATIOS,
                                         config.BACKBONE_SHAPES,
                                         config.BACKBONE_STRIDES,
                                         config.RPN_ANCHOR_STRIDE)
# Print anchor statistics
num_levels = len(config.BACKBONE_SHAPES)
anchors_per_cell = len(config.RPN_ANCHOR_RATIOS)
print("Count: ", anchors.shape[0])
print("Scales: ", config.RPN_ANCHOR_SCALES)
print("ratios: ", config.RPN_ANCHOR_RATIOS)
print("Anchors per Cell: ", anchors_per_cell)
print("Levels: ", num_levels)
anchors_per_level = []
for l in range(num_levels):
    num_cells = config.BACKBONE_SHAPES[l][0] * config.BACKBONE_SHAPES[l][1]
    anchors_per_level.append(anchors_per_cell * num_cells // config.RPN_ANCHOR_STRIDE ** 2)
    print("Anchors in Level {}: {}".format(l, anchors_per_level[l]))
Output:
# Total number of anchors
Count:  261888
Scales:  (32, 64, 128, 256, 512)
ratios:  [0.5, 1, 2]
Anchors per Cell:  3
Levels:  5
# Anchors on the first (largest) feature map
Anchors in Level 0: 196608
Anchors in Level 1: 49152
Anchors in Level 2: 12288
Anchors in Level 3: 3072
Anchors in Level 4: 768
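These counts are easy to verify by hand: each feature-map cell gets anchors_per_cell = 3 anchors (one per ratio), so level 0 alone contributes 256 * 256 * 3 = 196608:

shapes = [(256, 256), (128, 128), (64, 64), (32, 32), (16, 16)]
per_level = [h * w * 3 for h, w in shapes]
print(per_level)       # [196608, 49152, 12288, 3072, 768]
print(sum(per_level))  # 261888, the total anchor count above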
2. Analyzing the results of the data-preparation stage
- (1) From the 261888 RPN anchors, regions of interest are computed and by default 200 are fed to the Mask R-CNN heads.
- (2) The labeled RPN samples are counted as positives/negatives, and the positives are displayed after refinement: 256 labeled samples in total
- (3) The ROIs are likewise labeled positive/negative: 200 in total
(1) Code to obtain the anchors and the resulting regions of interest
- model.DataGenerator returns these results internally
# 4. From anchors to ROIs
from mrcnn import model

random_rois = 2000
# Build a generator to inspect a few samples
g = model.DataGenerator(dataset_train, config,
                        shuffle=True,
                        random_rois=random_rois,
                        detection_targets=True)
# Compute the RPN targets and the mrcnn head targets from the dataset GT
if random_rois:
    [normalized_images, image_meta, rpn_match, rpn_bbox, gt_class_ids, gt_boxes, gt_masks, rpn_rois, rois], \
        [mrcnn_class_ids, mrcnn_bbox, mrcnn_mask] = g.__getitem__(0)
    # Print the ROIs and mrcnn head targets
    log("rois", rois)
    log("mrcnn_class_ids", mrcnn_class_ids)
    log("mrcnn_bbox", mrcnn_bbox)
    log("mrcnn_mask", mrcnn_mask)
# Print the GT results
log("gt_class_ids", gt_class_ids)
log("gt_boxes", gt_boxes)
log("gt_masks", gt_masks)
log("rpn_match", rpn_match)
log("rpn_bbox", rpn_bbox)
image_id = image_meta[0][0]
print("image_id: ", image_id)
Printed output:
# 1. ROIs passed to the Mask R-CNN stage after RPN filtering; the leading 2 is the batch size
rois shape: (2, 200, 4) min: 0.00000 max: 1021.00000 int32
mrcnn_class_ids shape: (2, 200, 1) min: 0.00000 max: 1.00000 int32
mrcnn_bbox shape: (2, 200, 2, 4) min: -3.46591 max: 2.96960 float32
mrcnn_mask shape: (2, 200, 28, 28, 2) min: 0.00000 max: 1.00000 float32
# 2. GT is zero-padded to MAX_GT_INSTANCES = 100 boxes per image
gt_class_ids shape: (2, 100) min: 0.00000 max: 1.00000 int32
gt_boxes shape: (2, 100, 4) min: 0.00000 max: 985.00000 int32
gt_masks shape: (2, 56, 56, 100) min: 0.00000 max: 1.00000 bool
# 3. RPN anchor labels
rpn_match shape: (2, 261888, 1) min: -1.00000 max: 1.00000 int32
# 4. The RPN uses 256 samples per image
rpn_bbox shape: (2, 256, 4) min: -1.95943 max: 1.38107 float64
# ID of this image
image_id:  17.0
(2) Using the labels, we count positive/negative samples and draw the refined positive anchors on the image. The negative anchors are drawn as well.
# 5. Convert and display the anchors for one of the images
# Get the positive/negative matching results
b = 0
positive_anchor_ids = np.where(rpn_match[b] == 1)[0]
print("Positive anchors: {}".format(len(positive_anchor_ids)))
negative_anchor_ids = np.where(rpn_match[b] == -1)[0]
print("Negative anchors: {}".format(len(negative_anchor_ids)))
neutral_anchor_ids = np.where(rpn_match[b] == 0)[0]
print("Neutral anchors: {}".format(len(neutral_anchor_ids)))

# Apply the refinement deltas to the anchors labeled positive
indices = np.where(rpn_match[b] == 1)[0]
refined_anchors = utils.apply_box_deltas(anchors[indices], rpn_bbox[b, :len(indices)] * config.RPN_BBOX_STD_DEV)
log("anchors", anchors)
log("refined_anchors", refined_anchors)

# Take the first image of the batch to draw the positive and negative anchors
sample_image = model.unmold_image(normalized_images[b], config)
# Count the ROI classes
for c, n in zip(dataset_train.class_names, np.bincount(mrcnn_class_ids[b].flatten())):
    if n:
        print("{:23}: {}".format(c[:20], n))

# Show the positive anchors
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, figsize=(16, 16))
visualize.draw_boxes(sample_image, boxes=anchors[positive_anchor_ids],
                     refined_boxes=refined_anchors, ax=ax)
Output:
# Positive RPN anchor count (10 + 246 = 256 labeled samples in total)
Positive anchors: 10
# Negative RPN anchor count
Negative anchors: 246
# Neutral (unlabeled) anchor count
Neutral anchors: 261632
# Total number of anchors
anchors shape: (261888, 4) min: -362.03867 max: 1322.03867 float64
# Positive anchors after refinement
refined_anchors shape: (10, 4) min: 1.00000 max: 826.00000 float32
# Class counts over the 200 ROI boxes
# Background count
BG                     : 176
# Balloon count
balloon                : 24
Result: the anchors selected for one of the training images

- Displaying the 246 negative anchors
# Show the negative anchors
visualize.draw_boxes(sample_image, boxes=anchors[negative_anchor_ids])
A different image is used here, so the negative count is not necessarily the 246 above

Anchors that received no label (neutral) do not participate in network training.
(3) ROIs: displaying the positive/negative labels of the Mask R-CNN regions of interest

print("Positive ROIs: ", mrcnn_class_ids[b][mrcnn_class_ids[b] > 0].shape[0])
print("Negative ROIs: ", mrcnn_class_ids[b][mrcnn_class_ids[b] == 0].shape[0])
print("Positive Ratio: {:.2f}".format(
    mrcnn_class_ids[b][mrcnn_class_ids[b] > 0].shape[0] / mrcnn_class_ids[b].shape[0]))
The result:
Positive ROIs:  27
Negative ROIs:  173
Positive Ratio: 0.14
The ratio falls below the ROI_POSITIVE_RATIO = 0.33 cap because the RPN did not produce enough positive proposals for this image.
6.9.1.6 Case Study: Loading the Model and Writing the Training Code
Create the model; if the command is train, run the training logic:
# 3. Create the model
if args.command == "train":
    model = maskrcnn.MaskRCNN(mode="training", config=config,
                              model_dir=args.logs)
else:
    model = maskrcnn.MaskRCNN(mode="inference", config=config,
                              model_dir=args.logs)

# 4. Training/testing logic
if args.command == "train":
    # Choose and download the pre-trained weights
    if args.weights.lower() == "imagenet":
        weights_path = model.get_imagenet_weights()
    else:
        raise ValueError("Please provide a supported pre-trained weight type")
    # Load the pre-trained weights
    print("Loading weights ", weights_path)
    model.load_weights(weights_path, by_name=True)
    # Train
    train(model)
(1) Loading pre-trained weights. The model ships with several pre-trained weight loaders; here we use the ImageNet weights
- model.get_imagenet_weights(): this method of the mrcnn model downloads the specified weights
- model.load_weights then loads the weights into the model (also a method wrapped by the mrcnn model)
When ImageNet weights are requested, they are downloaded from the official release URL to /root/.keras/models/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5, so make sure the .keras directory in your home directory has enough space for the model.
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.2/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
94658560/94653016 [==============================] - 472s 5us/step
Loading weights /root/.keras/models/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
Note: alternatively, point to COCO pre-trained weights
weights_path = COCO_WEIGHTS_PATH
and download them with the following method from mrcnn's utils:
utils.download_trained_weights(weights_path)
(2) The train function holds the data reading and model training code:

def train(model):
    """Model training logic
    :param model: the Mask R-CNN model
    :return:
    """
    # 1. Load the training segmentation dataset
    dataset_train = BalloonDataset()
    dataset_train.load_balloon(args.dataset, "train")
    dataset_train.prepare()
    # 2. Load the validation segmentation dataset
    dataset_val = BalloonDataset()
    dataset_val.load_balloon(args.dataset, "val")
    dataset_val.prepare()
    # 3. Start training
    print("Start training the network:")
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=20,
                layers='heads')
During training, checkpoints are saved under ./logs/. A pre-trained version, mask_rcnn_balloon.h5, is provided for convenient testing.
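If training is interrupted, a sketch like the following can resume from the most recent checkpoint under ./logs/ instead of the ImageNet weights (find_last() is a method provided by the matterport MaskRCNN class; this assumes at least one checkpoint was already saved):

# Resume from the last saved checkpoint
weights_path = model.find_last()
model.load_weights(weights_path, by_name=True)
train(model)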
6.9.1.7 Case Study: Source-Code Walkthrough of the Mask R-CNN Network Structure
The source code shortens a few imports with aliases:
import tensorflow.keras as keras
import tensorflow.keras.backend as K
import tensorflow.keras.layers as KL
import tensorflow.keras.layers as KE
import tensorflow.keras.utils as KU
import tensorflow.keras.models as KM
1. Build the inputs

input_image = KL.Input(
    shape=[None, None, config.IMAGE_SHAPE[2]], name="input_image")
input_image_meta = KL.Input(shape=[config.IMAGE_META_SIZE],
                            name="input_image_meta")

2. For training, build the inputs for the RPN anchor labels and bbox targets and the Mask R-CNN GT, and normalize the coordinates. If USE_MINI_MASK=True, input_gt_masks must have the [56, 56] size defined in the configuration
# RPN GT
input_rpn_match = KL.Input(
    shape=[None, 1], name="input_rpn_match", dtype=tf.int32)
input_rpn_bbox = KL.Input(
    shape=[None, 4], name="input_rpn_bbox", dtype=tf.float32)

# Detection GT (class IDs, bounding boxes, and masks)
# 1. GT Class IDs (zero padded)
input_gt_class_ids = KL.Input(
    shape=[None], name="input_gt_class_ids", dtype=tf.int32)
# 2. GT Boxes in pixels (zero padded)
# [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in image coordinates
input_gt_boxes = KL.Input(
    shape=[None, 4], name="input_gt_boxes", dtype=tf.float32)
# Normalize coordinates
gt_boxes = KL.Lambda(lambda x: norm_boxes_graph(
    x, K.shape(input_image)[1:3]))(input_gt_boxes)
# 3. GT Masks (zero padded)
# [batch, height, width, MAX_GT_INSTANCES]
if config.USE_MINI_MASK:
    input_gt_masks = KL.Input(
        shape=[config.MINI_MASK_SHAPE[0],
               config.MINI_MASK_SHAPE[1], None],
        name="input_gt_masks", dtype=bool)
else:
    input_gt_masks = KL.Input(
        shape=[config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1], None],
        name="input_gt_masks", dtype=bool)
3. For inference, build the anchors input directly
# Anchors in normalized coordinates
input_anchors = KL.Input(shape=[None, 4], name="input_anchors")
4. Build the ResNet backbone and take its stage outputs

_, C2, C3, C4, C5 = resnet_graph(input_image, config.BACKBONE,
                                 stage5=True, train_bn=config.TRAIN_BN)
5. Feed the multi-level ResNet outputs through the FPN to get the five feature maps P2, P3, P4, P5, P6

# Top-down Layers
# TODO: add assert to verify feature map sizes match what's in config
P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c5p5')(C5)
P4 = KL.Add(name="fpn_p4add")([
    KL.UpSampling2D(size=(2, 2), name="fpn_p5upsampled")(P5),
    KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c4p4')(C4)])
P3 = KL.Add(name="fpn_p3add")([
    KL.UpSampling2D(size=(2, 2), name="fpn_p4upsampled")(P4),
    KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c3p3')(C3)])
P2 = KL.Add(name="fpn_p2add")([
    KL.UpSampling2D(size=(2, 2), name="fpn_p3upsampled")(P3),
    KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c2p2')(C2)])
# Attach 3x3 conv to all P layers to get the final feature maps.
P2 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p2")(P2)
P3 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p3")(P3)
P4 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p4")(P4)
P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p5")(P5)
# P6 is used for the 5th anchor scale in RPN. Generated by
# subsampling from P5 with stride of 2.
P6 = KL.MaxPooling2D(pool_size=(1, 1), strides=2, name="fpn_p6")(P5)
6. Collect the feature maps that feed the RPN and the Mask R-CNN heads
# Note that P6 is used in RPN, but not in the classifier heads.
rpn_feature_maps = [P2, P3, P4, P5, P6]
mrcnn_feature_maps = [P2, P3, P4, P5]
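Given the default 1024x1024 input and BACKBONE_STRIDES = [4, 8, 16, 32, 64], the spatial sizes of the five feature maps can be computed directly; they match the BACKBONE_SHAPES used in the anchor analysis earlier:

for name, stride in zip(["P2", "P3", "P4", "P5", "P6"], [4, 8, 16, 32, 64]):
    print(name, 1024 // stride)   # P2 256, P3 128, P4 64, P5 32, P6 16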
7. For training, collect the anchors of every level (the result shown earlier) and broadcast them over the batch dimension. For inference, use input_anchors directly

# Anchors
if mode == "training":
    anchors = self.get_anchors(config.IMAGE_SHAPE)
    # Duplicate across the batch dimension because Keras requires it
    # TODO: can this be optimized to avoid duplicating the anchors?
    anchors = np.broadcast_to(anchors, (config.BATCH_SIZE,) + anchors.shape)
    # A hack to get around Keras's bad support for constants
    anchors = KL.Lambda(lambda x: tf.Variable(anchors), name="anchors")(input_image)
else:
    anchors = input_anchors
8. Build the RPN model and run it on every pyramid feature map, producing the RPN class logits, class probabilities, and predicted bbox deltas

# RPN Model
rpn = build_rpn_model(config.RPN_ANCHOR_STRIDE,
                      len(config.RPN_ANCHOR_RATIOS),
                      config.TOP_DOWN_PYRAMID_SIZE)
# Loop through pyramid layers
layer_outputs = []  # list of lists
for p in rpn_feature_maps:
    layer_outputs.append(rpn([p]))
output_names = ["rpn_class_logits", "rpn_class", "rpn_bbox"]
outputs = list(zip(*layer_outputs))
outputs = [KL.Concatenate(axis=1, name=n)(list(o))
           for o, n in zip(outputs, output_names)]
rpn_class_logits, rpn_class, rpn_bbox = outputs
9. Generate proposals from the anchors and the RPN predictions, filtered down to the number given in the configuration
- ROIs kept after non-maximum suppression (training and inference)
- POST_NMS_ROIS_TRAINING = 2000
- POST_NMS_ROIS_INFERENCE = 1000

proposal_count = config.POST_NMS_ROIS_TRAINING if mode == "training" \
    else config.POST_NMS_ROIS_INFERENCE
rpn_rois = ProposalLayer(
    proposal_count=proposal_count,
    nms_threshold=config.RPN_NMS_THRESHOLD,
    name="ROI",
    config=config)([rpn_class, rpn_bbox, anchors])
10. During training
- DetectionTargetLayer(config, name="proposal_targets")
- From the proposals and the input GT, it produces the target ROIs, class IDs, box refinements, and masks
- The masks are resized to config.MASK_SHAPE = [28, 28]

if mode == "training":
    ...
    # Generate detection targets
    # Subsamples proposals and generates target outputs for training
    # Note that proposal class IDs, gt_boxes, and gt_masks are zero
    # padded. Equally, returned rois and targets are zero padded.
    rois, target_class_ids, target_bbox, target_mask = \
        DetectionTargetLayer(config, name="proposal_targets")([
            target_rois, input_gt_class_ids, gt_boxes, input_gt_masks])
11. Network Heads (second-stage classification, regression, and mask)
- The second-stage fpn_classifier_graph function takes the regions of interest and computes the classification and regression outputs
- Builds the computation graph of the feature pyramid network classifier and regressor heads.
- The second-stage mask branch produces the mrcnn_mask output
- MASK_POOL_SIZE = 14, so ROIs are pooled to [14, 14] in the mask head
# TODO: verify that this handles zero padded ROIs
# Returns:
#   logits: [batch, num_rois, NUM_CLASSES] classifier logits (before softmax)
#   probs: [batch, num_rois, NUM_CLASSES] classifier probabilities
#   bbox_deltas: [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))] Deltas to apply to
#                proposal boxes
mrcnn_class_logits, mrcnn_class, mrcnn_bbox = \
    fpn_classifier_graph(rois, mrcnn_feature_maps, input_image_meta,
                         config.POOL_SIZE, config.NUM_CLASSES,
                         train_bn=config.TRAIN_BN,
                         fc_layers_size=config.FPN_CLASSIF_FC_LAYERS_SIZE)

# Builds the computation graph of the mask head of Feature Pyramid Network.
# Returns: Masks [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, NUM_CLASSES]
mrcnn_mask = build_fpn_mask_graph(rois, mrcnn_feature_maps,
                                  input_image_meta,
                                  config.MASK_POOL_SIZE,
                                  config.NUM_CLASSES,
                                  train_bn=config.TRAIN_BN)
12. Compute the losses and build the final model

# TODO: clean up (use tf.identity if necessary)
output_rois = KL.Lambda(lambda x: x * 1, name="output_rois")(rois)

# Losses
rpn_class_loss = KL.Lambda(lambda x: rpn_class_loss_graph(*x), name="rpn_class_loss")(
    [input_rpn_match, rpn_class_logits])
rpn_bbox_loss = KL.Lambda(lambda x: rpn_bbox_loss_graph(config, *x), name="rpn_bbox_loss")(
    [input_rpn_bbox, input_rpn_match, rpn_bbox])
class_loss = KL.Lambda(lambda x: mrcnn_class_loss_graph(*x), name="mrcnn_class_loss")(
    [target_class_ids, mrcnn_class_logits, active_class_ids])
bbox_loss = KL.Lambda(lambda x: mrcnn_bbox_loss_graph(*x), name="mrcnn_bbox_loss")(
    [target_bbox, target_class_ids, mrcnn_bbox])
mask_loss = KL.Lambda(lambda x: mrcnn_mask_loss_graph(*x), name="mrcnn_mask_loss")(
    [target_mask, target_class_ids, mrcnn_mask])

# Model
inputs = [input_image, input_image_meta,
          input_rpn_match, input_rpn_bbox, input_gt_class_ids, input_gt_boxes, input_gt_masks]
if not config.USE_RPN_ROIS:
    inputs.append(input_rois)
outputs = [rpn_class_logits, rpn_class, rpn_bbox,
           mrcnn_class_logits, mrcnn_class, mrcnn_bbox, mrcnn_mask,
           rpn_rois, output_rois,
           rpn_class_loss, rpn_bbox_loss, class_loss, bbox_loss, mask_loss]
model = KM.Model(inputs, outputs, name='mask_rcnn')
6.9.2 Model Prediction Workflow
The focus here is reading, predicting on, and displaying an image. Video segmentation with OpenCV is also provided (just understand the flow).
The complete code:
from utils.draw_segmention_utils import detect_and_draw_segmentation

elif args.command == "test":
    model.load_weights(args.model, by_name=True)
    # Run detection
    detect_and_draw_segmentation(args, model)
else:
    print("'{}' is not a recognized command. "
          "Please use 'train' or 'test'".format(args.command))
The detect_and_draw_segmentation() method handles prediction and annotated display for either an image or a video.
We add a file named draw_segmention_utils.py to the utils directory at the project root, containing the drawing helpers for the prediction workflow.
- Main function logic:
  - Check whether an image or a video was passed in and handle each case (read and process with the skimage module)
  - 1. Read the image, run detection, draw the segmented regions, and save the output
  - 2. Read the video, process every frame, and write the result to the given video path (just understand the flow)
import numpy as np
import skimage.io


def detect_and_draw_segmentation(args, model):
    """
    Run detection and draw the segmented regions
    :param args: command-line arguments
    :param model: the model
    :return:
    """
    if not args.image and not args.video:
        raise ValueError("Please provide an image or a video path to detect")
    # An image was passed in
    if args.image:
        print("Segmenting image: {}".format(args.image))
        # 1. Read the image
        image = skimage.io.imread(args.image)
        # 2. Run detection and get the result
        r = model.detect([image], verbose=1)[0]
        # 3. Draw the segmented regions
        segmentation = draw_segmentation(image, r['masks'])
        # 4. Save the output
        file_name = "./images/segment_{}".format(args.image.split("/")[-1])
        skimage.io.imsave(file_name, segmentation)
    if args.video:
        import cv2
        # 1. Open the video for reading
        vcapture = cv2.VideoCapture(args.video)
        width = int(vcapture.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(vcapture.get(cv2.CAP_PROP_FRAME_HEIGHT))
        fps = vcapture.get(cv2.CAP_PROP_FPS)
        # 2. Define a video writer for the output
        file_name = "./images/segmentation_{}".format(args.video.split("/")[-1])
        vwriter = cv2.VideoWriter(file_name,
                                  cv2.VideoWriter_fourcc(*'mp4v'),
                                  fps, (width, height))
        # 3. Loop over the frames, process each one, and write to the local file
        count = 0
        success = True
        while success:
            print("Frame: ", count)
            # Read a frame
            success, image = vcapture.read()
            if success:
                # Convert the BGR format returned by OpenCV to RGB
                image = image[..., ::-1]
                # Detect the masks
                r = model.detect([image], verbose=0)[0]
                # Draw the regions
                segmentation = draw_segmentation(image, r['masks'])
                # RGB -> BGR
                segmentation = segmentation[..., ::-1]
                # Append this frame to the video writer
                vwriter.write(segmentation)
                count += 1
        vwriter.release()
        print("Saved the detection result to:", file_name)
The per-image drawing of the segmented regions:
def draw_segmentation(image, mask):
    """
    Mark the segmented regions on the image
    :param image: input RGB image [height, width, 3]
    :param mask: segmented regions [height, width, instance count]
    :return: a grayscale copy of the image with the segmented regions kept in their original color
    """
    # 1. Convert the color image to grayscale, keeping both image and a gray copy
    # The double conversion is needed because gray must have three channels for
    # np.where(mask, image, gray) below to produce the segmentation
    gray = skimage.color.gray2rgb(skimage.color.rgb2gray(image)) * 255
    # 2. Keep the masked part of the color image and set everything else to gray
    if mask.shape[-1] > 0:
        # With multiple instances, sum the per-instance masks into a single mask
        mask = (np.sum(mask, -1, keepdims=True) >= 1)
        # Keep the original color where the mask is 1, use the gray value where it is 0
        segmentation = np.where(mask, image, gray).astype(np.uint8)
    else:
        segmentation = gray.astype(np.uint8)
    return segmentation
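The np.where compositing is easy to verify on a toy example (made-up 2x2 data): the masked pixel keeps its color while every other pixel is replaced by gray:

import numpy as np

image = np.array([[[255, 0, 0], [0, 255, 0]],
                  [[0, 0, 255], [9, 9, 9]]], dtype=np.uint8)
mask = np.array([[[True], [False]],
                 [[False], [False]]])   # a single masked pixel
gray = np.full_like(image, 128)
print(np.where(mask, image, gray))
# the masked pixel stays [255 0 0]; all others become [128 128 128]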
Notes on the functions used:

from skimage import io, data, color

# Converting a color image to grayscale changes its dtype from uint8 to float
img_gray = color.rgb2gray(img)
# Converting back is lossy
image2 = color.gray2rgb(img_gray)

# API
skimage.color.gray2rgb(image, alpha=None)
    Create an RGB representation of a gray-level image.
    Parameters: image of shape (M[, N][, P]).
    Returns: ndarray, an RGB image of shape (M[, N][, P], 3).
- Image data types
In skimage an image is stored as a numpy array. The array can have one of several data types, which can be converted between each other; the types and their value ranges are:

| Data type | Value range |
| --- | --- |
| uint8 | 0 to 255 |
| uint16 | 0 to 65535 |
| float16 | Half precision: 16 bits (1 sign, 5 exponent, 10 mantissa) |
| float32 | Single precision: 32 bits (1 sign, 8 exponent, 23 mantissa) |
| float64 | Double precision: 64 bits (1 sign, 11 exponent, 52 mantissa) |
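A short sketch of the dtype behavior described above, which is why draw_segmentation multiplies by 255 and casts back to np.uint8:

import numpy as np
from skimage import color

img = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)
g = color.rgb2gray(img)           # float64, values in [0, 1]
rgb = color.gray2rgb(g) * 255     # back to 3 channels, scaled to [0, 255]
print(img.dtype, g.dtype, rgb.dtype)    # uint8 float64 float64
print(rgb.astype(np.uint8).dtype)       # uint8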
We test with the two files provided in the project's images directory:

parser.add_argument('--image', type=str, default='./images/2917282960_06beee649a_b.jpg',
                    help='Path of the image to detect and segment')
parser.add_argument('--video', type=str, default='./images/v0200fd10000bq043q9pskdh7ri20vm0.MP4',
                    help='Path of the video to detect and segment')

The final test result is shown earlier in the project (if needed, the bbox drawing tool introduced earlier can also draw the predicted boxes):

6.9.3 Summary
- The Mask R-CNN source-code setup and training workflow
- Anchor configuration and computation in Mask R-CNN
- The model training and prediction workflow
- Training a Mask R-CNN model on the balloon segmentation dataset
- Running Mask R-CNN prediction on an image or video