6.9 Mask R-CNN Segmentation Case Study 2
Learning Objectives
- Goals
  - Understand the Mask R-CNN source-code setup and training workflow
  - Understand how anchors are configured and computed in Mask R-CNN
  - Master the model training and prediction workflow
- Applications
  - Train a Mask R-CNN model on the balloon segmentation dataset
  - Run Mask R-CNN prediction on a given image or video
6.9.1 Project Steps
Steps:
- 1. Read, process, and prepare the dataset
  - Implement reading of the annotation files
- 2. Parse and modify the model configuration, load pre-trained weights, and build the model
  - How the Mask R-CNN source code wraps the dataset in a Keras Sequence class
  - Introduction to the Mask R-CNN configuration
  - Source-code walkthrough of the model-building process
- 3. Implement the model training process
  - Introduction to the training code wrappers
- 4. Implement the model testing process
  - Post-processing of image prediction results
6.9.1.1 Model Analysis and Training Workflow Implementation
- Step analysis
  - 1. Validate the command-line arguments
  - 2. Configure the model parameters and the dataset reading setup
  - 3. Create the model
  - 4. Implement the training/testing logic
The full implementation:
args = parser.parse_args()

# 1. Validate the command-line arguments
if args.command == "train":
    assert args.dataset, "--dataset directory must be provided for training"
elif args.command == "test":
    assert args.image or args.video, \
        "An image or a video must be provided for testing"

# 2. Configure the model parameters and the dataset reading setup
if args.command == "train":
    config = BalloonConfig()
else:
    # Inference configuration: set the batch size to 1,
    # since Batch size = GPU_COUNT * IMAGES_PER_GPU
    class InferenceConfig(BalloonConfig):
        GPU_COUNT = 1
        IMAGES_PER_GPU = 1
    config = InferenceConfig()
config.display()

# 3. Create the model
if args.command == "train":
    model = maskrcnn.MaskRCNN(mode="training", config=config,
                              model_dir=args.logs)
else:
    model = maskrcnn.MaskRCNN(mode="inference", config=config,
                              model_dir=args.logs)

# 4. Training/testing logic
if args.command == "train":
    # Choose and download the pre-trained weights
    if args.weights.lower() == "imagenet":
        weights_path = model.get_imagenet_weights()
    else:
        raise ValueError("Please provide a supported pre-trained weight type")
    # Load the pre-trained weights
    print("Loading weights ", weights_path)
    model.load_weights(weights_path, by_name=True)
    # Train
    train(model)
elif args.command == "test":
    model.load_weights(args.model, by_name=True)
    # Run detection
    detect_and_draw_segmentation(args, model)
else:
    print("'{}' is not a recognized command. "
          "Please use 'train' or 'test'".format(args.command))
6.9.1.2 Model Configuration File

- config.py is the model configuration file; we can modify it to match our training needs
- Mask R-CNN has a large number of parameters, so a single configuration class is a cleaner way to manage them
- Some of the important settings are introduced below:
  - 1. Training parameters
  - 2. Testing parameters
  - 3. Learning-rate related settings
class Config(object):
    """Base configuration class. For custom configurations, create a
    sub-class that inherits from this one and override properties
    that need to be changed.
    """
    # 1. Training configuration
    # Number of GPUs to use. When using only a CPU, this needs to be set to 1.
    # If set to more than 1, multiple GPUs compute in parallel; the source file
    # parallel_model.py implements multi-GPU computation with the TF 1.x API
    GPU_COUNT = 1

    # Number of images to train on each GPU.
    # A 12GB GPU can typically handle 2 images of 1024x1024px.
    # Adjust based on your GPU memory and image sizes. Use the highest
    # number that your GPU can handle for best performance.
    IMAGES_PER_GPU = 2

    # Number of training steps per epoch
    # This doesn't need to match the size of the training set. Tensorboard
    # updates are saved at the end of each epoch, so setting this to a
    # smaller number means getting more frequent TensorBoard updates.
    # Validation stats are also calculated at each epoch end and they
    # might take a while, so don't set this too small to avoid spending
    # a lot of time on validation stats.
    STEPS_PER_EPOCH = 1000

    # Number of validation steps to run at the end of every training epoch.
    # A bigger number improves accuracy of validation stats, but slows
    # down the training.
    VALIDATION_STEPS = 50

    # Backbone network architecture
    # Supported values are: resnet50, resnet101.
    # You can also provide a callable that should have the signature
    # of model.resnet_graph. If you do so, you need to supply a callable
    # to COMPUTE_BACKBONE_SHAPE as well
    BACKBONE = "resnet101"

    # Strides of each FPN pyramid level built on the resnet101 backbone
    BACKBONE_STRIDES = [4, 8, 16, 32, 64]

    # Size of the fully-connected layers in the classification graph
    FPN_CLASSIF_FC_LAYERS_SIZE = 1024

    # Size of the top-down layers used to build the feature pyramid
    TOP_DOWN_PYRAMID_SIZE = 256

    # Total number of classes (including background)
    NUM_CLASSES = 1  # Override in sub-classes

    # Length of square anchor side in pixels for the RPN
    RPN_ANCHOR_SCALES = (32, 64, 128, 256, 512)

    # Ratios of anchors at each cell, used to set width/height
    # A value of 1 represents a square anchor, and 0.5 is a wide anchor
    RPN_ANCHOR_RATIOS = [0.5, 1, 2]

    # Anchor stride
    # If 1 then anchors are created for each cell in the backbone feature map.
    # If 2, then anchors are created for every other cell, and so on.
    RPN_ANCHOR_STRIDE = 1

    # NMS threshold used to filter RPN proposals; larger values keep more proposals
    # You can increase this during training to generate more proposals.
    RPN_NMS_THRESHOLD = 0.7

    # How many anchors per image to use for RPN training
    RPN_TRAIN_ANCHORS_PER_IMAGE = 256

    # Number of ROIs kept after tf.nn.top_k and before non-maximum suppression
    PRE_NMS_LIMIT = 6000

    # Number of ROIs kept after non-maximum suppression (training and inference)
    POST_NMS_ROIS_TRAINING = 2000
    POST_NMS_ROIS_INFERENCE = 1000

    # Input image resizing; the default "square" mode resizes to [max_dim, max_dim]
    # square: Resize and pad with zeros to get a square image
    # of size [max_dim, max_dim].
    IMAGE_RESIZE_MODE = "square"
    IMAGE_MIN_DIM = 800
    IMAGE_MAX_DIM = 1024

    # Number of ROIs per image to feed to the classifier/mask heads
    # The Mask RCNN paper uses 512 but often the RPN doesn't generate
    # enough positive proposals to fill this and keep a positive:negative
    # ratio of 1:3. You can increase the number of proposals by adjusting
    # the RPN NMS threshold.
    TRAIN_ROIS_PER_IMAGE = 200

    # Percent of positive ROIs used to train the classifier/mask heads
    ROI_POSITIVE_RATIO = 0.33

    # Pooled ROI sizes
    POOL_SIZE = 7
    MASK_POOL_SIZE = 14

    # Shape of the output mask
    # To change this you also need to change the neural network mask branch
    MASK_SHAPE = [28, 28]

    # Maximum number of ground-truth instances per image
    MAX_GT_INSTANCES = 100

    # 2. Detection configuration
    # Max number of final detections per image
    DETECTION_MAX_INSTANCES = 100

    # Minimum probability value to accept a detected instance
    # ROIs below this threshold are skipped
    DETECTION_MIN_CONFIDENCE = 0.7

    # Non-maximum suppression threshold for detection
    DETECTION_NMS_THRESHOLD = 0.3

    # 3. Learning-rate related settings
    # The Mask RCNN paper uses lr=0.02, but on TensorFlow it causes
    # weights to explode. Likely due to differences in optimizer
    # implementation.
    LEARNING_RATE = 0.001
    LEARNING_MOMENTUM = 0.9

    # Weight decay regularization
    WEIGHT_DECAY = 0.0001

    # Loss weights for more precise optimization.
    # Can be used for R-CNN training setup.
    LOSS_WEIGHTS = {
        "rpn_class_loss": 1.,
        "rpn_bbox_loss": 1.,
        "mrcnn_class_loss": 1.,
        "mrcnn_bbox_loss": 1.,
        "mrcnn_mask_loss": 1.
    }

    # Gradient norm clipping
    GRADIENT_CLIP_NORM = 5.0
The configuration class also provides a couple of helper methods; display() prints the model's current configuration:
def to_dict(self):
    return {a: getattr(self, a)
            for a in sorted(dir(self))
            if not a.startswith("__") and not callable(getattr(self, a))}

def display(self):
    """Display Configuration values."""
    print("\nConfigurations:")
    for key, val in self.to_dict().items():
        print(f"{key:30} {val}")
    print("\n")
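As a quick check of how the configuration class behaves, here is a minimal sketch that subclasses Config and prints it. It assumes the full config.py, whose __init__ also derives values such as BATCH_SIZE = IMAGES_PER_GPU * GPU_COUNT from the attributes above:

from mrcnn.config import Config

class DemoConfig(Config):
    NAME = "demo"
    GPU_COUNT = 1
    IMAGES_PER_GPU = 2
    NUM_CLASSES = 1 + 1   # background + 1 class

config = DemoConfig()
config.display()           # prints every non-callable, non-dunder attribute
print(config.BATCH_SIZE)   # 2 = GPU_COUNT * IMAGES_PER_GPU (derived in __init__)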
6.9.1.3 Case Study: Writing the Balloon Dataset Configuration
In balloon_dataset we add the configuration our dataset needs, as follows:
from mrcnn.config import Config


class BalloonConfig(Config):
    """Inherits the Mask R-CNN model configuration
    and overrides the settings our training data needs
    """
    # Name of the configuration
    NAME = "balloon"
    IMAGES_PER_GPU = 2
    # Number of classes (including background): balloon + 1
    NUM_CLASSES = 1 + 1
    # Number of steps per epoch
    STEPS_PER_EPOCH = 100
    # Confidence threshold used to filter detections
    DETECTION_MIN_CONFIDENCE = 0.9
Then add the following configuration code to the training script balloon_main.py:
from utils.balloon_dataset import BalloonDataset, BalloonConfig

args = parser.parse_args()
# 1. Validate the command-line arguments
if args.command == "train":
    assert args.dataset, "--dataset directory must be provided for training"
elif args.command == "test":
    assert args.image or args.video, \
        "An image or a video must be provided for testing"

# 2. Configure the model parameters and the dataset reading setup
if args.command == "train":
    config = BalloonConfig()
else:
    # Inference configuration: set the batch size to 1,
    # since Batch size = GPU_COUNT * IMAGES_PER_GPU
    class InferenceConfig(BalloonConfig):
        GPU_COUNT = 1
        IMAGES_PER_GPU = 1
    config = InferenceConfig()
# Display the configuration
config.display()
The initial imports and command-line arguments are as follows:
import numpy as np
import skimage.draw
import argparse
from mrcnn import model as maskrcnn
from utils.balloon_dataset import BalloonDataset, BalloonConfig
import os

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

# Command-line arguments
parser = argparse.ArgumentParser(
    description='Mask R-CNN balloon segmentation training')
parser.add_argument("--command", type=str, default='test',
                    help="'train' or 'test': train the model or run testing")
parser.add_argument('--dataset', type=str, default='./balloon_data',
                    help='Balloon segmentation dataset directory')
parser.add_argument('--weights', type=str, default='imagenet',
                    help="Pre-trained weights: "
                         "imagenet:https://github.com/fchollet/"
                         "deep-learning-models/releases/"
                         "download/v0.2/resnet50_weights"
                         "_tf_dim_ordering_tf_kernels_notop.h5")
parser.add_argument('--logs', type=str, default='./logs/',
                    help='Log directory')
parser.add_argument('--image', type=str, default='./images/2917282960_06beee649a_b.jpg',
                    help='Path of the image to detect and segment')
parser.add_argument('--video', type=str, default='./images/v0200fd10000bq043q9pskdh7ri20vm0.MP4',
                    help='Path of the video to detect and segment')
parser.add_argument('--model', type=str, default='./logs/mask_rcnn_balloon.h5',
                    help='Trained model file to use for testing')
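As a quick illustration (not part of the project code), parsing an empty argument list shows the defaults above in action:

# Parse the defaults, equivalent to running the script with no arguments
demo_args = parser.parse_args([])
print(demo_args.command, demo_args.weights)   # test imagenet
print(demo_args.image)                        # ./images/2917282960_06beee649a_b.jpg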
6.9.1.4 Using the Model
Using the model is straightforward: import model and instantiate MaskRCNN, as shown below
from mrcnn import model as maskrcnn

# 1. Training mode, passing in the configuration and the model save directory
model = maskrcnn.MaskRCNN(mode="training", config=config,
                          model_dir=args.logs)
# 2. Inference mode, passing in the configuration and the model save directory
model = maskrcnn.MaskRCNN(mode="inference", config=config,
                          model_dir=args.logs)
The MaskRCNN class implements a full model-building pipeline. The main steps in the source code are:
- Model pipeline:
  - Build the inputs, build the GT, build the RPN outputs, and generate regions of interest through ProposalLayer (a class in the source code)
  - Compute the five losses and assemble the model inputs and outputs
1. Source-code walkthrough of model data reading and training
The training process is wrapped fairly deeply, so the relevant source code is explained here. The important functions are:
- self.train(): the model's training logic
  - 1. class DataGenerator(KU.Sequence): builds the generator in the data-preparation stage
    - Sets the RPN training target boxes
  - 2. self.compile: the model compilation stage
    - Sets up loss computation and regularization
  - 3. self.fit: training
During training we only need to call the train function of the Mask R-CNN model. Here we analyze the train function in the source code:
- 1. def train(self, train_dataset, val_dataset, learning_rate, epochs, layers, augmentation=None, custom_callbacks=None, no_augmentation_sources=None):
  - train_dataset, val_dataset: training and validation Dataset objects
  - learning_rate: learning rate
  - epochs: number of epochs
  - layers: which layers to train
    - heads: trains the RPN, classifier, and mask heads
  - custom_callbacks=None: custom training callbacks
Source-code analysis:
# 1. Select the layer names to train by regex, based on the passed-in argument
layer_regex = {
    # all layers but the backbone
    "heads": r"(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",
    # From a specific Resnet stage and up
    "3+": r"(res3.*)|(bn3.*)|(res4.*)|(bn4.*)|(res5.*)|(bn5.*)|(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",
    "4+": r"(res4.*)|(bn4.*)|(res5.*)|(bn5.*)|(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",
    "5+": r"(res5.*)|(bn5.*)|(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)",
    # All layers
    "all": ".*",
}
if layers in layer_regex.keys():
    layers = layer_regex[layers]

# 2. Data generators; the DataGenerator class in model.py inherits keras.utils.Sequence
train_generator = DataGenerator(train_dataset, self.config, shuffle=True,
                                augmentation=augmentation)
val_generator = DataGenerator(val_dataset, self.config, shuffle=True)

# Create log_dir if it does not exist
if not os.path.exists(self.log_dir):
    os.makedirs(self.log_dir)

# 3. Callbacks
callbacks = [
    keras.callbacks.TensorBoard(log_dir=self.log_dir,
                                histogram_freq=0, write_graph=True, write_images=False),
    keras.callbacks.ModelCheckpoint(self.checkpoint_path,
                                    verbose=0, save_weights_only=True),
]
# Add custom callbacks to the list
if custom_callbacks:
    callbacks += custom_callbacks

# 4. Train: set the trainable layers and compile, then fit
log("\nStarting at epoch {}. LR={}\n".format(self.epoch, learning_rate))
log("Checkpoint Path: {}".format(self.checkpoint_path))
self.set_trainable(layers)
self.compile(learning_rate, self.config.LEARNING_MOMENTUM)

# Work-around for Windows: Keras fails on Windows when using
# multiprocessing workers. See discussion here:
# https://github.com/matterport/Mask_RCNN/issues/13#issuecomment-353124009
if os.name == 'nt':
    workers = 0
else:
    workers = multiprocessing.cpu_count()

self.keras_model.fit(
    train_generator,
    initial_epoch=self.epoch,
    epochs=epochs,
    steps_per_epoch=self.config.STEPS_PER_EPOCH,
    callbacks=callbacks,
    validation_data=val_generator,
    validation_steps=self.config.VALIDATION_STEPS,
    max_queue_size=100,
    workers=workers,
    use_multiprocessing=workers > 1,
)
self.epoch = max(self.epoch, epochs)
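To see how the layer_regex table selects trainable layers, the short sketch below matches a few typical layer names (example names only) against the "heads" pattern, the same way set_trainable does internally:

import re

pattern = r"(mrcnn\_.*)|(rpn\_.*)|(fpn\_.*)"   # the "heads" entry
for name in ["rpn_class_raw", "fpn_p2", "mrcnn_mask", "res4a_branch2a"]:
    print(name, bool(re.fullmatch(pattern, name)))
# rpn_class_raw True, fpn_p2 True, mrcnn_mask True, res4a_branch2a False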
2. class DataGenerator(KU.Sequence):
Constructor: def __init__(self, dataset, config, shuffle=True, augmentation=None, random_rois=0, detection_targets=False):
It wraps the passed-in Dataset object as a Keras Sequence that serves one batch at a time to the trainer.
(1) First, it generates the RPN anchor coordinates from the configuration (these are displayed again later):

self.backbone_shapes = compute_backbone_shapes(config, config.IMAGE_SHAPE)
self.anchors = utils.generate_pyramid_anchors(config.RPN_ANCHOR_SCALES,
                                              config.RPN_ANCHOR_RATIOS,
                                              self.backbone_shapes,
                                              config.BACKBONE_STRIDES,
                                              config.RPN_ANCHOR_STRIDE)

(2) def __getitem__(self, idx):
- Produces the batch with index idx.
- return inputs, outputs. Both return values are explained below.
Note: for normal training only inputs is needed; outputs is empty by default. If random_rois is given a value greater than 0, DataGenerator filters the proposals and also returns the RoI information needed by the second-stage Mask R-CNN heads, which during normal training is produced inside the network from the first-stage bboxes.
# The inputs list contains:
- images: [batch, H, W, C]
- image_meta: [batch, (meta data)] Image details.

meta = np.array(
    [image_id] +                  # size=1
    list(original_image_shape) +  # size=3
    list(image_shape) +           # size=3
    list(window) +                # size=4 (y1, x1, y2, x2) in image coordinates
    [scale] +                     # size=1
    list(active_class_ids)        # size=num_classes
)

- rpn_match: [batch, N] Integer (1=positive anchor, -1=negative, 0=neutral)  # positive/negative anchor labels
- rpn_bbox: [batch, N, (dy, dx, log(dh), log(dw))] Anchor bbox deltas  # offsets from anchors to GT
- gt_class_ids: [batch, MAX_GT_INSTANCES]  # GT class IDs
- gt_boxes: [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)]  # GT box locations
- gt_masks: [batch, height, width, MAX_GT_INSTANCES]  # GT mask targets. The height and width
  are those of the image unless use_mini_mask is True, in which
  case they are defined in MINI_MASK_SHAPE.

# outputs is empty by default; if random_rois > 0 it contains
outputs = [batch_mrcnn_class_ids, batch_mrcnn_bbox, batch_mrcnn_mask]
When building each batch, DataGenerator calls the functions below.
(3) build_rpn_targets(image_shape, anchors, gt_class_ids, gt_boxes, config): anchor -> bbox
- Matches the many RPN anchors against the GT boxes as positive/negative samples, converting corner coordinates (y1, x1, y2, x2) to center coordinates
- Then computes the anchor-to-GT deltas with the box transformation formulas
# Returns
anchors: [num_anchors, (y1, x1, y2, x2)]
gt_class_ids: [num_gt_boxes] Integer class IDs.
gt_boxes: [num_gt_boxes, (y1, x1, y2, x2)]
Returns:
rpn_match: [N] (int32) matches between anchors and GT boxes.
           1 = positive anchor, -1 = negative anchor, 0 = neutral
rpn_bbox: [N, (dy, dx, log(dh), log(dw))] Anchor bbox deltas.

# GT corner-to-center conversion
gt_h = gt[2] - gt[0]
gt_w = gt[3] - gt[1]
gt_center_y = gt[0] + 0.5 * gt_h
gt_center_x = gt[1] + 0.5 * gt_w
# Anchor corner-to-center conversion
a_h = a[2] - a[0]
a_w = a[3] - a[1]
a_center_y = a[0] + 0.5 * a_h
a_center_x = a[1] + 0.5 * a_w
# Store the deltas
rpn_bbox[ix] = [
    (gt_center_y - a_center_y) / a_h,
    (gt_center_x - a_center_x) / a_w,
    np.log(gt_h / a_h),
    np.log(gt_w / a_w),
]
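A small worked example (made-up boxes) of the delta formulas above:

import numpy as np

a = np.array([100., 100., 200., 200.])    # anchor (y1, x1, y2, x2): 100x100
gt = np.array([110., 120., 210., 240.])   # GT box: 100 high, 120 wide

gt_h, gt_w = gt[2] - gt[0], gt[3] - gt[1]
a_h, a_w = a[2] - a[0], a[3] - a[1]
dy = ((gt[0] + 0.5 * gt_h) - (a[0] + 0.5 * a_h)) / a_h   # 0.10
dx = ((gt[1] + 0.5 * gt_w) - (a[1] + 0.5 * a_w)) / a_w   # 0.30
dh, dw = np.log(gt_h / a_h), np.log(gt_w / a_w)          # 0.0, ~0.18
print(dy, dx, dh, dw)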
(4) build_detection_targets(rpn_rois, gt_class_ids, gt_boxes, gt_masks, config): called only when outputs is requested
Generate targets for training Stage 2 classifier and mask heads.
# This is not used in normal training. It's useful for debugging or to train
# the Mask RCNN heads without using the RPN head.
# Inputs:
rpn_rois: [N, (y1, x1, y2, x2)] proposal boxes.
gt_class_ids: [instance count] Integer class IDs
gt_boxes: [instance count, (y1, x1, y2, x2)]
gt_masks: [height, width, instance count] Ground truth masks. Can be full
          size or mini-masks.
# Returns: the regions of interest together with their target boxes and masks
# Only TRAIN_ROIS_PER_IMAGE ROIs per image are fed to the heads:
# Number of ROIs per image to feed to classifier/mask heads
# The Mask RCNN paper uses 512 but often the RPN doesn't generate
# enough positive proposals to fill this and keep a positive:negative
# ratio of 1:3. You can increase the number of proposals by adjusting
# the RPN NMS threshold.
# TRAIN_ROIS_PER_IMAGE = 200
rois: [TRAIN_ROIS_PER_IMAGE, (y1, x1, y2, x2)]
class_ids: [TRAIN_ROIS_PER_IMAGE]. Integer class IDs.
bboxes: [TRAIN_ROIS_PER_IMAGE, NUM_CLASSES, (y, x, log(h), log(w))]. Class-specific
        bbox refinements.
masks: [TRAIN_ROIS_PER_IMAGE, height, width, NUM_CLASSES). Class specific masks cropped
       to bbox boundaries and resized to neural network output size.
3. compile(self, learning_rate, momentum):
- Gets the model ready for training. Adds losses, regularization, and metrics. Then calls the Keras compile() function.
- It sets the training learning rate, adds the five losses together with L2 regularization to form the final objective, and calls the Keras compile() function internally (see the sketch after this list)
- loss_names = ["rpn_class_loss", "rpn_bbox_loss", "mrcnn_class_loss", "mrcnn_bbox_loss", "mrcnn_mask_loss"]
- 4. model.fit is then just a regular Keras training call
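A minimal sketch (not the library's actual compile code) of how the five losses are weighted by LOSS_WEIGHTS before being summed into the total objective:

loss_names = ["rpn_class_loss", "rpn_bbox_loss",
              "mrcnn_class_loss", "mrcnn_bbox_loss", "mrcnn_mask_loss"]
loss_values = {name: 1.0 for name in loss_names}   # placeholder per-loss values
total = sum(config.LOSS_WEIGHTS.get(name, 1.) * loss_values[name]
            for name in loss_names)
print(total)   # 5.0 when every weight and every loss equals 1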
6.9.1.5 Case Study: Analyzing the Network's Anchors and Target Values
To better understand the anchors used by the RPN in Mask R-CNN, we print and analyze them here, reusing the dataset methods from above. The test is run in the balloon_dataset.py file.
- 1. utils.generate_pyramid_anchors
  - Needs the RPN anchor scales, the RPN anchor width/height ratios, and a few other parameters
# A configuration is needed for the computation
# 3. Compute the anchors
config = BalloonConfig()
# Add a feature-map-shape attribute (set manually here so generate_pyramid_anchors
# can be used for this test; DataGenerator normally computes the shapes itself)
config.BACKBONE_SHAPES = [[256, 256], [128, 128], [64, 64], [32, 32], [16, 16]]
anchors = utils.generate_pyramid_anchors(config.RPN_ANCHOR_SCALES,
                                         config.RPN_ANCHOR_RATIOS,
                                         config.BACKBONE_SHAPES,
                                         config.BACKBONE_STRIDES,
                                         config.RPN_ANCHOR_STRIDE)
# Print anchor statistics
num_levels = len(config.BACKBONE_SHAPES)
anchors_per_cell = len(config.RPN_ANCHOR_RATIOS)
print("Count: ", anchors.shape[0])
print("Scales: ", config.RPN_ANCHOR_SCALES)
print("ratios: ", config.RPN_ANCHOR_RATIOS)
print("Anchors per Cell: ", anchors_per_cell)
print("Levels: ", num_levels)
anchors_per_level = []
for l in range(num_levels):
    num_cells = config.BACKBONE_SHAPES[l][0] * config.BACKBONE_SHAPES[l][1]
    anchors_per_level.append(anchors_per_cell * num_cells // config.RPN_ANCHOR_STRIDE ** 2)
    print("Anchors in Level {}: {}".format(l, anchors_per_level[l]))
Output:
# Total number of anchors
Count:  261888
Scales:  (32, 64, 128, 256, 512)
ratios:  [0.5, 1, 2]
Anchors per Cell:  3
Levels:  5
# Anchors on the first (largest) feature map
Anchors in Level 0: 196608
Anchors in Level 1: 49152
Anchors in Level 2: 12288
Anchors in Level 3: 3072
Anchors in Level 4: 768
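These counts are easy to verify by hand: each feature-map cell gets anchors_per_cell = 3 anchors (one per ratio), so level 0 alone contributes 256 * 256 * 3 = 196608:

shapes = [(256, 256), (128, 128), (64, 64), (32, 32), (16, 16)]
per_level = [h * w * 3 for h, w in shapes]
print(per_level)       # [196608, 49152, 12288, 3072, 768]
print(sum(per_level))  # 261888, the total anchor count above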
2. Analyzing the results of the data-preparation stage
- (1) From the 261888 RPN anchors, regions of interest are computed and by default 200 are fed to the Mask R-CNN heads.
- (2) The labeled RPN samples are counted as positives/negatives, and the positives are displayed after refinement: 256 labeled samples in total
- (3) The ROIs are likewise labeled positive/negative: 200 in total
(1) Code to obtain the anchors and the resulting regions of interest
- model.DataGenerator returns these results internally
# 4. From anchors to ROIs
from mrcnn import model

random_rois = 2000
# Build a generator to inspect a few samples
g = model.DataGenerator(dataset_train, config,
                        shuffle=True,
                        random_rois=random_rois,
                        detection_targets=True)
# Compute the RPN targets and the mrcnn head targets from the dataset GT
if random_rois:
    [normalized_images, image_meta, rpn_match, rpn_bbox, gt_class_ids, gt_boxes, gt_masks, rpn_rois, rois], \
        [mrcnn_class_ids, mrcnn_bbox, mrcnn_mask] = g.__getitem__(0)
    # Print the ROIs and mrcnn head targets
    log("rois", rois)
    log("mrcnn_class_ids", mrcnn_class_ids)
    log("mrcnn_bbox", mrcnn_bbox)
    log("mrcnn_mask", mrcnn_mask)
# Print the GT results
log("gt_class_ids", gt_class_ids)
log("gt_boxes", gt_boxes)
log("gt_masks", gt_masks)
log("rpn_match", rpn_match)
log("rpn_bbox", rpn_bbox)
image_id = image_meta[0][0]
print("image_id: ", image_id)
Printed output:
# 1. ROIs passed to the Mask R-CNN stage after RPN filtering; the leading 2 is the batch size
rois shape: (2, 200, 4) min: 0.00000 max: 1021.00000 int32
mrcnn_class_ids shape: (2, 200, 1) min: 0.00000 max: 1.00000 int32
mrcnn_bbox shape: (2, 200, 2, 4) min: -3.46591 max: 2.96960 float32
mrcnn_mask shape: (2, 200, 28, 28, 2) min: 0.00000 max: 1.00000 float32
# 2. GT is zero-padded to MAX_GT_INSTANCES = 100 boxes per image
gt_class_ids shape: (2, 100) min: 0.00000 max: 1.00000 int32
gt_boxes shape: (2, 100, 4) min: 0.00000 max: 985.00000 int32
gt_masks shape: (2, 56, 56, 100) min: 0.00000 max: 1.00000 bool
# 3. RPN anchor labels
rpn_match shape: (2, 261888, 1) min: -1.00000 max: 1.00000 int32
# 4. The RPN uses 256 samples per image
rpn_bbox shape: (2, 256, 4) min: -1.95943 max: 1.38107 float64
# ID of this image
image_id:  17.0
(2) Using the labels, we count positive/negative samples and draw the refined positive anchors on the image. The negative anchors are drawn as well.
# 5. Convert and display the anchors for one of the images
# Get the positive/negative matching results
b = 0
positive_anchor_ids = np.where(rpn_match[b] == 1)[0]
print("Positive anchors: {}".format(len(positive_anchor_ids)))
negative_anchor_ids = np.where(rpn_match[b] == -1)[0]
print("Negative anchors: {}".format(len(negative_anchor_ids)))
neutral_anchor_ids = np.where(rpn_match[b] == 0)[0]
print("Neutral anchors: {}".format(len(neutral_anchor_ids)))

# Apply the refinement deltas to the anchors labeled positive
indices = np.where(rpn_match[b] == 1)[0]
refined_anchors = utils.apply_box_deltas(anchors[indices], rpn_bbox[b, :len(indices)] * config.RPN_BBOX_STD_DEV)
log("anchors", anchors)
log("refined_anchors", refined_anchors)

# Take the first image of the batch to draw the positive and negative anchors
sample_image = model.unmold_image(normalized_images[b], config)
# Count the ROI classes
for c, n in zip(dataset_train.class_names, np.bincount(mrcnn_class_ids[b].flatten())):
    if n:
        print("{:23}: {}".format(c[:20], n))

# Show the positive anchors
import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, figsize=(16, 16))
visualize.draw_boxes(sample_image, boxes=anchors[positive_anchor_ids],
                     refined_boxes=refined_anchors, ax=ax)
Output:
# Positive RPN anchor count (10 + 246 = 256 labeled samples in total)
Positive anchors: 10
# Negative RPN anchor count
Negative anchors: 246
# Neutral (unlabeled) anchor count
Neutral anchors: 261632
# Total number of anchors
anchors shape: (261888, 4) min: -362.03867 max: 1322.03867 float64
# Positive anchors after refinement
refined_anchors shape: (10, 4) min: 1.00000 max: 826.00000 float32
# Class counts over the 200 ROI boxes
# Background count
BG                     : 176
# Balloon count
balloon                : 24
Result: the anchors selected for one of the training images

- Displaying the 246 negative anchors
# Show the negative anchors
visualize.draw_boxes(sample_image, boxes=anchors[negative_anchor_ids])
A different image is used here, so the negative count is not necessarily the 246 above

Anchors that received no label (neutral) do not participate in network training.
(3) ROIs: displaying the positive/negative labels of the Mask R-CNN regions of interest

print("Positive ROIs: ", mrcnn_class_ids[b][mrcnn_class_ids[b] > 0].shape[0])
print("Negative ROIs: ", mrcnn_class_ids[b][mrcnn_class_ids[b] == 0].shape[0])
print("Positive Ratio: {:.2f}".format(
    mrcnn_class_ids[b][mrcnn_class_ids[b] > 0].shape[0] / mrcnn_class_ids[b].shape[0]))
The result:
Positive ROIs:  27
Negative ROIs:  173
Positive Ratio: 0.14
The ratio falls below the ROI_POSITIVE_RATIO = 0.33 cap because the RPN did not produce enough positive proposals for this image.
6.9.1.6 Case Study: Loading the Model and Writing the Training Code
Create the model; if the command is train, run the training logic:
# 3. Create the model
if args.command == "train":
    model = maskrcnn.MaskRCNN(mode="training", config=config,
                              model_dir=args.logs)
else:
    model = maskrcnn.MaskRCNN(mode="inference", config=config,
                              model_dir=args.logs)

# 4. Training/testing logic
if args.command == "train":
    # Choose and download the pre-trained weights
    if args.weights.lower() == "imagenet":
        weights_path = model.get_imagenet_weights()
    else:
        raise ValueError("Please provide a supported pre-trained weight type")
    # Load the pre-trained weights
    print("Loading weights ", weights_path)
    model.load_weights(weights_path, by_name=True)
    # Train
    train(model)
(1) Loading pre-trained weights. The model ships with several pre-trained weight loaders; here we use the ImageNet weights
- model.get_imagenet_weights(): this method of the mrcnn model downloads the specified weights
- model.load_weights then loads the weights into the model (also a method wrapped by the mrcnn model)
When ImageNet weights are requested, they are downloaded from the official release URL to /root/.keras/models/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5, so make sure the .keras directory in your home directory has enough space for the model.
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.2/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
94658560/94653016 [==============================] - 472s 5us/step
Loading weights /root/.keras/models/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
Note: alternatively, point to COCO pre-trained weights
weights_path = COCO_WEIGHTS_PATH
and download them with the following method from mrcnn's utils:
utils.download_trained_weights(weights_path)
(2) The train function holds the data reading and model training code:

def train(model):
    """Model training logic
    :param model: the Mask R-CNN model
    :return:
    """
    # 1. Load the training segmentation dataset
    dataset_train = BalloonDataset()
    dataset_train.load_balloon(args.dataset, "train")
    dataset_train.prepare()
    # 2. Load the validation segmentation dataset
    dataset_val = BalloonDataset()
    dataset_val.load_balloon(args.dataset, "val")
    dataset_val.prepare()
    # 3. Start training
    print("Start training the network:")
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE,
                epochs=20,
                layers='heads')
During training, checkpoints are saved under ./logs/. A pre-trained version, mask_rcnn_balloon.h5, is provided for convenient testing.
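If training is interrupted, a sketch like the following can resume from the most recent checkpoint under ./logs/ instead of the ImageNet weights (find_last() is a method provided by the matterport MaskRCNN class; this assumes at least one checkpoint was already saved):

# Resume from the last saved checkpoint
weights_path = model.find_last()
model.load_weights(weights_path, by_name=True)
train(model)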
6.9.1.7 Case Study: Source-Code Walkthrough of the Mask R-CNN Network Structure
The source code shortens a few imports with aliases:
import tensorflow.keras as keras
import tensorflow.keras.backend as K
import tensorflow.keras.layers as KL
import tensorflow.keras.layers as KE
import tensorflow.keras.utils as KU
import tensorflow.keras.models as KM
1. Build the inputs

input_image = KL.Input(
    shape=[None, None, config.IMAGE_SHAPE[2]], name="input_image")
input_image_meta = KL.Input(shape=[config.IMAGE_META_SIZE],
                            name="input_image_meta")

2. For training, build the inputs for the RPN anchor labels and bbox targets and the Mask R-CNN GT, and normalize the coordinates. If USE_MINI_MASK=True, input_gt_masks must have the [56, 56] size defined in the configuration
# RPN GT
input_rpn_match = KL.Input(
    shape=[None, 1], name="input_rpn_match", dtype=tf.int32)
input_rpn_bbox = KL.Input(
    shape=[None, 4], name="input_rpn_bbox", dtype=tf.float32)

# Detection GT (class IDs, bounding boxes, and masks)
# 1. GT Class IDs (zero padded)
input_gt_class_ids = KL.Input(
    shape=[None], name="input_gt_class_ids", dtype=tf.int32)
# 2. GT Boxes in pixels (zero padded)
# [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] in image coordinates
input_gt_boxes = KL.Input(
    shape=[None, 4], name="input_gt_boxes", dtype=tf.float32)
# Normalize coordinates
gt_boxes = KL.Lambda(lambda x: norm_boxes_graph(
    x, K.shape(input_image)[1:3]))(input_gt_boxes)
# 3. GT Masks (zero padded)
# [batch, height, width, MAX_GT_INSTANCES]
if config.USE_MINI_MASK:
    input_gt_masks = KL.Input(
        shape=[config.MINI_MASK_SHAPE[0],
               config.MINI_MASK_SHAPE[1], None],
        name="input_gt_masks", dtype=bool)
else:
    input_gt_masks = KL.Input(
        shape=[config.IMAGE_SHAPE[0], config.IMAGE_SHAPE[1], None],
        name="input_gt_masks", dtype=bool)
3. For inference, build the anchors input directly
# Anchors in normalized coordinates
input_anchors = KL.Input(shape=[None, 4], name="input_anchors")
4. Build the ResNet backbone and take its stage outputs

_, C2, C3, C4, C5 = resnet_graph(input_image, config.BACKBONE,
                                 stage5=True, train_bn=config.TRAIN_BN)
5. Feed the multi-level ResNet outputs through the FPN to get the five feature maps P2, P3, P4, P5, P6

# Top-down Layers
# TODO: add assert to verify feature map sizes match what's in config
P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c5p5')(C5)
P4 = KL.Add(name="fpn_p4add")([
    KL.UpSampling2D(size=(2, 2), name="fpn_p5upsampled")(P5),
    KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c4p4')(C4)])
P3 = KL.Add(name="fpn_p3add")([
    KL.UpSampling2D(size=(2, 2), name="fpn_p4upsampled")(P4),
    KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c3p3')(C3)])
P2 = KL.Add(name="fpn_p2add")([
    KL.UpSampling2D(size=(2, 2), name="fpn_p3upsampled")(P3),
    KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c2p2')(C2)])
# Attach 3x3 conv to all P layers to get the final feature maps.
P2 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p2")(P2)
P3 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p3")(P3)
P4 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p4")(P4)
P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p5")(P5)
# P6 is used for the 5th anchor scale in RPN. Generated by
# subsampling from P5 with stride of 2.
P6 = KL.MaxPooling2D(pool_size=(1, 1), strides=2, name="fpn_p6")(P5)
6. Collect the feature maps that feed the RPN and the Mask R-CNN heads
# Note that P6 is used in RPN, but not in the classifier heads.
rpn_feature_maps = [P2, P3, P4, P5, P6]
mrcnn_feature_maps = [P2, P3, P4, P5]
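Given the default 1024x1024 input and BACKBONE_STRIDES = [4, 8, 16, 32, 64], the spatial sizes of the five feature maps can be computed directly; they match the BACKBONE_SHAPES used in the anchor analysis earlier:

for name, stride in zip(["P2", "P3", "P4", "P5", "P6"], [4, 8, 16, 32, 64]):
    print(name, 1024 // stride)   # P2 256, P3 128, P4 64, P5 32, P6 16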
7. For training, collect the anchors of every level (the result shown earlier) and broadcast them over the batch dimension. For inference, use input_anchors directly

# Anchors
if mode == "training":
    anchors = self.get_anchors(config.IMAGE_SHAPE)
    # Duplicate across the batch dimension because Keras requires it
    # TODO: can this be optimized to avoid duplicating the anchors?
    anchors = np.broadcast_to(anchors, (config.BATCH_SIZE,) + anchors.shape)
    # A hack to get around Keras's bad support for constants
    anchors = KL.Lambda(lambda x: tf.Variable(anchors), name="anchors")(input_image)
else:
    anchors = input_anchors
8. Build the RPN model and run it on every pyramid feature map, producing the RPN class logits, class probabilities, and predicted bbox deltas

# RPN Model
rpn = build_rpn_model(config.RPN_ANCHOR_STRIDE,
                      len(config.RPN_ANCHOR_RATIOS),
                      config.TOP_DOWN_PYRAMID_SIZE)
# Loop through pyramid layers
layer_outputs = []  # list of lists
for p in rpn_feature_maps:
    layer_outputs.append(rpn([p]))
output_names = ["rpn_class_logits", "rpn_class", "rpn_bbox"]
outputs = list(zip(*layer_outputs))
outputs = [KL.Concatenate(axis=1, name=n)(list(o))
           for o, n in zip(outputs, output_names)]
rpn_class_logits, rpn_class, rpn_bbox = outputs
9. Generate proposals from the anchors and the RPN predictions, filtered down to the number given in the configuration
- ROIs kept after non-maximum suppression (training and inference)
- POST_NMS_ROIS_TRAINING = 2000
- POST_NMS_ROIS_INFERENCE = 1000

proposal_count = config.POST_NMS_ROIS_TRAINING if mode == "training" \
    else config.POST_NMS_ROIS_INFERENCE
rpn_rois = ProposalLayer(
    proposal_count=proposal_count,
    nms_threshold=config.RPN_NMS_THRESHOLD,
    name="ROI",
    config=config)([rpn_class, rpn_bbox, anchors])
10. During training
- DetectionTargetLayer(config, name="proposal_targets")
- From the proposals and the input GT, it produces the target ROIs, class IDs, box refinements, and masks
- The masks are resized to config.MASK_SHAPE = [28, 28]

if mode == "training":
    ...
    # Generate detection targets
    # Subsamples proposals and generates target outputs for training
    # Note that proposal class IDs, gt_boxes, and gt_masks are zero
    # padded. Equally, returned rois and targets are zero padded.
    rois, target_class_ids, target_bbox, target_mask = \
        DetectionTargetLayer(config, name="proposal_targets")([
            target_rois, input_gt_class_ids, gt_boxes, input_gt_masks])
11. Network Heads (second-stage classification, regression, and mask)
- The second-stage fpn_classifier_graph function takes the regions of interest and computes the classification and regression outputs
- Builds the computation graph of the feature pyramid network classifier and regressor heads.
- The second-stage mask branch produces the mrcnn_mask output
- MASK_POOL_SIZE = 14, so ROIs are pooled to [14, 14] in the mask head
# TODO: verify that this handles zero padded ROIs
# Returns:
#   logits: [batch, num_rois, NUM_CLASSES] classifier logits (before softmax)
#   probs: [batch, num_rois, NUM_CLASSES] classifier probabilities
#   bbox_deltas: [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))] Deltas to apply to
#                proposal boxes
mrcnn_class_logits, mrcnn_class, mrcnn_bbox = \
    fpn_classifier_graph(rois, mrcnn_feature_maps, input_image_meta,
                         config.POOL_SIZE, config.NUM_CLASSES,
                         train_bn=config.TRAIN_BN,
                         fc_layers_size=config.FPN_CLASSIF_FC_LAYERS_SIZE)

# Builds the computation graph of the mask head of Feature Pyramid Network.
# Returns: Masks [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, NUM_CLASSES]
mrcnn_mask = build_fpn_mask_graph(rois, mrcnn_feature_maps,
                                  input_image_meta,
                                  config.MASK_POOL_SIZE,
                                  config.NUM_CLASSES,
                                  train_bn=config.TRAIN_BN)
12. Compute the losses and build the final model

# TODO: clean up (use tf.identity if necessary)
output_rois = KL.Lambda(lambda x: x * 1, name="output_rois")(rois)

# Losses
rpn_class_loss = KL.Lambda(lambda x: rpn_class_loss_graph(*x), name="rpn_class_loss")(
    [input_rpn_match, rpn_class_logits])
rpn_bbox_loss = KL.Lambda(lambda x: rpn_bbox_loss_graph(config, *x), name="rpn_bbox_loss")(
    [input_rpn_bbox, input_rpn_match, rpn_bbox])
class_loss = KL.Lambda(lambda x: mrcnn_class_loss_graph(*x), name="mrcnn_class_loss")(
    [target_class_ids, mrcnn_class_logits, active_class_ids])
bbox_loss = KL.Lambda(lambda x: mrcnn_bbox_loss_graph(*x), name="mrcnn_bbox_loss")(
    [target_bbox, target_class_ids, mrcnn_bbox])
mask_loss = KL.Lambda(lambda x: mrcnn_mask_loss_graph(*x), name="mrcnn_mask_loss")(
    [target_mask, target_class_ids, mrcnn_mask])

# Model
inputs = [input_image, input_image_meta,
          input_rpn_match, input_rpn_bbox, input_gt_class_ids, input_gt_boxes, input_gt_masks]
if not config.USE_RPN_ROIS:
    inputs.append(input_rois)
outputs = [rpn_class_logits, rpn_class, rpn_bbox,
           mrcnn_class_logits, mrcnn_class, mrcnn_bbox, mrcnn_mask,
           rpn_rois, output_rois,
           rpn_class_loss, rpn_bbox_loss, class_loss, bbox_loss, mask_loss]
model = KM.Model(inputs, outputs, name='mask_rcnn')
6.9.2 Model Prediction Workflow
The focus here is reading, predicting on, and displaying an image. Video segmentation with OpenCV is also provided (just understand the flow).
The complete code:
from utils.draw_segmention_utils import detect_and_draw_segmentation

elif args.command == "test":
    model.load_weights(args.model, by_name=True)
    # Run detection
    detect_and_draw_segmentation(args, model)
else:
    print("'{}' is not a recognized command. "
          "Please use 'train' or 'test'".format(args.command))
The detect_and_draw_segmentation() method handles prediction and annotated display for either an image or a video.
We add a file named draw_segmention_utils.py to the utils directory at the project root, containing the drawing helpers for the prediction workflow.
- Main function logic:
  - Check whether an image or a video was passed in and handle each case (read and process with the skimage module)
  - 1. Read the image, run detection, draw the segmented regions, and save the output
  - 2. Read the video, process every frame, and write the result to the given video path (just understand the flow)
import numpy as np
import skimage.io


def detect_and_draw_segmentation(args, model):
    """
    Run detection and draw the segmented regions
    :param args: command-line arguments
    :param model: the model
    :return:
    """
    if not args.image and not args.video:
        raise ValueError("Please provide an image or a video path to detect")
    # An image was passed in
    if args.image:
        print("Segmenting image: {}".format(args.image))
        # 1. Read the image
        image = skimage.io.imread(args.image)
        # 2. Run detection and get the result
        r = model.detect([image], verbose=1)[0]
        # 3. Draw the segmented regions
        segmentation = draw_segmentation(image, r['masks'])
        # 4. Save the output
        file_name = "./images/segment_{}".format(args.image.split("/")[-1])
        skimage.io.imsave(file_name, segmentation)
    if args.video:
        import cv2
        # 1. Open the video for reading
        vcapture = cv2.VideoCapture(args.video)
        width = int(vcapture.get(cv2.CAP_PROP_FRAME_WIDTH))
        height = int(vcapture.get(cv2.CAP_PROP_FRAME_HEIGHT))
        fps = vcapture.get(cv2.CAP_PROP_FPS)
        # 2. Define a video writer for the output
        file_name = "./images/segmentation_{}".format(args.video.split("/")[-1])
        vwriter = cv2.VideoWriter(file_name,
                                  cv2.VideoWriter_fourcc(*'mp4v'),
                                  fps, (width, height))
        # 3. Loop over the frames, process each one, and write to the local file
        count = 0
        success = True
        while success:
            print("Frame: ", count)
            # Read a frame
            success, image = vcapture.read()
            if success:
                # Convert the BGR format returned by OpenCV to RGB
                image = image[..., ::-1]
                # Detect the masks
                r = model.detect([image], verbose=0)[0]
                # Draw the regions
                segmentation = draw_segmentation(image, r['masks'])
                # RGB -> BGR
                segmentation = segmentation[..., ::-1]
                # Append this frame to the video writer
                vwriter.write(segmentation)
                count += 1
        vwriter.release()
        print("Saved the detection result to:", file_name)
The per-image drawing of the segmented regions:
def draw_segmentation(image, mask):
    """
    Mark the segmented regions on the image
    :param image: input RGB image [height, width, 3]
    :param mask: segmented regions [height, width, instance count]
    :return: a grayscale copy of the image with the segmented regions kept in their original color
    """
    # 1. Convert the color image to grayscale, keeping both image and a gray copy
    # The double conversion is needed because gray must have three channels for
    # np.where(mask, image, gray) below to produce the segmentation
    gray = skimage.color.gray2rgb(skimage.color.rgb2gray(image)) * 255
    # 2. Keep the masked part of the color image and set everything else to gray
    if mask.shape[-1] > 0:
        # With multiple instances, sum the per-instance masks into a single mask
        mask = (np.sum(mask, -1, keepdims=True) >= 1)
        # Keep the original color where the mask is 1, use the gray value where it is 0
        segmentation = np.where(mask, image, gray).astype(np.uint8)
    else:
        segmentation = gray.astype(np.uint8)
    return segmentation
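The np.where compositing is easy to verify on a toy example (made-up 2x2 data): the masked pixel keeps its color while every other pixel is replaced by gray:

import numpy as np

image = np.array([[[255, 0, 0], [0, 255, 0]],
                  [[0, 0, 255], [9, 9, 9]]], dtype=np.uint8)
mask = np.array([[[True], [False]],
                 [[False], [False]]])   # a single masked pixel
gray = np.full_like(image, 128)
print(np.where(mask, image, gray))
# the masked pixel stays [255 0 0]; all others become [128 128 128]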
Notes on the functions used:

from skimage import io, data, color

# Converting a color image to grayscale changes its dtype from uint8 to float
img_gray = color.rgb2gray(img)
# Converting back is lossy
image2 = color.gray2rgb(img_gray)

# API
skimage.color.gray2rgb(image, alpha=None)
    Create an RGB representation of a gray-level image.
    Parameters: image of shape (M[, N][, P]).
    Returns: ndarray, an RGB image of shape (M[, N][, P], 3).
- Image data types
In skimage an image is stored as a numpy array. The array can have one of several data types, which can be converted between each other; the types and their value ranges are:

| Data type | Value range |
| --- | --- |
| uint8 | 0 to 255 |
| uint16 | 0 to 65535 |
| float16 | Half precision: 16 bits (1 sign, 5 exponent, 10 mantissa) |
| float32 | Single precision: 32 bits (1 sign, 8 exponent, 23 mantissa) |
| float64 | Double precision: 64 bits (1 sign, 11 exponent, 52 mantissa) |
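A short sketch of the dtype behavior described above, which is why draw_segmentation multiplies by 255 and casts back to np.uint8:

import numpy as np
from skimage import color

img = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)
g = color.rgb2gray(img)           # float64, values in [0, 1]
rgb = color.gray2rgb(g) * 255     # back to 3 channels, scaled to [0, 255]
print(img.dtype, g.dtype, rgb.dtype)    # uint8 float64 float64
print(rgb.astype(np.uint8).dtype)       # uint8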
We test with the two files provided in the project's images directory:

parser.add_argument('--image', type=str, default='./images/2917282960_06beee649a_b.jpg',
                    help='Path of the image to detect and segment')
parser.add_argument('--video', type=str, default='./images/v0200fd10000bq043q9pskdh7ri20vm0.MP4',
                    help='Path of the video to detect and segment')

The final test result is shown earlier in the project (if needed, the bbox drawing tool introduced earlier can also draw the predicted boxes):

6.9.3 Summary
- The Mask R-CNN source-code setup and training workflow
- Anchor configuration and computation in Mask R-CNN
- The model training and prediction workflow
- Training a Mask R-CNN model on the balloon segmentation dataset
- Running Mask R-CNN prediction on an image or video