Mask R-CNN

Introduction

What is instance segmentation?

Instance segmentation requires correctly detecting all objects in an image while also precisely segmenting each instance.

It therefore combines two tasks:

  • Object Detection
  • Semantic segmentation

Key Points

  • Mask Branch
    Mask R-CNN extends Faster R-CNN by adding a branch that predicts a segmentation mask for each Region of Interest (RoI), in parallel with the existing branch for classification and bounding-box regression.
    The mask branch has a $Km^2$-dimensional output for each RoI, which encodes $K$ binary masks of resolution $m \times m$, one for each of the $K$ classes. Specifically, the model predicts a mask from each RoI using a fully convolutional network (FCN).

The experiments also found that the different branches improve each other's results.

mask rcnn.png
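
To make the output layout of the mask branch concrete, here is a minimal NumPy sketch. It reshapes a $Km^2$-dimensional vector into $K$ masks of size $m \times m$ and computes a per-pixel sigmoid loss on the mask of the ground-truth class only. The values K = 80 and m = 28, the random inputs, and the helper name `mask_loss` are assumptions chosen for illustration and do not come from these notes.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mask_loss(mask_logits, gt_class, gt_mask):
    """Average per-pixel binary cross-entropy on one class's mask.

    mask_logits: (K, m, m) raw mask-branch outputs for a single RoI
    gt_class:    int in [0, K), ground-truth class of the RoI
    gt_mask:     (m, m) binary target mask
    """
    # Only the mask of the ground-truth class contributes to the loss.
    probs = sigmoid(mask_logits[gt_class])
    probs = np.clip(probs, 1e-7, 1.0 - 1e-7)  # avoid log(0)
    bce = -(gt_mask * np.log(probs) + (1.0 - gt_mask) * np.log(1.0 - probs))
    return bce.mean()

# Illustrative sizes (assumptions): K classes, m x m mask resolution.
K, m = 80, 28
rng = np.random.default_rng(0)

flat_output = rng.normal(size=K * m * m)    # the K*m^2-dimensional output
mask_logits = flat_output.reshape(K, m, m)  # K masks, one per class

gt_class = 3
gt_mask = (rng.random((m, m)) > 0.5).astype(np.float64)
loss = mask_loss(mask_logits, gt_class, gt_mask)

# Changing the logits of another class leaves the loss untouched,
# because the per-class sigmoid masks do not compete with each other.
perturbed = mask_logits.copy()
perturbed[5] += 10.0
loss_after = mask_loss(perturbed, gt_class, gt_mask)
```

With a softmax across classes, the perturbation of class 5 would have changed the probabilities of class 3 as well. With independent sigmoids, `loss_after` equals `loss`.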

  • RoIAlign
    In RoI pooling, a continuous coordinate $x$ is quantized by computing $[x/16]$,
    where 16 is the feature map stride and $[\cdot]$ denotes rounding; quantization is likewise performed when
    dividing the RoI into bins. This quantization has a large negative effect on predicting pixel-accurate masks.
    RoIAlign avoids any quantization of the RoI boundaries or bins. It uses bilinear interpolation to
    compute the exact values of the input features at four regularly sampled locations in each RoI bin, and
    aggregates the results (using max or average).

roi align.png
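
The RoIAlign procedure described above can be sketched in NumPy as follows. This is a simplified single-channel illustration, not the reference implementation. The function names, the 2 x 2 output size, and the convention that feature values sit at integer coordinates are all assumptions made for the example.

```python
import numpy as np

def bilinear_sample(feature, y, x):
    """Bilinearly interpolate a 2-D feature map at a continuous point (y, x)."""
    h, w = feature.shape
    # Clamp to the valid range (assumed border handling for this sketch).
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feature[y0, x0] + dx * feature[y0, x1]
    bottom = (1 - dx) * feature[y1, x0] + dx * feature[y1, x1]
    return (1 - dy) * top + dy * bottom

def roi_align(feature, roi, out_size=2, samples=2, stride=16, reduce=np.mean):
    """Pool one RoI without rounding any coordinate.

    roi is (x1, y1, x2, y2) in image pixels. Each output bin is covered by
    samples x samples regularly spaced points (4 points when samples=2).
    """
    # Map to feature-map coordinates by dividing by the stride, no rounding.
    x1, y1, x2, y2 = (c / stride for c in roi)
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            values = []
            for sy in range(samples):
                for sx in range(samples):
                    y = y1 + (i + (sy + 0.5) / samples) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples) * bin_w
                    values.append(bilinear_sample(feature, y, x))
            out[i, j] = reduce(values)  # average by default, np.max also works
    return out

# Toy feature map with f[y, x] = 10*y + x, so interpolated values are easy to check.
feature = np.add.outer(10.0 * np.arange(8), np.arange(8))
pooled = roi_align(feature, roi=(20, 36, 84, 100))
```

In this example the RoI's left edge maps to 20 / 16 = 1.25 on the feature map. RoI pooling would round that to 1, whereas the sketch keeps 1.25 and interpolates between neighboring feature values.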

My Thoughts

  • Decoupling and Branches
    If a complicated problem can be split into several simpler tasks, we can create a separate branch for each task. This helps to decouple the complicated problem.
    Mask R-CNN decouples mask and class prediction: it predicts a binary mask for each class independently, without competition among classes (using a per-pixel sigmoid instead of a softmax).
  • Why predict masks using an FCN?
    • Compared to fully connected (fc) layers, an FCN has fewer parameters and requires less computation.
    • An FCN retains spatial information.
  • Replace quantization with bilinear interpolation.
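
As a rough check on the parameter argument for the FCN, the following sketch counts the weights of a single 1 x 1 convolution that predicts K masks and compares it with one fc layer that maps the flattened features to the same $Km^2$ outputs. The sizes (256 input channels, m = 28, K = 80) are illustrative assumptions and do not reproduce the exact head used in the paper.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of a 2-D convolution layer."""
    return in_channels * out_channels * kernel_size * kernel_size + out_channels

def fc_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

# Illustrative sizes (assumptions): feature channels, mask resolution, classes.
channels, m, K = 256, 28, 80

# Convolutional output layer: the same 1x1 filters are shared across all m*m positions.
conv_out_params = conv_params(channels, K, kernel_size=1)

# fc alternative: every input value is connected to every output value.
fc_out_params = fc_params(channels * m * m, K * m * m)

ratio = fc_out_params / conv_out_params
print(f"conv: {conv_out_params:,} parameters")
print(f"fc:   {fc_out_params:,} parameters ({ratio:,.0f}x more)")
```

The convolution shares its filters across spatial positions, so its parameter count does not depend on m. The fc layer flattens the feature map and therefore also discards the spatial layout that the convolution keeps.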