Mask R-CNN
Introduction
What is instance segmentation?
Instance segmentation requires correctly detecting all objects in an image while also precisely segmenting each instance.
It therefore combines two tasks:
- Object Detection
- Semantic segmentation
Key Points
- Mask Branch
Mask R-CNN extends Faster R-CNN by adding a branch that predicts a segmentation mask on each Region of Interest (RoI), in parallel with the existing branch for classification and bounding box regression.
The mask branch has a $Km^2$-dimensional output for each RoI, which encodes $K$ binary masks of resolution $m \times m$, one for each of the $K$ classes. Specifically, the model predicts an $m \times m$ mask from each RoI using an FCN.
The experiments also found that the different branches reinforce one another.
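As a rough illustration of how the per-class masks can be used, the sketch below treats the mask branch output as K grids of m x m raw scores, applies a per-pixel sigmoid, and computes a binary cross-entropy loss only on the mask of the RoI's ground-truth class, so the other classes do not compete. The function names (`bce_with_logit`, `mask_loss`, `predict_mask`) and the nested-list layout are assumptions made for this example, not code from the paper.

```python
import math

def bce_with_logit(z, t):
    """Numerically stable binary cross-entropy for one raw score z and target t in {0, 1}."""
    return max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))

def mask_loss(mask_logits, gt_class, gt_mask):
    """Average per-pixel loss on the mask of the ground-truth class only.

    mask_logits: K grids of m x m raw scores (the K*m*m output for one RoI).
    gt_class:    index of the RoI's ground-truth class.
    gt_mask:     m x m grid of 0/1 targets.
    """
    logits_k = mask_logits[gt_class]  # the other K-1 masks are ignored
    losses = [bce_with_logit(z, t)
              for row_z, row_t in zip(logits_k, gt_mask)
              for z, t in zip(row_z, row_t)]
    return sum(losses) / len(losses)

def predict_mask(mask_logits, pred_class, threshold=0.5):
    """Binarize the mask of the predicted class using a per-pixel sigmoid."""
    return [[1 if 1.0 / (1.0 + math.exp(-z)) >= threshold else 0 for z in row]
            for row in mask_logits[pred_class]]
```

Because each class has its own sigmoid mask, changing the scores of one class leaves the loss of every other class untouched; with a softmax across classes that would not hold.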

- RoIAlign
For RoI pooling, quantization is performed on a continuous coordinate $x$ by computing $[x/16]$, where 16 is the feature map stride and $[\cdot]$ is rounding; likewise, quantization is performed when dividing the RoI into bins. These quantizations misalign the RoI and the extracted features, which has a large negative effect on predicting pixel-accurate masks.
RoIAlign avoids any quantization of the RoI boundaries or bins (it uses $x/16$ instead of $[x/16]$). It uses bilinear interpolation to compute the exact values of the input features at four regularly sampled locations in each RoI bin, and aggregates the results (using max or average).
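The RoIAlign idea above can be sketched in plain Python. This is a minimal single-channel illustration under stated assumptions: the feature map is a nested list, the value `feat[i][j]` is taken to sit at continuous coordinate `(y=i, x=j)`, and the names `quantize`, `bilinear` and `roi_align` are hypothetical. Real implementations differ in their pixel-coordinate conventions and operate on batched tensors.

```python
import math

def quantize(x, stride=16):
    """RoIPool-style coordinate: divide by the stride, then round to an integer."""
    return int(math.floor(x / stride + 0.5))

def bilinear(feat, y, x):
    """Bilinearly interpolate the 2D grid `feat` at continuous location (y, x)."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1.0)  # clamp to the grid
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(feat, roi, out_size=2, stride=16, samples=2):
    """Average-pool `samples x samples` bilinear samples per bin, with no rounding.

    roi = (x1, y1, x2, y2) in image pixels; returns an out_size x out_size grid.
    """
    x1, y1, x2, y2 = (c / stride for c in roi)  # x/16, not [x/16]
    bin_w = (x2 - x1) / out_size
    bin_h = (y2 - y1) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            vals = []
            for sy in range(samples):
                for sx in range(samples):
                    # regularly spaced sample points inside bin (i, j)
                    y = y1 + (i + (sy + 0.5) / samples) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples) * bin_w
                    vals.append(bilinear(feat, y, x))
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out
```

With `samples=2` each bin uses four sample points, matching the description above. Shifting the RoI by a few image pixels shifts the pooled values smoothly, whereas `quantize` snaps nearby coordinates to the same feature cell.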

My thinking
- Decoupling and Branch
If a complicated problem can be split into several simple tasks, we may be able to create a separate branch for each task, which helps to decouple the problem.
Mask R-CNN decouples mask and class prediction: it predicts a binary mask for each class independently, without competition among classes (using a per-pixel sigmoid instead of a softmax).
- Why predict masks using an FCN?
  - Compared to fc layers, an FCN has fewer parameters and requires less computation.
  - An FCN retains spatial information.
- Replace quantization with bilinear interpolation.
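To make the parameter argument for the FCN concrete, the short calculation below compares one 3x3 convolution layer with one fully connected layer that maps a flattened RoI feature directly to all K masks. The sizes used (a 14 x 14 x 256 RoI feature, K = 80 classes, m = 28) are illustrative assumptions chosen for this example, and the helper names are hypothetical.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a kernel x kernel convolution layer."""
    return kernel * kernel * c_in * c_out + c_out

def fc_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# Illustrative sizes (assumptions, see the text above).
K, m = 80, 28            # number of classes, mask resolution
channels, size = 256, 14  # RoI feature channels and spatial size

conv_layer = conv_params(3, channels, channels)
fc_layer = fc_params(size * size * channels, K * m * m)

print("3x3 conv layer parameters:", conv_layer)
print("fc layer parameters:      ", fc_layer)
print("ratio fc / conv:          ", fc_layer // conv_layer)
```

The convolution's parameter count does not depend on the spatial size of the RoI feature, while the fc layer's count grows with both the input size and the K * m * m output, which is why the fc variant is so much larger in this setting.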
