Recognizing objects at vastly different scales is a fundamental challenge in computer vision.
Featurized image pyramids (feature pyramids built upon image pyramids) are used with traditional hand-engineered features to achieve scale invariance.
- These pyramids are scale-invariant: a change in an object's scale is offset by shifting its level in the pyramid
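The level-shifting idea can be illustrated with a small sketch. It assumes a pyramid whose levels shrink the image by a factor of 2 and a detector that expects objects of about 32 pixels. Both values and the function name `pyramid_level` are hypothetical and chosen only for illustration.

```python
import math


def pyramid_level(object_size, template_size=32, scale_step=2.0):
    """Return the pyramid level at which an object appears at template_size.

    Level 0 is the full-resolution image. Each level shrinks the image by
    scale_step, so an object of object_size pixels appears with
    object_size / scale_step**level pixels at that level.
    """
    return round(math.log(object_size / template_size, scale_step))


for size in (32, 64, 128, 256):
    level = pyramid_level(size)
    apparent = size / 2.0 ** level
    print(f"object {size:>3}px -> level {level}, apparent size {apparent:.0f}px")
```

Doubling the object size moves it up one level, while its apparent size at that level stays the same. The detector therefore sees a constant scale.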
01 Modified Featurized Image Pyramid
Problems with applying a naive featurized image pyramid
Computation: Inference time increases considerably
Memory: Training deep networks end-to-end on an image pyramid is infeasible
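A rough back-of-the-envelope sketch of the computation cost follows. It assumes the cost of a forward pass is proportional to the number of input pixels. The set of scales is hypothetical and not taken from any particular detector.

```python
def relative_pyramid_cost(scales):
    """Cost of running a ConvNet on every pyramid level, relative to scale 1.0.

    Assumes cost is proportional to the number of input pixels, so a level
    resized by factor s costs s**2 times the single-scale pass.
    """
    return sum(s * s for s in scales)


scales = [0.5, 1.0, 1.5, 2.0]  # hypothetical test-time scales
print(relative_pyramid_cost(scales))  # prints 7.5
```

Under these assumptions, four scales cost about 7.5 times a single-scale pass. Activation memory during training grows in a similar way.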
The nature of Convolution
A deep ConvNet computes a feature hierarchy layer by layer, and with subsampling layers the feature hierarchy has an inherent multi-scale, pyramidal shape. However, there are large semantic gaps between layers.
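The pyramidal shape can be made concrete by listing the spatial size of each stage's output. The sketch below assumes a ResNet-like backbone whose last four stages have strides 4, 8, 16, and 32 relative to the input. The helper name `feature_hierarchy` is hypothetical.

```python
def feature_hierarchy(height, width, strides=(4, 8, 16, 32)):
    """Spatial size (H, W) of each stage's output for a given input size.

    Uses ceiling division, which matches stride-2 layers that pad the input
    so that odd sizes round up.
    """
    return [(-(-height // s), -(-width // s)) for s in strides]


print(feature_hierarchy(224, 224))
# prints [(56, 56), (28, 28), (14, 14), (7, 7)]
```

Each stage halves the resolution of the previous one, so the stack of outputs already forms a pyramid. The deeper, coarser maps carry stronger semantics than the shallow, finer ones.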
Proposal: Feature Pyramid Network (FPN)
Naturally leverage the pyramidal shape of a ConvNet’s feature hierarchy while creating a feature pyramid that has strong semantics at all scales
New architecture: combines low-resolution, semantically strong features with high-resolution, semantically weak features via a top-down pathway and lateral connections
Predictions are made independently at all levels
Architecture - breakdown (YOLOv4)
Common object detector
FPN
- Goal: Feature integration (combining semantically strong and semantically weak features)
- FPN is a general-purpose architecture: it takes a single-scale image as input and generates proportionally sized feature maps at multiple levels in a fully convolutional fashion.
- FPN is independent of the backbone
- Region Proposal Network
- Object detector
- Instance Segmentation
- FPN is constructed with
- Bottom-up pathway
- Top-down pathway
- Lateral connections
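How the three components fit together can be sketched in plain Python. This is a minimal illustration, not a reference implementation. It assumes the bottom-up maps have already passed through their lateral connections, that each map is half the size of the previous one, and that each map has a single channel for readability. The smoothing convolution usually applied after merging is omitted. All function names are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbor upsampling of a 2D map by a factor of 2."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))  # repeat each row
    return out


def add(a, b):
    """Element-wise sum of two 2D maps of equal size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down_merge(laterals):
    """Merge lateral maps ordered fine -> coarse into a feature pyramid.

    Starts from the coarsest map and repeatedly upsamples it and adds it to
    the next finer lateral map. Returns maps in the same fine -> coarse order.
    """
    merged = [laterals[-1]]
    for lateral in reversed(laterals[:-1]):
        merged.append(add(lateral, upsample2x(merged[-1])))
    return merged[::-1]


laterals = [
    [[1] * 4 for _ in range(4)],  # finest level, 4x4
    [[10, 10], [10, 10]],         # middle level, 2x2
    [[100]],                      # coarsest level, 1x1
]
pyramid = top_down_merge(laterals)
for level in pyramid:
    print(level)
```

In the toy output, the coarsest value (100) reaches every finer level. This mirrors how strong semantics from the top of the network are propagated to the high-resolution maps.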
Implementation detail
Implementation detail on ResNet
1x1 convolutions are used to adjust the number of channels
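A 1x1 convolution mixes channels at each pixel without touching the spatial layout, which is why it is suited to matching channel counts across stages. The sketch below is a minimal illustration on nested lists, with no bias term and hypothetical weights. A real model would use a deep learning framework's convolution layer instead.

```python
def conv1x1(feature_map, weights):
    """Apply a 1x1 convolution (no bias) to a [C_in][H][W] nested list.

    weights has shape [C_out][C_in]. Each output pixel is a weighted sum of
    the input channels at the same spatial location.
    """
    c_in = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    return [
        [
            [sum(w[c] * feature_map[c][y][x] for c in range(c_in))
             for x in range(width)]
            for y in range(height)
        ]
        for w in weights
    ]


# Three input channels on a 2x2 map, reduced to two output channels.
x = [
    [[1, 2], [3, 4]],
    [[10, 20], [30, 40]],
    [[100, 200], [300, 400]],
]
w = [[1, 0, 0], [0, 1, 1]]  # hypothetical weights
y = conv1x1(x, w)
print(len(y), len(y[0]), len(y[0][0]))  # channels, height, width
```

The spatial size stays 2x2 while the channel count drops from 3 to 2. Applied to each backbone stage, the same operation gives every pyramid level a common channel dimension so the maps can be added element-wise.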