Object Detection - YOLOv3 - Deep Learning

Method(s)

Anchor Matching

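The ground-truth box is first mapped onto the three downsampled feature maps. At each scale, IoU is computed between the box and the anchors generated at the grid cell the box falls in, giving 9 anchors in total (3 per scale). Any anchor whose IoU exceeds the 0.3 threshold is marked as a positive sample. If all 9 IoUs are below 0.3, the anchor with the largest IoU is marked as the positive sample instead. For each positive sample, the ground-truth values are written into the label at the corresponding anchor position.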

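As an example of how anchors are matched to gt boxes, suppose we have a box with coordinates [203. 138.5 78. 79]. Mapped onto the three scales (dividing by 8, 16, and 32 respectively), it becomes:

[25.375 17.3125 9.75 9.875 ]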

[12.6875 8.65625 4.875 4.9375 ]

[ 6.34375 4.328125 2.4375 2.46875 ]
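The scaling step can be reproduced with a few lines of Python. This is only a sketch: the names `STRIDES` and `scale_box` are hypothetical, and every component of the box is assumed to be divided by the stride, as in the numbers listed here.

```python
# Strides of the three feature maps relative to the input image.
STRIDES = (8, 16, 32)

def scale_box(box, stride):
    """Divide every component of the box by the stride of one feature map."""
    return [v / stride for v in box]

box = [203.0, 138.5, 78.0, 79.0]
scaled = {stride: scale_box(box, stride) for stride in STRIDES}

for stride in STRIDES:
    print(stride, scaled[stride])
```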

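Matching is done separately at each scale. Taking the middle scale as an example, [12.6875 8.65625 4.875 4.9375 ] falls in the cell located at [12.5,8.5]. At this scale the 3 anchors have widths and heights [1.875 3.8125], [3.875 2.8125], and [3.6875 7.4375], so there are 3 boxes at that location: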

[12.5 8.5 1.875 3.8125]

[12.5 8.5 3.875 2.8125]

[12.5 8.5 3.6875 7.4375]
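A minimal sketch of how these three candidate boxes could be built. It assumes the box is stored as (center x, center y, width, height), which is what the cell lookup in the example implies, and that each anchor is centered on its cell. The helper name `anchors_at_cell` is hypothetical.

```python
import math

def anchors_at_cell(scaled_box, anchor_sizes):
    """Center every anchor on the cell that contains the box center."""
    cx = math.floor(scaled_box[0]) + 0.5
    cy = math.floor(scaled_box[1]) + 0.5
    return [[cx, cy, w, h] for w, h in anchor_sizes]

# Box and anchor sizes at the middle scale (stride 16), taken from the example.
mid_box = [12.6875, 8.65625, 4.875, 4.9375]
mid_anchor_sizes = [(1.875, 3.8125), (3.875, 2.8125), (3.6875, 7.4375)]

candidates = anchors_at_cell(mid_box, mid_anchor_sizes)
```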

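Compute the IoU between these 3 anchors and the box [12.6875 8.65625 4.875 4.9375 ]; an anchor whose IoU exceeds the threshold (0.3) is considered a match. In other words, one box can match multiple anchors. If a box finds no qualifying anchor on the feature map at any scale, fall back to the single best-matching anchor among the anchors of all feature maps. SSD's matching, by comparison, has two stages: stage one matches each gt box to the anchor with the highest IoU, and stage two additionally matches any anchor whose IoU with a gt box exceeds 0.5.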
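The threshold rule and the fallback can be sketched as follows. This is an illustration under stated assumptions, not the original training code: boxes are taken to be (center x, center y, width, height), the function names are hypothetical, and only the three middle-scale anchors are passed in. In the full procedure the fallback would look at the anchors from all three scales, which this function also handles if all nine candidates are supplied.

```python
def to_corners(box):
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

def iou(a, b):
    """Intersection over union of two (cx, cy, w, h) boxes."""
    ax1, ay1, ax2, ay2 = to_corners(a)
    bx1, by1, bx2, by2 = to_corners(b)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def match_anchors(gt_box, anchors, threshold=0.3):
    """Return indices of anchors above the threshold, or the single best one."""
    ious = [iou(gt_box, anchor) for anchor in anchors]
    positives = [i for i, value in enumerate(ious) if value > threshold]
    if not positives:
        # Fallback: no anchor passed the threshold, keep the best match.
        positives = [max(range(len(ious)), key=ious.__getitem__)]
    return positives, ious

gt_box = [12.6875, 8.65625, 4.875, 4.9375]
anchors = [
    [12.5, 8.5, 1.875, 3.8125],
    [12.5, 8.5, 3.875, 2.8125],
    [12.5, 8.5, 3.6875, 7.4375],
]
positives, ious = match_anchors(gt_box, anchors)
print(positives, [round(value, 4) for value in ious])
```

With these numbers the first anchor lies just under the 0.3 threshold, while the second and third exceed it, so the box is matched to two anchors at this scale.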

Class Prediction

Softmax imposes the assumption that each box has exactly one class, which is often not the case. A multilabel approach better models the data.
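To make the difference concrete, the sketch below compares softmax with one independent sigmoid per class, which is a common way to implement a multilabel classifier. The logits and the three overlapping class names are hypothetical values chosen for illustration.

```python
import math

def softmax(logits):
    """Normalize logits into probabilities that sum to 1 (one class per box)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def multilabel_scores(logits):
    """Score every class independently, so several classes can be active."""
    return [sigmoid(v) for v in logits]

# Hypothetical logits for the classes "person", "woman", "car".
logits = [3.0, 2.5, -4.0]

soft = softmax(logits)
multi = multilabel_scores(logits)
labels = [i for i, p in enumerate(multi) if p > 0.5]
print(soft, multi, labels)
```

Under softmax the two overlapping classes compete, so the second one drops below 0.5. With independent sigmoids both stay high and both labels are kept.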

Predictions Across Scales

3 different scales
With COCO [10] we predict 3 boxes at each scale so the tensor is N × N × [3 × (4 + 1 + 80)] for the 4 bounding box offsets, 1 objectness prediction, and 80 class predictions.
We just sort of chose 9 clusters and 3 scales arbitrarily and then divide up the clusters evenly across scales.
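The tensor shape and the even split of clusters can be checked with a short sketch. Two things here are assumptions rather than statements from this section: the 416 × 416 input size, and the list `CLUSTERS`, which holds the nine COCO anchor sizes (in pixels) reported in the YOLOv3 paper. The function names are hypothetical. Dividing the middle group by 16 reproduces the anchor sizes used in the matching example.

```python
NUM_CLASSES = 80
BOXES_PER_SCALE = 3

def output_shape(n, boxes=BOXES_PER_SCALE, classes=NUM_CLASSES):
    """N x N x [boxes * (4 offsets + 1 objectness + classes)]."""
    return (n, n, boxes * (4 + 1 + classes))

# Nine COCO anchor clusters as (width, height) in pixels (from the paper).
CLUSTERS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
            (59, 119), (116, 90), (156, 198), (373, 326)]

def split_evenly(clusters, num_scales=3):
    """Sort clusters by area and hand out equal-sized groups, smallest first."""
    ordered = sorted(clusters, key=lambda wh: wh[0] * wh[1])
    per_scale = len(ordered) // num_scales
    return [ordered[i * per_scale:(i + 1) * per_scale] for i in range(num_scales)]

groups = split_evenly(CLUSTERS)
input_size = 416  # assumed input resolution

for stride, group in zip((8, 16, 32), groups):
    n = input_size // stride
    scaled = [(w / stride, h / stride) for w, h in group]
    print(stride, output_shape(n), scaled)
```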