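Paper: link
Code: https://github.com/shenyunhang/NA-fWebSOD/
Keywords: "sole image-level labels on the web", "background noise", "foreground noise"
To reduce the cost of object detection, this paper trains detectors on web images. Web images come with only image-level labels, and those labels are heavily noisy, so the authors propose a residual learning structure and bagging-mixup learning to suppress the two kinds of noise: background noise and foreground noise.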

Webly Supervised Object Detection (WebSOD)

Introduction

Background: Deep networks have achieved major successes. This paper trains the network using web images with sole image-level labels, but image-level labels on the web are heavily noisy, leading to poor performance of the learned detectors.
Solution: The paper divides the noise in web images into two types, background noise and foreground noise, and proposes one measure for each:
For background noise: a residual learning structure that decomposes the background noise and models the clean data, together with a spatially-sensitive entropy criterion that estimates the confidence of background categories being noise.

For foreground noise: bagging-mixup learning.

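[Figure: several web images for the query "aeroplane", annotated with background labels (BL), foreground labels (FL), background noise (BN), and foreground noise (FN)]
In this figure, background noise refers to a case like image 2, where both a person and an aeroplane are present but the person is not labeled. Foreground noise refers to an image that does not actually contain the object named by its foreground label, such as image 3, which contains almost no aeroplane.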

Methodology

Contributions:

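  • A residual learning structure that reduces the negative impact of noisy labels by decomposing the noise and modeling the clean data.
  • A spatially-sensitive entropy criterion and bagging-mixup learning, which estimate the confidence of background labels and suppress the influence of foreground noise, respectively.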

Noise Decomposition

A residual feature learning structure is proposed to decompose the background noise and model the clean data.
Multi-task learning is used to train two heads:

  1. Weak detection head

Input: pooled features
Output: proposal features and detection scores
Loss function:
[equation image]

  2. Residual detection head

The residual features are added to the proposal features from the weak detection (WD) head to obtain the noise features:
[equation image]
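The addition described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the helper names `linear` and `add_residual`, the toy sizes, and the use of a single linear layer for each branch are all hypothetical, and real code would operate on framework tensors rather than lists.

```python
import random

def linear(x, weight, bias):
    """Compute W x + b for one feature vector stored as a plain list."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def add_residual(proposal_feats, residual_feats):
    """Element-wise sum: noise feature = proposal feature + residual feature."""
    return [[p + r for p, r in zip(pf, rf)]
            for pf, rf in zip(proposal_feats, residual_feats)]

rng = random.Random(0)
dim, num_proposals = 4, 3  # toy sizes, chosen only for this example

# Stand-in for the pooled features of each proposal.
pooled = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_proposals)]

# Hypothetical weak detection branch: identity weights, zero bias.
wd_w = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
wd_b = [0.0] * dim
proposal_feats = [linear(f, wd_w, wd_b) for f in pooled]

# Hypothetical residual branch, zero-initialised so that the noise features
# start out identical to the proposal features.
res_w = [[0.0] * dim for _ in range(dim)]
res_b = [0.0] * dim
residual_feats = [linear(f, res_w, res_b) for f in pooled]

noise_feats = add_residual(proposal_feats, residual_feats)
```

With a zero-initialised residual branch the noise features equal the proposal features, so in this toy setup any difference between the two heads would come only from what the residual branch learns.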

Spatially-Sensitive Entropy Criterion

Bagging-Mixup Learning

A novel bagging-mixup strategy for data augmentation is proposed to handle the negative impact of foreground noise.
Bagging-mixup consists of three main steps:

  • First, randomly sample multiple web images of the same class.

Dirichlet distribution
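
The notes above record only the first step and the Dirichlet distribution, so the following is a hedged sketch of how such a mix could look rather than the paper's exact procedure. It assumes the sampled same-class images are blended as a convex combination whose weights are drawn from a symmetric Dirichlet distribution. The names `dirichlet` and `bagging_mixup`, the parameters `bag_size` and `alpha`, and the flat-list image representation are hypothetical.

```python
import random

def dirichlet(alphas, rng):
    """Draw one sample from Dirichlet(alphas) by normalising Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def bagging_mixup(images_by_class, cls, bag_size, alpha, rng):
    """Blend `bag_size` images of class `cls` into one training image.

    images_by_class maps a class name to a list of images; each image is a
    flat list of pixel values of identical length (a stand-in for tensors).
    """
    bag = rng.sample(images_by_class[cls], bag_size)   # bag of same-class images
    weights = dirichlet([alpha] * bag_size, rng)       # assumed mixing weights
    mixed = [sum(w * img[i] for w, img in zip(weights, bag))
             for i in range(len(bag[0]))]              # convex combination
    return mixed, weights

rng = random.Random(0)
# Five toy "images" of six pixels each, with constant intensity 0..4.
images_by_class = {"aeroplane": [[float(k)] * 6 for k in range(5)]}
mixed, weights = bagging_mixup(images_by_class, "aeroplane",
                               bag_size=3, alpha=1.0, rng=rng)
```

Because every image in the bag shares the same image-level label, the mixed image keeps that label unchanged in this sketch; a single image that lacks the object then contributes only a fraction of the mixed input.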

Training models with web images
[1] Santosh K. Divvala, Ali Farhadi, and Carlos Guestrin. Learning everything about anything: Webly-supervised visual concept learning. In CVPR, 2014.
[2] Qingyi Tao, Hao Yang, and Jianfei Cai. Zero-Annotation Object Detection with Web Knowledge Transfer. In ECCV, 2018.
[3] Xinlei Chen. Webly Supervised Learning of Convolutional Networks. In ICCV, 2015.
[4] Exploiting Web Images for Weakly Supervised Object Detection. TMM, 2018.

Training models with image-level labels
[5] Maxime Oquab, Léon Bottou, Ivan Laptev, and Josef Sivic. Is object localization for free? - Weakly-supervised learning with convolutional neural networks. In CVPR, 2015.
[6] Hakan Bilen and Andrea Vedaldi. Weakly Supervised Deep Detection Networks. In CVPR, 2016.
[7] Peng Tang, Xinggang Wang, Xiang Bai, and Wenyu Liu. Multiple Instance Detection Network with Online Instance Classifier Refinement. In CVPR, 2017.
[8] Yunchao Wei, Zhiqiang Shen, Bowen Cheng, Honghui Shi, Jinjun Xiong, Jiashi Feng, and Thomas Huang. TS2C: Tight Box Mining with Surrounding Segmentation Context for Weakly Supervised Object Detection. In ECCV, 2018.