RetinaFace is a robust single-stage face detector; its most notable contribution is the addition of extra-supervised and self-supervised multi-task learning.

Most face detectors focus on two tasks: face classification and bounding-box regression. RetinaFace adds face landmark regression (five facial landmarks) and dense face regression (a self-supervised, mainly 3D-related branch).

The added tasks are shown in the figure below:


[Figure: the multi-task outputs added by RetinaFace]

RetinaFace architecture highlights

Feature pyramid: a feature pyramid network extracts multi-scale features from several backbone levels.

Single-stage: fast and efficient; with a MobileNet backbone it can run in real time on an ARM CPU.

Context modelling: context modules on each pyramid level increase the receptive field and enhance the rigid context modelling power.

Multi-task learning: extra supervision signals (landmarks and dense regression) alongside classification and box regression; a rough sketch of the per-level prediction heads follows this list.
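To make the multi-task idea concrete, here is a minimal, hypothetical sketch of such prediction heads (not the paper's exact code): common RetinaFace implementations attach a classification head, a box head, and a landmark head of this shape to every pyramid level.

```python
import torch
import torch.nn as nn

class PredictionHead(nn.Module):
    # Hypothetical per-level prediction head: a 1x1 conv produces K values per
    # anchor at every spatial location, then the output is flattened to
    # [N, num_anchors_total, K].
    def __init__(self, in_channels=64, num_anchors=2, out_per_anchor=2):
        super().__init__()
        self.out_per_anchor = out_per_anchor
        self.conv = nn.Conv2d(in_channels, num_anchors * out_per_anchor, kernel_size=1)

    def forward(self, x):
        out = self.conv(x)                                     # [N, A*K, H, W]
        out = out.permute(0, 2, 3, 1).contiguous()             # [N, H, W, A*K]
        return out.view(out.size(0), -1, self.out_per_anchor)  # [N, H*W*A, K]

# One head of each kind per pyramid level:
cls_head = PredictionHead(out_per_anchor=2)    # face / background scores
box_head = PredictionHead(out_per_anchor=4)    # box offsets
ldm_head = PredictionHead(out_per_anchor=10)   # five landmarks x (x, y)

feat = torch.randn(1, 64, 80, 80)              # e.g. a stride-8 pyramid feature map
print(cls_head(feat).shape, box_head(feat).shape, ldm_head(feat).shape)
# torch.Size([1, 12800, 2]) torch.Size([1, 12800, 4]) torch.Size([1, 12800, 10])
```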

Architecture diagram
[Figure: overall RetinaFace architecture]

Loss function: multi-task loss


[Figure: multi-task loss formula]

The first term is the classification loss, the second is the bounding-box regression loss, the third is the facial-landmark regression loss, and the fourth is the dense regression loss.
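For reference, my reconstruction of the multi-task loss from the paper: p_i is the predicted face probability of anchor i, p_i* is 1 for a positive anchor and 0 otherwise (so the last three terms are only active for positive anchors), and λ1–λ3 balance the terms (the paper's settings are roughly 0.25, 0.1 and 0.01):

$$
L = L_{cls}(p_i, p_i^{*}) + \lambda_1\, p_i^{*} L_{box}(t_i, t_i^{*}) + \lambda_2\, p_i^{*} L_{pts}(l_i, l_i^{*}) + \lambda_3\, p_i^{*} L_{pixel}
$$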

There are also some implementation details:

1. Deformable convolution replaces the 3×3 convolutions in the lateral connections and context modules (this further strengthens the non-rigid context modelling capacity);

2. Anchor settings: each FPN output level is assigned anchors of a different scale (see the figure below and the sketch after it).


[Figure: anchor scales assigned to each FPN level]
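Below is a minimal sketch of per-level anchor (prior box) generation in the spirit of the linked PyTorch repo. The sizes and strides here are assumptions for illustration: three pyramid levels with strides 8/16/32 and two anchor sizes per level (the paper itself uses five levels, P2 to P6).

```python
import math
from itertools import product
import torch

def generate_anchors(image_size=(640, 640),
                     min_sizes=([16, 32], [64, 128], [256, 512]),
                     steps=(8, 16, 32)):
    # For every cell of every pyramid level, place one anchor per size,
    # expressed as normalized (cx, cy, w, h).
    anchors = []
    for k, step in enumerate(steps):
        fm_h = math.ceil(image_size[0] / step)
        fm_w = math.ceil(image_size[1] / step)
        for i, j in product(range(fm_h), range(fm_w)):
            for size in min_sizes[k]:
                s_kx = size / image_size[1]
                s_ky = size / image_size[0]
                cx = (j + 0.5) * step / image_size[1]
                cy = (i + 0.5) * step / image_size[0]
                anchors.append([cx, cy, s_kx, s_ky])
    return torch.tensor(anchors)

print(generate_anchors().shape)  # torch.Size([16800, 4]) for a 640x640 input
```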

3. Extra annotations are added to the dataset:

3.1 Five levels of face quality are defined, based on how difficult the face is to detect (e.g. clarity);

3.2 Five facial landmarks are annotated per face.


[Figures: examples of the extra quality and landmark annotations]

Results

On the WIDER FACE dataset: 96.9% (Easy), 96.1% (Medium) and 91.8% (Hard) AP on the validation set, and 96.3% (Easy), 95.6% (Medium) and 91.4% (Hard) on the test set.

Speed
[Table: inference time comparison]

The units in the table are milliseconds; the lightweight network easily reaches real-time detection.

Network structure code:

The FPN, SSH, and MobileNet backbone modules referenced by the main network:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models._utils as _utils
import torchvision.models as models


def conv_bn(inp, oup, stride=1, leaky=0):
    # 3x3 conv + BN + LeakyReLU
    return nn.Sequential(
        nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
        nn.BatchNorm2d(oup),
        nn.LeakyReLU(negative_slope=leaky, inplace=True)
    )


def conv_bn_no_relu(inp, oup, stride):
    # 3x3 conv + BN, no activation
    return nn.Sequential(
        nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
        nn.BatchNorm2d(oup),
    )


def conv_bn1X1(inp, oup, stride, leaky=0):
    # 1x1 conv + BN + LeakyReLU (FPN lateral connections)
    return nn.Sequential(
        nn.Conv2d(inp, oup, 1, stride, padding=0, bias=False),
        nn.BatchNorm2d(oup),
        nn.LeakyReLU(negative_slope=leaky, inplace=True)
    )


def conv_dw(inp, oup, stride, leaky=0.1):
    # depthwise-separable conv block (MobileNet building block)
    return nn.Sequential(
        nn.Conv2d(inp, inp, 3, stride, 1, groups=inp, bias=False),
        nn.BatchNorm2d(inp),
        nn.LeakyReLU(negative_slope=leaky, inplace=True),

        nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
        nn.BatchNorm2d(oup),
        nn.LeakyReLU(negative_slope=leaky, inplace=True),
    )


class SSH(nn.Module):
    """SSH context module: parallel 3x3, 5x5 and 7x7 (stacked 3x3) branches."""
    def __init__(self, in_channel, out_channel):
        super(SSH, self).__init__()
        assert out_channel % 4 == 0
        leaky = 0
        if out_channel <= 64:
            leaky = 0.1
        self.conv3X3 = conv_bn_no_relu(in_channel, out_channel // 2, stride=1)

        self.conv5X5_1 = conv_bn(in_channel, out_channel // 4, stride=1, leaky=leaky)
        self.conv5X5_2 = conv_bn_no_relu(out_channel // 4, out_channel // 4, stride=1)

        self.conv7X7_2 = conv_bn(out_channel // 4, out_channel // 4, stride=1, leaky=leaky)
        self.conv7x7_3 = conv_bn_no_relu(out_channel // 4, out_channel // 4, stride=1)

    def forward(self, input):
        conv3X3 = self.conv3X3(input)

        conv5X5_1 = self.conv5X5_1(input)
        conv5X5 = self.conv5X5_2(conv5X5_1)

        conv7X7_2 = self.conv7X7_2(conv5X5_1)
        conv7X7 = self.conv7x7_3(conv7X7_2)

        out = torch.cat([conv3X3, conv5X5, conv7X7], dim=1)
        out = F.relu(out)
        return out


class FPN(nn.Module):
    """Feature pyramid: 1x1 lateral convs + top-down upsampling + 3x3 merge convs."""
    def __init__(self, in_channels_list, out_channels):
        super(FPN, self).__init__()
        leaky = 0
        if out_channels <= 64:
            leaky = 0.1
        self.output1 = conv_bn1X1(in_channels_list[0], out_channels, stride=1, leaky=leaky)
        self.output2 = conv_bn1X1(in_channels_list[1], out_channels, stride=1, leaky=leaky)
        self.output3 = conv_bn1X1(in_channels_list[2], out_channels, stride=1, leaky=leaky)

        self.merge1 = conv_bn(out_channels, out_channels, leaky=leaky)
        self.merge2 = conv_bn(out_channels, out_channels, leaky=leaky)

    def forward(self, input):
        # input: dict of three backbone feature maps (strides 8/16/32)
        input = list(input.values())

        output1 = self.output1(input[0])
        output2 = self.output2(input[1])
        output3 = self.output3(input[2])

        up3 = F.interpolate(output3, size=[output2.size(2), output2.size(3)], mode="nearest")
        output2 = output2 + up3
        output2 = self.merge2(output2)

        up2 = F.interpolate(output2, size=[output1.size(2), output1.size(3)], mode="nearest")
        output1 = output1 + up2
        output1 = self.merge1(output1)

        out = [output1, output2, output3]
        return out


class MobileNetV1(nn.Module):
    """Lightweight MobileNet backbone; stage1/2/3 outputs feed the FPN."""
    def __init__(self):
        super(MobileNetV1, self).__init__()
        self.stage1 = nn.Sequential(
            conv_bn(3, 8, 2, leaky=0.1),    # 3
            conv_dw(8, 16, 1),              # 7
            conv_dw(16, 32, 2),             # 11
            conv_dw(32, 32, 1),             # 19
            conv_dw(32, 64, 2),             # 27
            conv_dw(64, 64, 1),             # 43
        )
        self.stage2 = nn.Sequential(
            conv_dw(64, 128, 2),    # 43 + 16 = 59
            conv_dw(128, 128, 1),   # 59 + 32 = 91
            conv_dw(128, 128, 1),   # 91 + 32 = 123
            conv_dw(128, 128, 1),   # 123 + 32 = 155
            conv_dw(128, 128, 1),   # 155 + 32 = 187
            conv_dw(128, 128, 1),   # 187 + 32 = 219
        )
        self.stage3 = nn.Sequential(
            conv_dw(128, 256, 2),   # 219 + 32 = 241
            conv_dw(256, 256, 1),   # 241 + 64 = 301
        )
        self.avg = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(256, 1000)

    def forward(self, x):
        x = self.stage1(x)
        x = self.stage2(x)
        x = self.stage3(x)
        x = self.avg(x)
        x = x.view(-1, 256)
        x = self.fc(x)
        return x
```
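To see how these modules fit together, here is a minimal usage sketch under a few assumptions (three stage outputs with 64/128/256 channels, 64 FPN channels, a 640×640 input); the actual RetinaFace class in the repo additionally attaches the prediction heads.

```python
import torch
import torchvision.models._utils as _utils

# Extract the three stage outputs of the backbone by module name.
backbone = MobileNetV1()
body = _utils.IntermediateLayerGetter(backbone, {"stage1": "1", "stage2": "2", "stage3": "3"})

in_channels = [64, 128, 256]   # channels of stage1/2/3 outputs
out_channels = 64
fpn = FPN(in_channels, out_channels)
ssh1 = SSH(out_channels, out_channels)
ssh2 = SSH(out_channels, out_channels)
ssh3 = SSH(out_channels, out_channels)

x = torch.randn(1, 3, 640, 640)
feats = fpn(body(x))                                   # three levels, strides 8/16/32
features = [ssh1(feats[0]), ssh2(feats[1]), ssh3(feats[2])]
for f in features:
    print(f.shape)  # [1, 64, 80, 80], [1, 64, 40, 40], [1, 64, 20, 20]
```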

Code link: PyTorch implementation

https://github.com/biubug6/Pytorch_Retinaface