CosFace and AM-softmax - Deep Learning and Face Recognition

These two papers appeared at CVPR 2018 and ICLR 2018 respectively, and they independently arrive at essentially the same idea.

The loss in CosFace is called the Large Margin Cosine Loss (LMCL), and it is essentially the same as AM-softmax.

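Both CosFace and AM-softmax start from normalized weights W and normalized features x, use a fixed scale factor s = 30, and replace the multiplicative margin with an additive margin, which makes training much easier.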

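The angle in "large margin" is the angle between a weight vector W and a feature x, not the angle between different classes. The loss function places no constraint at all on the angles between feature vectors of different classes.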

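The optimization goal of the large margin loss is to make the angle between the weight vector W and the feature vector x smaller. For example, suppose a class has 1000 images, which the CNN maps to 1000 feature vectors, while each class has only one weight vector W. The large margin loss requires the angles between these 1000 feature vectors and that single weight vector to be very small. In other words, optimization pulls all 1000 feature vectors toward the direction of W.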

The optimization difficulty of large margin losses comes mainly from the multiplicative margin.

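First, a multiplicative margin shrinks the monotonic interval of the cosine function. If a class happens to land outside the target region, the multiplication cannot bring it back into that region, which makes optimization difficult.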

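Second, the margin produced by a multiplicative margin is actually non-uniform: it depends on the angle between the weight vectors W of the two classes. If two classes are already close to each other, the margin between them is small, and these two classes also become impossible to optimize.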
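The non-uniformity can be made concrete with a small two-class sketch. The setup is an assumption made for illustration: normalized weights W_1 and W_2 separated by an angle \Theta, and a feature x lying in the plane between them, so that \theta_1 + \theta_2 = \Theta. The symbols \Theta, \theta_1 and \theta_2 are introduced here and do not appear in the original text.

```latex
% Multiplicative margin: class 1 is chosen when cos(m*theta_1) > cos(theta_2)
\cos(m\theta_1) = \cos(\theta_2)
  \;\Rightarrow\; m\theta_1 = \theta_2
  \;\Rightarrow\; \theta_1 = \frac{\Theta}{m+1}
% By symmetry the class-2 boundary sits at theta_1 = m*Theta/(m+1),
% so the angular gap between the two boundaries is
\Delta_{\text{mult}} = \frac{m\Theta}{m+1} - \frac{\Theta}{m+1} = \frac{m-1}{m+1}\,\Theta

% Additive margin: class 1 is chosen when cos(theta_1) - m > cos(theta_2),
% so the required gap in cosine space is the constant m
\cos(\theta_1) - \cos(\theta_2) \ge m
```

Under these assumptions the multiplicative gap scales with \Theta and shrinks toward zero as the two class weights approach each other, whereas the additive requirement does not depend on \Theta.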

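After switching to an additive margin, the target logit cos(θ) − m is monotonically decreasing in θ over [0, π], so the two drawbacks of the multiplicative margin described above disappear.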
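The monotonicity argument can be checked numerically. The snippet below is a minimal sketch, not taken from either paper: the helper names are hypothetical, and the multiplicative margin value m = 4 is an assumed example, while m = 0.40 matches the default used in the code later in this section.

```python
import math

def multiplicative_logit(theta, m=4):
    """Target logit with a multiplicative angular margin: cos(m * theta)."""
    return math.cos(m * theta)

def additive_logit(theta, m=0.40):
    """Target logit with an additive cosine margin: cos(theta) - m."""
    return math.cos(theta) - m

def is_monotonically_decreasing(values):
    """True if every value is strictly smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))

# Sample theta uniformly on [0, pi]
thetas = [i * math.pi / 100 for i in range(101)]
mult = [multiplicative_logit(t) for t in thetas]
add = [additive_logit(t) for t in thetas]

print(is_monotonically_decreasing(mult))  # False: cos(4*theta) turns back up after pi/4
print(is_monotonically_decreasing(add))   # True: cos(theta) - m decreases on all of [0, pi]
```

With the multiplicative form, the logit decreases only for θ up to π/m; beyond that point a larger angle can yield a larger logit, which is the "outside the target region" situation described above.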

The Large Margin Cosine Loss of CosFace is shown in the figure below:

[Image: formula of the Large Margin Cosine Loss]
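Written out, the loss takes roughly the following form. This is a reconstruction consistent with the implementation in this section rather than a copy of the original figure; N (number of samples), y_i (label of sample i) and \theta_{j,i} (angle between W_j and x_i) are notation assumed here.

```latex
L_{lmc} = \frac{1}{N} \sum_{i=1}^{N}
  -\log \frac{e^{s\,(\cos(\theta_{y_i,i}) - m)}}
             {e^{s\,(\cos(\theta_{y_i,i}) - m)} + \sum_{j \neq y_i} e^{s\,\cos(\theta_{j,i})}},
\qquad
\cos(\theta_{j,i}) = \frac{W_j^{\top} x_i}{\lVert W_j \rVert \, \lVert x_i \rVert}
```

Only the target-class term carries the margin m; every other class keeps its plain scaled cosine.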

# CosFace
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import Parameter

class AddMarginProduct(nn.Module):
    """
    Implementation of the large margin cosine distance: cos(theta) - m
    Args:
        in_features: size of each input sample
        out_features: size of each output sample
        s: norm of input feature (scale factor applied to the cosine logits)
        m: margin
    """
    def __init__(self, in_features, out_features, s=30.0, m=0.40):
        super(AddMarginProduct, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.s = s
        self.m = m
        self.weight = Parameter(torch.FloatTensor(out_features, in_features))
        nn.init.xavier_uniform_(self.weight)
    def forward(self, input, label):
        # --------------------------- cos(theta) & phi(theta) ---------------------------
        cosine = F.linear(F.normalize(input), F.normalize(self.weight))
        phi = cosine - self.m
        # --------------------------- convert label to one-hot ---------------------------
        # allocate on the same device and dtype as the logits (works on CPU and GPU)
        one_hot = torch.zeros_like(cosine)
        one_hot.scatter_(1, label.view(-1, 1).long(), 1)
        # ------------- subtract the margin from the target-class logit only -------------
        # torch.where (available since torch 0.4) can be used here instead
        output = (one_hot * phi) + ((1.0 - one_hot) * cosine)
        # rescale the logits by the fixed scale factor s
        output *= self.s
        return output
    def __repr__(self):
        return self.__class__.__name__ + '(' \
               + 'in_features=' + str(self.in_features) \
               + ', out_features=' + str(self.out_features) \
               + ', s=' + str(self.s) \
               + ', m=' + str(self.m) + ')'
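The module above returns scaled logits rather than a loss value, so it is meant to be followed by an ordinary softmax cross-entropy. The sketch below shows that pairing in a self-contained functional form. The function name add_margin_logits, the tensor shapes, and the random data are hypothetical and chosen only for illustration.

```python
import torch
import torch.nn.functional as F

def add_margin_logits(features, weight, labels, s=30.0, m=0.40):
    """Return s * (cos(theta) - m) for the target class and s * cos(theta) elsewhere."""
    # cosine similarity between each normalized feature and each normalized class weight
    cosine = F.linear(F.normalize(features), F.normalize(weight))  # (batch, classes)
    one_hot = F.one_hot(labels, num_classes=weight.size(0)).to(cosine.dtype)
    # subtracting m * one_hot touches only the target-class column of each row
    return s * (cosine - m * one_hot)

torch.manual_seed(0)
features = torch.randn(4, 128)        # hypothetical batch of 4 embeddings
weight = torch.randn(10, 128)         # hypothetical weight vectors for 10 classes
labels = torch.tensor([0, 3, 3, 7])   # hypothetical class labels

logits = add_margin_logits(features, weight, labels)
loss = F.cross_entropy(logits, labels)  # softmax cross-entropy on the margin-adjusted logits
print(logits.shape, loss.item())
```

Because the margin is subtracted only from the target logit, a sample has to beat every other class by at least m in cosine space before its loss becomes small.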
