本文的内容主要围绕目标定位经典工作Locating Objects Without Bounding Boxes展开,着重于介绍Hausdorff Distance相关的知识。

  • 论文:https://openaccess.thecvf.com/content_CVPR_2019/html/Ribera_Locating_Objects_Without_Bounding_Boxes_CVPR_2019_paper.html
  • 代码:https://github.com/javiribera/locating-objects-without-bboxes

    Hausdorff Distance

    Pasted image 20220714184814.png
    这是一种用于度量两个点集的距离的度量方式。其已经被广泛应用于多种任务中,包括字符识别、人脸识别、以及场景匹配等。
    Hausdorff Distance - 图2表示所有可能点的空间。对于2D图像这样的二维平面可能就是Hausdorff Distance - 图3。对于两个包含点数可能不同的点集Hausdorff Distance - 图4Hausdorff Distance - 图5,有针对其中的点的距离度量Hausdorff Distance - 图6#card=math&code=d%28x%2Cy%29&id=BgwiS),则具体计算形式为:
    Hausdorff Distance - 图7%20%3D%20%5Cmax%20%5C%7B%20%5Csup%7Bx%20%5Cin%20X%7D%20%5Cinf%7By%20%5Cin%20Y%7D%20d(x%2Cy)%2C%20%5Csup%7By%20%5Cin%20Y%7D%20%5Cinf%7Bx%20%5Cin%20X%7D%20d(x%2Cy)%20%5C%7D%0A#card=math&code=dH%28X%2CY%29%20%3D%20%5Cmax%20%5C%7B%20%5Csup%7Bx%20%5Cin%20X%7D%20%5Cinf%7By%20%5Cin%20Y%7D%20d%28x%2Cy%29%2C%20%5Csup%7By%20%5Cin%20Y%7D%20%5Cinf%7Bx%20%5Cin%20X%7D%20d%28x%2Cy%29%20%5C%7D%0A&id=FapRK)
    其中Hausdorff Distance - 图8为上确界操作,Hausdorff Distance - 图9为下确界操作,对于我们关注的图像中的计算所对应的有限点集而言,分别可以简单理解为最大值和最小值。且两点之间距离最大为图像的对角线长度。而这也可以认为是两个点集之间Hausdorff距离可能的上界:
    Hausdorff Distance - 图10%20%5Cle%20d
    %7Bmax%7D%20%3D%20%5Cmax%7Bx%20%5Cin%20%5COmega%2C%20y%20%5Cin%20%5COmega%7D%20d(x%2Cy)%0A#card=math&code=d_H%28X%2CY%29%20%5Cle%20d%7Bmax%7D%20%3D%20%5Cmax_%7Bx%20%5Cin%20%5COmega%2C%20y%20%5Cin%20%5COmega%7D%20d%28x%2Cy%29%0A&id=eF0v9)
    这个度量的计算过程可以简单归纳为如下几步:

  • Hausdorff Distance - 图11#card=math&code=%5Cinf_%7By%20%5Cin%20Y%7D%20d%28x%2Cy%29&id=d8tfa):对于每个Hausdorff Distance - 图12,寻找距离最近(下确界)的Hausdorff Distance - 图13所对应的距离。

  • Hausdorff Distance - 图14#card=math&code=%5Csup%7Bx%20%5Cin%20X%7D%20%5Cinf%7By%20%5Cin%20Y%7D%20d%28x%2Cy%29&id=v8txj):从所有的Hausdorff Distance - 图15所对应的距离下界中寻找最大值(上确界)。
  • Hausdorff Distance - 图16#card=math&code=%5Cinf_%7Bx%20%5Cin%20X%7D%20d%28x%2Cy%29&id=gHkGW):对于每个Hausdorff Distance - 图17,寻找距离最近(下确界)的Hausdorff Distance - 图18所对应的距离。
  • Hausdorff Distance - 图19#card=math&code=%5Csup%7By%20%5Cin%20Y%7D%20%5Cinf%7Bx%20%5Cin%20X%7D%20d%28x%2Cy%29&id=n5xGu):从所有的Hausdorff Distance - 图20所对应的距离下界中寻找最大值(上确界)。
  • Hausdorff Distance - 图21:对两部分计算结果选择最大值。

这一计算过程直观上可以简单理解为,如果一个点集中的每个点都非常接近于另一个点集中的一些点,那么两个点集就可以认为很接近。
对于Hausdorff Distance - 图22,Hausdorff距离满足:

  • Hausdorff Distance - 图23%20%5Cge%200#card=math&code=d_H%28X%2CY%29%20%5Cge%200&id=Eq80b)
  • Hausdorff Distance - 图24%3D0%20%5CLeftrightarrow%20X%3DY#card=math&code=d_H%28X%2CY%29%3D0%20%5CLeftrightarrow%20X%3DY&id=LVnV0)
  • Hausdorff Distance - 图25%3Dd_H(Y%2CX)#card=math&code=d_H%28X%2CY%29%3Dd_H%28Y%2CX%29&id=F9IYA)
  • Hausdorff Distance - 图26%20%5Cle%20d_H(X%2CZ)%2Bd_H(Z%2CY)#card=math&code=d_H%28X%2CY%29%20%5Cle%20d_H%28X%2CZ%29%2Bd_H%28Z%2CY%29&id=wpyLK)

由于涉及到最大距离的选择,所以Hausdorff距离对于异常点是很敏感的。

Average Hausdorff Distance

为了避免这一点,平均Hausdorff距离成为了更常用的选择:
Hausdorff Distance - 图27%20%2B%20%5Cfrac%7B1%7D%7B%7CY%7C%7D%5Csum%7By%20%5Cin%20Y%7D%20%5Cmin%7Bx%20%5Cin%20X%7D%20d(x%2Cy)%0A%0A#card=math&code=%0Ad%7BAH%7D%20%3D%20%5Cfrac%7B1%7D%7B%7CX%7C%7D%5Csum%7Bx%20%5Cin%20X%7D%20%5Cmin%7By%20%5Cin%20Y%7D%20d%28x%2Cy%29%20%2B%20%5Cfrac%7B1%7D%7B%7CY%7C%7D%5Csum%7By%20%5Cin%20Y%7D%20%5Cmin_%7Bx%20%5Cin%20X%7D%20d%28x%2Cy%29%0A%0A&id=CsnAm)
这里的两项分别对两个点集中点的数量计算了平均最短距离。
这一形式仍然满足前面四条属性中的前三条,但是不再满足第四条了。并且也因此,Hausdorff距离对于两个集合中的任意点都是可微分的。
Hausdorff Distance - 图28表示包含真值点坐标的集合,Hausdorff Distance - 图29作为模型的预测。理想情况下,可以会使用平均Hausdorff距离作为损失函数用于训练过程,但是其作为损失函数存在限制。由于FCN风格的模型通常使用预测图上的高的激活位置指示目标中心,通常并不会直接返回像素坐标。为了使得这样情况可以正常优化,必须保证损失函数对于模型输出时可微分的。而上面直接基于坐标的形式就不行了。

Weighted Hausdorff Distance

于是,在这篇论文中提出了一个新的改进版本的Hausdorff距离,即加权Hausdorff距离:
Hausdorff Distance - 图30%20%26%20%3D%20%5Cfrac%7B1%7D%7B%5Csum%7Bx%20%5Cin%20%5COmega%7Dp_x%20%2B%5Cepsilon%7D%20%5Csum%7Bx%20%5Cin%20%5COmega%7D%20px%20%5Cmin%7By%20%5Cin%20Y%7D%20d(x%2Cy)%20%2B%20%5Cfrac%7B1%7D%7B%7CY%7C%7D%20%5Csum%7By%20%5Cin%20%7BY%7D%7D%20%5Cunderset%7Bx%20%5Cin%20%5COmega%7D%7BM%7B%5Calpha%7D%7D%5Bpx%20d(x%2Cy)%20%2B%20(1-p_x)d%7Bmax%7D%5D%20%5C%5C%0A%0A%5Cunderset%7Ba%20%5Cin%20A%7D%7BM%7B%5Calpha%7D%7D%5Bf(a)%5D%20%26%20%3D%20(%5Cfrac%7B1%7D%7B%7CA%7C%20%5Csum%7Ba%20%5Cin%20A%7D%20f%5E%5Calpha(a)%7D)%5E%5Cfrac%7B1%7D%7B%5Calpha%7D%0A%5Cend%7Balign%7D%0A#card=math&code=%5Cbegin%7Balign%7D%0A%0Ad%7BWH%7D%28p%2CY%29%20%26%20%3D%20%5Cfrac%7B1%7D%7B%5Csum%7Bx%20%5Cin%20%5COmega%7Dpx%20%2B%5Cepsilon%7D%20%5Csum%7Bx%20%5Cin%20%5COmega%7D%20px%20%5Cmin%7By%20%5Cin%20Y%7D%20d%28x%2Cy%29%20%2B%20%5Cfrac%7B1%7D%7B%7CY%7C%7D%20%5Csum%7By%20%5Cin%20%7BY%7D%7D%20%5Cunderset%7Bx%20%5Cin%20%5COmega%7D%7BM%7B%5Calpha%7D%7D%5Bpx%20d%28x%2Cy%29%20%2B%20%281-p_x%29d%7Bmax%7D%5D%20%5C%5C%0A%0A%5Cunderset%7Ba%20%5Cin%20A%7D%7BM%7B%5Calpha%7D%7D%5Bf%28a%29%5D%20%26%20%3D%20%28%5Cfrac%7B1%7D%7B%7CA%7C%20%5Csum%7Ba%20%5Cin%20A%7D%20f%5E%5Calpha%28a%29%7D%29%5E%5Cfrac%7B1%7D%7B%5Calpha%7D%0A%5Cend%7Balign%7D%0A&id=d8aKs)
第一部分计算了每个Hausdorff Distance - 图31与最近的Hausdorff Distance - 图32的距离,使用x处的预测值对平均后的结果进行加权。这里可以看做是一个加权的平均距离。
而第二部分则把针对位置Hausdorff Distance - 图33处理的距离设定为了使用Hausdorff Distance - 图34加权的Hausdorff Distance - 图35#card=math&code=d%28x%2Cy%29&id=Grlcg)和图像中可能的最大距离Hausdorff Distance - 图36(即对角线长度)的组合。这个式子:

  • 在极端情形,即Hausdorff Distance - 图37时,此时对应的含义就成了图像对角线,因为此时距离可以理解为与任意的Hausdorff Distance - 图38都等于最大距离。而当Hausdorff Distance - 图39时,此时则仅考虑Hausdorff Distance - 图40#card=math&code=d%28x%2Cy%29&id=S2qt8),即实际的距离。
  • 由于Hausdorff Distance - 图41实际并不是二值状态,而是一个0~1之间的变化值,所以这可以表示一种连续集合与离散集合的距离形式。

这里特别的是Hausdorff Distance - 图42%5D#card=math&code=%5Cunderset%7Ba%20%5Cin%20A%7D%7BM_%7B%5Calpha%7D%7D%5Bf%28a%29%5D&id=KzJ8W)是广义平均,具体可见:

广义平均在特殊参数的设定下可以实现对于最大和最小函数的逼近。但是广义平均本身却是可微的。

代码解析

代码来自:https://github.com/javiribera/locating-objects-without-bboxes/blob/master/object-locator/losses.py
先定义一些基本计算函数,包括计算成对欧氏距离的cdist和计算广义平均的generaliz_mean

  1. def _assert_no_grad(variables):
  2. for var in variables:
  3. assert not var.requires_grad, \
  4. "nn criterions don't compute the gradient w.r.t. targets - please " \
  5. "mark these variables as volatile or not requiring gradients"
  6. def cdist(x, y):
  7. """
  8. Compute distance between each pair of the two collections of inputs.
  9. :param x: Nxd Tensor
  10. :param y: Mxd Tensor
  11. :res: NxM matrix where dist[i,j] is the norm between x[i,:] and y[j,:],
  12. i.e. dist[i,j] = ||x[i,:]-y[j,:]||
  13. """
  14. differences = x.unsqueeze(1) - y.unsqueeze(0)
  15. distances = torch.sum(differences**2, -1).sqrt()
  16. return distances
  17. def generaliz_mean(tensor, dim, p=-9, keepdim=False):
  18. """The generalized mean. It corresponds to the minimum when p = -inf.
  19. https://en.wikipedia.org/wiki/Generalized_mean
  20. :param tensor: Tensor of any dimension.
  21. :param dim: (int or tuple of ints) The dimension or dimensions to reduce.
  22. :param keepdim: (bool) Whether the output tensor has dim retained or not.
  23. :param p: (float<0).
  24. """
  25. assert p < 0
  26. res= torch.mean((tensor + 1e-6)**p, dim, keepdim=keepdim)**(1./p)
  27. return res

Average Hausdorff Distance

  1. def averaged_hausdorff_distance(set1, set2, max_ahd=np.inf):
  2. """
  3. Compute the Averaged Hausdorff Distance function
  4. between two unordered sets of points (the function is symmetric).
  5. Batches are not supported, so squeeze your inputs first!
  6. :param set1: Array/list where each row/element is an N-dimensional point.
  7. :param set2: Array/list where each row/element is an N-dimensional point.
  8. :param max_ahd: Maximum AHD possible to return if any set is empty. Default: inf.
  9. :return: The Averaged Hausdorff Distance between set1 and set2.
  10. """
  11. if len(set1) == 0 or len(set2) == 0:
  12. return max_ahd
  13. set1 = np.array(set1)
  14. set2 = np.array(set2)
  15. assert set1.ndim == 2, 'got %s' % set1.ndim
  16. assert set2.ndim == 2, 'got %s' % set2.ndim
  17. assert set1.shape[1] == set2.shape[1], \
  18. 'The points in both sets must have the same number of dimensions, got %s and %s.'\
  19. % (set2.shape[1], set2.shape[1])
  20. d2_matrix = pairwise_distances(set1, set2, metric='euclidean')
  21. res = np.average(np.min(d2_matrix, axis=0)) + \
  22. np.average(np.min(d2_matrix, axis=1))
  23. return res
  24. class AveragedHausdorffLoss(nn.Module):
  25. def __init__(self):
  26. super(nn.Module, self).__init__()
  27. def forward(self, set1, set2):
  28. """Compute the Averaged Hausdorff Distance function between two unordered sets of points (the function is symmetric).
  29. Batches are not supported, so squeeze your inputs first!
  30. :param set1: Tensor where each row is an N-dimensional point.
  31. :param set2: Tensor where each row is an N-dimensional point.
  32. :return: The Averaged Hausdorff Distance between set1 and set2.
  33. """
  34. assert set1.ndimension() == 2, 'got %s' % set1.ndimension()
  35. assert set2.ndimension() == 2, 'got %s' % set2.ndimension()
  36. assert set1.size()[1] == set2.size()[1], \
  37. 'The points in both sets must have the same number of dimensions, got %s and %s.'\
  38. % (set2.size()[1], set2.size()[1])
  39. d2_matrix = cdist(set1, set2)
  40. # Modified Chamfer Loss
  41. term_1 = torch.mean(torch.min(d2_matrix, 1)[0])
  42. term_2 = torch.mean(torch.min(d2_matrix, 0)[0])
  43. res = term_1 + term_2
  44. return res

平均形式的实现方式非常简单,关键在于计算点之间的成对距离矩阵,之后沿着两个轴(以不同的点集作为基准)计算取最小和均值操作。最终的加和即为最终的距离。
注意这里的实现里没有为batch形式提供支持。而且考虑到不同样本对应的点集中点数量的差异,导致无法直接将不同的数据堆叠到一起。

Werighted Hausdorff Distance

  1. class WeightedHausdorffDistance(nn.Module):
  2. def __init__(self,
  3. resized_height, resized_width,
  4. p=-9,
  5. return_2_terms=False,
  6. device=torch.device('cpu')):
  7. """
  8. :param resized_height: Number of rows in the image.
  9. :param resized_width: Number of columns in the image.
  10. :param p: Exponent in the generalized mean. -inf makes it the minimum.
  11. :param return_2_terms: Whether to return the 2 terms
  12. of the WHD instead of their sum.
  13. Default: False.
  14. :param device: Device where all Tensors will reside.
  15. """
  16. super(nn.Module, self).__init__()
  17. # Prepare all possible (row, col) locations in the image
  18. self.height, self.width = resized_height, resized_width
  19. self.resized_size = torch.tensor([resized_height,
  20. resized_width],
  21. dtype=torch.get_default_dtype(),
  22. device=device)
  23. self.max_dist = math.sqrt(resized_height**2 + resized_width**2)
  24. self.n_pixels = resized_height * resized_width
  25. self.all_img_locations = torch.from_numpy(cartesian([np.arange(resized_height),
  26. np.arange(resized_width)]))
  27. # Convert to appropiate type
  28. self.all_img_locations = self.all_img_locations.to(device=device,
  29. dtype=torch.get_default_dtype())
  30. self.return_2_terms = return_2_terms
  31. self.p = p
  32. def forward(self, prob_map, gt, orig_sizes):
  33. """
  34. Compute the Weighted Hausdorff Distance function
  35. between the estimated probability map and ground truth points.
  36. The output is the WHD averaged through all the batch.
  37. :param prob_map: (B x H x W) Tensor of the probability map of the estimation.
  38. B is batch size, H is height and W is width.
  39. Values must be between 0 and 1.
  40. :param gt: List of Tensors of the Ground Truth points.
  41. Must be of size B as in prob_map.
  42. Each element in the list must be a 2D Tensor,
  43. where each row is the (y, x), i.e, (row, col) of a GT point.
  44. :param orig_sizes: Bx2 Tensor containing the size of the original images.
  45. B is batch size.
  46. The size must be in (height, width) format.
  47. :return: Single-scalar Tensor with the Weighted Hausdorff Distance.
  48. If self.return_2_terms=True, then return a tuple containing
  49. the two terms of the Weighted Hausdorff Distance.
  50. """
  51. _assert_no_grad(gt)
  52. assert prob_map.dim() == 3, 'The probability map must be (B x H x W)'
  53. assert prob_map.size()[1:3] == (self.height, self.width), \
  54. 'You must configure the WeightedHausdorffDistance with the height and width of the ' \
  55. 'probability map that you are using, got a probability map of size %s'\
  56. % str(prob_map.size())
  57. batch_size = prob_map.shape[0]
  58. assert batch_size == len(gt)
  59. terms_1 = []
  60. terms_2 = []
  61. for b in range(batch_size):
  62. # One by one
  63. prob_map_b = prob_map[b, :, :]
  64. gt_b = gt[b]
  65. orig_size_b = orig_sizes[b, :]
  66. norm_factor = (orig_size_b/self.resized_size).unsqueeze(0)
  67. n_gt_pts = gt_b.size()[0]
  68. # Corner case: no GT points
  69. if gt_b.ndimension() == 1 and (gt_b < 0).all().item() == 0:
  70. terms_1.append(torch.tensor([0],
  71. dtype=torch.get_default_dtype()))
  72. terms_2.append(torch.tensor([self.max_dist],
  73. dtype=torch.get_default_dtype()))
  74. continue
  75. # Pairwise distances between all possible locations and the GTed locations
  76. n_gt_pts = gt_b.size()[0]
  77. normalized_x = norm_factor.repeat(self.n_pixels, 1) *\
  78. self.all_img_locations
  79. normalized_y = norm_factor.repeat(len(gt_b), 1)*gt_b
  80. d_matrix = cdist(normalized_x, normalized_y) # HWxN
  81. # Reshape probability map as a long column vector,
  82. # and prepare it for multiplication
  83. p = prob_map_b.view(prob_map_b.nelement())
  84. n_est_pts = p.sum()
  85. p_replicated = p.view(-1, 1).repeat(1, n_gt_pts)
  86. # Weighted Hausdorff Distance
  87. term_1 = (1 / (n_est_pts + 1e-6)) * \
  88. torch.sum(p * torch.min(d_matrix, 1)[0]) # HWxN -> HW -> 1
  89. weighted_d_matrix = (1 - p_replicated)*self.max_dist + p_replicated*d_matrix
  90. minn = generaliz_mean(weighted_d_matrix,
  91. p=self.p,
  92. dim=0, keepdim=False) # HWxN -> N
  93. term_2 = torch.mean(minn)
  94. terms_1.append(term_1)
  95. terms_2.append(term_2)
  96. terms_1 = torch.stack(terms_1)
  97. terms_2 = torch.stack(terms_2)
  98. if self.return_2_terms:
  99. res = terms_1.mean(), terms_2.mean()
  100. else:
  101. res = terms_1.mean() + terms_2.mean()
  102. return res

由于这里不同样本对应的点集不同,所以无法直接利用batch形式的计算,需要对每个样本单独计算,最后整体平均。这里同时考虑了实际模型输入输出中图像形状的变化,为了将距离关系对应于原图,所以这里利用放缩后的尺寸和原始尺寸之间计算了一个坐标放缩因子用于调整输出图中坐标和真值坐标。
之后针对两项分别进行计算,对于第一项,其计算方式和Average Hausdorff Distance一致,而第二项则根据公式进行了变换。这类时候首先计算了加权形式的距离矩阵,利用不同位置上的预测值对最大距离(预测图的对角线长度)和前面计算的距离矩阵进行加权求和。之后对加权距离矩阵计算广义平均,利用负指数,获得近似的最小值。最终对所有真值点对应的近似平均最小距离。

参考资料