Reliable Weighted Optimal Transport for Unsupervised Domain Adaptation

Xu, Renjun, et al. “Reliable Weighted Optimal Transport for Unsupervised Domain Adaptation.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
Notes on Optimal Transport


Goal: learn transferable models for the unlabeled target domain.
Optimal transport is a powerful metric for aligning the representations of the source and target domains.

Problem: most optimal-transport-based methods ignore the intra-domain structure and perform only coarse pair-wise matching. During clustering, however, points that lie farther from their class center are more likely to be misclassified by the decision boundary learned from the source domain.
Proposal: the paper introduces Reliable Weighted Optimal Transport (RWOT) for unsupervised domain adaptation, built on two components:

  • Novel Shrinking Subspace Reliability (SSR)
  • Weighted optimal transport strategy

    SSR: exploits spatial prototypical information and the intra-domain structure to dynamically measure the sample-level domain discrepancy across domains.
    Weighted optimal transport strategy: pursues a precise pair-wise optimal transport procedure that reduces the negative transfer brought by samples near decision boundaries in the target domain.

Main Contributions:

  1. Evaluates the sample-level domain discrepancy dynamically, using spatial prototypical information and the intra-domain structure.
  2. Proposes a weighted optimal transport strategy that achieves precise pair-wise optimal transport and reduces the negative transfer caused by samples near decision boundaries.

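The figure above shows the overall RWOT framework: a -> b is transfer with a classical DA algorithm, and c -> d is transfer using the centroid loss. Colored points belong to the source domain, gray points to the target domain, and red points are class centers.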

Reliable Weighted Optimal Transport


Source domain: [image 2] denotes [image 3] labeled samples
Target domain: [image 4] denotes [image 5] unlabeled samples
Source domain joint probability distribution: [image 6]
Target domain joint probability distribution: [image 7]

Shrinking Subspace Reliability

The spatial prototypical information:
[image 9]: feature generator
[image 10]: adaptive classifier
[image 11]: the [image 12]-th source class center
Convex combination of [image 14] PSD kernels
Positive semi-definite (PSD) kernel:
[image 16]
A positive semi-definite matrix is a symmetric matrix whose eigenvalues are all >= 0.
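
The PSD property can be checked numerically on a kernel Gram matrix. The sketch below is only an illustration: the Gaussian kernel, the bandwidths, and the mixing weights are assumptions chosen for the example, not values taken from the paper. It builds a convex combination of three kernels and verifies that the result is symmetric with non-negative eigenvalues.

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def is_psd(K, tol=1e-8):
    """True if K is symmetric and every eigenvalue is >= -tol."""
    if not np.allclose(K, K.T, atol=tol):
        return False
    eigvals = np.linalg.eigvalsh(K)  # eigenvalues of a symmetric matrix
    return bool(eigvals.min() >= -tol)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))  # six toy feature vectors

# Assumed example: convex combination (weights >= 0, summing to 1)
# of Gaussian kernels with three different bandwidths.
weights = [0.2, 0.3, 0.5]
sigmas = [0.5, 1.0, 2.0]
K_mix = sum(w * gaussian_gram(X, s) for w, s in zip(weights, sigmas))

print(is_psd(K_mix))
```

A convex combination of PSD kernels is again PSD, which is why the mixture passes the same check as each individual kernel.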

Sharpen probability annotation matrix M:
Likelihood of intra-domain information
[image 18]: the temperature hyper-parameter
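
The exact sharpening formula is not reproduced in these notes. A common way to sharpen class probabilities with a temperature is a temperature-scaled softmax, sketched below under that assumption; the function name `sharpen` and the argument `tau` are illustrative and not taken from the paper.

```python
import numpy as np

def sharpen(logits, tau=0.5):
    """Temperature-scaled softmax over the class axis (assumed form).

    logits: array of shape (n_samples, n_classes)
    tau:    temperature; values below 1 make each row more peaked
    """
    z = logits / tau
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.0]])
p_plain = sharpen(logits, tau=1.0)  # ordinary softmax
p_sharp = sharpen(logits, tau=0.5)  # sharpened: more mass on the top class
print(p_plain, p_sharp)
```

Lowering `tau` pushes each row toward a one-hot vector, so confident predictions become more confident.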
SSR is defined by Q:

[image 20]: evaluates the similarity of the spatial prototypical information, using a kernel to compute the distances.
[image 21]: intra-domain structure of the target samples.
Both measures estimate the likelihood of target sample [image 22] having label [image 23].
Early in training, [image 24] is more reliable than [image 25]; late in training the opposite holds, so the following A-distance [9][10] is used to adjust their relative weights during training.
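
The weighting formula is not reproduced in these notes, so the sketch below is a guess at its general shape rather than the paper's definition. It assumes the proxy A-distance d_A = 2(1 - 2ε), where ε is the error of a classifier trained to tell source from target, and it assumes (hypothetically) that the prototype-based term gets the larger weight while the domains are still far apart. The names `S`, `M`, and `blend_reliability` are illustrative.

```python
import numpy as np

def proxy_a_distance(domain_error):
    """Proxy A-distance d_A = 2 * (1 - 2 * err), clipped to [0, 2].

    domain_error: error rate of a source-vs-target domain classifier.
    """
    return float(np.clip(2.0 * (1.0 - 2.0 * domain_error), 0.0, 2.0))

def blend_reliability(S, M, domain_error):
    """HYPOTHETICAL blend of two reliability matrices.

    S: prototype-based likelihoods, shape (n_target, n_classes)
    M: sharpened classifier likelihoods, same shape
    The weight on S is d_A / 2; this rule is an assumption for illustration.
    """
    w = proxy_a_distance(domain_error) / 2.0
    return w * S + (1.0 - w) * M

S = np.array([[0.9, 0.1]])
M = np.array([[0.6, 0.4]])
Q_early = blend_reliability(S, M, domain_error=0.0)   # domains easy to separate
Q_late = blend_reliability(S, M, domain_error=0.5)    # domains indistinguishable
print(Q_early, Q_late)
```

Under these assumptions the blend moves smoothly from one term to the other as the domain classifier's error rises toward chance.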


Weighted Optimal Transport

Reduce the wrong pair-wise transport
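Weighted Kantorovich problem [11]: offers another way to measure the distance between two distributions P and Q, namely by defining it as the minimum cost required to transport distribution P onto distribution Q.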

The Optimal Transport Divergence is defined as follows:

[image 27]
subject to the constraints [image 28] [image 29]
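
For discrete distributions, the minimum-cost transport described above becomes a linear program: minimize the total cost of a transport plan whose row sums and column sums match the two marginals. The sketch below solves that standard discrete Kantorovich problem with `scipy.optimize.linprog`. It is a generic illustration, not the RWOT objective; the names `discrete_ot`, `a`, `b`, and `C` are assumptions introduced here.

```python
import numpy as np
from scipy.optimize import linprog

def discrete_ot(a, b, C):
    """Solve min <gamma, C> s.t. gamma @ 1 = a, gamma.T @ 1 = b, gamma >= 0.

    a: source masses, shape (n,);  b: target masses, shape (m,)
    C: cost matrix, shape (n, m).  Returns (plan, total_cost).
    """
    n, m = C.shape
    # gamma is flattened row-major, so gamma[i, j] sits at index i * m + j.
    A_rows = np.kron(np.eye(n), np.ones((1, m)))  # sum_j gamma[i, j] = a[i]
    A_cols = np.kron(np.ones((1, n)), np.eye(m))  # sum_i gamma[i, j] = b[j]
    res = linprog(
        C.ravel(),
        A_eq=np.vstack([A_rows, A_cols]),
        b_eq=np.concatenate([a, b]),
        bounds=(0, None),
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    return res.x.reshape(n, m), res.fun

# Toy example on a line, with squared Euclidean cost.
x_src = np.array([0.0, 1.0])
x_tgt = np.array([1.0, 2.0])
a = np.array([0.5, 0.5])
b = np.array([0.5, 0.5])
C = (x_src[:, None] - x_tgt[None, :]) ** 2
plan, cost = discrete_ot(a, b, C)
print(plan, cost)
```

The masses `a` and `b` do not have to be uniform; any non-negative vectors with equal totals give a valid problem.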

For computational convenience, the cost is commonly defined as the square of the [image 30] [image 31], i.e. [image 32]. This yields the 2-Wasserstein distance:
[image 33]
More generally, the k-Wasserstein distance is:
[image 34]
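
For reference, the textbook forms of the quantities named in this section are given below. These are the standard definitions, written out because the formulas themselves are not reproduced in these notes; they are not copied from the paper, and the symbols γ (transport plan), Π(P, Q) (set of admissible plans), and c (cost function) are notation introduced here.

```latex
% Optimal transport divergence between P and Q for a cost c
\mathrm{OT}(P, Q) = \inf_{\gamma \in \Pi(P, Q)} \int c(x, y) \, \mathrm{d}\gamma(x, y)

% Marginal constraints that define the admissible plans \Pi(P, Q)
\int \gamma(x, y) \, \mathrm{d}y = P(x), \qquad
\int \gamma(x, y) \, \mathrm{d}x = Q(y)

% 2-Wasserstein distance: cost c(x, y) = \lVert x - y \rVert_2^2
W_2(P, Q) = \left( \inf_{\gamma \in \Pi(P, Q)}
  \int \lVert x - y \rVert_2^{2} \, \mathrm{d}\gamma(x, y) \right)^{1/2}

% k-Wasserstein distance
W_k(P, Q) = \left( \inf_{\gamma \in \Pi(P, Q)}
  \int \lVert x - y \rVert^{k} \, \mathrm{d}\gamma(x, y) \right)^{1/k}
```

Setting k = 2 in the last expression recovers the 2-Wasserstein distance.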

In the paper, the Optimal Transport Divergence is defined as follows:

Optimal Transport minimizes a global transportation effort or cost:
