title: Domain adaptation under structural causal models
url: Domain-adaptation-under-structural-causal-models
category:

  • Machine Learning
    tags:
  • Domain Adaptation
  • Theoretical Analysis
  • Paper Notes

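The paper builds a theoretical framework for domain adaptation (DA) from the viewpoint of structural causal models (SCMs), giving theoretical support for analyzing and comparing existing DA methods.
The paper argues that methods based on domain invariant projection (DIP) achieve lower error when the prediction problem is anticausal and the labels are distributed identically in the source and target domains. When the prediction problem is causal, or mixes causal and anticausal relations, or when the labels differ between source and target, DIP performs poorly. Building on this analysis, the paper proposes a new DA method, CIRM, and its variants, which outperform DIP-type methods on mixed causal and anticausal problems and when the labels are shifted.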

Domain Adaptation problem setting

In the DA problem we are given $M$ labeled source domains and one unlabeled target domain. The $m$-th source domain contains $n_m$ samples drawn i.i.d. from $\mathcal{P}^{(m)}$, namely $S^{(m)}=\left(\left(x_{1}^{(m)}, y_{1}^{(m)}\right), \cdots,\left(x_{n_{m}}^{(m)}, y_{n_{m}}^{(m)}\right)\right)$. The target domain contains $\tilde{n}$ samples drawn i.i.d. from $\tilde{\mathcal{P}}$, namely $\tilde{S}=\left(\left(\tilde{x}_{1}, \tilde{y}_{1}\right), \cdots,\left(\tilde{x}_{\tilde{n}}, \tilde{y}_{\tilde{n}}\right)\right)$. The algorithm, however, only observes the covariates, that is, the sample $\tilde{S}_{X}=\left(\tilde{x}_{1}, \cdots, \tilde{x}_{\tilde{n}}\right)$ from the marginal distribution $\tilde{\mathcal{P}}_{X}$.

A DA algorithm estimates a function $f_{\beta}$ that maps covariates to labels, where the parameter $\beta \in \Theta$ is chosen so that the target population risk is small. The target population risk of a predictor $f$ is defined as:

$$
\tilde{R}(f)=\mathbb{E}_{(X, Y) \sim \tilde{\mathcal{P}}}[l(f(X), Y)]
$$

Here the loss function $l$ defaults to the squared loss $l(x, y)=(x-y)^{2}$.
The attainable lower bound on this risk is limited by the parameter space $\Theta$ and equals $\tilde{R}(f_{\beta_{\text{oracle}}})$, where $\beta_{\text{oracle}} \in \underset{\beta \in \Theta}{\arg \min }\ \mathbb{E}_{(X, Y) \sim \tilde{\mathcal{P}}}\left[l\left(f_{\beta}(X), Y\right)\right]$.
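
As a worked special case (standard least-squares algebra, not taken from the paper): when $f_{\beta}(x)=x^{\top}\beta+\beta_{0}$, the loss is the squared loss, and the target covariance of $X$ is assumed invertible, the oracle has a closed form:

$$
\beta_{\text{oracle}}=\operatorname{Cov}_{\tilde{\mathcal{P}}}(X)^{-1} \operatorname{Cov}_{\tilde{\mathcal{P}}}(X, Y), \qquad \beta_{\text{oracle}, 0}=\mathbb{E}_{\tilde{\mathcal{P}}}[Y]-\mathbb{E}_{\tilde{\mathcal{P}}}[X]^{\top} \beta_{\text{oracle}}
$$

Evaluating it requires target labels, which a DA algorithm does not observe, so it only serves as a benchmark.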

Changes in the data distribution can be classified as follows:

  • Covariate Shift: the marginal distribution $P(X)$ of the covariates $X$ changes, while the labeling function $P(Y|X)$ stays the same.
  • Target Shift (Label Shift): the marginal distribution $P(Y)$ of the label $Y$ changes, while the conditional distribution $P(X|Y)$ stays the same.
  • Conditional Shift: the marginal distribution $P(Y)$ of the label $Y$ stays the same, while the conditional distribution $P(X|Y)$ changes.
  • Generalized Target Shift: the marginal distribution $P(Y)$ of the label $Y$ changes, and the conditional distribution $P(X|Y)$ also changes, subject to certain conditions. Note that if $P(X|Y)$ were allowed to change arbitrarily, the problem would become unlearnable.
  • Model Shift: the labeling function $P(Y|X)$ changes, and the marginal distribution $P(X)$ changes as well.
  • Concept Shift: the labeling function $P(Y|X)$ changes.

These shifts are summarized in "Mapping conditional distributions for domain adaptation under generalized target shift". Domain adaptation mainly considers the first four cases, in which some component of the distribution ($P(Y|X)$, or $P(X|Y)$ up to a constrained change) is assumed stable. The fifth case has also been studied, and the last one is usually studied in streaming-data problems.

Modeling how the data are generated: structural causal models with noise interventions

How the data are generated

Assume that the data $\left(X^{(m)}, Y^{(m)}\right)$ of the $m$-th source domain, with distribution $\mathcal{P}^{(m)}$, are generated as follows:

$$
\left[\begin{array}{l}
X^{(m)} \\
Y^{(m)}
\end{array}\right]=\left[\begin{array}{cc}
\mathbf{B} & b \\
\omega^{\top} & 0
\end{array}\right]\left[\begin{array}{l}
X^{(m)} \\
Y^{(m)}
\end{array}\right]+g\left(a^{(m)}, \varepsilon^{(m)}\right)
$$

The target-domain data, with distribution $\tilde{\mathcal{P}}$, are generated in the same way:

$$
\left[\begin{array}{c}
\tilde{X} \\
\tilde{Y}
\end{array}\right]=\left[\begin{array}{ll}
\mathbf{B} & b \\
\omega^{\top} & 0
\end{array}\right]\left[\begin{array}{c}
\tilde{X} \\
\tilde{Y}
\end{array}\right]+g(\tilde{a}, \tilde{\varepsilon})
$$

The parameters of the generating process describe, on the one hand, the influence of the environment and, on the other hand, the intrinsic causal structure of the data. The next two subsections cover them in turn. Open question: why describe the data in the form of equations?

Influence of the environment

Here $\varepsilon^{(m)}, \tilde{\varepsilon} \sim \mathcal{E}$ come from the same noise distribution, and $a^{(m)}, \tilde{a}$ describe the change of environment. Noise and environment together determine how the covariates and labels are generated, and the function $g$ specifies how the source and target domains differ.
Under this model, the differences between environments come from the intervention of the environment variables $a^{(m)}, \tilde{a}$ on the data-generating process, and $g$ specifies exactly what effect a change of environment has on the data. The theoretical analysis in the paper studies a simple form of influence, the mean shift noise intervention, where $g: \mathbb{R}^{d+1} \times \mathbb{R}^{d+1} \rightarrow \mathbb{R}^{d+1}$ is defined as $(a, \varepsilon) \mapsto a+\varepsilon$.
More complex forms exist as well, for example the variance shift noise intervention ($g:(a, \varepsilon) \mapsto a \odot \varepsilon$). The paper examines these more complex cases in its later experiments.

Causal structure of the data

The causal relations in the generated data $(X, Y)$ are determined by the unknown, fixed parameters $\mathbf{B}$, $b$ and $\omega$. The matrix $I-\left[\begin{array}{cc}\mathbf{B} & b \\ \omega^{\top} & 0\end{array}\right]$ is assumed to be invertible, so that the structural equations have a unique solution. Depending on these parameters (in particular on whether $\omega$ or $b$ is nonzero), the causal relation between the covariates $X$ and the label $Y$ falls into one of three cases: causal prediction ($X$ causes $Y$, predicting the effect from the cause), anticausal prediction ($Y$ causes $X$, predicting the cause from the effect), and anticausal prediction when $Y$ is intervened on ($Y$ causes $X$, predicting the cause from the effect, while the label $Y$ is also affected by the environment).
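
The generating process can be simulated directly. The following is a minimal sketch under the mean shift intervention $g(a, \varepsilon)=a+\varepsilon$, assuming standard normal noise; the function name `sample_linear_scm` and all parameter values are illustrative choices, not taken from the paper.

```python
import numpy as np

def sample_linear_scm(B, b, omega, a, n, rng):
    """Draw n samples of (X, Y) from [X; Y] = A [X; Y] + a + eps.

    A = [[B, b], [omega^T, 0]]. The noise eps is standard normal (an
    assumption of this sketch) and has the same law in every domain;
    `a` is the mean shift of the domain.
    """
    d = B.shape[0]
    A = np.zeros((d + 1, d + 1))
    A[:d, :d] = B
    A[:d, d] = b
    A[d, :d] = omega
    eps = rng.normal(size=(n, d + 1))
    # Each sample solves (I - A) z = a + eps, so I - A must be invertible.
    Z = np.linalg.solve(np.eye(d + 1) - A, (a + eps).T).T
    return Z[:, :d], Z[:, d]

rng = np.random.default_rng(0)
d = 2
B = np.zeros((d, d))
b = np.array([1.0, -1.0])            # Y -> X edges: anticausal prediction
omega = np.zeros(d)                  # no X -> Y edge
a_src = np.zeros(d + 1)              # source environment: no shift
a_tar = np.array([2.0, 0.0, 0.0])    # target environment shifts X_1 only

X_src, y_src = sample_linear_scm(B, b, omega, a_src, 20000, rng)
X_tar, y_tar = sample_linear_scm(B, b, omega, a_tar, 20000, rng)
mean_shift_X = X_tar.mean(axis=0) - X_src.mean(axis=0)
mean_shift_Y = y_tar.mean() - y_src.mean()
```

Setting `omega` nonzero and `b` to zero gives the causal case, and a nonzero last entry of `a` intervenes on $Y$.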

Reference and DA methods

Reference methods

OLSTar

Fit the optimal linear model directly on labeled target data. This can be viewed as the performance upper bound (error lower bound) for the DA problem:

$$
\begin{aligned}
f_{\text{OLSTar}}(x) &:=x^{\top} \beta_{\text{OLSTar}}+\beta_{\text{OLSTar}, 0} \\
\beta_{\text{OLSTar}}, \beta_{\text{OLSTar}, 0} &:=\underset{\beta, \beta_{0}}{\arg \min }\ \mathbb{E}_{(X, Y) \sim \tilde{\mathcal{P}}}\left(Y-X^{\top} \beta-\beta_{0}\right)^{2} .
\end{aligned}
$$

Causal

The causal linear model, built from the parameters of the linear SCM, attains the lowest prediction risk when the covariates $X$ may be perturbed arbitrarily. In the DA problem, however, the target covariates $\tilde{X}$ provide additional information, so the causal model does not necessarily reach the same minimal target risk as OLSTar.

$$
\begin{aligned}
f_{\text{Causal}}(x) &:=x^{\top} \beta_{\text{Causal}} \\
\beta_{\text{Causal}} &:=\omega
\end{aligned}
$$

Source-Only Baseline

OLSSrc: the linear predictor fitted on the $m$-th source domain:

$$
\begin{aligned}
f_{\text{OLSSrc}}^{(m)}(x) &:=x^{\top} \beta_{\text{OLSSrc}}^{(m)}+\beta_{\mathrm{OLSSrc}, 0}^{(m)} \\
\beta_{\mathrm{OLSSrc}}^{(m)}, \beta_{\mathrm{OLSSrc}, 0}^{(m)} &:=\underset{\beta, \beta_{0}}{\arg \min }\ \mathbb{E}_{(X, Y) \sim \mathcal{P}^{(m)}}\left(Y-X^{\top} \beta-\beta_{0}\right)^{2}
\end{aligned}
$$

SrcPool: the linear predictor fitted on all source domains pooled together:

$$
\begin{aligned}
f_{\text{SrcPool}}(x) &:=x^{\top} \beta_{\text{SrcPool}}+\beta_{\text{SrcPool}, 0} \\
\beta_{\text{SrcPool}}, \beta_{\text{SrcPool}, 0} &:=\underset{\beta, \beta_{0}}{\arg \min }\ \mathbb{E}_{(X, Y) \sim \mathcal{P}_{\text{allsrc}}}\left(Y-X^{\top} \beta-\beta_{0}\right)^{2}
\end{aligned}
$$
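
The three least-squares references differ only in which data they are fitted on. The sketch below shows finite-sample versions; the function names are illustrative, and the pooled distribution is assumed to be the plain concatenation of the source samples (so each domain is weighted by its sample size).

```python
import numpy as np

def fit_ols(X, y):
    """Least squares with intercept; returns (beta, beta0)."""
    design = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[:-1], coef[-1]

def fit_olssrc(sources, m):
    """OLSSrc: fit on the m-th source domain only (0-based index here)."""
    X, y = sources[m]
    return fit_ols(X, y)

def fit_srcpool(sources):
    """SrcPool: fit on all source samples stacked together."""
    X = np.vstack([X for X, _ in sources])
    y = np.concatenate([y for _, y in sources])
    return fit_ols(X, y)

def fit_olstar(X_tar, y_tar):
    """OLSTar: oracle fit that uses target labels (unavailable in practice)."""
    return fit_ols(X_tar, y_tar)

rng = np.random.default_rng(1)
w, c = np.array([0.5, -1.0]), 2.0
sources = []
for shift in (0.0, 3.0):
    X = rng.normal(size=(50, 2)) + shift
    sources.append((X, X @ w + c))   # noiseless labels: every fit recovers (w, c)

beta_src, beta0_src = fit_olssrc(sources, 0)
beta_pool, beta0_pool = fit_srcpool(sources)
```

With noiseless labels generated by one fixed linear rule, the per-domain and pooled fits coincide; they start to differ once the relation between $X$ and $Y$ changes across domains.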

Domain Adaptation methods

Domain Invariant Projection (DIP)

DIP aims to learn a subspace shared by the source and target domains and to minimize the source prediction loss within that subspace. DIP-type methods usually add a regularization term that pulls the source and target domains closer together. The paper analyzes a simpler form: the regularization coefficient is set to $\infty$ (so the penalty becomes a hard constraint), only one source domain is used, the feature extractor is linear, the predictor on top of it is the identity map, and the discrepancy between distributions is measured by the difference of means:

$$
\begin{aligned}
f_{\mathrm{DIP}}^{(m)}(x) &:=x^{\top} \beta_{\mathrm{DIP}}^{(m)}+\beta_{\mathrm{DIP}, 0}^{(m)} \\
\beta_{\mathrm{DIP}}^{(m)}, \beta_{\mathrm{DIP}, 0}^{(m)} &:=\underset{\beta, \beta_{0}}{\arg \min }\ \mathbb{E}_{(X, Y) \sim \mathcal{P}^{(m)}}\left(Y-X^{\top} \beta-\beta_{0}\right)^{2} \\
& \text{ s.t. } \mathbb{E}_{X \sim \mathcal{P}_{X}^{(m)}}\left[X^{\top} \beta\right]=\mathbb{E}_{X \sim \tilde{\mathcal{P}}_{X}}\left[X^{\top} \beta\right] .
\end{aligned}
$$
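
With empirical means, the DIP constraint is a single linear equation $\delta^{\top}\beta=0$, where $\delta$ is the difference between the source and target covariate means. A minimal finite-sample sketch (the function name `fit_dip_mean` and the toy data are assumptions for illustration) solves the constrained problem by restricting $\beta$ to the null space of $\delta$:

```python
import numpy as np

def fit_dip_mean(X_src, y_src, X_tar):
    """Source least squares subject to equal projected means in source and target."""
    delta = X_src.mean(axis=0) - X_tar.mean(axis=0)
    # The constraint E_src[X^T beta] = E_tar[X^T beta] reads delta^T beta = 0.
    _, s, Vt = np.linalg.svd(delta[None, :], full_matrices=True)
    rank = int(s[0] > 1e-12)             # 0 if the means already coincide
    N = Vt[rank:].T                      # columns span {beta : delta^T beta = 0}
    # Write beta = N @ gamma and solve an unconstrained problem in gamma.
    design = np.column_stack([X_src @ N, np.ones(len(X_src))])
    coef, *_ = np.linalg.lstsq(design, y_src, rcond=None)
    return N @ coef[:-1], coef[-1]

rng = np.random.default_rng(2)
X_src = rng.normal(size=(400, 3))
y_src = X_src @ np.ones(3) + 0.1 * rng.normal(size=400)
X_tar = rng.normal(size=(400, 3)) + np.array([2.0, 0.0, 0.0])  # unlabeled target

beta_dip, beta0_dip = fit_dip_mean(X_src, y_src, X_tar)
gap = (X_src @ beta_dip).mean() - (X_tar @ beta_dip).mean()
```

Only unlabeled target covariates enter the fit, and `gap` is zero up to floating-point error by construction.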

Conditional Invariant Penalty (CIP)

Whereas DIP aligns source and target data, CIP directly uses the label information $Y$ in multiple source domains to find the conditionally invariant components of the covariates $X$. As with DIP, the paper analyzes a simple form:

$$
\begin{gathered}
f_{\mathrm{CIP}}(x):=x^{\top} \beta_{\mathrm{CIP}}+\beta_{\mathrm{CIP}, 0} \\
\beta_{\mathrm{CIP}}, \beta_{\mathrm{CIP}, 0}:=\underset{\beta, \beta_{0}}{\arg \min }\ \frac{1}{M} \sum_{m=1}^{M} \mathbb{E}_{(X, Y) \sim \mathcal{P}^{(m)}}\left(Y-X^{\top} \beta-\beta_{0}\right)^{2} \\
\text{ s.t. } \mathbb{E}_{(X, Y) \sim \mathcal{P}^{(m)}}\left[X^{\top} \beta \mid Y\right]=\mathbb{E}_{(X, Y) \sim \mathcal{P}^{(1)}}\left[X^{\top} \beta \mid Y\right] \text{ a.s., } \forall m \in\{2, \cdots, M\}.
\end{gathered}
$$
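
In finite samples the hard conditional constraint is usually relaxed to a penalty. The sketch below is one possible relaxation and not the paper's estimator: it assumes $\mathbb{E}[X \mid Y=y]$ is linear in $y$ in every source domain, estimates the intercept and slope per domain, and penalizes their mismatch with the first domain. The names `conditional_mean_coeffs`, `fit_cip_penalized` and the value of `lam` are illustrative.

```python
import numpy as np

def conditional_mean_coeffs(X, y):
    """Fit E[X | Y = y] ~ c + s * y per coordinate; returns (c, s), each of shape (d,)."""
    design = np.column_stack([np.ones(len(y)), y])
    coef, *_ = np.linalg.lstsq(design, X, rcond=None)
    return coef[0], coef[1]

def fit_cip_penalized(sources, lam=100.0):
    """Average source risk plus lam * squared conditional-mean mismatches."""
    M, d = len(sources), sources[0][0].shape[1]
    c1, s1 = conditional_mean_coeffs(*sources[0])
    rows = []
    for X, y in sources[1:]:
        c, s = conditional_mean_coeffs(X, y)
        rows += [c - c1, s - s1]         # beta should be orthogonal to both
    C = np.array(rows).reshape(-1, d)
    G = np.zeros((d + 1, d + 1))
    h = np.zeros(d + 1)
    for X, y in sources:
        D = np.column_stack([X, np.ones(len(X))])
        G += D.T @ D / (M * len(X))      # each domain gets weight 1/M
        h += D.T @ y / (M * len(X))
    P = np.zeros((d + 1, d + 1))
    P[:d, :d] = C.T @ C                  # the intercept is not penalized
    coef = np.linalg.solve(G + lam * P, h)
    return coef[:-1], coef[-1]

rng = np.random.default_rng(3)
sources = []
for a_X in (np.zeros(3), np.array([3.0, 0.0, 0.0])):   # environments shift X_1 only
    y = rng.normal(size=20000)
    X = np.outer(y, np.ones(3)) + a_X + rng.normal(size=(20000, 3))
    sources.append((X, y))

beta_cip, beta0_cip = fit_cip_penalized(sources)
```

In this toy anticausal setup the first coordinate is the only one whose conditional mean moves across environments, so the penalty drives its weight towards zero while the other two coordinates keep their predictive weight.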

Insights and conclusions

In causal prediction (Covariate Shift), unlabeled target data do not help

In causal prediction, the covariates $X$ determine the label $Y$, and the environment acts on $X$. That is, $P(X)$ changes between source and target while $P(Y|X)$ stays the same, which corresponds to Covariate Shift.

Causal-Prediction.png

In this case the change in $P(X)$ is unrelated to $P(Y|X)$. The model should learn $P(Y|X)$ on the source domain, and the extra information about $P(X)$ provided by the target domain gives no gain for the final prediction task. The baseline OLSSrc therefore attains the same error as the upper-bound method OLSTar.

Here DIP actually hurts target performance compared with the baseline OLSSrc, because it uses the target information about $P(X)$, which should be irrelevant, to force the source and target feature representations into alignment.

DIP-Results.png

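The results are in the first block of rows of the table: DIP performs poorly on the target domain, while OLSTar, Causal and OLSSrc all perform well.

In anticausal prediction (Conditional Shift), DIP is the most effective method, but source and target feature spaces must not be aligned blindly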

In anticausal prediction, the label $Y$ determines the covariates $X$, and the environment acts on $X$. Between source and target, $P(Y)$ stays the same while $P(X|Y)$, and with it $P(X)$, changes. This corresponds to Conditional Shift.

Anticausal-Prediction.png

In this case $P(Y)$ is fixed but $P(X|Y)$ shifts, so the joint distribution $P(X, Y)$ changes. Since $P(Y|X)=P(X|Y) P(Y) / P(X)$, the same shift that moves the marginal $P(X)$ also moves the conditional $P(Y|X)$. Unlike the causal case, the change in $P(Y|X)$ is therefore tied to the change in $P(X)$.
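
A one-dimensional worked example makes this link explicit. Assume, for illustration only, an anticausal model $Y=\varepsilon_{Y}$, $X=bY+a+\varepsilon_{X}$ with independent zero-mean Gaussian noises of variance $\sigma_{Y}^{2}$ and $1$, where $a$ is the environment-specific shift on $X$. Then

$$
\mathbb{E}[X]=a, \qquad \mathbb{E}[Y \mid X=x]=\frac{b \sigma_{Y}^{2}}{b^{2} \sigma_{Y}^{2}+1}(x-a) .
$$

The slope is the same in every environment, but the intercept moves with $a$, and $a$ is exactly the mean of $X$. The change in $P(Y|X)$ can thus be read off from the change in $P(X)$, which is the information that unlabeled target covariates provide.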

In anticausal prediction (Conditional Shift), DIP works best. DIP tries to align $P(X)$ between the source and target domains and then learns $P(Y|X)$ in the aligned subspace, so the resulting model generalizes well on both the source and the target domain. The results are in the second block of rows of the table: DIP performs very well, close to the upper-bound method OLSTar.

The paper also runs a small additional experiment in which the way source and target distributions are aligned is slightly modified (the method cannot access [formula missing] and only obtains [formula missing]). Performance gets worse, which shows that blindly aligning invariant representations is harmful. See the comparison of DIP and DIPAbs in the second block of rows of the table.

Perturbations of the label distribution hurt DIP

A new task is considered here: anticausal prediction in which the label $Y$ is also affected by the environment. In this task both $P(X)$ and $P(Y)$ change between source and target, and there exists a transformation $\mathcal{T}$ such that $P(\mathcal{T}(X)|Y)$ stays the same. This type of shift corresponds to Generalized Target Shift, where the required restriction is precisely that $P(\mathcal{T}(X)|Y)$ is invariant.
Anticausal-Prediction-with-Y-Intervened.png
In this problem the change in $P(Y|X)$ depends on both $P(X)$ and $P(Y)$. DIP only aligns the marginal $P(X)$, so the $P(Y|X)$ it learns is still affected by the unknown target $P(Y)$ and does not perform well on the target domain. The results are in the third block of rows of the table: in this setting neither the baselines nor DIP achieve good performance.

Additional assumptions guarantee that DIP outperforms the baseline

Cases where DIP fails

The main idea of DIP is to model $P(Y|X)$ after removing the change in $P(X)$. It can therefore fail for three reasons:

  • $P(Y|X)$ is unrelated to $P(X)$, so aligning $P(X)$ is unnecessary.
  • DIP fails to model and remove the change in $P(X)$ correctly, for example when $P(X)$ changes in its variance while the algorithm aligns the mean.
  • $P(Y|X)$ depends not only on $P(X)$ but also on other factors, for example when $P(Y)$ also affects the labeling function and $Y$ itself is intervened on by the environment.
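
The second failure mode can be checked in one line. Assume, for illustration, a variance shift intervention $g(a, \varepsilon)=a \odot \varepsilon$ with zero-mean noise, and write $A$ for the coefficient matrix of the linear SCM (notation introduced here), with $I-A$ invertible. Then in every environment

$$
\mathbb{E}\left[\begin{array}{l}
X \\
Y
\end{array}\right]=(I-A)^{-1}\left(a \odot \mathbb{E}[\varepsilon]\right)=0,
$$

so $\mathbb{E}\left[X^{\top} \beta\right]$ is zero in both source and target for every $\beta$. The mean-matching constraint of DIP is then satisfied trivially and removes nothing, even though the variance of $X$ differs across environments.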

Risk guarantee for DIP

Assumptions under which DIP has a performance guarantee (see Assumption 1 in the paper):

  • The data-generating process follows a linear SCM.
  • The prediction task is anticausal ($Y$ causes $X$).
  • $P(Y)$ is not affected by the change of environment, that is, $P(Y|X)$ is affected only through $P(X)$.
  • DIP aligns $P(X)$ in the right way, for example $P(X)$ changes in its mean and the algorithm also aligns the mean.

Under these assumptions, the errors of the upper-bound method OLSTar, the baseline (lower-bound) method OLSSrc and DIP satisfy the following relations:

Theorem-1.png

The error of the upper-bound method OLSTar is related to the spread of the label $Y$ and negatively related to the strength of the effect of $Y$ on the covariates $X$. The matrix that appears in the effect term can be read as decoupling the correlations within the covariates $X$, so that the term measures the true effect of $Y$ on $X$.

OLSSrc, which only uses source data, has the error of OLSTar plus an extra loss term that depends on the difference in the environment's effect on the covariates $X$, namely $a^{(1)}_{X}-\tilde{a}_{X}$.

The denominator of the DIP error differs from that of OLSTar: one factor is replaced by $\Sigma^{-\frac{1}{2}} G_{DIP}^{(1)} \Sigma^{-\frac{1}{2}}$, where $G_{DIP}^{(1)}$ can be understood as the part of the covariates $X$ that remains after the domain invariant projection. When the source and target domains coincide, $G_{DIP}^{(1)}$ becomes the identity matrix, this factor reduces to $\Sigma^{-1}$, and DIP has the same error as OLSTar.

On top of this, the paper further assumes that the $d$ coordinates of the covariates are independent, and that the difference between the source and target environments' effects on the covariates $X$ is drawn from a Gaussian distribution $\mathcal{N}\left(0, \tau \mathbb{I}_{d}^{2}\right)$. This gives:
Corollary2.png

CIP can handle shifts in the label $Y$

Assumptions under which CIP has a performance guarantee (Assumption 2):

  • The data-generating process follows a linear SCM.
  • The prediction task is anticausal ($Y$ causes $X$).
  • The covariates $X$ contain conditionally invariant components across environments, that is, the environment perturbations do not span the whole feature space: $\operatorname{dim}\left(\operatorname{span}\left(a_{X}^{(2)}-a_{X}^{(1)}, \ldots, a_{X}^{(M)}-a_{X}^{(1)}\right)\right)=p \leq d-1$. In addition, the target intervention on the covariates $X$ lies in the same span: $\tilde{a}_{X}-a_{X}^{(1)} \in \operatorname{span}\left(a_{X}^{(2)}-a_{X}^{(1)}, \ldots, a_{X}^{(M)}-a_{X}^{(1)}\right)$.
  • The algorithm aligns the data distributions in the right way.
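
In a simulation, where the interventions $a_{X}^{(m)}$ are chosen by the experimenter, the span condition can be checked with matrix ranks. This is a small sketch; `check_cip_span` is a hypothetical helper, and in real data the interventions are unobserved, so the check only applies to synthetic setups.

```python
import numpy as np

def check_cip_span(a_sources, a_target, tol=1e-8):
    """Return (p, p <= d - 1, target_in_span) for interventions on X.

    a_sources: M rows a_X^(1), ..., a_X^(M); a_target: the target intervention.
    """
    a_sources = np.asarray(a_sources, dtype=float)
    a_target = np.asarray(a_target, dtype=float)
    d = a_sources.shape[1]
    diffs = a_sources[1:] - a_sources[0]          # rows a_X^(m) - a_X^(1)
    p = int(np.linalg.matrix_rank(diffs, tol=tol)) if diffs.size else 0
    # The target difference lies in the span iff adding it keeps the rank at p.
    stacked = np.vstack([diffs, a_target - a_sources[0]])
    in_span = int(np.linalg.matrix_rank(stacked, tol=tol)) == p
    return p, p <= d - 1, in_span

a_sources = [[0, 0, 0], [1, 0, 0], [0, 2, 0]]
ok = check_cip_span(a_sources, [3, -1, 0])    # (2, True, True)
bad = check_cip_span(a_sources, [0, 0, 1])    # (2, True, False)
```

In the second call the target is shifted along a direction that no source environment has perturbed, which is the situation the assumption rules out.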

Under these assumptions, the target-domain errors of OLSTar, CIP and CIRM can be derived:
Theorem5-1.png
Theorem5-2.png

Similarly, the error of CIP is related to the spread of the label $Y$ and to $(\tilde{a}_{Y}-\overline{a}_{Y})^{2}$, and it is negatively related to the effect of $Y$ on the part of $X$ that remains after the conditionally invariant projection.

The CIRM algorithm

CIRM.png

CIRM combines DIP and CIP. It first uses CIP to compute the conditionally invariant component of the covariates $X$, and then uses that component as an approximation of the unknown target label $Y$. Once the approximated $Y$ is removed from the covariates $X$, the residual of the covariates is no longer affected by the environment's intervention on $Y$. The final prediction therefore uses the invariant component together with the residual, aligned between the source and target domains.

Compared with CIP, CIRM not only uses the conditionally invariant part of the covariates $X$, but also tries to align the residual of $X$ between the target and source domains in order to predict $Y$, and thereby achieves better performance.
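
These notes describe CIRM only in words, so the following is a loose sketch of the "remove the proxy, keep the residual" step and not the paper's exact procedure. `beta_inv` is a hypothetical stand-in for the output of a CIP fit, and the function names are illustrative.

```python
import numpy as np

def fit_proxy_slopes(X_src, beta_inv):
    """Regress each coordinate of X on the invariant proxy X @ beta_inv (source data)."""
    proxy = X_src @ beta_inv
    pc = proxy - proxy.mean()
    Xc = X_src - X_src.mean(axis=0)
    return Xc.T @ pc / (pc @ pc)         # one slope per coordinate of X

def residual_features(X, beta_inv, theta):
    """Remove the part of X explained by the proxy; usable on source or target X."""
    return X - np.outer(X @ beta_inv, theta)

rng = np.random.default_rng(4)
X_src = rng.normal(size=(300, 3))
beta_inv = np.array([0.0, 0.5, 0.5])     # hypothetical invariant direction
theta = fit_proxy_slopes(X_src, beta_inv)
R_src = residual_features(X_src, beta_inv, theta)

# On the data used to fit theta, the residuals are uncorrelated with the proxy.
proxy_src = X_src @ beta_inv
sample_cov = (R_src - R_src.mean(axis=0)).T @ (proxy_src - proxy_src.mean()) / len(X_src)
```

The same `theta` would then be applied to the target covariates, and the residual features of both domains could be aligned with a DIP-style mean constraint before the final regression.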