期望最大算法的目的是解决具有隐变量的混合模型的参数估计(极大似然估计)。MLE 对 7.期望最大 - 图1#card=math&code=p%28x%7C%5Ctheta%29#crop=0&crop=0&crop=1&crop=1&id=BP9T5&originHeight=26&originWidth=55&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=) 参数的估计记为:7.期望最大 - 图2#card=math&code=%5Ctheta%7BMLE%7D%3D%5Cmathop%7Bargmax%7D%5Climits%5Ctheta%5Clog%20p%28x%7C%5Ctheta%29#crop=0&crop=0&crop=1&crop=1&id=i7VFy&originHeight=41&originWidth=237&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)。EM 算法对这个问题的解决方法是采用迭代的方法:

7.期望最大 - 图3%5Dp(z%7Cx%2C%5Ctheta%5Et)dz%3D%5Cmathbb%7BE%7D%7Bz%7Cx%2C%5Ctheta%5Et%7D%5B%5Clog%20p(x%2Cz%7C%5Ctheta)%5D%0A#card=math&code=%5Ctheta%5E%7Bt%2B1%7D%3D%5Cmathop%7Bargmax%7D%5Climits%7B%5Ctheta%7D%5Cintz%5Clog%20%5Bp%28x%2Cz%7C%5Ctheta%29%5Dp%28z%7Cx%2C%5Ctheta%5Et%29dz%3D%5Cmathbb%7BE%7D%7Bz%7Cx%2C%5Ctheta%5Et%7D%5B%5Clog%20p%28x%2Cz%7C%5Ctheta%29%5D%0A#crop=0&crop=0&crop=1&crop=1&id=Dv5Rf&originHeight=53&originWidth=573&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

这个公式包含了迭代的两步:

  1. E step:计算 7.期望最大 - 图4#card=math&code=%5Clog%20p%28x%2Cz%7C%5Ctheta%29#crop=0&crop=0&crop=1&crop=1&id=HOLcs&originHeight=26&originWidth=104&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=) 在概率分布 7.期望最大 - 图5#card=math&code=p%28z%7Cx%2C%5Ctheta%5Et%29#crop=0&crop=0&crop=1&crop=1&id=zIlSb&originHeight=27&originWidth=82&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=) 下的期望
  2. M step:计算使这个期望最大化的参数得到下一个 EM 步骤的输入

求证:7.期望最大 - 图6%5Cle%5Clog%20p(x%7C%5Ctheta%5E%7Bt%2B1%7D)#card=math&code=%5Clog%20p%28x%7C%5Ctheta%5Et%29%5Cle%5Clog%20p%28x%7C%5Ctheta%5E%7Bt%2B1%7D%29#crop=0&crop=0&crop=1&crop=1&id=rxbnB&originHeight=29&originWidth=231&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

证明:7.期望最大 - 图7%3D%5Clog%20p(z%2Cx%7C%5Ctheta)-%5Clog%20p(z%7Cx%2C%5Ctheta)#card=math&code=%5Clog%20p%28x%7C%5Ctheta%29%3D%5Clog%20p%28z%2Cx%7C%5Ctheta%29-%5Clog%20p%28z%7Cx%2C%5Ctheta%29#crop=0&crop=0&crop=1&crop=1&id=JcSUd&originHeight=26&originWidth=346&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=),对左右两边求积分:

7.期望最大 - 图8%5Clog%20p(x%7C%5Ctheta)dz%3D%5Clog%20p(x%7C%5Ctheta)%0A#card=math&code=Left%3A%5Cint_zp%28z%7Cx%2C%5Ctheta%5Et%29%5Clog%20p%28x%7C%5Ctheta%29dz%3D%5Clog%20p%28x%7C%5Ctheta%29%0A#crop=0&crop=0&crop=1&crop=1&id=Bfvs5&originHeight=51&originWidth=387&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

7.期望最大 - 图9%5Clog%20p(x%2Cz%7C%5Ctheta)dz-%5Cint_zp(z%7Cx%2C%5Ctheta%5Et)%5Clog%20p(z%7Cx%2C%5Ctheta)dz%3DQ(%5Ctheta%2C%5Ctheta%5Et)-H(%5Ctheta%2C%5Ctheta%5Et)%0A#card=math&code=Right%3A%5Cint_zp%28z%7Cx%2C%5Ctheta%5Et%29%5Clog%20p%28x%2Cz%7C%5Ctheta%29dz-%5Cint_zp%28z%7Cx%2C%5Ctheta%5Et%29%5Clog%20p%28z%7Cx%2C%5Ctheta%29dz%3DQ%28%5Ctheta%2C%5Ctheta%5Et%29-H%28%5Ctheta%2C%5Ctheta%5Et%29%0A#crop=0&crop=0&crop=1&crop=1&id=yng09&originHeight=51&originWidth=756&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

所以:

7.期望最大 - 图10%3DQ(%5Ctheta%2C%5Ctheta%5Et)-H(%5Ctheta%2C%5Ctheta%5Et)%0A#card=math&code=%5Clog%20p%28x%7C%5Ctheta%29%3DQ%28%5Ctheta%2C%5Ctheta%5Et%29-H%28%5Ctheta%2C%5Ctheta%5Et%29%0A#crop=0&crop=0&crop=1&crop=1&id=tIvRA&originHeight=27&originWidth=279&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

由于 7.期望最大 - 图11%3D%5Cintzp(z%7Cx%2C%5Ctheta%5Et)%5Clog%20p(x%2Cz%7C%5Ctheta)dz#card=math&code=Q%28%5Ctheta%2C%5Ctheta%5Et%29%3D%5Cint_zp%28z%7Cx%2C%5Ctheta%5Et%29%5Clog%20p%28x%2Cz%7C%5Ctheta%29dz#crop=0&crop=0&crop=1&crop=1&id=jLKPo&originHeight=51&originWidth=330&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=),而 ![](https://g.yuque.com/gr/latex?%5Ctheta%5E%7Bt%2B1%7D%3D%5Cmathop%7Bargmax%7D%5Climits%7B%5Ctheta%7D%5Cintz%5Clog%20%5Bp(x%2Cz%7C%5Ctheta)%5Dp(z%7Cx%2C%5Ctheta%5Et)dz#card=math&code=%5Ctheta%5E%7Bt%2B1%7D%3D%5Cmathop%7Bargmax%7D%5Climits%7B%5Ctheta%7D%5Cint_z%5Clog%20%5Bp%28x%2Cz%7C%5Ctheta%29%5Dp%28z%7Cx%2C%5Ctheta%5Et%29dz#crop=0&crop=0&crop=1&crop=1&id=k0IVm&originHeight=53&originWidth=377&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=),所以 7.期望最大 - 图12%5Cge%20Q(%5Ctheta%5Et%2C%5Ctheta%5Et)#card=math&code=Q%28%5Ctheta%5E%7Bt%2B1%7D%2C%5Ctheta%5Et%29%5Cge%20Q%28%5Ctheta%5Et%2C%5Ctheta%5Et%29#crop=0&crop=0&crop=1&crop=1&id=LVvVI&originHeight=29&originWidth=200&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)。要证 7.期望最大 - 图13%5Cle%5Clog%20p(x%7C%5Ctheta%5E%7Bt%2B1%7D)#card=math&code=%5Clog%20p%28x%7C%5Ctheta%5Et%29%5Cle%5Clog%20p%28x%7C%5Ctheta%5E%7Bt%2B1%7D%29#crop=0&crop=0&crop=1&crop=1&id=fRHQM&originHeight=29&originWidth=231&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=),需证:7.期望最大 - 图14%5Cge%20H(%5Ctheta%5E%7Bt%2B1%7D%2C%5Ctheta%5Et)#card=math&code=H%28%5Ctheta%5Et%2C%5Ctheta%5Et%29%5Cge%20H%28%5Ctheta%5E%7Bt%2B1%7D%2C%5Ctheta%5Et%29#crop=0&crop=0&crop=1&crop=1&id=JKC2B&originHeight=29&originWidth=204&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=):

7.期望最大 - 图15-H(%5Ctheta%5E%7Bt%7D%2C%5Ctheta%5Et)%26%3D%5Cint_zp(z%7Cx%2C%5Ctheta%5E%7Bt%7D)%5Clog%20p(z%7Cx%2C%5Ctheta%5E%7Bt%2B1%7D)dz-%5Cint_zp(z%7Cx%2C%5Ctheta%5Et)%5Clog%20p(z%7Cx%2C%5Ctheta%5E%7Bt%7D)dz%5Cnonumber%5C%5C%0A%26%3D%5Cint_zp(z%7Cx%2C%5Ctheta%5Et)%5Clog%5Cfrac%7Bp(z%7Cx%2C%5Ctheta%5E%7Bt%2B1%7D)%7D%7Bp(z%7Cx%2C%5Ctheta%5Et)%7D%3D-KL(p(z%7Cx%2C%5Ctheta%5Et)%2Cp(z%7Cx%2C%5Ctheta%5E%7Bt%2B1%7D))%5Cle0%0A%5Cend%7Balign%7D%0A#card=math&code=%5Cbegin%7Balign%7DH%28%5Ctheta%5E%7Bt%2B1%7D%2C%5Ctheta%5Et%29-H%28%5Ctheta%5E%7Bt%7D%2C%5Ctheta%5Et%29%26%3D%5Cint_zp%28z%7Cx%2C%5Ctheta%5E%7Bt%7D%29%5Clog%20p%28z%7Cx%2C%5Ctheta%5E%7Bt%2B1%7D%29dz-%5Cint_zp%28z%7Cx%2C%5Ctheta%5Et%29%5Clog%20p%28z%7Cx%2C%5Ctheta%5E%7Bt%7D%29dz%5Cnonumber%5C%5C%0A%26%3D%5Cint_zp%28z%7Cx%2C%5Ctheta%5Et%29%5Clog%5Cfrac%7Bp%28z%7Cx%2C%5Ctheta%5E%7Bt%2B1%7D%29%7D%7Bp%28z%7Cx%2C%5Ctheta%5Et%29%7D%3D-KL%28p%28z%7Cx%2C%5Ctheta%5Et%29%2Cp%28z%7Cx%2C%5Ctheta%5E%7Bt%2B1%7D%29%29%5Cle0%0A%5Cend%7Balign%7D%0A#crop=0&crop=0&crop=1&crop=1&id=i1CPR&originHeight=113&originWidth=804&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

综合上面的结果:

7.期望最大 - 图16%5Cle%5Clog%20p(x%7C%5Ctheta%5E%7Bt%2B1%7D)%0A#card=math&code=%5Clog%20p%28x%7C%5Ctheta%5Et%29%5Cle%5Clog%20p%28x%7C%5Ctheta%5E%7Bt%2B1%7D%29%0A#crop=0&crop=0&crop=1&crop=1&id=G3HXg&originHeight=29&originWidth=231&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

根据上面的证明,我们看到,似然函数在每一步都会增大。进一步的,我们看 EM 迭代过程中的式子是怎么来的:

7.期望最大 - 图17%3D%5Clog%20p(z%2Cx%7C%5Ctheta)-%5Clog%20p(z%7Cx%2C%5Ctheta)%3D%5Clog%20%5Cfrac%7Bp(z%2Cx%7C%5Ctheta)%7D%7Bq(z)%7D-%5Clog%20%5Cfrac%7Bp(z%7Cx%2C%5Ctheta)%7D%7Bq(z)%7D%0A#card=math&code=%5Clog%20p%28x%7C%5Ctheta%29%3D%5Clog%20p%28z%2Cx%7C%5Ctheta%29-%5Clog%20p%28z%7Cx%2C%5Ctheta%29%3D%5Clog%20%5Cfrac%7Bp%28z%2Cx%7C%5Ctheta%29%7D%7Bq%28z%29%7D-%5Clog%20%5Cfrac%7Bp%28z%7Cx%2C%5Ctheta%29%7D%7Bq%28z%29%7D%0A#crop=0&crop=0&crop=1&crop=1&id=JIhrN&originHeight=59&originWidth=622&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

分别对两边求期望 7.期望最大 - 图18%7D#card=math&code=%5Cmathbb%7BE%7D_%7Bq%28z%29%7D#crop=0&crop=0&crop=1&crop=1&id=wuPJl&originHeight=27&originWidth=41&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=):

7.期望最大 - 图19%5Clog%20p(x%7C%5Ctheta)dz%3D%5Clog%20p(x%7C%5Ctheta)%5C%5C%0A%26Right%3A%5Cint_zq(z)%5Clog%20%5Cfrac%7Bp(z%2Cx%7C%5Ctheta)%7D%7Bq(z)%7Ddz-%5Cint_zq(z)%5Clog%20%5Cfrac%7Bp(z%7Cx%2C%5Ctheta)%7D%7Bq(z)%7Ddz%3DELBO%2BKL(q(z)%2Cp(z%7Cx%2C%5Ctheta))%0A%5Cend%7Balign%7D%0A#card=math&code=%5Cbegin%7Balign%7D%0A%26Left%3A%5Cint_zq%28z%29%5Clog%20p%28x%7C%5Ctheta%29dz%3D%5Clog%20p%28x%7C%5Ctheta%29%5C%5C%0A%26Right%3A%5Cint_zq%28z%29%5Clog%20%5Cfrac%7Bp%28z%2Cx%7C%5Ctheta%29%7D%7Bq%28z%29%7Ddz-%5Cint_zq%28z%29%5Clog%20%5Cfrac%7Bp%28z%7Cx%2C%5Ctheta%29%7D%7Bq%28z%29%7Ddz%3DELBO%2BKL%28q%28z%29%2Cp%28z%7Cx%2C%5Ctheta%29%29%0A%5Cend%7Balign%7D%0A#crop=0&crop=0&crop=1&crop=1&id=aIzeN&originHeight=113&originWidth=776&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

上式中,Evidence Lower Bound(ELBO),是一个下界,所以 7.期望最大 - 图20%5Cge%20ELBO#card=math&code=%5Clog%20p%28x%7C%5Ctheta%29%5Cge%20ELBO#crop=0&crop=0&crop=1&crop=1&id=TXCEx&originHeight=26&originWidth=175&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=),等于号取在 KL 散度为0是,即:7.期望最大 - 图21%3Dp(z%7Cx%2C%5Ctheta)#card=math&code=q%28z%29%3Dp%28z%7Cx%2C%5Ctheta%29#crop=0&crop=0&crop=1&crop=1&id=W6KRd&originHeight=26&originWidth=137&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=),EM 算法的目的是将 ELBO 最大化,根据上面的证明过程,在每一步 EM 后,求得了最大的ELBO,并根据这个使 ELBO 最大的参数代入下一步中:

7.期望最大 - 图22%5Clog%5Cfrac%7Bp(x%2Cz%7C%5Ctheta)%7D%7Bq(z)%7Ddz%0A#card=math&code=%5Chat%7B%5Ctheta%7D%3D%5Cmathop%7Bargmax%7D%7B%5Ctheta%7DELBO%3D%5Cmathop%7Bargmax%7D%5Ctheta%5Cint_zq%28z%29%5Clog%5Cfrac%7Bp%28x%2Cz%7C%5Ctheta%29%7D%7Bq%28z%29%7Ddz%0A#crop=0&crop=0&crop=1&crop=1&id=Xd1Zz&originHeight=59&originWidth=476&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

由于 7.期望最大 - 图23%3Dp(z%7Cx%2C%5Ctheta%5Et)#card=math&code=%C2%A0q%28z%29%3Dp%28z%7Cx%2C%5Ctheta%5Et%29#crop=0&crop=0&crop=1&crop=1&id=lJw4H&originHeight=27&originWidth=145&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=) 的时候,这一步的最大值才能取等号,所以:

7.期望最大 - 图24%5Clog%5Cfrac%7Bp(x%2Cz%7C%5Ctheta)%7D%7Bq(z)%7Ddz%3D%5Cmathop%7Bargmax%7D%5Ctheta%5Cint_zp(z%7Cx%2C%5Ctheta%5Et)%5Clog%5Cfrac%7Bp(x%2Cz%7C%5Ctheta)%7D%7Bp(z%7Cx%2C%5Ctheta%5Et)%7Dd%20z%5C%5C%0A%3D%5Cmathop%7Bargmax%7D%5Ctheta%5Cintz%20p(z%7Cx%2C%5Ctheta%5Et)%5Clog%20p(x%2Cz%7C%5Ctheta)%0A#card=math&code=%5Chat%7B%5Ctheta%7D%3D%5Cmathop%7Bargmax%7D%7B%5Ctheta%7DELBO%3D%5Cmathop%7Bargmax%7D%5Ctheta%5Cint_zq%28z%29%5Clog%5Cfrac%7Bp%28x%2Cz%7C%5Ctheta%29%7D%7Bq%28z%29%7Ddz%3D%5Cmathop%7Bargmax%7D%5Ctheta%5Cintzp%28z%7Cx%2C%5Ctheta%5Et%29%5Clog%5Cfrac%7Bp%28x%2Cz%7C%5Ctheta%29%7D%7Bp%28z%7Cx%2C%5Ctheta%5Et%29%7Dd%20z%5C%5C%0A%3D%5Cmathop%7Bargmax%7D%5Ctheta%5Cint_z%20p%28z%7Cx%2C%5Ctheta%5Et%29%5Clog%20p%28x%2Cz%7C%5Ctheta%29%0A#crop=0&crop=0&crop=1&crop=1&id=riTVi&originHeight=116&originWidth=900&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

这个式子就是上面 EM 迭代过程中的式子。

从 Jensen 不等式出发,也可以导出这个式子:

7.期望最大 - 图25%3D%5Clog%5Cintzp(x%2Cz%7C%5Ctheta)dz%3D%5Clog%5Cint_z%5Cfrac%7Bp(x%2Cz%7C%5Ctheta)q(z)%7D%7Bq(z)%7Ddz%5C%5C%0A%3D%5Clog%20%5Cmathbb%7BE%7D%7Bq(z)%7D%5B%5Cfrac%7Bp(x%2Cz%7C%5Ctheta)%7D%7Bq(z)%7D%5D%5Cge%20%5Cmathbb%7BE%7D%7Bq(z)%7D%5B%5Clog%5Cfrac%7Bp(x%2Cz%7C%5Ctheta)%7D%7Bq(z)%7D%5D%0A#card=math&code=%5Clog%20p%28x%7C%5Ctheta%29%3D%5Clog%5Cint_zp%28x%2Cz%7C%5Ctheta%29dz%3D%5Clog%5Cint_z%5Cfrac%7Bp%28x%2Cz%7C%5Ctheta%29q%28z%29%7D%7Bq%28z%29%7Ddz%5C%5C%0A%3D%5Clog%20%5Cmathbb%7BE%7D%7Bq%28z%29%7D%5B%5Cfrac%7Bp%28x%2Cz%7C%5Ctheta%29%7D%7Bq%28z%29%7D%5D%5Cge%20%5Cmathbb%7BE%7D_%7Bq%28z%29%7D%5B%5Clog%5Cfrac%7Bp%28x%2Cz%7C%5Ctheta%29%7D%7Bq%28z%29%7D%5D%0A#crop=0&crop=0&crop=1&crop=1&id=IXI0j&originHeight=120&originWidth=900&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

其中,右边的式子就是 ELBO,等号在 7.期望最大 - 图26%3DCq(z)#card=math&code=%C2%A0p%28x%2Cz%7C%5Ctheta%29%3DCq%28z%29#crop=0&crop=0&crop=1&crop=1&id=Ga8kD&originHeight=26&originWidth=154&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=) 时成立。于是:

7.期望最大 - 图27dz%3D%5Cfrac%7B1%7D%7BC%7D%5Cint_zp(x%2Cz%7C%5Ctheta)dz%3D%5Cfrac%7B1%7D%7BC%7Dp(x%7C%5Ctheta)%3D1%5C%5C%0A%5CRightarrow%20q(z)%3D%5Cfrac%7B1%7D%7Bp(x%7C%5Ctheta)%7Dp(x%2Cz%7C%5Ctheta)%3Dp(z%7Cx%2C%5Ctheta)%0A#card=math&code=%5Cint_zq%28z%29dz%3D%5Cfrac%7B1%7D%7BC%7D%5Cint_zp%28x%2Cz%7C%5Ctheta%29dz%3D%5Cfrac%7B1%7D%7BC%7Dp%28x%7C%5Ctheta%29%3D1%5C%5C%0A%5CRightarrow%20q%28z%29%3D%5Cfrac%7B1%7D%7Bp%28x%7C%5Ctheta%29%7Dp%28x%2Cz%7C%5Ctheta%29%3Dp%28z%7Cx%2C%5Ctheta%29%0A#crop=0&crop=0&crop=1&crop=1&id=rPJnq&originHeight=110&originWidth=900&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

我们发现,这个过程就是上面的最大值取等号的条件。

广义 EM

EM 模型解决了概率生成模型的参数估计的问题,通过引入隐变量 7.期望最大 - 图28,来学习 7.期望最大 - 图29,具体的模型对 7.期望最大 - 图30 有不同的假设。对学习任务 7.期望最大 - 图31#card=math&code=p%28x%7C%5Ctheta%29#crop=0&crop=0&crop=1&crop=1&id=bDFQa&originHeight=26&originWidth=55&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=),就是学习任务 7.期望最大 - 图32%7D%7Bp(z%7Cx%2C%5Ctheta)%7D#card=math&code=%5Cfrac%7Bp%28x%2Cz%7C%5Ctheta%29%7D%7Bp%28z%7Cx%2C%5Ctheta%29%7D#crop=0&crop=0&crop=1&crop=1&id=vCIIL&originHeight=59&originWidth=81&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)。在这个式子中,我们假定了在 E 步骤中,7.期望最大 - 图33%3Dp(z%7Cx%2C%5Ctheta)#card=math&code=q%28z%29%3Dp%28z%7Cx%2C%5Ctheta%29#crop=0&crop=0&crop=1&crop=1&id=sh8Fe&originHeight=26&originWidth=137&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=),但是这个7.期望最大 - 图34#card=math&code=p%28z%7Cx%2C%5Ctheta%29#crop=0&crop=0&crop=1&crop=1&id=iuWnE&originHeight=26&originWidth=74&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=) 如果无法求解,那么必须使用采样(MCMC)或者变分推断等方法来近似推断这个后验。我们观察 KL 散度的表达式,为了最大化 ELBO,在固定的 7.期望最大 - 图35 时,我们需要最小化 KL 散度,于是:

7.期望最大 - 图36%3D%5Cmathop%7Bargmin%7D_qKL(p%2Cq)%3D%5Cmathop%7Bargmax%7D_qELBO%0A#card=math&code=%5Chat%7Bq%7D%28z%29%3D%5Cmathop%7Bargmin%7D_qKL%28p%2Cq%29%3D%5Cmathop%7Bargmax%7D_qELBO%0A#crop=0&crop=0&crop=1&crop=1&id=qnFmY&originHeight=42&originWidth=382&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

这就是广义 EM 的基本思路:

  1. E step:7.期望最大 - 图37%3D%5Cmathop%7Bargmax%7D_q%5Cint_zq%5Et(z)%5Clog%5Cfrac%7Bp(x%2Cz%7C%5Ctheta)%7D%7Bq%5Et(z)%7Ddz%2Cfixed%5C%20%5Ctheta%0A#card=math&code=%5Chat%7Bq%7D%5E%7Bt%2B1%7D%28z%29%3D%5Cmathop%7Bargmax%7D_q%5Cint_zq%5Et%28z%29%5Clog%5Cfrac%7Bp%28x%2Cz%7C%5Ctheta%29%7D%7Bq%5Et%28z%29%7Ddz%2Cfixed%5C%20%5Ctheta%0A#crop=0&crop=0&crop=1&crop=1&id=l13mR&originHeight=59&originWidth=447&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)
  2. M step:7.期望最大 - 图38%5Clog%5Cfrac%7Bp(x%2Cz%7C%5Ctheta)%7D%7Bq%5E%7Bt%2B1%7D(z)%7Ddz%2Cfixed%5C%20%5Chat%7Bq%7D%0A#card=math&code=%5Chat%7B%5Ctheta%7D%3D%5Cmathop%7Bargmax%7D_%5Ctheta%20%5Cint_zq%5E%7Bt%2B1%7D%28z%29%5Clog%5Cfrac%7Bp%28x%2Cz%7C%5Ctheta%29%7D%7Bq%5E%7Bt%2B1%7D%28z%29%7Ddz%2Cfixed%5C%20%5Chat%7Bq%7D%0A#crop=0&crop=0&crop=1&crop=1&id=qfFlN&originHeight=59&originWidth=416&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

对于上面的积分:

7.期望最大 - 图39%5Clog%5Cfrac%7Bp(x%2Cz%7C%5Ctheta)%7D%7Bq(z)%7Ddz%3D%5Cmathbb%7BE%7D%7Bq(z)%7D%5Bp(x%2Cz%7C%5Ctheta)%5D%2BEntropy(q(z))%0A#card=math&code=ELBO%3D%5Cint_zq%28z%29%5Clog%5Cfrac%7Bp%28x%2Cz%7C%5Ctheta%29%7D%7Bq%28z%29%7Ddz%3D%5Cmathbb%7BE%7D%7Bq%28z%29%7D%5Bp%28x%2Cz%7C%5Ctheta%29%5D%2BEntropy%28q%28z%29%29%0A#crop=0&crop=0&crop=1&crop=1&id=ClCYf&originHeight=59&originWidth=594&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

因此,我们看到,广义 EM 相当于在原来的式子中加入熵这一项。

EM 的推广

EM 算法类似于坐标上升法,固定部分坐标,优化其他坐标,再一遍一遍的迭代。如果在 EM 框架中,无法求解 7.期望最大 - 图40 后验概率,那么需要采用一些变种的 EM 来估算这个后验。

  1. 基于平均场的变分推断,VBEM/VEM
  2. 基于蒙特卡洛的EM,MCEM