7.期望最大 - 《机器学习》

广义 EM
EM 的推广

期望最大算法的目的是解决具有隐变量的混合模型的参数估计（极大似然估计）。MLE 对 $7.期望最大 - 图1$ #card=math&code=p%28x%7C%5Ctheta%29#crop=0&crop=0&crop=1&crop=1&id=BP9T5&originHeight=26&originWidth=55&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=) 参数的估计记为： $7.期望最大 - 图2$ #card=math&code=%5Ctheta%7BMLE%7D%3D%5Cmathop%7Bargmax%7D%5Climits%5Ctheta%5Clog%20p%28x%7C%5Ctheta%29#crop=0&crop=0&crop=1&crop=1&id=i7VFy&originHeight=41&originWidth=237&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)。EM 算法对这个问题的解决方法是采用迭代的方法：

$7.期望最大 - 图3$ %5Dp(z%7Cx%2C%5Ctheta%5Et)dz%3D%5Cmathbb%7BE%7D%7Bz%7Cx%2C%5Ctheta%5Et%7D%5B%5Clog%20p(x%2Cz%7C%5Ctheta)%5D%0A#card=math&code=%5Ctheta%5E%7Bt%2B1%7D%3D%5Cmathop%7Bargmax%7D%5Climits%7B%5Ctheta%7D%5Cintz%5Clog%20%5Bp%28x%2Cz%7C%5Ctheta%29%5Dp%28z%7Cx%2C%5Ctheta%5Et%29dz%3D%5Cmathbb%7BE%7D%7Bz%7Cx%2C%5Ctheta%5Et%7D%5B%5Clog%20p%28x%2Cz%7C%5Ctheta%29%5D%0A#crop=0&crop=0&crop=1&crop=1&id=Dv5Rf&originHeight=53&originWidth=573&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

这个公式包含了迭代的两步：

E step：计算 $7.期望最大 - 图4$ #card=math&code=%5Clog%20p%28x%2Cz%7C%5Ctheta%29#crop=0&crop=0&crop=1&crop=1&id=HOLcs&originHeight=26&originWidth=104&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=) 在概率分布 $7.期望最大 - 图5$ #card=math&code=p%28z%7Cx%2C%5Ctheta%5Et%29#crop=0&crop=0&crop=1&crop=1&id=zIlSb&originHeight=27&originWidth=82&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=) 下的期望
M step：计算使这个期望最大化的参数得到下一个 EM 步骤的输入

求证： $7.期望最大 - 图6$ %5Cle%5Clog%20p(x%7C%5Ctheta%5E%7Bt%2B1%7D)#card=math&code=%5Clog%20p%28x%7C%5Ctheta%5Et%29%5Cle%5Clog%20p%28x%7C%5Ctheta%5E%7Bt%2B1%7D%29#crop=0&crop=0&crop=1&crop=1&id=rxbnB&originHeight=29&originWidth=231&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

证明： $7.期望最大 - 图7$ %3D%5Clog%20p(z%2Cx%7C%5Ctheta)-%5Clog%20p(z%7Cx%2C%5Ctheta)#card=math&code=%5Clog%20p%28x%7C%5Ctheta%29%3D%5Clog%20p%28z%2Cx%7C%5Ctheta%29-%5Clog%20p%28z%7Cx%2C%5Ctheta%29#crop=0&crop=0&crop=1&crop=1&id=JcSUd&originHeight=26&originWidth=346&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)，对左右两边求积分：

$7.期望最大 - 图8$ %5Clog%20p(x%7C%5Ctheta)dz%3D%5Clog%20p(x%7C%5Ctheta)%0A#card=math&code=Left%3A%5Cint_zp%28z%7Cx%2C%5Ctheta%5Et%29%5Clog%20p%28x%7C%5Ctheta%29dz%3D%5Clog%20p%28x%7C%5Ctheta%29%0A#crop=0&crop=0&crop=1&crop=1&id=Bfvs5&originHeight=51&originWidth=387&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

$7.期望最大 - 图9$ %5Clog%20p(x%2Cz%7C%5Ctheta)dz-%5Cint_zp(z%7Cx%2C%5Ctheta%5Et)%5Clog%20p(z%7Cx%2C%5Ctheta)dz%3DQ(%5Ctheta%2C%5Ctheta%5Et)-H(%5Ctheta%2C%5Ctheta%5Et)%0A#card=math&code=Right%3A%5Cint_zp%28z%7Cx%2C%5Ctheta%5Et%29%5Clog%20p%28x%2Cz%7C%5Ctheta%29dz-%5Cint_zp%28z%7Cx%2C%5Ctheta%5Et%29%5Clog%20p%28z%7Cx%2C%5Ctheta%29dz%3DQ%28%5Ctheta%2C%5Ctheta%5Et%29-H%28%5Ctheta%2C%5Ctheta%5Et%29%0A#crop=0&crop=0&crop=1&crop=1&id=yng09&originHeight=51&originWidth=756&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

所以：

$7.期望最大 - 图10$ %3DQ(%5Ctheta%2C%5Ctheta%5Et)-H(%5Ctheta%2C%5Ctheta%5Et)%0A#card=math&code=%5Clog%20p%28x%7C%5Ctheta%29%3DQ%28%5Ctheta%2C%5Ctheta%5Et%29-H%28%5Ctheta%2C%5Ctheta%5Et%29%0A#crop=0&crop=0&crop=1&crop=1&id=tIvRA&originHeight=27&originWidth=279&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

由于 $7.期望最大 - 图11$ %3D%5Cintzp(z%7Cx%2C%5Ctheta%5Et)%5Clog%20p(x%2Cz%7C%5Ctheta)dz#card=math&code=Q%28%5Ctheta%2C%5Ctheta%5Et%29%3D%5Cint_zp%28z%7Cx%2C%5Ctheta%5Et%29%5Clog%20p%28x%2Cz%7C%5Ctheta%29dz#crop=0&crop=0&crop=1&crop=1&id=jLKPo&originHeight=51&originWidth=330&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)，而 ![](https://g.yuque.com/gr/latex?%5Ctheta%5E%7Bt%2B1%7D%3D%5Cmathop%7Bargmax%7D%5Climits%7B%5Ctheta%7D%5Cintz%5Clog%20%5Bp(x%2Cz%7C%5Ctheta)%5Dp(z%7Cx%2C%5Ctheta%5Et)dz#card=math&code=%5Ctheta%5E%7Bt%2B1%7D%3D%5Cmathop%7Bargmax%7D%5Climits%7B%5Ctheta%7D%5Cint_z%5Clog%20%5Bp%28x%2Cz%7C%5Ctheta%29%5Dp%28z%7Cx%2C%5Ctheta%5Et%29dz#crop=0&crop=0&crop=1&crop=1&id=k0IVm&originHeight=53&originWidth=377&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)，所以 $7.期望最大 - 图12$ %5Cge%20Q(%5Ctheta%5Et%2C%5Ctheta%5Et)#card=math&code=Q%28%5Ctheta%5E%7Bt%2B1%7D%2C%5Ctheta%5Et%29%5Cge%20Q%28%5Ctheta%5Et%2C%5Ctheta%5Et%29#crop=0&crop=0&crop=1&crop=1&id=LVvVI&originHeight=29&originWidth=200&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)。要证 $7.期望最大 - 图13$ %5Cle%5Clog%20p(x%7C%5Ctheta%5E%7Bt%2B1%7D)#card=math&code=%5Clog%20p%28x%7C%5Ctheta%5Et%29%5Cle%5Clog%20p%28x%7C%5Ctheta%5E%7Bt%2B1%7D%29#crop=0&crop=0&crop=1&crop=1&id=fRHQM&originHeight=29&originWidth=231&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)，需证： $7.期望最大 - 图14$ %5Cge%20H(%5Ctheta%5E%7Bt%2B1%7D%2C%5Ctheta%5Et)#card=math&code=H%28%5Ctheta%5Et%2C%5Ctheta%5Et%29%5Cge%20H%28%5Ctheta%5E%7Bt%2B1%7D%2C%5Ctheta%5Et%29#crop=0&crop=0&crop=1&crop=1&id=JKC2B&originHeight=29&originWidth=204&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)：

$7.期望最大 - 图15$ -H(%5Ctheta%5E%7Bt%7D%2C%5Ctheta%5Et)%26%3D%5Cint_zp(z%7Cx%2C%5Ctheta%5E%7Bt%7D)%5Clog%20p(z%7Cx%2C%5Ctheta%5E%7Bt%2B1%7D)dz-%5Cint_zp(z%7Cx%2C%5Ctheta%5Et)%5Clog%20p(z%7Cx%2C%5Ctheta%5E%7Bt%7D)dz%5Cnonumber%5C%5C%0A%26%3D%5Cint_zp(z%7Cx%2C%5Ctheta%5Et)%5Clog%5Cfrac%7Bp(z%7Cx%2C%5Ctheta%5E%7Bt%2B1%7D)%7D%7Bp(z%7Cx%2C%5Ctheta%5Et)%7D%3D-KL(p(z%7Cx%2C%5Ctheta%5Et)%2Cp(z%7Cx%2C%5Ctheta%5E%7Bt%2B1%7D))%5Cle0%0A%5Cend%7Balign%7D%0A#card=math&code=%5Cbegin%7Balign%7DH%28%5Ctheta%5E%7Bt%2B1%7D%2C%5Ctheta%5Et%29-H%28%5Ctheta%5E%7Bt%7D%2C%5Ctheta%5Et%29%26%3D%5Cint_zp%28z%7Cx%2C%5Ctheta%5E%7Bt%7D%29%5Clog%20p%28z%7Cx%2C%5Ctheta%5E%7Bt%2B1%7D%29dz-%5Cint_zp%28z%7Cx%2C%5Ctheta%5Et%29%5Clog%20p%28z%7Cx%2C%5Ctheta%5E%7Bt%7D%29dz%5Cnonumber%5C%5C%0A%26%3D%5Cint_zp%28z%7Cx%2C%5Ctheta%5Et%29%5Clog%5Cfrac%7Bp%28z%7Cx%2C%5Ctheta%5E%7Bt%2B1%7D%29%7D%7Bp%28z%7Cx%2C%5Ctheta%5Et%29%7D%3D-KL%28p%28z%7Cx%2C%5Ctheta%5Et%29%2Cp%28z%7Cx%2C%5Ctheta%5E%7Bt%2B1%7D%29%29%5Cle0%0A%5Cend%7Balign%7D%0A#crop=0&crop=0&crop=1&crop=1&id=i1CPR&originHeight=113&originWidth=804&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

综合上面的结果：

$7.期望最大 - 图16$ %5Cle%5Clog%20p(x%7C%5Ctheta%5E%7Bt%2B1%7D)%0A#card=math&code=%5Clog%20p%28x%7C%5Ctheta%5Et%29%5Cle%5Clog%20p%28x%7C%5Ctheta%5E%7Bt%2B1%7D%29%0A#crop=0&crop=0&crop=1&crop=1&id=G3HXg&originHeight=29&originWidth=231&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

根据上面的证明，我们看到，似然函数在每一步都会增大。进一步的，我们看 EM 迭代过程中的式子是怎么来的：

$7.期望最大 - 图17$ %3D%5Clog%20p(z%2Cx%7C%5Ctheta)-%5Clog%20p(z%7Cx%2C%5Ctheta)%3D%5Clog%20%5Cfrac%7Bp(z%2Cx%7C%5Ctheta)%7D%7Bq(z)%7D-%5Clog%20%5Cfrac%7Bp(z%7Cx%2C%5Ctheta)%7D%7Bq(z)%7D%0A#card=math&code=%5Clog%20p%28x%7C%5Ctheta%29%3D%5Clog%20p%28z%2Cx%7C%5Ctheta%29-%5Clog%20p%28z%7Cx%2C%5Ctheta%29%3D%5Clog%20%5Cfrac%7Bp%28z%2Cx%7C%5Ctheta%29%7D%7Bq%28z%29%7D-%5Clog%20%5Cfrac%7Bp%28z%7Cx%2C%5Ctheta%29%7D%7Bq%28z%29%7D%0A#crop=0&crop=0&crop=1&crop=1&id=JIhrN&originHeight=59&originWidth=622&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

分别对两边求期望 $7.期望最大 - 图18$ %7D#card=math&code=%5Cmathbb%7BE%7D_%7Bq%28z%29%7D#crop=0&crop=0&crop=1&crop=1&id=wuPJl&originHeight=27&originWidth=41&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)：

$7.期望最大 - 图19$ %5Clog%20p(x%7C%5Ctheta)dz%3D%5Clog%20p(x%7C%5Ctheta)%5C%5C%0A%26Right%3A%5Cint_zq(z)%5Clog%20%5Cfrac%7Bp(z%2Cx%7C%5Ctheta)%7D%7Bq(z)%7Ddz-%5Cint_zq(z)%5Clog%20%5Cfrac%7Bp(z%7Cx%2C%5Ctheta)%7D%7Bq(z)%7Ddz%3DELBO%2BKL(q(z)%2Cp(z%7Cx%2C%5Ctheta))%0A%5Cend%7Balign%7D%0A#card=math&code=%5Cbegin%7Balign%7D%0A%26Left%3A%5Cint_zq%28z%29%5Clog%20p%28x%7C%5Ctheta%29dz%3D%5Clog%20p%28x%7C%5Ctheta%29%5C%5C%0A%26Right%3A%5Cint_zq%28z%29%5Clog%20%5Cfrac%7Bp%28z%2Cx%7C%5Ctheta%29%7D%7Bq%28z%29%7Ddz-%5Cint_zq%28z%29%5Clog%20%5Cfrac%7Bp%28z%7Cx%2C%5Ctheta%29%7D%7Bq%28z%29%7Ddz%3DELBO%2BKL%28q%28z%29%2Cp%28z%7Cx%2C%5Ctheta%29%29%0A%5Cend%7Balign%7D%0A#crop=0&crop=0&crop=1&crop=1&id=aIzeN&originHeight=113&originWidth=776&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

上式中，Evidence Lower Bound(ELBO)，是一个下界，所以 $7.期望最大 - 图20$ %5Cge%20ELBO#card=math&code=%5Clog%20p%28x%7C%5Ctheta%29%5Cge%20ELBO#crop=0&crop=0&crop=1&crop=1&id=TXCEx&originHeight=26&originWidth=175&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)，等于号取在 KL 散度为0是，即： $7.期望最大 - 图21$ %3Dp(z%7Cx%2C%5Ctheta)#card=math&code=q%28z%29%3Dp%28z%7Cx%2C%5Ctheta%29#crop=0&crop=0&crop=1&crop=1&id=W6KRd&originHeight=26&originWidth=137&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)，EM 算法的目的是将 ELBO 最大化，根据上面的证明过程，在每一步 EM 后，求得了最大的ELBO，并根据这个使 ELBO 最大的参数代入下一步中：

$7.期望最大 - 图22$ %5Clog%5Cfrac%7Bp(x%2Cz%7C%5Ctheta)%7D%7Bq(z)%7Ddz%0A#card=math&code=%5Chat%7B%5Ctheta%7D%3D%5Cmathop%7Bargmax%7D%7B%5Ctheta%7DELBO%3D%5Cmathop%7Bargmax%7D%5Ctheta%5Cint_zq%28z%29%5Clog%5Cfrac%7Bp%28x%2Cz%7C%5Ctheta%29%7D%7Bq%28z%29%7Ddz%0A#crop=0&crop=0&crop=1&crop=1&id=Xd1Zz&originHeight=59&originWidth=476&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

由于 $7.期望最大 - 图23$ %3Dp(z%7Cx%2C%5Ctheta%5Et)#card=math&code=%C2%A0q%28z%29%3Dp%28z%7Cx%2C%5Ctheta%5Et%29#crop=0&crop=0&crop=1&crop=1&id=lJw4H&originHeight=27&originWidth=145&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=) 的时候，这一步的最大值才能取等号，所以：

$7.期望最大 - 图24$ %5Clog%5Cfrac%7Bp(x%2Cz%7C%5Ctheta)%7D%7Bq(z)%7Ddz%3D%5Cmathop%7Bargmax%7D%5Ctheta%5Cint_zp(z%7Cx%2C%5Ctheta%5Et)%5Clog%5Cfrac%7Bp(x%2Cz%7C%5Ctheta)%7D%7Bp(z%7Cx%2C%5Ctheta%5Et)%7Dd%20z%5C%5C%0A%3D%5Cmathop%7Bargmax%7D%5Ctheta%5Cintz%20p(z%7Cx%2C%5Ctheta%5Et)%5Clog%20p(x%2Cz%7C%5Ctheta)%0A#card=math&code=%5Chat%7B%5Ctheta%7D%3D%5Cmathop%7Bargmax%7D%7B%5Ctheta%7DELBO%3D%5Cmathop%7Bargmax%7D%5Ctheta%5Cint_zq%28z%29%5Clog%5Cfrac%7Bp%28x%2Cz%7C%5Ctheta%29%7D%7Bq%28z%29%7Ddz%3D%5Cmathop%7Bargmax%7D%5Ctheta%5Cintzp%28z%7Cx%2C%5Ctheta%5Et%29%5Clog%5Cfrac%7Bp%28x%2Cz%7C%5Ctheta%29%7D%7Bp%28z%7Cx%2C%5Ctheta%5Et%29%7Dd%20z%5C%5C%0A%3D%5Cmathop%7Bargmax%7D%5Ctheta%5Cint_z%20p%28z%7Cx%2C%5Ctheta%5Et%29%5Clog%20p%28x%2Cz%7C%5Ctheta%29%0A#crop=0&crop=0&crop=1&crop=1&id=riTVi&originHeight=116&originWidth=900&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

这个式子就是上面 EM 迭代过程中的式子。

从 Jensen 不等式出发，也可以导出这个式子：

$7.期望最大 - 图25$ %3D%5Clog%5Cintzp(x%2Cz%7C%5Ctheta)dz%3D%5Clog%5Cint_z%5Cfrac%7Bp(x%2Cz%7C%5Ctheta)q(z)%7D%7Bq(z)%7Ddz%5C%5C%0A%3D%5Clog%20%5Cmathbb%7BE%7D%7Bq(z)%7D%5B%5Cfrac%7Bp(x%2Cz%7C%5Ctheta)%7D%7Bq(z)%7D%5D%5Cge%20%5Cmathbb%7BE%7D%7Bq(z)%7D%5B%5Clog%5Cfrac%7Bp(x%2Cz%7C%5Ctheta)%7D%7Bq(z)%7D%5D%0A#card=math&code=%5Clog%20p%28x%7C%5Ctheta%29%3D%5Clog%5Cint_zp%28x%2Cz%7C%5Ctheta%29dz%3D%5Clog%5Cint_z%5Cfrac%7Bp%28x%2Cz%7C%5Ctheta%29q%28z%29%7D%7Bq%28z%29%7Ddz%5C%5C%0A%3D%5Clog%20%5Cmathbb%7BE%7D%7Bq%28z%29%7D%5B%5Cfrac%7Bp%28x%2Cz%7C%5Ctheta%29%7D%7Bq%28z%29%7D%5D%5Cge%20%5Cmathbb%7BE%7D_%7Bq%28z%29%7D%5B%5Clog%5Cfrac%7Bp%28x%2Cz%7C%5Ctheta%29%7D%7Bq%28z%29%7D%5D%0A#crop=0&crop=0&crop=1&crop=1&id=IXI0j&originHeight=120&originWidth=900&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

其中，右边的式子就是 ELBO，等号在 $7.期望最大 - 图26$ %3DCq(z)#card=math&code=%C2%A0p%28x%2Cz%7C%5Ctheta%29%3DCq%28z%29#crop=0&crop=0&crop=1&crop=1&id=Ga8kD&originHeight=26&originWidth=154&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=) 时成立。于是：

$7.期望最大 - 图27$ dz%3D%5Cfrac%7B1%7D%7BC%7D%5Cint_zp(x%2Cz%7C%5Ctheta)dz%3D%5Cfrac%7B1%7D%7BC%7Dp(x%7C%5Ctheta)%3D1%5C%5C%0A%5CRightarrow%20q(z)%3D%5Cfrac%7B1%7D%7Bp(x%7C%5Ctheta)%7Dp(x%2Cz%7C%5Ctheta)%3Dp(z%7Cx%2C%5Ctheta)%0A#card=math&code=%5Cint_zq%28z%29dz%3D%5Cfrac%7B1%7D%7BC%7D%5Cint_zp%28x%2Cz%7C%5Ctheta%29dz%3D%5Cfrac%7B1%7D%7BC%7Dp%28x%7C%5Ctheta%29%3D1%5C%5C%0A%5CRightarrow%20q%28z%29%3D%5Cfrac%7B1%7D%7Bp%28x%7C%5Ctheta%29%7Dp%28x%2Cz%7C%5Ctheta%29%3Dp%28z%7Cx%2C%5Ctheta%29%0A#crop=0&crop=0&crop=1&crop=1&id=rPJnq&originHeight=110&originWidth=900&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

我们发现，这个过程就是上面的最大值取等号的条件。

广义 EM

EM 模型解决了概率生成模型的参数估计的问题，通过引入隐变量 $7.期望最大 - 图28$ ，来学习 $7.期望最大 - 图29$ ，具体的模型对 $7.期望最大 - 图30$ 有不同的假设。对学习任务 $7.期望最大 - 图31$ #card=math&code=p%28x%7C%5Ctheta%29#crop=0&crop=0&crop=1&crop=1&id=bDFQa&originHeight=26&originWidth=55&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)，就是学习任务 $7.期望最大 - 图32$ %7D%7Bp(z%7Cx%2C%5Ctheta)%7D#card=math&code=%5Cfrac%7Bp%28x%2Cz%7C%5Ctheta%29%7D%7Bp%28z%7Cx%2C%5Ctheta%29%7D#crop=0&crop=0&crop=1&crop=1&id=vCIIL&originHeight=59&originWidth=81&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)。在这个式子中，我们假定了在 E 步骤中， $7.期望最大 - 图33$ %3Dp(z%7Cx%2C%5Ctheta)#card=math&code=q%28z%29%3Dp%28z%7Cx%2C%5Ctheta%29#crop=0&crop=0&crop=1&crop=1&id=sh8Fe&originHeight=26&originWidth=137&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)，但是这个 $7.期望最大 - 图34$ #card=math&code=p%28z%7Cx%2C%5Ctheta%29#crop=0&crop=0&crop=1&crop=1&id=iuWnE&originHeight=26&originWidth=74&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=) 如果无法求解，那么必须使用采样（MCMC）或者变分推断等方法来近似推断这个后验。我们观察 KL 散度的表达式，为了最大化 ELBO，在固定的 $7.期望最大 - 图35$ 时，我们需要最小化 KL 散度，于是：

$7.期望最大 - 图36$ %3D%5Cmathop%7Bargmin%7D_qKL(p%2Cq)%3D%5Cmathop%7Bargmax%7D_qELBO%0A#card=math&code=%5Chat%7Bq%7D%28z%29%3D%5Cmathop%7Bargmin%7D_qKL%28p%2Cq%29%3D%5Cmathop%7Bargmax%7D_qELBO%0A#crop=0&crop=0&crop=1&crop=1&id=qnFmY&originHeight=42&originWidth=382&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

这就是广义 EM 的基本思路：

E step： $7.期望最大 - 图37$ %3D%5Cmathop%7Bargmax%7D_q%5Cint_zq%5Et(z)%5Clog%5Cfrac%7Bp(x%2Cz%7C%5Ctheta)%7D%7Bq%5Et(z)%7Ddz%2Cfixed%5C%20%5Ctheta%0A#card=math&code=%5Chat%7Bq%7D%5E%7Bt%2B1%7D%28z%29%3D%5Cmathop%7Bargmax%7D_q%5Cint_zq%5Et%28z%29%5Clog%5Cfrac%7Bp%28x%2Cz%7C%5Ctheta%29%7D%7Bq%5Et%28z%29%7Ddz%2Cfixed%5C%20%5Ctheta%0A#crop=0&crop=0&crop=1&crop=1&id=l13mR&originHeight=59&originWidth=447&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)
M step： $7.期望最大 - 图38$ %5Clog%5Cfrac%7Bp(x%2Cz%7C%5Ctheta)%7D%7Bq%5E%7Bt%2B1%7D(z)%7Ddz%2Cfixed%5C%20%5Chat%7Bq%7D%0A#card=math&code=%5Chat%7B%5Ctheta%7D%3D%5Cmathop%7Bargmax%7D_%5Ctheta%20%5Cint_zq%5E%7Bt%2B1%7D%28z%29%5Clog%5Cfrac%7Bp%28x%2Cz%7C%5Ctheta%29%7D%7Bq%5E%7Bt%2B1%7D%28z%29%7Ddz%2Cfixed%5C%20%5Chat%7Bq%7D%0A#crop=0&crop=0&crop=1&crop=1&id=qfFlN&originHeight=59&originWidth=416&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

对于上面的积分：

$7.期望最大 - 图39$ %5Clog%5Cfrac%7Bp(x%2Cz%7C%5Ctheta)%7D%7Bq(z)%7Ddz%3D%5Cmathbb%7BE%7D%7Bq(z)%7D%5Bp(x%2Cz%7C%5Ctheta)%5D%2BEntropy(q(z))%0A#card=math&code=ELBO%3D%5Cint_zq%28z%29%5Clog%5Cfrac%7Bp%28x%2Cz%7C%5Ctheta%29%7D%7Bq%28z%29%7Ddz%3D%5Cmathbb%7BE%7D%7Bq%28z%29%7D%5Bp%28x%2Cz%7C%5Ctheta%29%5D%2BEntropy%28q%28z%29%29%0A#crop=0&crop=0&crop=1&crop=1&id=ClCYf&originHeight=59&originWidth=594&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

因此，我们看到，广义 EM 相当于在原来的式子中加入熵这一项。

EM 的推广

EM 算法类似于坐标上升法，固定部分坐标，优化其他坐标，再一遍一遍的迭代。如果在 EM 框架中，无法求解 $7.期望最大 - 图40$ 后验概率，那么需要采用一些变种的 EM 来估算这个后验。

基于平均场的变分推断，VBEM/VEM
基于蒙特卡洛的EM，MCEM