In both learning and inference, the normalizing constant of a probability distribution is hard to handle; this constant is the partition function. Consider a distribution of the form:

$$p(x|\theta)=\frac{1}{Z(\theta)}\hat{p}(x|\theta),\quad Z(\theta)=\int\hat{p}(x|\theta)dx$$
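
When $x$ is low-dimensional, the partition function can be approximated by numerical integration. A minimal sketch (the density $\hat{p}(x|\theta)=e^{-\theta x^2}$ here is my own toy example, not from the text):

```python
import numpy as np

# Toy unnormalized density: p_hat(x|theta) = exp(-theta * x^2)
theta = 0.5
xs = np.linspace(-10.0, 10.0, 100001)
p_hat = np.exp(-theta * xs**2)

# Z(theta) = \int p_hat(x|theta) dx, approximated by a Riemann sum
Z = np.sum(p_hat) * (xs[1] - xs[0])

# For this choice Z has the closed form sqrt(pi / theta)
print(Z, np.sqrt(np.pi / theta))  # both ≈ 2.5066
```

In higher dimensions this integral is intractable, which is exactly the problem this chapter addresses.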

MLE with a Partition Function

In the learning task, we apply maximum likelihood estimation:

$$\begin{aligned}\hat{\theta}&=\mathop{argmax}_{\theta}p(x|\theta)=\mathop{argmax}_{\theta}\sum\limits_{i=1}^N\log p(x_i|\theta)\\&=\mathop{argmax}_{\theta}\sum\limits_{i=1}^N\log \hat{p}(x_i|\theta)-N\log Z(\theta)\\&=\mathop{argmax}_{\theta}\frac{1}{N}\sum\limits_{i=1}^N\log \hat{p}(x_i|\theta)-\log Z(\theta)=\mathop{argmax}_{\theta}\ l(\theta)\end{aligned}$$

Taking the gradient of the $\log Z(\theta)$ term:

$$\begin{aligned}\nabla_\theta\log Z(\theta)&=\frac{1}{Z(\theta)}\nabla_\theta Z(\theta)\\&=\frac{p(x|\theta)}{\hat{p}(x|\theta)}\int\nabla_\theta \hat{p}(x|\theta)dx\\&=\int\frac{p(x|\theta)}{\hat{p}(x|\theta)}\nabla_\theta\hat{p}(x|\theta)dx\\&=\int p(x|\theta)\nabla_\theta\log\hat{p}(x|\theta)dx\\&=\mathbb{E}_{p(x|\theta)}[\nabla_\theta\log\hat{p}(x|\theta)]\end{aligned}$$

Here the ratio $\frac{p(x|\theta)}{\hat{p}(x|\theta)}=\frac{1}{Z(\theta)}$ does not depend on $x$, so it can be moved inside the integral, and $\frac{\nabla_\theta\hat{p}}{\hat{p}}=\nabla_\theta\log\hat{p}$.

Because this expression is an expectation under the unknown model distribution, it cannot be evaluated exactly and must be approximated by sampling. Without this term we could simply run a gradient method, but the presence of the partition function rules out direct gradient descent.

The expectation above is taken under the model's assumed distribution. Denote the true data distribution by $p_{data}$. The gradient of the first term of $l(\theta)$ is a sum over $N$ points which, viewed as draws from $p_{data}$, is a Monte Carlo approximation of an expectation. Hence:

$$\nabla_\theta l(\theta)=\mathbb{E}_{p_{data}}[\nabla_\theta\log\hat{p}(x|\theta)]-\mathbb{E}_{p(x|\theta)}[\nabla_\theta\log\hat{p}(x|\theta)]$$

Maximizing the likelihood therefore drives the model distribution toward the data distribution. The first term is called the positive phase, and the second the negative phase. Evaluating the negative phase requires sampling methods such as MCMC.

Sampling $\hat{x}_1,\cdots,\hat{x}_m\sim p_{model}(x|\theta^t)$, the update becomes:

$$\theta^{t+1}=\theta^t+\eta\left(\sum\limits_{i=1}^m\nabla_\theta \log \hat{p}(x_i|\theta^t)-\sum\limits_{i=1}^m\nabla_\theta\log \hat{p}(\hat{x}_i|\theta^t)\right)$$

This algorithm is gradient ascent based on MCMC sampling. The samples drawn at each step are called fantasy particles: if the fantasy particles occupy regions where the model assigns higher probability than the data distribution does, maximizing the objective pushes the model's probability in those regions down.
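
The positive-phase/negative-phase update can be illustrated on a toy exponential family (my own example: $\hat{p}(x|\theta)=e^{-\theta x^2}$, so $\nabla_\theta\log\hat{p}=-x^2$ and the normalized model is $\mathcal{N}(0,\frac{1}{2\theta})$, which we can sample exactly in place of MCMC):

```python
import numpy as np

rng = np.random.default_rng(0)

# "True" data: N(0, 1), so the optimum is theta = 0.5 (model variance 1/(2*theta) = 1)
data = rng.normal(0.0, 1.0, size=5000)

theta, eta = 2.0, 0.05
for t in range(500):
    # Negative phase: fantasy particles from the current model
    # (exact sampling here; in general this is where MCMC comes in)
    fantasy = rng.normal(0.0, np.sqrt(1.0 / (2 * theta)), size=5000)
    pos = np.mean(-data**2)       # E_data[grad_theta log p_hat]
    neg = np.mean(-fantasy**2)    # E_model[grad_theta log p_hat]
    theta += eta * (pos - neg)    # gradient ascent on l(theta)

print(theta)  # converges near 0.5
```

The update raises probability where the data lives and lowers it where the fantasy particles live, matching the description above.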

Contrastive Divergence (CD Learning)

The biggest problem with the negative-phase sampling above is that the number of steps needed to reach the stationary distribution is unknown. Contrastive divergence constrains the initialization of the chain: instead of an arbitrary starting state, the chain is initialized directly at a data sample, $x^{(0)}\sim p_{data}$, which shortens the mixing time. The resulting algorithm is called CD-k, where $k$ is the number of sampling steps run after initialization; in practice even $k=1$ often suffices.

Consider the MLE objective again:

$$\begin{aligned}\hat{\theta}&=\mathop{argmax}_{\theta}p(x|\theta)=\mathop{argmax}_{\theta}\frac{1}{N}\sum\limits_{i=1}^N\log p(x_i|\theta)=\mathop{argmax}_{\theta}\mathbb{E}_{p_{data}}[\log p_{model}(x|\theta)]\\&=\mathop{argmax}_{\theta}\int p_{data}\log p_{model}\,dx\\&=\mathop{argmax}_{\theta}\int p_{data}\log \frac{p_{model}}{p_{data}}\,dx\\&=\mathop{argmin}_{\theta}\ KL(p_{data}||p_{model})\end{aligned}$$

(The third line subtracts $\int p_{data}\log p_{data}\,dx$, which does not depend on $\theta$.)
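
The equivalence of maximizing $\mathbb{E}_{p_{data}}[\log p_{model}]$ and minimizing $KL(p_{data}||p_{model})$ can be checked on a discrete toy family (my own example: $p_{model}(\cdot|\theta)$ is Binomial$(2,\theta)$ on $\{0,1,2\}$):

```python
import numpy as np

p_data = np.array([0.2, 0.5, 0.3])       # fixed "true" distribution on {0, 1, 2}
thetas = np.linspace(0.01, 0.99, 99)

def p_model(t):
    # Binomial(2, t) probabilities for outcomes 0, 1, 2
    return np.array([(1 - t)**2, 2 * t * (1 - t), t**2])

ll = [np.sum(p_data * np.log(p_model(t))) for t in thetas]           # E_data[log p_model]
kl = [np.sum(p_data * np.log(p_data / p_model(t))) for t in thetas]  # KL(p_data || p_model)

# The two objectives differ only by the theta-independent entropy of p_data,
# so they are optimized by the same theta
print(thetas[np.argmax(ll)] == thetas[np.argmin(kl)])  # True
```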

For the CD-k sampling process, the distribution of the initial points can be written as:

$$p^{(0)}=p_{data}$$

while the model requires the chain to reach its stationary distribution:

$$p^{(\infty)}=p_{model}$$

What we ultimately want to minimize is therefore $KL(p^{(0)}||p^{(\infty)})$. Contrastive divergence is defined as:

$$KL(p^{(0)}||p^{(\infty)})-KL(p^{(k)}||p^{(\infty)})$$

where $p^{(k)}$ denotes the distribution of the chain after $k$ steps; this is the objective associated with the $k$-th sampling step of the CD-k algorithm.

The RBM Learning Problem

The parameters of an RBM are:

$$\begin{aligned}h&=(h_1,\cdots,h_m)^T\\v&=(v_1,\cdots,v_n)^T\\w&=(w_{ij})_{m\times n}\\\alpha&=(\alpha_1,\cdots,\alpha_n)^T\\\beta&=(\beta_1,\cdots,\beta_m)^T\end{aligned}$$

The learning problem concerns the log-likelihood of the visible variables:

$$\begin{aligned}\log p(v)&=\log\sum\limits_{h}p(h,v)\\&=\log\sum\limits_h\frac{1}{Z}\exp(-E(v,h))\\&=\log\sum\limits_{h}\exp(-E(v,h))-\log\sum\limits_{v,h}\exp(-E(v,h))\end{aligned}$$

Differentiating the first term with respect to $\theta$:

$$\frac{\partial \log\sum\limits_{h}\exp(-E(v,h))}{\partial\theta}=-\frac{\sum\limits_h\exp(-E(v,h))\frac{\partial E(v,h)}{\partial\theta}}{\sum\limits_{h}\exp(-E(v,h))}=-\sum\limits_h p(h|v)\frac{\partial E(v,h)}{\partial\theta}$$

And the second term:

$$\frac{\partial \log\sum\limits_{v,h}\exp(-E(v,h))}{\partial\theta}=-\sum\limits_{h,v}\frac{\exp(-E(v,h))\frac{\partial E(v,h)}{\partial\theta}}{\sum\limits_{h,v}\exp(-E(v,h))}=-\sum\limits_{v,h}p(v,h)\frac{\partial E(v,h)}{\partial\theta}$$

So we have:

$$\frac{\partial}{\partial\theta}\log p(v)=-\sum\limits_h p(h|v)\frac{\partial E(v,h)}{\partial\theta}+\sum\limits_{v,h}p(v,h)\frac{\partial E(v,h)}{\partial\theta}$$

Substituting the RBM model assumption (the energy function):

$$E(v,h)=-(h^Twv+\alpha^Tv+\beta^Th)$$

  1. Expanding the quadratic term: $h^Twv=\sum\limits_{i=1}^m\sum\limits_{j=1}^n h_iw_{ij}v_j$

  2. $\frac{\partial}{\partial w_{ij}}E(v,h)=-h_iv_j$


Hence, taking $\theta=w_{ij}$:

$$\frac{\partial}{\partial w_{ij}}\log p(v)=\sum\limits_{h}p(h|v)h_iv_j-\sum\limits_{h,v}p(h,v)h_iv_j$$
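
The energy derivative $\frac{\partial}{\partial w_{ij}}E(v,h)=-h_iv_j$ used above can be verified numerically (the dimensions and random values here are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4                                   # hidden / visible dimensions
w = rng.normal(size=(m, n))
alpha = rng.normal(size=n)
beta = rng.normal(size=m)
v = rng.integers(0, 2, size=n).astype(float)  # binary visible configuration
h = rng.integers(0, 2, size=m).astype(float)  # binary hidden configuration

def E(w):
    # RBM energy E(v, h) = -(h^T w v + alpha^T v + beta^T h)
    return -(h @ w @ v + alpha @ v + beta @ h)

# Finite-difference check of dE/dw_ij against the closed form -h_i * v_j
eps, i, j = 1e-6, 1, 2
w_pert = w.copy()
w_pert[i, j] += eps
numeric = (E(w_pert) - E(w)) / eps
print(numeric, -h[i] * v[j])  # agree to ~1e-6
```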


The first term:

$$\sum\limits_{h_1,h_2,\cdots,h_m}p(h_1,h_2,\cdots,h_m|v)h_iv_j=\sum\limits_{h_i}p(h_i|v)h_iv_j=p(h_i=1|v)v_j$$


Here we have assumed that $h_i$ is a binary variable taking values in $\{0,1\}$.
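
For binary units, the conditional factorizes into the standard logistic form $p(h_i=1|v)=\sigma\left(\sum_j w_{ij}v_j+\beta_i\right)$, a standard RBM result used but not derived in this section. It can be checked by brute force against the joint defined by the energy:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
m, n = 3, 4
w = rng.normal(size=(m, n))
beta = rng.normal(size=m)
alpha = rng.normal(size=n)
v = rng.integers(0, 2, size=n).astype(float)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Brute force: enumerate all 2^m hidden configurations to get p(h|v)
hs = np.array(list(product([0.0, 1.0], repeat=m)))
energies = -(hs @ w @ v + alpha @ v + hs @ beta)
p_h_given_v = np.exp(-energies) / np.exp(-energies).sum()
brute = (hs * p_h_given_v[:, None]).sum(axis=0)   # p(h_i = 1 | v) for each i

print(np.allclose(brute, sigmoid(w @ v + beta)))  # True
```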

The second term:

$$\sum\limits_{h,v}p(h,v)h_iv_j=\sum\limits_{h,v}p(v)p(h|v)h_iv_j=\sum\limits_v p(v)p(h_i=1|v)v_j$$


This sum runs over exponentially many configurations, so it must be approximated by sampling; we use the CD-k method.

For the first term, the training samples can be used directly. For the second term we use CD-k sampling: initialize with a training sample $v^{(0)}=x_i$, sample $h^{(0)}\sim p(h|v^{(0)})$, then $v^{(1)}\sim p(v|h^{(0)})$, and so on until $v^{(k)}$ is obtained. Doing this for every training sample yields $N$ fantasy particles $v^{(k)}$, and for each of them the second term is approximated by:

$$p(h_i=1|v^{(k)})v_j^{(k)}$$


The concrete algorithm:

  1. For each sample $v$ in the training set:
    1. Initialize the chain with this sample: $v^{(0)}=v$.
    2. Run $k$ Gibbs sampling steps ($l=0,\cdots,k-1$):
      1. $h_i^{(l)}\sim p(h_i|v^{(l)})$
      2. $v_i^{(l+1)}\sim p(v_i|h^{(l)})$
    3. Accumulate the sampled results into the gradient.
  2. Repeat the procedure over all samples, and divide the final gradient by $N$.
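
The steps above can be sketched as a minimal numpy implementation for a binary RBM. The logistic conditionals $p(h_i=1|v)=\sigma(wv+\beta)_i$ and $p(v_j=1|h)=\sigma(w^Th+\alpha)_j$ are the standard RBM results; the training data is synthetic, and only the $w$ gradient is updated for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

m, n, k, eta = 4, 6, 1, 0.1       # hidden, visible, CD steps, learning rate
w = 0.01 * rng.normal(size=(m, n))
alpha = np.zeros(n)               # visible bias
beta = np.zeros(m)                # hidden bias

data = rng.integers(0, 2, size=(50, n)).astype(float)  # synthetic binary data

for epoch in range(20):
    dw = np.zeros_like(w)
    for x in data:
        v = x.copy()                          # 1.1 initialize the chain at the sample
        for l in range(k):                    # 1.2 k Gibbs steps
            h = (rng.random(m) < sigmoid(w @ v + beta)).astype(float)
            v = (rng.random(n) < sigmoid(w.T @ h + alpha)).astype(float)
        # 1.3 positive phase uses the data, negative phase the fantasy particle v^(k)
        dw += np.outer(sigmoid(w @ x + beta), x) - np.outer(sigmoid(w @ v + beta), v)
    w += eta * dw / len(data)                 # 2. average the gradient over N samples

print(w.shape)  # (4, 6)
```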