In both learning and inference, the normalizing constant of a probability distribution is hard to handle; this constant is the partition function. Consider a distribution of the form:

$$p(x|\theta)=\frac{1}{Z(\theta)}\hat{p}(x|\theta),\quad Z(\theta)=\int\hat{p}(x|\theta)dx$$
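
When $x$ is low-dimensional, the partition function can be approximated by numerical integration. A minimal sketch (the density $\hat{p}(x|\theta)=e^{-\theta x^2}$ here is my own toy example, not from the text):

```python
import numpy as np

# Toy unnormalized density: p_hat(x|theta) = exp(-theta * x^2)
theta = 0.5
xs = np.linspace(-10.0, 10.0, 100001)
p_hat = np.exp(-theta * xs**2)

# Z(theta) = \int p_hat(x|theta) dx, approximated by a Riemann sum
Z = np.sum(p_hat) * (xs[1] - xs[0])

# For this choice Z has the closed form sqrt(pi / theta)
print(Z, np.sqrt(np.pi / theta))  # both ≈ 2.5066
```

In higher dimensions this integral is intractable, which is exactly the problem this chapter addresses.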

MLE with a Partition Function

In the learning task, we apply maximum likelihood estimation:

$$\begin{aligned}\hat{\theta}&=\mathop{argmax}_{\theta}p(x|\theta)=\mathop{argmax}_{\theta}\sum\limits_{i=1}^N\log p(x_i|\theta)\\&=\mathop{argmax}_{\theta}\sum\limits_{i=1}^N\log \hat{p}(x_i|\theta)-N\log Z(\theta)\\&=\mathop{argmax}_{\theta}\frac{1}{N}\sum\limits_{i=1}^N\log \hat{p}(x_i|\theta)-\log Z(\theta)=\mathop{argmax}_{\theta}\ l(\theta)\end{aligned}$$

Taking the gradient of the $\log Z(\theta)$ term:

$$\begin{aligned}\nabla_\theta\log Z(\theta)&=\frac{1}{Z(\theta)}\nabla_\theta Z(\theta)\\&=\frac{p(x|\theta)}{\hat{p}(x|\theta)}\int\nabla_\theta \hat{p}(x|\theta)dx\\&=\int\frac{p(x|\theta)}{\hat{p}(x|\theta)}\nabla_\theta\hat{p}(x|\theta)dx\\&=\int p(x|\theta)\nabla_\theta\log\hat{p}(x|\theta)dx\\&=\mathbb{E}_{p(x|\theta)}[\nabla_\theta\log\hat{p}(x|\theta)]\end{aligned}$$

Here the ratio $\frac{p(x|\theta)}{\hat{p}(x|\theta)}=\frac{1}{Z(\theta)}$ does not depend on $x$, so it can be moved inside the integral, and $\frac{\nabla_\theta\hat{p}}{\hat{p}}=\nabla_\theta\log\hat{p}$.

Because this expression is an expectation under the unknown model distribution, it cannot be evaluated exactly and must be approximated by sampling. Without this term we could simply run a gradient method, but the presence of the partition function rules out direct gradient descent.

The expectation above is taken under the model's assumed distribution. Denote the true data distribution by $p_{data}$. The gradient of the first term of $l(\theta)$ is a sum over $N$ points which, viewed as draws from $p_{data}$, is a Monte Carlo approximation of an expectation. Hence:

$$\nabla_\theta l(\theta)=\mathbb{E}_{p_{data}}[\nabla_\theta\log\hat{p}(x|\theta)]-\mathbb{E}_{p(x|\theta)}[\nabla_\theta\log\hat{p}(x|\theta)]$$

Maximizing the likelihood therefore drives the model distribution toward the data distribution. The first term is called the positive phase, and the second the negative phase. Evaluating the negative phase requires sampling methods such as MCMC.

Sampling $\hat{x}_1,\cdots,\hat{x}_m\sim p_{model}(x|\theta^t)$, the update becomes:

$$\theta^{t+1}=\theta^t+\eta\left(\sum\limits_{i=1}^m\nabla_\theta \log \hat{p}(x_i|\theta^t)-\sum\limits_{i=1}^m\nabla_\theta\log \hat{p}(\hat{x}_i|\theta^t)\right)$$

This algorithm is gradient ascent based on MCMC sampling. The samples drawn at each step are called fantasy particles: if the fantasy particles occupy regions where the model assigns higher probability than the data distribution does, maximizing the objective pushes the model's probability in those regions down.
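
The positive-phase/negative-phase update can be illustrated on a toy exponential family (my own example: $\hat{p}(x|\theta)=e^{-\theta x^2}$, so $\nabla_\theta\log\hat{p}=-x^2$ and the normalized model is $\mathcal{N}(0,\frac{1}{2\theta})$, which we can sample exactly in place of MCMC):

```python
import numpy as np

rng = np.random.default_rng(0)

# "True" data: N(0, 1), so the optimum is theta = 0.5 (model variance 1/(2*theta) = 1)
data = rng.normal(0.0, 1.0, size=5000)

theta, eta = 2.0, 0.05
for t in range(500):
    # Negative phase: fantasy particles from the current model
    # (exact sampling here; in general this is where MCMC comes in)
    fantasy = rng.normal(0.0, np.sqrt(1.0 / (2 * theta)), size=5000)
    pos = np.mean(-data**2)       # E_data[grad_theta log p_hat]
    neg = np.mean(-fantasy**2)    # E_model[grad_theta log p_hat]
    theta += eta * (pos - neg)    # gradient ascent on l(theta)

print(theta)  # converges near 0.5
```

The update raises probability where the data lives and lowers it where the fantasy particles live, matching the description above.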

Contrastive Divergence (CD Learning)

The biggest problem with the negative-phase sampling above is that the number of steps needed to reach the stationary distribution is unknown. Contrastive divergence constrains the initialization of the chain: instead of an arbitrary starting state, the chain is initialized directly at a data sample, $x^{(0)}\sim p_{data}$, which shortens the mixing time. The resulting algorithm is called CD-k, where $k$ is the number of sampling steps run after initialization; in practice even $k=1$ often suffices.

Consider the MLE objective again:

$$\begin{aligned}\hat{\theta}&=\mathop{argmax}_{\theta}p(x|\theta)=\mathop{argmax}_{\theta}\frac{1}{N}\sum\limits_{i=1}^N\log p(x_i|\theta)=\mathop{argmax}_{\theta}\mathbb{E}_{p_{data}}[\log p_{model}(x|\theta)]\\&=\mathop{argmax}_{\theta}\int p_{data}\log p_{model}\,dx\\&=\mathop{argmax}_{\theta}\int p_{data}\log \frac{p_{model}}{p_{data}}\,dx\\&=\mathop{argmin}_{\theta}\ KL(p_{data}||p_{model})\end{aligned}$$

(The third line subtracts $\int p_{data}\log p_{data}\,dx$, which does not depend on $\theta$.)
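
The equivalence of maximizing $\mathbb{E}_{p_{data}}[\log p_{model}]$ and minimizing $KL(p_{data}||p_{model})$ can be checked on a discrete toy family (my own example: $p_{model}(\cdot|\theta)$ is Binomial$(2,\theta)$ on $\{0,1,2\}$):

```python
import numpy as np

p_data = np.array([0.2, 0.5, 0.3])       # fixed "true" distribution on {0, 1, 2}
thetas = np.linspace(0.01, 0.99, 99)

def p_model(t):
    # Binomial(2, t) probabilities for outcomes 0, 1, 2
    return np.array([(1 - t)**2, 2 * t * (1 - t), t**2])

ll = [np.sum(p_data * np.log(p_model(t))) for t in thetas]           # E_data[log p_model]
kl = [np.sum(p_data * np.log(p_data / p_model(t))) for t in thetas]  # KL(p_data || p_model)

# The two objectives differ only by the theta-independent entropy of p_data,
# so they are optimized by the same theta
print(thetas[np.argmax(ll)] == thetas[np.argmin(kl)])  # True
```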

For the CD-k sampling process, the distribution of the initial points can be written as:

$$p^{(0)}=p_{data}$$

while the model requires the chain to reach its stationary distribution:

$$p^{(\infty)}=p_{model}$$

What we ultimately want to minimize is therefore $KL(p^{(0)}||p^{(\infty)})$. Contrastive divergence is defined as:

$$KL(p^{(0)}||p^{(\infty)})-KL(p^{(k)}||p^{(\infty)})$$

where $p^{(k)}$ denotes the distribution of the chain after $k$ steps; this is the objective associated with the $k$-th sampling step of the CD-k algorithm.

The RBM Learning Problem

The parameters of an RBM are:

$$\begin{aligned}h&=(h_1,\cdots,h_m)^T\\v&=(v_1,\cdots,v_n)^T\\w&=(w_{ij})_{m\times n}\\\alpha&=(\alpha_1,\cdots,\alpha_n)^T\\\beta&=(\beta_1,\cdots,\beta_m)^T\end{aligned}$$

The learning problem concerns the log-likelihood of the visible variables:

$$\begin{aligned}\log p(v)&=\log\sum\limits_{h}p(h,v)\\&=\log\sum\limits_h\frac{1}{Z}\exp(-E(v,h))\\&=\log\sum\limits_{h}\exp(-E(v,h))-\log\sum\limits_{v,h}\exp(-E(v,h))\end{aligned}$$

Differentiating the first term with respect to $\theta$:

$$\frac{\partial \log\sum\limits_{h}\exp(-E(v,h))}{\partial\theta}=-\frac{\sum\limits_h\exp(-E(v,h))\frac{\partial E(v,h)}{\partial\theta}}{\sum\limits_{h}\exp(-E(v,h))}=-\sum\limits_h p(h|v)\frac{\partial E(v,h)}{\partial\theta}$$

And the second term:

$$\frac{\partial \log\sum\limits_{v,h}\exp(-E(v,h))}{\partial\theta}=-\sum\limits_{h,v}\frac{\exp(-E(v,h))\frac{\partial E(v,h)}{\partial\theta}}{\sum\limits_{h,v}\exp(-E(v,h))}=-\sum\limits_{v,h}p(v,h)\frac{\partial E(v,h)}{\partial\theta}$$

So we have:

$$\frac{\partial}{\partial\theta}\log p(v)=-\sum\limits_h p(h|v)\frac{\partial E(v,h)}{\partial\theta}+\sum\limits_{v,h}p(v,h)\frac{\partial E(v,h)}{\partial\theta}$$

Substituting the RBM model assumption (the energy function):

$$E(v,h)=-(h^Twv+\alpha^Tv+\beta^Th)$$

  1. Expanding the quadratic term: $h^Twv=\sum\limits_{i=1}^m\sum\limits_{j=1}^n h_iw_{ij}v_j$

  2. $\frac{\partial}{\partial w_{ij}}E(v,h)=-h_iv_j$


Hence, taking $\theta=w_{ij}$:

$$\frac{\partial}{\partial w_{ij}}\log p(v)=\sum\limits_{h}p(h|v)h_iv_j-\sum\limits_{h,v}p(h,v)h_iv_j$$
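
The energy derivative $\frac{\partial}{\partial w_{ij}}E(v,h)=-h_iv_j$ used above can be verified numerically (the dimensions and random values here are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4                                   # hidden / visible dimensions
w = rng.normal(size=(m, n))
alpha = rng.normal(size=n)
beta = rng.normal(size=m)
v = rng.integers(0, 2, size=n).astype(float)  # binary visible configuration
h = rng.integers(0, 2, size=m).astype(float)  # binary hidden configuration

def E(w):
    # RBM energy E(v, h) = -(h^T w v + alpha^T v + beta^T h)
    return -(h @ w @ v + alpha @ v + beta @ h)

# Finite-difference check of dE/dw_ij against the closed form -h_i * v_j
eps, i, j = 1e-6, 1, 2
w_pert = w.copy()
w_pert[i, j] += eps
numeric = (E(w_pert) - E(w)) / eps
print(numeric, -h[i] * v[j])  # agree to ~1e-6
```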


The first term:

$$\sum\limits_{h_1,h_2,\cdots,h_m}p(h_1,h_2,\cdots,h_m|v)h_iv_j=\sum\limits_{h_i}p(h_i|v)h_iv_j=p(h_i=1|v)v_j$$


Here we have assumed that $h_i$ is a binary variable taking values in $\{0,1\}$.
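
For binary units, the conditional factorizes into the standard logistic form $p(h_i=1|v)=\sigma\left(\sum_j w_{ij}v_j+\beta_i\right)$, a standard RBM result used but not derived in this section. It can be checked by brute force against the joint defined by the energy:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
m, n = 3, 4
w = rng.normal(size=(m, n))
beta = rng.normal(size=m)
alpha = rng.normal(size=n)
v = rng.integers(0, 2, size=n).astype(float)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Brute force: enumerate all 2^m hidden configurations to get p(h|v)
hs = np.array(list(product([0.0, 1.0], repeat=m)))
energies = -(hs @ w @ v + alpha @ v + hs @ beta)
p_h_given_v = np.exp(-energies) / np.exp(-energies).sum()
brute = (hs * p_h_given_v[:, None]).sum(axis=0)   # p(h_i = 1 | v) for each i

print(np.allclose(brute, sigmoid(w @ v + beta)))  # True
```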

The second term:

$$\sum\limits_{h,v}p(h,v)h_iv_j=\sum\limits_{h,v}p(v)p(h|v)h_iv_j=\sum\limits_v p(v)p(h_i=1|v)v_j$$


This sum runs over exponentially many configurations, so it must be approximated by sampling; we use the CD-k method.

For the first term, the training samples can be used directly. For the second term we use CD-k sampling: initialize with a training sample $v^{(0)}=x_i$, sample $h^{(0)}\sim p(h|v^{(0)})$, then $v^{(1)}\sim p(v|h^{(0)})$, and so on until $v^{(k)}$ is obtained. Doing this for every training sample yields $N$ fantasy particles $v^{(k)}$, and for each of them the second term is approximated by:

$$p(h_i=1|v^{(k)})v_j^{(k)}$$


The concrete algorithm:

  1. For each sample $v$ in the training set:
    1. Initialize the chain with this sample: $v^{(0)}=v$.
    2. Run $k$ Gibbs sampling steps ($l=0,\cdots,k-1$):
      1. $h_i^{(l)}\sim p(h_i|v^{(l)})$
      2. $v_i^{(l+1)}\sim p(v_i|h^{(l)})$
    3. Accumulate the sampled results into the gradient.
  2. Repeat the procedure over all samples, and divide the final gradient by $N$.
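
The steps above can be sketched as a minimal numpy implementation for a binary RBM. The logistic conditionals $p(h_i=1|v)=\sigma(wv+\beta)_i$ and $p(v_j=1|h)=\sigma(w^Th+\alpha)_j$ are the standard RBM results; the training data is synthetic, and only the $w$ gradient is updated for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

m, n, k, eta = 4, 6, 1, 0.1       # hidden, visible, CD steps, learning rate
w = 0.01 * rng.normal(size=(m, n))
alpha = np.zeros(n)               # visible bias
beta = np.zeros(m)                # hidden bias

data = rng.integers(0, 2, size=(50, n)).astype(float)  # synthetic binary data

for epoch in range(20):
    dw = np.zeros_like(w)
    for x in data:
        v = x.copy()                          # 1.1 initialize the chain at the sample
        for l in range(k):                    # 1.2 k Gibbs steps
            h = (rng.random(m) < sigmoid(w @ v + beta)).astype(float)
            v = (rng.random(n) < sigmoid(w.T @ h + alpha)).astype(float)
        # 1.3 positive phase uses the data, negative phase the fantasy particle v^(k)
        dw += np.outer(sigmoid(w @ x + beta), x) - np.outer(sigmoid(w @ v + beta), v)
    w += eta * dw / len(data)                 # 2. average the gradient over N samples

print(w.shape)  # (4, 6)
```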