- Inference
- $p(h|v)$
- $p(v)$
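A Boltzmann machine is an undirected graphical model with hidden nodes. The simplest graphical model is naive Bayes (the naive Bayes assumption). Introducing a single latent variable gives the GMM. If the single latent variable becomes a sequence of latent variables, we obtain state space models (adding the homogeneous Markov assumption and the observation independence assumption gives the HMM, the Kalman Filter and the Particle Filter). To introduce dependencies among the observed variables, a maximum-entropy model, the MEMM, was introduced. To overcome the local normalization (label bias) problem in MEMM, the CRF was introduced in turn. A CRF is an undirected graph in which the homogeneous Markov assumption is broken; if the latent variables form a chain, it is called a linear-chain CRF.
Starting from an undirected graph and introducing latent variables gives the Boltzmann machine, whose probability density function is an exponential-family distribution. Placing certain restrictions on the latent and observed variables gives the restricted Boltzmann machine (RBM).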
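As we can see, different probabilistic graphical models make assumptions about the following characteristics:
- Direction: a property of the edges
- Discrete / continuous / mixed: a property of the nodes
- Conditional independence: a property of the edges
- Latent variables: a property of the nodes
- Exponential family: a structural characteristic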
Denote the observed variables and the hidden variables by $v=(v_1,\cdots,v_n)^T$ and $h=(h_1,\cdots,h_m)^T$ respectively, and write $x=(h,v)$. An undirected graph factorizes over its maximal cliques, so its distribution can be written in the form of a Boltzmann distribution:
$$
p(x)=\frac{1}{Z}\prod\limits_{i=1}^K\psi_i(x_{c_i})=\frac{1}{Z}\exp(-\sum\limits_{i=1}^KE(x_{c_i}))
$$
This is also an exponential-family distribution.
A general Boltzmann machine has several problems. In its inference task, exact inference is intractable and approximate inference is too expensive. To address this, a simplified Boltzmann machine, the restricted Boltzmann machine, assumes that there are no connections among the hidden variables or among the observed variables; connections exist only between hidden and observed variables. Under this assumption:
$$
p(x)=p(h,v)=\frac{1}{Z}\exp(-E(v,h))
$$
Here the energy function $E(v,h)$ can be written as three parts: two terms associated with the two sets of nodes and one term associated with the edges:
$$
E(v,h)=-(h^Twv+\alpha^T v+\beta^T h)
$$
Therefore:
$$
p(x)=\frac{1}{Z}\exp(h^Twv)\exp(\alpha^T v)\exp(\beta^T h)=\frac{1}{Z}\prod_{i=1}^m\prod_{j=1}^n\exp(h_iw_{ij}v_j)\prod_{j=1}^n\exp(\alpha_jv_j)\prod_{i=1}^m\exp(\beta_ih_i)
$$
The factors in this expression correspond one-to-one with the factor graph of the RBM.
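The factorization above can be checked numerically on a tiny model. The following is a minimal sketch, not a reference implementation: it assumes binary units, and the parameter values and the names `energy`, `factor_product`, `all_states` and `joint` are hypothetical choices made only for illustration. The partition function $Z$ is computed by brute-force enumeration, which is feasible only for very small $m$ and $n$.

```python
import itertools
import math

# Hypothetical toy parameters (not from the text): m = 2 hidden units, n = 3 visible units.
w = [[0.5, -1.0, 0.25],
     [1.5, 0.0, -0.75]]   # w[i][j] couples h_i and v_j
alpha = [0.1, -0.2, 0.3]  # visible biases
beta = [-0.4, 0.6]        # hidden biases
m, n = len(beta), len(alpha)


def energy(v, h):
    """E(v, h) = -(h^T w v + alpha^T v + beta^T h)."""
    edge = sum(h[i] * w[i][j] * v[j] for i in range(m) for j in range(n))
    vis = sum(a * x for a, x in zip(alpha, v))
    hid = sum(b * x for b, x in zip(beta, h))
    return -(edge + vis + hid)


def factor_product(v, h):
    """exp(-E(v, h)) written as a product of one factor per edge and per node."""
    p = 1.0
    for i in range(m):
        for j in range(n):
            p *= math.exp(h[i] * w[i][j] * v[j])  # edge factor
    for j in range(n):
        p *= math.exp(alpha[j] * v[j])            # visible node factor
    for i in range(m):
        p *= math.exp(beta[i] * h[i])             # hidden node factor
    return p


def all_states(k):
    """All binary configurations of k units (assumption: units take values in {0, 1})."""
    return list(itertools.product([0, 1], repeat=k))


# Partition function by exhaustive enumeration over all (v, h) pairs.
Z = sum(math.exp(-energy(v, h)) for v in all_states(n) for h in all_states(m))


def joint(v, h):
    """p(h, v) = exp(-E(v, h)) / Z."""
    return math.exp(-energy(v, h)) / Z
```

For any configuration, `factor_product(v, h)` and `math.exp(-energy(v, h))` should agree up to floating-point error, and `joint` should sum to one over all configurations.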
Inference
The inference task consists of computing the posterior $p(h|v)$ and the marginal $p(v)$.
$p(h|v)$
An undirected graph satisfies the local Markov property. In an RBM the neighbours of a hidden node are exactly the visible nodes, so $p(h_1|h-\{h_1\},v)=p(h_1|\mathrm{Neighbour}(h_1))=p(h_1|v)$. It follows that:
$$
p(h|v)=\prod_{i=1}^mp(h_i|v)
$$
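The same factorization can also be read off the joint distribution directly. As a sketch, write $w_i$ for the $i$-th row of $w$ (a notational assumption made here) and $h'$ for a summation variable ranging over hidden configurations:
$$
p(h|v)=\frac{p(h,v)}{\sum\limits_{h'}p(h',v)}=\frac{\exp(\alpha^Tv)\prod\limits_{i=1}^m\exp(h_i(w_iv+\beta_i))}{\exp(\alpha^Tv)\prod\limits_{i=1}^m\sum\limits_{h'_i}\exp(h'_i(w_iv+\beta_i))}=\prod_{i=1}^m\frac{\exp(h_i(w_iv+\beta_i))}{\sum\limits_{h'_i}\exp(h'_i(w_iv+\beta_i))}
$$
The sum over $h'$ splits into a product of per-unit sums because no term couples two hidden variables, and each factor on the right is $p(h_i|v)$.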
Consider a binary RBM, in which every hidden variable takes only two values, $h_i\in\{0,1\}$. Let $h_{-l}$ denote all hidden variables except $h_l$. Since $h_l$ is conditionally independent of $h_{-l}$ given $v$:
$$
p(h_l=1|v)=p(h_l=1|h_{-l},v)=\frac{p(h_l=1,h_{-l},v)}{p(h_{-l},v)}=\frac{p(h_l=1,h_{-l},v)}{p(h_l=1,h_{-l},v)+p(h_l=0,h_{-l},v)}
$$
Split the energy function into terms that depend on $h_l$ and terms that do not:
$$
E(v,h)=-(\sum\limits_{i=1,i\ne l}^m\sum\limits_{j=1}^nh_iw_{ij}v_j+h_l\sum\limits_{j=1}^nw_{lj}v_j+\sum\limits_{j=1}^n\alpha_j v_j+\sum\limits_{i=1,i\ne l}^m\beta_ih_i+\beta_lh_l)
$$
Define
$$
h_lH_l(v)=h_l\sum\limits_{j=1}^nw_{lj}v_j+\beta_lh_l,\quad\overline{H}(h_{-l},v)=\sum\limits_{i=1,i\ne l}^m\sum\limits_{j=1}^nh_iw_{ij}v_j+\sum\limits_{j=1}^n\alpha_j v_j+\sum\limits_{i=1,i\ne l}^m\beta_ih_i
$$
so that $E(v,h)=-(h_lH_l(v)+\overline{H}(h_{-l},v))$.
Substituting, and noting that the factor $\frac{1}{Z}$ cancels between numerator and denominator, we get:
$$
p(h_l=1|v)=\frac{\exp(H_l(v)+\overline{H}(h_{-l},v))}{\exp(H_l(v)+\overline{H}(h_{-l},v))+\exp(\overline{H}(h_{-l},v))}=\frac{1}{1+\exp(-H_l(v))}=\sigma(H_l(v))
$$
This gives the posterior. The posterior of $v$ given $h$ is symmetric, so it can be derived in the same way.
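The sigmoid form can be compared against a brute-force conditional on a tiny model. This is a minimal sketch under stated assumptions: the parameter values and function names are hypothetical, and the visible units are assumed to be binary as well, in which case the symmetric result is $p(v_j=1|h)=\sigma(\sum_{i=1}^mh_iw_{ij}+\alpha_j)$.

```python
import itertools
import math

# Hypothetical toy parameters (not from the text): m = 2 hidden units, n = 3 visible units.
w = [[0.5, -1.0, 0.25],
     [1.5, 0.0, -0.75]]   # w[i][j] couples h_i and v_j
alpha = [0.1, -0.2, 0.3]  # visible biases
beta = [-0.4, 0.6]        # hidden biases
m, n = len(beta), len(alpha)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def neg_energy(v, h):
    """-E(v, h) = h^T w v + alpha^T v + beta^T h."""
    edge = sum(h[i] * w[i][j] * v[j] for i in range(m) for j in range(n))
    return (edge
            + sum(a * x for a, x in zip(alpha, v))
            + sum(b * x for b, x in zip(beta, h)))


def p_h_given_v(v):
    """Closed form: p(h_l = 1 | v) = sigma(H_l(v)), H_l(v) = sum_j w_lj v_j + beta_l."""
    return [sigmoid(sum(w[l][j] * v[j] for j in range(n)) + beta[l])
            for l in range(m)]


def p_v_given_h(h):
    """Symmetric form (assumes binary visible units): sigma(sum_i h_i w_ij + alpha_j)."""
    return [sigmoid(sum(h[i] * w[i][j] for i in range(m)) + alpha[j])
            for j in range(n)]


def p_h_given_v_brute_force(v):
    """Same conditional by enumerating every hidden configuration; Z cancels."""
    states = list(itertools.product([0, 1], repeat=m))
    weights = [math.exp(neg_energy(v, h)) for h in states]
    total = sum(weights)
    return [sum(wt for h, wt in zip(states, weights) if h[l] == 1) / total
            for l in range(m)]


def p_v_given_h_brute_force(h):
    """Same conditional by enumerating every visible configuration; Z cancels."""
    states = list(itertools.product([0, 1], repeat=n))
    weights = [math.exp(neg_energy(v, h)) for v in states]
    total = sum(weights)
    return [sum(wt for v, wt in zip(states, weights) if v[j] == 1) / total
            for j in range(n)]
```

The closed forms need $O(mn)$ operations, while the enumeration costs $O(2^m)$ or $O(2^n)$; this gap is the practical benefit of the restricted connectivity.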
$p(v)$
Writing $w_i$ for the $i$-th row of $w$ and summing out the binary hidden variables one at a time:
$$
\begin{aligned}
p(v)&=\sum\limits_hp(h,v)=\sum\limits_h\frac{1}{Z}\exp(h^Twv+\alpha^Tv+\beta^Th)\\
&=\exp(\alpha^Tv)\frac{1}{Z}\sum\limits_{h_1}\exp(h_1w_1v+\beta_1h_1)\cdots\sum\limits_{h_m}\exp(h_mw_mv+\beta_mh_m)\\
&=\exp(\alpha^Tv)\frac{1}{Z}(1+\exp(w_1v+\beta_1))\cdots(1+\exp(w_mv+\beta_m))\\
&=\frac{1}{Z}\exp(\alpha^Tv+\sum\limits_{i=1}^m\log(1+\exp(w_iv+\beta_i)))
\end{aligned}
$$
Here $\log(1+\exp(x))$ is called the Softplus function.
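The softplus expression for the marginal can be verified the same way. The sketch below assumes binary units and uses hypothetical parameter values and function names; it compares $\alpha^Tv+\sum_i\mathrm{softplus}(w_iv+\beta_i)$, which equals $\log(Z\,p(v))$, with the logarithm of an explicit sum over all $2^m$ hidden configurations. The normalizer is still obtained by enumerating visible states, so this works only for small $n$.

```python
import itertools
import math

# Hypothetical toy parameters (not from the text): m = 2 hidden units, n = 3 visible units.
w = [[0.5, -1.0, 0.25],
     [1.5, 0.0, -0.75]]   # w[i][j] couples h_i and v_j
alpha = [0.1, -0.2, 0.3]  # visible biases
beta = [-0.4, 0.6]        # hidden biases
m, n = len(beta), len(alpha)


def softplus(x):
    """log(1 + exp(x)), arranged so that exp never receives a large positive argument."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_p_tilde(v):
    """Log of the unnormalized marginal: alpha^T v + sum_i softplus(w_i v + beta_i)."""
    visible = sum(a * x for a, x in zip(alpha, v))
    hidden = sum(softplus(sum(w[i][j] * v[j] for j in range(n)) + beta[i])
                 for i in range(m))
    return visible + hidden


def log_p_tilde_brute_force(v):
    """Same quantity by summing exp(-E(v, h)) over every hidden configuration."""
    total = 0.0
    for h in itertools.product([0, 1], repeat=m):
        edge = sum(h[i] * w[i][j] * v[j] for i in range(m) for j in range(n))
        vis = sum(a * x for a, x in zip(alpha, v))
        hid = sum(b * x for b, x in zip(beta, h))
        total += math.exp(edge + vis + hid)
    return math.log(total)


# log Z from the unnormalized marginals; exhaustive over the 2^n visible states.
visible_states = list(itertools.product([0, 1], repeat=n))
log_Z = math.log(sum(math.exp(log_p_tilde(v)) for v in visible_states))


def p_v(v):
    """Normalized marginal p(v)."""
    return math.exp(log_p_tilde(v) - log_Z)
```

Working in the log domain keeps the hidden-unit sum at $O(mn)$ per visible vector and avoids overflow when $w_iv+\beta_i$ is large.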
