我们知道,分类问题可以分为硬分类和软分类两种,其中硬分类有 SVM,PLA,LDA 等。软分类问题大体上可以分为概率生成和概率判别模型,其中较为有名的概率判别模型有 Logistic 回归,生成模型有朴素贝叶斯模型。Logistic 回归模型的损失函数为交叉熵,这类模型也叫对数线性模型,一般地,又叫做最大熵模型,这类模型和指数族分布的概率假设是一致的。对朴素贝叶斯假设,如果将其中的单元素的条件独立性做推广到一系列的隐变量,那么,由此得到的模型又被称为动态模型,比较有代表性的如 HMM,从概率意义上,HMM也可以看成是 GMM 在时序上面的推广。

我们看到,一般地,如果将最大熵模型和 HMM相结合,那么这种模型叫做最大熵 Markov 模型(MEMM):

这个图就是将 HMM 的图中观测变量和隐变量的边方向反向,应用在分类中,隐变量就是输出的分类,这样 HMM 中的两个假设就不成立了,特别是观测之间不是完全独立的了。

HMM 是一种生成式模型,其建模对象为 14.条件随机场 - 图1#card=math&code=p%28X%2CY%7C%5Clambda%29#crop=0&crop=0&crop=1&crop=1&id=l0g31&originHeight=26&originWidth=89&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=),根据 HMM 的概率图,14.条件随机场 - 图2%3D%5Cprod%5Climits%7Bt%3D1%7D%5ETp(x_t%2Cy_t%7C%5Clambda%2Cy%7Bt-1%7D)#card=math&code=p%28X%2CY%7C%5Clambda%29%3D%5Cprod%5Climits%7Bt%3D1%7D%5ETp%28x_t%2Cy_t%7C%5Clambda%2Cy%7Bt-1%7D%29#crop=0&crop=0&crop=1&crop=1&id=pgjjk&originHeight=66&originWidth=284&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)。我们看到,观测独立性假设是一个很强的假设,如果我们有一个文本样本,那么观测独立性假设就假定了所有的单词之间没有关联。

在 MEMM 中,建模对象是 14.条件随机场 - 图3#card=math&code=p%28Y%7CX%2C%5Clambda%29#crop=0&crop=0&crop=1&crop=1&id=egoHF&originHeight=26&originWidth=89&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=),我们看概率图,给定 14.条件随机场 - 图414.条件随机场 - 图5 是不独立的,这样,观测独立假设就不成立了。根据概率图,14.条件随机场 - 图6%3D%5Cprod%5Climits%7Bt%3D1%7D%5ETp(y_t%7Cy%7Bt-1%7D%2CX%2C%5Clambda)#card=math&code=p%28Y%7CX%2C%5Clambda%29%3D%5Cprod%5Climits%7Bt%3D1%7D%5ETp%28y_t%7Cy%7Bt-1%7D%2CX%2C%5Clambda%29#crop=0&crop=0&crop=1&crop=1&id=Jx8YJ&originHeight=66&originWidth=282&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)。

MEMM 的缺陷是其必须满足局域的概率归一化(Label Bias Problem),我们看到,在上面的概率图中,14.条件随机场 - 图7#card=math&code=p%28yt%7Cy%7Bt-1%7D%2Cxt%29#crop=0&crop=0&crop=1&crop=1&id=tBjIu&originHeight=26&originWidth=116&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=), 这个概率,如果 ![](https://g.yuque.com/gr/latex?p(y_t%7Cy%7Bt-1%7D)#card=math&code=p%28yt%7Cy%7Bt-1%7D%29#crop=0&crop=0&crop=1&crop=1&id=KIArH&originHeight=26&originWidth=88&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=) 非常接近1,那么事实上,观测变量是什么就不会影响这个概率了。

对于这个问题,我们将 14.条件随机场 - 图8 之间的箭头转为直线转为无向图(线性链条件随机场),这样就只要满足全局归一化了(破坏齐次 Markov 假设)。

CRF 的 PDF

线性链的 CRF 的 PDF 为 14.条件随机场 - 图9%3D%5Cfrac%7B1%7D%7BZ%7D%5Cexp%5Csum%5Climits%7Bt%3D1%7D%5ET(F_t(y%7Bt-1%7D%2Cyt%2Cx%7B1%3AT%7D))#card=math&code=p%28Y%7CX%29%3D%5Cfrac%7B1%7D%7BZ%7D%5Cexp%5Csum%5Climits%7Bt%3D1%7D%5ET%28F_t%28y%7Bt-1%7D%2Cyt%2Cx%7B1%3AT%7D%29%29#crop=0&crop=0&crop=1&crop=1&id=eA43X&originHeight=66&originWidth=346&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=),两两形成了最大团,其中 14.条件随机场 - 图10 是随意外加的一个元素。作为第一个简化,我们假设每个团的势函数相同 14.条件随机场 - 图11

对于这个 14.条件随机场 - 图12,我们进一步,可以将其写为 14.条件随机场 - 图13%3D%5CDelta%7By%7Bt-1%7D%2CX%7D%2B%5CDelta%7By%7Bt%7D%2CX%7D%2B%5CDelta%7By_t%2Cy%7Bt-1%7D%2CX%7D#card=math&code=%C2%A0F%28y%7Bt-1%7D%2Cy_t%2CX%29%3D%5CDelta%7By%7Bt-1%7D%2CX%7D%2B%5CDelta%7By%7Bt%7D%2CX%7D%2B%5CDelta%7Byt%2Cy%7Bt-1%7D%2CX%7D#crop=0&crop=0&crop=1&crop=1&id=dXDxh&originHeight=29&originWidth=397&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)这三个部分,分别表示状态函数已经转移函数,由于整体的求和,可以简化为 image.png

我们可以设计一个表达式将其参数化:

14.条件随机场 - 图15%5C%5C%0A%5CDelta%7By%7Bt%7D%2CX%7D%26%3D%5Csum%5Climits%7Bl%3D1%7D%5EL%5Ceta_lg_l(y_t%2CX)%0A%5Cend%7Balign%7D%0A#card=math&code=%5Cbegin%7Balign%7D%0A%5CDelta%7Byt%2Cy%7Bt-1%7D%2CX%7D%26%3D%5Csum%5Climits%7Bk%3D1%7D%5EK%5Clambda_kf_k%28y%7Bt-1%7D%2Cyt%2CX%29%5C%5C%0A%5CDelta%7By%7Bt%7D%2CX%7D%26%3D%5Csum%5Climits%7Bl%3D1%7D%5EL%5Ceta_lg_l%28y_t%2CX%29%0A%5Cend%7Balign%7D%0A#crop=0&crop=0&crop=1&crop=1&id=Bjmpp&originHeight=134&originWidth=299&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

其中 $g,f $ 叫做特征函数,对于 14.条件随机场 - 图1614.条件随机场 - 图17 种元素,那么 14.条件随机场 - 图18

代入概率密度函数中:

14.条件随机场 - 图19%3D%5Cfrac%7B1%7D%7BZ%7D%5Cexp%5Csum%5Climits%7Bt%3D1%7D%5ET%5B%5Csum%5Climits%7Bk%3D1%7D%5EK%5Clambdakf_k(y%7Bt-1%7D%2Cyt%2CX)%2B%5Csum%5Climits%7Bl%3D1%7D%5EL%5Cetalg_l(y_t%2CX)%5D%0A#card=math&code=p%28Y%7CX%29%3D%5Cfrac%7B1%7D%7BZ%7D%5Cexp%5Csum%5Climits%7Bt%3D1%7D%5ET%5B%5Csum%5Climits%7Bk%3D1%7D%5EK%5Clambda_kf_k%28y%7Bt-1%7D%2Cyt%2CX%29%2B%5Csum%5Climits%7Bl%3D1%7D%5EL%5Ceta_lg_l%28y_t%2CX%29%5D%0A#crop=0&crop=0&crop=1&crop=1&id=uUcJ5&originHeight=66&originWidth=531&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

对于单个样本,将其写成向量的形式。定义 14.条件随机场 - 图20%5ET%2Cx%3D(x_1%2Cx_2%2C%5Ccdots%2Cx_T)%5ET%2C%5Clambda%3D(%5Clambda_1%2C%5Clambda_2%2C%5Ccdots%2C%5Clambda_K)%5ET%2C%5Ceta%3D(%5Ceta_1%2C%5Ceta_2%2C%5Ccdots%2C%5Ceta_L)%5ET#card=math&code=y%3D%28y_1%2Cy_2%2C%5Ccdots%2Cy_T%29%5ET%2Cx%3D%28x_1%2Cx_2%2C%5Ccdots%2Cx_T%29%5ET%2C%5Clambda%3D%28%5Clambda_1%2C%5Clambda_2%2C%5Ccdots%2C%5Clambda_K%29%5ET%2C%5Ceta%3D%28%5Ceta_1%2C%5Ceta_2%2C%5Ccdots%2C%5Ceta_L%29%5ET#crop=0&crop=0&crop=1&crop=1&id=zhi5X&originHeight=29&originWidth=786&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)。并且有 14.条件随机场 - 图21%5ET%2Cg%3D(g_1%2Cg_2%2C%5Ccdots%2Cg_L)%5ET#card=math&code=f%3D%28f_1%2Cf_2%2C%5Ccdots%2Cf_K%29%5ET%2Cg%3D%28g_1%2Cg_2%2C%5Ccdots%2Cg_L%29%5ET#crop=0&crop=0&crop=1&crop=1&id=Ub97w&originHeight=29&originWidth=382&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)。于是:

14.条件随机场 - 图22%3D%5Cfrac%7B1%7D%7BZ%7D%5Cexp%5Csum%5Climits%7Bt%3D1%7D%5ET%5B%5Clambda%5ETf(y%7Bt-1%7D%2Cyt%2Cx)%2B%5Ceta%5ETg(y_t%2Cx)%5D%0A#card=math&code=p%28Y%3Dy%7CX%3Dx%29%3D%5Cfrac%7B1%7D%7BZ%7D%5Cexp%5Csum%5Climits%7Bt%3D1%7D%5ET%5B%5Clambda%5ETf%28y_%7Bt-1%7D%2Cy_t%2Cx%29%2B%5Ceta%5ETg%28y_t%2Cx%29%5D%0A#crop=0&crop=0&crop=1&crop=1&id=VeC3j&originHeight=66&originWidth=524&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

不妨记:14.条件随机场 - 图23%5ET%2CH%3D(%5Csum%5Climits%7Bt%3D1%7D%5ETf%2C%5Csum%5Climits%7Bt%3D1%7D%5ETg)%5ET#card=math&code=%5Ctheta%3D%28%5Clambda%2C%5Ceta%29%5ET%2CH%3D%28%5Csum%5Climits%7Bt%3D1%7D%5ETf%2C%5Csum%5Climits%7Bt%3D1%7D%5ETg%29%5ET#crop=0&crop=0&crop=1&crop=1&id=vOzJ5&originHeight=66&originWidth=281&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=):

14.条件随机场 - 图24%3D%5Cfrac%7B1%7D%7BZ(x%2C%5Ctheta)%7D%5Cexp%5B%5Ctheta%5ETH(yt%2Cy%7Bt-1%7D%2Cx)%5D%0A#card=math&code=p%28Y%3Dy%7CX%3Dx%29%3D%5Cfrac%7B1%7D%7BZ%28x%2C%5Ctheta%29%7D%5Cexp%5B%5Ctheta%5ETH%28yt%2Cy%7Bt-1%7D%2Cx%29%5D%0A#crop=0&crop=0&crop=1&crop=1&id=F1BPz&originHeight=54&originWidth=429&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

上面这个式子是一个指数族分布,于是 14.条件随机场 - 图25 是配分函数。

CRF 需要解决下面几个问题:

  1. Learning:参数估计问题,对 14.条件随机场 - 图2614.条件随机场 - 图27 维样本,14.条件随机场 - 图28#card=math&code=%5Chat%7B%5Ctheta%7D%3D%5Cmathop%7Bargmax%7D%5Climits%7B%5Ctheta%7D%5Cprod%5Climits%7Bi%3D1%7D%5ENp%28y%5Ei%7Cx%5Ei%29#crop=0&crop=0&crop=1&crop=1&id=B3ixP&originHeight=66&originWidth=215&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=),这里用上标表示样本的编号。

  2. Inference:

a. 边缘概率:14.条件随机场 - 图29%0A#card=math&code=p%28y_t%7Cx%29%0A#crop=0&crop=0&crop=1&crop=1&id=PON4L&originHeight=26&originWidth=63&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

  1. 条件概率:一般在生成模型中较为关注,CRF 中不关注

  2. MAP 推断:

image.png

边缘概率

边缘概率这个问题描述为,根据学习任务得到的参数,给定了 14.条件随机场 - 图31#card=math&code=p%28Y%3Dy%7CX%3Dx%29#crop=0&crop=0&crop=1&crop=1&id=taleC&originHeight=26&originWidth=145&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=),求解 14.条件随机场 - 图32#card=math&code=p%28y_t%3Di%7Cx%29#crop=0&crop=0&crop=1&crop=1&id=tZqgs&originHeight=26&originWidth=98&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)。根据无向图可以给出:

14.条件随机场 - 图33%3D%5Csum%5Climits%7By%7B1%3At-1%7D%2Cy%7Bt%2B1%3AT%7D%7Dp(y%7Cx)%3D%5Csum%5Climits%7By%7B1%3At-1%7D%7D%5Csum%5Climits%7By%7Bt%2B1%3AT%7D%7D%5Cfrac%7B1%7D%7BZ%7D%5Cprod%5Climits%7Bt’%3D1%7D%5ET%5Cphi%7Bt’%7D(y%7Bt’-1%7D%2Cy%7Bt’%7D%2Cx)%0A#card=math&code=p%28y_t%3Di%7Cx%29%3D%5Csum%5Climits%7By%7B1%3At-1%7D%2Cy%7Bt%2B1%3AT%7D%7Dp%28y%7Cx%29%3D%5Csum%5Climits%7By%7B1%3At-1%7D%7D%5Csum%5Climits%7By%7Bt%2B1%3AT%7D%7D%5Cfrac%7B1%7D%7BZ%7D%5Cprod%5Climits%7Bt%27%3D1%7D%5ET%5Cphi%7Bt%27%7D%28y%7Bt%27-1%7D%2Cy%7Bt%27%7D%2Cx%29%0A#crop=0&crop=0&crop=1&crop=1&id=ojnVO&originHeight=71&originWidth=574&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

我们看到上面的式子,直接计算的复杂度很高,这是由于求和的复杂度在 14.条件随机场 - 图34#card=math&code=O%28S%5ET%29#crop=0&crop=0&crop=1&crop=1&id=CSJC2&originHeight=29&originWidth=58&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=),求积的复杂度在 14.条件随机场 - 图35#card=math&code=O%28T%29#crop=0&crop=0&crop=1&crop=1&id=gsZYu&originHeight=26&originWidth=47&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=),所以整体复杂度为 14.条件随机场 - 图36#card=math&code=O%28TS%5ET%29#crop=0&crop=0&crop=1&crop=1&id=RYuIV&originHeight=29&originWidth=73&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)。我们需要调整求和符号的顺序,从而降低复杂度。

首先,将两个求和分为:

14.条件随机场 - 图37%3D%5Cfrac%7B1%7D%7BZ%7D%5CDeltal%5CDelta_r%5C%5C%0A%26%5CDelta_l%3D%5Csum%5Climits%7By%7B1%3At-1%7D%7D%5Cphi%7B1%7D(y0%2Cy_1%2Cx)%5Cphi_2(y_1%2Cy_2%2Cx)%5Ccdots%5Cphi%7Bt-1%7D(y%7Bt-2%7D%2Cy%7Bt-1%7D%2Cx)%5Cphit(y%7Bt-1%7D%2Cyt%3Di%2Cx)%5C%5C%0A%26%5CDelta_r%3D%5Csum%5Climits%7By%7Bt%2B1%3AT%7D%7D%5Cphi%7Bt%2B1%7D(yt%3Di%2Cy%7Bt%2B1%7D%2Cx)%5Cphi%7Bt%2B2%7D(y%7Bt%2B1%7D%2Cy%7Bt%2B2%7D%2Cx)%5Ccdots%5Cphi_T(y%7BT-1%7D%2CyT%2Cx)%0A%5Cend%7Balign%7D%0A#card=math&code=%5Cbegin%7Balign%7D%26p%28y_t%3Di%7Cx%29%3D%5Cfrac%7B1%7D%7BZ%7D%5CDelta_l%5CDelta_r%5C%5C%0A%26%5CDelta_l%3D%5Csum%5Climits%7By%7B1%3At-1%7D%7D%5Cphi%7B1%7D%28y0%2Cy_1%2Cx%29%5Cphi_2%28y_1%2Cy_2%2Cx%29%5Ccdots%5Cphi%7Bt-1%7D%28y%7Bt-2%7D%2Cy%7Bt-1%7D%2Cx%29%5Cphit%28y%7Bt-1%7D%2Cyt%3Di%2Cx%29%5C%5C%0A%26%5CDelta_r%3D%5Csum%5Climits%7By%7Bt%2B1%3AT%7D%7D%5Cphi%7Bt%2B1%7D%28yt%3Di%2Cy%7Bt%2B1%7D%2Cx%29%5Cphi%7Bt%2B2%7D%28y%7Bt%2B1%7D%2Cy%7Bt%2B2%7D%2Cx%29%5Ccdots%5Cphi_T%28y%7BT-1%7D%2Cy_T%2Cx%29%0A%5Cend%7Balign%7D%0A#crop=0&crop=0&crop=1&crop=1&id=p2tTf&originHeight=158&originWidth=663&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

对于 14.条件随机场 - 图38,从左向右,一步一步将 14.条件随机场 - 图39 消掉:

14.条件随机场 - 图40%5Csum%5Climits%7By%7Bt-2%7D%7D%5Cphi%7Bt-1%7D(y%7Bt-2%7D%2Cy%7Bt-1%7D%2Cx)%5Ccdots%5Csum%5Climits%7By0%7D%5Cphi_1(y_0%2Cy_1%2Cx)%0A#card=math&code=%5CDelta_l%3D%5Csum%5Climits%7By%7Bt-1%7D%7D%5Cphi_t%28y%7Bt-1%7D%2Cyt%3Di%2Cx%29%5Csum%5Climits%7By%7Bt-2%7D%7D%5Cphi%7Bt-1%7D%28y%7Bt-2%7D%2Cy%7Bt-1%7D%2Cx%29%5Ccdots%5Csum%5Climits_%7By_0%7D%5Cphi_1%28y_0%2Cy_1%2Cx%29%0A#crop=0&crop=0&crop=1&crop=1&id=qhmvj&originHeight=54&originWidth=611&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

引入:

14.条件随机场 - 图41%3D%5CDelta_l%0A#card=math&code=%5Calpha_t%28i%29%3D%5CDelta_l%0A#crop=0&crop=0&crop=1&crop=1&id=qNa5F&originHeight=26&originWidth=96&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

于是:

14.条件随机场 - 图42%3D%5Csum%5Climits%7Bj%5Cin%20S%7D%5Cphi_t(y%7Bt-1%7D%3Dj%2Cyt%3Di%2Cx)%5Calpha%7Bt-1%7D(j)%0A#card=math&code=%5Calpha%7Bt%7D%28i%29%3D%5Csum%5Climits%7Bj%5Cin%20S%7D%5Cphit%28y%7Bt-1%7D%3Dj%2Cyt%3Di%2Cx%29%5Calpha%7Bt-1%7D%28j%29%0A#crop=0&crop=0&crop=1&crop=1&id=OqFbK&originHeight=53&originWidth=363&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

这样我们得到了一个递推式。

类似地,14.条件随机场 - 图43%3D%5Csum%5Climits%7Bj%5Cin%20S%7D%5Cphi%7Bt%2B1%7D(yt%3Di%2Cy%7Bt%2B1%7D%3Dj%2Cx)%5Cbeta%7Bt%2B1%7D(j)#card=math&code=%5CDelta_r%3D%5Cbeta_t%28i%29%3D%5Csum%5Climits%7Bj%5Cin%20S%7D%5Cphi%7Bt%2B1%7D%28y_t%3Di%2Cy%7Bt%2B1%7D%3Dj%2Cx%29%5Cbeta_%7Bt%2B1%7D%28j%29#crop=0&crop=0&crop=1&crop=1&id=iQHY0&originHeight=53&originWidth=433&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)。这个方法和 HMM 中的前向后向算法类似,就是概率图模型中精确推断的变量消除算法(信念传播)。

参数估计

在进行各种类型的推断之前,还需要对参数进行学习:

14.条件随机场 - 图44%5C%5C%0A%26%3D%5Cmathop%7Bargmax%7D%5Ctheta%5Csum%5Climits%7Bi%3D1%7D%5EN%5Clog%20p(y%5Ei%7Cx%5Ei)%5C%5C%0A%26%3D%5Cmathop%7Bargmax%7D%5Ctheta%5Csum%5Climits%7Bi%3D1%7D%5EN%5B-%5Clog%20Z(x%5Ei%2C%5Clambda%2C%5Ceta)%2B%5Csum%5Climits%7Bt%3D1%7D%5ET%5B%5Clambda%5ETf(y%7Bt-1%7D%2Cyt%2Cx)%2B%5Ceta%5ETg(y_t%2Cx)%5D%5D%0A%5Cend%7Balign%7D%0A#card=math&code=%5Cbegin%7Balign%7D%5Chat%7B%5Ctheta%7D%26%3D%5Cmathop%7Bargmax%7D%7B%5Ctheta%7D%5Cprod%5Climits%7Bi%3D1%7D%5ENp%28y%5Ei%7Cx%5Ei%29%5C%5C%0A%26%3D%5Cmathop%7Bargmax%7D%5Ctheta%5Csum%5Climits%7Bi%3D1%7D%5EN%5Clog%20p%28y%5Ei%7Cx%5Ei%29%5C%5C%0A%26%3D%5Cmathop%7Bargmax%7D%5Ctheta%5Csum%5Climits%7Bi%3D1%7D%5EN%5B-%5Clog%20Z%28x%5Ei%2C%5Clambda%2C%5Ceta%29%2B%5Csum%5Climits%7Bt%3D1%7D%5ET%5B%5Clambda%5ETf%28y_%7Bt-1%7D%2Cy_t%2Cx%29%2B%5Ceta%5ETg%28y_t%2Cx%29%5D%5D%0A%5Cend%7Balign%7D%0A#crop=0&crop=0&crop=1&crop=1&id=nbhz0&originHeight=200&originWidth=625&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

上面的式子中,第一项是对数配分函数,根据指数族分布的结论:

14.条件随机场 - 图45)%3D%5Cmathbb%7BE%7D%7Bp(y%5Ei%7Cx%5Ei)%7D%5B%5Csum%5Climits%7Bt%3D1%7D%5ETf(y%7Bt-1%7D%2Cy_t%2Cx%5Ei)%5D%0A#card=math&code=%5Cnabla%5Clambda%28%5Clog%20Z%28x%5Ei%2C%5Clambda%2C%5Ceta%29%29%3D%5Cmathbb%7BE%7D%7Bp%28y%5Ei%7Cx%5Ei%29%7D%5B%5Csum%5Climits%7Bt%3D1%7D%5ETf%28y_%7Bt-1%7D%2Cy_t%2Cx%5Ei%29%5D%0A#crop=0&crop=0&crop=1&crop=1&id=Nt0ap&originHeight=66&originWidth=426&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

其中,和 14.条件随机场 - 图46 相关的项相当于一个常数。求解这个期望值:

14.条件随机场 - 图47%7D%5B%5Csum%5Climits%7Bt%3D1%7D%5ETf(y%7Bt-1%7D%2Cyt%2Cx%5Ei)%5D%3D%5Csum%5Climits%7By%7Dp(y%7Cx%5Ei)%5Csum%5Climits%7Bt%3D1%7D%5ETf(y%7Bt-1%7D%2Cyt%2Cx%5Ei)%0A#card=math&code=%5Cmathbb%7BE%7D%7Bp%28y%5Ei%7Cx%5Ei%29%7D%5B%5Csum%5Climits%7Bt%3D1%7D%5ETf%28y%7Bt-1%7D%2Cyt%2Cx%5Ei%29%5D%3D%5Csum%5Climits%7By%7Dp%28y%7Cx%5Ei%29%5Csum%5Climits%7Bt%3D1%7D%5ETf%28y%7Bt-1%7D%2Cy_t%2Cx%5Ei%29%0A#crop=0&crop=0&crop=1&crop=1&id=FV485&originHeight=69&originWidth=512&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

第一个求和号的复杂度为 14.条件随机场 - 图48#card=math&code=O%28S%5ET%29#crop=0&crop=0&crop=1&crop=1&id=tHfTf&originHeight=29&originWidth=58&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=),重新排列求和符号:

14.条件随机场 - 图49%7D%5B%5Csum%5Climits%7Bt%3D1%7D%5ETf(y%7Bt-1%7D%2Cyt%2Cx%5Ei)%5D%26%3D%5Csum%5Climits%7Bt%3D1%7D%5ET%5Csum%5Climits%7By%7B1%3At-2%7D%7D%5Csum%5Climits%7By%7Bt-1%7D%7D%5Csum%5Climits%7By_t%7D%5Csum%5Climits%7By%7Bt%2B1%3AT%7D%7Dp(y%7Cx%5Ei)f(y%7Bt-1%7D%2Cyt%2Cx%5Ei)%5Cnonumber%5C%5C%0A%26%3D%5Csum%5Climits%7Bt%3D1%7D%5ET%5Csum%5Climits%7By%7Bt-1%7D%7D%5Csum%5Climits%7By_t%7Dp(y%7Bt-1%7D%2Cyt%7Cx%5Ei)f(y%7Bt-1%7D%2Cyt%2Cx%5Ei)%0A%5Cend%7Balign%7D%0A#card=math&code=%5Cbegin%7Balign%7D%5Cmathbb%7BE%7D%7Bp%28y%5Ei%7Cx%5Ei%29%7D%5B%5Csum%5Climits%7Bt%3D1%7D%5ETf%28y%7Bt-1%7D%2Cyt%2Cx%5Ei%29%5D%26%3D%5Csum%5Climits%7Bt%3D1%7D%5ET%5Csum%5Climits%7By%7B1%3At-2%7D%7D%5Csum%5Climits%7By%7Bt-1%7D%7D%5Csum%5Climits%7By_t%7D%5Csum%5Climits%7By%7Bt%2B1%3AT%7D%7Dp%28y%7Cx%5Ei%29f%28y%7Bt-1%7D%2Cyt%2Cx%5Ei%29%5Cnonumber%5C%5C%0A%26%3D%5Csum%5Climits%7Bt%3D1%7D%5ET%5Csum%5Climits%7By%7Bt-1%7D%7D%5Csum%5Climits%7By_t%7Dp%28y%7Bt-1%7D%2Cyt%7Cx%5Ei%29f%28y%7Bt-1%7D%2Cy_t%2Cx%5Ei%29%0A%5Cend%7Balign%7D%0A#crop=0&crop=0&crop=1&crop=1&id=jDE7g&originHeight=140&originWidth=634&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

和上面的边缘概率类似,也可以通过前向后向算法得到上面式子中的边缘概率。

于是:

14.条件随机场 - 图50-%5Csum%5Climits%7By%7Bt-1%7D%7D%5Csum%5Climits%7By_t%7Dp(y%7Bt-1%7D%2Cyt%7Cx%5Ei)f(y%7Bt-1%7D%2Cyt%2Cx%5Ei)%5D%0A#card=math&code=%5Cnabla%5Clambda%20L%3D%5Csum%5Climits%7Bi%3D1%7D%5EN%5Csum%5Climits%7Bt%3D1%7D%5ET%5Bf%28y%7Bt-1%7D%2Cy_t%2Cx%5Ei%29-%5Csum%5Climits%7By%7Bt-1%7D%7D%5Csum%5Climits%7Byt%7Dp%28y%7Bt-1%7D%2Cyt%7Cx%5Ei%29f%28y%7Bt-1%7D%2Cy_t%2Cx%5Ei%29%5D%0A#crop=0&crop=0&crop=1&crop=1&id=u9Xh7&originHeight=71&originWidth=594&originalType=binary&ratio=1&rotation=0&showTitle=false&status=done&style=none&title=)

利用梯度上升算法可以求解。对于 14.条件随机场 - 图51 也是类似的过程。

译码

译码问题和 HMM 中的 Viterbi 算法类似,同样采样动态规划的思想一层一层求解最大值。