The hidden Markov model (HMM) is a probabilistic graphical model. Machine learning models can be approached from two directions, frequentist and Bayesian: at the core of frequentist methods lies an optimization problem, while at the core of Bayesian methods lies an integration problem, which has given rise to a family of integration techniques such as variational inference and MCMC. The most basic probabilistic graphical models divide into directed graphs (Bayesian networks) and undirected graphs (Markov random fields), with GMM as a typical example. Building on these basic models, if the samples are correlated, that is, they carry temporal information and are therefore no longer independent and identically distributed, the resulting model is called a dynamic model: the hidden variables change over time, and the observed variables change with them:

Depending on the nature of the state variable, dynamic models can be divided into:

  1. HMM: the state variable (the hidden variable) is discrete
  2. Kalman filter: the state variable is continuous and the dynamics are linear
  3. Particle filter: the state variable is continuous and the dynamics are nonlinear

HMM

An HMM is represented by the following probabilistic graph:

```mermaid
graph TD;
t1-->t2;
subgraph four
	t4-->x4((x4))
end
subgraph three
	t3-->x3((x3))
end
subgraph two
	t2-->x2((x2))
end
subgraph one
	t1-->x1((x1))
end
t2-->t3;
t3-->t4;
```

The graph above shows the hidden variable across four time steps. The model is parameterized as $\lambda=(\pi,A,B)$, where $\pi$ is the initial probability distribution, $A$ the state transition matrix, and $B$ the emission matrix.

Below, $o_t$ denotes the observation variable, $O=(o_1,o_2,\cdots,o_T)$ the observation sequence, and $V=\{v_1,v_2,\cdots,v_M\}$ the set of observation values; $i_t$ denotes the state variable (the hidden variable), $I=(i_1,i_2,\cdots,i_T)$ the state sequence, and $Q=\{q_1,q_2,\cdots,q_N\}$ the set of state values. Define the state transition matrix $A=(a_{ij})$ with $a_{ij}=p(i_{t+1}=q_j|i_t=q_i)$, the probability of transitioning from state $q_i$ to state $q_j$, and the emission matrix $B=(b_j(k))$ with $b_j(k)=p(o_t=v_k|i_t=q_j)$, linking the state and the observed value at time $t$.
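To make the notation concrete, here is a minimal sketch of a toy parameter set $\lambda=(\pi,A,B)$ in Python with numpy; the sizes ($N=2$, $M=3$) and all numbers are invented for illustration:

```python
import numpy as np

# Toy HMM: N = 2 hidden states, M = 3 observation symbols (values are made up).
pi = np.array([0.6, 0.4])           # pi[i]   = p(i_1 = q_i), initial distribution
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])          # A[i, j] = a_ij = p(i_{t+1} = q_j | i_t = q_i)
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])     # B[j, k] = b_j(k) = p(o_t = v_k | i_t = q_j)
```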

The HMM makes two basic assumptions (a sampling sketch follows the list):

  1. Homogeneous Markov assumption (the future depends only on the present, i.e., only on the immediately preceding state): $p(i_{t+1}|i_t,i_{t-1},\cdots,i_1,o_t,o_{t-1},\cdots,o_1)=p(i_{t+1}|i_t)$
  2. Observation independence assumption: $p(o_t|i_t,i_{t-1},\cdots,i_1,o_{t-1},\cdots,o_1)=p(o_t|i_t)$
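The two assumptions pin down the generative process: draw $i_1\sim\pi$, then alternately emit $o_t$ from $b_{i_t}(\cdot)$ and transition via $a_{i_t,\cdot}$. A minimal sampling sketch, reusing the toy `pi`, `A`, `B` above (the function name is ours):

```python
def sample_hmm(pi, A, B, T, seed=0):
    """Draw a state sequence I and an observation sequence O of length T."""
    rng = np.random.default_rng(seed)
    N, M = B.shape
    states, obs = [], []
    i = rng.choice(N, p=pi)                # i_1 ~ pi
    for _ in range(T):
        states.append(i)
        obs.append(rng.choice(M, p=B[i]))  # observation independence: o_t depends only on i_t
        i = rng.choice(N, p=A[i])          # homogeneous Markov: i_{t+1} depends only on i_t
    return states, obs
```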

An HMM addresses three problems:

  1. Evaluation: computing $p(O|\lambda)$; solved by the forward-backward algorithm
  2. Learning: $\lambda=\mathop{argmax}\limits_{\lambda}p(O|\lambda)$; solved by the EM algorithm (Baum-Welch)
  3. Decoding: $I=\mathop{argmax}\limits_{I}p(I|O,\lambda)$; solved by the Viterbi algorithm
    1. Prediction: $p(i_{t+1}|o_1,o_2,\cdots,o_t)$
    2. Filtering: $p(i_t|o_1,o_2,\cdots,o_t)$

Evaluation

Introduce the state sequence $I$ and sum (integrate) it out:

$$p(O|\lambda)=\sum\limits_I p(I,O|\lambda)=\sum\limits_I p(O|I,\lambda)p(I|\lambda)$$

For $p(I|\lambda)$, the chain rule gives

$$p(I|\lambda)=p(i_1,i_2,\cdots,i_T|\lambda)=p(i_T|i_1,i_2,\cdots,i_{T-1},\lambda)p(i_1,i_2,\cdots,i_{T-1}|\lambda)$$

where the remaining factor $p(i_1,i_2,\cdots,i_{T-1}|\lambda)$ can be treated as the same subproblem, one step shorter.

By the homogeneous Markov assumption:

$$p(i_t|i_1,i_2,\cdots,i_{t-1},\lambda)=p(i_t|i_{t-1})=a_{i_{t-1}i_t}$$

Therefore:

$$p(I|\lambda)=\pi_{i_1}\prod\limits_{t=2}^T a_{i_{t-1},i_t}$$

where $\pi_{i_1}$ is the initial probability of the first state.

Moreover, by the observation independence assumption:

$$p(O|I,\lambda)=\prod\limits_{t=1}^T b_{i_t}(o_t)$$

we obtain:

$$p(O|\lambda)=\sum\limits_I\pi_{i_1}\prod\limits_{t=2}^T a_{i_{t-1},i_t}\prod\limits_{t=1}^T b_{i_t}(o_t)$$

The summation above runs over all possible hidden state sequences, so the complexity is $O(N^T)$.
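For intuition, the $O(N^T)$ sum can be written down directly by enumerating every hidden path; a brute-force sketch (our own helper, only feasible for tiny $T$):

```python
from itertools import product

def evaluate_naive(pi, A, B, O):
    """p(O | lambda) by enumerating all N^T hidden state sequences."""
    N, T = len(pi), len(O)
    total = 0.0
    for I in product(range(N), repeat=T):         # every path (i_1, ..., i_T)
        p = pi[I[0]] * B[I[0], O[0]]              # pi_{i_1} * b_{i_1}(o_1)
        for t in range(1, T):
            p *= A[I[t-1], I[t]] * B[I[t], O[t]]  # a_{i_{t-1}, i_t} * b_{i_t}(o_t)
        total += p
    return total
```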

Now define $\alpha_t(i)=p(o_1,o_2,\cdots,o_t,i_t=q_i|\lambda)$, so that $\alpha_T(i)=p(O,i_T=q_i|\lambda)$. We see that:

$$p(O|\lambda)=\sum\limits_{i=1}^N p(O,i_T=q_i|\lambda)=\sum\limits_{i=1}^N\alpha_T(i)$$

where the sum over $i$ marginalizes out $i_T$.

For $\alpha_{t+1}(j)$:

$$\begin{aligned}\alpha_{t+1}(j)&=p(o_1,\cdots,o_{t+1},i_{t+1}=q_j)\\&=\sum\limits_{i=1}^N p(o_1,\cdots,o_{t+1},i_{t+1}=q_j,i_t=q_i)\\&=\sum\limits_{i=1}^N p(o_{t+1}|o_1,\cdots,o_t,i_{t+1}=q_j,i_t=q_i)p(o_1,\cdots,o_t,i_{t+1}=q_j,i_t=q_i)\end{aligned}$$

Using the observation independence assumption:

$$\begin{aligned}\alpha_{t+1}(j)&=\sum\limits_{i=1}^N p(o_{t+1}|i_{t+1}=q_j)p(i_{t+1}=q_j|o_1,\cdots,o_t,i_t=q_i)p(o_1,\cdots,o_t,i_t=q_i)\\&=\sum\limits_{i=1}^N b_j(o_{t+1})a_{ij}\alpha_t(i)\end{aligned}$$

The second step also uses the homogeneous Markov assumption, yielding a recursion over $t$; this algorithm is called the forward algorithm, with complexity $O(TN^2)$.
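The recursion translates directly into a dynamic program; a sketch with numpy (0-based indexing, so `alpha[t]` stores $\alpha_{t+1}$):

```python
def forward(pi, A, B, O):
    """alpha[t, i] = p(o_1, ..., o_{t+1}, i_{t+1} = q_i | lambda), with 0-based t."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                    # alpha_1(i) = pi_i * b_i(o_1)
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]  # sum_i alpha_t(i) a_ij, times b_j(o_{t+1})
    return alpha                                  # p(O | lambda) = alpha[-1].sum()
```

In practice the products underflow for long sequences, so implementations usually rescale each $\alpha_t$ or work in log space.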

There is also a backward algorithm. Define $\beta_t(i)=p(o_{t+1},o_{t+2},\cdots,o_T|i_t=q_i,\lambda)$:

$$\begin{aligned}p(O|\lambda)&=p(o_1,\cdots,o_T|\lambda)\\&=\sum\limits_{i=1}^N p(o_1,o_2,\cdots,o_T,i_1=q_i|\lambda)\\&=\sum\limits_{i=1}^N p(o_1,o_2,\cdots,o_T|i_1=q_i,\lambda)\pi_i\\&=\sum\limits_{i=1}^N p(o_1|o_2,\cdots,o_T,i_1=q_i,\lambda)p(o_2,\cdots,o_T|i_1=q_i,\lambda)\pi_i\\&=\sum\limits_{i=1}^N b_i(o_1)\pi_i\beta_1(i)\end{aligned}$$

For the $\beta_1(i)$ appearing here, we can derive a recursion for general $\beta_t(i)$:

$$\begin{aligned}\beta_t(i)&=p(o_{t+1},\cdots,o_T|i_t=q_i)\\&=\sum\limits_{j=1}^N p(o_{t+1},o_{t+2},\cdots,o_T,i_{t+1}=q_j|i_t=q_i)\\&=\sum\limits_{j=1}^N p(o_{t+1},\cdots,o_T|i_{t+1}=q_j,i_t=q_i)p(i_{t+1}=q_j|i_t=q_i)\\&=\sum\limits_{j=1}^N p(o_{t+1},\cdots,o_T|i_{t+1}=q_j)a_{ij}\\&=\sum\limits_{j=1}^N p(o_{t+1}|o_{t+2},\cdots,o_T,i_{t+1}=q_j)p(o_{t+2},\cdots,o_T|i_{t+1}=q_j)a_{ij}\\&=\sum\limits_{j=1}^N b_j(o_{t+1})a_{ij}\beta_{t+1}(j)\end{aligned}$$

Running this recursion backward from $\beta_T(i)=1$ yields the $\beta_1(i)$ term required above.
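The backward recursion is symmetric to the forward one; a sketch in the same style:

```python
def backward(pi, A, B, O):
    """beta[t, i] = p(o_{t+2}, ..., o_T | i_{t+1} = q_i, lambda), with 0-based t."""
    T, N = len(O), len(pi)
    beta = np.ones((T, N))                        # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t+1]] * beta[t+1])  # sum_j a_ij b_j(o_{t+1}) beta_{t+1}(j)
    return beta       # p(O | lambda) = (pi * B[:, O[0]] * beta[0]).sum()
```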

Learning

To learn the optimal parameter values, maximum likelihood estimation gives:

$$\lambda_{MLE}=\mathop{argmax}\limits_\lambda p(O|\lambda)$$

We use the EM algorithm (known in this setting as the Baum-Welch algorithm); superscripts denote the iteration:

$$\theta^{t+1}=\mathop{argmax}\limits_\theta\int_z\log p(X,Z|\theta)p(Z|X,\theta^t)dz$$

where $X$ is the observed variable and $Z$ is the hidden-variable sequence. Thus:

$$\lambda^{t+1}=\mathop{argmax}\limits_\lambda\sum\limits_I\log p(O,I|\lambda)p(I|O,\lambda^t)=\mathop{argmax}\limits_\lambda\sum\limits_I\log p(O,I|\lambda)p(O,I|\lambda^t)$$

Here we used the fact that $p(O|\lambda^t)$ does not depend on $\lambda$: since $p(I|O,\lambda^t)=p(O,I|\lambda^t)/p(O|\lambda^t)$, the constant denominator does not change the argmax. Substituting the expression from the Evaluation section:

$$\sum\limits_I\log p(O,I|\lambda)p(O,I|\lambda^t)=\sum\limits_I\left[\log\pi_{i_1}+\sum\limits_{t=2}^T\log a_{i_{t-1},i_t}+\sum\limits_{t=1}^T\log b_{i_t}(o_t)\right]p(O,I|\lambda^t)$$

The three terms involve disjoint parameters, so each can be maximized separately. Take the update of $\pi$ as an example:

$$\begin{aligned}\pi^{t+1}&=\mathop{argmax}\limits_\pi\sum\limits_I\left[\log\pi_{i_1}p(O,I|\lambda^t)\right]\\&=\mathop{argmax}\limits_\pi\sum\limits_I\left[\log\pi_{i_1}\cdot p(O,i_1,i_2,\cdots,i_T|\lambda^t)\right]\end{aligned}$$

In the expression above, summing over $i_2,\cdots,i_T$ marginalizes those variables out:

$$\pi^{t+1}=\mathop{argmax}\limits_\pi\sum\limits_{i_1}\left[\log\pi_{i_1}\cdot p(O,i_1|\lambda^t)\right]$$

This maximization is subject to the constraint $\sum\limits_{i=1}^N\pi_i=1$. Define the Lagrangian:

$$L(\pi,\eta)=\sum\limits_{i=1}^N\log\pi_i\cdot p(O,i_1=q_i|\lambda^t)+\eta\left(\sum\limits_{i=1}^N\pi_i-1\right)$$

Then:

$$\frac{\partial L}{\partial\pi_i}=\frac{1}{\pi_i}p(O,i_1=q_i|\lambda^t)+\eta=0$$

Multiplying by $\pi_i$ and summing over $i$:

$$\sum\limits_{i=1}^N\left[p(O,i_1=q_i|\lambda^t)+\pi_i\eta\right]=0\Rightarrow\eta=-p(O|\lambda^t)$$

Therefore:

$$\pi_i^{t+1}=\frac{p(O,i_1=q_i|\lambda^t)}{p(O|\lambda^t)}$$
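Both the numerator and the denominator come out of the forward and backward passes, since $p(O,i_1=q_i|\lambda^t)=\alpha_1(i)\beta_1(i)$; a sketch reusing the `forward` and `backward` helpers above:

```python
def update_pi(pi, A, B, O):
    """One Baum-Welch update: pi_i <- p(O, i_1 = q_i | lambda^t) / p(O | lambda^t)."""
    alpha, beta = forward(pi, A, B, O), backward(pi, A, B, O)
    joint = alpha[0] * beta[0]   # p(O, i_1 = q_i | lambda^t) = alpha_1(i) * beta_1(i)
    return joint / joint.sum()   # joint.sum() = p(O | lambda^t)
```

The updates for $A$ and $B$ follow the same pattern, using the posteriors over single states and state pairs.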

Decoding

The decoding problem is stated as:

$$I=\mathop{argmax}\limits_I p(I|O,\lambda)$$

We need to find the state sequence of maximum probability; such a sequence is a path through the state space, which suggests dynamic programming.

Define:

$$\delta_t(i)=\max\limits_{i_1,\cdots,i_{t-1}}p(o_1,\cdots,o_t,i_1,\cdots,i_{t-1},i_t=q_i)$$

Then:

$$\delta_{t+1}(j)=\max\limits_{1\le i\le N}\delta_t(i)a_{ij}b_j(o_{t+1})$$

This recursion maximizes, over the previous state, the probability of stepping from time $t$ to time $t+1$. Record the maximizing predecessor as:

$$\psi_{t+1}(j)=\mathop{argmax}\limits_{1\le i\le N}\delta_t(i)a_{ij}$$
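Running the $\delta$ recursion forward and then backtracking through $\psi$ gives the Viterbi algorithm; a sketch:

```python
def viterbi(pi, A, B, O):
    """Most probable hidden path argmax_I p(I | O, lambda)."""
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, O[0]]
    for t in range(1, T):
        cand = delta[t-1][:, None] * A            # cand[i, j] = delta_t(i) * a_ij
        psi[t] = cand.argmax(axis=0)              # best predecessor for each state j
        delta[t] = cand.max(axis=0) * B[:, O[t]]  # delta_{t+1}(j)
    path = [int(delta[-1].argmax())]              # best final state
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))        # follow psi backward
    return path[::-1]
```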

Summary

The HMM is a dynamic model: a mixture model combined with a temporal structure (roughly GMM + time). For state-space models of this kind, besides the learning task (solved with EM), there are inference tasks, which include:

  1. Decoding: $p(z_1,z_2,\cdots,z_t|x_1,x_2,\cdots,x_t)$
  2. Likelihood: $p(X|\theta)$
  3. Filtering: $p(z_t|x_1,\cdots,x_t)$, an online task: $p(z_t|x_{1:t})=\frac{p(x_{1:t},z_t)}{p(x_{1:t})}=C\alpha_t(z_t)$
  4. Smoothing: $p(z_t|x_1,\cdots,x_T)$, an offline task: $p(z_t|x_{1:T})=\frac{p(x_{1:T},z_t)}{p(x_{1:T})}=\frac{\alpha_t(z_t)p(x_{t+1:T}|x_{1:t},z_t)}{p(x_{1:T})}$

By the conditional independence properties of the probabilistic graph, $p(x_{t+1:T}|x_{1:t},z_t)=p(x_{t+1:T}|z_t)$, so: $p(z_t|x_{1:T})=\frac{\alpha_t(z_t)p(x_{t+1:T}|z_t)}{p(x_{1:T})}=C\alpha_t(z_t)\beta_t(z_t)$

This is the forward-backward algorithm.
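In code, the smoothed marginal is the normalized elementwise product of the forward and backward tables; a sketch reusing `forward` and `backward`:

```python
def smooth(pi, A, B, O):
    """gamma[t, i] = p(z_t = q_i | x_{1:T}) = C * alpha_t(i) * beta_t(i)."""
    alpha, beta = forward(pi, A, B, O), backward(pi, A, B, O)
    gamma = alpha * beta                             # proportional to p(x_{1:T}, z_t = q_i)
    return gamma / gamma.sum(axis=1, keepdims=True)  # the constant C normalizes each step
```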

  5. Prediction: $p(z_{t+1},z_{t+2}|x_1,\cdots,x_t)$ and $p(x_{t+1},x_{t+2}|x_1,\cdots,x_t)$; for one step (see the sketch below):
     $$p(z_{t+1}|x_{1:t})=\sum\limits_{z_t}p(z_{t+1},z_t|x_{1:t})=\sum\limits_{z_t}p(z_{t+1}|z_t)p(z_t|x_{1:t})$$
     $$p(x_{t+1}|x_{1:t})=\sum\limits_{z_{t+1}}p(x_{t+1},z_{t+1}|x_{1:t})=\sum\limits_{z_{t+1}}p(x_{t+1}|z_{t+1})p(z_{t+1}|x_{1:t})$$
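Both one-step predictions follow from the filtering distribution by a single matrix product; a sketch:

```python
def predict_one_step(pi, A, B, O):
    """Given O = x_{1:t}, return p(z_{t+1} | x_{1:t}) and p(x_{t+1} | x_{1:t})."""
    alpha = forward(pi, A, B, O)
    filtered = alpha[-1] / alpha[-1].sum()  # filtering: p(z_t | x_{1:t}) = C * alpha_t(z_t)
    z_next = filtered @ A                   # sum_{z_t} p(z_{t+1} | z_t) p(z_t | x_{1:t})
    x_next = z_next @ B                     # sum_{z_{t+1}} p(x_{t+1} | z_{t+1}) p(z_{t+1} | x_{1:t})
    return z_next, x_next
```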