- One-dimensional case: MLE
- Multivariate case
One-dimensional case: MLE
The Gaussian distribution plays a pivotal role in machine learning. In the MLE (maximum likelihood estimation) method:
$$
\theta=(\mu,\Sigma)=(\mu,\sigma^{2}),\quad\theta_{MLE}=\mathop{argmax}\limits_{\theta}\log p(X|\theta)\mathop{=}\limits_{iid}\mathop{argmax}\limits_{\theta}\sum\limits_{i=1}^{N}\log p(x_{i}|\theta)
$$
In general, the probability density function (PDF) of the Gaussian distribution is:
$$
p(x|\mu,\sigma^{2})=\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)
$$
Substituting this into the MLE objective, we first consider the one-dimensional case:
$$
\log p(X|\theta)=\sum\limits_{i=1}^{N}\log p(x_{i}|\theta)=\sum\limits_{i=1}^{N}\log\frac{1}{\sqrt{2\pi}\sigma}\exp(-(x_{i}-\mu)^{2}/2\sigma^{2})
$$
First, maximizing over $\mu$ (dropping the constant terms that do not depend on $\mu$) gives:
$$
\mu_{MLE}=\mathop{argmax}\limits_{\mu}\log p(X|\theta)=\mathop{argmin}\limits_{\mu}\sum\limits_{i=1}^{N}(x_{i}-\mu)^{2}
$$
Hence:
$$
\frac{\partial}{\partial\mu}\sum\limits_{i=1}^{N}(x_{i}-\mu)^{2}=0\longrightarrow\mu_{MLE}=\frac{1}{N}\sum\limits_{i=1}^{N}x_{i}
$$
Next, for the other parameter $\sigma$ in $\theta$, we have:
$$
\begin{align}
\sigma_{MLE}=\mathop{argmax}\limits_{\sigma}\log p(X|\theta)&=\mathop{argmax}\limits_{\sigma}\sum\limits_{i=1}^{N}\left[-\log\sigma-\frac{1}{2\sigma^{2}}(x_{i}-\mu)^{2}\right]\nonumber\\
&=\mathop{argmin}\limits_{\sigma}\sum\limits_{i=1}^{N}\left[\log\sigma+\frac{1}{2\sigma^{2}}(x_{i}-\mu)^{2}\right]
\end{align}
$$
Hence:
$$
\frac{\partial}{\partial\sigma}\sum\limits_{i=1}^{N}\left[\log\sigma+\frac{1}{2\sigma^{2}}(x_{i}-\mu)^{2}\right]=0\longrightarrow\sigma_{MLE}^{2}=\frac{1}{N}\sum\limits_{i=1}^{N}(x_{i}-\mu)^{2}
$$
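As a quick numerical check of the closed forms above (a NumPy sketch on synthetic data; the true parameters, sample size, and seed are arbitrary choices), the estimates $\mu_{MLE}$ and $\sigma^{2}_{MLE}$ should maximize the log-likelihood, so any small perturbation of them should lower it:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)  # synthetic 1-D data
N = len(x)

# Closed-form MLE from the derivation above
mu_mle = x.mean()                         # (1/N) * sum x_i
sigma2_mle = ((x - mu_mle) ** 2).mean()   # (1/N) * sum (x_i - mu_MLE)^2

def log_likelihood(mu, sigma2):
    """Gaussian log-likelihood of the sample x."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2)
                  - (x - mu) ** 2 / (2 * sigma2))

ll_star = log_likelihood(mu_mle, sigma2_mle)
# Perturbing either parameter strictly decreases the log-likelihood
for d in (-0.1, 0.1):
    assert log_likelihood(mu_mle + d, sigma2_mle) < ll_star
    assert log_likelihood(mu_mle, sigma2_mle + d) < ll_star
```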
Note that in the derivation above we first obtained $\mu_{MLE}$ and then used it to compute $\sigma_{MLE}^{2}$. As expected, taking the expectation over datasets shows that $\mu_{MLE}$ is unbiased:
$$
\mathbb{E}[\mu_{MLE}]=\frac{1}{N}\sum\limits_{i=1}^{N}\mathbb{E}[x_{i}]=\mu
$$
However, because $\sigma_{MLE}^{2}$ uses the $\mu_{MLE}$ computed from the same dataset, taking the expectation over datasets shows that $\sigma_{MLE}^{2}$ is biased:
$$
\begin{align}
\mathbb{E}[\sigma_{MLE}^{2}]&=\mathbb{E}\left[\frac{1}{N}\sum\limits_{i=1}^{N}(x_{i}-\mu_{MLE})^{2}\right]=\mathbb{E}\left[\frac{1}{N}\sum\limits_{i=1}^{N}x_{i}^{2}-\mu_{MLE}^{2}\right]\nonumber\\
&=(\mathbb{E}[x_{i}^{2}]-\mu^{2})-(\mathbb{E}[\mu_{MLE}^{2}]-\mu^{2})=\sigma^{2}-Var[\mu_{MLE}]=\sigma^{2}-\frac{\sigma^{2}}{N}=\frac{N-1}{N}\sigma^{2}
\end{align}
$$
An estimator $\hat{\sigma}^{2}$ is unbiased if $\mathbb{E}[\hat{\sigma}^{2}]=\sigma^{2}$.
So the unbiased variance estimator is:
$$
\hat{\sigma}^{2}=\frac{1}{N-1}\sum\limits_{i=1}^{N}(x_{i}-\mu_{MLE})^{2}
$$
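The bias and its $N/(N-1)$ correction can be checked by simulation (an illustrative NumPy sketch; the dataset size, number of trials, and true variance are arbitrary choices). Averaging $\sigma_{MLE}^{2}$ over many datasets should give roughly $\frac{N-1}{N}\sigma^{2}$, while the corrected estimator should average to $\sigma^{2}$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials, sigma2 = 5, 200_000, 4.0  # small N makes the bias visible
X = rng.normal(0.0, np.sqrt(sigma2), size=(trials, N))

mu_mle = X.mean(axis=1, keepdims=True)
s2_mle = ((X - mu_mle) ** 2).mean(axis=1)   # divide by N   (the MLE)
s2_hat = s2_mle * N / (N - 1)               # divide by N-1 (corrected)

# E[sigma2_MLE] = (N-1)/N * sigma2 = 3.2, while E[sigma2_hat] = sigma2 = 4.0
assert abs(s2_mle.mean() - (N - 1) / N * sigma2) < 0.05
assert abs(s2_hat.mean() - sigma2) < 0.06
```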
Multivariate case
The density of the multivariate Gaussian distribution is:
$$
p(x|\mu,\Sigma)=\frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}e^{-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)}
$$
where $x\in\mathbb{R}^{p}$ and $\Sigma$ is the covariance matrix, which in general is positive semi-definite; here we consider only the positive-definite case. We first examine the quadratic form in the exponent, which can be read as the squared Mahalanobis distance between $x$ and $\mu$. Since the covariance matrix is symmetric, it admits the eigendecomposition $\Sigma=U\Lambda U^{T}=(u_{1},u_{2},\cdots,u_{p})\,diag(\lambda_{i})\,(u_{1},u_{2},\cdots,u_{p})^{T}=\sum\limits_{i=1}^{p}u_{i}\lambda_{i}u_{i}^{T}$, hence $\Sigma^{-1}=\sum\limits_{i=1}^{p}u_{i}\frac{1}{\lambda_{i}}u_{i}^{T}$, and:
$$
\Delta=(x-\mu)^{T}\Sigma^{-1}(x-\mu)=\sum\limits_{i=1}^{p}(x-\mu)^{T}u_{i}\frac{1}{\lambda_{i}}u_{i}^{T}(x-\mu)=\sum\limits_{i=1}^{p}\frac{y_{i}^{2}}{\lambda_{i}}
$$
Here $y_{i}=u_{i}^{T}(x-\mu)$ is the length of the projection of $x-\mu$ onto the eigenvector $u_{i}$, so the expression above describes a family of concentric ellipses (ellipsoids) as $\Delta$ takes different values.
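To make the eigendecomposition view concrete, the NumPy sketch below (the covariance matrix and point are arbitrary, made-up examples) checks that the projection form $\sum_{i} y_{i}^{2}/\lambda_{i}$ equals the direct quadratic form:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 3
# A random symmetric positive-definite covariance matrix (for illustration)
B = rng.normal(size=(p, p))
Sigma = B @ B.T + p * np.eye(p)
mu = rng.normal(size=p)
x = rng.normal(size=p)

# Direct squared Mahalanobis distance: (x-mu)^T Sigma^{-1} (x-mu)
delta_direct = (x - mu) @ np.linalg.solve(Sigma, x - mu)

# Via the eigendecomposition Sigma = U diag(lam) U^T
lam, U = np.linalg.eigh(Sigma)
y = U.T @ (x - mu)              # projections y_i = u_i^T (x - mu)
delta_eig = np.sum(y ** 2 / lam)

assert np.isclose(delta_direct, delta_eig)
```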
Next we look at two problems the multivariate Gaussian model runs into in practice.
- The parameter $\Sigma$ has $O(p^{2})$ degrees of freedom, which is too many for high-dimensional data. The source of this freedom is that $\Sigma$ has $\frac{p(p+1)}{2}$ free parameters; the remedy is to assume $\Sigma$ is diagonal, or even, under an isotropy assumption, that all its diagonal entries are equal. Algorithms of the former kind include Factor Analysis; of the latter, probabilistic PCA (p-PCA).
- Second, a single Gaussian is unimodal, so it cannot fit data whose distribution has multiple modes. The remedy is the Gaussian Mixture Model (GMM).
Next we introduce some frequently used theorems about the multivariate Gaussian distribution.
We write $x=(x_{1},x_{2},\cdots,x_{p})^{T}=(x_{a,m\times 1},x_{b,n\times 1})^{T}$ with $m+n=p$, $\mu=(\mu_{a,m\times 1},\mu_{b,n\times 1})^{T}$, $\Sigma=\begin{pmatrix}\Sigma_{aa}&\Sigma_{ab}\\\Sigma_{ba}&\Sigma_{bb}\end{pmatrix}$, and assume $x\sim\mathcal{N}(\mu,\Sigma)$.
First, a theorem about linear transformations of a Gaussian:
Theorem: if $x\sim\mathcal{N}(\mu,\Sigma)$ and $y=Ax+b$, then $y\sim\mathcal{N}(A\mu+b,A\Sigma A^{T})$. Proof: $\mathbb{E}[y]=\mathbb{E}[Ax+b]=A\mathbb{E}[x]+b=A\mu+b$, and $Var[y]=Var[Ax+b]=A\,Var[x]\,A^{T}=A\Sigma A^{T}$.
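The theorem can be sanity-checked by simulation (a NumPy sketch; the values of $\mu$, $\Sigma$, $A$, $b$, and the sample size are illustrative choices). The sample mean and covariance of $y=Ax+b$ should approach $A\mu+b$ and $A\Sigma A^{T}$:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical parameters chosen for illustration
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])
b = np.array([0.5, -0.5, 1.0])

X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ A.T + b  # y = Ax + b, applied row-wise

# Theorem: y ~ N(A mu + b, A Sigma A^T)
assert np.allclose(Y.mean(axis=0), A @ mu + b, atol=0.05)
assert np.allclose(np.cov(Y.T), A @ Sigma @ A.T, atol=0.2)
```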
Using this theorem, we now derive the four quantities $p(x_{a}),p(x_{b}),p(x_{a}|x_{b}),p(x_{b}|x_{a})$.
- $x_{a}=\begin{pmatrix}\mathbb{I}_{m\times m}&\mathbb{O}_{m\times n}\end{pmatrix}\begin{pmatrix}x_{a}\\x_{b}\end{pmatrix}$; substituting into the theorem gives:
$$
\mathbb{E}[x_{a}]=\begin{pmatrix}\mathbb{I}&\mathbb{O}\end{pmatrix}\begin{pmatrix}\mu_{a}\\\mu_{b}\end{pmatrix}=\mu_{a}\\
Var[x_{a}]=\begin{pmatrix}\mathbb{I}&\mathbb{O}\end{pmatrix}\begin{pmatrix}\Sigma_{aa}&\Sigma_{ab}\\\Sigma_{ba}&\Sigma_{bb}\end{pmatrix}\begin{pmatrix}\mathbb{I}\\\mathbb{O}\end{pmatrix}=\Sigma_{aa}
$$
So $x_{a}\sim\mathcal{N}(\mu_{a},\Sigma_{aa})$.
- Similarly, $x_{b}\sim\mathcal{N}(\mu_{b},\Sigma_{bb})$.
- For the two conditional distributions, we introduce three quantities:
$$
x_{b\cdot a}=x_{b}-\Sigma_{ba}\Sigma_{aa}^{-1}x_{a}\\
\mu_{b\cdot a}=\mu_{b}-\Sigma_{ba}\Sigma_{aa}^{-1}\mu_{a}\\
\Sigma_{bb\cdot a}=\Sigma_{bb}-\Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{ab}
$$
In particular, the last expression is called the Schur complement of $\Sigma_{aa}$ in $\Sigma$. Observe that:
$$
x_{b\cdot a}=\begin{pmatrix}-\Sigma_{ba}\Sigma_{aa}^{-1}&\mathbb{I}_{n\times n}\end{pmatrix}\begin{pmatrix}x_{a}\\x_{b}\end{pmatrix}
$$
So, by the theorem above:
$$
\mathbb{E}[x_{b\cdot a}]=\begin{pmatrix}-\Sigma_{ba}\Sigma_{aa}^{-1}&\mathbb{I}_{n\times n}\end{pmatrix}\begin{pmatrix}\mu_{a}\\\mu_{b}\end{pmatrix}=\mu_{b\cdot a}\\
Var[x_{b\cdot a}]=\begin{pmatrix}-\Sigma_{ba}\Sigma_{aa}^{-1}&\mathbb{I}_{n\times n}\end{pmatrix}\begin{pmatrix}\Sigma_{aa}&\Sigma_{ab}\\\Sigma_{ba}&\Sigma_{bb}\end{pmatrix}\begin{pmatrix}-\Sigma_{aa}^{-1}\Sigma_{ab}\\\mathbb{I}_{n\times n}\end{pmatrix}=\Sigma_{bb\cdot a}
$$
so $x_{b\cdot a}\sim\mathcal{N}(\mu_{b\cdot a},\Sigma_{bb\cdot a})$.
Using these three quantities, we can write $x_{b}=x_{b\cdot a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_{a}$. Therefore:
$$
\mathbb{E}[x_{b}|x_{a}]=\mu_{b\cdot a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_{a}\\
Var[x_{b}|x_{a}]=\Sigma_{bb\cdot a}
$$
Here the theorem above is used again. The key point is that $Cov(x_{b\cdot a},x_{a})=\Sigma_{ba}-\Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{aa}=\mathbb{O}$, so $x_{b\cdot a}$ is independent of $x_{a}$ (they are jointly Gaussian), which is why conditioning on $x_{a}$ leaves the distribution of $x_{b\cdot a}$ unchanged.
- Similarly, define:
$$
x_{a\cdot b}=x_{a}-\Sigma_{ab}\Sigma_{bb}^{-1}x_{b}\\
\mu_{a\cdot b}=\mu_{a}-\Sigma_{ab}\Sigma_{bb}^{-1}\mu_{b}\\
\Sigma_{aa\cdot b}=\Sigma_{aa}-\Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}
$$
So:
$$
\mathbb{E}[x_{a}|x_{b}]=\mu_{a\cdot b}+\Sigma_{ab}\Sigma_{bb}^{-1}x_{b}\\
Var[x_{a}|x_{b}]=\Sigma_{aa\cdot b}
$$
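The marginal and conditional formulas can be verified empirically. In the NumPy sketch below (the 3-D Gaussian and the partition are made-up examples), the first coordinate plays the role of $x_a$; its sample moments should match $(\mu_a,\Sigma_{aa})$, and regressing $x_b$ on $x_a$ should recover the coefficient $\Sigma_{ba}\Sigma_{aa}^{-1}$ from $\mathbb{E}[x_b|x_a]$ with residual covariance close to the Schur complement $\Sigma_{bb\cdot a}$:

```python
import numpy as np

rng = np.random.default_rng(4)
# A hypothetical 3-D Gaussian: x_a is the first coordinate, x_b the other two
mu = np.array([0.5, -0.5, 1.0])
Sigma = np.array([[1.5, 0.6, 0.2],
                  [0.6, 2.0, 0.5],
                  [0.2, 0.5, 1.0]])
m = 1
Saa, Sab = Sigma[:m, :m], Sigma[:m, m:]
Sba, Sbb = Sigma[m:, :m], Sigma[m:, m:]

# Closed-form quantities from the derivation above
slope = Sba @ np.linalg.inv(Saa)               # coefficient of x_a in E[x_b|x_a]
Sbb_a = Sbb - Sba @ np.linalg.inv(Saa) @ Sab   # Schur complement = Var[x_b|x_a]

X = rng.multivariate_normal(mu, Sigma, size=400_000)
Xa, Xb = X[:, :m], X[:, m:]

# Marginal: x_a ~ N(mu_a, Sigma_aa)
assert np.allclose(Xa.mean(axis=0), mu[:m], atol=0.02)
assert np.allclose(np.cov(Xa.T), Saa, atol=0.05)

# Conditional: least-squares regression of x_b on x_a recovers the slope,
# and the residual covariance matches the Schur complement
Za = np.hstack([np.ones((len(X), 1)), Xa])      # add an intercept column
coef, *_ = np.linalg.lstsq(Za, Xb, rcond=None)  # rows: intercept, slope
resid = Xb - Za @ coef
assert np.allclose(coef[1:].T, slope, atol=0.02)
assert np.allclose(np.cov(resid.T), Sbb_a, atol=0.02)
```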
Finally, we use the four quantities above to solve the linear Gaussian model:
Given $p(x)=\mathcal{N}(\mu,\Lambda^{-1})$ and $p(y|x)=\mathcal{N}(Ax+b,L^{-1})$, find $p(y)$ and $p(x|y)$.

Solution: write $y=Ax+b+\epsilon$ with $\epsilon\sim\mathcal{N}(0,L^{-1})$ independent of $x$. Then $\mathbb{E}[y]=\mathbb{E}[Ax+b+\epsilon]=A\mu+b$ and $Var[y]=A\Lambda^{-1}A^{T}+L^{-1}$, so:
$$
p(y)=\mathcal{N}(A\mu+b,L^{-1}+A\Lambda^{-1}A^{T})
$$
Introducing $z=\begin{pmatrix}x\\y\end{pmatrix}$, we need $Cov[x,y]=\mathbb{E}[(x-\mathbb{E}[x])(y-\mathbb{E}[y])^{T}]$. This covariance can be computed directly:
$$
\begin{align}
Cov(x,y)&=\mathbb{E}[(x-\mu)(Ax-A\mu+\epsilon)^{T}]=\mathbb{E}[(x-\mu)(x-\mu)^{T}A^{T}]=Var[x]A^{T}=\Lambda^{-1}A^{T}
\end{align}
$$
(the cross term vanishes because $\epsilon$ is independent of $x$). By the symmetry of the covariance matrix, we then have $p(z)=\mathcal{N}\left(\begin{pmatrix}\mu\\A\mu+b\end{pmatrix},\begin{pmatrix}\Lambda^{-1}&\Lambda^{-1}A^{T}\\A\Lambda^{-1}&L^{-1}+A\Lambda^{-1}A^{T}\end{pmatrix}\right)$. Applying the conditional formulas above:
$$
\mathbb{E}[x|y]=\mu+\Lambda^{-1}A^{T}(L^{-1}+A\Lambda^{-1}A^{T})^{-1}(y-A\mu-b)
$$
$$
Var[x|y]=\Lambda^{-1}-\Lambda^{-1}A^{T}(L^{-1}+A\Lambda^{-1}A^{T})^{-1}A\Lambda^{-1}
$$
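As a check on these last two formulas, the NumPy sketch below compares them against the equivalent precision-form expressions $Var[x|y]=(\Lambda+A^{T}LA)^{-1}$ and $\mathbb{E}[x|y]=Var[x|y](A^{T}L(y-b)+\Lambda\mu)$, a standard identity that follows from the Woodbury matrix inversion lemma (all concrete matrices and vectors below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
p, q = 3, 2
# Hypothetical precision matrices and linear map, for illustration
Lam = np.diag([1.0, 2.0, 0.5])   # prior precision of x
L = np.diag([4.0, 1.0])          # noise precision of y given x
A = rng.normal(size=(q, p))
b = rng.normal(size=q)
mu = rng.normal(size=p)
y = rng.normal(size=q)

Lam_inv, L_inv = np.linalg.inv(Lam), np.linalg.inv(L)
S = L_inv + A @ Lam_inv @ A.T                # Var[y]
K = Lam_inv @ A.T @ np.linalg.inv(S)         # "gain" matrix

post_mean = mu + K @ (y - A @ mu - b)        # E[x|y], formula above
post_var = Lam_inv - K @ A @ Lam_inv         # Var[x|y], formula above

# Equivalent precision-form expressions (via the Woodbury identity)
prec_form_var = np.linalg.inv(Lam + A.T @ L @ A)
prec_form_mean = prec_form_var @ (A.T @ L @ (y - b) + Lam @ mu)

assert np.allclose(post_var, prec_form_var)
assert np.allclose(post_mean, prec_form_mean)
```

The precision form is often preferred in practice when $p$ is small relative to the data dimension, since it avoids inverting the $q\times q$ matrix $S$ repeatedly.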