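There are two main schools of thought on how to interpret probability: the frequentist school and the Bayesian school. In what follows we use this notation for the set of observations: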

$$
X_{N\times p}=(x_{1},x_{2},\cdots,x_{N})^{T},\quad x_{i}=(x_{i1},x_{i2},\cdots,x_{ip})^{T}
$$

This notation means there are $N$ samples, each of which is a $p$-dimensional vector. Every observation is generated by $p(x|\theta)$.

The frequentist view

In $p(x|\theta)$, the parameter $\theta$ is a constant. For $N$ observations, the probability of the observation set is $p(X|\theta)\mathop{=}\limits_{iid}\prod\limits_{i=1}^{N}p(x_{i}|\theta)$. To find the value of $\theta$, we maximize the log-likelihood (maximum likelihood estimation, MLE):

$$
\theta_{MLE}=\mathop{argmax}\limits_{\theta}\log p(X|\theta)\mathop{=}\limits_{iid}\mathop{argmax}\limits_{\theta}\sum\limits_{i=1}^{N}\log p(x_{i}|\theta)
$$

The Bayesian view

Bayesians hold that $\theta$ in $p(x|\theta)$ is not a constant. Instead, $\theta$ follows a preset prior distribution $\theta\sim p(\theta)$. By Bayes' theorem, the posterior of the parameter given the observation set can be written as:

$$
p(\theta|X)=\frac{p(X|\theta)\cdot p(\theta)}{p(X)}=\frac{p(X|\theta)\cdot p(\theta)}{\int\limits_{\theta}p(X|\theta)\cdot p(\theta)d\theta}
$$

To obtain a value for $\theta$, we maximize this posterior (maximum a posteriori estimation, MAP):

$$
\theta_{MAP}=\mathop{argmax}\limits_{\theta}p(\theta|X)=\mathop{argmax}\limits_{\theta}p(X|\theta)\cdot p(\theta)
$$

The second equality holds because the denominator does not depend on $\theta$. After solving for $\theta$, evaluating $\frac{p(X|\theta)\cdot p(\theta)}{\int\limits_{\theta}p(X|\theta)\cdot p(\theta)d\theta}$ gives the posterior probability of the parameter. Here $p(X|\theta)$ is called the likelihood; it is our model distribution. Once we have the posterior distribution of the parameter, we can use it for prediction (Bayesian prediction):

$$
p(x_{new}|X)=\int\limits_{\theta}p(x_{new}|\theta)\cdot p(\theta|X)d\theta
$$

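In the integrand, the first factor $p(x_{new}|\theta)$ is the model and the second factor $p(\theta|X)$ is the posterior distribution.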
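The posterior, the MAP estimate, and the predictive integral can be approximated numerically on a grid. The sketch below is only an illustration under assumptions that are not in the text: a one-dimensional model $x\sim\mathcal{N}(\theta,1)$, a prior $\theta\sim\mathcal{N}(0,1)$, a made-up data list `data`, and a Riemann sum in place of the integral over $\theta$.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Hypothetical toy data, assumed to be drawn from N(theta, 1) with unknown theta.
data = [1.2, 0.7, 1.9, 1.4, 0.8]

# Grid over theta; integrals over theta become Riemann sums with this step.
step = 0.001
grid = [-5.0 + step * k for k in range(10001)]

def log_likelihood(theta):
    """log p(X | theta) = sum_i log p(x_i | theta) under the iid assumption."""
    return sum(math.log(normal_pdf(x, theta, 1.0)) for x in data)

# Unnormalized posterior: likelihood times the assumed prior N(0, 1).
unnorm = [math.exp(log_likelihood(t)) * normal_pdf(t, 0.0, 1.0) for t in grid]
evidence = sum(unnorm) * step              # approximates p(X)
posterior = [u / evidence for u in unnorm]  # approximates p(theta | X) on the grid

# MLE maximizes the likelihood alone; MAP maximizes likelihood times prior.
theta_mle = max(grid, key=log_likelihood)
theta_map = grid[max(range(len(grid)), key=lambda k: unnorm[k])]

def predictive(x_new):
    """Approximate p(x_new | X) = integral of p(x_new | theta) p(theta | X) d theta."""
    return sum(normal_pdf(x_new, t, 1.0) * p for t, p in zip(grid, posterior)) * step
```

For this conjugate toy model the grid results can be compared with closed forms: the MLE is the sample mean, while the prior pulls the MAP estimate toward zero. Grid integration is only practical in very low dimension, which is one reason sampling methods matter for Bayesian inference.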

Summary

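The frequentist and Bayesian schools each give rise to a family of machine learning algorithms. The frequentist view leads to statistical machine learning algorithms, while the Bayesian view leads to the theory of probabilistic graphical models. When applying the frequentist MLE method, optimization theory plays the central role. In Bayesian algorithms, integration plays the central role, both when modeling the posterior and when using the posterior for inference. This is why sampling-based integration methods such as MCMC are widely used.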

MathBasics

Gaussian distribution

One-dimensional case: MLE

The Gaussian distribution plays a central role in machine learning. In the MLE setting:

$$
\theta=(\mu,\Sigma)=(\mu,\sigma^{2}),\quad\theta_{MLE}=\mathop{argmax}\limits_{\theta}\log p(X|\theta)\mathop{=}\limits_{iid}\mathop{argmax}\limits_{\theta}\sum\limits_{i=1}^{N}\log p(x_{i}|\theta)
$$

In general, the probability density function (PDF) of the Gaussian distribution is written as:

$$
p(x|\mu,\Sigma)=\frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}e^{-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)}
$$

Substituting this into the MLE objective and considering the one-dimensional case:

$$
\log p(X|\theta)=\sum\limits_{i=1}^{N}\log p(x_{i}|\theta)=\sum\limits_{i=1}^{N}\log\frac{1}{\sqrt{2\pi}\sigma}\exp\left(-(x_{i}-\mu)^{2}/2\sigma^{2}\right)
$$

First, taking the extremum with respect to $\mu$ gives (the terms that do not involve $\mu$ drop out, and the negative sign turns the maximization into a minimization):

$$
\mu_{MLE}=\mathop{argmax}\limits_{\mu}\log p(X|\theta)=\mathop{argmin}\limits_{\mu}\sum\limits_{i=1}^{N}(x_{i}-\mu)^{2}
$$

Hence:

$$
\frac{\partial}{\partial\mu}\sum\limits_{i=1}^{N}(x_{i}-\mu)^{2}=0\longrightarrow\mu_{MLE}=\frac{1}{N}\sum\limits_{i=1}^{N}x_{i}
$$

Next, for the other parameter in $\theta$, namely $\sigma$, we have:

$$
\begin{aligned}
\sigma_{MLE}=\mathop{argmax}\limits_{\sigma}\log p(X|\theta)&=\mathop{argmax}\limits_{\sigma}\sum\limits_{i=1}^{N}\left[-\log\sigma-\frac{1}{2\sigma^{2}}(x_{i}-\mu)^{2}\right]\\
&=\mathop{argmin}\limits_{\sigma}\sum\limits_{i=1}^{N}\left[\log\sigma+\frac{1}{2\sigma^{2}}(x_{i}-\mu)^{2}\right]
\end{aligned}
$$

Hence:

$$
\frac{\partial}{\partial\sigma}\sum\limits_{i=1}^{N}\left[\log\sigma+\frac{1}{2\sigma^{2}}(x_{i}-\mu)^{2}\right]=0\longrightarrow\sigma_{MLE}^{2}=\frac{1}{N}\sum\limits_{i=1}^{N}(x_{i}-\mu)^{2}
$$
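The two closed-form estimates can be checked on a small sample. The following is a minimal sketch, assuming a made-up list `samples` and hypothetical helper names `gaussian_mle` and `log_likelihood`; the unknown $\mu$ in the variance formula is replaced by the estimated mean.

```python
import math

def gaussian_mle(xs):
    """Closed-form MLE for a 1-D Gaussian: returns (mu_mle, sigma2_mle)."""
    n = len(xs)
    mu = sum(xs) / n                                # mu_MLE = (1/N) sum x_i
    sigma2 = sum((x - mu) ** 2 for x in xs) / n     # sigma^2_MLE with mu set to mu_MLE
    return mu, sigma2

def log_likelihood(xs, mu, sigma2):
    """log p(X | mu, sigma^2) for iid 1-D Gaussian observations."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * sigma2) - (x - mu) ** 2 / (2.0 * sigma2)
        for x in xs
    )

# Hypothetical data, chosen only so the arithmetic is easy to follow.
samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu_mle, var_mle = gaussian_mle(samples)
```

Moving either parameter away from the returned values should lower `log_likelihood`, which is a quick sanity check that the stationary point is a maximum.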

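Note that in the derivation above we first computed the MLE of $\mu$ and then substituted that result for $\mu$ to obtain $\sigma_{MLE}$. We can therefore expect that, when taking the expectation over data sets, $\mathbb{E}_{\mathcal{D}}[\mu_{MLE}]$ is unbiased:

$$
\mathbb{E}_{\mathcal{D}}[\mu_{MLE}]=\mathbb{E}_{\mathcal{D}}\left[\frac{1}{N}\sum\limits_{i=1}^{N}x_{i}\right]=\frac{1}{N}\sum\limits_{i=1}^{N}\mathbb{E}_{\mathcal{D}}[x_{i}]=\mu
$$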

However, $\sigma_{MLE}^{2}$ is computed with the $\mu_{MLE}$ of a single data set, so when we take the expectation over all data sets we find that $\sigma_{MLE}^{2}$ is biased:

$$
\begin{aligned}
\mathbb{E}_{\mathcal{D}}[\sigma_{MLE}^{2}]&=\mathbb{E}_{\mathcal{D}}\left[\frac{1}{N}\sum\limits_{i=1}^{N}(x_{i}-\mu_{MLE})^{2}\right]=\mathbb{E}_{\mathcal{D}}\left[\frac{1}{N}\sum\limits_{i=1}^{N}(x_{i}^{2}-2x_{i}\mu_{MLE}+\mu_{MLE}^{2})\right]\\
&=\mathbb{E}_{\mathcal{D}}\left[\frac{1}{N}\sum\limits_{i=1}^{N}x_{i}^{2}-\mu_{MLE}^{2}\right]=\mathbb{E}_{\mathcal{D}}\left[\frac{1}{N}\sum\limits_{i=1}^{N}x_{i}^{2}-\mu^{2}+\mu^{2}-\mu_{MLE}^{2}\right]\\
&=\mathbb{E}_{\mathcal{D}}\left[\frac{1}{N}\sum\limits_{i=1}^{N}x_{i}^{2}-\mu^{2}\right]-\mathbb{E}_{\mathcal{D}}[\mu_{MLE}^{2}-\mu^{2}]=\sigma^{2}-(\mathbb{E}_{\mathcal{D}}[\mu_{MLE}^{2}]-\mu^{2})\\
&=\sigma^{2}-(\mathbb{E}_{\mathcal{D}}[\mu_{MLE}^{2}]-\mathbb{E}_{\mathcal{D}}^{2}[\mu_{MLE}])=\sigma^{2}-Var[\mu_{MLE}]\\
&=\sigma^{2}-Var\left[\frac{1}{N}\sum\limits_{i=1}^{N}x_{i}\right]=\sigma^{2}-\frac{1}{N^{2}}\sum\limits_{i=1}^{N}Var[x_{i}]=\frac{N-1}{N}\sigma^{2}
\end{aligned}
$$

Therefore, rescaling by $\frac{N}{N-1}$ gives an unbiased estimator:

$$
\hat{\sigma}^{2}=\frac{1}{N-1}\sum\limits_{i=1}^{N}(x_{i}-\mu_{MLE})^{2}
$$
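The factor $\frac{N-1}{N}$ can be observed in a small simulation. The sketch below assumes a true distribution $\mathcal{N}(1, 2^{2})$, a sample size of 5, and 20000 repeated data sets; these numbers and the helper name `variance_estimates` are illustrative choices, not part of the derivation.

```python
import random

def variance_estimates(xs):
    """Return (biased, unbiased) variance estimates, both using the sample mean."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)   # sum of squared deviations from mu_MLE
    return ss / n, ss / (n - 1)

rng = random.Random(0)          # fixed seed so the run is reproducible
N, trials = 5, 20000
true_sigma2 = 4.0               # variance of the assumed true distribution N(1, 2^2)

biased_sum = 0.0
unbiased_sum = 0.0
for _ in range(trials):
    xs = [rng.gauss(1.0, 2.0) for _ in range(N)]   # one data set D of size N
    biased, unbiased = variance_estimates(xs)
    biased_sum += biased
    unbiased_sum += unbiased

# Monte Carlo approximations of E_D[sigma^2_MLE] and E_D[sigma_hat^2].
avg_biased = biased_sum / trials
avg_unbiased = unbiased_sum / trials
```

With these settings `avg_biased` should land near $\frac{N-1}{N}\sigma^{2}$ and `avg_unbiased` near $\sigma^{2}$, up to Monte Carlo noise. The gap shrinks as $N$ grows, so the bias matters mostly for small samples.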

Multidimensional case

The density of the multivariate Gaussian distribution is:

$$
p(x|\mu,\Sigma)=\frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}e^{-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)}
$$

Here $x,\mu\in\mathbb{R}^{p}$ and $\Sigma\in\mathbb{R}^{p\times p}$ is the covariance matrix, which in general is positive semidefinite. We only consider the positive definite case. We first deal with the quantity in the exponent, which can be read as the (squared) Mahalanobis distance between $x$ and $\mu$. A symmetric covariance matrix admits an eigendecomposition, $\Sigma=U\Lambda U^{T}=(u_{1},u_{2},\cdots,u_{p})diag(\lambda_{i})(u_{1},u_{2},\cdots,u_{p})^{T}=\sum\limits_{i=1}^{p}u_{i}\lambda_{i}u_{i}^{T}$, and therefore:

$$
\Sigma^{-1}=\sum\limits_{i=1}^{p}u_{i}\frac{1}{\lambda_{i}}u_{i}^{T}
$$

$$
\Delta=(x-\mu)^{T}\Sigma^{-1}(x-\mu)=\sum\limits_{i=1}^{p}(x-\mu)^{T}u_{i}\frac{1}{\lambda_{i}}u_{i}^{T}(x-\mu)=\sum\limits_{i=1}^{p}\frac{y_{i}^{2}}{\lambda_{i}}
$$

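Note that $y_{i}=(x-\mu)^{T}u_{i}$ is the length of the projection of $x-\mu$ onto the eigenvector $u_{i}$. The expression above therefore describes a family of concentric ellipses (ellipsoids when $p>2$), one for each value of $\Delta$.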
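The identity $\Delta=\sum_{i}y_{i}^{2}/\lambda_{i}$ can be checked numerically. This is a minimal NumPy sketch; the matrix `Sigma`, the mean `mu`, and the point `x` are made-up values, with `Sigma` chosen to be symmetric positive definite.

```python
import numpy as np

# Hypothetical 2-D example; Sigma is symmetric positive definite.
Sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])
mu = np.array([1.0, -1.0])
x = np.array([2.5, 0.0])

# Eigendecomposition Sigma = U diag(lam) U^T; eigh is meant for symmetric matrices.
lam, U = np.linalg.eigh(Sigma)

# y_i is the projection of (x - mu) onto the eigenvector u_i (the i-th column of U).
y = U.T @ (x - mu)

# Delta computed in the eigenbasis and directly from Sigma^{-1}.
delta_eig = np.sum(y ** 2 / lam)
delta_direct = (x - mu) @ np.linalg.solve(Sigma, x - mu)
```

Both values should agree to floating-point precision. In the eigenbasis, the contour $\Delta=c$ is an axis-aligned ellipse with semi-axes $\sqrt{c\lambda_{i}}$.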

Next we look at two problems that arise when the multivariate Gaussian model is used in practice.

First, the parameters $\Sigma,\mu$ have $O(p^{2})$ degrees of freedom, which is too many for high-dimensional data. Remedy: the large count comes from $\Sigma$, which has $\frac{p(p+1)}{2}$ free parameters. We can assume that $\Sigma$ is diagonal, or even, under the isotropy assumption, that all of its diagonal entries are equal. Factor Analysis is an algorithm of the first kind, and probabilistic PCA (p-PCA) is one of the second kind.
Second, a single Gaussian is unimodal and cannot fit data with several modes well. Remedy: the Gaussian mixture model (GMM).
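To make the parameter counts concrete, the sketch below tabulates the number of free covariance parameters under the three assumptions. The function name `covariance_free_parameters` and the `structure` labels are hypothetical, introduced only for this illustration.

```python
def covariance_free_parameters(p, structure="full"):
    """Number of free parameters in a p x p covariance matrix.

    structure: "full" (symmetric), "diagonal", or "isotropic" (sigma^2 * I).
    """
    if structure == "full":
        return p * (p + 1) // 2     # upper triangle including the diagonal
    if structure == "diagonal":
        return p                    # one variance per dimension
    if structure == "isotropic":
        return 1                    # a single shared variance
    raise ValueError(f"unknown structure: {structure}")

# Example: counts for 100-dimensional data under each assumption.
counts = {s: covariance_free_parameters(100, s)
          for s in ("full", "diagonal", "isotropic")}
```

The mean always contributes another $p$ parameters, so the covariance structure is what decides whether the total grows quadratically or linearly in $p$.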

Below we introduce some commonly used results for the multivariate Gaussian distribution.

Write $x=(x_{1},x_{2},\cdots,x_{p})^{T}=(x_{a,m\times 1},x_{b,n\times 1})^{T}$, $\mu=(\mu_{a,m\times 1},\mu_{b,n\times 1})^{T}$, $\Sigma=\begin{pmatrix}\Sigma_{aa}&\Sigma_{ab}\\\Sigma_{ba}&\Sigma_{bb}\end{pmatrix}$, and suppose $x\sim\mathcal{N}(\mu,\Sigma)$.

We start with a theorem about Gaussian distributions:

Theorem: if $x\sim\mathcal{N}(\mu,\Sigma)$ and $y=Ax+b$, then $y\sim\mathcal{N}(A\mu+b,A\Sigma A^{T})$.
Proof: $\mathbb{E}[y]=\mathbb{E}[Ax+b]=A\mathbb{E}[x]+b=A\mu+b$, and $Var[y]=Var[Ax+b]=Var[Ax]=A\cdot Var[x]\cdot A^{T}=A\Sigma A^{T}$.
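The theorem can be sanity-checked by sampling. The sketch below draws samples of $x$, applies $y=Ax+b$, and compares the empirical mean and covariance with $A\mu+b$ and $A\Sigma A^{T}$. All numerical values (`mu`, `Sigma`, `A`, `b`, the seed, and the sample size) are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed for reproducibility

# Hypothetical 3-D Gaussian and a 2 x 3 affine map.
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.4],
                  [0.0, 0.4, 1.5]])
A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 1.0]])
b = np.array([0.5, -0.5])

# Rows of x are samples from N(mu, Sigma); y = A x + b is applied row by row.
x = rng.multivariate_normal(mu, Sigma, size=200_000)
y = x @ A.T + b

# Moments predicted by the theorem.
mean_theory = A @ mu + b
cov_theory = A @ Sigma @ A.T

# Empirical moments of the transformed samples.
mean_empirical = y.mean(axis=0)
cov_empirical = np.cov(y, rowvar=False)
```

The empirical and theoretical values should match up to sampling noise. Matching moments is of course not a proof that $y$ is Gaussian; that part rests on the fact that affine maps of Gaussian vectors remain Gaussian.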

We now use this theorem to obtain the four quantities $p(x_{a}),p(x_{b}),p(x_{a}|x_{b}),p(x_{b}|x_{a})$.

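1. $x_{a}=\begin{pmatrix}I_{m\times m}&O_{m\times n}\end{pmatrix}\begin{pmatrix}x_{a}\\x_{b}\end{pmatrix}$. Substituting into the theorem gives:
$$
\mathbb{E}[x_{a}]=\begin{pmatrix}I&O\end{pmatrix}\begin{pmatrix}\mu_{a}\\\mu_{b}\end{pmatrix}=\mu_{a},\quad Var[x_{a}]=\begin{pmatrix}I&O\end{pmatrix}\begin{pmatrix}\Sigma_{aa}&\Sigma_{ab}\\\Sigma_{ba}&\Sigma_{bb}\end{pmatrix}\begin{pmatrix}I\\O\end{pmatrix}=\Sigma_{aa}
$$

So $x_{a}\sim\mathcal{N}(\mu_{a},\Sigma_{aa})$.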
2. Similarly, $x_{b}\sim\mathcal{N}(\mu_{b},\Sigma_{bb})$.
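3. For the two conditional distributions, we introduce three quantities:
$$
x_{b\cdot a}=x_{b}-\Sigma_{ba}\Sigma_{aa}^{-1}x_{a},\quad\mu_{b\cdot a}=\mu_{b}-\Sigma_{ba}\Sigma_{aa}^{-1}\mu_{a},\quad\Sigma_{bb\cdot a}=\Sigma_{bb}-\Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{ab}
$$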

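In particular, the last expression is the Schur complement of the block $\Sigma_{aa}$ in $\Sigma$. We can see that:

$$
x_{b\cdot a}=\begin{pmatrix}-\Sigma_{ba}\Sigma_{aa}^{-1}&I_{n\times n}\end{pmatrix}\begin{pmatrix}x_{a}\\x_{b}\end{pmatrix}
$$

Therefore:
$$
\mathbb{E}[x_{b\cdot a}]=\begin{pmatrix}-\Sigma_{ba}\Sigma_{aa}^{-1}&I_{n\times n}\end{pmatrix}\begin{pmatrix}\mu_{a}\\\mu_{b}\end{pmatrix}=\mu_{b\cdot a},\quad Var[x_{b\cdot a}]=\begin{pmatrix}-\Sigma_{ba}\Sigma_{aa}^{-1}&I_{n\times n}\end{pmatrix}\begin{pmatrix}\Sigma_{aa}&\Sigma_{ab}\\\Sigma_{ba}&\Sigma_{bb}\end{pmatrix}\begin{pmatrix}-\Sigma_{aa}^{-1}\Sigma_{ab}\\I_{n\times n}\end{pmatrix}=\Sigma_{bb\cdot a}
$$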

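From these three quantities we get $x_{b}=x_{b\cdot a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_{a}$. Moreover, $Cov[x_{b\cdot a},x_{a}]=\Sigma_{ba}-\Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{aa}=0$, and since the two are jointly Gaussian, $x_{b\cdot a}$ is independent of $x_{a}$. Therefore:
$$
\mathbb{E}[x_{b}|x_{a}]=\mu_{b\cdot a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_{a}
$$
$$
Var[x_{b}|x_{a}]=\Sigma_{bb\cdot a}
$$

The theorem is used here as well: given $x_{a}$, the term $\Sigma_{ba}\Sigma_{aa}^{-1}x_{a}$ is a constant shift of the Gaussian variable $x_{b\cdot a}$.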
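4. In the same way:
$$
x_{a\cdot b}=x_{a}-\Sigma_{ab}\Sigma_{bb}^{-1}x_{b},\quad\mu_{a\cdot b}=\mu_{a}-\Sigma_{ab}\Sigma_{bb}^{-1}\mu_{b},\quad\Sigma_{aa\cdot b}=\Sigma_{aa}-\Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}
$$

So: $\mathbb{E}[x_{a}|x_{b}]=\mu_{a\cdot b}+\Sigma_{ab}\Sigma_{bb}^{-1}x_{b}$ and $Var[x_{a}|x_{b}]=\Sigma_{aa\cdot b}$.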
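The conditional formulas translate directly into a few lines of NumPy. The sketch below is an illustration under stated assumptions: the helper name `condition_b_on_a` is hypothetical, $x_{a}$ is taken to be the first `m` coordinates of $x$, and the bivariate example with correlation `rho` is made up.

```python
import numpy as np

def condition_b_on_a(mu, Sigma, m, x_a):
    """Mean and covariance of x_b | x_a for x = (x_a, x_b) ~ N(mu, Sigma).

    x_a is assumed to hold the first m coordinates of x.
    """
    mu_a, mu_b = mu[:m], mu[m:]
    S_aa, S_ab = Sigma[:m, :m], Sigma[:m, m:]
    S_ba, S_bb = Sigma[m:, :m], Sigma[m:, m:]
    gain = S_ba @ np.linalg.inv(S_aa)         # Sigma_ba Sigma_aa^{-1}
    # mu_b + gain (x_a - mu_a) equals mu_{b.a} + Sigma_ba Sigma_aa^{-1} x_a.
    cond_mean = mu_b + gain @ (x_a - mu_a)
    cond_cov = S_bb - gain @ S_ab             # Schur complement Sigma_{bb.a}
    return cond_mean, cond_cov

# Hypothetical bivariate example with unit variances and correlation rho.
rho = 0.6
mu = np.zeros(2)
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])
mean_b, cov_b = condition_b_on_a(mu, Sigma, 1, np.array([2.0]))
```

In the bivariate unit-variance case the result reduces to the familiar $\mathbb{E}[x_{b}|x_{a}]=\rho x_{a}$ and $Var[x_{b}|x_{a}]=1-\rho^{2}$. Note that the conditional covariance does not depend on the observed value of $x_{a}$. Swapping the roles of the two blocks gives $p(x_{a}|x_{b})$.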

We now use the four quantities above to solve a linear model:

Given: $p(x)=\mathcal{N}(\mu,\Lambda^{-1})$, $p(y|x)=\mathcal{N}(Ax+b,L^{-1})$. Find: $p(y)$ and $p(x|y)$.
Solution: let $y=Ax+b+\epsilon$ with $\epsilon\sim\mathcal{N}(0,L^{-1})$ independent of $x$. Then $\mathbb{E}[y]=\mathbb{E}[Ax+b+\epsilon]=A\mu+b$ and $Var[y]=A\Lambda^{-1}A^{T}+L^{-1}$, so:
$$
p(y)=\mathcal{N}(A\mu+b,L^{-1}+A\Lambda^{-1}A^{T})
$$
Introducing $z=\begin{pmatrix}x\\y\end{pmatrix}$, we need $Cov[x,y]=\mathbb{E}[(x-\mathbb{E}[x])(y-\mathbb{E}[y])^{T}]$. This covariance can be computed directly, using the fact that $\epsilon$ has zero mean and is independent of $x$:
$$
Cov[x,y]=\mathbb{E}[(x-\mu)(Ax-A\mu+\epsilon)^{T}]=\mathbb{E}[(x-\mu)(x-\mu)^{T}A^{T}]=Var[x]A^{T}=\Lambda^{-1}A^{T}
$$
By the symmetry of the covariance matrix, $p(z)=\mathcal{N}\left(\begin{pmatrix}\mu\\A\mu+b\end{pmatrix},\begin{pmatrix}\Lambda^{-1}&\Lambda^{-1}A^{T}\\A\Lambda^{-1}&L^{-1}+A\Lambda^{-1}A^{T}\end{pmatrix}\right)$. Applying the conditional formulas obtained earlier, we get:
$$
\mathbb{E}[x|y]=\mu+\Lambda^{-1}A^{T}(L^{-1}+A\Lambda^{-1}A^{T})^{-1}(y-A\mu-b)
$$
$$
Var[x|y]=\Lambda^{-1}-\Lambda^{-1}A^{T}(L^{-1}+A\Lambda^{-1}A^{T})^{-1}A\Lambda^{-1}
$$
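The results for $p(y)$ and $p(x|y)$ can be evaluated numerically as follows. This is a minimal sketch: the function name `linear_gaussian` and all numerical values (`mu`, `Lambda`, `A`, `b`, `L`, `y_obs`) are assumptions made for the illustration, with `Lambda` and `L` chosen to be positive definite precision matrices.

```python
import numpy as np

def linear_gaussian(mu, Lambda, A, b, L, y):
    """Marginal p(y) and posterior p(x | y) for the linear Gaussian model.

    p(x) = N(mu, Lambda^{-1}),  p(y | x) = N(A x + b, L^{-1}).
    Returns (y_mean, y_cov, post_mean, post_cov).
    """
    Sigma_x = np.linalg.inv(Lambda)            # prior covariance Lambda^{-1}
    noise_cov = np.linalg.inv(L)               # noise covariance L^{-1}
    y_mean = A @ mu + b
    y_cov = noise_cov + A @ Sigma_x @ A.T      # Var[y]
    # Cov[x, y] Var[y]^{-1} = Lambda^{-1} A^T (L^{-1} + A Lambda^{-1} A^T)^{-1}
    K = Sigma_x @ A.T @ np.linalg.inv(y_cov)
    post_mean = mu + K @ (y - y_mean)          # E[x | y]
    post_cov = Sigma_x - K @ A @ Sigma_x       # Var[x | y]
    return y_mean, y_cov, post_mean, post_cov

# Hypothetical example: 2-D latent x, 3-D observation y.
mu = np.array([0.0, 1.0])
Lambda = np.array([[2.0, 0.5],
                   [0.5, 1.0]])
A = np.array([[1.0, -1.0],
              [0.5, 2.0],
              [0.0, 1.0]])
b = np.array([0.1, 0.0, -0.2])
L = np.diag([4.0, 1.0, 2.0])
y_obs = np.array([0.3, 1.5, 0.9])

y_mean, y_cov, post_mean, post_cov = linear_gaussian(mu, Lambda, A, b, L, y_obs)
```

A useful cross-check is the equivalent precision form $Var[x|y]=(\Lambda+A^{T}LA)^{-1}$, which follows from the Woodbury matrix identity and should agree with `post_cov` to floating-point precision. For clarity the sketch uses explicit inverses; in practice `np.linalg.solve` is usually preferred for numerical stability.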