There are two major schools of thought on how to interpret probability: the frequentist school and the Bayesian school. In what follows we use this notation for the set of observations:
$$ X_{N\times p}=(x_{1},x_{2},\cdots,x_{N})^{T},\quad x_{i}=(x_{i1},x_{i2},\cdots,x_{ip})^{T} $$
That is, there are $N$ samples, each a $p$-dimensional vector, and every observation is generated by $p(x|\theta)$.
The frequentist view
Frequentists treat $\theta$ in $p(x|\theta)$ as an unknown constant. For $N$ i.i.d. observations, the probability of the observation set is $p(X|\theta)\mathop{=}\limits_{iid}\prod\limits_{i=1}^{N}p(x_{i}|\theta)$. To estimate $\theta$, we use maximum likelihood estimation (MLE), maximizing the log-likelihood:
$$ \theta_{MLE}=\mathop{argmax}\limits_{\theta}\log p(X|\theta)\mathop{=}\limits_{iid}\mathop{argmax}\limits_{\theta}\sum\limits_{i=1}^{N}\log p(x_{i}|\theta) $$
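As a small illustration of the MLE objective above, the sketch below sums the per-sample log-likelihoods and picks the maximizer on a grid. The Bernoulli model, the data, and the grid are assumptions chosen for illustration; they do not come from the notes.

```python
import numpy as np

def bernoulli_log_likelihood(theta, x):
    """Sum of log p(x_i | theta) for i.i.d. Bernoulli observations x_i in {0, 1}."""
    x = np.asarray(x, dtype=float)
    return np.sum(x * np.log(theta) + (1.0 - x) * np.log(1.0 - theta))

# Hypothetical data: 7 successes out of 10 trials.
x = np.array([1, 1, 0, 1, 1, 0, 1, 1, 0, 1])

# Evaluate the log-likelihood on a grid of candidate theta values (endpoints excluded).
grid = np.linspace(0.001, 0.999, 999)
log_lik = np.array([bernoulli_log_likelihood(t, x) for t in grid])

# theta_MLE = argmax_theta sum_i log p(x_i | theta), restricted to the grid.
theta_mle = grid[np.argmax(log_lik)]
```

For this model the maximizer should coincide, up to the grid spacing, with the sample fraction of successes.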
The Bayesian view
Bayesians hold that $\theta$ in $p(x|\theta)$ is not a constant but a random variable that follows a preset prior distribution $\theta\sim p(\theta)$. By Bayes' theorem, the posterior of the parameter given the observation set is:
$$ p(\theta|X)=\frac{p(X|\theta)\cdot p(\theta)}{p(X)}=\frac{p(X|\theta)\cdot p(\theta)}{\int\limits_{\theta}p(X|\theta)\cdot p(\theta)d\theta} $$
To obtain a value for $\theta$, we maximize this posterior (maximum a posteriori, MAP):
$$ \theta_{MAP}=\mathop{argmax}\limits_{\theta}p(\theta|X)=\mathop{argmax}\limits_{\theta}p(X|\theta)\cdot p(\theta) $$
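The MAP objective can be sketched on a conjugate toy problem. The Bernoulli likelihood, the Beta(a, b) prior, and all numbers below are assumptions made for illustration, not part of the notes.

```python
import numpy as np

# Hypothetical data and prior: 7 successes in 10 Bernoulli trials, Beta(a, b) prior on theta.
x = np.array([1, 1, 0, 1, 1, 0, 1, 1, 0, 1])
a, b = 2.0, 2.0

grid = np.linspace(0.001, 0.999, 999)
k, n = x.sum(), x.size

# log p(X | theta) + log p(theta), dropping constants that do not depend on theta.
log_lik = k * np.log(grid) + (n - k) * np.log(1.0 - grid)
log_prior = (a - 1.0) * np.log(grid) + (b - 1.0) * np.log(1.0 - grid)
theta_map = grid[np.argmax(log_lik + log_prior)]

# Closed form for this conjugate pair: the mode of Beta(a + k, b + n - k).
theta_map_closed = (a + k - 1.0) / (a + b + n - 2.0)
```

The evidence $p(X)$ never appears in the code, which mirrors the observation that the denominator does not affect the argmax.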
The second equality holds because the denominator does not depend on $\theta$. Having solved for $\theta$, evaluating $\frac{p(X|\theta)\cdot p(\theta)}{\int\limits_{\theta}p(X|\theta)\cdot p(\theta)d\theta}$ gives the posterior probability of the parameter. Here $p(X|\theta)$ is called the likelihood; it is our model distribution. Once we have the posterior distribution of the parameter, we can use it for prediction (Bayesian prediction):
$$ p(x_{new}|X)=\int\limits_{\theta}p(x_{new}|\theta)\cdot p(\theta|X)d\theta $$
In the integrand, the first factor is the model and the second factor is the posterior distribution.
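The predictive integral can be approximated numerically when $\theta$ is one-dimensional. The sketch below assumes a Bernoulli model with a Beta prior and hypothetical counts, and uses a simple midpoint rule; none of these choices come from the notes.

```python
import numpy as np

k, n = 7, 10          # hypothetical counts: successes, trials
a, b = 2.0, 2.0       # hypothetical Beta prior parameters

# Midpoints of 1000 equal-width cells covering (0, 1).
theta = np.linspace(0.0005, 0.9995, 1000)
d_theta = theta[1] - theta[0]

# Unnormalised posterior p(X | theta) p(theta), then normalise by the integral p(X).
unnorm = theta ** (k + a - 1.0) * (1.0 - theta) ** (n - k + b - 1.0)
posterior = unnorm / (unnorm.sum() * d_theta)

# p(x_new = 1 | X) = integral of p(x_new = 1 | theta) p(theta | X) d theta,
# where p(x_new = 1 | theta) = theta for a Bernoulli model.
p_new = np.sum(theta * posterior) * d_theta
```

For this conjugate pair the integral has the closed form $(a+k)/(a+b+n)$, which gives a convenient check on the numerical value.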
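Summary
The frequentist and Bayesian schools each give rise to a family of machine learning algorithms. The frequentist view leads to statistical machine learning algorithms, while the Bayesian view leads to probabilistic graphical models. Optimization theory plays a central role when applying the frequentist MLE method. In Bayesian algorithms, integration plays the central role, both in modeling the posterior and in using it for inference, so sampling-based integration methods such as MCMC are widely used.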
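To make the role of sampling concrete, here is a minimal random-walk Metropolis sketch, one simple member of the MCMC family. It estimates a posterior mean without ever computing the normalizing integral. The target (a Bernoulli likelihood with a Beta prior), the proposal scale, the chain length, and the burn-in are all assumptions chosen for illustration.

```python
import numpy as np

def log_unnorm_posterior(theta, k=7, n=10, a=2.0, b=2.0):
    """log of p(X | theta) p(theta) up to a constant; -inf outside (0, 1)."""
    if theta <= 0.0 or theta >= 1.0:
        return -np.inf
    return (k + a - 1.0) * np.log(theta) + (n - k + b - 1.0) * np.log(1.0 - theta)

rng = np.random.default_rng(0)
theta = 0.5
samples = []
for _ in range(20000):
    proposal = theta + rng.normal(scale=0.2)      # symmetric random-walk proposal
    log_accept = log_unnorm_posterior(proposal) - log_unnorm_posterior(theta)
    if np.log(rng.uniform()) < log_accept:        # accept with probability min(1, ratio)
        theta = proposal
    samples.append(theta)

# Monte Carlo estimate of the posterior mean, discarding the first 2000 draws as burn-in.
posterior_mean = np.mean(samples[2000:])
```

Only ratios of the unnormalized posterior are needed, which is why the evidence integral drops out of the algorithm.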
MathBasics
Gaussian distribution
One-dimensional case: MLE
The Gaussian distribution plays a pivotal role in machine learning. In the MLE method:
$$ \theta=(\mu,\Sigma)=(\mu,\sigma^{2}),\quad\theta_{MLE}=\mathop{argmax}\limits_{\theta}\log p(X|\theta)\mathop{=}\limits_{iid}\mathop{argmax}\limits_{\theta}\sum\limits_{i=1}^{N}\log p(x_{i}|\theta) $$
In general, the probability density function (PDF) of a Gaussian distribution is written as:
$$ p(x|\mu,\Sigma)=\frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}e^{-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)} $$
Substituting this into the MLE objective and considering the one-dimensional case:
$$ \log p(X|\theta)=\sum\limits_{i=1}^{N}\log p(x_{i}|\theta)=\sum\limits_{i=1}^{N}\log\frac{1}{\sqrt{2\pi}\sigma}\exp(-(x_{i}-\mu)^{2}/2\sigma^{2}) $$
First, taking the extremum with respect to $\mu$ gives $\mu_{MLE}$. Only the term $-(x_{i}-\mu)^{2}/2\sigma^{2}$ depends on $\mu$, so maximizing the log-likelihood is equivalent to minimizing the sum of squares:
$$ \mu_{MLE}=\mathop{argmax}\limits_{\mu}\log p(X|\theta)=\mathop{argmin}\limits_{\mu}\sum\limits_{i=1}^{N}(x_{i}-\mu)^{2} $$
Therefore:
$$ \frac{\partial}{\partial\mu}\sum\limits_{i=1}^{N}(x_{i}-\mu)^{2}=0\longrightarrow\mu_{MLE}=\frac{1}{N}\sum\limits_{i=1}^{N}x_{i} $$
Next, for the other parameter in $\theta$, namely $\sigma$, we have:
$$
\begin{aligned}
\sigma_{MLE}=\mathop{argmax}\limits_{\sigma}\log p(X|\theta)&=\mathop{argmax}\limits_{\sigma}\sum\limits_{i=1}^{N}[-\log\sigma-\frac{1}{2\sigma^{2}}(x_{i}-\mu)^{2}]\\
&=\mathop{argmin}\limits_{\sigma}\sum\limits_{i=1}^{N}[\log\sigma+\frac{1}{2\sigma^{2}}(x_{i}-\mu)^{2}]
\end{aligned}
$$
Therefore:
$$ \frac{\partial}{\partial\sigma}\sum\limits_{i=1}^{N}[\log\sigma+\frac{1}{2\sigma^{2}}(x_{i}-\mu)^{2}]=0\longrightarrow\sigma_{MLE}^{2}=\frac{1}{N}\sum\limits_{i=1}^{N}(x_{i}-\mu)^{2} $$
Here $\mu$ takes the value $\mu_{MLE}$ obtained in the previous step.
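The two closed-form estimates can be computed directly. The data below are hypothetical, and the comparison with `np.mean` and `np.var` (whose default divides by $N$) is only a sanity check.

```python
import numpy as np

# Hypothetical one-dimensional observations.
x = np.array([2.1, 1.9, 3.2, 2.8, 2.5, 1.7])
N = x.size

mu_mle = x.sum() / N                          # (1/N) * sum_i x_i
sigma2_mle = np.sum((x - mu_mle) ** 2) / N    # (1/N) * sum_i (x_i - mu_mle)^2
```

Note that `sigma2_mle` plugs in `mu_mle` rather than the unknown true mean, which is the source of the bias discussed next.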
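Note that the derivation above first obtains the MLE of $\mu$ and then uses that result to obtain $\sigma$. Taking the expectation over datasets, $\mu_{MLE}$ is unbiased, as one would expect:
$$ \mathbb{E}_{\mathcal{D}}[\mu_{MLE}]=\mathbb{E}_{\mathcal{D}}[\frac{1}{N}\sum\limits_{i=1}^{N}x_{i}]=\frac{1}{N}\sum\limits_{i=1}^{N}\mathbb{E}_{\mathcal{D}}[x_{i}]=\mu $$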
However, when taking the expectation of $\sigma_{MLE}^{2}$, each dataset contributes its own $\mu_{MLE}$, so averaged over all datasets $\sigma_{MLE}^{2}$ turns out to be biased:
$$
\begin{aligned}
\mathbb{E}_{\mathcal{D}}[\sigma_{MLE}^{2}]&=\mathbb{E}_{\mathcal{D}}[\frac{1}{N}\sum\limits_{i=1}^{N}(x_{i}-\mu_{MLE})^{2}]=\mathbb{E}_{\mathcal{D}}[\frac{1}{N}\sum\limits_{i=1}^{N}(x_{i}^{2}-2x_{i}\mu_{MLE}+\mu_{MLE}^{2})]\\
&=\mathbb{E}_{\mathcal{D}}[\frac{1}{N}\sum\limits_{i=1}^{N}x_{i}^{2}-\mu_{MLE}^{2}]=\mathbb{E}_{\mathcal{D}}[\frac{1}{N}\sum\limits_{i=1}^{N}x_{i}^{2}-\mu^{2}+\mu^{2}-\mu_{MLE}^{2}]\\
&=\mathbb{E}_{\mathcal{D}}[\frac{1}{N}\sum\limits_{i=1}^{N}x_{i}^{2}-\mu^{2}]-\mathbb{E}_{\mathcal{D}}[\mu_{MLE}^{2}-\mu^{2}]=\sigma^{2}-(\mathbb{E}_{\mathcal{D}}[\mu_{MLE}^{2}]-\mu^{2})\\
&=\sigma^{2}-(\mathbb{E}_{\mathcal{D}}[\mu_{MLE}^{2}]-\mathbb{E}_{\mathcal{D}}^{2}[\mu_{MLE}])=\sigma^{2}-Var[\mu_{MLE}]\\
&=\sigma^{2}-Var[\frac{1}{N}\sum\limits_{i=1}^{N}x_{i}]=\sigma^{2}-\frac{1}{N^{2}}\sum\limits_{i=1}^{N}Var[x_{i}]=\frac{N-1}{N}\sigma^{2}
\end{aligned}
$$
So the unbiased estimate is obtained by rescaling with $\frac{N}{N-1}$:
$$ \hat{\sigma}^{2}=\frac{1}{N-1}\sum\limits_{i=1}^{N}(x_{i}-\mu_{MLE})^{2} $$
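The bias factor $\frac{N-1}{N}$ can be checked empirically by averaging the estimator over many simulated datasets. The true parameters, sample size, number of datasets, and seed below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, N = 1.0, 4.0, 5          # hypothetical true parameters and sample size
num_datasets = 200_000

# Each row is one dataset D of N i.i.d. draws from N(mu, sigma2).
data = rng.normal(loc=mu, scale=np.sqrt(sigma2), size=(num_datasets, N))

mu_mle = data.mean(axis=1, keepdims=True)            # one mu_MLE per dataset
sigma2_mle = np.mean((data - mu_mle) ** 2, axis=1)   # divides by N
sigma2_hat = sigma2_mle * N / (N - 1)                # divides by N - 1 instead

avg_mle = sigma2_mle.mean()   # expected to be close to (N - 1) / N * sigma2
avg_hat = sigma2_hat.mean()   # expected to be close to sigma2
```

With a small $N$ the gap between the two averages is easy to see; it shrinks as $N$ grows.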
Multivariate case
The multivariate Gaussian density is:
$$ p(x|\mu,\Sigma)=\frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}e^{-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)} $$
where $x,\mu\in\mathbb{R}^{p}$ and $\Sigma\in\mathbb{R}^{p\times p}$. $\Sigma$ is the covariance matrix, which in general is positive semi-definite; here we consider only the positive definite case. We first look at the quantity in the exponent, which can be written as the (squared) Mahalanobis distance between $x$ and $\mu$. A symmetric covariance matrix admits an eigendecomposition, $\Sigma=U\Lambda U^{T}=(u_{1},u_{2},\cdots,u_{p})diag(\lambda_{i})(u_{1},u_{2},\cdots,u_{p})^{T}=\sum\limits_{i=1}^{p}u_{i}\lambda_{i}u_{i}^{T}$, and hence $\Sigma^{-1}=\sum\limits_{i=1}^{p}u_{i}\frac{1}{\lambda_{i}}u_{i}^{T}$. Therefore:
$$ \Delta=(x-\mu)^{T}\Sigma^{-1}(x-\mu)=\sum\limits_{i=1}^{p}(x-\mu)^{T}u_{i}\frac{1}{\lambda_{i}}u_{i}^{T}(x-\mu)=\sum\limits_{i=1}^{p}\frac{y_{i}^{2}}{\lambda_{i}} $$
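Note that $y_{i}=u_{i}^{T}(x-\mu)$ is the projection length of $x-\mu$ onto the eigenvector $u_{i}$, so for different values of $\Delta$ the equation above traces out concentric ellipses.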
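The eigendecomposition form of $\Delta$ can be compared against a direct evaluation of the quadratic form. The mean, covariance, and query point below are hypothetical values chosen so that the covariance is positive definite.

```python
import numpy as np

# Hypothetical mean, positive-definite covariance, and query point.
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
x = np.array([2.0, 0.5])

# Sigma = U diag(lam) U^T; the columns of U are the eigenvectors u_i.
lam, U = np.linalg.eigh(Sigma)

y = U.T @ (x - mu)                 # y_i = u_i^T (x - mu)
delta_eig = np.sum(y ** 2 / lam)   # sum_i y_i^2 / lambda_i

# Direct evaluation of (x - mu)^T Sigma^{-1} (x - mu) for comparison.
delta_direct = (x - mu) @ np.linalg.solve(Sigma, x - mu)
```

`np.linalg.eigh` is used because it is intended for symmetric matrices and returns orthonormal eigenvectors.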
Next we look at two problems that arise when the multivariate Gaussian model is used in practice.
- The parameters $\Sigma,\mu$ have $O(p^{2})$ degrees of freedom, which is too many for high-dimensional data. Solution: the high count comes from $\Sigma$, which has $\frac{p(p+1)}{2}$ free parameters. We can assume that it is diagonal, or even, under an isotropy assumption, that all of its diagonal entries are equal. Factor Analysis is an algorithm of the first kind, and probabilistic PCA (p-PCA) is one of the second kind.
- A single Gaussian is unimodal, so it fits data with several modes poorly. Solution: the Gaussian mixture model (GMM).
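The saving from restricting $\Sigma$ is easy to quantify. The helper below is a hypothetical function written for this illustration; it counts only the free parameters of the covariance matrix under each structural assumption.

```python
def covariance_free_parameters(p, structure="full"):
    """Number of free parameters in a p x p covariance matrix under a given structure."""
    if structure == "full":        # symmetric matrix: p(p+1)/2 entries
        return p * (p + 1) // 2
    if structure == "diagonal":    # one variance per dimension
        return p
    if structure == "isotropic":   # a single shared variance
        return 1
    raise ValueError(f"unknown structure: {structure}")

# Compare the three assumptions for a hypothetical dimension p = 100.
counts = {s: covariance_free_parameters(100, s) for s in ("full", "diagonal", "isotropic")}
```

For $p=100$ the full matrix has thousands of free parameters, while the diagonal and isotropic versions have 100 and 1.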
Below we introduce some commonly used results for the multivariate Gaussian distribution.
We write $x=(x_1,x_2,\cdots,x_p)^T=(x_{a,m\times 1},x_{b,n\times1})^T$ with $m+n=p$, $\mu=(\mu_{a,m\times1},\mu_{b,n\times1})^T$, and $\Sigma=\begin{pmatrix}\Sigma_{aa}&\Sigma_{ab}\\\Sigma_{ba}&\Sigma_{bb}\end{pmatrix}$, and we are given $x\sim\mathcal{N}(\mu,\Sigma)$.
We start with a theorem about Gaussian distributions:
Theorem: if $x\sim\mathcal{N}(\mu,\Sigma)$ and $y=Ax+b$, then $y\sim\mathcal{N}(A\mu+b,A\Sigma A^T)$.
Proof: an affine map of a Gaussian vector is again Gaussian, so it suffices to compute the mean and covariance: $\mathbb{E}[y]=\mathbb{E}[Ax+b]=A\mathbb{E}[x]+b=A\mu+b$, and $Var[y]=Var[Ax+b]=Var[Ax]=A\cdot Var[x]\cdot A^T=A\Sigma A^T$.
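The theorem can be checked empirically by pushing samples of $x$ through the affine map and comparing moments. The values of $\mu$, $\Sigma$, $A$, $b$, the sample count, and the seed are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters of x ~ N(mu, Sigma) and of the affine map y = A x + b.
mu = np.array([1.0, -2.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])
A = np.array([[2.0, 0.0],
              [1.0, -1.0]])
b = np.array([0.0, 1.0])

# Moments predicted by the theorem.
mean_y = A @ mu + b
cov_y = A @ Sigma @ A.T

# Empirical moments from samples of x pushed through the map.
x = rng.multivariate_normal(mu, Sigma, size=200_000)
y = x @ A.T + b                      # each row is A x_i + b
emp_mean = y.mean(axis=0)
emp_cov = np.cov(y, rowvar=False)
```

The empirical moments should agree with `mean_y` and `cov_y` up to sampling noise.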
We now use this theorem to obtain the four quantities $p(x_a),p(x_b),p(x_a|x_b),p(x_b|x_a)$.
1. $x_a=\begin{pmatrix}\mathbb{I}_{m\times m}&\mathbb{O}_{m\times n}\end{pmatrix}\begin{pmatrix}x_a\\x_b\end{pmatrix}$; substituting into the theorem gives:
$$ \mathbb{E}[x_a]=\begin{pmatrix}\mathbb{I}&\mathbb{O}\end{pmatrix}\begin{pmatrix}\mu_a\\\mu_b\end{pmatrix}=\mu_a,\quad Var[x_a]=\begin{pmatrix}\mathbb{I}&\mathbb{O}\end{pmatrix}\begin{pmatrix}\Sigma_{aa}&\Sigma_{ab}\\\Sigma_{ba}&\Sigma_{bb}\end{pmatrix}\begin{pmatrix}\mathbb{I}\\\mathbb{O}\end{pmatrix}=\Sigma_{aa} $$
So $x_a\sim\mathcal{N}(\mu_a,\Sigma_{aa})$.
2. Similarly, $x_b\sim\mathcal{N}(\mu_b,\Sigma_{bb})$.
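3. For the two conditional distributions, we introduce three quantities:
$$ x_{b\cdot a}=x_b-\Sigma_{ba}\Sigma_{aa}^{-1}x_a,\quad\mu_{b\cdot a}=\mu_b-\Sigma_{ba}\Sigma_{aa}^{-1}\mu_a,\quad\Sigma_{bb\cdot a}=\Sigma_{bb}-\Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{ab} $$
In particular, the last expression is called the Schur complement of $\Sigma_{aa}$ in $\Sigma$. We can see that:
$$ x_{b\cdot a}=\begin{pmatrix}-\Sigma_{ba}\Sigma_{aa}^{-1}&\mathbb{I}_{n\times n}\end{pmatrix}\begin{pmatrix}x_a\\x_b\end{pmatrix} $$
So:
$$ \mathbb{E}[x_{b\cdot a}]=\begin{pmatrix}-\Sigma_{ba}\Sigma_{aa}^{-1}&\mathbb{I}_{n\times n}\end{pmatrix}\begin{pmatrix}\mu_a\\\mu_b\end{pmatrix}=\mu_{b\cdot a},\quad Var[x_{b\cdot a}]=\begin{pmatrix}-\Sigma_{ba}\Sigma_{aa}^{-1}&\mathbb{I}_{n\times n}\end{pmatrix}\begin{pmatrix}\Sigma_{aa}&\Sigma_{ab}\\\Sigma_{ba}&\Sigma_{bb}\end{pmatrix}\begin{pmatrix}-\Sigma_{aa}^{-1}\Sigma_{ab}\\\mathbb{I}_{n\times n}\end{pmatrix}=\Sigma_{bb\cdot a} $$
These three quantities give $x_b=x_{b\cdot a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_a$. Moreover, $Cov[x_{b\cdot a},x_a]=\Sigma_{ba}-\Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{aa}=0$, and since the two are jointly Gaussian, $x_{b\cdot a}$ is independent of $x_a$. Therefore:
$$ \mathbb{E}[x_b|x_a]=\mu_{b\cdot a}+\Sigma_{ba}\Sigma_{aa}^{-1}x_a,\quad Var[x_b|x_a]=\Sigma_{bb\cdot a} $$
The theorem is used here as well.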
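4. Similarly:
$$ x_{a\cdot b}=x_a-\Sigma_{ab}\Sigma_{bb}^{-1}x_b,\quad\mu_{a\cdot b}=\mu_a-\Sigma_{ab}\Sigma_{bb}^{-1}\mu_b,\quad\Sigma_{aa\cdot b}=\Sigma_{aa}-\Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba} $$
So:
$$ \mathbb{E}[x_a|x_b]=\mu_{a\cdot b}+\Sigma_{ab}\Sigma_{bb}^{-1}x_b,\quad Var[x_a|x_b]=\Sigma_{aa\cdot b} $$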
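The conditional of one block of a Gaussian vector given the other can be sketched as a short function. The function name `conditional_gaussian` and the example numbers are hypothetical; the body uses the Schur complement form of the conditional mean and covariance of $x_b$ given $x_a$.

```python
import numpy as np

def conditional_gaussian(mu, Sigma, m, x_a):
    """Mean and covariance of x_b | x_a when x = (x_a, x_b) ~ N(mu, Sigma) and x_a has m entries."""
    mu_a, mu_b = mu[:m], mu[m:]
    S_aa, S_ab = Sigma[:m, :m], Sigma[:m, m:]
    S_ba, S_bb = Sigma[m:, :m], Sigma[m:, m:]
    gain = S_ba @ np.linalg.inv(S_aa)      # Sigma_ba Sigma_aa^{-1}
    mu_b_dot_a = mu_b - gain @ mu_a        # mu_{b.a}
    Sigma_bb_dot_a = S_bb - gain @ S_ab    # Schur complement Sigma_{bb.a}
    return mu_b_dot_a + gain @ x_a, Sigma_bb_dot_a

# Hypothetical 3-dimensional example: x_a is the first entry, x_b the last two.
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
cond_mean, cond_cov = conditional_gaussian(mu, Sigma, m=1, x_a=np.array([1.0]))
```

One way to cross-check the covariance is the standard identity that it equals the inverse of the $bb$ block of the precision matrix $\Sigma^{-1}$.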
Next we use the four quantities above to solve a linear model:
Given $p(x)=\mathcal{N}(\mu,\Lambda^{-1})$ and $p(y|x)=\mathcal{N}(Ax+b,L^{-1})$, find $p(y)$ and $p(x|y)$.
Solution: let $y=Ax+b+\epsilon$ with $\epsilon\sim\mathcal{N}(0,L^{-1})$ independent of $x$. Then $\mathbb{E}[y]=\mathbb{E}[Ax+b+\epsilon]=A\mu+b$ and $Var[y]=A\Lambda^{-1}A^T+L^{-1}$, so:
$$ p(y)=\mathcal{N}(A\mu+b,L^{-1}+A\Lambda^{-1}A^T) $$
Introducing $z=\begin{pmatrix}x\\y\end{pmatrix}$, we need $Cov[x,y]=\mathbb{E}[(x-\mathbb{E}[x])(y-\mathbb{E}[y])^T]$. This covariance can be computed directly; the cross term with $\epsilon$ vanishes because $\epsilon$ has zero mean and is independent of $x$:
$$ Cov(x,y)=\mathbb{E}[(x-\mu)(Ax-A\mu+\epsilon)^T]=\mathbb{E}[(x-\mu)(x-\mu)^TA^T]=Var[x]A^T=\Lambda^{-1}A^T $$
By the symmetry of the covariance matrix, $p(z)=\mathcal{N}\left(\begin{pmatrix}\mu\\A\mu+b\end{pmatrix},\begin{pmatrix}\Lambda^{-1}&\Lambda^{-1}A^T\\A\Lambda^{-1}&L^{-1}+A\Lambda^{-1}A^T\end{pmatrix}\right)$. From the earlier formulas for the conditional distribution we obtain:
$$ \mathbb{E}[x|y]=\mu+\Lambda^{-1}A^T(L^{-1}+A\Lambda^{-1}A^T)^{-1}(y-A\mu-b) $$
$$ Var[x|y]=\Lambda^{-1}-\Lambda^{-1}A^T(L^{-1}+A\Lambda^{-1}A^T)^{-1}A\Lambda^{-1} $$
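The results for $p(y)$ and $p(x|y)$ translate directly into a few lines of linear algebra. All numerical values below (prior mean and precision, $A$, $b$, noise precision, and the observed $y$) are hypothetical.

```python
import numpy as np

# Hypothetical model: p(x) = N(mu, Lambda^{-1}), p(y | x) = N(A x + b, L^{-1}).
mu = np.array([0.5, -1.0])
Lambda = np.array([[2.0, 0.4],
                   [0.4, 1.0]])          # prior precision of x
A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, -1.0]])
b = np.array([0.1, 0.0, -0.2])
L = np.diag([4.0, 2.0, 1.0])             # noise precision of y given x
y = np.array([1.0, 0.0, 2.0])            # an observed value of y

Lambda_inv = np.linalg.inv(Lambda)
L_inv = np.linalg.inv(L)

# Marginal p(y) = N(A mu + b, L^{-1} + A Lambda^{-1} A^T).
mean_y = A @ mu + b
cov_y = L_inv + A @ Lambda_inv @ A.T

# Posterior p(x | y) from the conditional-Gaussian formulas.
K = Lambda_inv @ A.T @ np.linalg.inv(cov_y)
mean_x_given_y = mu + K @ (y - mean_y)
cov_x_given_y = Lambda_inv - K @ A @ Lambda_inv
```

A useful cross-check is the equivalent precision form, in which the posterior covariance is $(\Lambda+A^TLA)^{-1}$ and the posterior mean is that matrix applied to $A^TL(y-b)+\Lambda\mu$.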
