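Recall that for linear regression with Gaussian noise, minimizing the least-squares loss is equivalent to applying MLE to the probabilistic model. Once a prior on the parameters is introduced, MAP estimation with a Gaussian prior is equivalent to ridge regression (L2 regularization), and MAP estimation with a Laplace prior is equivalent to Lasso (L1 regularization). Both are point-estimation methods. Here we instead use the Bayesian approach to obtain the full posterior distribution of the parameters.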
The model assumptions of linear regression are:

$$
\begin{aligned}
f(x)&=w^Tx\\
y&=f(x)+\varepsilon\\
\varepsilon&\sim\mathcal{N}(0,\sigma^2)
\end{aligned}
$$
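To make the model assumption concrete, here is a minimal sketch that draws synthetic data from $y=w^Tx+\varepsilon$. The helper name `sample_linear_data`, the choice of standard-normal inputs, and the ground-truth weights are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def sample_linear_data(w, n_samples, sigma, rng):
    """Draw (X, Y) from y = w^T x + eps with eps ~ N(0, sigma^2)."""
    p = w.shape[0]
    X = rng.normal(size=(n_samples, p))            # rows are inputs x_i (assumed standard normal)
    eps = rng.normal(scale=sigma, size=n_samples)  # i.i.d. Gaussian noise
    Y = X @ w + eps
    return X, Y

rng = np.random.default_rng(0)
w_true = np.array([1.5, -0.7])  # hypothetical ground-truth weights
X, Y = sample_linear_data(w_true, n_samples=200, sigma=0.3, rng=rng)
print(X.shape, Y.shape)
```

Here `X` stacks the inputs as rows and `Y` is the vector of outputs, which matches the matrix notation $Y-Xw$ used in the derivation.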
The Bayesian approach has to solve two problems: inference and prediction.
Inference
Introduce a Gaussian prior:

$$
p(w)=\mathcal{N}(0,\Sigma_p)
$$
Infer the posterior distribution of the parameters:

$$
p(w|X,Y)=\frac{p(w,Y|X)}{p(Y|X)}=\frac{p(Y|w,X)p(w|X)}{\int p(Y|w,X)p(w|X)dw}
$$
The denominator does not depend on the parameters. Since $p(w|X)=p(w)$, substituting the prior gives:

$$
p(w|X,Y)\propto \prod\limits_{i=1}^N\mathcal{N}(y_i|w^Tx_i,\sigma^2)\cdot\mathcal{N}(0,\Sigma_p)
$$
A Gaussian prior is conjugate to a Gaussian likelihood, so the posterior is also Gaussian. The first factor (the likelihood) is:

$$
\begin{aligned}
\prod\limits_{i=1}^N\mathcal{N}(y_i|w^Tx_i,\sigma^2)&=\frac{1}{(2\pi)^{N/2}\sigma^N}\exp\left(-\frac{1}{2\sigma^2}\sum\limits_{i=1}^N(y_i-w^Tx_i)^2\right)\\
&=\frac{1}{(2\pi)^{N/2}\sigma^N}\exp\left(-\frac{1}{2}(Y-Xw)^T(\sigma^{-2}\mathbb{I})(Y-Xw)\right)\\
&=\mathcal{N}(Xw,\sigma^2\mathbb{I})
\end{aligned}
$$
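The identity between the product of $N$ scalar Gaussians and a single multivariate Gaussian $\mathcal{N}(Xw,\sigma^2\mathbb{I})$ can be checked numerically. The sketch below compares the two log-densities on random data; the function names and the test data are assumptions made for illustration.

```python
import numpy as np

def log_lik_product(w, X, Y, sigma):
    """Sum of log N(y_i | w^T x_i, sigma^2) over all samples."""
    r = Y - X @ w
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2) - r**2 / (2 * sigma**2))

def log_lik_joint(w, X, Y, sigma):
    """log N(Y | Xw, sigma^2 I), written with an explicit covariance matrix."""
    n = Y.shape[0]
    cov = sigma**2 * np.eye(n)
    r = Y - X @ w
    _, logdet = np.linalg.slogdet(cov)   # log-determinant of the covariance
    quad = r @ np.linalg.solve(cov, r)   # r^T cov^{-1} r
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
w = np.array([0.2, -0.4, 1.0])  # hypothetical weights
sigma = 0.5
Y = X @ w + rng.normal(scale=sigma, size=20)

ll_product = log_lik_product(w, X, Y, sigma)
ll_joint = log_lik_joint(w, X, Y, sigma)
print(ll_product, ll_joint)  # the two values should agree up to floating-point error
```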
Substituting this into the expression above, and noting that $(Y-Xw)^T(\sigma^{-2}\mathbb{I})(Y-Xw)=\sigma^{-2}(Y-Xw)^T(Y-Xw)$:

$$
p(w|X,Y)\propto\exp\left(-\frac{1}{2\sigma^2}(Y-Xw)^T(Y-Xw)-\frac{1}{2}w^T\Sigma_p^{-1}w\right)
$$
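Suppose the resulting Gaussian is $\mathcal{N}(\mu_w,\Sigma_w)$. We obtain its parameters by completing the square, that is, by matching the exponent against $-\frac{1}{2}(w-\mu_w)^T\Sigma_w^{-1}(w-\mu_w)=-\frac{1}{2}w^T\Sigma_w^{-1}w+\mu_w^T\Sigma_w^{-1}w+\text{const}$. The quadratic term in the exponent is:

$$
-\frac{1}{2\sigma^2}w^TX^TXw-\frac{1}{2}w^T\Sigma_p^{-1}w
$$

Therefore:

$$
\Sigma_w^{-1}=\sigma^{-2}X^TX+\Sigma_p^{-1}
$$

The linear term is:

$$
\frac{1}{2\sigma^2}\cdot 2Y^TXw=\sigma^{-2}Y^TXw
$$

Therefore:

$$
\mu_w^T\Sigma_w^{-1}=\sigma^{-2}Y^TX\;\Rightarrow\;\mu_w=\sigma^{-2}\Sigma_wX^TY
$$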
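As a sanity check on the completed square, the sketch below computes the posterior under the assumption that it yields $\Sigma_w^{-1}=\sigma^{-2}X^TX+\Sigma_p^{-1}$ and $\mu_w=\sigma^{-2}\Sigma_wX^TY$. It then compares $\mu_w$ with the ridge solution for the special case $\Sigma_p=(\sigma^2/\lambda)\mathbb{I}$, which is the MAP and ridge correspondence mentioned at the start. The function name `posterior`, the synthetic data, and the value of `lam` are illustrative assumptions.

```python
import numpy as np

def posterior(X, Y, sigma, Sigma_p):
    """Return (mu_w, Sigma_w) of p(w | X, Y) under the prior N(0, Sigma_p)."""
    A = X.T @ X / sigma**2 + np.linalg.inv(Sigma_p)  # posterior precision Sigma_w^{-1}
    Sigma_w = np.linalg.inv(A)
    mu_w = Sigma_w @ X.T @ Y / sigma**2
    return mu_w, Sigma_w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([0.5, -1.0, 2.0])  # hypothetical ground-truth weights
sigma = 0.4
Y = X @ w_true + rng.normal(scale=sigma, size=50)

lam = 2.0  # ridge penalty, assumed for the comparison
mu_w, Sigma_w = posterior(X, Y, sigma, (sigma**2 / lam) * np.eye(3))

# Ridge regression closed form: (X^T X + lam I)^{-1} X^T Y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ Y)
print(np.allclose(mu_w, w_ridge))
```

Unlike the ridge point estimate, the posterior also carries `Sigma_w`, which quantifies the remaining uncertainty about $w$.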
Prediction
Given a new input $x^*$, we want the distribution of the corresponding output $y^*$. Since $f(x^*)=x^{*T}w$, substituting the parameter posterior gives $x^{*T}w\sim \mathcal{N}(x^{*T}\mu_w,x^{*T}\Sigma_wx^*)$. Adding the noise term:

$$
\begin{aligned}
p(y^*|X,Y,x^*)&=\int_wp(y^*|w,X,Y,x^*)p(w|X,Y,x^*)dw=\int_wp(y^*|w,x^*)p(w|X,Y)dw\\
&=\mathcal{N}(x^{*T}\mu_w,x^{*T}\Sigma_wx^*+\sigma^2)
\end{aligned}
$$
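The predictive distribution can be sketched in code and cross-checked by Monte Carlo: draw $w$ from the posterior, add noise, and compare the sample moments with the closed form $\mathcal{N}(x^{*T}\mu_w,\,x^{*T}\Sigma_wx^*+\sigma^2)$. The posterior formulas used here ($\Sigma_w^{-1}=\sigma^{-2}X^TX+\Sigma_p^{-1}$, $\mu_w=\sigma^{-2}\Sigma_wX^TY$) are the standard result of completing the square; the function names, data, identity prior, and query point are assumptions for illustration.

```python
import numpy as np

def posterior(X, Y, sigma, Sigma_p):
    """Posterior N(mu_w, Sigma_w) of w under the prior N(0, Sigma_p)."""
    A = X.T @ X / sigma**2 + np.linalg.inv(Sigma_p)
    Sigma_w = np.linalg.inv(A)
    return Sigma_w @ X.T @ Y / sigma**2, Sigma_w

def predictive(x_star, mu_w, Sigma_w, sigma):
    """Mean and variance of p(y* | X, Y, x*)."""
    mean = x_star @ mu_w
    var = x_star @ Sigma_w @ x_star + sigma**2  # parameter uncertainty + noise
    return mean, var

rng = np.random.default_rng(0)
sigma = 0.3
X = rng.normal(size=(30, 2))
Y = X @ np.array([1.0, -2.0]) + rng.normal(scale=sigma, size=30)  # hypothetical data

mu_w, Sigma_w = posterior(X, Y, sigma, np.eye(2))  # identity prior covariance (assumed)
x_star = np.array([0.5, 1.5])                      # hypothetical query point
mean, var = predictive(x_star, mu_w, Sigma_w, sigma)

# Monte Carlo check: sample w from the posterior, then add observation noise.
n_draws = 200_000
w_draws = rng.multivariate_normal(mu_w, Sigma_w, size=n_draws)
y_draws = w_draws @ x_star + rng.normal(scale=sigma, size=n_draws)
print(mean, var)
print(y_draws.mean(), y_draws.var())
```

The predictive variance is always at least $\sigma^2$: even with a perfectly known $w$, the observation noise remains.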
