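Generalizing the one-dimensional Gaussian distribution to several variables gives a Gaussian network, and generalizing from finitely many variables to infinitely many gives a Gaussian process. A Gaussian process is a stochastic process made up of infinitely many random variables defined on a continuous domain (time or space).
The random variable at every point on the time axis follows a Gaussian distribution, and the collection of values taken at all of these points is called one sample of the Gaussian process.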
For a family of random variables $\{\xi_t\}_{t\in T}$ indexed by the time axis, if for every $n$ and every choice of time points $t_1,\cdots,t_n\in T$ we have $\xi_{t_1-t_n}\sim \mathcal{N}(\mu_{t_1-t_n},\Sigma_{t_1-t_n})$, where $\xi_{t_1-t_n}$ denotes the vector $(\xi_{t_1},\cdots,\xi_{t_n})^T$, then $\{\xi_t\}_{t\in T}$ is a Gaussian process.
A Gaussian process is determined by two parameters (the existence theorem for Gaussian processes): the mean function $m(t)=\mathbb{E}[\xi_t]$ and the covariance function $k(s,t)=\mathbb{E}[(\xi_s-\mathbb{E}[\xi_s])(\xi_t-\mathbb{E}[\xi_t])]$.
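As a small illustration of how the mean function and the covariance function fix the distribution at any finite set of time points, the sketch below draws a few samples of $(\xi_{t_1},\cdots,\xi_{t_n})$. The zero mean function, the squared-exponential covariance, the length scale, and the jitter term are all assumptions made for this example and do not come from the text.

```python
import numpy as np

def mean_fn(t):
    # Hypothetical mean function m(t) = 0 (an assumption for this sketch).
    return np.zeros_like(t, dtype=float)

def cov_fn(s, t, length=1.0):
    # Hypothetical covariance function k(s, t): squared-exponential kernel.
    return np.exp(-0.5 * (s[:, None] - t[None, :]) ** 2 / length ** 2)

def sample_gp(t, n_samples=3, seed=0):
    """Draw samples of (xi_{t_1}, ..., xi_{t_n}) ~ N(m(t), k(t, t))."""
    rng = np.random.default_rng(seed)
    m = mean_fn(t)
    K = cov_fn(t, t)
    # A small jitter keeps the Cholesky factorisation numerically stable.
    L = np.linalg.cholesky(K + 1e-6 * np.eye(len(t)))
    z = rng.standard_normal((len(t), n_samples))
    return m[:, None] + L @ z

t = np.linspace(0.0, 5.0, 50)   # a finite set of time points
samples = sample_gp(t)          # each column is one sample path on the grid
print(samples.shape)            # (50, 3)
```

Each column is one realisation of the process restricted to the grid `t`. A finer grid gives a closer picture of a full sample path.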
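Bayesian linear regression combined with the kernel trick is called Gaussian process regression. It can be viewed from two perspectives:
- The weight-space view: kernel Bayesian linear regression. Here the input $x$ plays the role of the index $t$, and the Gaussian distribution at each index comes from the distribution over the weights. By the derivation above, the predicted function value is still Gaussian.
- The function-space view: the Gaussian distribution is expressed directly through the function $f(x)$.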
Kernel Bayesian linear regression
Bayesian linear regression can handle nonlinear functions by introducing a kernel: the function $f(x)=x^Tw$ becomes $f(x)=\phi(x)^Tw$ (in which case the weight vector $w$ and its prior covariance $\Sigma_p$ must also move to the higher dimension). After mapping to the higher-dimensional space we have:
$$
\begin{aligned}
f(x^*)&\sim \mathcal{N}\left(\phi(x^*)^{T}\sigma^{-2}A^{-1}\Phi^TY,\ \phi(x^*)^{T}A^{-1}\phi(x^*)\right)\\
A&=\sigma^{-2}\Phi^T\Phi+\Sigma_p^{-1}
\end{aligned}
$$
where $\Phi=(\phi(x_1),\phi(x_2),\cdots,\phi(x_N))^T$.
To compute $A^{-1}$, we can use the Woodbury formula:
$$
(A+UCV)^{-1}=A^{-1}-A^{-1}U(C^{-1}+VA^{-1}U)^{-1}VA^{-1}
$$
Substituting $\Sigma_p^{-1}$ for the matrix called $A$ in the formula, together with $U=\Phi^T$, $C=\sigma^{-2}\mathbb{I}$ and $V=\Phi$, gives
$$
A^{-1}=\Sigma_p-\Sigma_p\Phi^T(\sigma^2\mathbb{I}+\Phi\Sigma_p\Phi^T)^{-1}\Phi\Sigma_p
$$
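The Woodbury expression for $A^{-1}$ can be checked numerically. The sketch below compares it with a direct inverse on random matrices. The sizes, the noise level, and the way $\Sigma_p$ is generated are arbitrary assumptions chosen only for the check.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, sigma = 8, 3, 0.5            # hypothetical sizes and noise level

Phi = rng.standard_normal((N, d))  # row i plays the role of phi(x_i)^T
B = rng.standard_normal((d, d))
Sigma_p = B @ B.T + np.eye(d)      # a symmetric positive definite prior covariance

# Direct inverse of A = sigma^{-2} Phi^T Phi + Sigma_p^{-1}  (a d x d matrix).
A = Phi.T @ Phi / sigma**2 + np.linalg.inv(Sigma_p)
A_inv_direct = np.linalg.inv(A)

# Woodbury form: only the N x N matrix sigma^2 I + Phi Sigma_p Phi^T is inverted.
K = Phi @ Sigma_p @ Phi.T
A_inv_woodbury = Sigma_p - Sigma_p @ Phi.T @ np.linalg.solve(
    sigma**2 * np.eye(N) + K, Phi @ Sigma_p
)

print(np.allclose(A_inv_direct, A_inv_woodbury))  # True
```

The point of the rewrite is that the feature dimension `d` only enters through products of the form $\Phi\Sigma_p\Phi^T$, which is what later allows the feature map to be replaced by a kernel.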
Another approach also works:
$$
\begin{aligned}
A&=\sigma^{-2}\Phi^T\Phi+\Sigma_p^{-1}\\
\Leftrightarrow A\Sigma_p&=\sigma^{-2}\Phi^T\Phi\Sigma_p+\mathbb{I}\\
\Leftrightarrow A\Sigma_p\Phi^T&=\sigma^{-2}\Phi^T\Phi\Sigma_p\Phi^T+\Phi^T=\sigma^{-2}\Phi^T(k+\sigma^2\mathbb{I})\\
\Leftrightarrow \Sigma_p\Phi^T&=\sigma^{-2}A^{-1}\Phi^T(k+\sigma^2\mathbb{I})\\
\Leftrightarrow \sigma^{-2}A^{-1}\Phi^T&=\Sigma_p\Phi^T(k+\sigma^2\mathbb{I})^{-1}\\
\Leftrightarrow \phi(x^*)^T\sigma^{-2}A^{-1}\Phi^T&=\phi(x^*)^T\Sigma_p\Phi^T(k+\sigma^2\mathbb{I})^{-1}
\end{aligned}
$$
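Multiplied on the right by $Y$, the left-hand side of the last line is the predictive mean after the feature transformation, while the right-hand side no longer contains $A^{-1}$. Here $k=\Phi\Sigma_p\Phi^T$.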
Using the expression for $A^{-1}$, the variance becomes:
$$
\phi(x^*)^T\Sigma_p\phi(x^*)-\phi(x^*)^T\Sigma_p\Phi^T(\sigma^2\mathbb{I}+k)^{-1}\Phi\Sigma_p\phi(x^*)
$$
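To make the equivalence concrete, the following sketch evaluates the predictive mean and variance twice: once in the weight-space form that uses $A^{-1}$, and once in the form that only uses $k=\Phi\Sigma_p\Phi^T$. The random features, targets, and the diagonal $\Sigma_p$ are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, sigma = 10, 4, 0.3                 # hypothetical sizes and noise level
Phi = rng.standard_normal((N, d))        # row i plays the role of phi(x_i)^T
phi_star = rng.standard_normal(d)        # phi(x*) for one test input
Y = rng.standard_normal(N)               # synthetic targets
Sigma_p = np.diag(rng.uniform(0.5, 2.0, size=d))

# Weight-space form: mean = phi*^T sigma^{-2} A^{-1} Phi^T Y, var = phi*^T A^{-1} phi*.
A = Phi.T @ Phi / sigma**2 + np.linalg.inv(Sigma_p)
mean_w = phi_star @ np.linalg.solve(A, Phi.T @ Y) / sigma**2
var_w = phi_star @ np.linalg.solve(A, phi_star)

# Kernel form: only products phi^T Sigma_p phi appear.
k = Phi @ Sigma_p @ Phi.T                # N x N matrix Phi Sigma_p Phi^T
k_star = Phi @ Sigma_p @ phi_star        # vector Phi Sigma_p phi(x*)
G = k + sigma**2 * np.eye(N)
mean_k = k_star @ np.linalg.solve(G, Y)
var_k = phi_star @ Sigma_p @ phi_star - k_star @ np.linalg.solve(G, k_star)

print(np.isclose(mean_w, mean_k), np.isclose(var_w, var_k))  # True True
```

Both forms agree up to floating-point error. The kernel form solves an $N\times N$ system, so its cost depends on the number of data points rather than on the feature dimension.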
With $k=\Phi\Sigma_p\Phi^T$ as defined above, the mean and the variance contain the following four terms:
$$
\phi(x^*)^T\Sigma_p\Phi^T,\quad \phi(x^*)^T\Sigma_p\phi(x^*),\quad \Phi\Sigma_p\Phi^T,\quad \Phi\Sigma_p\phi(x^*)
$$
Expanding these, we see that they are all built from the same quantity: $k(x,x')=\phi(x)^T\Sigma_p\phi(x')$. Because $\Sigma_p$ is a symmetric positive definite covariance matrix, it can be written as $\Sigma_p=\Sigma_p^{1/2}\Sigma_p^{1/2}$, so $k(x,x')$ is the inner product of $\Sigma_p^{1/2}\phi(x)$ and $\Sigma_p^{1/2}\phi(x')$ and is therefore a kernel function.
For the covariance of the Gaussian process, with the zero-mean prior $w\sim\mathcal{N}(0,\Sigma_p)$:
$$
k(x,x')=Cov[f(x),f(x')]=\mathbb{E}[\phi(x)^Tww^T\phi(x')]=\phi(x)^T\mathbb{E}[ww^T]\phi(x')=\phi(x)^T\Sigma_p\phi(x')
$$
This is exactly the kernel function above. Hence the collection $\{f(x)\}$ is a Gaussian process.
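The identity $Cov[f(x),f(x')]=\phi(x)^T\Sigma_p\phi(x')$ can be illustrated with a Monte Carlo estimate. In the sketch below, the polynomial feature map, the diagonal $\Sigma_p$, the two inputs, and the sample size are all hypothetical choices made for the example.

```python
import numpy as np

def phi(x):
    # Hypothetical feature map: polynomial features (1, x, x^2).
    return np.array([1.0, x, x**2])

Sigma_p = np.diag([1.0, 0.5, 0.25])   # hypothetical prior covariance of w

def kernel(x, x_prime):
    # k(x, x') = phi(x)^T Sigma_p phi(x')
    return phi(x) @ Sigma_p @ phi(x_prime)

rng = np.random.default_rng(0)
# Each row is one draw of w from the zero-mean prior N(0, Sigma_p).
W = rng.multivariate_normal(np.zeros(3), Sigma_p, size=200_000)

x, x_prime = 0.5, -1.0
f_x = W @ phi(x)                      # f(x) = phi(x)^T w for every draw
f_xp = W @ phi(x_prime)
# E[f(x) f(x')] equals the covariance because f has zero mean.
empirical = np.mean(f_x * f_xp)

print(empirical, kernel(x, x_prime))  # the two values should be close
```

The empirical average only approximates the kernel value, and the gap shrinks as the number of weight draws grows.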
The function-space view
Instead of the weight space, we can work directly in the space of functions $f$. For the prediction task, this amounts to computing something like:
$$
p(y^*|X,Y,x^*)=\int_fp(y^*|f,X,Y,x^*)p(f|X,Y,x^*)df
$$
For the data set, take $f(X)\sim\mathcal{N}(\mu(X),k(X,X))$ and $Y=f(X)+\varepsilon\sim\mathcal{N}(\mu(X),k(X,X)+\sigma^2\mathbb{I})$, where the noise is $\varepsilon\sim\mathcal{N}(0,\sigma^2\mathbb{I})$. The goal of prediction is, given a new sequence of inputs $X^*=(x_1^*,\cdots,x_M^*)^T$, to obtain $Y^*=f(X^*)+\varepsilon$. We can write:
$$
\begin{pmatrix}Y\\f(X^*)\end{pmatrix}\sim\mathcal{N}\left(\begin{pmatrix}\mu(X)\\\mu(X^*)\end{pmatrix},\begin{pmatrix}k(X,X)+\sigma^2\mathbb{I}&k(X,X^*)\\k(X^*,X)&k(X^*,X^*)\end{pmatrix}\right)
$$
Using the conditioning formula for a joint Gaussian distribution:
$$
\begin{aligned}
x=\begin{pmatrix}x_a\\x_b\end{pmatrix}&\sim\mathcal{N}\left(\begin{pmatrix}\mu_a\\\mu_b\end{pmatrix},\begin{pmatrix}\Sigma_{aa}&\Sigma_{ab}\\\Sigma_{ba}&\Sigma_{bb}\end{pmatrix}\right)\\
x_b|x_a&\sim\mathcal{N}(\mu_{b|a},\Sigma_{b|a})\\
\mu_{b|a}&=\Sigma_{ba}\Sigma_{aa}^{-1}(x_a-\mu_a)+\mu_b\\
\Sigma_{b|a}&=\Sigma_{bb}-\Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{ab}
\end{aligned}
$$
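The conditioning formulas translate almost line by line into code. The helper below is a minimal sketch. The function name `condition_gaussian` and the two-variable example are assumptions introduced only for illustration.

```python
import numpy as np

def condition_gaussian(mu_a, mu_b, S_aa, S_ab, S_bb, x_a):
    """Return (mu_{b|a}, Sigma_{b|a}) for a joint Gaussian over (x_a, x_b).

    Hypothetical helper: S_ab is the a-by-b block, so Sigma_ba = S_ab.T.
    """
    S_ba = S_ab.T
    # mu_{b|a} = Sigma_ba Sigma_aa^{-1} (x_a - mu_a) + mu_b
    mu_cond = S_ba @ np.linalg.solve(S_aa, x_a - mu_a) + mu_b
    # Sigma_{b|a} = Sigma_bb - Sigma_ba Sigma_aa^{-1} Sigma_ab
    S_cond = S_bb - S_ba @ np.linalg.solve(S_aa, S_ab)
    return mu_cond, S_cond

# Two unit-variance variables with covariance 0.5, observing x_a = 1.
mu_cond, S_cond = condition_gaussian(
    mu_a=np.array([0.0]), mu_b=np.array([0.0]),
    S_aa=np.array([[1.0]]), S_ab=np.array([[0.5]]), S_bb=np.array([[1.0]]),
    x_a=np.array([1.0]),
)
print(mu_cond, S_cond)  # conditional mean 0.5, conditional variance 0.75
```

Observing $x_a$ pulls the mean of $x_b$ toward the observation and reduces its variance, which is the same mechanism that Gaussian process prediction relies on.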
we can write down directly:
$$
\begin{aligned}
p(f(X^*)|X,Y,X^*)&=p(f(X^*)|Y)\\
&=\mathcal{N}\big(k(X^*,X)[k(X,X)+\sigma^2\mathbb{I}]^{-1}(Y-\mu(X))+\mu(X^*),\\
&\qquad\ \ k(X^*,X^*)-k(X^*,X)[k(X,X)+\sigma^2\mathbb{I}]^{-1}k(X,X^*)\big)
\end{aligned}
$$
So for $Y^*=f(X^*)+\varepsilon$:
$$
\begin{aligned}
Y^*|X,Y,X^*\sim\mathcal{N}\big(&k(X^*,X)[k(X,X)+\sigma^2\mathbb{I}]^{-1}(Y-\mu(X))+\mu(X^*),\\
&k(X^*,X^*)-k(X^*,X)[k(X,X)+\sigma^2\mathbb{I}]^{-1}k(X,X^*)+\sigma^2\mathbb{I}\big)
\end{aligned}
$$
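The predictive distribution can be computed in a few lines. The sketch below assumes a zero mean function $\mu=0$, a squared-exponential kernel, one-dimensional inputs, and a small synthetic data set. None of these choices come from the text, and `gp_predict` is a hypothetical helper name.

```python
import numpy as np

def rbf(Xa, Xb, length=1.0):
    # Hypothetical kernel choice: squared-exponential on 1-D inputs.
    d2 = (Xa[:, None] - Xb[None, :]) ** 2
    return np.exp(-0.5 * d2 / length**2)

def gp_predict(X, Y, X_star, sigma=0.1, kernel=rbf):
    """Predictive mean and covariances, assuming a zero mean function."""
    K = kernel(X, X) + sigma**2 * np.eye(len(X))     # k(X, X) + sigma^2 I
    K_s = kernel(X_star, X)                          # k(X*, X)
    K_ss = kernel(X_star, X_star)                    # k(X*, X*)
    mean = K_s @ np.linalg.solve(K, Y)
    cov_f = K_ss - K_s @ np.linalg.solve(K, K_s.T)   # covariance of f(X*)
    cov_y = cov_f + sigma**2 * np.eye(len(X_star))   # covariance of Y* (adds noise)
    return mean, cov_f, cov_y

X = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])   # synthetic training inputs
Y = np.sin(X)                                # synthetic training targets
X_star = np.array([-1.5, 0.5])               # new inputs

mean, cov_f, cov_y = gp_predict(X, Y, X_star)
print(mean)
print(np.diag(cov_f), np.diag(cov_y))
```

Using `np.linalg.solve` instead of forming the inverse explicitly is a common numerical choice. A Cholesky factorisation of $k(X,X)+\sigma^2\mathbb{I}$ would be a reasonable alternative when the same training set is reused for many predictions.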
As we can see, the function-space view is simpler and easier to solve.
