Background

Linear regression is the model we use most often to analyze relationships in data. Given a data set $(\boldsymbol{x}_i, y_i)$, $i=1,2,\cdots,n$, we can use least squares to find a suitable parameter vector $\boldsymbol{\beta}$ so that $\boldsymbol{x}_i^T\boldsymbol{\beta}$ approximates $y_i$.

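Linear regression is not a cure-all, however. We consider a linear regression fit good only when the errors $\varepsilon_i$ are normally distributed with mean 0, which is the standard assumption of the general linear model: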

$$y_i=\boldsymbol{x}_i^T\boldsymbol{\beta}+\varepsilon_i,\quad \varepsilon_i\sim N\left(0,\sigma^2\right),\quad i=1,2,\cdots,n$$

For the whole sample, this reads:

$$\boldsymbol{y}=\boldsymbol{X}\boldsymbol{\beta}+\boldsymbol{e},\quad \boldsymbol{e}\sim N\left(0,\sigma^2\boldsymbol{I}\right),\quad \text{where } \boldsymbol{X}=\begin{bmatrix}\boldsymbol{x}_1^T\\ \vdots\\ \boldsymbol{x}_n^T\end{bmatrix}$$

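In real problems, however, the errors may well not be normally distributed (because of outliers, heteroscedasticity, a nonlinear relationship, and so on), and an ordinary linear regression model then fits the data poorly.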

Details

To improve the fit, Tukey first proposed a new approach in 1957: introduce a parameter $\lambda$ into the data transformation, together with a family of functions $\mathrm{BC}(y_i,\lambda)$, such that after the transformation $y_i^{(\lambda)}=\mathrm{BC}(y_i,\lambda)$ the data satisfy the basic assumptions of the model, i.e.:

$$y_i^{(\lambda)}=\boldsymbol{x}_i^T\boldsymbol{\beta}+\varepsilon_i,\quad \varepsilon_i\sim N\left(0,\sigma^2\right),\quad i=1,2,\cdots,n$$

The transformation parameter $\lambda$ is estimated from the data. The power family Tukey proposed in 1957 is:

$$\mathrm{BC}(y,\lambda)=\begin{cases} y^{\lambda}, & \text{if } \lambda\ne 0\\ \ln y, & \text{if } \lambda=0\end{cases}$$

This function, however, is discontinuous at $\lambda=0$. As an improvement, Box and Cox proposed the "Box-Cox power transformation family" in 1964, defined for $y>0$:

$$\mathrm{BC}(y,\lambda)=\begin{cases}\dfrac{y^{\lambda}-1}{\lambda}, & \text{if } \lambda\ne 0\\ \ln y, & \text{if } \lambda=0\end{cases}$$

This makes the function continuous at $\lambda=0$ as well:

$$\frac{y^{\lambda}-1}{\lambda}=\frac{e^{\lambda\ln y}-1}{\lambda}=\frac{\left(1+\lambda\ln y+\frac{1}{2}\lambda^2\left(\ln y\right)^2+\cdots\right)-1}{\lambda}\rightarrow\ln y,\quad\left(\lambda\rightarrow 0\right)$$
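A minimal MATLAB sketch of the two families, assuming a scalar $\lambda$ and $y>0$. The anonymous functions `tukey` and `bc` are illustrative names, not from the original; logical masks pick the $\lambda=0$ branch because anonymous functions cannot contain an `if`. The script checks numerically that the Box-Cox form tends to $\ln y$ as $\lambda\to 0$, while Tukey's $y^\lambda$ tends to 1 instead.

```matlab
% Tukey (1957): y^lambda for lambda ~= 0, log(y) for lambda == 0.
% Logical masks select the branch (anonymous functions cannot use if);
% lambda is assumed to be a scalar and y > 0.
tukey = @(y, lam) (lam ~= 0) .* y.^lam + (lam == 0) .* log(y);

% Box-Cox (1964): (y^lambda - 1)/lambda for lambda ~= 0, log(y) for lambda == 0.
% Adding (lam == 0) to the denominator avoids 0/0 when lambda is exactly 0.
bc = @(y, lam) (lam ~= 0) .* (y.^lam - 1) ./ (lam + (lam == 0)) + (lam == 0) .* log(y);

y = 2.5;
smallLams = [1e-1, 1e-3, 1e-6];                      % lambda approaching 0
tukeyVals = arrayfun(@(l) tukey(y, l), smallLams);   % approaches 1
bcVals    = arrayfun(@(l) bc(y, l), smallLams);      % approaches log(y)

disp([smallLams; tukeyVals; bcVals])
fprintf('log(y) = %.6f, bc(y, 0) = %.6f\n', log(y), bc(y, 0));
```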

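This transformation is widely accepted by both theorists and practitioners and is currently the most commonly used type of data transformation.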


Estimating $\lambda$

For the dependent variable $\boldsymbol{y}$, we assume that the transformed vector $\boldsymbol{y}^{(\lambda)}=\mathrm{BC}(\boldsymbol{y},\lambda)$ (applied elementwise) follows the normal linear model:

$$\boldsymbol{y}^{(\lambda)}=\boldsymbol{X}\boldsymbol{\beta}+\boldsymbol{e},\quad \boldsymbol{e}\sim N\left(0,\sigma^2\boldsymbol{I}\right)$$

Hence $\boldsymbol{y}^{(\lambda)}$ is itself normally distributed:

$$\boldsymbol{y}^{(\lambda)}\sim N\left(\boldsymbol{X}\boldsymbol{\beta},\sigma^2\boldsymbol{I}\right)$$

First fix $\lambda$. The likelihood function of the observations $\boldsymbol{y}$ is then:

$$L\left(\boldsymbol{\beta},\sigma^2\right)=\frac{1}{\left(\sqrt{2\pi}\sigma\right)^{n}}\exp\left\{-\frac{1}{2\sigma^{2}}\left(\boldsymbol{y}^{(\lambda)}-\boldsymbol{X}\boldsymbol{\beta}\right)^{T}\left(\boldsymbol{y}^{(\lambda)}-\boldsymbol{X}\boldsymbol{\beta}\right)\right\}J$$

where $J$ is the Jacobian determinant of the transformation:

$$J=\prod_{i=1}^{n}\left|\frac{\mathrm{d}y_i^{(\lambda)}}{\mathrm{d}y_i}\right|=\prod_{i=1}^{n}y_i^{\lambda-1}$$
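To make the Jacobian concrete, this hedged MATLAB sketch compares a central finite difference of the Box-Cox transform with the analytic derivative $y^{\lambda-1}$, then checks that $\ln J=(\lambda-1)\sum_i\ln y_i$. The data vector, `lam`, and the step size `h` are arbitrary illustrative choices; `bc` is a hypothetical helper implementing the Box-Cox family for a scalar $\lambda$.

```matlab
% Box-Cox transform for scalar lam and positive y (masks select the branch).
bc = @(y, lam) (lam ~= 0) .* (y.^lam - 1) ./ (lam + (lam == 0)) + (lam == 0) .* log(y);

y   = [0.5; 1.2; 3.0; 7.5];   % arbitrary positive data
lam = 0.3;
h   = 1e-6;                   % finite-difference step

% Central difference approximation of d y^(lambda) / d y for each observation.
dnum   = (bc(y + h, lam) - bc(y - h, lam)) / (2 * h);
dexact = y.^(lam - 1);        % analytic derivative of (y^lambda - 1)/lambda

% Log of the Jacobian determinant, computed two ways.
logJ_product = log(prod(abs(dexact)));
logJ_sum     = (lam - 1) * sum(log(y));

disp([dnum, dexact])
fprintf('ln J: %.10f vs %.10f\n', logJ_product, logJ_sum);
```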

For convenience, we work with the log-likelihood instead:

$$\mathcal{L}\left(\boldsymbol{\beta},\sigma^2\right)=\ln L=-\frac{n}{2}\left(\ln\left(2\pi\right)+\ln\sigma^2\right)-\frac{1}{2\sigma^2}\left(\boldsymbol{y}^{(\lambda)}-\boldsymbol{X}\boldsymbol{\beta}\right)^T\left(\boldsymbol{y}^{(\lambda)}-\boldsymbol{X}\boldsymbol{\beta}\right)+\ln J$$

We want the maximum likelihood estimates of $\boldsymbol{\beta}$ and $\sigma^2$, so the log-likelihood must be at its maximum: the partial derivatives of $\mathcal{L}(\boldsymbol{\beta},\sigma^2)$ with respect to $\boldsymbol{\beta}$ and $\sigma^2$ must both be zero.

First, the partial derivative with respect to $\boldsymbol{\beta}$:

$$\frac{\partial\mathcal{L}\left(\boldsymbol{\beta},\sigma^2\right)}{\partial\boldsymbol{\beta}}=\frac{1}{\sigma^2}\cdot\boldsymbol{X}^T\left(\boldsymbol{y}^{(\lambda)}-\boldsymbol{X}\boldsymbol{\beta}\right)=0$$

which gives the maximum likelihood estimate of $\boldsymbol{\beta}$:

$$\hat{\boldsymbol{\beta}}(\lambda)=\left(\boldsymbol{X}^{T}\boldsymbol{X}\right)^{-1}\boldsymbol{X}^{T}\boldsymbol{y}^{(\lambda)}$$
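For fixed $\lambda$, $\hat{\boldsymbol\beta}(\lambda)$ is ordinary least squares on the transformed response. A minimal MATLAB sketch on small made-up data (the design `X`, response `y`, `lam`, and the helper `bc` are assumptions for illustration) solves the normal equations, compares the result with MATLAB's backslash least-squares solve, and confirms the first-order condition $\boldsymbol X^T(\boldsymbol y^{(\lambda)}-\boldsymbol X\hat{\boldsymbol\beta})=0$.

```matlab
% Box-Cox transform for scalar lam and positive y (masks select the branch).
bc = @(y, lam) (lam ~= 0) .* (y.^lam - 1) ./ (lam + (lam == 0)) + (lam == 0) .* log(y);

x = (1:6)';
X = [ones(6, 1), x];                      % intercept + one regressor
y = [1.1; 2.3; 2.9; 4.8; 6.1; 8.4];       % made-up positive response
lam = 0.5;
ylam = bc(y, lam);                        % transformed response y^(lambda)

% Normal equations (X'X) beta = X' y^(lambda), solved without forming inv(X'X).
beta_normal = (X' * X) \ (X' * ylam);
% Backslash on the rectangular system: least-squares solution.
beta_ls = X \ ylam;

% First-order condition: X' * residual should vanish at the optimum.
grad = X' * (ylam - X * beta_normal);

disp([beta_normal, beta_ls])
```

Solving `(X' * X) \ (X' * ylam)` instead of multiplying by an explicit inverse is the usual numerical choice; it computes the same $\hat{\boldsymbol\beta}(\lambda)$ as the formula above.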

Then take the partial derivative with respect to $\sigma^2$:

$$\frac{\partial\mathcal{L}\left(\boldsymbol{\beta},\sigma^2\right)}{\partial\sigma^2}=\left(-\frac{n}{2}\right)\cdot\frac{1}{\sigma^2}+\frac{1}{2\sigma^4}\left(\boldsymbol{y}^{(\lambda)}-\boldsymbol{X}\boldsymbol{\beta}\right)^T\left(\boldsymbol{y}^{(\lambda)}-\boldsymbol{X}\boldsymbol{\beta}\right)=0$$

Substituting $\hat{\boldsymbol{\beta}}(\lambda)$ gives the maximum likelihood estimate of $\sigma^2$:

$$\begin{aligned}
\hat{\sigma}^2(\lambda)&=\frac{1}{n}\left(\boldsymbol{y}^{(\lambda)}-\boldsymbol{X}\hat{\boldsymbol{\beta}}\right)^T\left(\boldsymbol{y}^{(\lambda)}-\boldsymbol{X}\hat{\boldsymbol{\beta}}\right)\\
&=\frac{1}{n}\left(\boldsymbol{y}^{(\lambda)}-\boldsymbol{X}\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y}^{(\lambda)}\right)^T\left(\boldsymbol{y}^{(\lambda)}-\boldsymbol{X}\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\boldsymbol{y}^{(\lambda)}\right)\\
&=\frac{1}{n}\left(\boldsymbol{y}^{(\lambda)}\right)^T\left(\boldsymbol{I}-\boldsymbol{X}\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\right)^T\left(\boldsymbol{I}-\boldsymbol{X}\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\right)\boldsymbol{y}^{(\lambda)}\\
&=\frac{1}{n}\left(\boldsymbol{y}^{(\lambda)}\right)^T\left(\boldsymbol{I}-\boldsymbol{X}\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\right)\boldsymbol{y}^{(\lambda)}\\
&=\frac{1}{n}Q_e\left(\lambda,\boldsymbol{y}^{(\lambda)}\right)
\end{aligned}$$

The step from the third to the fourth line uses the fact that $\boldsymbol{I}-\boldsymbol{X}(\boldsymbol{X}^T\boldsymbol{X})^{-1}\boldsymbol{X}^T$ is symmetric and idempotent. Here $Q_e(\lambda,\boldsymbol{y}^{(\lambda)})$ is the residual sum of squares:

$$Q_e\left(\lambda,\boldsymbol{y}^{(\lambda)}\right)=\left(\boldsymbol{y}^{(\lambda)}\right)^T\left(\boldsymbol{I}-\boldsymbol{X}\left(\boldsymbol{X}^{T}\boldsymbol{X}\right)^{-1}\boldsymbol{X}^{T}\right)\boldsymbol{y}^{(\lambda)}$$
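A quick numerical check in MATLAB on made-up data (an assumption, as is the helper `bc`): $Q_e$ computed with $\boldsymbol M=\boldsymbol I-\boldsymbol X(\boldsymbol X^T\boldsymbol X)^{-1}\boldsymbol X^T$ matches the residual sum of squares, $\boldsymbol M$ is symmetric and idempotent, and $\hat\sigma^2(\lambda)=Q_e/n$.

```matlab
% Box-Cox transform for scalar lam and positive y (masks select the branch).
bc = @(y, lam) (lam ~= 0) .* (y.^lam - 1) ./ (lam + (lam == 0)) + (lam == 0) .* log(y);

x = (1:6)';
X = [ones(6, 1), x];
y = [1.1; 2.3; 2.9; 4.8; 6.1; 8.4];      % made-up positive response
lam = 0.5;
n = numel(y);
ylam = bc(y, lam);

% Residual-maker matrix M = I - X (X'X)^{-1} X'.
M = eye(n) - X * ((X' * X) \ X');
Qe_matrix = ylam' * M * ylam;

% Same quantity from explicit residuals.
beta = X \ ylam;
res = ylam - X * beta;
Qe_resid = res' * res;

% MLE of sigma^2 divides by n (not n - p).
sigma2 = Qe_resid / n;

fprintf('Q_e (matrix) = %.8f, Q_e (residuals) = %.8f, sigma2 = %.8f\n', ...
        Qe_matrix, Qe_resid, sigma2);
```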

The corresponding maximum of the log-likelihood is:

$$\begin{aligned}
\mathcal{L}_{\max}(\lambda)&=\mathcal{L}\left(\hat{\boldsymbol{\beta}}(\lambda),\hat{\sigma}^2(\lambda)\right)\\
&=-\frac{n}{2}\left(\ln 2\pi+\ln\frac{Q_e\left(\lambda,\boldsymbol{y}^{(\lambda)}\right)}{n}\right)-\frac{n}{2}+\ln J\\
&=-\frac{n}{2}\ln Q_e\left(\lambda,\boldsymbol{y}^{(\lambda)}\right)+\ln J-\frac{n}{2}\left(\ln\frac{2\pi}{n}+1\right)
\end{aligned}$$
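A minimal MATLAB sketch of $\mathcal L_{\max}(\lambda)$ for a regression design, written as the anonymous function `Lmax` (a hypothetical name; the data and the helper `bc` are made up). It checks that the closed form above agrees with the full log-likelihood evaluated at $\hat{\boldsymbol\beta}(\lambda)$ and $\hat\sigma^2(\lambda)$.

```matlab
% Box-Cox transform for scalar lam and positive y (masks select the branch).
bc = @(y, lam) (lam ~= 0) .* (y.^lam - 1) ./ (lam + (lam == 0)) + (lam == 0) .* log(y);

x = (1:6)';
X = [ones(6, 1), x];
y = [1.1; 2.3; 2.9; 4.8; 6.1; 8.4];      % made-up positive response
n = numel(y);
M = eye(n) - X * ((X' * X) \ X');        % I - X (X'X)^{-1} X'

% Closed form: -(n/2) ln Q_e + ln J - (n/2)(ln(2*pi/n) + 1), with ln J = (lam-1) sum ln y.
Qe   = @(lam) bc(y, lam)' * M * bc(y, lam);
Lmax = @(lam) -n/2 * log(Qe(lam)) + (lam - 1) * sum(log(y)) - n/2 * (log(2 * pi / n) + 1);

% Direct evaluation of the full log-likelihood at the MLEs for one lambda.
lam  = 0.5;
ylam = bc(y, lam);
beta = X \ ylam;
r    = ylam - X * beta;
s2   = (r' * r) / n;
Lfull = -n/2 * (log(2 * pi) + log(s2)) - (r' * r) / (2 * s2) + (lam - 1) * sum(log(y));

fprintf('closed form: %.10f, direct: %.10f\n', Lmax(lam), Lfull);
```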

We want the $\lambda$ that maximizes $\mathcal{L}_{\max}(\lambda)$. The last term does not depend on $\lambda$, so it can be dropped:

$$\begin{aligned}
\hat{\lambda}&=\underset{\lambda}{\arg\max}\;\mathcal{L}_{\max}(\lambda)\\
&=\underset{\lambda}{\arg\max}\left[-\frac{n}{2}\ln Q_e\left(\lambda,\boldsymbol{y}^{(\lambda)}\right)+\ln J\right]\\
&=\underset{\lambda}{\arg\max}\left[-\frac{n}{2}\ln\left(\frac{\left(\boldsymbol{y}^{(\lambda)}\right)^T}{J^{1/n}}\left(\boldsymbol{I}-\boldsymbol{X}\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\right)\frac{\boldsymbol{y}^{(\lambda)}}{J^{1/n}}\right)\right]\\
&=\underset{\lambda}{\arg\max}\left[-\frac{n}{2}\ln Q_e\left(\lambda,\boldsymbol{z}^{(\lambda)}\right)\right]\\
&=\underset{\lambda}{\arg\min}\;Q_e\left(\lambda,\boldsymbol{z}^{(\lambda)}\right)
\end{aligned}$$

where:

$$Q_e\left(\lambda,\boldsymbol{z}^{(\lambda)}\right)=\left(\boldsymbol{z}^{(\lambda)}\right)^T\left(\boldsymbol{I}-\boldsymbol{X}\left(\boldsymbol{X}^T\boldsymbol{X}\right)^{-1}\boldsymbol{X}^T\right)\boldsymbol{z}^{(\lambda)}$$

$$\boldsymbol{z}^{(\lambda)}=\left[z_1^{(\lambda)},\ldots,z_n^{(\lambda)}\right]^{T}=\frac{\boldsymbol{y}^{(\lambda)}}{J^{1/n}}$$

$$z_i^{(\lambda)}=\begin{cases}\dfrac{y_i^{\lambda}-1}{\lambda\left(\prod_{j=1}^{n}y_j\right)^{\frac{\lambda-1}{n}}}, & \lambda\ne 0\\[2ex] \left(\ln y_i\right)\left(\prod_{j=1}^{n}y_j\right)^{\frac{1}{n}}, & \lambda=0\end{cases}$$

It then suffices to find the $\lambda$ that minimizes $Q_e(\lambda,\boldsymbol{z}^{(\lambda)})$.
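The equivalence can be checked numerically. This MATLAB sketch, on made-up data, builds $\boldsymbol z^{(\lambda)}$ using the geometric mean $\dot y=(\prod_i y_i)^{1/n}$, so that $J^{1/n}=\dot y^{\lambda-1}$, and confirms on a grid of $\lambda$ values that minimizing $Q_e(\lambda,\boldsymbol z^{(\lambda)})$ picks the same $\lambda$ as maximizing $\mathcal L_{\max}(\lambda)$ (constant term dropped). The grid $[-2,2]$ with step $0.01$ and the helper names `bc`, `z`, `Qz`, `Lmax` are illustrative assumptions.

```matlab
% Box-Cox transform for scalar lam and positive y (masks select the branch).
bc = @(y, lam) (lam ~= 0) .* (y.^lam - 1) ./ (lam + (lam == 0)) + (lam == 0) .* log(y);

x = (1:6)';
X = [ones(6, 1), x];
y = [1.1; 2.3; 2.9; 4.8; 6.1; 8.4];      % made-up positive response
n = numel(y);
M = eye(n) - X * ((X' * X) \ X');

gm = exp(mean(log(y)));                  % geometric mean, so J^(1/n) = gm^(lam - 1)
z  = @(lam) bc(y, lam) / gm^(lam - 1);   % z^(lambda) = y^(lambda) / J^(1/n)
Qz = @(lam) z(lam)' * M * z(lam);

% Profile log-likelihood without the lambda-independent constant.
Lmax = @(lam) -n/2 * log(bc(y, lam)' * M * bc(y, lam)) + (lam - 1) * sum(log(y));

lams   = (-200:200) / 100;               % grid on [-2, 2]; hits 0 exactly
QzVals = arrayfun(Qz, lams);
LVals  = arrayfun(Lmax, lams);
[~, iMin] = min(QzVals);
[~, iMax] = max(LVals);
fprintf('argmin Q_e(z): %.2f, argmax L_max: %.2f\n', lams(iMin), lams(iMax));
```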

The non-regression case

It may seem that the Box-Cox problem is now fully solved, but in practice one more issue arises:

To compute $Q_e(\lambda,\boldsymbol{z}^{(\lambda)})$, we need data for both the independent variables $\boldsymbol{X}$ and the dependent variable $\boldsymbol{y}$.

In most practical situations, however, we simply want to transform a single set of non-normal data into approximately normal data, with no distinction between independent and dependent variables. How do we compute $Q_e(\lambda,\boldsymbol{z}^{(\lambda)})$ then?

I looked at the source of MATLAB's `boxcox` function; the line that computes the log-likelihood $\mathcal{L}_{\max}(\lambda)$ is:

```matlab
llf = -(n/2) .* log(std(xhat, 1, 1)' .^ 2) + (lambda-1)*(sum(log(x)));
```

From the maximized log-likelihood derived above, we have:

$$\begin{aligned}
\mathcal{L}_{\max}(\lambda)&=-\frac{n}{2}\ln Q_e\left(\lambda,\boldsymbol{y}^{(\lambda)}\right)+\ln J+C\\
&=-\frac{n}{2}\ln\left[\left(\boldsymbol{y}^{(\lambda)}-\boldsymbol{X}\hat{\boldsymbol{\beta}}\right)^T\left(\boldsymbol{y}^{(\lambda)}-\boldsymbol{X}\hat{\boldsymbol{\beta}}\right)\right]+\left(\lambda-1\right)\sum_{i=1}^{n}\ln y_i+C,\qquad C=-\frac{n}{2}\left(\ln\frac{2\pi}{n}+1\right)
\end{aligned}$$

Looking closely, MATLAB replaces $\left(\boldsymbol{y}^{(\lambda)}-\boldsymbol{X}\hat{\boldsymbol{\beta}}\right)^T\left(\boldsymbol{y}^{(\lambda)}-\boldsymbol{X}\hat{\boldsymbol{\beta}}\right)$ with the variance of $\boldsymbol{y}^{(\lambda)}$: `std(xhat, 1, 1)` normalizes by $n$, so its square is the sum of squared deviations of $\boldsymbol{y}^{(\lambda)}$ from its mean divided by $n$. The extra factor $1/n$ and the constant $C$ only shift the log-likelihood by a constant, so they do not change the maximizing $\lambda$.

This makes sense: here we only want to transform the data $\boldsymbol{y}$ and have no $\boldsymbol{X}$, so we cannot compute the residuals $\boldsymbol{y}^{(\lambda)}-\boldsymbol{X}\hat{\boldsymbol{\beta}}$ directly.

However, since

$$\boldsymbol{y}^{(\lambda)}\sim N\left(\boldsymbol{X}\boldsymbol{\beta},\sigma^2\boldsymbol{I}\right)$$

its mean is:

$$\mathbb{E}\left[\boldsymbol{y}^{(\lambda)}\right]=\boldsymbol{X}\boldsymbol{\beta},\quad\text{which is estimated by } \boldsymbol{X}\hat{\boldsymbol{\beta}}$$

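So we can replace $\boldsymbol{X}\hat{\boldsymbol{\beta}}$ with the sample mean $\overline{\boldsymbol{y}^{(\lambda)}}$. With no independent variables this replacement is exact: the model reduces to $y_i^{(\lambda)}=\mu+\varepsilon_i$, i.e. $\boldsymbol{X}$ is a single column of ones, and least squares gives $\hat{\mu}=\overline{\boldsymbol{y}^{(\lambda)}}$. Hence: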

$$\left(\boldsymbol{y}^{(\lambda)}-\boldsymbol{X}\hat{\boldsymbol{\beta}}\right)^T\left(\boldsymbol{y}^{(\lambda)}-\boldsymbol{X}\hat{\boldsymbol{\beta}}\right)=\left(\boldsymbol{y}^{(\lambda)}-\overline{\boldsymbol{y}^{(\lambda)}}\right)^T\left(\boldsymbol{y}^{(\lambda)}-\overline{\boldsymbol{y}^{(\lambda)}}\right)=n\,\mathrm{Var}\left(\boldsymbol{y}^{(\lambda)}\right)\quad\left(\mathrm{Var}\text{ normalized by } n\right)$$
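A small MATLAB check of the intercept-only case, on made-up data (the values, `lam`, and the helper `bc` are assumptions): with $\boldsymbol X$ a column of ones, the least-squares coefficient is the sample mean, and $Q_e$ equals $n$ times the variance normalized by $n$, which is what `std(xhat, 1, 1)` squared computes in the MATLAB line quoted above.

```matlab
% Box-Cox transform for scalar lam and positive y (masks select the branch).
bc = @(y, lam) (lam ~= 0) .* (y.^lam - 1) ./ (lam + (lam == 0)) + (lam == 0) .* log(y);

y = [0.3; 0.9; 1.4; 2.2; 3.7; 6.0];      % made-up positive data
n = numel(y);
lam = 0.25;
ylam = bc(y, lam);

X = ones(n, 1);                          % intercept-only design: no independent variables
beta = X \ ylam;                         % least squares gives the sample mean
r = ylam - X * beta;
Qe = r' * r;                             % residual sum of squares

fprintf('beta = %.6f, mean = %.6f\n', beta, mean(ylam));
fprintf('Q_e = %.6f, n*var(.,1) = %.6f, n*std(.,1)^2 = %.6f\n', ...
        Qe, n * var(ylam, 1), n * std(ylam, 1)^2);
```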

We can then evaluate the log-likelihood $\mathcal{L}_{\max}(\lambda)$ for each candidate $\lambda$ and pick the one that maximizes it (equivalently, minimize $-\mathcal{L}_{\max}(\lambda)$, which is what the MATLAB code does).
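The whole non-regression procedure fits in a few lines. This is a sketch under stated assumptions, not MATLAB's `boxcox` implementation: it uses the variance-based log-likelihood (constant dropped), scans a grid, and refines with base MATLAB's `fminbnd`, which minimizes, so it receives the negated log-likelihood. For deterministic output it uses the $\text{Exp}(1)$ quantiles $-\ln(1-p_i)$, $p_i=(i-0.5)/n$, as stand-in data instead of random draws; the search interval $[-2,2]$ and the helper names `bc` and `llf` are also assumptions.

```matlab
% Box-Cox transform for scalar lam and positive y (masks select the branch).
bc = @(y, lam) (lam ~= 0) .* (y.^lam - 1) ./ (lam + (lam == 0)) + (lam == 0) .* log(y);

n = 1000;
p = ((1:n)' - 0.5) / n;
y = -log(1 - p);                         % Exp(1) quantiles, all > 0

% Intercept-only profile log-likelihood, constant dropped:
% -(n/2) * log(variance normalized by n) + (lambda - 1) * sum(log(y))
llf = @(lam) -(n/2) * log(var(bc(y, lam), 1)) + (lam - 1) * sum(log(y));

% Coarse grid search for the maximizer.
lams = (-200:200) / 100;                 % grid on [-2, 2]; hits 0 exactly
vals = arrayfun(llf, lams);
[~, k] = max(vals);
lamGrid = lams(k);

% Refine: fminbnd minimizes, so pass the negated log-likelihood.
lamHat = fminbnd(@(lam) -llf(lam), -2, 2);
fprintf('grid estimate: %.2f, fminbnd estimate: %.4f\n', lamGrid, lamHat);
```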

Example

We generate 1000 samples from the exponential distribution with mean 1, $X\sim\text{Exp}(1)$, whose probability density function is $f(x)=e^{-x}$ for $x>0$.

The negated log-likelihood $-\mathcal{L}_{\max}(\lambda)$ (up to a constant; this is what the code below plots) looks like this:

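[Figure: negated log-likelihood versus $\lambda$ on $[-1, 1]$, with the estimate $\hat{\lambda}$ marked by a red dot]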

Its minimum point, marked by the red dot, is the estimate $\hat{\lambda}$.

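[Figure: histograms and normal Q-Q plots of the original data (top row) and the Box-Cox transformed data (bottom row)]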

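The top-left panel is the histogram of the original data, which clearly has an exponential shape. The top-right panel is a normal quantile-quantile (Q-Q) plot, where the original data clearly deviate from the reference line.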

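The bottom-left panel shows the distribution after the Box-Cox transform, which looks much closer to normal; the bottom-right panel is the corresponding Q-Q plot.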
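The visual check can be complemented by a numeric one. This hedged MATLAB sketch reuses the deterministic $\text{Exp}(1)$ quantiles and the variance-based log-likelihood as stand-ins for the random `exprnd` data and the toolbox `boxcox` (both substitutions are assumptions), then compares sample skewness, computed by hand and normalized by $n$, before and after the transform. Skewness near 0 is consistent with symmetry, although skewness alone does not establish normality.

```matlab
% Box-Cox transform for scalar lam and positive y (masks select the branch).
bc = @(y, lam) (lam ~= 0) .* (y.^lam - 1) ./ (lam + (lam == 0)) + (lam == 0) .* log(y);

n = 1000;
p = ((1:n)' - 0.5) / n;
y = -log(1 - p);                         % Exp(1) quantiles as deterministic stand-in data

% Estimate lambda by maximizing the intercept-only profile log-likelihood.
llf = @(lam) -(n/2) * log(var(bc(y, lam), 1)) + (lam - 1) * sum(log(y));
lamHat = fminbnd(@(lam) -llf(lam), -2, 2);
yt = bc(y, lamHat);                      % transformed data

% Sample skewness (normalized by n): mean cubed deviation over std^3.
skew = @(v) mean((v - mean(v)).^3) / std(v, 1)^3;

fprintf('lambda = %.4f, skewness before: %.3f, after: %.3f\n', ...
        lamHat, skew(y), skew(yt));
```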

Code

```matlab
clc, clear;
n = 1000;
data = exprnd(1, [n, 1]);            % 1000 draws from Exp(1), mean 1

figure
subplot(2,2,1)
histogram(data)                      % original data
subplot(2,2,2)
qqplot(data)                         % normal Q-Q plot of the original data
[bct, bclambda] = boxcox(data);      % transformed data and estimated lambda
subplot(2,2,3)
histogram(bct)                       % transformed data
subplot(2,2,4)
qqplot(bct)                          % normal Q-Q plot of the transformed data

figure
hold on
fplot(@(l) logLikelihood(l, data), [-1, 1])   % negated log-likelihood curve
plot(bclambda, logLikelihood(bclambda, data), 'o', 'MarkerFaceColor', 'r')
xlabel('\lambda')

function llf = logLikelihood(lambda, x)
% Negated log-likelihood (up to a constant) for one or more lambda values.
% fplot passes a vector of lambdas, so work with a column and reshape back.
n = length(x);
xhat = bcTransform(lambda, x);       % one column of transformed data per lambda
% std(xhat, 1, 1) normalizes by n; (lambda - 1)*sum(log(x)) is ln J.
llf = -(n/2) .* log(std(xhat, 1, 1)' .^ 2) + (lambda(:) - 1) * sum(log(x));
% The LLF should be maximized; negate it so the optimum is a minimum.
llf = reshape(-llf, size(lambda));
end

function bct = bcTransform(lambda, x)
% Box-Cox transform of the column vector x for each value in lambda.
n = length(x);
lambda = lambda(:);                  % make lambda a column vector
nzlambda = find(lambda ~= 0);        % indices of non-zero lambdas
zlambda = find(lambda == 0);         % indices of zero lambdas
bct = zeros(n, numel(lambda));       % preallocate: one column per lambda
mx = x * ones(1, length(lambda));    % replicate the data columnwise
mlambda = (lambda * ones(1, n))';    % replicate lambda rowwise
bct(:, nzlambda) = (mx(:, nzlambda) .^ mlambda(:, nzlambda) - 1) ./ ...
    mlambda(:, nzlambda);
bct(:, zlambda) = log(mx(:, zlambda));
end
```