• As mentioned earlier, the main tasks in machine learning are classification and regression. Classification, as illustrated by Bayes' theorem in the previous section, outputs a category (one of a finite set of discrete values), while regression outputs a real value.

3.1 The Basic Idea of Linear Regression

  • For a d-dimensional sample $\boldsymbol x$, we want to find a linear function $f(\boldsymbol x)=\omega^T\boldsymbol x+b$. The problem can be recast in d+1 dimensions so that $f(\boldsymbol x)=\omega^T\boldsymbol x$, where $\boldsymbol x\leftarrow(1,\boldsymbol x)$ and $\omega\leftarrow(b,\omega)$; this gives the expression a more uniform form. Our goal is to train this function f: given training data $(x_i,y_i)$, learn an f such that $f(x_i)\approx y_i$. The object we actually solve for is the (d+1)-dimensional vector $\omega$ (see the sketch below).
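
A minimal NumPy sketch of this augmentation trick (illustrative values, not from the original notes):

```python
import numpy as np

x = np.array([2.0, 3.0])        # a d-dimensional sample (d = 2)
w = np.array([0.5, -1.0])       # weights
b = 0.7                         # bias

x_aug = np.concatenate(([1.0], x))   # prepend a constant 1 -> (d+1)-dimensional sample
w_aug = np.concatenate(([b], w))     # absorb the bias into the parameter vector

# f(x) = w^T x + b equals w_aug^T x_aug
assert np.isclose(w @ x + b, w_aug @ x_aug)
```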

3.2 The Loss Function and Solution of Linear Regression

3.2.1 Derivation

  • A common loss function for linear regression is the mean squared error (MSE):

$$MSE=\frac{1}{n}\sum_{i=1}^{n}\left(y_i-f(x_i,\omega)\right)^2$$

  • The residual sum of squares (RSS) is also used; it takes the following form:

$$J_n(\omega)=\sum_{i=1}^{n}\left(y_i-f(x_i,\omega)\right)^2=(\boldsymbol y-X^T\omega)^T(\boldsymbol y-X^T\omega)$$

  • Taking the gradient of the RSS expression and setting it to zero gives:

$$\nabla J_n(\omega)=-2X(\boldsymbol y-X^T\omega)=0$$

  • Therefore the linear regression solution that minimizes the RSS is:

$$\omega=(XX^T)^{-1}X\boldsymbol y$$

  • Note that:
    • Each sample x is a (d+1)-dimensional vector, so X is a (d+1)-by-n matrix and $\boldsymbol y$ is an n-dimensional vector.
    • So when the number of samples n is smaller than the number of features d, $XX^T$ is not full rank and cannot be inverted; linear regression then has multiple solutions, which is easy to understand:
      • since we are solving for a (d+1)-dimensional vector with d+1 unknowns, at least d+1 samples are needed to pin down the d+1 parameters.
    • When this happens, all of the possible solutions minimize the mean squared error.
      • In that case a regularization term can be introduced to select the desired solution.
```python
import numpy as np

def linear_regression(X, y):
    """
    LINEAR_REGRESSION Linear Regression.
    INPUT:  X: training sample features, P-by-N matrix.
            y: training sample labels, 1-by-N row vector.
    OUTPUT: w: learned parameters, (P+1)-by-1 column vector.
    """
    P, N = X.shape
    # Samples are stored column-wise; transpose and prepend a column of ones,
    # giving an N-by-(P+1) design matrix.
    new_x = np.column_stack((np.ones((N, 1)), X.T))
    # Closed-form solution w = (X X^T)^{-1} X y in the augmented space.
    part1 = np.linalg.inv(np.matmul(new_x.T, new_x))
    part2 = np.matmul(new_x.T, y.T)
    w = np.matmul(part1, part2)
    return w
```
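
A quick usage sketch (synthetic data; the values are illustrative, not from the original notes, and it assumes the linear_regression function above):

```python
# Fit noise-free data generated from a known linear function and recover its parameters.
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 100))                 # P-by-N features, here P = 2, N = 100
true_w = np.array([0.7, 0.5, -1.0])           # [bias, w1, w2]
y = true_w[0] + true_w[1:] @ X                # labels for the N samples
print(linear_regression(X, y))                # should be close to [0.7, 0.5, -1.0]
```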

3.3 The Statistical Model of Linear Regression

  • Real-world data samples often contain noise, e.g. $y=f(\boldsymbol x,\omega)+\epsilon$, where $\epsilon$ is random noise following a normal distribution $N(0,\sigma^2)$. In this case $\omega$ can be estimated by maximum likelihood. Define

$$P(y|\boldsymbol x,\omega,\sigma)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2\sigma^2}\left(y-f(\boldsymbol x,\omega)\right)^2\right]$$

By maximum likelihood,

$$L(D,\omega,\sigma)=\prod_{i=1}^{n}P(y_i|\boldsymbol x_i,\omega,\sigma)$$

Our objective becomes:

$$\omega=\arg\max L(D,\omega,\sigma)=\arg\max\prod_{i=1}^{n}P(y_i|\boldsymbol x_i,\omega,\sigma)$$

Taking the log-likelihood gives

$$l(D,\omega,\sigma)=-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i-f(x_i,\omega)\right)^2+c(\sigma)$$

At this point we are back to the RSS, so the solution has the same expression as derived above.
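
In other words,

$$\arg\max_\omega\, l(D,\omega,\sigma)=\arg\max_\omega\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i-f(x_i,\omega)\right)^2\right]=\arg\min_\omega\sum_{i=1}^{n}\left(y_i-f(x_i,\omega)\right)^2,$$

since $c(\sigma)$ and the positive factor $\frac{1}{2\sigma^2}$ do not depend on $\omega$.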

3.4 Ridge Regression

3.4.1 Derivation of Ridge Regression

  • Ordinary linear regression overfits easily: some coefficients $\omega_j$ take extreme values, as if tailored to fit exactly the points in the training set, and performance on the test set is then very poor. To control the size of the coefficients and make the solution more reasonable, we can introduce regularization:

$$\omega^{*}=\arg\min\sum_{i=1}^{n}\left(y_i-\omega^Tx_i\right)^2+\lambda\sum_{j=1}^{d}\omega_j^2$$

This is in effect the Lagrangian of a constrained problem (the regularizer enters like a Lagrange multiplier term). As before, we can write it in matrix form:

$$(\boldsymbol y-X^T\omega)^T(\boldsymbol y-X^T\omega)+\lambda\omega^T\omega$$

Taking the gradient gives

$$\nabla J_n(\omega)=-2X(\boldsymbol y-X^T\omega)+2\lambda\omega=0$$

The final solution is then:

$$\omega^{*}=(XX^T+\lambda I)^{-1}X\boldsymbol y$$

  • This is ridge regression. The parameter $\lambda$ is chosen by the user, and it guarantees that the matrix $XX^T+\lambda I$ is full rank (invertible).
    • Ridge regression is simply linear regression with L2 regularization.
    • Note that if the bias term is not regularized, the matrix $I$ here is not exactly the identity matrix: its 0-th diagonal entry is 0 and the remaining diagonal entries are 1.

3.4.2 Code Implementation of Ridge Regression

```python
import numpy as np

def ridge(X, y, lmbda):
    """
    RIDGE Ridge Regression.
    INPUT:  X: training sample features, P-by-N matrix.
            y: training sample labels, 1-by-N row vector.
            lmbda: regularization parameter.
    OUTPUT: w: learned parameters, (P+1)-by-1 column vector.
    NOTE: You can use pinv() if the matrix is singular.
    """
    P, N = X.shape
    # Build a (P+1)-dimensional identity matrix and zero out the entry for the
    # bias term so that the constant term is not regularized.
    I = np.identity(P + 1)
    I[0][0] = 0
    # Prepend a row of ones to X for the constant term.
    new_x = np.concatenate((np.ones((1, N)), X), axis=0)
    # Closed-form solution w = (X X^T + lambda * I)^{-1} X y.
    part1 = np.matmul(new_x, new_x.T) + lmbda * I
    part2 = np.matmul(new_x, y.T)
    w = np.matmul(np.linalg.pinv(part1), part2)
    return w
```
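
A quick sanity check (illustrative synthetic data, reusing the idea from the earlier sketch; assumes the ridge function above): a very small lmbda should give nearly the ordinary least-squares solution, while a large lmbda shrinks the non-bias weights.

```python
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 100))
y = 0.7 + np.array([0.5, -1.0]) @ X + 0.01 * rng.normal(size=100)   # slightly noisy labels

print(ridge(X, y, lmbda=1e-6))   # close to the unregularized solution [0.7, 0.5, -1.0]
print(ridge(X, y, lmbda=1e3))    # non-bias weights shrunk toward zero
```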

3.5 Bayesian Linear Regression

  • Now let us reconsider the noise that real samples may contain, i.e. $y=f(\boldsymbol x,\omega)+\epsilon$, where $\epsilon$ follows a normal distribution $N(0,\sigma^2)$. By Bayes' theorem:

$$P(\omega|y,x,\sigma)=\frac{P(y|\omega,x,\sigma)P(\omega|x,\sigma)}{P(y|x,\sigma)}$$

  • That is, the posterior is proportional to the product of the likelihood and the prior, so $\ln(\text{posterior})=\ln(\text{likelihood})+\ln(\text{prior})+\text{const}$. In the linear regression problem we already know that the log-likelihood is

$$l(D,\omega,\sigma)=-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i-f(x_i,\omega)\right)^2+c(\sigma)$$

We can choose the prior as follows:

$$\begin{aligned}p(\omega)=N(\omega|0,\lambda^{-1}I)&=\frac{1}{(2\pi)^{\frac{d}{2}}\,|\lambda^{-1}I|^{\frac{1}{2}}}\exp\left(-\frac{\lambda}{2}\omega^T\omega\right)\\\ln p(\omega)&=-\frac{\lambda}{2}\omega^T\omega+c\end{aligned}$$

Therefore, under the Bayesian view, the objective that the ridge regression model optimizes is equivalent to:

$$-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i-f(x_i,\omega)\right)^2+c(\sigma)-\frac{\lambda}{2}\omega^T\omega+c$$
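
In other words, maximizing this expression over $\omega$ is equivalent to

$$\arg\min_\omega\left[\sum_{i=1}^{n}\left(y_i-f(x_i,\omega)\right)^2+\lambda\sigma^2\,\omega^T\omega\right],$$

so the MAP estimate under the Gaussian prior is exactly ridge regression with regularization strength $\lambda\sigma^2$.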

3.6 Logistic Regression

3.6.1 Basic Concepts of Logistic Regression

  • Logistic regression typically uses a sigmoid function to map samples onto an interval, estimating the probability that a sample belongs to a class and thereby performing classification. The sigmoid function we commonly use is:

$$y=\sigma(z)=\frac{1}{1+e^{-z}}$$

  • For a binary classification problem, for example, $\omega^Tx$ is mapped onto the interval (0, 1) and interpreted as the probability of being a positive example: if $\omega^Tx$ is positive (i.e. the probability exceeds 0.5) the sample is classified as positive, otherwise as negative. With labels $\pm1$ this can be written as:

$$\begin{aligned}P(y_i=1|x_i,\omega)&=\sigma(\omega^Tx_i)=\frac{1}{1+e^{-\omega^Tx_i}}\\P(y_i=-1|x_i,\omega)&=1-\sigma(\omega^Tx_i)=\frac{1}{1+e^{\omega^Tx_i}}\end{aligned}$$

The two expressions above can also be written uniformly as:

$$P(y_i|x_i,\omega)=\sigma(y_i\omega^Tx_i)=\frac{1}{1+e^{-y_i\omega^Tx_i}}$$
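
As a quick check, for $y_i=-1$,

$$\sigma(-\omega^Tx_i)=\frac{1}{1+e^{\omega^Tx_i}}=1-\sigma(\omega^Tx_i)=P(y_i=-1|x_i,\omega),$$

and for $y_i=1$ it reduces to $\sigma(\omega^Tx_i)$ directly.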

3.6.2 Parameter Estimation

  • We can estimate the parameter $\omega$ by maximum likelihood. Still writing D for the dataset, the procedure is as follows:

$$\begin{aligned}P(D)&=\prod_{i\in I}\sigma(y_i\omega^Tx_i)\\\ln P(D)&=\sum_{i\in I}\ln\sigma(y_i\omega^Tx_i)=-\sum_{i\in I}\ln\left(1+e^{-y_i\omega^Tx_i}\right)\end{aligned}$$

Therefore the loss function for maximum-likelihood parameter estimation in logistic regression can be defined as:

$$E(\omega)=\sum_{i\in I}\ln\left(1+e^{-y_i\omega^Tx_i}\right)$$

  • For a binary classification problem, if we use 0 and 1 rather than +1 and -1 as the two class labels, the expression above can be rewritten as follows (a numerical check is sketched after the derivation):
    • What we seek is the minimum of this expression.

$$\begin{aligned}E(\omega)&=\sum_{i\in I,\,y_i=1}\ln\left(1+e^{-\omega^Tx_i}\right)+\sum_{i\in I,\,y_i=0}\ln\left(1+e^{\omega^Tx_i}\right)\\&=\sum_{i\in I,\,y_i=1}\left[\ln\left(1+e^{\omega^Tx_i}\right)-\omega^Tx_i\right]+\sum_{i\in I,\,y_i=0}\ln\left(1+e^{\omega^Tx_i}\right)\\&=\sum_{i\in I}\ln\left(1+e^{\omega^Tx_i}\right)-\sum_{i\in I,\,y_i=1}\omega^Tx_i\\&=\sum_{i\in I}\left(-y_i\omega^Tx_i+\ln\left(1+e^{\omega^Tx_i}\right)\right)\end{aligned}$$
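
A small numerical check (illustrative values, not from the original notes) that the 0/1-label form of the loss equals the ±1-label form:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)
X = rng.normal(size=(3, 5))          # 5 samples stored column-wise, already augmented
y_pm = rng.choice([-1, 1], size=5)   # labels in {+1, -1}
y_01 = (y_pm == 1).astype(float)     # the same labels encoded as {0, 1}

z = w @ X                                            # omega^T x_i for each sample
loss_pm = np.sum(np.log(1 + np.exp(-y_pm * z)))      # sum ln(1 + e^{-y_i w^T x_i})
loss_01 = np.sum(-y_01 * z + np.log(1 + np.exp(z)))  # sum (-y_i w^T x_i + ln(1 + e^{w^T x_i}))
assert np.isclose(loss_pm, loss_01)
```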

  • We can show that $E(\omega)$ is a convex function of $\omega$. Since a sum of convex functions is convex, we only need to show that $-y_i\omega^Tx_i+\ln(1+e^{\omega^Tx_i})$ is convex in $\omega$. Let $g(\omega)=-y_i\omega^Tx_i+\ln(1+e^{\omega^Tx_i})$; its gradient is:

$$\frac{\partial g(\omega)}{\partial\omega}=-y_ix_i+\frac{x_ie^{\omega^Tx_i}}{1+e^{\omega^Tx_i}}$$

This follows from the standard result:

  • for an n-dimensional vector $\boldsymbol x$, we have $\frac{\partial\,\omega^T\boldsymbol x}{\partial\omega}=\boldsymbol x$.

Differentiating the gradient above once more gives the Hessian:

$$\frac{\partial^2 g(\omega)}{\partial\omega\,\partial\omega^T}=\frac{e^{\omega^Tx_i}}{\left(1+e^{\omega^Tx_i}\right)^2}\,x_ix_i^T\succeq 0$$

We have thus shown that the loss function is convex, so (stochastic) gradient descent can find its global optimum, i.e.:

$$\omega^{*}=\arg\min_{\omega}E(\omega)$$

  • Based on the gradient obtained above, the gradient-descent update rule for the logistic regression parameters is (using t to index the iterations):

$$\begin{aligned}\omega_{t+1}&=\omega_t-\eta(t)\sum_{i\in I}\left(-y_ix_i+\frac{x_ie^{\omega_t^Tx_i}}{1+e^{\omega_t^Tx_i}}\right)\\&=\omega_t-\eta(t)\sum_{i\in I}x_i\left(\frac{1}{1+e^{-\omega_t^Tx_i}}-y_i\right)\\&=\omega_t-\eta(t)X\left(\sigma(X^T\omega_t)-\boldsymbol y\right)\end{aligned}$$

Here $\sigma(\cdot)$ is the sigmoid function (applied elementwise in the matrix form), and $\eta(t)$ is the learning rate chosen by the user, usually a small number that should decrease over the iterations.

3.6.3 Code Implementation of Logistic Regression

```python
import numpy as np

def logistic(X, y):
    """
    LR Logistic Regression.
    INPUT:  X: training sample features, P-by-N matrix.
            y: training sample labels, 1-by-N row vector.
    OUTPUT: w: learned parameters, (P+1)-by-1 column vector.
    """
    P, N = X.shape
    w = np.zeros((P + 1, 1))

    def sigmoid(theta, x):
        return 1.0 / (1 + np.exp(-np.squeeze(np.matmul(theta.T, x))))

    X = np.concatenate((np.ones((1, N)), X), axis=0)
    # The original labels are +1/-1; convert them to 1/0.
    y = np.array(y == 1, dtype=float).reshape(N)
    step = 0
    max_step = 100        # maximum number of iterations
    learning_rate = 0.99  # learning rate, the eta(t) in the derivation
    while step < max_step:
        # Gradient of the loss: X (sigma(X^T w) - y).
        grad = np.matmul(X, (sigmoid(w, X) - y).reshape((N, 1)))
        learning_rate *= 0.99  # decay the learning rate
        w = w - learning_rate * grad
        step += 1
    return w
```
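
A minimal usage sketch (illustrative synthetic data, not from the original notes; assumes the logistic function above):

```python
# Labels in {+1, -1} generated from a known linear rule, matching the convention above.
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 200))                                          # P-by-N features
y = np.where(0.5 * X[0] - 1.0 * X[1] + 0.2 > 0, 1, -1).reshape(1, 200)

w = logistic(X, y)
X_aug = np.vstack((np.ones((1, 200)), X))                 # add the constant row
pred = np.where(np.matmul(w.T, X_aug) > 0, 1, -1)         # decision rule sign(w^T x)
print("training accuracy:", np.mean(pred == y))
```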

3.6.4 Regularization for Logistic Regression

  • A regularization term can be added to the logistic regression objective, giving the following form:

$$E(\omega)=\sum_{i\in I}\left(-y_i\omega^Tx_i+\ln\left(1+e^{\omega^Tx_i}\right)\right)+\frac{\lambda}{2}\omega^T\omega$$

  • Its gradient is then:

$$\begin{aligned}\frac{\partial E(\omega)}{\partial\omega}&=\sum_{i\in I}\left(-y_ix_i+\frac{x_ie^{\omega^Tx_i}}{1+e^{\omega^Tx_i}}\right)+\lambda\omega\\&=\sum_{i\in I}x_i\left(\frac{1}{1+e^{-\omega^Tx_i}}-y_i\right)+\lambda\omega\\&=X\left(\sigma(X^T\omega)-\boldsymbol y\right)+\lambda\omega\end{aligned}$$

  • Each iteration therefore becomes:

$$\omega_{t+1}=\omega_t-\eta(t)\left(X\left(\sigma(X^T\omega_t)-\boldsymbol y\right)+\lambda\omega_t\right)$$

```python
import numpy as np

def logistic_r(X, y, lmbda):
    """
    LR Regularized Logistic Regression.
    INPUT:  X: training sample features, P-by-N matrix.
            y: training sample labels, 1-by-N row vector.
            lmbda: regularization parameter.
    OUTPUT: w: learned parameters, (P+1)-by-1 column vector.
    """
    P, N = X.shape
    w = np.zeros((P + 1, 1))

    def sigmoid(theta, x):
        return 1.0 / (1 + np.exp(-np.squeeze(np.matmul(theta.T, x))))

    X = np.concatenate((np.ones((1, N)), X), axis=0)
    # Convert +1/-1 labels to 1/0.
    y = np.array(y == 1, dtype=float).reshape(N)
    step = 0
    max_step = 100         # maximum number of iterations
    learning_rate = 0.008  # learning rate, the eta(t) in the derivation
    while step < max_step:
        # L2 penalty term; the bias (0-th component) is not regularized.
        regular_item = w * lmbda
        regular_item[0] = 0
        grad = np.matmul(X, (sigmoid(w, X) - y).reshape((N, 1))) + regular_item
        learning_rate *= 0.95  # decay the learning rate
        w = w - learning_rate * grad
        step += 1
    return w
```
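
A brief usage sketch (same illustrative synthetic data as the unregularized example; assumes the logistic_r function above). The L2 penalty typically shrinks the learned weight vector:

```python
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 200))
y = np.where(0.5 * X[0] - 1.0 * X[1] + 0.2 > 0, 1, -1).reshape(1, 200)

w_plain = logistic_r(X, y, lmbda=0.0)    # no penalty
w_reg = logistic_r(X, y, lmbda=10.0)     # L2 penalty on the non-bias weights
print(np.linalg.norm(w_plain), np.linalg.norm(w_reg))
```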