Adaptive Boosting

There are many ways to derive AdaBoost. Here we take the additive-model view, which is the easiest to understand: the final combination module outputs a weighted sum of T base classifiers.
In its basic form AdaBoost only handles binary classification; applying it to multi-class or regression tasks requires modifying the algorithm.

$$
H(x)=\sum_{t=1}^{T} \alpha_t h_t(x)
$$

where $H(x)$ is the final prediction, $\alpha_t$ is the weight of the $t$-th base classifier, and $h_t(x)$ is the prediction of the $t$-th base classifier.

Algorithm

Input: training set D = {(x1, y1), (x2, y2), …, (xm, ym)}; base learning algorithm G; number of rounds T.
Process:

1. Initialize the sample distribution. Every sample starts with the same weight; since there are m samples in total, each weight is 1/m:

$$
D_1(x)=\frac{1}{m}
$$

2. for t = 1, 2, …, T do — that is, train T base classifiers in sequence, each round proceeding as follows:
   1. Train classifier $h_t$ from the training set D under distribution $D_t$:

      $$
      h_t=G(D, D_t)
      $$

   2. Compute the error of $h_t$:

      $$
      e_t=P_{x \sim D_t}(h_t(x) \neq f(x))
      $$

   3. If the error $e_t > 0.5$, stop: the base classifier is no better than random guessing.
   4. Set the weight of the $t$-th base classifier $h_t$. Note that the larger the error $e_t$, the smaller the weight:

      $$
      \alpha_t=\frac{1}{2} \ln\Big(\frac{1-e_t}{e_t}\Big)
      $$

   5. Update the sample distribution, where $Z_t$ is a normalization factor (a code sketch of the whole procedure follows the output formula below):

      $$
      \begin{aligned}
      D_{t+1}(x)&=\frac{D_t(x)}{Z_t} \times
      \begin{cases}
          e^{-\alpha_t} &\text{if } h_t(x)=f(x) \\
          e^{\alpha_t} &\text{if } h_t(x) \neq f(x)
      \end{cases} \\
      &= \frac{D_t(x)e^{-\alpha_t f(x)h_t(x)}}{Z_t}
      \end{aligned}
      $$

Output:

$$
H(x)=\mathrm{sign} \bigg(\sum_{t=1}^{T} \alpha_t h_t(x) \bigg)
$$
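
To make the procedure concrete, here is a minimal runnable sketch in Python/NumPy. It assumes a depth-1 decision stump as the base learner G (any weak learner that accepts sample weights would do); the helper names `fit_stump` and `stump_predict` are made up for this illustration.

```python
import numpy as np

def fit_stump(X, y, w):
    """Hypothetical base learner G: pick the weighted-error-minimizing
    decision stump (feature j, threshold thr, polarity pol)."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(X[:, j] <= thr, pol, -pol)
                err = np.sum(w * (pred != y))
                if err < best_err:
                    best_err, best = err, (j, thr, pol)
    return best, best_err

def stump_predict(stump, X):
    j, thr, pol = stump
    return np.where(X[:, j] <= thr, pol, -pol)

def adaboost_train(X, y, T=10):
    m = X.shape[0]
    D = np.full(m, 1.0 / m)                    # step 1: D_1(x) = 1/m
    ensemble = []
    for t in range(T):
        stump, e_t = fit_stump(X, y, D)        # loop step 1: h_t = G(D, D_t)
        if e_t > 0.5:                          # loop step 3: worse than random, stop
            break
        e_t = max(e_t, 1e-10)                  # guard against division by zero
        alpha = 0.5 * np.log((1 - e_t) / e_t)  # loop step 4
        D *= np.exp(-alpha * y * stump_predict(stump, X))  # loop step 5
        D /= D.sum()                           # Z_t: renormalize to a distribution
        ensemble.append((alpha, stump))
    return ensemble

def adaboost_predict(ensemble, X):
    # output: H(x) = sign(sum_t alpha_t h_t(x))
    score = sum(a * stump_predict(s, X) for a, s in ensemble)
    return np.sign(score)

# toy usage: a 1-D separable problem
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([1, 1, 1, -1, -1, -1])
model = adaboost_train(X, y, T=5)
print(adaboost_predict(model, X))  # recovers y on this toy set
```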

Understanding the AdaBoost algorithm

Why does AdaBoost use the exponential loss?

There are probably two reasons:

  1. It has good computational properties.
  2. It keeps the weight-distribution update formula simple.

Why do we derive everything by minimizing the loss function, and then prove the two formulas below? In other words, why is it correct that every step aims to minimize the loss function?

I am not sure why I had this question back then, but to answer it anyway: the loss function is precisely the metric that measures how good a model is, so minimizing the loss function is, by definition, searching for the optimal model.

Derivation by minimizing the loss function

First look at the exponential loss, which is an expectation taken over the distribution D (why this particular exponential loss is adopted is itself a fair question; see the two reasons above):

$$
\ell_{exp}(H \mid D)=E_{x \sim D}[e^{-f(x)H(x)}]
$$

Since AdaBoost's output space is {-1, +1}, the expectation above can be expanded as:

$$
\begin{aligned}
\ell_{exp}(H \mid D)&=P(f(x)=1 \mid x) \cdot e^{-(+1) \cdot H(x)} + P(f(x)=-1 \mid x) \cdot e^{-(-1) \cdot H(x)} \\
&=P(f(x)=1 \mid x) \cdot e^{-H(x)} + P(f(x)=-1 \mid x) \cdot e^{H(x)}
\end{aligned}
$$

If $H(x)$ is to minimize $\ell_{exp}(H \mid D)$, consider its partial derivative with respect to $H(x)$. (Why does setting the derivative to 0 give the minimum? For each fixed x the loss is a sum of two convex exponentials in $H(x)$, so it is convex and its unique stationary point is the global minimum.)

$$
\begin{aligned}
\frac{\partial{\ell_{exp}(H \mid D)}}{\partial H(x)}&=\frac{\partial \big(P(f(x)=1 \mid x) \cdot e^{-H(x)} + P(f(x)=-1 \mid x) \cdot e^{H(x)}\big)} {\partial H(x)} \\
&=-P(f(x)=1 \mid x) \cdot e^{-H(x)} + P(f(x)=-1 \mid x) \cdot e^{H(x)}
\end{aligned}
$$

Setting this to 0, i.e. minimizing $\ell_{exp}(H \mid D)$, and solving for $H(x)$ gives:

$$
H(x) = \frac{1}{2} \ln\frac{P(f(x)=1 \mid x)}{P(f(x)=-1 \mid x)}
$$

Hence (ignoring the case of equality):

$$
\begin{aligned}
\mathrm{sign}(H(x))&=\mathrm{sign}\bigg( \frac{1}{2} \ln\frac{P(f(x)=1 \mid x)}{P(f(x)=-1 \mid x)} \bigg) \\
&=\begin{cases}
   1 &\text{if } P(f(x)=1 \mid x) > P(f(x)=-1 \mid x) \\
   -1 &\text{if } P(f(x)=1 \mid x) < P(f(x)=-1 \mid x)
\end{cases} \\
&= \arg\max_{y \in \{1, -1\}} P(f(x)=y \mid x)
\end{aligned}
$$

This means $\mathrm{sign}(H(x))$ attains the Bayes optimal error rate: with $H(x) = \frac{1}{2} \ln\frac{P(f(x)=1 \mid x)}{P(f(x)=-1 \mid x)}$, the classifier always picks the more probable label, so no classifier can do better. Note this does not mean the error rate is 0; it means that minimizing the exponential loss also minimizes the classification error, i.e. the exponential loss is a consistent surrogate for the original 0/1 loss.
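
As a quick numerical sanity check of the last two results (a sketch, not part of the derivation): fix a conditional probability $p = P(f(x)=1 \mid x)$, minimize $p\,e^{-H} + (1-p)\,e^{H}$ over H by grid search, and compare against the closed form; the sign of the minimizer should agree with the Bayes decision.

```python
import numpy as np

p = 0.7  # assumed value of P(f(x)=1 | x), for illustration only
H_grid = np.linspace(-5, 5, 200001)
loss = p * np.exp(-H_grid) + (1 - p) * np.exp(H_grid)

H_numeric = H_grid[np.argmin(loss)]   # grid-search minimizer
H_closed = 0.5 * np.log(p / (1 - p))  # the closed-form H(x) above

print(H_numeric, H_closed)                          # both ~ 0.4236
print(np.sign(H_closed) == (1 if p > 0.5 else -1))  # True: matches the Bayes decision
```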

Step 4 of the loop: why update the weight of $h_t$ to $\alpha_t=\frac{1}{2} \ln(\frac{1-e_t}{e_t})$?
In AdaBoost, the first base classifier $h_1$ is obtained by applying the base learning algorithm to the initial data distribution; after that, $h_t$ and $D_t$ are generated iteratively. Once the base classifier $h_t$ has been produced from distribution $D_t$, its weight $\alpha_t$ should make $\alpha_t h_t$ minimize the exponential loss:

$$
\begin{aligned}
\ell_{exp}(\alpha_t h_t \mid D_t)&=E_{x \sim D_t}[e^{-f(x) \alpha_t h_t(x)}] \\
&=E_{x \sim D_t}[e^{-\alpha_t}I(f(x)=h_t(x)) + e^{\alpha_t}I(f(x) \neq h_t(x))] \\
&=e^{-\alpha_t}P_{x \sim D_t}(f(x)=h_t(x)) + e^{\alpha_t}P_{x \sim D_t}(f(x) \neq h_t(x)) \\
&= e^{-\alpha_t}(1-\epsilon_t)+e^{\alpha_t}\epsilon_t
\end{aligned}
$$

where $\epsilon_t=P_{x \sim D_t}(f(x) \neq h_t(x))$. Consider the derivative of this loss:

$$
\frac{\partial{\ell_{exp}(\alpha_t h_t \mid D_t)}}{\partial{\alpha_t}}=-e^{-\alpha_t}(1-\epsilon_t)+e^{\alpha_t}\epsilon_t
$$

Setting this to 0 gives:

$$
\alpha_t = \frac{1}{2}\ln\bigg( \frac{1-\epsilon_t}{\epsilon_t} \bigg)
$$

This is exactly the weight in step 4 of the algorithm; in other words, weighting the current base classifier this way minimizes the loss function.
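
The same kind of numerical check works for $\alpha_t$ (again a sketch): for an assumed error $\epsilon_t$, the closed form should coincide with the grid-search minimizer of $e^{-\alpha}(1-\epsilon_t)+e^{\alpha}\epsilon_t$.

```python
import numpy as np

eps = 0.2  # assumed base-classifier error, for illustration only
alpha_grid = np.linspace(-3, 3, 600001)
loss = np.exp(-alpha_grid) * (1 - eps) + np.exp(alpha_grid) * eps

alpha_numeric = alpha_grid[np.argmin(loss)]   # grid-search minimizer
alpha_closed = 0.5 * np.log((1 - eps) / eps)  # the step-4 weight formula

print(alpha_numeric, alpha_closed)  # both ~ 0.6931; smaller eps gives larger alpha
```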

Step 5 of the loop: why is the data distribution updated by that particular formula?
After obtaining $H_{t-1}$, AdaBoost adjusts the sample distribution so that the next base learner $h_t$ can correct some of the mistakes of $H_{t-1}$. Ideally, $h_t$ would correct all of the mistakes of $H_{t-1}$, i.e. minimize:

$$
\begin{aligned}
\ell_{exp}(H_{t-1} + h_t \mid D)&=E_{x \sim D}[e^{-f(x)(H_{t-1}(x)+h_t(x))}] \\
&=E_{x \sim D}[e^{-f(x)H_{t-1}(x)} \cdot e^{-f(x)h_t(x)}]
\end{aligned}
$$

Noting that $f^2(x)=h_t^2(x)=1$, the expression above can be approximated using the Taylor expansion of $e^{-f(x)h_t(x)}$:

$$
\begin{aligned}
\ell_{exp}(H_{t-1} + h_t \mid D)&\simeq E_{x \sim D} \bigg[e^{-f(x)H_{t-1}(x)} \bigg( 1-f(x)h_t(x)+\frac{f^2(x)h_t^2(x)}{2} \bigg) \bigg] \\
&=E_{x \sim D} \bigg[e^{-f(x)H_{t-1}(x)} \bigg( 1-f(x)h_t(x)+\frac{1}{2} \bigg) \bigg]
\end{aligned}
$$
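
This step can be checked symbolically (a small sketch using sympy): expand $e^{-z}$ to second order and substitute $z^2 = 1$, since $z = f(x)h_t(x) \in \{-1, +1\}$.

```python
import sympy as sp

z = sp.symbols('z')  # stands for f(x) * h_t(x)
taylor = sp.series(sp.exp(-z), z, 0, 3).removeO()
print(taylor)                 # z**2/2 - z + 1

# f(x), h_t(x) in {-1, +1} implies z**2 = f^2(x) h_t^2(x) = 1:
print(taylor.subs(z**2, 1))   # 3/2 - z, i.e. 1 - f(x)h_t(x) + 1/2
```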

Hence the ideal base learner is (note the switch from argmin to argmax: minimizing $1-f(x)h(x)+\frac{1}{2}$ means maximizing $f(x)h(x)$):

$$
\begin{aligned}
h_t(x)&=\arg\min_h \ell_{exp}(H_{t-1}+h \mid D) \\
&=\arg\min_h E_{x \sim D} \bigg[e^{-f(x)H_{t-1}(x)} \bigg( 1-f(x)h(x)+\frac{1}{2} \bigg) \bigg] \\
&=\arg\max_h E_{x \sim D} \bigg[e^{-f(x)H_{t-1}(x)} f(x)h(x) \bigg] \\
&=\arg\max_h E_{x \sim D} \bigg[\frac{e^{-f(x)H_{t-1}(x)}}{E_{x \sim D}[e^{-f(x)H_{t-1}(x)}]} f(x)h(x) \bigg]
\end{aligned}
$$

Noting that $E_{x \sim D}[e^{-f(x)H_{t-1}(x)}]$ is a constant, let $D_t$ denote the distribution:

$$
D_t(x)=\frac{D(x)e^{-f(x)H_{t-1}(x)}}{E_{x \sim D}[e^{-f(x)H_{t-1}(x)}]}
$$

By the definition of mathematical expectation, this is equivalent to:

$$
\begin{aligned}
h_t(x)&=\arg\max_h E_{x \sim D} \bigg[ \frac{e^{-f(x)H_{t-1}(x)}}{E_{x \sim D}[e^{-f(x)H_{t-1}(x)}]} f(x)h(x) \bigg] \\
&=\arg\max_h E_{x \sim D_t} [f(x)h(x)]
\end{aligned}
$$

Since $f(x), h(x) \in \{-1, +1\}$, we have:

$$
f(x)h(x)=1-2I(f(x) \neq h(x))
$$
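
This identity is easy to verify by enumerating the four sign combinations (a quick sketch):

```python
# I(f != h) is the 0/1 indicator; in Python, (f != h) is True/False, i.e. 1/0
for f in (-1, 1):
    for h in (-1, 1):
        assert f * h == 1 - 2 * (f != h)
print("f(x)h(x) = 1 - 2 I(f(x) != h(x)) holds in all four cases")
```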

The ideal base learner is therefore:

$$
h_t(x)=\arg\min_h E_{x \sim D_t} [I(f(x) \neq h(x))]
$$

So the ideal $h_t$ minimizes the classification error under the distribution $D_t$. The weak classifier is therefore trained on distribution $D_t$, and its classification error with respect to $D_t$ should be less than 0.5. To some extent this resembles the idea of "residual fitting". (I didn't quite see the analogy at first, so to spell it out: $D_t$ up-weights exactly the samples that the current ensemble $H_{t-1}$ still gets wrong, so each new learner is fitted to whatever error remains, much like fitting a residual.)
Considering the relation between $D_t$ and $D_{t+1}$, we have:

$$
\begin{aligned}
D_{t+1}(x)&=\frac{D(x)e^{-f(x)H_t(x)}} {E_{x \sim D} [e^{-f(x)H_t(x)}]} \\
&=\frac{D(x)e^{-f(x)H_{t-1}(x)} e^{-f(x) \alpha_t h_t(x)}} {E_{x \sim D} [e^{-f(x)H_t(x)}]} \\
&= D_t(x) \cdot e^{-f(x) \alpha_t h_t(x)} \cdot \frac{E_{x \sim D} [e^{-f(x)H_{t-1}(x)}]} {E_{x \sim D} [e^{-f(x)H_t(x)}]}
\end{aligned}
$$

This is exactly the distribution-update formula in step 5 of the algorithm, with the ratio of expectations playing the role of the normalization factor $Z_t$.
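
To close the loop, here is a numerical sketch checking that the recursive update (renormalized each round, as with $Z_t$ in step 5) matches the direct definition $D_{t+1}(x) \propto D(x)e^{-f(x)H_t(x)}$; random signs stand in for $f$ and the $h_t$, and the $\alpha_t$ are arbitrary positive weights.

```python
import numpy as np

rng = np.random.default_rng(0)
m, T = 8, 3
f = rng.choice([-1, 1], size=m)         # true labels f(x)
h = rng.choice([-1, 1], size=(T, m))    # base-classifier outputs h_t(x)
alpha = rng.uniform(0.2, 1.0, size=T)   # arbitrary positive weights alpha_t

# direct definition: D_{T+1}(x) proportional to D(x) e^{-f(x) H_T(x)}, uniform D(x)
H_T = (alpha[:, None] * h).sum(axis=0)
direct = np.exp(-f * H_T)
direct /= direct.sum()

# recursive step-5 update: D_{t+1}(x) = D_t(x) e^{-alpha_t f(x) h_t(x)} / Z_t
D = np.full(m, 1.0 / m)
for t in range(T):
    D *= np.exp(-alpha[t] * f * h[t])
    D /= D.sum()                        # Z_t normalization

print(np.allclose(D, direct))           # True: the two formulas agree
```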