Boosting

Boosting extends the idea of bagging as follows

  • $k$ classifiers are learned iteratively.
  • Boosting assigns a weight to each training tuple.
  • After training a classifier $M_i$, the weights of tuples in the training set are updated so that the subsequent classifier $M_{i+1}$ pays more attention to the training tuples that were misclassified by $M_i$.
  • The final classifier $M^*$ combines the votes of all classifiers, where the importance of each vote is measured by the accuracy of the corresponding classifier.

AdaBoost

AdaBoost is a boosting algorithm which adaptively adjusts the weight of each tuple after training each classifier $M_i$.
[Figure: AdaBoost algorithm pseudocode]

  • Line (1): AdaBoost initialises each tuple with equal weight $1/N$.
  • Line (3): To construct the training set $D_i$ for classifier $M_i$, sample $N$ tuples from $D$ according to the weights, with replacement. If the weight of a certain tuple is relatively larger than the others, that tuple is more likely to be sampled from the dataset $D$.
    • The classifier $M_i$ is therefore more likely to focus on the tuples that have large weights.
  • Line (5): $\mathrm{error}(M_i) = \sum_j w_j \times \mathrm{err}(X_j)$
    • where $\mathrm{err}(X_j)$ is 1 if the tuple $X_j$ is misclassified by $M_i$ and 0 otherwise.
    • Note that we only use the tuples in $D_i$ to compute the error, i.e. $X_j \in D_i$.
  • Line (10): Adjust the weights of the tuples used to train classifier $M_i$: multiply the weight of each correctly classified tuple by $\mathrm{error}(M_i) / (1 - \mathrm{error}(M_i))$.
    • The range of $\mathrm{error}(M_i) / (1 - \mathrm{error}(M_i))$ is
      $(0, 1)$ because $\mathrm{error}(M_i)$ is always less than $0.5$.
    • We will decrease the weights of the correctly classified tuples!
    • The weights of unused and misclassified tuples are not changed.
    • Note that the weight of a tuple is updated only once, even if the tuple appears in $D_i$ multiple times.
  • Line (11): Normalise the weights to ensure that they sum to 1 (required for the sampling in line (3)).
    • The normalised weight of the $j$th tuple: $w_j / \sum_{j'} w_{j'}$
    • Replace each $w_j$ with its normalised weight, and go back to line (3) if $i < k$.
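The training loop described above can be sketched in Python. This is an illustrative sketch, not the listing from the figure: the decision-stump weak learner `train_stump`, the toy dataset, and the small floor on the error are all assumptions added here.

```python
import numpy as np

def train_stump(X, y):
    """Illustrative weak learner: best single-feature threshold split."""
    best_acc, best = -1.0, None
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for s in (1, -1):                      # try both orientations
                pred = s * np.where(X[:, f] >= t, 1, -1)
                acc = float(np.mean(pred == y))
                if acc > best_acc:
                    best_acc, best = acc, (f, t, s)
    f, t, s = best
    return lambda Z, f=f, t=t, s=s: s * np.where(Z[:, f] >= t, 1, -1)

def adaboost_train(X, y, k, rng):
    n = len(y)
    w = np.full(n, 1.0 / n)                        # line (1): equal initial weights 1/N
    models, errors = [], []
    while len(models) < k:
        idx = rng.choice(n, size=n, p=w)           # line (3): sample D_i by weight, with replacement
        M = train_stump(X[idx], y[idx])
        miss = M(X[idx]) != y[idx]                 # err(X_j) over the tuples in D_i only
        # line (5); the tiny floor (an assumption here) avoids division by zero below
        error = max(float(np.sum(w[idx] * miss)), 1e-9)
        if error >= 0.5:                           # discard M_i and resample
            continue
        hit = np.unique(idx[~miss])                # line (10): update each tuple's weight once
        w[hit] *= error / (1.0 - error)            # shrink correctly classified tuples
        w /= w.sum()                               # line (11): normalise so weights sum to 1
        models.append(M)
        errors.append(error)
    return models, errors

# Toy separable data (illustrative)
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])
models, errors = adaboost_train(X, y, k=3, rng=np.random.default_rng(0))
```

The error floor is needed because a weak learner can reach zero training error on $D_i$, a case the walkthrough above does not need to address.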

Prediction with AdaBoost
Unlike bagging, where each classifier is assigned an equal vote, boosting (and AdaBoost) assigns a different weight to the vote of each classifier.
The weight of the vote of classifier $M_i$ is given by

$\log \frac{1 - \mathrm{error}(M_i)}{\mathrm{error}(M_i)}$.
For each class $c$, we sum the weights of each classifier that assigned class $c$. The class with the highest sum is the winner.
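This weighted vote can be sketched as follows (the function name, the toy classifiers, and their error rates are assumptions for illustration):

```python
import math
from collections import defaultdict

def adaboost_predict(clfs, errs, x):
    """Weighted vote: classifier M_i votes with weight log((1 - error(M_i)) / error(M_i))."""
    tally = defaultdict(float)
    for M, err in zip(clfs, errs):
        alpha = math.log((1.0 - err) / err)   # more accurate classifiers get a larger say
        tally[M(x)] += alpha                  # add M_i's weight to the class it predicts
    return max(tally, key=tally.get)          # class with the highest weight sum wins

# Illustrative: one accurate classifier outvotes two weak ones.
clfs = [lambda x: "A", lambda x: "B", lambda x: "B"]
errs = [0.05, 0.40, 0.45]
winner = adaboost_predict(clfs, errs, None)
# "A" wins: log(19) ≈ 2.94 > log(1.5) + log(0.55/0.45) ≈ 0.61
```

Note that "A" beats the 2-to-1 raw vote because the first classifier's low error gives it a much larger weight.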

Boosting vs Bagging

  • Because boosting focuses on the misclassified tuples, it risks overfitting the resulting composite model. Therefore, a boosted model may sometimes be less accurate than a single model.
  • Bagging is less susceptible to model overfitting.
  • Both can significantly improve accuracy, but boosting tends to achieve greater accuracy.

ACTION: Follow this worked example for AdaBoost. You can follow the example on paper, or in the video, or both; the example is the same.
Example: AdaBoost ensemble method
video walkthrough example for AdaBoost