If the training data are consistently labelled, attributes may be re-used when growing the tree, and growth stops only when every training tuple is classified, then it is always possible to build a tree that describes the training set with 100% accuracy. Such a tree typically generalises very poorly and is said to _overfit_ the training data.
    Overfitting: An induced tree may overfit the training data

    • Too many branches: some may reflect anomalies due to noise or outliers.
    • Poor accuracy on unseen samples, because the tree models these anomalies rather than the general pattern that new objects follow. 100% accuracy on the training data can therefore be a _bad thing_ for accuracy on unseen data (see the sketch below).
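
    As an illustration, the following is a minimal sketch assuming scikit-learn and a synthetic dataset with label noise added via the flip_y parameter (both are assumptions for illustration). An unrestricted tree typically reaches 100% accuracy on the training data while scoring noticeably worse on held-out data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with some label noise (flip_y) to play the role of outliers/anomalies.
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unrestricted tree keeps splitting until every training tuple is classified.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print("training accuracy:", tree.score(X_train, y_train))  # typically 1.0
print("test accuracy:    ", tree.score(X_test, y_test))    # typically noticeably lower
```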

    There are two typical approaches to avoid overfitting:

    • Prepruning: Stop tree construction early; do not split a node if doing so would make the goodness measure fall below a threshold. It is, however, difficult to choose an appropriate threshold.
    • Postpruning: Remove branches from a “fully grown” tree, producing a sequence of progressively pruned trees, then use a set of data different from the training data (a validation set) to decide which is the “best pruned tree”. Both approaches are sketched below.
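
    A minimal sketch of both ideas, assuming scikit-learn; the threshold value and the validation split are illustrative assumptions. Prepruning is approximated with the min_impurity_decrease parameter, which refuses any split whose impurity reduction falls below a threshold; postpruning uses cost-complexity pruning (cost_complexity_pruning_path) to produce a sequence of progressively pruned trees, from which a validation set picks the best one.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1, random_state=0)
# Hold out a validation set, distinct from the training data, for choosing the pruned tree.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Prepruning: stop splitting early when the goodness (impurity decrease) of the best
# split drops below a threshold (0.01 is an arbitrary illustrative value).
pre = DecisionTreeClassifier(min_impurity_decrease=0.01, random_state=0)
pre.fit(X_train, y_train)
print("prepruned tree, validation accuracy:", pre.score(X_val, y_val))

# Postpruning: grow the tree fully, then generate a sequence of progressively
# pruned trees (one per ccp_alpha) and keep the one that does best on the validation set.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
candidates = [
    DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_train, y_train)
    for alpha in path.ccp_alphas
]
best = max(candidates, key=lambda t: t.score(X_val, y_val))
print("best pruned tree, validation accuracy:", best.score(X_val, y_val))
print("leaves in best pruned tree:", best.get_n_leaves())
```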