If the training data are consistently labelled, attributes may be re-used when growing the tree, and growth stops only when every training tuple is classified, then it is always possible to build a tree that describes the training set with 100% accuracy. Such a tree typically generalises very poorly and is said to _overfit_ the training data.
    Overfitting: An induced tree may overfit the training data

    • Too many branches: some may reflect anomalies due to noise or outliers.
    • Poor accuracy on unseen samples, because the tree models these anomalies rather than the general pattern that new objects follow. 100% accuracy on the training data can therefore be a _bad thing_ for accuracy on unseen data (see the sketch below).
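
    As an illustration, the following is a minimal sketch assuming scikit-learn and a synthetic dataset with label noise added via the flip_y parameter (both are assumptions for illustration). An unrestricted tree typically reaches 100% accuracy on the training data while scoring noticeably worse on held-out data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with some label noise (flip_y) to play the role of outliers/anomalies.
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unrestricted tree keeps splitting until every training tuple is classified.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print("training accuracy:", tree.score(X_train, y_train))  # typically 1.0
print("test accuracy:    ", tree.score(X_test, y_test))    # typically noticeably lower
```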

    There are two typical approaches to avoid overfitting:

    • Prepruning: Stop tree construction early; do not split a node if doing so would make the goodness measure fall below a threshold. It is, however, difficult to choose an appropriate threshold.
    • Postpruning: Remove branches from a “fully grown” tree, producing a sequence of progressively pruned trees, then use a set of data different from the training data (a validation set) to decide which is the “best pruned tree”. Both approaches are sketched below.
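
    A minimal sketch of both ideas, assuming scikit-learn; the threshold value and the validation split are illustrative assumptions. Prepruning is approximated with the min_impurity_decrease parameter, which refuses any split whose impurity reduction falls below a threshold; postpruning uses cost-complexity pruning (cost_complexity_pruning_path) to produce a sequence of progressively pruned trees, from which a validation set picks the best one.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1, random_state=0)
# Hold out a validation set, distinct from the training data, for choosing the pruned tree.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Prepruning: stop splitting early when the goodness (impurity decrease) of the best
# split drops below a threshold (0.01 is an arbitrary illustrative value).
pre = DecisionTreeClassifier(min_impurity_decrease=0.01, random_state=0)
pre.fit(X_train, y_train)
print("prepruned tree, validation accuracy:", pre.score(X_val, y_val))

# Postpruning: grow the tree fully, then generate a sequence of progressively
# pruned trees (one per ccp_alpha) and keep the one that does best on the validation set.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
candidates = [
    DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_train, y_train)
    for alpha in path.ccp_alphas
]
best = max(candidates, key=lambda t: t.score(X_val, y_val))
print("best pruned tree, validation accuracy:", best.score(X_val, y_val))
print("leaves in best pruned tree:", best.get_n_leaves())
```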