A decision tree classifies labelled data by applying a sequence of logical tests on attributes that partition the data into finer and finer sets. The model that is learnt is a tree of logical tests.

    The decision tree is a flowchart-like structure, where

    • each internal node as well as the topmost root node represents a test on an attribute; commonly the tests can have only two outcomes, in which case the tree is binary 每个内部节点以及最顶端的根节点代表一个属性测试;通常,测试只能有两种结果,在这种情况下,树是二元的
    • each branch directed out and down from an internal node represents an outcome of the test 从内部节点向下引出的每个分支代表测试结果
    • each leaf node (or terminal node) represents represents a decision and holds a class label
    • a path from the root to a leaf traces out the classification for a tuple

    Decision tree induction is very popular for classification because:

    • relatively fast learning speed 相对较快的学习速度
    • convertible to simple and easy to understand classification rules 可转换为简单易懂的分类规则
    • can work with SQL queries to access databases while tree-building 可以在构建树的同时使用SQL查询来访问数据库
    • comparable classification accuracy with other methods 与其他方法的分类精度相当

    Exercise
    Here is some data for a binary classification problem, with label buys_computer.

    image.png

    image.png