A decision tree classifies labelled data by applying a sequence of logical tests on attributes that partition the data into finer and finer sets. The model that is learnt is a tree of logical tests.
The decision tree is a flowchart-like structure, where
- each internal node as well as the topmost root node represents a test on an attribute; commonly the tests can have only two outcomes, in which case the tree is binary 每个内部节点以及最顶端的根节点代表一个属性测试;通常,测试只能有两种结果,在这种情况下,树是二元的
- each branch directed out and down from an internal node represents an outcome of the test 从内部节点向下引出的每个分支代表测试结果
- each leaf node (or terminal node) represents represents a decision and holds a class label
- a path from the root to a leaf traces out the classification for a tuple
Decision tree induction is very popular for classification because:
- relatively fast learning speed 相对较快的学习速度
- convertible to simple and easy to understand classification rules 可转换为简单易懂的分类规则
- can work with SQL queries to access databases while tree-building 可以在构建树的同时使用SQL查询来访问数据库
- comparable classification accuracy with other methods 与其他方法的分类精度相当
Exercise
Here is some data for a binary classification problem, with label buys_computer.