A typical basic algorithm follows. It is greedy (it makes decisions optimising the next step context and never backtracks to reconsider).
It is a recursive, top-down divide-and-conquer approach to build a tree.
Attributes may be nominal, ordinal, or continuous. 属性可以是名义的、顺序的或连续的。
- At the start, all the training examples are at the root 开始的时候,所有培训示例都是根
- At a node, test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
- Examples at the node are partitioned to sub-nodes based on selected attributes 基于选定的属性,节点上的示例被划分为子节点。
- Recurse over subnodes 在子节点上递归
- Paritioning stops when
- All samples for a given node belong to the same class; or 给定的节点的样例属于同一类别
- There are no remaining attributes for further partitioning – majority voting is employed for classifying the leaf; or 没有可以用于进一步划分的剩余属性
- There are no samples left 没有样本了
- All samples for a given node belong to the same class; or 给定的节点的样例属于同一类别
The slightly more generic algorithm sketch below permits n-ary (multiway) trees and can discretise continuous attributes dynamically, according to local context in the tree.