Information Gain

    • This was a very early method that sprang from AI research in the ID3 decision-tree algorithm, and was refined further to become Gain Ratio in C4.5.
    • It selects the attribute with the highest information gain as the attribute to split on
    • Let $p_i$ be the probability that an arbitrary tuple in $D$ belongs to class $C_i$, of $m$ classes, where $C_{i,D}$ is the set of tuples in $D$ labelled with class $C_i$
      • estimated by $p_i = |C_{i,D}| / |D|$, i.e. the proportion of the node's tuples that belong to class $C_i$
    • Expected information (entropy) needed to classify a tuple in $D$ is defined by

    $$Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$$
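
The entropy formula above can be sketched in a few lines of Python (an illustration added here, not part of the original notes; the function name `entropy` is my own):

```python
import math

def entropy(probs):
    # Info(D) = -sum_i p_i * log2(p_i), over the class proportions p_i.
    # Terms with p_i == 0 contribute nothing (0 * log2(0) is taken as 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

For example, a node split evenly between two classes has entropy $-2 \times 0.5 \log_2 0.5 = 1$ bit, while a pure node has entropy 0.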

    • After using attribute $A$ to split $D$ into $v$ partitions, one for each value of $A$, with each of these partitions being $D_j$, the information that is still needed to separate the classes is:

    $$Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)$$
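
As a sketch of this formula (again not from the notes; the names `entropy` and `info_after_split` are my own), $Info_A(D)$ is simply a size-weighted average of the partition entropies:

```python
import math

def entropy(probs):
    # Info(D) = -sum_i p_i * log2(p_i); 0 * log2(0) is taken as 0
    return -sum(p * math.log2(p) for p in probs if p > 0)

def info_after_split(partitions):
    # Info_A(D): entropy of each partition D_j, weighted by |D_j| / |D|.
    # `partitions` is a list of per-partition class-count lists,
    # one inner list per value of the splitting attribute A.
    total = sum(sum(counts) for counts in partitions)
    info = 0.0
    for counts in partitions:
        size = sum(counts)
        info += (size / total) * entropy([c / size for c in counts])
    return info
```

A split whose partitions are all pure gives $Info_A(D) = 0$; a split that leaves every partition evenly mixed leaves the entropy unchanged.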

    • Therefore, the information gained by branching on attribute $A$ is given by
      $$Gain(A) = Info(D) - Info_A(D)$$
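
Putting the two formulas together, $Gain(A)$ can be computed directly from a column of attribute values paired with the class labels (a self-contained sketch under my own naming, not the notes' code):

```python
import math
from collections import Counter

def entropy_of(labels):
    # Info(D) for a list of class labels
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    # Gain(A) = Info(D) - Info_A(D), where `values` holds each tuple's
    # value for attribute A and `labels` holds its class.
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    n = len(labels)
    info_a = sum(len(g) / n * entropy_of(g) for g in groups.values())
    return entropy_of(labels) - info_a
```

A perfectly predictive attribute yields a gain equal to $Info(D)$ itself; an attribute independent of the class yields a gain of 0.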

    Example (continued from previous)
    [table: training data for the example]

    Consider 2 classes: class $P$ is buys_computer = “yes” and class $N$ is buys_computer = “no”.
    For a partition on $D$ with $p$ examples of $P$ and $n$ examples of $N$, let $Info(D)$ be written as $I(p, n)$.
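
The two-class shorthand $I(p, n)$ is straightforward to sketch (illustrative only; the function name mirrors the notes' notation):

```python
import math

def I(p, n):
    # Two-class shorthand: I(p, n) = Info(D) for a set with
    # p tuples of class P and n tuples of class N.
    total = p + n
    return -sum((c / total) * math.log2(c / total) for c in (p, n) if c > 0)
```

So, for instance, $I(1, 1) = 1$ bit and $I(4, 0) = 0$ bits.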

    Using the definition of $Info(D)$ from above, we have

    [figure: $Info(D)$ computed for the full data set]
    Now consider the first partition, on age. We have the following
    [table: class counts $(p_j, n_j)$ for each value of age]

    and so
    $$Info_{age}(D) = \sum_{j} \frac{|D_j|}{|D|}\, I(p_j, n_j) = 0.694$$

    Therefore
    $$Gain(age) = Info(D) - Info_{age}(D)$$
    Similarly, the information gain is computed for each of the remaining attributes:
    [figure: Gain values for the other attributes]
    So Gain(age) is optimal and we split on age to get

    [figure: the resulting tree after splitting on age]