Resources

Books

  • Zhou Zhihua, Machine Learning (《机器学习》), Chapter 4: Decision Trees
  • Li Hang, Statistical Learning Methods (《统计学习方法》), Chapter 5: Decision Trees
  • Machine Learning in Action (《机器学习实战》), Chapter 3: Decision Trees

Documents

2 The decision tree workflow

  • Collect data
  • Prepare data
  • Analyse data
  • Train the algorithm
  • Test the algorithm
  • Use the algorithm

    First example: judging fish vs. non-fish

    1 Code analysis

    1 Prepare data

    1 Argument analysis
    2 Process analysis
    3 Code implementation

    def createDataSet():
        # The first two columns of dataSet are features; the last column
        # is the class label of each sample
        dataSet = [[1, 1, 'yes'],
                   [1, 1, 'yes'],
                   [1, 0, 'no'],
                   [0, 1, 'no'],
                   [0, 1, 'no']]
        # labels: 'no surfacing' and 'flippers'. Note: these are the names
        # (meanings) of the feature columns in dataSet, not the class
        # labels / target variable
        labels = ['no surfacing', 'flippers']
        return dataSet, labels
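To make the comment's distinction concrete — `labels` names the feature columns, while the last element of each row is the target — here is a quick sanity check that mirrors the data returned by `createDataSet` (the variable names below are just for illustration):

```python
# Toy dataset mirroring createDataSet above
dataSet = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'],
           [0, 1, 'no'], [0, 1, 'no']]
labels = ['no surfacing', 'flippers']

# Pair each feature value with its feature name; the class label is row[-1]
for row in dataSet:
    features = dict(zip(labels, row[:-1]))
    print(features, '->', row[-1])
```

The first line printed is `{'no surfacing': 1, 'flippers': 1} -> yes`, showing that `labels` only describes the feature columns.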

    2 Train the algorithm

    1 Calculate Shannon entropy

  • Argument analysis:

    • data
  • Process analysis
    • Calculate the data length
    • Initialize labelCount and shannonEntropy
    • Count label occurrences into labelCount (with a for loop)
    • Accumulate shannonEntropy (with a for loop)
  • Code implementation

    from math import log

    def CalculateShannonEntropy(data):
        dataNum = len(data)
        # Count how many times each class label appears
        labelCount = {}
        for line in data:
            currentLabel = line[-1]
            if currentLabel not in labelCount:
                labelCount[currentLabel] = 0
            labelCount[currentLabel] += 1
        # H = -sum(p * log2(p)) over all class labels
        shannonEntropy = 0.0
        for key in labelCount:
            prob = float(labelCount[key]) / dataNum
            shannonEntropy -= prob * log(prob, 2)
        return shannonEntropy
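For the fish dataset above, the label counts are yes: 2 and no: 3, so the entropy is -(2/5)·log2(2/5) - (3/5)·log2(3/5) ≈ 0.971. A minimal standalone restatement of the routine verifies this (the names `calc_shannon_entropy` and `fish_data` are illustrative, not from the original code):

```python
from math import log2

def calc_shannon_entropy(data):
    # Standalone restatement of CalculateShannonEntropy, for verification
    counts = {}
    for row in data:
        counts[row[-1]] = counts.get(row[-1], 0) + 1
    n = len(data)
    return -sum((c / n) * log2(c / n) for c in counts.values())

fish_data = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'],
             [0, 1, 'no'], [0, 1, 'no']]
print(round(calc_shannon_entropy(fish_data), 4))  # 0.971
```

A perfectly pure dataset (all one class) would give entropy 0, and a 50/50 split would give 1, which is why lower entropy after a split means a better split.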

    2 Split the data and choose the best split

  • Argument analysis:

  • Process analysis:
  • Code implementation:

    def DevisionData(dataSet, index, value):
        retDataSet = []
        for featVec in dataSet:
            # Keep the rows whose value in column `index` equals `value`,
            # and drop column `index` from each kept row
            if featVec[index] == value:
                # featVec[:index] takes the first `index` elements, i.e.
                # the features before the splitting column
                reducedFeatVec = featVec[:index]
                '''
                Difference between extend and append:
                list.append(object)   adds object to the list as a single element
                list.extend(sequence) adds each element of sequence to the list
                result = []
                result.extend([1, 2, 3])   # result == [1, 2, 3]
                result.append([4, 5, 6])   # result == [1, 2, 3, [4, 5, 6]]
                result.extend([7, 8, 9])   # result == [1, 2, 3, [4, 5, 6], 7, 8, 9]
                '''
                # featVec[index+1:] skips column `index` and takes the
                # remaining elements
                reducedFeatVec.extend(featVec[index + 1:])
                # collect the reduced row (column `index` removed)
                retDataSet.append(reducedFeatVec)
        return retDataSet
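Splitting the fish dataset on the first feature ('no surfacing') with value 1 should keep the three rows whose first column is 1 and drop that column. A standalone restatement checks this (the names `split_data` and `fish_data` are illustrative):

```python
def split_data(data_set, index, value):
    # Standalone restatement of DevisionData, for a quick check
    ret = []
    for feat_vec in data_set:
        if feat_vec[index] == value:
            reduced = feat_vec[:index]        # features before column `index`
            reduced.extend(feat_vec[index + 1:])  # features after it
            ret.append(reduced)
    return ret

fish_data = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'],
             [0, 1, 'no'], [0, 1, 'no']]
print(split_data(fish_data, 0, 1))  # [[1, 'yes'], [1, 'yes'], [0, 'no']]
```

Note that slicing (`feat_vec[:index]`) builds a new list, so the original rows are not mutated.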

    3 Recursively build the decision tree

  • Argument analysis:

  • Process analysis:
  • Code implementation:
    classList = [example[-1] for example in dataSet]
    # First termination condition: all samples belong to the same class
    if classList.count(classList[0]) == len(classList):
        return classList[0]
    # Second termination condition: all features have been used, but the
    # classes are still mixed; fall back to the majority class
    if len(dataSet[0]) == 1:
        return majorityCnt(classList)
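The fragment above calls majorityCnt, which is not shown in these notes. A minimal sketch of what such a helper could look like (the exact tie-breaking behavior here is an assumption):

```python
def majorityCnt(classList):
    # Count each label's votes, then return the most frequent label.
    # Ties are broken by whichever label max() encounters first;
    # this tie-breaking rule is an assumption, not from the notes.
    counts = {}
    for vote in classList:
        counts[vote] = counts.get(vote, 0) + 1
    return max(counts, key=counts.get)

print(majorityCnt(['yes', 'no', 'no']))  # prints: no
```

This helper is only reached when every feature has been consumed but the remaining samples still disagree, so the leaf takes the most common class.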