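A friend asked me to implement some decision tree code. To debug it comfortably on Windows and get the results out, I adjusted a few details of the environment. Here is a record of the whole process.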

Environment

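  1. Configure Python in VS Code
    1. Press Ctrl + Shift + X to open the Extensions marketplace, search for Python, and install it. That's it.
  2. Configure Jupyter in VS Code
  3. Configure Graphviz in VS Code
    • Download Graphviz
    • Add Graphviz's /bin directory to the Windows PATH environment variable
    • Then run pip install graphviz (the exported code below also imports pydotplus, so run pip install pydotplus as well)
    • Finally, run the code in the configured Jupyter environment and enjoy the visualization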
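After editing PATH, it is worth confirming that Python can actually see Graphviz's `dot` executable before suspecting the notebook. Below is a minimal sketch using only the standard library; the helper names `find_graphviz_dot` and `graphviz_version` are hypothetical, and it assumes the Graphviz install places `dot.exe` in the /bin directory you added. Programs started before the PATH change (VS Code included) keep the old value, so restart VS Code if the check fails right after editing PATH.

```python
import shutil
import subprocess


def find_graphviz_dot():
    """Return the full path of Graphviz's `dot` executable, or None if it is not on PATH."""
    # shutil.which searches the PATH directories (and honors PATHEXT on Windows, so dot.exe is found)
    return shutil.which("dot")


def graphviz_version():
    """Return the `dot -V` version banner, or None when Graphviz is not reachable."""
    dot = find_graphviz_dot()
    if dot is None:
        return None
    # `dot -V` prints its banner to stderr; fall back to stdout just in case
    result = subprocess.run([dot, "-V"], capture_output=True, text=True)
    return result.stderr.strip() or result.stdout.strip()


if __name__ == "__main__":
    path = find_graphviz_dot()
    if path is None:
        print("dot not found: check that Graphviz's bin directory is on PATH, then restart VS Code")
    else:
        print("Found:", path)
        print(graphviz_version())
```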


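GitHub Desktop setup

The GitHub extension for VS Code is rather underwhelming, so let me recommend GitHub Desktop instead.

  • Download it and sign in; there is not much to explain
  • It makes it easy to sync GitHub repositories, open a repository in your IDE, exclude files from syncing, and quickly create new repositories
  • Ctrl + P to sync, Ctrl + Shift + A …. The essential shortcuts are shown as hints in the interface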

Code exported from Jupyter

Tip: once the code has been debugged in Jupyter, you can export it directly via the icon in the corner of the notebook toolbar.
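The `# %%` lines in the exported file are cell markers: as the two header comments say, each one starts a new cell, and `# %% [markdown]` starts a Markdown cell. To make the format concrete, here is a small stdlib-only sketch that splits such a script into cells; `split_cells` is a hypothetical helper for illustration, not the parser VS Code itself uses.

```python
import re

# A cell starts at a line beginning with "# %%", optionally followed by " [markdown]"
CELL_MARKER = re.compile(r"^# %%(?: \[markdown\])?")


def split_cells(script_text):
    """Split a '# %%'-style script into (kind, source) pairs, kind being 'code' or 'markdown'.

    Lines before the first marker (such as the header comments) are ignored.
    """
    cells = []
    kind, lines = None, []
    for line in script_text.splitlines():
        match = CELL_MARKER.match(line)
        if match:
            # Close the previous cell before starting a new one
            if kind is not None:
                cells.append((kind, "\n".join(lines).strip()))
            kind = "markdown" if "[markdown]" in match.group(0) else "code"
            lines = []
        elif kind is not None:
            lines.append(line)
    # Close the last cell
    if kind is not None:
        cells.append((kind, "\n".join(lines).strip()))
    return cells


if __name__ == "__main__":
    sample = "# header\n# %%\nimport pandas as pd\n# %% [markdown]\n# Notes\n# %%\nx = 1\n"
    for kind, source in split_cells(sample):
        print(kind, repr(source))
```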

```python
# To add a new cell, type '# %%'
# To add a new markdown cell, type '# %% [markdown]'
# %%
import pandas as pd
import pydotplus
from sklearn.tree import DecisionTreeClassifier as DTC, export_graphviz
from IPython.display import Image
# %%
Train_data = pd.read_csv('./datalab/car_info_test.csv')
Train_data.info()
# %%
Train_data
# %%
# Select the numeric feature columns
numerical_cols = Train_data.select_dtypes(exclude='object').columns
print(numerical_cols)
# %%
# Bin some of the feature columns into ordinal categories
Train_data['CUST_AGE'] = Train_data.CUST_AGE.apply(lambda x: 1 if 16 <= x <= 35 else 2 if 36 <= x <= 60 else 3)
Train_data['CAR_AGE'] = Train_data.CAR_AGE.apply(lambda x: 1 if x <= 730 else 2 if 731 <= x <= 1460 else 3 if 1461 <= x <= 2190 else 4 if 2191 <= x <= 3650 else 5)
Train_data['CAR_PRICE'] = Train_data.CAR_PRICE.apply(lambda x: 1 if 50000 <= x <= 90000 else 2 if 90001 <= x <= 150000 else 3 if 150001 <= x <= 300000 else 4)
Train_data['LOAN_AMOUNT'] = Train_data.LOAN_AMOUNT.apply(lambda x: 1 if x <= 50000 else 2 if 50001 <= x <= 200000 else 3 if 200001 <= x <= 500000 else 4)
# %%
Train_data.head()
# %%
# Check which feature columns contain missing values
Train_data.isnull().any()
# %%
# Split the numeric columns into features (data) and the label column (target)
feature_cols = [col for col in numerical_cols if col != 'IS_LOST']
data = Train_data[feature_cols].fillna(value=0)
target = Train_data['IS_LOST']
# %%
# Decision tree built with DecisionTreeClassifier
dtc = DTC(criterion='entropy', max_depth=6)
dtc.fit(data, target)
# Note: this accuracy is measured on the training data itself
print('Training accuracy:', dtc.score(data, target))
# %%
# Visualize the tree inline (feature_names labels the splits with column names)
dot_data = export_graphviz(dtc, out_file=None,
                           feature_names=feature_cols,
                           filled=True, rounded=True,
                           special_characters=True)
graph = pydotplus.graph_from_dot_data(dot_data)
Image(graph.create_png())
# %%
# Export as PDF
with open('./tree.dot', 'w') as f:
    # With out_file set, export_graphviz writes into f and returns None
    export_graphviz(dtc, feature_names=feature_cols, out_file=f)
graph = pydotplus.graph_from_dot_file('./tree.dot')
graph.write_pdf('tree.pdf')
```
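The four chained-conditional lambdas above are hand-written binning; pandas' `pd.cut` expresses the same idea with explicit bin edges. Below is a minimal sketch for LOAN_AMOUNT only, assuming integer amounts; `bin_loan_amount` is a hypothetical helper name. Two behavioral differences from the lambda: a missing amount stays NaN instead of landing in category 4 (comparisons with NaN are False, so the lambda's final `else` catches it), and a non-integer such as 50000.5 goes to category 2 rather than 4. If you swap it in, the later `fillna(value=0)` would then turn missing amounts into 0. When porting the other columns, check their edges too: the original CAR_PRICE rule, for example, sends prices below 50000 to category 4, and CUST_AGE sends ages under 16 to category 3.

```python
import numpy as np
import pandas as pd


def bin_loan_amount(amounts):
    """Map LOAN_AMOUNT values to codes 1-4 using pd.cut.

    Bins are right-inclusive: (-inf, 50000] -> 1, (50000, 200000] -> 2,
    (200000, 500000] -> 3, (500000, inf) -> 4. Missing values stay NaN.
    """
    bins = [-np.inf, 50000, 200000, 500000, np.inf]
    # labels=False returns 0-based bin indices; shift them to 1-based codes
    return pd.cut(amounts, bins=bins, labels=False) + 1


if __name__ == "__main__":
    s = pd.Series([10000, 50000, 50001, 200000, 300000, 600000])
    print(bin_loan_amount(s).tolist())
```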

Results

Video_2020-07-22_160115.wmv (3.27MB)
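The accuracy printed by `dtc.score(data, target)` in the exported code is computed on the same rows the tree was trained on, so it tends to be optimistic. A minimal sketch of a hold-out check, assuming scikit-learn's `train_test_split`; the helper name `holdout_accuracy`, the 70/30 split, and the seed are my own choices, and `make_classification` only generates synthetic stand-in data because car_info_test.csv is not available here. With the real data you would call `holdout_accuracy(data, target)`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def holdout_accuracy(data, target, max_depth=6, test_size=0.3, seed=42):
    """Fit an entropy-based tree on a training split; return (train accuracy, held-out accuracy)."""
    # stratify keeps the label proportions similar in both splits
    X_train, X_test, y_train, y_test = train_test_split(
        data, target, test_size=test_size, random_state=seed, stratify=target
    )
    dtc = DecisionTreeClassifier(criterion="entropy", max_depth=max_depth, random_state=seed)
    dtc.fit(X_train, y_train)
    return dtc.score(X_train, y_train), dtc.score(X_test, y_test)


if __name__ == "__main__":
    # Synthetic stand-in for the real feature table and IS_LOST labels
    X, y = make_classification(n_samples=500, n_features=8, random_state=0)
    train_acc, test_acc = holdout_accuracy(X, y)
    print(f"train accuracy: {train_acc:.3f}, held-out accuracy: {test_acc:.3f}")
```

A large gap between the two numbers is a hint that `max_depth` is too generous for the data.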

Word is that Vim is so comfortable you could binge-watch TV in it.