1. Feature selection

1.1. Low-variance filtering

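  1. - Drop columns whose variance does not exceed the threshold
  2. - sklearn.feature_selection.VarianceThreshold(threshold=0): with the default threshold=0, only constant columns are removed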
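
The filter above can be sketched as follows. The toy matrix and the threshold of 0.01 are made up for illustration and are not taken from these notes; a suitable threshold depends on the scale of each feature.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Toy data (hypothetical): column 0 is constant, column 2 barely varies.
X = np.array([
    [1.0, 2.0, 0.10],
    [1.0, 4.0, 0.10],
    [1.0, 6.0, 0.11],
    [1.0, 8.0, 0.10],
])

# Keep only the columns whose variance is greater than the threshold.
selector = VarianceThreshold(threshold=0.01)
X_reduced = selector.fit_transform(X)

print(selector.variances_)     # per-column variance computed during fit
print(selector.get_support())  # boolean mask of the columns that were kept
print(X_reduced.shape)         # only column 1 survives here
```

Because variance depends on units, it is usually worth putting features on a comparable scale before choosing a single threshold.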

1.2. Correlation coefficients

  1. - Pearson correlation coefficient
  2. - ![image.png](https://cdn.nlark.com/yuque/0/2021/png/21772492/1626246630069-34ae8811-70fa-476d-be4d-fbf370c9f1b7.png#clientId=u078045bc-8b3d-4&from=paste&height=55&id=uf96ad652&margin=%5Bobject%20Object%5D&name=image.png&originHeight=117&originWidth=692&originalType=binary&ratio=1&size=38558&status=done&style=none&taskId=u3b6d1120-ca02-4d20-a5db-625a127cf94&width=326)
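  3. - Range [-1, 1]: the larger the absolute value, the stronger the linear correlation; positive values indicate positive correlation, negative values indicate negative correlation
  4. - scipy.stats.pearsonr(x1, x2): returns the coefficient and a p-value
  5. - Spearman correlation coefficient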
  6. - ![image.png](https://cdn.nlark.com/yuque/0/2021/png/21772492/1626246495201-fb71eeb8-3904-4289-85d0-25302d8b526f.png#clientId=u078045bc-8b3d-4&from=paste&height=39&id=u0bd59bd7&margin=%5Bobject%20Object%5D&name=image.png&originHeight=77&originWidth=308&originalType=binary&ratio=1&size=10785&status=done&style=none&taskId=uc27992b4-7d92-455e-a271-61b06a6296a&width=154)
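  7. - Range [-1, 1]: the larger the absolute value, the stronger the monotonic (rank) correlation; positive values indicate positive correlation, negative values indicate negative correlation
  8. - scipy.stats.spearmanr(x1, x2): returns the coefficient and a p-value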
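
A small sketch comparing the two coefficients on the same data. The arrays are hypothetical: x2 is a cubic function of x1, so the relationship is perfectly monotonic but not linear.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical data: x2 grows monotonically with x1, but not linearly.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = x1 ** 3

# Both functions return the coefficient followed by a p-value.
r_pearson, p_pearson = pearsonr(x1, x2)
r_spearman, p_spearman = spearmanr(x1, x2)

print(f"Pearson:  {r_pearson:.3f}")   # below 1, the relationship is not linear
print(f"Spearman: {r_spearman:.3f}")  # 1.000, the ranks agree exactly
```

Spearman works on ranks, so any strictly increasing relationship scores 1, while Pearson only reaches 1 when the points lie on a straight line.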

2. Principal component analysis (PCA)

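  • sklearn.decomposition.PCA(n_components=None)
    • When n_components is an integer n, keep n components (columns)
    • When n_components is a float f between 0 and 1, keep enough components to retain at least 100f% of the information (explained variance)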
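
The two ways of setting n_components can be sketched as below. The data is synthetic and assumed for illustration only: 5 features generated from 2 latent factors plus a little noise, so two components should capture almost all of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data (hypothetical): 200 samples, 5 features driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

# Integer: keep exactly 2 components.
pca_int = PCA(n_components=2)
X_int = pca_int.fit_transform(X)

# Float in (0, 1): keep as many components as needed to explain 95% of the variance.
pca_float = PCA(n_components=0.95)
X_float = pca_float.fit_transform(X)

print(X_int.shape)                                 # (200, 2)
print(pca_float.n_components_)                     # number of components actually kept
print(pca_float.explained_variance_ratio_.sum())   # at least 0.95
```

PCA is sensitive to feature scale, so standardizing the columns first (for example with sklearn.preprocessing.StandardScaler) is a common preliminary step when features use different units.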