Other methods

Source: https://blog.csdn.net/Dawei_01/article/details/80846371

Oversampling and undersampling are sampling methods for imbalanced datasets.

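Oversampling: duplicates positive-class samples (the minority class). It does not actually give the model any new data, and the extra emphasis on the positive class amplifies the effect of positive-class noise on the model.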

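# Oversample the training set so that the sparse (minority) class ends up at
# 0.1 times the size of the abundant (majority) class,
# then assign the resampled data to X_train_res, y_train_res.
# X_train_std and y_train are assumed to be defined earlier (features and labels).
from collections import Counter
from imblearn import over_sampling
print('Original dataset shape {}'.format(Counter(y_train)))
# sampling_strategy: target minority/majority ratio after resampling
# (a float value is only valid for binary classification)
smote_model = over_sampling.SMOTE(random_state=7, sampling_strategy=0.1)
X_train_res, y_train_res = smote_model.fit_resample(X_train_std, y_train)
print('Resampled dataset shape {}'.format(Counter(y_train_res)))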
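
The SMOTE call above synthesizes new minority points, whereas the description of oversampling talks about plain duplication. As a rough illustration of that simpler variant, here is a minimal sketch using only the standard library. The helper `random_oversample` and the toy data are hypothetical, and the sketch assumes a binary problem.

```python
import random
from collections import Counter

def random_oversample(X, y, minority_label, ratio, seed=7):
    """Duplicate minority samples until minority/majority reaches `ratio`.

    Hypothetical helper for illustration; assumes exactly two classes.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    n_majority = len(y) - len(minority)
    target = round(ratio * n_majority)        # desired minority count
    extra = max(0, target - len(minority))    # number of copies to add
    picks = [rng.choice(minority) for _ in range(extra)]
    idx = list(range(len(y))) + picks
    return [X[i] for i in idx], [y[i] for i in idx]

# Toy data: 100 majority samples (label 0) and 2 minority samples (label 1)
X = [(float(i), 0.0) for i in range(100)] + [(1.5, 1.0), (2.5, 1.0)]
y = [0] * 100 + [1] * 2

X_res, y_res = random_oversample(X, y, minority_label=1, ratio=0.1)
print('Resampled dataset shape {}'.format(Counter(y_res)))

# The minority class is larger, but it still contains only the original points.
distinct_minority = {x for x, label in zip(X_res, y_res) if label == 1}
print('Distinct minority samples: {}'.format(len(distinct_minority)))
```

The minority count grows while the number of distinct minority samples stays the same, which is why any noise in those few samples gets repeated along with them.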

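Undersampling: discards a large share of the majority-class data, which throws away information. Like oversampling, it can also suffer from overfitting.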
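
To make the cost of undersampling concrete, the following is a minimal sketch of random undersampling using only the standard library. The helper `random_undersample` and the toy data are hypothetical and assume a binary problem; it is not the source's own code.

```python
import random
from collections import Counter

def random_undersample(X, y, majority_label, ratio, seed=7):
    """Keep every minority sample and a random subset of the majority class
    so that minority/majority is roughly `ratio`.

    Hypothetical helper for illustration; assumes exactly two classes.
    """
    rng = random.Random(seed)
    majority = [i for i, label in enumerate(y) if label == majority_label]
    minority = [i for i, label in enumerate(y) if label != majority_label]
    n_keep = min(len(majority), round(len(minority) / ratio))
    kept = rng.sample(majority, n_keep)       # sampled without replacement
    idx = sorted(kept + minority)
    return [X[i] for i in idx], [y[i] for i in idx]

# Toy data: 950 majority samples (label 0) and 50 minority samples (label 1)
X = [[float(i)] for i in range(1000)]
y = [0] * 950 + [1] * 50

X_res, y_res = random_undersample(X, y, majority_label=0, ratio=0.5)
print('Original dataset shape {}'.format(Counter(y)))
print('Resampled dataset shape {}'.format(Counter(y_res)))
print('Discarded samples: {}'.format(len(y) - len(y_res)))
```

With these toy numbers most of the majority class is dropped, so the model trains on a much smaller set. In practice, `imblearn.under_sampling.RandomUnderSampler` offers the same idea with the `sampling_strategy` and `random_state` parameters used in the SMOTE example.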

Other methods

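Adjust the classification threshold so that predictions lean toward the minority class.
Choose a suitable evaluation metric, such as ROC AUC, F1, or G-mean, rather than accuracy.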
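
Both points can be illustrated together. The sketch below uses only the standard library, with made-up labels and scores, and compares accuracy, F1, and G-mean at two thresholds. The helper names `confusion` and `metrics` are hypothetical, and G-mean is taken here as the square root of recall times specificity.

```python
import math

def confusion(y_true, scores, threshold):
    """Count TP, FP, TN, FN when predicting 1 for score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        pred = 1 if score >= threshold else 0
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def metrics(y_true, scores, threshold):
    """Return accuracy, recall, F1 and G-mean at the given threshold."""
    tp, fp, tn, fn = confusion(y_true, scores, threshold)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1_denominator = 2 * tp + fp + fn
    return {
        'accuracy': (tp + tn) / len(y_true),
        'recall': recall,
        'f1': 2 * tp / f1_denominator if f1_denominator else 0.0,
        'g_mean': math.sqrt(recall * specificity),
    }

# Made-up data: 16 negatives and 4 positives with predicted positive-class scores
y_true = [0] * 16 + [1] * 4
scores = [0.1] * 14 + [0.35] * 2 + [0.3, 0.4, 0.45, 0.8]

m_default = metrics(y_true, scores, threshold=0.5)   # usual default threshold
m_lowered = metrics(y_true, scores, threshold=0.3)   # leans toward the minority class
print('threshold=0.5', m_default)
print('threshold=0.3', m_lowered)
```

On this toy data, accuracy looks acceptable at the default threshold even though most positives are missed, while F1 and G-mean expose the problem and improve once the threshold is lowered. The threshold value itself would normally be chosen on a validation set.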