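Code source: base_line for the training round of the 2nd China Mobile "Wutong Cup" (梧桐杯), Digital-Intelligent Village track (Zhejiang division)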
    https://www.datacastle.cn/project_content.html?type=project&id=6060

    LightGBM (lgb) modeling with 5-fold cross-validation

    # Modeling
    import lightgbm as lgb
    import numpy as np
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import KFold


    def lgb_model(train_x, train_y, test_x):
        folds = 5
        seed = 2022
        kf = KFold(n_splits=folds, shuffle=True, random_state=seed)
        train = np.zeros(train_x.shape[0])  # out-of-fold predictions on the training rows
        test = np.zeros(test_x.shape[0])    # test predictions averaged over the folds
        cv_scores = []
        for i, (train_index, valid_index) in enumerate(kf.split(train_x, train_y)):
            print('************************************ {} ************************************'.format(str(i+1)))
            # KFold yields positional indices, so rows and labels are both selected with .iloc
            trn_x, trn_y = train_x.iloc[train_index], train_y.iloc[train_index]
            val_x, val_y = train_x.iloc[valid_index], train_y.iloc[valid_index]
            train_matrix = lgb.Dataset(trn_x, label=trn_y)
            valid_matrix = lgb.Dataset(val_x, label=val_y)
            params = {
                'boosting_type': 'gbdt',
                'objective': 'binary',
                'metric': 'auc',
                'num_leaves': 2 ** 5,
                'learning_rate': 0.1,
                'seed': 2022,
                'nthread': 28,
                'n_jobs': 24,  # alias of the same setting; LightGBM warns and uses 24 threads
            }
            # verbose_eval / early_stopping_rounds are accepted by LightGBM < 4.0; newer releases
            # take callbacks=[lgb.early_stopping(200), lgb.log_evaluation(200)] instead
            model = lgb.train(params, train_matrix, 50000, valid_sets=[train_matrix, valid_matrix],
                              verbose_eval=200, early_stopping_rounds=200)
            val_pred = model.predict(val_x, num_iteration=model.best_iteration)
            test_pred = model.predict(test_x, num_iteration=model.best_iteration)
            train[valid_index] = val_pred
            # Accumulate each fold's share; plain assignment would keep only the last fold
            test += test_pred / kf.n_splits
            cv_scores.append(roc_auc_score(val_y, val_pred))
            print(cv_scores)
        print("%s_score_list:" % 'lgb', cv_scores)
        print("%s_score_mean:" % 'lgb', np.mean(cv_scores))
        print("%s_score_std:" % 'lgb', np.std(cv_scores))
        return test
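
The function is meant to return test predictions averaged over the five folds, alongside out-of-fold predictions for the training rows. Averaging only works if each fold's contribution is accumulated; a plain assignment would leave just the last fold's prediction divided by the number of folds. Below is a minimal standard-library sketch of that bookkeeping. Everything in it is hypothetical and for illustration only: `k_fold_indices` is a simplified stand-in for `KFold`, and `mean_model` replaces the LightGBM model with a constant predictor.

```python
import random


def k_fold_indices(n_rows, n_splits, seed):
    """Shuffle the row indices and deal them into n_splits validation folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_splits] for i in range(n_splits)]


def mean_model(train_labels):
    """Hypothetical stand-in for a trained model: always predicts the training positive rate."""
    rate = sum(train_labels) / len(train_labels)
    return lambda rows: [rate] * len(rows)


def cross_validated_predictions(labels, n_test_rows, n_splits=5, seed=2022):
    oof = [0.0] * len(labels)      # out-of-fold prediction for every training row
    test = [0.0] * n_test_rows     # running average of the per-fold test predictions
    for valid_idx in k_fold_indices(len(labels), n_splits, seed):
        held_out = set(valid_idx)
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        predict = mean_model(train_labels)
        # Each training row is predicted only by the model that did not see it
        for i, p in zip(valid_idx, predict(valid_idx)):
            oof[i] = p
        # Add this fold's share of the average; '=' here would discard earlier folds
        for j, p in enumerate(predict(range(n_test_rows))):
            test[j] += p / n_splits
    return oof, test


labels = [1, 0, 0, 0, 0] * 4       # 20 rows with a 20% positive rate
oof, test_pred = cross_validated_predictions(labels, n_test_rows=3)
print(len(oof), test_pred)
```

With this toy model the averaged test prediction comes out at the overall positive rate (0.2, up to floating-point rounding), because every row is held out exactly once across the folds.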

    # Split the dataset, train the model, and predict.
    # Rows with a non-null 'flag' are the labeled training set; rows with a null 'flag' are the test set.
    train_x = data[data['flag'].notnull()][used_feat].copy()
    train_y = data[data['flag'].notnull()]['flag']
    test_x = data[data['flag'].isnull()][used_feat].copy()
    print(train_x.shape, test_x.shape)
    lgb_test = lgb_model(train_x, train_y, test_x)

    Output:

    (35000, 52) (15000, 52)
    ************************************ 1 ************************************
    [LightGBM] [Warning] num_threads is set with nthread=28, will be overridden by n_jobs=24. Current value: num_threads=24
    [LightGBM] [Warning] num_threads is set with nthread=28, will be overridden by n_jobs=24. Current value: num_threads=24
    [LightGBM] [Info] Number of positive: 5605, number of negative: 22395
    [LightGBM] [Warning] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001488 seconds.
    You can set `force_row_wise=true` to remove the overhead.
    And if memory is not enough, you can set `force_col_wise=true`.
    [LightGBM] [Info] Total Bins 5624
    [LightGBM] [Info] Number of data points in the train set: 28000, number of used features: 52
    [LightGBM] [Warning] num_threads is set with nthread=28, will be overridden by n_jobs=24. Current value: num_threads=24
    [LightGBM] [Info] [binary:BoostFromScore]: pavg=0.200179 -> initscore=-1.385179
    [LightGBM] [Info] Start training from score -1.385179
    Training until validation scores don't improve for 200 rounds
    [200] training's auc: 0.971499 valid_1's auc: 0.89008
    Early stopping, best iteration is:
    [188] training's auc: 0.969951 valid_1's auc: 0.890539
    [0.8905392049233034]
    ************************************ 2 ************************************
    [LightGBM] [Warning] num_threads is set with nthread=28, will be overridden by n_jobs=24. Current value: num_threads=24
    [LightGBM] [Warning] num_threads is set with nthread=28, will be overridden by n_jobs=24. Current value: num_threads=24
    [LightGBM] [Info] Number of positive: 5627, number of negative: 22373
    [LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.003396 seconds.
    You can set `force_col_wise=true` to remove the overhead.
    [LightGBM] [Info] Total Bins 5623
    [LightGBM] [Info] Number of data points in the train set: 28000, number of used features: 52
    [LightGBM] [Warning] num_threads is set with nthread=28, will be overridden by n_jobs=24. Current value: num_threads=24
    [LightGBM] [Info] [binary:BoostFromScore]: pavg=0.200964 -> initscore=-1.380278
    [LightGBM] [Info] Start training from score -1.380278
    Training until validation scores don't improve for 200 rounds
    [200] training's auc: 0.970912 valid_1's auc: 0.876217
    Early stopping, best iteration is:
    [93] training's auc: 0.944954 valid_1's auc: 0.879035
    [0.8905392049233034, 0.879034975812047]
    ************************************ 3 ************************************
    [LightGBM] [Warning] num_threads is set with nthread=28, will be overridden by n_jobs=24. Current value: num_threads=24
    [LightGBM] [Warning] num_threads is set with nthread=28, will be overridden by n_jobs=24. Current value: num_threads=24
    [LightGBM] [Info] Number of positive: 5634, number of negative: 22366
    [LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.003407 seconds.
    You can set `force_col_wise=true` to remove the overhead.
    [LightGBM] [Info] Total Bins 5617
    [LightGBM] [Info] Number of data points in the train set: 28000, number of used features: 52
    [LightGBM] [Warning] num_threads is set with nthread=28, will be overridden by n_jobs=24. Current value: num_threads=24
    [LightGBM] [Info] [binary:BoostFromScore]: pavg=0.201214 -> initscore=-1.378722
    [LightGBM] [Info] Start training from score -1.378722
    Training until validation scores don't improve for 200 rounds
    [200] training's auc: 0.97181 valid_1's auc: 0.888507
    Early stopping, best iteration is:
    [119] training's auc: 0.953518 valid_1's auc: 0.88972
    [0.8905392049233034, 0.879034975812047, 0.8897195773363378]
    ************************************ 4 ************************************
    [LightGBM] [Warning] num_threads is set with nthread=28, will be overridden by n_jobs=24. Current value: num_threads=24
    [LightGBM] [Warning] num_threads is set with nthread=28, will be overridden by n_jobs=24. Current value: num_threads=24
    [LightGBM] [Info] Number of positive: 5622, number of negative: 22378
    [LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.003478 seconds.
    You can set `force_col_wise=true` to remove the overhead.
    [LightGBM] [Info] Total Bins 5624
    [LightGBM] [Info] Number of data points in the train set: 28000, number of used features: 52
    [LightGBM] [Warning] num_threads is set with nthread=28, will be overridden by n_jobs=24. Current value: num_threads=24
    [LightGBM] [Info] [binary:BoostFromScore]: pavg=0.200786 -> initscore=-1.381391
    [LightGBM] [Info] Start training from score -1.381391
    Training until validation scores don't improve for 200 rounds
    [200] training's auc: 0.970869 valid_1's auc: 0.893076
    Early stopping, best iteration is:
    [114] training's auc: 0.950973 valid_1's auc: 0.89486
    [0.8905392049233034, 0.879034975812047, 0.8897195773363378, 0.8948604591836734]
    ************************************ 5 ************************************
    [LightGBM] [Warning] num_threads is set with nthread=28, will be overridden by n_jobs=24. Current value: num_threads=24
    [LightGBM] [Warning] num_threads is set with nthread=28, will be overridden by n_jobs=24. Current value: num_threads=24
    [LightGBM] [Info] Number of positive: 5600, number of negative: 22400
    [LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.003464 seconds.
    You can set `force_col_wise=true` to remove the overhead.
    [LightGBM] [Info] Total Bins 5632
    [LightGBM] [Info] Number of data points in the train set: 28000, number of used features: 52
    [LightGBM] [Warning] num_threads is set with nthread=28, will be overridden by n_jobs=24. Current value: num_threads=24
    [LightGBM] [Info] [binary:BoostFromScore]: pavg=0.200000 -> initscore=-1.386294
    [LightGBM] [Info] Start training from score -1.386294
    Training until validation scores don't improve for 200 rounds
    [200] training's auc: 0.972879 valid_1's auc: 0.885969
    [400] training's auc: 0.987334 valid_1's auc: 0.886842
    Early stopping, best iteration is:
    [330] training's auc: 0.984063 valid_1's auc: 0.887086
    [0.8905392049233034, 0.879034975812047, 0.8897195773363378, 0.8948604591836734, 0.8870862349021346]
    lgb_score_list: [0.8905392049233034, 0.879034975812047, 0.8897195773363378, 0.8948604591836734, 0.8870862349021346]
    lgb_score_mean: 0.8882480904314992
    lgb_score_std: 0.005241551089184859
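
Several numbers in the log can be cross-checked by hand, which is a cheap way to confirm the run did what was intended. The sketch below uses only the standard library and reproduces four of them: the training rows per fold, the `initscore` (assumed here to be the log-odds of the positive rate `pavg`), the evaluation lines that get printed before early stopping ends a fold, and the final mean and standard deviation of the fold AUCs. The helper names `init_score` and `logged_checkpoints` are hypothetical, and the stopping rule (training ends once `patience` rounds have passed since the best iteration) is a simplifying assumption about how early stopping behaves.

```python
import math
import statistics


def init_score(n_positive, n_negative):
    """Log-odds of the positive rate, log(p / (1 - p))."""
    p = n_positive / (n_positive + n_negative)
    return math.log(p / (1.0 - p))


def logged_checkpoints(best_iteration, patience, log_every):
    """Iterations at which an evaluation line is printed before training stops.

    Assumes training stops `patience` rounds after the best iteration
    and that a line is printed every `log_every` rounds.
    """
    last_iteration = best_iteration + patience
    return list(range(log_every, last_iteration + 1, log_every))


# Fold AUCs copied from the log
fold_auc = [0.8905392049233034, 0.879034975812047, 0.8897195773363378,
            0.8948604591836734, 0.8870862349021346]

# 35000 labeled rows split into 5 folds: one fold validates, four train
n_rows, n_splits = 35000, 5
train_rows_per_fold = n_rows - n_rows // n_splits
print(train_rows_per_fold)               # the log reports 28000 data points per fold

print(init_score(5605, 22395))           # fold 1; the log reports -1.385179
print(logged_checkpoints(188, 200, 200))  # fold 1 shows only the [200] line
print(logged_checkpoints(330, 200, 200))  # fold 5 shows [200] and [400]

# np.std defaults to the population standard deviation, hence pstdev
print(statistics.mean(fold_auc), statistics.pstdev(fold_auc))
```

The gap between training AUC (about 0.95 to 0.98 at the best iteration) and validation AUC (about 0.88 to 0.89) is visible in every fold, so the cross-validated mean of roughly 0.888 is the figure to compare against other models, not the training score.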