Preface

To obtain robust parameters, model hyperparameters are usually tuned. A common scheme combines K-fold cross-validation with N repeats: a single run splits the samples into K folds and trains the model, and that run is then repeated N times. The lambda.min from each repeat is collected into a set of candidate lambdas, the best lambda is chosen from this set (for example by its median or minimum), and a final model is refitted with that lambda; a sketch of this last step follows the code block below. For more posts, see https://zouhua.top/

Codes

  library(glmnet)

  # example data shipped with glmnet
  data(QuickStartExample)
  # recent glmnet versions store the example data in a list;
  # older versions load x and y into the workspace directly
  x <- QuickStartExample$x
  y <- QuickStartExample$y

  df_lambdas_min <- c()

  # 10-fold CV repeated 10 times; each repeat draws a new random fold split
  for (i in 1:10) {
    cvfit <- cv.glmnet(x = x,
                       y = y,
                       nfolds = 10,
                       alpha = 1,
                       nlambda = 100,
                       # "auc" only applies to binomial models; with this
                       # Gaussian example cv.glmnet falls back to MSE
                       type.measure = "auc")
    # alternative: ipflasso::cvr.glmnet runs the repeated CV internally
    # require(ipflasso)
    # cvfit <- cvr.glmnet(X = dat_table,
    #                     Y = dat_target,
    #                     family = "binomial",
    #                     nfolds = 10,
    #                     alpha = 1,
    #                     ncv = 10,
    #                     nlambda = 100,
    #                     type.measure = "auc")

    # collect the lambda that minimises the CV error in this repeat
    df_lambdas_min <- rbind(df_lambdas_min, cvfit$lambda.min)
  }

  print(df_lambdas_min)
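
The preface also calls for picking one final lambda from the collected lambda.min values and refitting a new model with it. Below is a minimal sketch of that step, continuing from df_lambdas_min above; taking the median is one of the options mentioned in the preface (using the minimum instead would give a less penalised model).

  # choose the final lambda from the per-repeat minima (median here;
  # min is the other option mentioned in the preface)
  best_lambda <- median(df_lambdas_min)

  # refit a single lasso model on the full data at the chosen lambda
  final_fit <- glmnet(x = x, y = y, alpha = 1, lambda = best_lambda)

  # inspect the coefficients retained by the final model
  coef(final_fit)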

[Figure 1: data analysis, tuning model parameters with repeated K-fold CV]

Notes: the number of folds should be chosen with the sample size in mind; each fold should contain at least 8 samples, so the minimum sample size for 10-fold CV is 80.
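
As a quick check of this rule of thumb, here is a small hypothetical helper (not part of glmnet; the function name and the threshold of 8 samples per fold follow the note above) that reports the per-fold size and the largest usable number of folds:

  # hypothetical helper: per-fold size for a given nfolds, and the largest
  # nfolds that still keeps at least min_per_fold samples in every fold
  # (cv.glmnet itself requires nfolds >= 3)
  check_nfolds <- function(n_samples, nfolds = 10, min_per_fold = 8) {
    fold_size <- floor(n_samples / nfolds)
    list(fold_size  = fold_size,
         ok         = fold_size >= min_per_fold,
         max_nfolds = max(3, n_samples %/% min_per_fold))
  }

  check_nfolds(80)   # fold_size = 8, ok = TRUE
  check_nfolds(50)   # fold_size = 5, ok = FALSE; at most 6 folds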

References

  1. Circulating Protein Biomarkers for Use in Pancreatic Ductal Adenocarcinoma Identification
  2. An Introduction to glmnet
  3. Repeating cv.glmnet

If any of the referenced articles raises a copyright concern, please contact me. Thank you.