
The previous post introduced how to create tasks and learners with mlr3, fit a model, make predictions, and do a simple evaluation. This post goes into the details of model evaluation and shows how mlr3 makes these steps much simpler.

Binary classification and the ROC curve

For a binary outcome there are only two classes, positive and negative, and the threshold that decides which class a case is assigned to can be set manually. The ROC curve is a good tool for finding the optimal cut-off point.

First, let's see how to get the confusion matrix of a classification prediction:

    library(mlr3verse)
    ## Loading required package: mlr3
    data("Sonar", package = "mlbench")
    task <- as_task_classif(Sonar, target = "Class", positive = "M") # specify the positive class
    learner <- lrn("classif.rpart", predict_type = "prob") # specify the prediction type
    prediction <- learner$train(task)$predict(task)
    conf <- prediction$confusion
    print(conf)
    ##         truth
    ## response  M  R
    ##        M 95 10
    ##        R 16 87
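
Because we asked for probability predictions, the decision threshold mentioned above can be changed after the fact and the confusion matrix recomputed. A minimal sketch (pred2 and the 0.7 cut-off are purely illustrative):

    # work on a copy so the original prediction object stays unchanged
    pred2 <- prediction$clone()
    pred2$set_threshold(0.7)   # stricter probability cut-off for the positive class "M"
    pred2$confusion            # the confusion matrix changes accordingly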

Plotting the ROC curve is also very convenient:

    autoplot(prediction, type = "roc")

[Figure 1: ROC curve]

The precision-recall (PR) curve can be drawn just as easily:

    autoplot(prediction, type = "prc")

[Figure 2: precision-recall curve]
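
Besides plotting, the same prediction object can be scored numerically with one or more measures, e.g. AUC and accuracy (a small sketch using standard mlr3 measures):

    # score the prediction with several measures at once
    prediction$score(msrs(c("classif.auc", "classif.acc")))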

Resampling

Resampling methods supported by mlr3 (a short construction sketch follows the list):

  • cross validation (“cv”),
  • leave-one-out cross validation (“loo”),
  • repeated cross validation (“repeated_cv”),
  • bootstrapping (“bootstrap”),
  • subsampling (“subsampling”),
  • holdout (“holdout”),
  • in-sample resampling (“insample”),
  • custom resampling (“custom”).
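
Each of these strategies is constructed with rsmp() using the key shown in parentheses, with parameters passed directly. A minimal sketch (the folds/repeats/ratio values are only illustrative):

    rsmp("cv", folds = 10)                        # 10-fold cross-validation
    rsmp("repeated_cv", folds = 5, repeats = 10)  # 5-fold CV repeated 10 times
    rsmp("bootstrap", repeats = 30, ratio = 1)    # 30 bootstrap samples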

List the available resampling methods:

    library(mlr3verse)
    as.data.table(mlr_resamplings)
    ##            key        params iters
    ## 1:   bootstrap ratio,repeats    30
    ## 2:      custom                  NA
    ## 3:   custom_cv                  NA
    ## 4:          cv         folds    10
    ## 5:     holdout         ratio     1
    ## 6:    insample                   1
    ## 7:         loo                  NA
    ## 8: repeated_cv folds,repeats   100
    ## 9: subsampling ratio,repeats    30

Some special resampling methods are available through extension packages, for example mlr3spatiotempcv for spatiotemporal data.

The default method is holdout:

    resampling <- rsmp("holdout")
    print(resampling)
    ## <ResamplingHoldout> with 1 iterations
    ## * Instantiated: FALSE
    ## * Parameters: ratio=0.6667

The split ratio can be changed like this:

    resampling$param_set$values <- list(ratio = 0.8)
    # or
    rsmp("holdout", ratio = 0.8)
    ## <ResamplingHoldout> with 1 iterations
    ## * Instantiated: FALSE
    ## * Parameters: ratio=0.8
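
A resampling object only describes a strategy; the concrete train/test split is drawn when it is instantiated on a task, after which the row indices can be inspected. A short sketch (the penguins task is used purely as an example):

    task <- tsk("penguins")
    resampling <- rsmp("holdout", ratio = 0.8)
    resampling$instantiate(task)      # draw the actual split
    head(resampling$train_set(1))     # row ids of the training set of iteration 1
    head(resampling$test_set(1))      # row ids of the test set of iteration 1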

The following example builds a decision tree model using 5-fold cross-validation:

    library(mlr3verse)
    task <- tsk("penguins") # create the task
    learner <- lrn("classif.rpart", predict_type = "prob") # create the learner, predicting probabilities
    resampling <- rsmp("cv", folds = 5) # choose the resampling method
    rr <- resample(task, learner, resampling, store_models = TRUE) # one line of code does it all
    ## INFO  [20:47:12.966] [mlr3] Applying learner 'classif.rpart' on task 'penguins' (iter 5/5)
    ## INFO  [20:47:12.996] [mlr3] Applying learner 'classif.rpart' on task 'penguins' (iter 1/5)
    ## INFO  [20:47:13.010] [mlr3] Applying learner 'classif.rpart' on task 'penguins' (iter 2/5)
    ## INFO  [20:47:13.019] [mlr3] Applying learner 'classif.rpart' on task 'penguins' (iter 4/5)
    ## INFO  [20:47:13.029] [mlr3] Applying learner 'classif.rpart' on task 'penguins' (iter 3/5)
    print(rr)
    ## <ResampleResult> of 5 iterations
    ## * Task: penguins
    ## * Learner: classif.rpart
    ## * Warnings: 0 in 0 iterations
    ## * Errors: 0 in 0 iterations

Get the aggregated (average) model performance:

    rr$aggregate(msr("classif.acc"))
    ## classif.acc
    ##   0.9448423
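
aggregate() also accepts a list of measures, so several metrics can be averaged in a single call. A small sketch (classification error and balanced accuracy are used purely as examples):

    # aggregate several measures over the 5 folds at once
    rr$aggregate(msrs(c("classif.acc", "classif.ce", "classif.bacc")))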

Get the performance of each individual iteration:

    rr$score(msr("classif.acc"))[, 7:9]
    ##    iteration              prediction classif.acc
    ## 1:         1 <PredictionClassif[20]>   0.9710145
    ## 2:         2 <PredictionClassif[20]>   0.8985507
    ## 3:         3 <PredictionClassif[20]>   0.9130435
    ## 4:         4 <PredictionClassif[20]>   0.9710145
    ## 5:         5 <PredictionClassif[20]>   0.9705882

Check for warnings or errors:

    rr$warnings
    ## Empty data.table (0 rows and 2 cols): iteration,msg
    rr$errors
    ## Empty data.table (0 rows and 2 cols): iteration,msg

Extract a single model:

    rr$learners[[5]]$model
    ## n= 276
    ##
    ## node), split, n, loss, yval, (yprob)
    ##       * denotes terminal node
    ##
    ## 1) root 276 158 Adelie (0.427536232 0.206521739 0.365942029)
    ##   2) flipper_length< 206.5 170 54 Adelie (0.682352941 0.311764706 0.005882353)
    ##     4) bill_length< 43.35 117 4 Adelie (0.965811966 0.034188034 0.000000000) *
    ##     5) bill_length>=43.35 53 4 Chinstrap (0.056603774 0.924528302 0.018867925) *
    ##   3) flipper_length>=206.5 106 6 Gentoo (0.018867925 0.037735849 0.943396226)
    ##     6) bill_depth>=17.2 8 4 Chinstrap (0.250000000 0.500000000 0.250000000) *
    ##     7) bill_depth< 17.2 98 0 Gentoo (0.000000000 0.000000000 1.000000000) *

This package also works seamlessly with other R packages for visualizing decision trees, such as rpart.plot, which produces very attractive plots:

    library(rpart.plot)
    ## Loading required package: rpart
    rpart.plot(rr$learners[[5]]$model)

[Figure 3: decision tree plotted with rpart.plot]

Inspect the predictions:

    rr$prediction()
    ## <PredictionClassif> for 344 observations:
    ##  row_ids     truth  response prob.Adelie prob.Chinstrap prob.Gentoo
    ##        1    Adelie    Adelie  0.96969697     0.03030303  0.00000000
    ##        4    Adelie    Adelie  0.96969697     0.03030303  0.00000000
    ##       26    Adelie    Adelie  0.96969697     0.03030303  0.00000000
    ## ---
    ##      333 Chinstrap Chinstrap  0.05660377     0.92452830  0.01886792
    ##      334 Chinstrap Chinstrap  0.05660377     0.92452830  0.01886792
    ##      335 Chinstrap Chinstrap  0.05660377     0.92452830  0.01886792

    # inspect the prediction of a single iteration
    rr$predictions()[[1]]
    ## <PredictionClassif> for 69 observations:
    ##  row_ids     truth  response prob.Adelie prob.Chinstrap prob.Gentoo
    ##        1    Adelie    Adelie  0.96969697     0.03030303  0.00000000
    ##        4    Adelie    Adelie  0.96969697     0.03030303  0.00000000
    ##       26    Adelie    Adelie  0.96969697     0.03030303  0.00000000
    ## ---
    ##      338 Chinstrap Chinstrap  0.08888889     0.88888889  0.02222222
    ##      342 Chinstrap Chinstrap  0.08888889     0.88888889  0.02222222
    ##      344 Chinstrap Chinstrap  0.08888889     0.88888889  0.02222222
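
Each element returned by predictions() is an ordinary Prediction object and can be scored on its own, which reproduces the per-iteration values from rr$score() above. A minimal sketch:

    # accuracy of every fold, computed directly from the individual predictions
    sapply(rr$predictions(), function(p) p$score(msr("classif.acc")))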

Extract the results of specific iterations:

    rr$filter(c(3, 5))
    print(rr)
    ## <ResampleResult> of 2 iterations
    ## * Task: penguins
    ## * Learner: classif.rpart
    ## * Warnings: 0 in 0 iterations
    ## * Errors: 0 in 0 iterations
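
Note that filter() modifies the ResampleResult in place, so the dropped iterations are gone from rr itself. To keep the full result, filter a deep copy instead (rr_subset is just an illustrative name):

    # keep the original intact and filter a copy
    rr_subset <- rr$clone(deep = TRUE)
    rr_subset$filter(c(3, 5))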

Visualize the results:

    task <- tsk("pima") # the well-known Pima diabetes data set
    task$select(c("glucose", "mass"))
    learner <- lrn("classif.rpart", predict_type = "prob")
    resampling <- rsmp("cv")
    rr <- resample(task, learner, resampling, store_models = TRUE)
    ## INFO  [20:47:13.436] [mlr3] Applying learner 'classif.rpart' on task 'pima' (iter 5/10)
    ## INFO  [20:47:13.449] [mlr3] Applying learner 'classif.rpart' on task 'pima' (iter 6/10)
    ## INFO  [20:47:13.461] [mlr3] Applying learner 'classif.rpart' on task 'pima' (iter 9/10)
    ## INFO  [20:47:13.473] [mlr3] Applying learner 'classif.rpart' on task 'pima' (iter 8/10)
    ## INFO  [20:47:13.488] [mlr3] Applying learner 'classif.rpart' on task 'pima' (iter 3/10)
    ## INFO  [20:47:13.501] [mlr3] Applying learner 'classif.rpart' on task 'pima' (iter 1/10)
    ## INFO  [20:47:13.513] [mlr3] Applying learner 'classif.rpart' on task 'pima' (iter 10/10)
    ## INFO  [20:47:13.524] [mlr3] Applying learner 'classif.rpart' on task 'pima' (iter 4/10)
    ## INFO  [20:47:13.536] [mlr3] Applying learner 'classif.rpart' on task 'pima' (iter 7/10)
    ## INFO  [20:47:13.548] [mlr3] Applying learner 'classif.rpart' on task 'pima' (iter 2/10)
    autoplot(rr, measure = msr("classif.auc"))

[Figure 4: classif.auc per fold]

ROC curve, averaged over the 10 cross-validation folds:

    autoplot(rr, type = "roc")

[Figure 5: cross-validated ROC curve]
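
The cross-validated AUC behind this averaged curve can also be reported as a single number (a small sketch):

    rr$aggregate(msr("classif.auc"))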

Prediction plot (predictions over the two selected features):

    autoplot(rr, type = "prediction")

[Figure 6: prediction plot]

Visualize a single model:

    rr1 <- rr$filter(1)
    autoplot(rr1, type = "prediction")

[Figure 7: prediction plot of a single fold]

All supported plot types are listed in the documentation of autoplot.ResampleResult.

Benchmarking with benchmark()

Benchmarking is used to compare multiple models, for example the performance of several models on one task, of several models across several tasks, or of several models combined with different preprocessing steps.

First, create a design.

mlr3 compares multiple models through a design, which is a table of combinations of Task, Learner, and Resampling objects.

    library(mlr3verse)
    # create the design with the benchmark_grid() function
    design <- benchmark_grid(
      tasks = tsks(c("spam", "german_credit", "sonar")),
      learners = lrns(c("classif.ranger", "classif.rpart", "classif.featureless"), predict_type = "prob"),
      resamplings = rsmps(c("holdout", "cv"))
    )
    print(design)
    ##                  task                         learner              resampling
    ##  1: <TaskClassif[49]>      <LearnerClassifRanger[37]> <ResamplingHoldout[19]>
    ##  2: <TaskClassif[49]>      <LearnerClassifRanger[37]>      <ResamplingCV[19]>
    ##  3: <TaskClassif[49]>       <LearnerClassifRpart[37]> <ResamplingHoldout[19]>
    ##  4: <TaskClassif[49]>       <LearnerClassifRpart[37]>      <ResamplingCV[19]>
    ##  5: <TaskClassif[49]> <LearnerClassifFeatureless[37]> <ResamplingHoldout[19]>
    ##  6: <TaskClassif[49]> <LearnerClassifFeatureless[37]>      <ResamplingCV[19]>
    ##  7: <TaskClassif[49]>      <LearnerClassifRanger[37]> <ResamplingHoldout[19]>
    ##  8: <TaskClassif[49]>      <LearnerClassifRanger[37]>      <ResamplingCV[19]>
    ##  9: <TaskClassif[49]>       <LearnerClassifRpart[37]> <ResamplingHoldout[19]>
    ## 10: <TaskClassif[49]>       <LearnerClassifRpart[37]>      <ResamplingCV[19]>
    ## 11: <TaskClassif[49]> <LearnerClassifFeatureless[37]> <ResamplingHoldout[19]>
    ## 12: <TaskClassif[49]> <LearnerClassifFeatureless[37]>      <ResamplingCV[19]>
    ## 13: <TaskClassif[49]>      <LearnerClassifRanger[37]> <ResamplingHoldout[19]>
    ## 14: <TaskClassif[49]>      <LearnerClassifRanger[37]>      <ResamplingCV[19]>
    ## 15: <TaskClassif[49]>       <LearnerClassifRpart[37]> <ResamplingHoldout[19]>
    ## 16: <TaskClassif[49]>       <LearnerClassifRpart[37]>      <ResamplingCV[19]>
    ## 17: <TaskClassif[49]> <LearnerClassifFeatureless[37]> <ResamplingHoldout[19]>
    ## 18: <TaskClassif[49]> <LearnerClassifFeatureless[37]>      <ResamplingCV[19]>

Then run the comparison, again with a single line of code:

    bmr <- benchmark(design, store_models = TRUE)
    ## INFO  [20:47:16.049] [mlr3] Running benchmark with 99 resampling iterations
    # ......
    ## INFO  [20:47:36.375] [mlr3] Applying learner 'classif.rpart' on task 'sonar' (iter 9/10)
    ## INFO  [20:47:36.414] [mlr3] Finished benchmark

Examine model performance using several measures:

    measures <- msrs(c("classif.acc", "classif.mcc"))
    tab <- bmr$aggregate(measures)
    print(tab)
    ##     nr      resample_result       task_id          learner_id resampling_id
    ##  1:  1 <ResampleResult[22]>          spam      classif.ranger       holdout
    ##  2:  2 <ResampleResult[22]>          spam      classif.ranger            cv
    ##  3:  3 <ResampleResult[22]>          spam       classif.rpart       holdout
    ##  4:  4 <ResampleResult[22]>          spam       classif.rpart            cv
    ##  5:  5 <ResampleResult[22]>          spam classif.featureless       holdout
    ##  6:  6 <ResampleResult[22]>          spam classif.featureless            cv
    ##  7:  7 <ResampleResult[22]> german_credit      classif.ranger       holdout
    ##  8:  8 <ResampleResult[22]> german_credit      classif.ranger            cv
    ##  9:  9 <ResampleResult[22]> german_credit       classif.rpart       holdout
    ## 10: 10 <ResampleResult[22]> german_credit       classif.rpart            cv
    ## 11: 11 <ResampleResult[22]> german_credit classif.featureless       holdout
    ## 12: 12 <ResampleResult[22]> german_credit classif.featureless            cv
    ## 13: 13 <ResampleResult[22]>         sonar      classif.ranger       holdout
    ## 14: 14 <ResampleResult[22]>         sonar      classif.ranger            cv
    ## 15: 15 <ResampleResult[22]>         sonar       classif.rpart       holdout
    ## 16: 16 <ResampleResult[22]>         sonar       classif.rpart            cv
    ## 17: 17 <ResampleResult[22]>         sonar classif.featureless       holdout
    ## 18: 18 <ResampleResult[22]>         sonar classif.featureless            cv
    ##     iters classif.acc classif.mcc
    ##  1:     1   0.9445893   0.8835453
    ##  2:    10   0.9495723   0.8943582
    ##  3:     1   0.8917862   0.7725102
    ##  4:    10   0.8934967   0.7765629
    ##  5:     1   0.6069100   0.0000000
    ##  6:    10   0.6059511   0.0000000
    ##  7:     1   0.7567568   0.4358851
    ##  8:    10   0.7670000   0.3927548
    ##  9:     1   0.6996997   0.2847394
    ## 10:    10   0.7290000   0.2984376
    ## 11:     1   0.6516517   0.0000000
    ## 12:    10   0.7000000   0.0000000
    ## 13:     1   0.7971014   0.6247458
    ## 14:    10   0.8221429   0.6390361
    ## 15:     1   0.6956522   0.3981439
    ## 16:    10   0.6545238   0.3098052
    ## 17:     1   0.4782609   0.0000000
    ## 18:    10   0.5340476   0.0000000
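
Because the result of aggregate() is a plain data.table, it can be summarized further with ordinary data.table syntax, for example to rank the learners by their mean accuracy across all tasks and resamplings (a minimal sketch; mean_acc is an illustrative column name):

    # mean accuracy per learner over all tasks and resamplings, best first
    tab[, .(mean_acc = mean(classif.acc)), by = learner_id][order(-mean_acc)]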

Visualize the results:

    library(ggplot2)
    autoplot(bmr) + theme_bw() +
      theme(axis.text.x = element_text(angle = 45, hjust = 1))

[Figure 8: benchmark results across tasks and learners]

The plot above shows the average performance of the models on the different data sets. We can also look at the performance of the models on one particular data set:

    bmr_german <- bmr$clone(deep = TRUE)$filter(task_ids = "german_credit", resampling_ids = "holdout")
    autoplot(bmr_german, type = "roc")

[Figure 9: ROC curves on the german_credit task]

Of course, a single result can also be extracted:

    tab <- bmr$aggregate(measures)
    rr <- tab[task_id == "german_credit" & learner_id == "classif.ranger"]$resample_result[[1]]
    print(rr)
    ## <ResampleResult> of 1 iterations
    ## * Task: german_credit
    ## * Learner: classif.ranger
    ## * Warnings: 0 in 0 iterations
    ## * Errors: 0 in 0 iterations

Check the performance of this single result:

    rr$aggregate(msr("classif.auc"))
    ## classif.auc
    ##   0.8085969

Multiple BenchmarkResult objects can also be merged. For example, if two different benchmarks were run on two computers, they can be combined directly into one larger object:

    task <- tsk("iris")
    resampling <- rsmp("holdout")$instantiate(task)
    rr1 <- resample(task, lrn("classif.rpart"), resampling)
    ## INFO  [20:47:40.585] [mlr3] Applying learner 'classif.rpart' on task 'iris' (iter 1/1)
    rr2 <- resample(task, lrn("classif.featureless"), resampling)
    ## INFO  [20:47:40.606] [mlr3] Applying learner 'classif.featureless' on task 'iris' (iter 1/1)
    # merge the results with the following code
    bmr1 <- as_benchmark_result(rr1)
    bmr2 <- as_benchmark_result(rr2)
    bmr1$combine(bmr2)
    bmr1
    ## <BenchmarkResult> of 2 rows with 2 resampling runs
    ##  nr task_id          learner_id resampling_id iters warnings errors
    ##   1    iris       classif.rpart       holdout     1        0      0
    ##   2    iris classif.featureless       holdout     1        0      0
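
The combined object behaves like any other BenchmarkResult, so the merged runs can be evaluated together (a short sketch):

    # compare the two merged resampling runs on a common measure
    bmr1$aggregate(msr("classif.acc"))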
