Java class name: com.alibaba.alink.operator.batch.evaluation.EvalBinaryClassBatchOp
Python class name: EvalBinaryClassBatchOp

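Overview

Binary classification evaluation assesses the quality of the predictions produced by a binary classification algorithm.
It supports plotting the ROC curve, the lift chart, the K-S curve, and the Recall-Precision curve.
The streaming variant supports both cumulative and windowed statistics; in addition to the four curves above, it also plots AUC, Kappa, Accuracy, and Logloss over time.
The overall evaluation metrics are AUC, K-S, and PRC, together with Precision, Recall, F-Measure, Sensitivity, Accuracy, Specificity, and Kappa at different thresholds.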

Confusion matrix
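
As an illustration of how the four confusion-matrix counts are obtained, the following minimal sketch counts TP, FP, TN, and FN for the sample data used in the code examples of this page. The helper name `confusion_matrix`, the default threshold of 0.5, and the `>=` comparison are assumptions made for this sketch; it is not Alink's internal implementation.

```python
# Sample rows: (true label, predicted probability of the positive label "prefix1")
rows = [
    ("prefix1", 0.9),
    ("prefix1", 0.8),
    ("prefix1", 0.7),
    ("prefix0", 0.75),
    ("prefix0", 0.6),
]


def confusion_matrix(rows, positive_label, threshold=0.5):
    """Count TP, FP, TN, FN; scores >= threshold are predicted positive (assumed convention)."""
    tp = fp = tn = fn = 0
    for label, score in rows:
        predicted_positive = score >= threshold
        actual_positive = label == positive_label
        if predicted_positive and actual_positive:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif actual_positive:
            fn += 1
        else:
            tn += 1
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


print(confusion_matrix(rows, "prefix1"))        # {'TP': 3, 'FP': 2, 'TN': 0, 'FN': 0}
print(confusion_matrix(rows, "prefix1", 0.75))  # {'TP': 2, 'FP': 1, 'TN': 1, 'FN': 1}
```

Raising the threshold moves samples from the predicted-positive column to the predicted-negative column, which is why the threshold-dependent metrics on this page are reported per threshold.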

[Figure 1: Binary classification evaluation (EvalBinaryClassBatchOp)]

ROC curve

X-axis: FPR
Y-axis: TPR

AUC

The area under the ROC curve.
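
The sketch below shows one way to build the ROC points and integrate them with the trapezoidal rule. The function names `roc_points` and `auc`, the 0/1 label encoding, and the assumption that all scores are distinct are choices made for this illustration only; Alink may compute the curve differently internally.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points while lowering the threshold one sample at a time.

    labels: 1 for positive, 0 for negative. Assumes all scores are distinct.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    for score, label in sorted(zip(scores, labels), reverse=True):
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Sample data from the code examples: positive label "prefix1" encoded as 1
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.75, 0.6]
print(auc(roc_points(labels, scores)))  # approximately 0.8333
```

For this sample, five of the six positive/negative pairs are ranked correctly, which gives the same value of 5/6.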

K-S

X-axis: threshold
Y-axis: TPR and FPR

KS

The maximum difference between the two curves (TPR and FPR) plotted on the K-S chart.
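
A minimal sketch of the KS statistic, assuming distinct scores and a threshold sweep from the highest score to the lowest. The function name `ks_statistic` and the 0/1 label encoding are assumptions for this example, not part of the Alink API.

```python
def ks_statistic(labels, scores):
    """Return max |TPR - FPR| over all thresholds (labels: 1 positive, 0 negative)."""
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    best = 0.0
    # Lower the threshold one sample at a time and track the largest gap
    for score, label in sorted(zip(scores, labels), reverse=True):
        if label:
            tp += 1
        else:
            fp += 1
        best = max(best, abs(tp / pos - fp / neg))
    return best


labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.75, 0.6]
print(ks_statistic(labels, scores))  # approximately 0.6667
```

On the sample data the largest gap occurs at threshold 0.8, where TPR is 2/3 and FPR is 0.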

Recall-Precision curve

X-axis: Recall
Y-axis: Precision

PRC

The area under the Recall-Precision curve.
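
The following sketch integrates the Recall-Precision curve with the trapezoidal rule. Two assumptions are made here and are not confirmed by this page: the curve is taken to start at the point (Recall = 0, Precision = 1), and all scores are distinct. The function name `prc_area` is hypothetical.

```python
def prc_area(labels, scores):
    """Trapezoidal area under the Recall-Precision curve.

    labels: 1 for positive, 0 for negative. Assumes distinct scores and
    an assumed starting point of (recall=0, precision=1).
    """
    pos = sum(labels)
    points = [(0.0, 1.0)]
    tp = fp = 0
    for score, label in sorted(zip(scores, labels), reverse=True):
        if label:
            tp += 1
        else:
            fp += 1
        points.append((tp / pos, tp / (tp + fp)))  # (recall, precision)
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2
    return area


labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.75, 0.6]
print(prc_area(labels, scores))  # approximately 0.9028
```

Under these assumptions the sample data gives 65/72, which agrees with the PRC value shown in the output section of this page; other starting-point or interpolation conventions would give a different area.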

Lift chart

X-axis: $\dfrac{TP + FP}{total}$
Y-axis: TP
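
To make the two axes concrete, this sketch computes the lift chart points for the sample data: after each sample is admitted as a predicted positive, it records the fraction of samples predicted positive and the number of true positives so far. The function name `lift_chart_points` is an assumption for this example, and scores are assumed to be distinct.

```python
def lift_chart_points(labels, scores):
    """Return ((TP + FP) / total, TP) points while lowering the threshold.

    labels: 1 for positive, 0 for negative. Assumes all scores are distinct.
    """
    total = len(labels)
    points = []
    tp = fp = 0
    for score, label in sorted(zip(scores, labels), reverse=True):
        if label:
            tp += 1
        else:
            fp += 1
        points.append(((tp + fp) / total, tp))
    return points


labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.75, 0.6]
for x, y in lift_chart_points(labels, scores):
    print(f"{x:.1f} {y}")
```

The TP count stays flat whenever the newly admitted sample is a negative, so a model that ranks positives first produces a curve that rises quickly and then levels off.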

Precision

Recall

F-Measure

Sensitivity

Accuracy

Specificity

Kappa

Logloss
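
The metric names above are listed without their formulas. For reference, the standard textbook definitions in terms of the confusion-matrix counts are given below. This is a sketch under stated assumptions: $total = TP + FP + TN + FN$, F-Measure is taken to be the F1 score, $N$ is the number of samples, $y_i \in \{0, 1\}$ is the true label, and $p_i$ is the predicted probability of the positive label. The symbols $p_e$, $y_i$, $p_i$, and $N$ are introduced here and do not appear elsewhere on this page.

```latex
\begin{aligned}
\text{Precision} &= \frac{TP}{TP + FP} \\
\text{Recall} = \text{Sensitivity} &= \frac{TP}{TP + FN} \\
\text{F-Measure} &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \\
\text{Accuracy} &= \frac{TP + TN}{total} \\
\text{Specificity} &= \frac{TN}{TN + FP} \\
p_e &= \frac{(TP + FN)(TP + FP) + (TN + FP)(TN + FN)}{total^2} \\
\text{Kappa} &= \frac{\text{Accuracy} - p_e}{1 - p_e} \\
\text{Logloss} &= -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]
\end{aligned}
```

For the sample data in the code examples, a threshold of 0.5 gives $TP = 3$, $FP = 2$, $TN = FN = 0$, so Accuracy is $3/5 = 0.6$, consistent with the Accuracy shown in the output section.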

Parameters

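| Name | Display name | Description | Type | Required | Allowed values | Default |
| --- | --- | --- | --- | --- | --- | --- |
| labelCol | Label column name | Name of the label column in the input table | String | ✓ | | |
| predictionDetailCol | Prediction detail column name | Name of the column holding the prediction details | String | ✓ | The selected column must be of type [STRING] | |
| positiveLabelValueString | Positive label | String form of the positive-class label | String | | | null |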

Code examples

Python code

    from pyalink.alink import *
    import pandas as pd

    useLocalEnv(1)

    # Each row: the true label, then a JSON string mapping each label to its predicted probability
    df = pd.DataFrame([
        ["prefix1", "{\"prefix1\": 0.9, \"prefix0\": 0.1}"],
        ["prefix1", "{\"prefix1\": 0.8, \"prefix0\": 0.2}"],
        ["prefix1", "{\"prefix1\": 0.7, \"prefix0\": 0.3}"],
        ["prefix0", "{\"prefix1\": 0.75, \"prefix0\": 0.25}"],
        ["prefix0", "{\"prefix1\": 0.6, \"prefix0\": 0.4}"]
    ])

    inOp = BatchOperator.fromDataframe(df, schemaStr='label string, detailInput string')

    # Run the evaluation and collect the metrics locally
    metrics = EvalBinaryClassBatchOp() \
        .setLabelCol("label") \
        .setPredictionDetailCol("detailInput") \
        .linkFrom(inOp) \
        .collectMetrics()

    print("AUC:", metrics.getAuc())
    print("KS:", metrics.getKs())
    print("PRC:", metrics.getPrc())
    print("Accuracy:", metrics.getAccuracy())
    print("Macro Precision:", metrics.getMacroPrecision())
    print("Micro Recall:", metrics.getMicroRecall())
    print("Weighted Sensitivity:", metrics.getWeightedSensitivity())

Java code

    import org.apache.flink.types.Row;

    import com.alibaba.alink.operator.batch.BatchOperator;
    import com.alibaba.alink.operator.batch.evaluation.EvalBinaryClassBatchOp;
    import com.alibaba.alink.operator.batch.source.MemSourceBatchOp;
    import com.alibaba.alink.operator.common.evaluation.BinaryClassMetrics;
    import org.junit.Test;

    import java.util.Arrays;
    import java.util.List;

    public class EvalBinaryClassBatchOpTest {
        @Test
        public void testEvalBinaryClassBatchOp() throws Exception {
            // Each row: the true label, then a JSON string mapping each label to its predicted probability
            List<Row> df = Arrays.asList(
                Row.of("prefix1", "{\"prefix1\": 0.9, \"prefix0\": 0.1}"),
                Row.of("prefix1", "{\"prefix1\": 0.8, \"prefix0\": 0.2}"),
                Row.of("prefix1", "{\"prefix1\": 0.7, \"prefix0\": 0.3}"),
                Row.of("prefix0", "{\"prefix1\": 0.75, \"prefix0\": 0.25}"),
                Row.of("prefix0", "{\"prefix1\": 0.6, \"prefix0\": 0.4}")
            );
            BatchOperator<?> inOp = new MemSourceBatchOp(df, "label string, detailInput string");

            // Run the evaluation and collect the metrics locally
            BinaryClassMetrics metrics = new EvalBinaryClassBatchOp()
                .setLabelCol("label")
                .setPredictionDetailCol("detailInput")
                .linkFrom(inOp)
                .collectMetrics();

            System.out.println("AUC:" + metrics.getAuc());
            System.out.println("KS:" + metrics.getKs());
            System.out.println("PRC:" + metrics.getPrc());
            System.out.println("Accuracy:" + metrics.getAccuracy());
            System.out.println("Macro Precision:" + metrics.getMacroPrecision());
            System.out.println("Micro Recall:" + metrics.getMicroRecall());
            System.out.println("Weighted Sensitivity:" + metrics.getWeightedSensitivity());
        }
    }

Output

    AUC: 0.8333333333333334
    KS: 0.6666666666666666
    PRC: 0.9027777777777777
    Accuracy: 0.6
    Macro Precision: 0.8
    Micro Recall: 0.6
    Weighted Sensitivity: 0.6