Java class name: com.alibaba.alink.operator.batch.feature.VectorChiSqSelectorBatchOp
Python class name: VectorChiSqSelectorBatchOp

Description

Performs chi-square feature selection on vector data.

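Parameters

| Name | Display name | Description | Type | Required? | Value range | Default |
|------|--------------|-------------|------|-----------|-------------|---------|
| labelCol | Label column name | Name of the label column in the input table | String | Yes | | |
| selectedCol | Selected column name | Name of the column to compute on | String | Yes | Column type must be one of [DENSE_VECTOR, SPARSE_VECTOR, STRING, VECTOR] | |
| fdr | Discovery threshold | False discovery rate threshold | Double | | | 0.05 |
| fpr | p-value threshold | Threshold on the p-value | Double | | | 0.05 |
| fwe | Error rate threshold | Family-wise error rate threshold | Double | | | 0.05 |
| numTopFeatures | Number of top features | Number of top-ranked features to keep | Integer | | | 50 |
| percentile | Selection percentile | Fraction of features to keep | Double | | | 0.1 |
| selectorType | Selector type | Selection strategy; one of the five types listed under value range | String | | "NumTopFeatures", "PERCENTILE", "FPR", "FDR", "FWE" | "NumTopFeatures" |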
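
To make the five selector types concrete, here is a minimal sketch of how each one could pick features from a list of p-values. It is not Alink's implementation: the class `SelectorTypeSketch`, the method `select`, and the single `threshold` argument are hypothetical, and the rules (Benjamini-Hochberg for FDR, Bonferroni for FWE, truncation for the percentile) follow common chi-square selector conventions that Alink is only assumed to share. The p-values are taken from the sample result later on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Alink: applies one selector rule to p-values.
class SelectorTypeSketch {

    /** Returns the selected feature indices, best (smallest p-value) first. */
    static List<Integer> select(double[] pValues, String selectorType, double threshold) {
        int n = pValues.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Stable sort by ascending p-value, so ties keep their index order.
        Arrays.sort(order, (a, b) -> Double.compare(pValues[a], pValues[b]));

        int keep = 0;
        switch (selectorType) {
            case "NumTopFeatures": // keep a fixed number of features
                keep = Math.min(n, (int) threshold);
                break;
            case "PERCENTILE": // keep a fraction of the features (assumed to truncate)
                keep = (int) (n * threshold);
                break;
            case "FPR": // keep every feature whose p-value is below the threshold
                while (keep < n && pValues[order[keep]] < threshold) {
                    keep++;
                }
                break;
            case "FDR": // Benjamini-Hochberg: largest k with p(k) <= threshold * k / n
                for (int k = 1; k <= n; k++) {
                    if (pValues[order[k - 1]] <= threshold * k / n) {
                        keep = k;
                    }
                }
                break;
            case "FWE": // Bonferroni: compare each p-value with threshold / n
                while (keep < n && pValues[order[keep]] < threshold / n) {
                    keep++;
                }
                break;
            default:
                throw new IllegalArgumentException("Unknown selector type: " + selectorType);
        }
        return new ArrayList<>(Arrays.asList(order).subList(0, keep));
    }

    public static void main(String[] args) {
        // p-values for vector indices 0..3, copied from the sample result.
        double[] pValues = {0.3865, 0.3865, 0.2231, 0.2231};
        System.out.println("NumTopFeatures=50: " + select(pValues, "NumTopFeatures", 50));
        System.out.println("PERCENTILE=0.5:    " + select(pValues, "PERCENTILE", 0.5));
        System.out.println("FPR=0.05:          " + select(pValues, "FPR", 0.05));
        System.out.println("FDR=0.4:           " + select(pValues, "FDR", 0.4));
        System.out.println("FWE=0.05:          " + select(pValues, "FWE", 0.05));
    }
}
```

Under these assumptions the default setting (NumTopFeatures with a limit of 50) keeps all four features of the sample, which agrees with the "Number of Selector Features: 4" line in the sample result, while the default 0.05 thresholds for FPR, FDR, and FWE would keep none of them.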

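Code examples

The following code is for illustration only; you may need to modify it or configure your environment before it runs.

Python code

No Python interface.

Java code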

    package javatest.com.alibaba.alink.batch.feature;

    import org.apache.flink.types.Row;

    import com.alibaba.alink.operator.batch.BatchOperator;
    import com.alibaba.alink.operator.batch.feature.VectorChiSqSelectorBatchOp;
    import com.alibaba.alink.operator.batch.source.MemSourceBatchOp;
    import org.junit.Test;

    import java.util.Arrays;

    public class VectorChiSqSelectorBatchOpTest {

        @Test
        public void testVectorChiSqSelectorBatchOp() throws Exception {
            // Each row: id, space-separated feature vector, label.
            Row[] testArray = new Row[] {
                Row.of(7, "0.0 0.0 18.0 1.0", 1.0),
                Row.of(8, "0.0 1.0 12.0 0.0", 0.0),
                Row.of(9, "1.0 0.0 15.0 0.1", 0.0),
            };
            String[] colNames = new String[] {"id", "features", "clicked"};
            MemSourceBatchOp source = new MemSourceBatchOp(Arrays.asList(testArray), colNames);

            // Test each entry of the "features" vector against the "clicked" label.
            VectorChiSqSelectorBatchOp test = new VectorChiSqSelectorBatchOp()
                .setSelectedCol("features")
                .setLabelCol("clicked");
            test.linkFrom(source);

            // Register the model summary for printing, then trigger execution.
            test.lazyPrintModelInfo();
            BatchOperator.execute();
        }
    }

Result

    ------------------------- ChisqSelectorModelInfo -------------------------
    Number of Selector Features: 4
    Number of Features: 4
    Type of Selector: NumTopFeatures
    Number of Top Features: 50
    Selector Indices:
    |VectorIndex|ChiSquare|PValue| DF|Selected|
    |-----------|---------|------|---|--------|
    |          3|        3|0.2231|  2|    true|
    |          2|        3|0.2231|  2|    true|
    |          0|     0.75|0.3865|  1|    true|
    |          1|     0.75|0.3865|  1|    true|
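
The ChiSquare and DF columns in the result above can be checked by hand. The sketch below builds a contingency table of feature value against label for one vector index and computes Pearson's chi-square statistic from it. It assumes each distinct feature value is treated as its own category; the class `ChiSquareSketch` and its methods are hypothetical and are not Alink code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Alink: Pearson chi-square test of one feature against the label.
class ChiSquareSketch {

    /** Collects the distinct values of an array in order of first appearance. */
    static List<Double> distinct(double[] values) {
        List<Double> result = new ArrayList<>();
        for (double v : values) {
            if (!result.contains(v)) {
                result.add(v);
            }
        }
        return result;
    }

    /** Returns {chi-square statistic, degrees of freedom}. */
    static double[] chiSquareTest(double[] feature, double[] label) {
        List<Double> featureValues = distinct(feature);
        List<Double> labelValues = distinct(label);
        int rows = featureValues.size();
        int cols = labelValues.size();
        int n = feature.length;

        // Observed counts and their row and column totals.
        double[][] observed = new double[rows][cols];
        double[] rowSum = new double[rows];
        double[] colSum = new double[cols];
        for (int i = 0; i < n; i++) {
            int r = featureValues.indexOf(feature[i]);
            int c = labelValues.indexOf(label[i]);
            observed[r][c]++;
            rowSum[r]++;
            colSum[c]++;
        }

        // Sum of (observed - expected)^2 / expected over all cells.
        double chiSquare = 0.0;
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                double expected = rowSum[r] * colSum[c] / n;
                double diff = observed[r][c] - expected;
                chiSquare += diff * diff / expected;
            }
        }
        return new double[] {chiSquare, (rows - 1) * (cols - 1)};
    }

    /** Upper-tail p-value of the chi-square distribution, valid only for 2 degrees of freedom. */
    static double pValueDf2(double chiSquare) {
        return Math.exp(-chiSquare / 2.0);
    }

    public static void main(String[] args) {
        double[] label = {1.0, 0.0, 0.0};
        // Columns of the sample "features" vectors, one array per vector index.
        double[][] features = {
            {0.0, 0.0, 1.0},
            {0.0, 1.0, 0.0},
            {18.0, 12.0, 15.0},
            {1.0, 0.0, 0.1},
        };
        for (int i = 0; i < features.length; i++) {
            double[] result = chiSquareTest(features[i], label);
            System.out.printf("index %d: chiSquare=%.4f df=%d%n", i, result[0], (int) result[1]);
        }
    }
}
```

On the sample data this gives a statistic of 0.75 with 1 degree of freedom for vector indices 0 and 1, and 3 with 2 degrees of freedom for indices 2 and 3, matching the table. For 2 degrees of freedom the p-value has the closed form exp(-x/2), and exp(-3/2) ≈ 0.2231 agrees with the reported PValue.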