Java class name: com.alibaba.alink.operator.batch.dataproc.AppendIdBatchOp
Python class name: AppendIdBatchOp

Overview

Appends an ID column to the input table.

Parameters

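| Name | Display name | Description | Type | Required? | Value range | Default |
| --- | --- | --- | --- | --- | --- | --- |
| appendType | Append type | How IDs are assigned. "UNIQUE" (sparse) produces unique but non-contiguous IDs; "DENSE" produces unique, contiguous IDs. | String | | "DENSE", "UNIQUE" | "DENSE" |
| idCol | ID column name | Name of the appended ID column. | String | | | "append_id" |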
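
The difference between the two appendType values can be illustrated with a small pure-Python sketch. It does not use Alink and is not Alink's implementation. The helper names `append_dense_ids` and `append_unique_ids` are hypothetical, and the interleaving scheme used for the sparse case (local index times partition count, plus partition index) is an assumption chosen only to show IDs that are unique without being contiguous.

```python
from typing import List, Sequence, Tuple


def append_dense_ids(partitions: Sequence[Sequence[Tuple]]) -> List[Tuple]:
    """Append consecutive IDs 0..n-1 across all partitions (the "DENSE" idea)."""
    out = []
    next_id = 0
    for part in partitions:
        for row in part:
            out.append(tuple(row) + (next_id,))
            next_id += 1
    return out


def append_unique_ids(partitions: Sequence[Sequence[Tuple]]) -> List[Tuple]:
    """Append unique but possibly non-contiguous IDs (the "UNIQUE" idea).

    Assumed scheme: id = local_index * num_partitions + partition_index.
    Each partition can compute its IDs without knowing the sizes of the others.
    """
    num_partitions = len(partitions)
    out = []
    for p, part in enumerate(partitions):
        for i, row in enumerate(part):
            out.append(tuple(row) + (i * num_partitions + p,))
    return out


# Hypothetical data: four rows split unevenly over two partitions.
partitions = [
    [(1.0, "A"), (2.0, "B"), (3.0, "C")],
    [(4.0, "D")],
]

dense = append_dense_ids(partitions)
unique = append_unique_ids(partitions)

print([row[-1] for row in dense])   # [0, 1, 2, 3]
print([row[-1] for row in unique])  # [0, 2, 4, 1]
```

In this sketch the dense IDs have no gaps, while the sparse IDs skip 3 because the second partition holds fewer rows. A sparse scheme of this kind avoids counting rows across partitions first, which may be why a non-contiguous option is offered; the source does not state the reason.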

Code examples

Python code

    from pyalink.alink import *
    import pandas as pd

    # Run in a local environment with parallelism 1.
    useLocalEnv(1)

    df = pd.DataFrame([
        [1.0, "A", 0, 0, 0],
        [2.0, "B", 1, 1, 0],
        [3.0, "C", 2, 2, 1],
        [4.0, "D", 3, 3, 1]
    ])

    inOp = BatchOperator.fromDataframe(df, schemaStr='f0 double,f1 string,f2 int,f3 int,label int')

    # Append an ID column named "append_id" and print the resulting table.
    AppendIdBatchOp()\
        .setIdCol("append_id")\
        .linkFrom(inOp)\
        .print()

Java code

    import org.apache.flink.types.Row;

    import com.alibaba.alink.operator.batch.BatchOperator;
    import com.alibaba.alink.operator.batch.dataproc.AppendIdBatchOp;
    import com.alibaba.alink.operator.batch.source.MemSourceBatchOp;
    import org.junit.Test;

    import java.util.Arrays;
    import java.util.List;

    public class AppendIdBatchOpTest {
        @Test
        public void testAppendIdBatchOp() throws Exception {
            List<Row> df = Arrays.asList(
                Row.of(1.0, "A", 0, 0, 0),
                Row.of(2.0, "B", 1, 1, 0),
                Row.of(3.0, "C", 2, 2, 1),
                Row.of(4.0, "D", 3, 3, 1)
            );
            BatchOperator<?> inOp = new MemSourceBatchOp(df, "f0 double,f1 string,f2 int,f3 int,label int");
            new AppendIdBatchOp()
                .setIdCol("append_id")
                .linkFrom(inOp)
                .print();
        }
    }

Results

| f0 | f1 | f2 | f3 | label | append_id |
| --- | --- | --- | --- | --- | --- |
| 1.0000 | A | 0 | 0 | 0 | 0 |
| 2.0000 | B | 1 | 1 | 0 | 1 |
| 3.0000 | C | 2 | 2 | 1 | 2 |
| 4.0000 | D | 3 | 3 | 1 | 3 |