Java class name: com.alibaba.alink.pipeline.dataproc.vector.VectorFunction
Python class name: VectorFunction

Description

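  • Computes the maximum or minimum of a vector, the index of the maximum or minimum, or a scaled copy of the vector. It also supports NormL2, NormL1, NormL2Square, and Normalize.
  • Supports both sparse and dense vectors.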
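The operations listed above can be sketched in plain Python to show what each one returns for a dense vector. This is an illustration only, not Alink's implementation: the helper name `vector_function` and its `variable` argument are hypothetical, and two behaviours are assumptions rather than documented facts, namely that `scale` multiplies every element by a user-supplied factor and that `Normalize` divides by the L2 norm.

```python
import math


def vector_function(values, func_name, variable=None):
    """Apply one of the listed operations to a dense vector (a list of floats).

    Hypothetical helper for illustration; it does not call Alink.
    """
    name = func_name.lower()
    if name == "max":
        return max(values)
    if name == "min":
        return min(values)
    if name == "argmax":
        # Index of the first occurrence of the largest element.
        return values.index(max(values))
    if name == "argmin":
        # Index of the first occurrence of the smallest element.
        return values.index(min(values))
    if name == "scale":
        # Assumption: scaling multiplies each element by `variable`.
        return [v * variable for v in values]
    if name == "norml1":
        return sum(abs(v) for v in values)
    if name == "norml2square":
        return sum(v * v for v in values)
    if name == "norml2":
        return math.sqrt(sum(v * v for v in values))
    if name == "normalize":
        # Assumption: normalization divides by the L2 norm.
        norm = math.sqrt(sum(v * v for v in values))
        return [v / norm for v in values]
    raise ValueError("unsupported function name: " + func_name)


vec = [3.0, -4.0]
print(vector_function(vec, "max"))     # 3.0
print(vector_function(vec, "argMin"))  # 1
print(vector_function(vec, "NormL2"))  # 5.0
```

The function name is matched case-insensitively here only so that both spellings used on this page (`max` and `Max`) work in the sketch.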

Parameters

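| Name | Display name | Description | Type | Required? | Allowed values | Default |
| --- | --- | --- | --- | --- | --- | --- |
| funcName | Function name | Name of the operation: max (maximum), min (minimum), argMax (index of the maximum), argMin (index of the minimum), scale (scaling), NormL2, NormL1, NormL2Square, Normalize | String | | "Max", "Min", "ArgMax", "ArgMin", "Scale", "NormL2", "NormL1", "NormL2Square", "Normalize" | |
| selectedCol | Selected column | Name of the column to compute on | String | | | |
| WithVariable | Not available! | Not available! | String | | | |
| outputCol | Output column | Name of the output column; optional, defaults to null | String | | | null |
| reservedCols | Reserved columns | Columns to keep in the output | String[] | | | null |
| numThreads | Number of threads | Number of threads the component uses | Integer | | | 1 |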

Code examples

Python code

    from pyalink.alink import *
    import pandas as pd

    useLocalEnv(1)

    # Each row holds an id and a dense vector encoded as a comma-separated string.
    df = pd.DataFrame([
        [1, "16.3, 1.1, 1.1"],
        [2, "16.8, 1.4, 1.5"],
        [3, "19.2, 1.7, 1.8"],
        [4, "10.0, 1.7, 1.7"],
        [5, "19.5, 1.8, 1.9"],
        [6, "20.9, 1.8, 1.8"],
        [7, "21.1, 1.9, 1.8"],
        [8, "20.9, 2.0, 2.1"],
        [9, "20.3, 2.3, 2.4"],
        [10, "22.0, 2.4, 2.5"]
    ])
    opData = BatchOperator.fromDataframe(df, schemaStr="id bigint, vec string")

    # Write the maximum element of each vector in "vec" to the new column "out".
    result = VectorFunction().setSelectedCol("vec")\
        .setOutputCol("out").setFuncName("max").transform(opData)
    result.collectToDataframe()

Java code

    import org.apache.flink.types.Row;

    import com.alibaba.alink.operator.batch.BatchOperator;
    import com.alibaba.alink.operator.batch.source.MemSourceBatchOp;
    import com.alibaba.alink.pipeline.dataproc.vector.VectorFunction;
    import com.alibaba.alink.testutil.AlinkTestBase;
    import org.junit.Test;

    import java.util.ArrayList;
    import java.util.List;

    public class VectorFunctionTest extends AlinkTestBase {
        @Test
        public void testVectorFunction() throws Exception {
            // Each row holds an id and a dense vector encoded as a comma-separated string.
            List<Row> df = new ArrayList<>();
            df.add(Row.of(1, "16.3, 1.1, 1.1"));
            df.add(Row.of(2, "16.8, 1.4, 1.5"));
            df.add(Row.of(3, "19.2, 1.7, 1.8"));
            df.add(Row.of(4, "10.0, 1.7, 1.7"));
            df.add(Row.of(5, "19.5, 1.8, 1.9"));
            df.add(Row.of(6, "20.9, 1.8, 1.8"));
            df.add(Row.of(7, "21.1, 1.9, 1.8"));
            df.add(Row.of(8, "20.9, 2.0, 2.1"));
            df.add(Row.of(9, "20.3, 2.3, 2.4"));
            df.add(Row.of(10, "22.0, 2.4, 2.5"));
            BatchOperator<?> batchData = new MemSourceBatchOp(df, "id int, vec string");
            // Write the maximum element of each vector in "vec" to the new column "out".
            new VectorFunction().setSelectedCol("vec")
                .setOutputCol("out").setFuncName("max").transform(batchData).print();
        }
    }

Results

| id | vec | out |
| --- | --- | --- |
| 1 | 16.3, 1.1, 1.1 | 16.3 |
| 2 | 16.8, 1.4, 1.5 | 16.8 |
| 3 | 19.2, 1.7, 1.8 | 19.2 |
| 4 | 10.0, 1.7, 1.7 | 10.0 |
| 5 | 19.5, 1.8, 1.9 | 19.5 |
| 6 | 20.9, 1.8, 1.8 | 20.9 |
| 7 | 21.1, 1.9, 1.8 | 21.1 |
| 8 | 20.9, 2.0, 2.1 | 20.9 |
| 9 | 20.3, 2.3, 2.4 | 20.3 |
| 10 | 22.0, 2.4, 2.5 | 22.0 |
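The `out` column above can be cross-checked without an Alink environment. The sketch below parses each comma-separated vector string from the example and takes its maximum. The helpers `parse_dense` and `max_column` are hypothetical names introduced for this check, and the parsing handles only the dense comma-separated form used on this page.

```python
# Sample rows copied from the example: (id, vector encoded as a string).
rows = [
    (1, "16.3, 1.1, 1.1"),
    (2, "16.8, 1.4, 1.5"),
    (3, "19.2, 1.7, 1.8"),
    (4, "10.0, 1.7, 1.7"),
    (5, "19.5, 1.8, 1.9"),
    (6, "20.9, 1.8, 1.8"),
    (7, "21.1, 1.9, 1.8"),
    (8, "20.9, 2.0, 2.1"),
    (9, "20.3, 2.3, 2.4"),
    (10, "22.0, 2.4, 2.5"),
]


def parse_dense(text):
    """Parse a comma-separated dense vector string into a list of floats."""
    return [float(part) for part in text.split(",")]


def max_column(data):
    """Return (id, vec, out) tuples where out is the largest element of vec."""
    return [(row_id, vec, max(parse_dense(vec))) for row_id, vec in data]


for row_id, vec, out in max_column(rows):
    print(row_id, "|", vec, "|", out)
```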