Java class name: com.alibaba.alink.operator.batch.dataproc.WeightSampleBatchOp
Python class name: WeightSampleBatchOp

Description

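  • This operator performs weighted sampling at a given ratio: each data point is drawn according to its weight, so points with larger weights are more likely to be sampled.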

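Parameters

| Name | Label | Description | Type | Required? | Allowed values | Default |
| --- | --- | --- | --- | --- | --- | --- |
| ratio | Sampling ratio | Sampling ratio, in the range [0, 1] | Double | ✓ | [0.0, 1.0] | |
| weightCol | Weight column name | Name of the column that holds the weights | String | ✓ | Column type must be one of [BIGDECIMAL, BIGINTEGER, BYTE, DOUBLE, FLOAT, INTEGER, LONG, SHORT] | |
| withReplacement | With replacement | Whether to sample with replacement; the default is without replacement | Boolean | | | false |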
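To make the roles of `ratio`, `weightCol`, and `withReplacement` concrete, here is a minimal pure-Python sketch of weighted sampling. It is an illustration, not Alink's implementation. The function name `weighted_sample`, the choice of `round(ratio * n)` as the sample size, and the Efraimidis-Spirakis key method for the without-replacement case are all assumptions made for this example.

```python
import random


def weighted_sample(rows, weights, ratio, with_replacement=False, seed=None):
    """Illustrative weighted sampling (hypothetical helper, not Alink code)."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    if len(rows) != len(weights):
        raise ValueError("rows and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")

    rng = random.Random(seed)
    # Assumption: the sample size is the rounded fraction of the input size.
    k = int(round(ratio * len(rows)))
    if k == 0:
        return []

    if with_replacement:
        # Each draw is independent; a row may appear more than once.
        return rng.choices(rows, weights=weights, k=k)

    # Without replacement (Efraimidis-Spirakis): give each row the key
    # u ** (1 / w) with u uniform in [0, 1), then keep the k largest keys.
    # Rows with zero weight are never selected.
    keyed = [
        (rng.random() ** (1.0 / w), row)
        for row, w in zip(rows, weights)
        if w > 0
    ]
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in keyed[:k]]


# Usage with the ids and weights from the example data on this page.
ids = ["a", "b", "c", "d", "e", "f", "g", "j"]
weights = [1.3, 2.5, 100.2, 99.9, 1.4, 2.2, 100.9, 99.5]
print(weighted_sample(ids, weights, ratio=0.5, seed=42))
```

With these weights, the four rows whose weight is close to 100 dominate, so a sample of half the rows usually consists of `c`, `d`, `g`, and `j`, though the result is random and lighter rows can occasionally appear.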

Code Examples

Python Code

    from pyalink.alink import *
    import pandas as pd

    useLocalEnv(1)

    df = pd.DataFrame([
        ["a", 1.3, 1.1],
        ["b", 2.5, 0.9],
        ["c", 100.2, -0.01],
        ["d", 99.9, 100.9],
        ["e", 1.4, 1.1],
        ["f", 2.2, 0.9],
        ["g", 100.9, -0.01],
        ["j", 99.5, 100.9],
    ])

    # batch source
    inOp = BatchOperator.fromDataframe(df, schemaStr='id string, weight double, value double')

    sampleOp = WeightSampleBatchOp() \
        .setWeightCol("weight") \
        .setRatio(0.5) \
        .setWithReplacement(False)

    inOp.link(sampleOp).print()

Java Code

    import org.apache.flink.types.Row;

    import com.alibaba.alink.operator.batch.BatchOperator;
    import com.alibaba.alink.operator.batch.dataproc.WeightSampleBatchOp;
    import com.alibaba.alink.operator.batch.source.MemSourceBatchOp;
    import org.junit.Test;

    import java.util.Arrays;
    import java.util.List;

    public class WeightSampleBatchOpTest {
        @Test
        public void testWeightSampleBatchOp() throws Exception {
            List<Row> df = Arrays.asList(
                Row.of("a", 1.3, 1.1),
                Row.of("b", 2.5, 0.9),
                Row.of("c", 100.2, -0.01),
                Row.of("d", 99.9, 100.9),
                Row.of("e", 1.4, 1.1),
                Row.of("f", 2.2, 0.9),
                Row.of("g", 100.9, -0.01),
                Row.of("j", 99.5, 100.9)
            );
            BatchOperator<?> inOp = new MemSourceBatchOp(df, "id string, weight double, value double");
            BatchOperator<?> sampleOp = new WeightSampleBatchOp()
                .setWeightCol("weight")
                .setRatio(0.5)
                .setWithReplacement(false);
            inOp.link(sampleOp).print();
        }
    }

Results

| id | weight | value |
| --- | --- | --- |
| g | 100.9000 | -0.0100 |
| d | 99.9000 | 100.9000 |
| c | 100.2000 | -0.0100 |
| j | 99.5000 | 100.9000 |