Java 类名:com.alibaba.alink.operator.batch.graph.LineBatchOp
Python 类名:LineBatchOp

功能介绍

line算法(Large-scale Information Network Embedding)是一种graph embedding算法,可以将网络中的每一个点表示成连续特征空间中的一个点向量。
line算法有一阶相似度和二阶相似度两种描述方法,一阶相似度用于描述图中成对顶点之间的局部相似度,二阶相似度以两个顶点的邻域特征来描述顶点的相似度。
line算法的论文:LINE: Large-scale Information Network Embedding

参数说明

| 名称 | 中文名称 | 描述 | 类型 | 是否必须? | 取值范围 | 默认值 | | —- | —- | —- | —- | —- | —- | —- |

| sourceCol | 起始点列名 | 用来指定起始点列 | String | ✓ | 所选列类型为 [INTEGER, LONG, STRING] | |

| targetCol | 中止点点列名 | 用来指定中止点列 | String | ✓ | 所选列类型为 [INTEGER, LONG, STRING] | |

| batchSize | batch大小 | batch大小, 按行计算 | Integer | | [1, +inf) | |

| isToUndigraph | 是否转无向图 | 选为true时,会将当前图转成无向图,然后再游走 | Boolean | | | false |

| maxIter | 最大迭代步数 | 最大迭代步数,默认为 100 | Integer | | [1, +inf) | 100 |

| minRhoRate | 最小学习率的比例 | 最小学习率的比例 | Double | | [0.0, 1.0] | 0.001 |

| negative | 负采样大小 | 负采样大小 | Integer | | | 5 |

| order | 阶数 | 选择一阶优化或是二阶优化 | String | | “FirstOrder”, “SecondOrder” | “FirstOrder” |

| rho | 学习率 | 学习率 | Double | | [0.0, +inf) | 0.025 |

| sampleRatioPerPartition | 采样率 | 每轮迭代在每个partition上采样样本的比率 | Double | | [0.0, +inf) | 1.0 |

| vectorSize | embedding的向量长度 | embedding的向量长度 | Integer | | [1, +inf) | 100 |

| weightCol | 权重列名 | 权重列对应的列名 | String | | 所选列类型为 [BIGDECIMAL, BIGINTEGER, BYTE, DOUBLE, FLOAT, INTEGER, LONG, SHORT] | null |

| numThreads | 组件多线程线程个数 | 组件多线程线程个数 | Integer | | | 1 |

代码示例

Python 代码

  1. from pyalink.alink import *
  2. import pandas as pd
  3. useLocalEnv(1)
  4. df = pd.DataFrame([\
  5. ["1L", "5L", 1.],\
  6. ["2L", "5L", 1.],\
  7. ["3L", "5L", 1.],\
  8. ["4L", "5L", 1.],\
  9. ["1L", "6L", 1.],\
  10. ["2L", "6L", 1.],\
  11. ["3L", "6L", 1.],\
  12. ["4L", "6L", 1.],\
  13. ["7L", "6L", 15.],\
  14. ["7L", "8L", 1.],\
  15. ["7L", "9L", 1.],\
  16. ["7L", "10L", 1.]])
  17. data = BatchOperator.fromDataframe(df, schemaStr="source string, target string, weight double")
  18. line = LineBatchOp()\
  19. .setOrder("firstorder")\
  20. .setRho(.025)\
  21. .setVectorSize(5)\
  22. .setNegative(5)\
  23. .setIsToUndigraph(False)\
  24. .setMaxIter(20)\
  25. .setSampleRatioPerPartition(2.)\
  26. .setSourceCol("source")\
  27. .setTargetCol("target")\
  28. .setWeightCol("weight")
  29. line.linkFrom(data).print()

Java 代码

  1. import org.apache.flink.types.Row;
  2. import com.alibaba.alink.operator.batch.BatchOperator;
  3. import com.alibaba.alink.operator.batch.graph.LineBatchOp;
  4. import com.alibaba.alink.operator.batch.source.MemSourceBatchOp;
  5. import org.junit.Test;
  6. import java.util.Arrays;
  7. import java.util.List;
  8. public class LineBatchOpTest {
  9. @Test
  10. public void testLineBatchOp() throws Exception {
  11. List <Row> df = Arrays.asList(
  12. Row.of("1L", "5L", 1.),
  13. Row.of("2L", "5L", 1.),
  14. Row.of("3L", "5L", 1.),
  15. Row.of("4L", "5L", 1.),
  16. Row.of("1L", "6L", 1.),
  17. Row.of("2L", "6L", 1.),
  18. Row.of("3L", "6L", 1.),
  19. Row.of("4L", "6L", 1.),
  20. Row.of("7L", "6L", 15.),
  21. Row.of("7L", "8L", 1.),
  22. Row.of("7L", "9L", 1.));
  23. BatchOperator <?> data = new MemSourceBatchOp(df, "source string, target string, weight double");
  24. BatchOperator <?> line = new LineBatchOp()
  25. .setOrder("firstorder")
  26. .setRho(.025)
  27. .setVectorSize(5)
  28. .setNegative(5)
  29. .setIsToUndigraph(false)
  30. .setMaxIter(20)
  31. .setSampleRatioPerPartition(2.)
  32. .setSourceCol("source")
  33. .setTargetCol("target")
  34. .setWeightCol("weight");
  35. line.linkFrom(data).print();
  36. }
  37. }

运行结果

| vertexId | vertexVector | | —- | —- |

| 10L | 0.21389429778837246,0.1911353696863806,0.1316112606087454,-0.15504651922643958,0.9361386397244865 |

| 1L | 0.43122351608756165,0.29783837576159716,-0.6242134421932172,-0.5699927640850769,0.10394425704541377 |

| 2L | 0.46132608248237156,0.35541855098613856,-0.5135636636216717,-0.6265741465013136,0.06717962176150442 |

| 3L | 0.22368818387133887,0.36608001332291756,-0.8030241335180479,-0.34202409002367046,0.23263873940705357 |

| 4L | 0.3570337202077478,0.7126398702420819,-0.4696611127705199,-0.23265871937843668,-0.2999328215189313 |

| 5L | 0.45792127623648854,0.47070594164899743,-0.56933846250774,-0.47612043020847256,0.13381730946965611 |

| 6L | -0.5015833272182806,-0.4481268255333818,0.4612455253782732,0.43140895120924494,-0.385662282611449 |

| 7L | -0.3152196336701024,-0.4607786197664082,0.6574313951991989,0.48164878283999957,0.15529989282105744 |

| 8L | -0.42030975318452385,0.05099491454249831,0.4269511935747453,-0.2071107702180848,0.7717234201665388 |

| 9L | -0.22769540933018892,-0.5154933470780315,0.5261821838436239,0.5103449434077099,0.3809222464388005 |