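Graph Generators

Gelly provides a collection of scalable graph generators. Each generator is:

  • parallelizable, in order to create large datasets.
  • scale-free, generating the same graph regardless of parallelism.
  • thrifty, using as few operators as possible.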

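Graph generators are configured using the builder pattern. The parallelism of the generator operators can be set explicitly by calling setParallelism(parallelism). Lowering the parallelism reduces the allocation of memory and network buffers.

Graph-specific configuration must be called first, then the configuration common to all generators, and lastly the call to generate(). The following example configures a grid graph with two dimensions, sets the parallelism, and generates the graph.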

Java:

    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.graph.Graph;
    import org.apache.flink.graph.generator.GridGraph;
    import org.apache.flink.types.LongValue;
    import org.apache.flink.types.NullValue;

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    boolean wrapEndpoints = false;
    int parallelism = 4;
    // graph-specific configuration first, then common configuration, then generate()
    Graph<LongValue,NullValue,NullValue> graph = new GridGraph(env)
        .addDimension(2, wrapEndpoints)
        .addDimension(4, wrapEndpoints)
        .setParallelism(parallelism)
        .generate();

Scala:

    import org.apache.flink.api.scala._
    import org.apache.flink.graph.generator.GridGraph

    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
    val wrapEndpoints = false
    val parallelism = 4
    val graph = new GridGraph(env.getJavaEnv).addDimension(2, wrapEndpoints).addDimension(4, wrapEndpoints).setParallelism(parallelism).generate()
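
The "same graph regardless of parallelism" property listed above can be illustrated without Flink. The following is a minimal plain-Java sketch, not Gelly's implementation: the class and method names are hypothetical, and a simple ring topology is assumed as the example. Because each edge is a pure function of its source vertex ID, splitting the ID range into any number of blocks yields the same edge set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class PartitionedGeneratorSketch {
    /** Generates ring edges "src->dst" for the vertex IDs in [from, to). */
    static List<String> edgesForRange(long from, long to, long vertexCount) {
        List<String> edges = new ArrayList<>();
        for (long v = from; v < to; v++) {
            edges.add(v + "->" + ((v + 1) % vertexCount));
        }
        return edges;
    }

    /** Splits [0, vertexCount) into `parallelism` contiguous blocks and unions the results. */
    static TreeSet<String> generate(long vertexCount, int parallelism) {
        TreeSet<String> all = new TreeSet<>();
        for (int p = 0; p < parallelism; p++) {
            long from = vertexCount * p / parallelism;
            long to = vertexCount * (p + 1) / parallelism;
            all.addAll(edgesForRange(from, to, vertexCount));
        }
        return all;
    }

    public static void main(String[] args) {
        // The edge set does not depend on how many blocks the ID range is split into.
        System.out.println(generate(10, 1).equals(generate(10, 4))); // prints true
    }
}
```

In this sketch the block count only changes how the work is divided, which is the role the parallelism setting plays in the description above.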

Complete Graph

An undirected graph connecting every distinct pair of vertices.

Java:

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    long vertexCount = 5;
    Graph<LongValue,NullValue,NullValue> graph = new CompleteGraph(env, vertexCount)
        .generate();

Scala:

    import org.apache.flink.api.scala._
    import org.apache.flink.graph.generator.CompleteGraph

    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
    val vertexCount = 5
    val graph = new CompleteGraph(env.getJavaEnv, vertexCount).generate()
[Figure: complete graph on vertices 0 through 4]
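
To make the definition concrete, here is a small plain-Java sketch that enumerates the edges of a complete graph. It does not use Gelly, and the class name is hypothetical; it only shows that n vertices give n(n-1)/2 undirected edges.

```java
import java.util.ArrayList;
import java.util.List;

final class CompleteGraphSketch {
    /** Returns every unordered pair {u, v} with u < v as a two-element array. */
    static List<long[]> edges(long vertexCount) {
        List<long[]> edges = new ArrayList<>();
        for (long u = 0; u < vertexCount; u++) {
            for (long v = u + 1; v < vertexCount; v++) {
                edges.add(new long[] {u, v});
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        // 5 vertices give 5 * 4 / 2 = 10 undirected edges.
        System.out.println(edges(5).size()); // prints 10
    }
}
```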

Cycle Graph

An undirected graph in which all edges form a single cycle.

Java:

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    long vertexCount = 5;
    Graph<LongValue,NullValue,NullValue> graph = new CycleGraph(env, vertexCount)
        .generate();

Scala:

    import org.apache.flink.api.scala._
    import org.apache.flink.graph.generator.CycleGraph

    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
    val vertexCount = 5
    val graph = new CycleGraph(env.getJavaEnv, vertexCount).generate()
[Figure: cycle graph on vertices 0 through 4]
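
As an illustration of the structure, the sketch below computes the two neighbors of a vertex on a cycle using modular arithmetic. It is plain Java with a hypothetical class name and assumes vertices are numbered consecutively from 0; it is not taken from Gelly.

```java
import java.util.Arrays;

final class CycleGraphSketch {
    /** Returns {previous, next} for vertex v on a cycle of vertexCount vertices. */
    static long[] neighbors(long v, long vertexCount) {
        long next = (v + 1) % vertexCount;
        long prev = (v + vertexCount - 1) % vertexCount;
        return new long[] {prev, next};
    }

    public static void main(String[] args) {
        // Vertex 0 closes the cycle by connecting back to vertex 4.
        System.out.println(Arrays.toString(neighbors(0, 5))); // prints [4, 1]
    }
}
```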

Empty Graph

A graph containing no edges.

Java:

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    long vertexCount = 5;
    Graph<LongValue,NullValue,NullValue> graph = new EmptyGraph(env, vertexCount)
        .generate();

Scala:

    import org.apache.flink.api.scala._
    import org.apache.flink.graph.generator.EmptyGraph

    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
    val vertexCount = 5
    val graph = new EmptyGraph(env.getJavaEnv, vertexCount).generate()
[Figure: empty graph with vertices 0 through 4]

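Grid Graph

An undirected graph connecting vertices in a regular tiling in one or more dimensions. Each dimension is configured separately. When the dimension size is at least three, the endpoints can optionally be connected by setting wrapEndpoints. Changing the following example to addDimension(4, true) would connect 0 to 3 and 4 to 7.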

Java:

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    boolean wrapEndpoints = false;
    Graph<LongValue,NullValue,NullValue> graph = new GridGraph(env)
        .addDimension(2, wrapEndpoints)
        .addDimension(4, wrapEndpoints)
        .generate();

Scala:

    import org.apache.flink.api.scala._
    import org.apache.flink.graph.generator.GridGraph

    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
    val wrapEndpoints = false
    val graph = new GridGraph(env.getJavaEnv).addDimension(2, wrapEndpoints).addDimension(4, wrapEndpoints).generate()
[Figure: 2 x 4 grid graph on vertices 0 through 7]
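
The effect of wrapEndpoints can be traced with a small plain-Java sketch. It is not Gelly's implementation: the class name is hypothetical, and it assumes vertex IDs are assigned so that the last dimension added varies fastest, which matches the remark above that wrapping the size-4 dimension connects 0 to 3 and 4 to 7.

```java
import java.util.ArrayList;
import java.util.List;

final class GridGraphSketch {
    /** Lists undirected edges {u, v} with u < v; the last dimension varies fastest. */
    static List<long[]> edges(int[] sizes, boolean[] wrap) {
        long vertexCount = 1;
        for (int s : sizes) {
            vertexCount *= s;
        }
        List<long[]> edges = new ArrayList<>();
        long stride = 1; // ID distance between neighbors in the current dimension
        for (int d = sizes.length - 1; d >= 0; d--) {
            int size = sizes[d];
            for (long v = 0; v < vertexCount; v++) {
                long coord = (v / stride) % size;
                if (coord + 1 < size) {
                    edges.add(new long[] {v, v + stride});
                } else if (wrap[d] && size >= 3) {
                    // Wrapping only adds a new edge for sizes of at least three;
                    // with size 2 the two endpoints are already adjacent.
                    edges.add(new long[] {v - coord * stride, v});
                }
            }
            stride *= size;
        }
        return edges;
    }

    public static void main(String[] args) {
        // 2 x 4 grid with the size-4 dimension wrapped.
        for (long[] e : edges(new int[] {2, 4}, new boolean[] {false, true})) {
            System.out.println(e[0] + " - " + e[1]);
        }
    }
}
```

Under these assumptions the unwrapped 2 x 4 grid has 10 edges, and wrapping the size-4 dimension adds the two edges 0 - 3 and 4 - 7.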

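Hypercube Graph

An undirected graph in which the edges form an n-dimensional hypercube. Each vertex in the hypercube connects to one other vertex in each dimension.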

Java:

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    long dimensions = 3;
    Graph<LongValue,NullValue,NullValue> graph = new HypercubeGraph(env, dimensions)
        .generate();

Scala:

    import org.apache.flink.api.scala._
    import org.apache.flink.graph.generator.HypercubeGraph

    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
    val dimensions = 3
    val graph = new HypercubeGraph(env.getJavaEnv, dimensions).generate()
[Figure: 3-dimensional hypercube graph on vertices 0 through 7]
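
"One other vertex in each dimension" has a compact bitwise reading, sketched below in plain Java. The sketch assumes each vertex ID is the binary encoding of its hypercube coordinates, which is an assumption made for illustration; the class name is hypothetical.

```java
import java.util.Arrays;

final class HypercubeSketch {
    /** Returns one neighbor per dimension, obtained by flipping a single bit of v. */
    static long[] neighbors(long v, int dimensions) {
        long[] result = new long[dimensions];
        for (int d = 0; d < dimensions; d++) {
            result[d] = v ^ (1L << d);
        }
        return result;
    }

    public static void main(String[] args) {
        // Vertex 5 is binary 101; flipping bits 0, 1 and 2 gives 100, 111 and 001.
        System.out.println(Arrays.toString(neighbors(5, 3))); // prints [4, 7, 1]
    }
}
```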

Path Graph

An undirected graph in which all edges form a single path.

Java:

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    long vertexCount = 5;
    Graph<LongValue,NullValue,NullValue> graph = new PathGraph(env, vertexCount)
        .generate();

Scala:

    import org.apache.flink.api.scala._
    import org.apache.flink.graph.generator.PathGraph

    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
    val vertexCount = 5
    val graph = new PathGraph(env.getJavaEnv, vertexCount).generate()
[Figure: path graph on vertices 0 through 4]

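RMat Graph

A directed or undirected power-law graph generated using the Recursive Matrix (R-Mat) model.

RMat is a stochastic generator configured with a source of randomness that implements the RandomGenerableFactory interface. The provided implementations are JDKRandomGeneratorFactory and MersenneTwisterFactory. They generate an initial sequence of random values, which are then used as seeds for generating the edges.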

Java:

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    RandomGenerableFactory<JDKRandomGenerator> rnd = new JDKRandomGeneratorFactory();
    // scale and edgeFactor are int values assumed to be defined by the caller
    int vertexCount = 1 << scale;
    int edgeCount = edgeFactor * vertexCount;
    Graph<LongValue,NullValue,NullValue> graph = new RMatGraph<>(env, rnd, vertexCount, edgeCount)
        .generate();

Scala:

    import org.apache.flink.api.scala._
    import org.apache.flink.graph.generator.RMatGraph
    import org.apache.flink.graph.generator.random.JDKRandomGeneratorFactory

    val env = ExecutionEnvironment.getExecutionEnvironment
    // source of randomness, mirroring the Java example
    val rnd = new JDKRandomGeneratorFactory()
    // scale and edgeFactor are Int values assumed to be defined by the caller
    val vertexCount = 1 << scale
    val edgeCount = edgeFactor * vertexCount
    val graph = new RMatGraph(env.getJavaEnv, rnd, vertexCount, edgeCount).generate()
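
For readers unfamiliar with the model, the following plain-Java sketch shows the general recursive-matrix idea: each edge is placed by choosing one quadrant of the adjacency matrix per bit of the vertex ID. It is a simplified illustration, not Gelly's RMatGraph; the class name is hypothetical, java.util.Random stands in for the random source, and the quadrant probabilities 0.57, 0.19 and 0.19 are example values assumed here.

```java
import java.util.Random;

final class RMatSketch {
    /**
     * Draws one edge {src, dst} in a 2^scale x 2^scale adjacency matrix.
     * a, b and c are the probabilities of the top-left, top-right and
     * bottom-left quadrants; the bottom-right quadrant gets the remainder.
     */
    static long[] randomEdge(Random rnd, int scale, double a, double b, double c) {
        long src = 0;
        long dst = 0;
        for (int bit = 0; bit < scale; bit++) {
            double r = rnd.nextDouble();
            boolean down = r >= a + b;                                 // bottom half: source bit is 1
            boolean right = (r >= a && r < a + b) || r >= a + b + c;   // right half: target bit is 1
            src = (src << 1) | (down ? 1 : 0);
            dst = (dst << 1) | (right ? 1 : 0);
        }
        return new long[] {src, dst};
    }

    public static void main(String[] args) {
        int scale = 4;        // 2^4 = 16 vertices (example value)
        int edgeFactor = 2;   // average edges per vertex (example value)
        int edgeCount = edgeFactor * (1 << scale);
        Random rnd = new Random(42L); // fixed seed so that a run is repeatable
        for (int i = 0; i < edgeCount; i++) {
            long[] e = randomEdge(rnd, scale, 0.57, 0.19, 0.19);
            System.out.println(e[0] + " -> " + e[1]);
        }
    }
}
```

Skewing the probabilities toward one quadrant concentrates edges on low-numbered vertices, which is what produces the power-law degree distribution mentioned above.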

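Singleton Edge Graph

An undirected graph containing isolated two-paths, that is, disjoint pairs of vertices each joined by a single edge.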

Java:

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    long vertexPairCount = 4;
    // note: configured with the number of vertex pairs
    Graph<LongValue,NullValue,NullValue> graph = new SingletonEdgeGraph(env, vertexPairCount)
        .generate();

Scala:

    import org.apache.flink.api.scala._
    import org.apache.flink.graph.generator.SingletonEdgeGraph

    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
    val vertexPairCount = 4
    // note: configured with the number of vertex pairs
    val graph = new SingletonEdgeGraph(env.getJavaEnv, vertexPairCount).generate()
[Figure: singleton edge graph on vertices 0 through 7]
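
Since the generator is configured with a pair count rather than a vertex count, a short plain-Java sketch may help show the relationship. It assumes that pair i consists of the consecutive IDs 2i and 2i+1, which is an assumption made for illustration; the class name is hypothetical and the code does not use Gelly.

```java
import java.util.Arrays;

final class SingletonEdgeSketch {
    /** Returns one edge {2i, 2i + 1} per vertex pair. */
    static long[][] edges(long vertexPairCount) {
        int pairs = Math.toIntExact(vertexPairCount);
        long[][] edges = new long[pairs][];
        for (int i = 0; i < pairs; i++) {
            edges[i] = new long[] {2L * i, 2L * i + 1};
        }
        return edges;
    }

    public static void main(String[] args) {
        // 4 pairs give 8 vertices and 4 edges.
        System.out.println(Arrays.deepToString(edges(4))); // prints [[0, 1], [2, 3], [4, 5], [6, 7]]
    }
}
```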

Star Graph

An undirected graph containing a single central vertex connected to all other leaf vertices.

Java:

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    long vertexCount = 6;
    Graph<LongValue,NullValue,NullValue> graph = new StarGraph(env, vertexCount)
        .generate();

Scala:

    import org.apache.flink.api.scala._
    import org.apache.flink.graph.generator.StarGraph

    val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
    val vertexCount = 6
    val graph = new StarGraph(env.getJavaEnv, vertexCount).generate()
[Figure: star graph on vertices 0 through 5]
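
The structure can be written out in a few lines of plain Java. The sketch below assumes the central vertex has ID 0 and the leaves are numbered from 1, which is an assumption made for illustration; the class name is hypothetical and the code does not use Gelly.

```java
final class StarGraphSketch {
    /** Returns one edge {0, leaf} for every leaf vertex 1 .. vertexCount - 1. */
    static long[][] edges(long vertexCount) {
        int leaves = Math.toIntExact(Math.max(0, vertexCount - 1));
        long[][] edges = new long[leaves][];
        for (int i = 0; i < leaves; i++) {
            edges[i] = new long[] {0L, i + 1L};
        }
        return edges;
    }

    public static void main(String[] args) {
        // 6 vertices: one center and 5 leaves, so 5 edges.
        System.out.println(edges(6).length); // prints 5
    }
}
```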