Graph Generators
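Gelly provides a collection of scalable graph generators. Each generator is:
- parallelizable, so that large datasets can be created.
- scale-free, generating the same graph regardless of parallelism.
- thrifty, using as few operators as possible.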
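Graph generators are configured using the builder pattern. The parallelism of the generator operators can be set explicitly by calling setParallelism(parallelism). Lowering the parallelism reduces the allocation of memory and network buffers.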
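Graph-specific configuration must be called first, followed by the configuration common to all generators, and lastly the call to generate(). The following example configures a grid graph with two dimensions, sets the parallelism, and generates the graph.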
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
boolean wrapEndpoints = false;
int parallelism = 4;
Graph<LongValue,NullValue,NullValue> graph = new GridGraph(env)
.addDimension(2, wrapEndpoints)
.addDimension(4, wrapEndpoints)
.setParallelism(parallelism)
.generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.GridGraph
val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
val wrapEndpoints = false
val parallelism = 4
val graph = new GridGraph(env.getJavaEnv).addDimension(2, wrapEndpoints).addDimension(4, wrapEndpoints).setParallelism(parallelism).generate()
Complete Graph
An undirected graph connecting every distinct pair of vertices.
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
long vertexCount = 5;
Graph<LongValue,NullValue,NullValue> graph = new CompleteGraph(env, vertexCount)
.generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.CompleteGraph
val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
val vertexCount = 5
val graph = new CompleteGraph(env.getJavaEnv, vertexCount).generate()
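To make the definition concrete, the following plain-Java sketch enumerates the edge set of a complete graph without using Gelly. The class name `CompleteGraphSketch` and the assumption that vertices are numbered 0 to vertexCount - 1 are illustrative only; each unordered pair is listed once, which may differ from how Gelly stores undirected edges.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Gelly: lists the edges of a complete graph.
final class CompleteGraphSketch {

    /** Returns every unordered pair {u, v} with u < v among vertices 0..vertexCount-1. */
    static List<long[]> edges(long vertexCount) {
        List<long[]> result = new ArrayList<>();
        for (long u = 0; u < vertexCount; u++) {
            for (long v = u + 1; v < vertexCount; v++) {
                result.add(new long[] {u, v});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Five vertices give 5 * 4 / 2 = 10 distinct pairs.
        for (long[] edge : edges(5)) {
            System.out.println(edge[0] + " - " + edge[1]);
        }
    }
}
```

The edge count grows quadratically with vertexCount, so a complete graph becomes large quickly.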
Cycle Graph
An undirected graph in which all edges form a single cycle.
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
long vertexCount = 5;
Graph<LongValue,NullValue,NullValue> graph = new CycleGraph(env, vertexCount)
.generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.CycleGraph
val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
val vertexCount = 5
val graph = new CycleGraph(env.getJavaEnv, vertexCount).generate()
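As a rough illustration of the cycle structure, the sketch below links each vertex to its successor and closes the loop from the last vertex back to the first. It is independent of Gelly; the class name `CycleGraphSketch` and the 0-based vertex numbering are assumptions, and the sketch is only meaningful for vertexCount of at least 3.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Gelly: lists the edges of a cycle.
final class CycleGraphSketch {

    /** Returns the edges v -> (v + 1) mod vertexCount for every vertex v. */
    static List<long[]> edges(long vertexCount) {
        List<long[]> result = new ArrayList<>();
        for (long v = 0; v < vertexCount; v++) {
            result.add(new long[] {v, (v + 1) % vertexCount});
        }
        return result;
    }

    public static void main(String[] args) {
        for (long[] edge : edges(5)) {
            System.out.println(edge[0] + " - " + edge[1]);
        }
    }
}
```

Dropping the final wrap-around edge from this list leaves a single path through the same vertices.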
Empty Graph
A graph containing no edges.
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
long vertexCount = 5;
Graph<LongValue,NullValue,NullValue> graph = new EmptyGraph(env, vertexCount)
.generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.EmptyGraph
val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
val vertexCount = 5
val graph = new EmptyGraph(env.getJavaEnv, vertexCount).generate()
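Grid Graph
An undirected graph connecting vertices in a regular tiling in one or more dimensions. Each dimension is configured separately. When the dimension size is at least three, the endpoints of that dimension can optionally be connected by setting wrapEndpoints. Changing the following example to addDimension(4, true) would connect 0 to 3 and 4 to 7.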
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
boolean wrapEndpoints = false;
Graph<LongValue,NullValue,NullValue> graph = new GridGraph(env)
.addDimension(2, wrapEndpoints)
.addDimension(4, wrapEndpoints)
.generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.GridGraph
val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
val wrapEndpoints = false
val graph = new GridGraph(env.getJavaEnv).addDimension(2, wrapEndpoints).addDimension(4, wrapEndpoints).generate()
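The wrapEndpoints behaviour described above can be checked with a small plain-Java sketch that does not depend on Gelly. It assumes vertex IDs are assigned in row-major order (the last dimension varies fastest), which is consistent with the 0 to 3 and 4 to 7 example; the class name `GridGraphSketch` and its method signature are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Gelly: lists the edges of a grid graph.
final class GridGraphSketch {

    /**
     * Lists each undirected edge once as {smaller id, larger id}.
     * sizes[d] is the size of dimension d; wrap[d] requests wrapped endpoints.
     */
    static List<long[]> edges(long[] sizes, boolean[] wrap) {
        long vertexCount = 1;
        for (long size : sizes) {
            vertexCount *= size;
        }

        List<long[]> result = new ArrayList<>();
        long stride = 1; // distance in IDs between neighbors along dimension d
        for (int d = sizes.length - 1; d >= 0; d--) {
            long size = sizes[d];
            for (long v = 0; v < vertexCount; v++) {
                long coord = (v / stride) % size;
                if (coord + 1 < size) {
                    // interior step to the next coordinate in this dimension
                    result.add(new long[] {v, v + stride});
                } else if (wrap[d] && size >= 3) {
                    // endpoint: connect the last coordinate back to coordinate 0;
                    // skipped for sizes below 3, where it would duplicate an edge
                    result.add(new long[] {v - coord * stride, v});
                }
            }
            stride *= size;
        }
        return result;
    }

    public static void main(String[] args) {
        long[] sizes = {2, 4};
        boolean[] wrap = {false, true};
        for (long[] edge : edges(sizes, wrap)) {
            System.out.println(edge[0] + " - " + edge[1]);
        }
    }
}
```

With sizes {2, 4} and wrapping only on the second dimension, the sketch yields 12 edges, including 0 - 3 and 4 - 7; without wrapping it yields 10.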
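Hypercube Graph
An undirected graph in which the edges form an n-dimensional hypercube. Each vertex in the hypercube connects to one other vertex in each dimension.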
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
long dimensions = 3;
Graph<LongValue,NullValue,NullValue> graph = new HypercubeGraph(env, dimensions)
.generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.HypercubeGraph
val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
val dimensions = 3
val graph = new HypercubeGraph(env.getJavaEnv, dimensions).generate()
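One common way to picture the hypercube is to treat each vertex ID as a bit string with one bit per dimension, so that the neighbor in dimension d is found by flipping bit d. The sketch below follows that idea in plain Java; it is an illustration under that assumption, not Gelly's implementation, and `HypercubeGraphSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Gelly: lists the edges of a hypercube.
final class HypercubeGraphSketch {

    /** Lists each undirected edge once as {smaller id, larger id}. */
    static List<long[]> edges(int dimensions) {
        long vertexCount = 1L << dimensions; // 2^dimensions vertices
        List<long[]> result = new ArrayList<>();
        for (long v = 0; v < vertexCount; v++) {
            for (int d = 0; d < dimensions; d++) {
                long neighbor = v ^ (1L << d); // flip bit d
                if (v < neighbor) {
                    result.add(new long[] {v, neighbor});
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Three dimensions: 8 vertices, each with 3 neighbors, 12 edges in total.
        for (long[] edge : edges(3)) {
            System.out.println(edge[0] + " - " + edge[1]);
        }
    }
}
```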
Path Graph
An undirected graph in which all edges form a single path.
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
long vertexCount = 5;
Graph<LongValue,NullValue,NullValue> graph = new PathGraph(env, vertexCount)
.generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.PathGraph
val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
val vertexCount = 5
val graph = new PathGraph(env.getJavaEnv, vertexCount).generate()
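RMat Graph
A directed or undirected power-law graph generated using the Recursive Matrix (R-Mat) model.
RMat is a stochastic generator configured with a source of randomness implementing the RandomGenerableFactory interface. The provided implementations are JDKRandomGeneratorFactory and MersenneTwisterFactory. These generate an initial sequence of random values, which are then used as seeds for generating the edges.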
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
RandomGenerableFactory<JDKRandomGenerator> rnd = new JDKRandomGeneratorFactory();
// scale and edgeFactor are int values assumed to be defined by the caller
int vertexCount = 1 << scale;
int edgeCount = edgeFactor * vertexCount;
Graph<LongValue,NullValue,NullValue> graph = new RMatGraph<>(env, rnd, vertexCount, edgeCount)
.generate();
import org.apache.flink.api.scala._
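import org.apache.flink.graph.generator.RMatGraph
import org.apache.flink.graph.generator.random.JDKRandomGeneratorFactory
val env = ExecutionEnvironment.getExecutionEnvironment
// source of randomness, matching the Java example
val rnd = new JDKRandomGeneratorFactory()
// scale and edgeFactor are Int values assumed to be defined by the caller
val vertexCount = 1 << scale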
val edgeCount = edgeFactor * vertexCount
val graph = new RMatGraph(env.getJavaEnv, rnd, vertexCount, edgeCount).generate()
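For readers unfamiliar with the R-Mat model, the sketch below shows the basic idea in plain Java: each edge is placed by repeatedly choosing one of four quadrants of the adjacency matrix, one bit of the source and target ID per step. It is a simplified, single-threaded illustration and not Gelly's implementation. The quadrant probabilities A, B, and C are assumed values, `java.util.Random` stands in for the RandomGenerableFactory sources described above, and the class name `RMatSketch` is hypothetical.

```java
import java.util.Random;

// Hypothetical helper, not part of Gelly: a minimal R-Mat edge sampler.
final class RMatSketch {

    // Assumed quadrant probabilities; the fourth quadrant gets 1 - A - B - C.
    static final double A = 0.57;
    static final double B = 0.19;
    static final double C = 0.19;

    /** Samples edgeFactor * 2^scale directed edges as {source, target} pairs. */
    static long[][] edges(int scale, int edgeFactor, long seed) {
        int vertexCount = 1 << scale;
        int edgeCount = edgeFactor * vertexCount;
        Random random = new Random(seed);

        long[][] result = new long[edgeCount][];
        for (int i = 0; i < edgeCount; i++) {
            long source = 0;
            long target = 0;
            // Each iteration halves the matrix and fixes one bit of each ID.
            for (int bit = 0; bit < scale; bit++) {
                double p = random.nextDouble();
                source <<= 1;
                target <<= 1;
                if (p < A) {
                    // top-left quadrant: both bits stay 0
                } else if (p < A + B) {
                    target |= 1; // top-right quadrant
                } else if (p < A + B + C) {
                    source |= 1; // bottom-left quadrant
                } else {
                    source |= 1; // bottom-right quadrant
                    target |= 1;
                }
            }
            result[i] = new long[] {source, target};
        }
        return result;
    }

    public static void main(String[] args) {
        for (long[] edge : edges(4, 2, 42L)) {
            System.out.println(edge[0] + " -> " + edge[1]);
        }
    }
}
```

Because the first quadrant is favored at every level, low-numbered vertices receive many more edges than high-numbered ones, which produces the skewed degree distribution. The sampled list may contain duplicate edges and self-loops; this sketch does not remove them.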
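Singleton Edge Graph
An undirected graph containing isolated two-paths, that is, disjoint edges each joining one pair of vertices.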
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
long vertexPairCount = 4;
// note: configured with the number of vertex pairs
Graph<LongValue,NullValue,NullValue> graph = new SingletonEdgeGraph(env, vertexPairCount)
.generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.SingletonEdgeGraph
val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
val vertexPairCount = 4
// note: configured with the number of vertex pairs
val graph = new SingletonEdgeGraph(env.getJavaEnv, vertexPairCount).generate()
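Because the generator is configured with the number of vertex pairs rather than the number of vertices, the resulting graph has twice as many vertices as the configured value. A minimal plain-Java sketch of this structure follows; the pairing of vertex 2i with vertex 2i + 1 and the class name `SingletonEdgeSketch` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Gelly: lists the edges of a singleton edge graph.
final class SingletonEdgeSketch {

    /** Returns one edge {2i, 2i + 1} for each pair index i. */
    static List<long[]> edges(long vertexPairCount) {
        List<long[]> result = new ArrayList<>();
        for (long i = 0; i < vertexPairCount; i++) {
            result.add(new long[] {2 * i, 2 * i + 1});
        }
        return result;
    }

    public static void main(String[] args) {
        // Four pairs: 8 vertices and 4 edges, no two edges sharing a vertex.
        for (long[] edge : edges(4)) {
            System.out.println(edge[0] + " - " + edge[1]);
        }
    }
}
```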
Star Graph
An undirected graph containing a single central vertex connected to all other leaf vertices.
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
long vertexCount = 6;
Graph<LongValue,NullValue,NullValue> graph = new StarGraph(env, vertexCount)
.generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.StarGraph
val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
val vertexCount = 6
val graph = new StarGraph(env.getJavaEnv, vertexCount).generate()