Gelly提供了一系列可扩展的图形生成器。每台发电机都是
- 可并行化,以创建大型数据集
- 无标度,无论并行性如何都生成相同的图形
- 节俭,使用尽可能少的算子
使用构建器模式配置图生成器。可以通过调用显式设置生成器 算子的并行性setParallelism(parallelism)
。降低并行度将Reduce内存和网络缓冲区的分配。
必须首先调用特定于图形的配置,然后调用所有生成器通用的配置,最后调用generate()
。以下示例配置具有两个维度的网格图,配置并行度并生成图形。
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
boolean wrapEndpoints = false;
int parallelism = 4;
Graph<LongValue, NullValue, NullValue> graph = new GridGraph(env)
.addDimension(2, wrapEndpoints)
.addDimension(4, wrapEndpoints)
.setParallelism(parallelism)
.generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.GridGraph
val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
wrapEndpoints = false
val parallelism = 4
val graph = new GridGraph(env.getJavaEnv).addDimension(2, wrapEndpoints).addDimension(4, wrapEndpoints).setParallelism(parallelism).generate()
循环图
一个循环图是一个 有向图具有偏移的一个或多个连续的范围进行配置。边连接整数顶点ID,其差值等于配置的偏移量。没有偏移的循环图是空图,具有最大范围的图是 完整图。
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
long vertexCount = 5;
Graph<LongValue, NullValue, NullValue> graph = new CirculantGraph(env, vertexCount)
.addRange(1, 2)
.generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.CirculantGraph
val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
val vertexCount = 5
val graph = new CirculantGraph(env.getJavaEnv, vertexCount).addRange(1, 2).generate()
完整的图表
连接每个不同顶点对的无向图。
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
long vertexCount = 5;
Graph<LongValue, NullValue, NullValue> graph = new CompleteGraph(env, vertexCount)
.generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.CompleteGraph
val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
val vertexCount = 5
val graph = new CompleteGraph(env.getJavaEnv, vertexCount).generate()
0 1 2 3 4
循环图
一个无向图,其中边集通过将每个顶点连接到链式循环中的两个相邻顶点而形成单个循环。
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
long vertexCount = 5;
Graph<LongValue, NullValue, NullValue> graph = new CycleGraph(env, vertexCount)
.generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.CycleGraph
val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
val vertexCount = 5
val graph = new CycleGraph(env.getJavaEnv, vertexCount).generate()
0 1 2 3 4
回声图
一个回波图形是一个 循环图与n
由在中心的偏移量的单一范围的宽度所定义的顶点n/2
。顶点连接到’far’顶点,顶点连接到’near’顶点,连接到’far’顶点,….
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
long vertexCount = 5;
long vertexDegree = 2;
Graph<LongValue, NullValue, NullValue> graph = new EchoGraph(env, vertexCount, vertexDegree)
.generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.EchoGraph
val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
val vertexCount = 5
val vertexDegree = 2
val graph = new EchoGraph(env.getJavaEnv, vertexCount, vertexDegree).generate()
空图
不包含边的图。
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
long vertexCount = 5;
Graph<LongValue, NullValue, NullValue> graph = new EmptyGraph(env, vertexCount)
.generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.EmptyGraph
val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
val vertexCount = 5
val graph = new EmptyGraph(env.getJavaEnv, vertexCount).generate()
0 1 2 3 4
网格图
一个无向图,用于连接一个或多个维度中常规平铺中的顶点。每个维度都单独配置。当尺寸大小至少为3时,可选择通过设置连接端点wrapEndpoints
。更改下面的示例addDimension(4, true)
将连接0
到3
和4
到7
。
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
boolean wrapEndpoints = false;
Graph<LongValue, NullValue, NullValue> graph = new GridGraph(env)
.addDimension(2, wrapEndpoints)
.addDimension(4, wrapEndpoints)
.generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.GridGraph
val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
val wrapEndpoints = false
val graph = new GridGraph(env.getJavaEnv).addDimension(2, wrapEndpoints).addDimension(4, wrapEndpoints).generate()
0 1 2 3 4 5 6 7
超立方体图
一个无向图,其中边形成一个n
三维超立方体。超立方体中的每个顶点连接到每个维度中的另一个顶点。
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
long dimensions = 3;
Graph<LongValue, NullValue, NullValue> graph = new HypercubeGraph(env, dimensions)
.generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.HypercubeGraph
val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
val dimensions = 3
val graph = new HypercubeGraph(env.getJavaEnv, dimensions).generate()
0 1 2 3 4 5 6 7
路径图
一个无向图,其中边集形成一个单独的路径,通过连接两个endpoint
顶点与度1
和所有中点顶点与度 2
。可以通过从循环图中移除单个边来形成路径图。
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
long vertexCount = 5
Graph<LongValue, NullValue, NullValue> graph = new PathGraph(env, vertexCount)
.generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.PathGraph
val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
val vertexCount = 5
val graph = new PathGraph(env.getJavaEnv, vertexCount).generate()
0 1 2 3 4
RMat图
使用递归矩阵(R-Mat)模型生成的有向幂律多图 。
RMat是一个随机生成器,配置有实现RandomGenerableFactory
接口的随机源 。提供的实现是JDKRandomGeneratorFactory
和MersenneTwisterFactory
。这些生成随机值的初始序列,然后将其用作生成边缘的种子。
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
RandomGenerableFactory<JDKRandomGenerator> rnd = new JDKRandomGeneratorFactory();
int vertexCount = 1 << scale;
int edgeCount = edgeFactor * vertexCount;
Graph<LongValue, NullValue, NullValue> graph = new RMatGraph<>(env, rnd, vertexCount, edgeCount)
.generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.RMatGraph
val env = ExecutionEnvironment.getExecutionEnvironment
val vertexCount = 1 << scale
val edgeCount = edgeFactor * vertexCount
val graph = new RMatGraph(env.getJavaEnv, rnd, vertexCount, edgeCount).generate()
可以覆盖默认的RMat常量,如以下示例所示。常量定义了每个生成的边的源标记和目标标签的位的相互依赖性。可以启用RMat噪声,并在生成每个边缘时逐渐扰乱常量。
RMat生成器可以配置为通过删除自循环和重复边来生成简单的图。对称化是通过“剪切和翻转”抛弃对角线上方的半矩阵或完全“翻转”保存并镜像所有边缘来进行的。
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
RandomGenerableFactory<JDKRandomGenerator> rnd = new JDKRandomGeneratorFactory();
int vertexCount = 1 << scale;
int edgeCount = edgeFactor * vertexCount;
boolean clipAndFlip = false;
Graph<LongValue, NullValue, NullValue> graph = new RMatGraph<>(env, rnd, vertexCount, edgeCount)
.setConstants(0.57f, 0.19f, 0.19f)
.setNoise(true, 0.10f)
.generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.RMatGraph
val env = ExecutionEnvironment.getExecutionEnvironment
val vertexCount = 1 << scale
val edgeCount = edgeFactor * vertexCount
clipAndFlip = false
val graph = new RMatGraph(env.getJavaEnv, rnd, vertexCount, edgeCount).setConstants(0.57f, 0.19f, 0.19f).setNoise(true, 0.10f).generate()
Singleton边缘图
包含孤立的双路径的无向图,其中每个顶点都具有度 1
。
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
long vertexPairCount = 4
// note: configured with the number of vertex pairs
Graph<LongValue, NullValue, NullValue> graph = new SingletonEdgeGraph(env, vertexPairCount)
.generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.SingletonEdgeGraph
val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
val vertexPairCount = 4
// note: configured with the number of vertex pairs val graph = new SingletonEdgeGraph(env.getJavaEnv, vertexPairCount).generate()
0 1 2 3 4 5 6 7
星图
一个无向图,包含连接到所有其他叶顶点的单个中心顶点。
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
long vertexCount = 6;
Graph<LongValue, NullValue, NullValue> graph = new StarGraph(env, vertexCount)
.generate();
import org.apache.flink.api.scala._
import org.apache.flink.graph.generator.StarGraph
val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment
val vertexCount = 6
val graph = new StarGraph(env.getJavaEnv, vertexCount).generate()
0 1 2 3 4 5