= Basic Concepts of JanusGraph

== Configuration

A JanusGraph graph database cluster consists of one or more JanusGraph instances. To open a JanusGraph instance, a configuration must be provided which specifies how JanusGraph should be set up.

A JanusGraph configuration specifies which components JanusGraph should use, controls all operational aspects of a JanusGraph deployment, and provides a number of tuning options to get maximum performance out of a JanusGraph cluster.

At a minimum, a JanusGraph configuration must define the persistence engine that JanusGraph uses as its storage backend. <> lists all supported persistence engines and how to configure them respectively. If advanced graph query support (e.g. full-text search, geo search, or range queries) is needed, an additional indexing backend must be configured. See <> for details. If query performance is a concern, then caching should be enabled. Cache configuration and tuning are described in <>.

Example Configurations

Below are some example configuration files to demonstrate how to configure the most commonly used storage backends, indexing systems, and performance components. These cover only a small subset of the available configuration options. Refer to <> for the complete list of all options.

Cassandra+Elasticsearch

Sets up JanusGraph to use the Cassandra persistence engine running locally and a remote Elasticsearch indexing system:

[source, properties]
----
storage.backend=cql
storage.hostname=localhost
index.search.backend=elasticsearch
index.search.hostname=100.100.101.1, 100.100.101.2
index.search.elasticsearch.client-only=true
----

HBase+Caching

Sets up JanusGraph to use the HBase persistence engine running remotely and JanusGraph's caching component for better performance:

[source, properties]
----
storage.backend=hbase
storage.hostname=100.100.101.1
storage.port=2181
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5
----

BerkeleyDB

Sets up JanusGraph to use BerkeleyDB as an embedded persistence engine and Elasticsearch as an embedded indexing system:

[source, properties]
----
storage.backend=berkeleyje
storage.directory=/tmp/graph
index.search.backend=elasticsearch
index.search.directory=/tmp/searchindex
index.search.elasticsearch.client-only=false
index.search.elasticsearch.local-mode=true
----

<> describes all of these configuration options in detail. The +conf+ directory of the JanusGraph distribution contains additional configuration examples.

Further Examples

There are several example configuration files in the conf/ directory that can be used to get started with JanusGraph quickly. Paths to these files can be passed to JanusGraphFactory.open(...) as shown below:

[source, java]
----
// Connect to Cassandra on localhost using a default configuration
graph = JanusGraphFactory.open("conf/janusgraph-cql.properties")
// Connect to HBase on localhost using a default configuration
graph = JanusGraphFactory.open("conf/janusgraph-hbase.properties")
----

Using a Configuration

How the configuration is provided to JanusGraph depends on the instantiation mode.

JanusGraphFactory

= Gremlin Console

The JanusGraph distribution contains a command line Gremlin Console which makes it easy to get started and interact with JanusGraph. Invoke bin/gremlin.sh (Unix/Linux) or bin/gremlin.bat (Windows) to start the console and then open a JanusGraph graph using the JanusGraphFactory with the configuration stored in an accessible properties configuration file:

[source, gremlin]
----
graph = JanusGraphFactory.open('path/to/configuration.properties')
----

= JanusGraph Embedded

JanusGraphFactory can also be used to open an embedded JanusGraph graph instance from within a JVM-based user application. In that case, JanusGraph is part of the user application and the application can call upon JanusGraph directly through its public API.
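As a minimal sketch of this embedded mode (assuming the JanusGraph core and BerkeleyDB backend jars are on the application's classpath; the storage directory is illustrative):

[source, java]
----
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;

public class EmbeddedExample {
    public static void main(String[] args) {
        // Build the configuration programmatically instead of via a properties file
        JanusGraph graph = JanusGraphFactory.build()
                .set("storage.backend", "berkeleyje")
                .set("storage.directory", "/tmp/graph")
                .open();
        // The application accesses JanusGraph directly through its public API
        GraphTraversalSource g = graph.traversal();
        g.addV("person").property("name", "hercules").iterate();
        g.tx().commit();
        graph.close();
    }
}
----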

= Short Codes

If the JanusGraph graph cluster has been previously configured and/or only the storage backend needs to be defined, JanusGraphFactory accepts a colon-separated string representation of the storage backend name and hostname or directory.

[source, gremlin]
----
graph = JanusGraphFactory.open('cql:localhost')
graph = JanusGraphFactory.open('berkeleyje:/tmp/graph')
----

JanusGraph Server

JanusGraph, by itself, is simply a set of jar files without any execution thread. There are two basic patterns for connecting to, and using, a JanusGraph database:

.Patterns
. JanusGraph can be used by embedding JanusGraph calls in a client program, where the program provides the execution threads.
. JanusGraph packages a long-running server process that, when started, allows a remote client or logic running in a separate program to make JanusGraph calls. This long-running server process is called JanusGraph Server.

For JanusGraph Server, JanusGraph uses https://tinkerpop.apache.org/docs/{tinkerpop_version}/reference#gremlin-server[Gremlin Server] of the https://tinkerpop.apache.org/[Apache TinkerPop] stack to service client requests. JanusGraph provides an out-of-the-box configuration for a quick start with JanusGraph Server, but the configuration can be changed to provide a wide range of server capabilities.

Configuring JanusGraph Server is accomplished through a JanusGraph Server yaml configuration file located in the ./conf/gremlin-server directory of the JanusGraph distribution. To configure JanusGraph Server with a graph instance (JanusGraph), the JanusGraph Server configuration file requires the following settings:

[source, yaml]
----
...
graphs: {
  graph: conf/janusgraph-berkeleyje.properties
}
scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/empty-sample.groovy]}}}}
...
----

The graphs entry defines the bindings to specific JanusGraph configurations. In the above case it binds graph to the JanusGraph configuration at conf/janusgraph-berkeleyje.properties. The plugins entry enables the JanusGraph Gremlin plugin, which enables auto-imports of JanusGraph classes so that they can be referenced in remotely submitted scripts.

Learn more about configuring and using JanusGraph Server in <>.

= Server Distribution

The JanusGraph zip file contains a quick start server component that helps make it easier to get started with Gremlin Server and JanusGraph. Invoke bin/janusgraph.sh start to start Gremlin Server combined with Cassandra and Elasticsearch.

NOTE: For security reasons, Elasticsearch and therefore janusgraph.sh must be run under a non-root account.

configuration-global

Global Configuration

JanusGraph distinguishes between local and global configuration options. Local configuration options apply to an individual JanusGraph instance. Global configuration options apply to all instances in a cluster. More specifically, JanusGraph distinguishes the following five scopes for its configuration options:

  • LOCAL: These options apply only to an individual JanusGraph instance and are specified in its configuration when the instance is initialized.
  • MASKABLE: These configuration options can be overwritten for an individual JanusGraph instance by its local configuration file. If the local configuration file does not specify the option, its value is read from the global JanusGraph cluster configuration.
  • GLOBAL: These options are always read from the cluster configuration and cannot be overwritten on an instance basis.
  • GLOBAL_OFFLINE: Like GLOBAL, but changing these options requires a cluster restart to ensure that the value is identical across the entire cluster.
  • FIXED: Like GLOBAL, but the value cannot be changed once the JanusGraph cluster is initialized.

When the first JanusGraph instance in a cluster is started, the global configuration options are initialized from the provided local configuration file. Subsequently, global configuration options are changed via JanusGraph's management API. The management API is accessed by calling graph.openManagement() on an open JanusGraph instance handle. For example, to change the default caching behavior on a JanusGraph cluster:

[source, gremlin]
----
mgmt = graph.openManagement()
mgmt.get('cache.db-cache')
// Prints the current config setting
mgmt.set('cache.db-cache', true)
// Changes option
mgmt.get('cache.db-cache')
// Prints 'true'
mgmt.commit()
// Changes take effect
----

Changing Offline Options

Changing configuration options does not affect running instances; it only applies to newly started ones. Changing GLOBAL_OFFLINE configuration options requires restarting the cluster so that the changes take effect immediately for all instances. To change GLOBAL_OFFLINE options, follow these steps:

  • Close all but one JanusGraph instance in the cluster
  • Connect to the single remaining instance
  • Ensure all running transactions are closed
  • Ensure no new transactions are started (i.e. the cluster must be offline)
  • Open the management API
  • Change the configuration option(s)
  • Call commit, which will automatically shut down the graph instance
  • Restart all instances
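In the Gremlin Console, the procedure above amounts to the following sketch (the option name is only a placeholder; substitute the actual GLOBAL_OFFLINE option to be changed):

[source, gremlin]
----
// all other JanusGraph instances have been shut down and
// all transactions on this instance are closed
mgmt = graph.openManagement()
mgmt.set('some.global-offline-option', 'new-value')  // placeholder option name
mgmt.commit()  // automatically closes the graph instance
// now restart all instances
----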

Please refer to the full list of configuration options in <> for more information, including the configuration scope of each option.

schema

== Schema and Data Modeling

Each JanusGraph graph has a schema comprised of the edge labels, property keys, and vertex labels used therein. A JanusGraph schema can either be explicitly or implicitly defined. Users are encouraged to explicitly define the graph schema during application development. An explicitly defined schema is an important component of a robust graph application and greatly improves collaborative software development. Note that a JanusGraph schema can be evolved over time without any interruption of normal database operations. Extending the schema does not slow down query answering and does not require database downtime.

The schema type (i.e. edge label, property key, or vertex label) is assigned to elements in the graph (i.e. edges, properties, or vertices) when they are first created. The assigned schema type cannot be changed for a particular element. This ensures a stable type system that is easy to reason about.

Beyond the schema definition options explained in this section, schema types provide performance tuning options which are discussed in <> (Advanced Schema).

Displaying Schema Information

There are methods to view specific elements of the graph schema within the management API. These methods are mgmt.printIndexes(), mgmt.printPropertyKeys(), mgmt.printVertexLabels(), and mgmt.printEdgeLabels(). There is also a method called printSchema() that displays all of the combined output.

[source, gremlin]
----
mgmt = graph.openManagement()
mgmt.printSchema()
----

Defining Edge Labels

Each edge connecting two vertices has a label which defines the semantics of the relationship. For example, an edge labeled friend between vertices A and B encodes a friendship between the two individuals.

To define an edge label, call makeEdgeLabel(String) on an open graph or management transaction and provide the name of the edge label as the argument. Edge label names must be unique in the graph. This method returns a builder for edge labels that allows defining the label's multiplicity. The multiplicity of an edge label defines a multiplicity constraint on all edges of this label, that is, a maximum number of edges between pairs of vertices. JanusGraph recognizes the following multiplicity settings.

Edge Label Multiplicity

.Multiplicity Settings

  • MULTI: Allows multiple edges of the same label between any pair of vertices. In other words, the graph is a multi graph with respect to such edge labels. There is no constraint on edge multiplicity.
  • SIMPLE: Allows at most one edge of such label between any pair of vertices. In other words, the graph is a simple graph with respect to the label. Ensures that edges are unique for a given label and pair of vertices.
  • MANY2ONE: Allows at most one outgoing edge of such label on any vertex in the graph but places no constraint on incoming edges. The edge label mother is an example with MANY2ONE multiplicity since each person has at most one mother but mothers can have multiple children.
  • ONE2MANY: Allows at most one incoming edge of such label on any vertex in the graph but places no constraint on outgoing edges. The edge label winnerOf is an example with ONE2MANY multiplicity since each contest is won by at most one person but a person can win multiple contests.
  • ONE2ONE: Allows at most one incoming and one outgoing edge of such label on any vertex in the graph. The edge label marriedTo is an example with ONE2ONE multiplicity since a person is married to exactly one other person.

The default multiplicity is MULTI. The definition of an edge label is completed by calling the make() method on the builder, which returns the defined edge label as shown in the following example.

[source, gremlin]
----
mgmt = graph.openManagement()
follow = mgmt.makeEdgeLabel('follow').multiplicity(MULTI).make()
mother = mgmt.makeEdgeLabel('mother').multiplicity(MANY2ONE).make()
mgmt.commit()
----

Defining Property Keys

Properties on vertices and edges are key-value pairs. For example, the property name='Daniel' has the key name and the value 'Daniel'. Property keys are part of the JanusGraph schema and can constrain the allowed data types and cardinality of values.

To define a property key, call makePropertyKey(String) on an open graph or management transaction and provide the name of the property key as the argument. Property key names must be unique in the graph, and it is recommended to avoid spaces and special characters in property names. This method returns a builder for the property key.

Property Key Data Type

Use dataType(Class) to define the data type of a property key. JanusGraph will enforce that all values associated with the key have the configured data type, thereby ensuring that data added to the graph is valid. For instance, one can define that the name key has the String data type.

Defining the data type as Object.class allows any (serializable) value to be associated with a key. However, it is encouraged to use concrete data types whenever possible. Configured data types must be concrete classes and not interfaces or abstract classes. JanusGraph enforces class equality, so adding a sub-class of a configured data type is not allowed.

JanusGraph natively supports the following data types.

.Native JanusGraph Data Types
[options="header"]
|====
|Name |Description
|String |Character sequence
|Character |Individual character
|Boolean |true or false
|Byte |byte value
|Short |short value
|Integer |integer value
|Long |long value
|Float |4 byte floating point number
|Double |8 byte floating point number
|Date |Specific instant in time (java.util.Date)
|Geoshape |Geographic shape like point, circle or box
|UUID |Universally unique identifier (java.util.UUID)
|====

property-cardinality

Property Key Cardinality

Use cardinality(Cardinality) to define the allowed cardinality of the values associated with the key on any given vertex.

.Cardinality Settings

  • SINGLE: Allows at most one value per element for such key. In other words, the key->value mapping is unique for all elements in the graph. The property key birthDate is an example with SINGLE cardinality since each person has exactly one birth date.
  • LIST: Allows an arbitrary number of values per element for such key. In other words, the key is associated with a list of values allowing duplicate values. Assuming sensors are modeled as vertices in a graph, the property key sensorReading is an example with LIST cardinality to allow lots of (potentially duplicate) sensor readings to be recorded.
  • SET: Allows multiple values but no duplicate values per element for such key. In other words, the key is associated with a set of values. The property key name has SET cardinality if we want to capture all names of an individual (including nickname, maiden name, etc.).

The default cardinality setting is SINGLE. Note that property keys used on edges and properties have cardinality SINGLE. Attaching multiple values for a single key on an edge or property is not supported.

[source, gremlin]
----
mgmt = graph.openManagement()
birthDate = mgmt.makePropertyKey('birthDate').dataType(Long.class).cardinality(Cardinality.SINGLE).make()
name = mgmt.makePropertyKey('name').dataType(String.class).cardinality(Cardinality.SET).make()
sensorReading = mgmt.makePropertyKey('sensorReading').dataType(Double.class).cardinality(Cardinality.LIST).make()
mgmt.commit()
----

Relation Types

Edge labels and property keys are jointly referred to as relation types. Names of relation types must be unique in the graph, which means that property keys and edge labels cannot have the same name. There are methods in the JanusGraph API to query whether a relation type (property key or edge label) exists and to retrieve it.

[source, gremlin]
----
mgmt = graph.openManagement()
if (mgmt.containsRelationType('name'))
    name = mgmt.getPropertyKey('name')
mgmt.getRelationTypes(EdgeLabel.class)
mgmt.commit()
----

Defining Vertex Labels

Like edges, vertices have labels. Unlike edge labels, vertex labels are optional. Vertex labels are useful to distinguish different types of vertices, e.g. user vertices and product vertices.

Although labels are optional at the conceptual and data model level, JanusGraph assigns all vertices a label as an internal implementation detail. Vertices created by the addVertex methods use JanusGraph's default label.

To create a label, call makeVertexLabel(String).make() on an open graph or management transaction and provide the name of the vertex label as the argument. Vertex label names must be unique in the graph.

[source, gremlin]
----
mgmt = graph.openManagement()
person = mgmt.makeVertexLabel('person').make()
mgmt.commit()
// Create a labeled vertex
person = graph.addVertex(label, 'person')
// Create an unlabeled vertex
v = graph.addVertex()
graph.tx().commit()
----

Automatic Schema Maker

If an edge label, property key, or vertex label has not been defined explicitly, it will be defined implicitly when it is first used during the addition of an edge or vertex, or the setting of a property. The DefaultSchemaMaker configured for the JanusGraph graph defines such types.

By default, implicitly created edge labels have multiplicity MULTI and implicitly created property keys have cardinality SINGLE and the data type Object.class. Users can control automatic schema element creation by implementing and registering their own DefaultSchemaMaker.

It is strongly encouraged to explicitly define all schema elements and to disable automatic schema creation by setting schema.default=none in the JanusGraph graph configuration.
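In the graph's properties file this amounts to a single additional line; a minimal fragment (the storage settings are taken from the BerkeleyDB example above):

[source, properties]
----
storage.backend=berkeleyje
storage.directory=/tmp/graph
# disable automatic schema creation; all schema elements must be defined explicitly
schema.default=none
----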

Changing Schema Elements

The definition of an edge label, property key, or vertex label cannot be changed once it has been committed to the graph. However, the names of schema elements can be changed via JanusGraphManagement.changeName(JanusGraphSchemaElement, String) as shown in the following example, where the property key place is renamed to location:

[source, gremlin]
----
mgmt = graph.openManagement()
place = mgmt.getPropertyKey('place')
mgmt.changeName(place, 'location')
mgmt.commit()
----

Note that schema name changes may not be immediately visible in currently running transactions and in other JanusGraph graph instances in the cluster. While schema name changes are announced to all JanusGraph instances through the storage backend, it may take a while for the schema change to take effect, and in certain failure cases (such as a network partition) the announcement may fail, requiring an instance restart. Hence, users must ensure that either of the following holds:

  • The renamed label or key is not currently in active use (i.e. written or read) and will not be used until all JanusGraph instances are aware of the name change.
  • Running transactions actively accommodate the brief intermediate period in which either the old or the new name is valid, depending on the particular JanusGraph instance and the state of the name-change announcement. For instance, this could mean querying the graph for both names.

If an existing schema type needs to be redefined, it is recommended to change the name of this type to a name that is not currently (and will never be) in use. After that, a new label or key can be defined with the original name, thereby effectively replacing the old one. However, note that this does not affect vertices, edges, or properties previously created with the existing type. Redefining existing graph elements online is not supported; it must be accomplished through a batch graph transformation.
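A sketch of this replace-by-rename approach in the Gremlin Console (the key name age and its new definition are illustrative; two management transactions are used so the rename is committed before the replacement is defined):

[source, gremlin]
----
// retire the old definition under a name that will never be used again
mgmt = graph.openManagement()
mgmt.changeName(mgmt.getPropertyKey('age'), 'age_retired')
mgmt.commit()
// redefine the type under the original name
mgmt = graph.openManagement()
mgmt.makePropertyKey('age').dataType(Integer.class).cardinality(Cardinality.SINGLE).make()
mgmt.commit()
----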

schema-constraints

Schema Constraints

The definition of the schema allows users to configure explicit property and connection constraints. Properties can be bound to specific vertex labels and/or edge labels. Furthermore, connection constraints allow users to explicitly define which two vertex labels can be connected by a given edge label. These constraints can be used to ensure that a graph matches a given domain model. For example, in the Graph of the Gods, a god can be a brother of another god but not of a monster, and a god can have a property age, but a location cannot have a property age. By default, these constraints are disabled.

Enable these schema constraints by setting schema.constraints=true. This setting depends on the setting schema.default. If schema.default is set to none, an IllegalArgumentException is thrown for schema constraint violations. If schema.default is not set to none, schema constraints are automatically created, and no exception is thrown. Activating schema constraints has no impact on existing data because these constraints are only applied during the insertion process. Thus, reading data is not affected at all by those constraints.
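Both settings go in the graph's properties file; a minimal fragment enabling strict schema checking:

[source, properties]
----
# reject schema elements that have not been explicitly defined ...
schema.default=none
# ... and enforce the declared property and connection constraints on insert
schema.constraints=true
----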

Multiple properties can be bound to a vertex label using JanusGraphManagement.addProperties(VertexLabel, PropertyKey...), for example:

[source, gremlin]
----
mgmt = graph.openManagement()
person = mgmt.makeVertexLabel('person').make()
name = mgmt.makePropertyKey('name').dataType(String.class).cardinality(Cardinality.SET).make()
birthDate = mgmt.makePropertyKey('birthDate').dataType(Long.class).cardinality(Cardinality.SINGLE).make()
mgmt.addProperties(person, name, birthDate)
mgmt.commit()
----

Multiple properties can be bound to an edge label using JanusGraphManagement.addProperties(EdgeLabel, PropertyKey...), for example:

[source, gremlin]
----
mgmt = graph.openManagement()
follow = mgmt.makeEdgeLabel('follow').multiplicity(MULTI).make()
name = mgmt.makePropertyKey('name').dataType(String.class).cardinality(Cardinality.SET).make()
mgmt.addProperties(follow, name)
mgmt.commit()
----

Connections between an outgoing vertex label, an incoming vertex label, and an edge label can be defined using JanusGraphManagement.addConnection(EdgeLabel, VertexLabel out, VertexLabel in), for example:

[source, gremlin]
----
mgmt = graph.openManagement()
person = mgmt.makeVertexLabel('person').make()
company = mgmt.makeVertexLabel('company').make()
works = mgmt.makeEdgeLabel('works').multiplicity(MULTI).make()
mgmt.addConnection(works, person, company)
mgmt.commit()
----

gremlin

== Gremlin Query Language

image:https://tinkerpop.apache.org/docs/{tinkerpop_version}/images/gremlin-logo.png[width=370,height=143]

https://tinkerpop.apache.org/gremlin.html[Gremlin] is JanusGraph’s query language used to retrieve data from and modify data in the graph. Gremlin is a path-oriented language which succinctly expresses complex graph traversals and mutation operations. Gremlin is a https://en.wikipedia.org/wiki/Functional_programming[functional language] whereby traversal operators are chained together to form path-like expressions. For example, “from Hercules, traverse to his father and then his father’s father and return the grandfather’s name.”

Gremlin is a component of https://tinkerpop.apache.org[Apache TinkerPop]. It is developed independently from JanusGraph and is supported by most graph databases. By building applications on top of JanusGraph through the Gremlin query language, users avoid vendor lock-in because their application can be migrated to other graph databases supporting Gremlin.

This section is a brief overview of the Gremlin query language. For more information on Gremlin, refer to the following resources:

In addition to these resources, <> explains how Gremlin can be used in different programming languages to query a JanusGraph Server.

Introductory Traversals

A Gremlin query is a chain of operations/functions that are evaluated from left to right. A simple grandfather query is provided below over the Graph of the Gods dataset discussed in <>.

[source, gremlin]
----
gremlin> g.V().has('name', 'hercules').out('father').out('father').values('name')
==>saturn
----

The query above can be read:

. g: for the current graph traversal.
. V: for all vertices in the graph
. has('name', 'hercules'): filters the vertices down to those with the name property "hercules" (there is only one).
. out('father'): traverse outgoing father edges from Hercules.
. out('father'): traverse outgoing father edges from Hercules' father's vertex (i.e. Jupiter).
. name: get the name property of the "hercules" vertex's grandfather.

Taken together, these steps form a path-like traversal query. Each step can be decomposed and its results demonstrated. This style of building up a traversal/query is useful when constructing larger, complex query chains.

[source, gremlin]
----
gremlin> g
==>graphtraversalsource[janusgraph[cql:127.0.0.1], standard]
gremlin> g.V().has('name', 'hercules')
==>v[24]
gremlin> g.V().has('name', 'hercules').out('father')
==>v[16]
gremlin> g.V().has('name', 'hercules').out('father').out('father')
==>v[20]
gremlin> g.V().has('name', 'hercules').out('father').out('father').values('name')
==>saturn
----

For a sanity check, it is usually good to look at the properties of each return, not the assigned long id.

[source, gremlin]
----
gremlin> g.V().has('name', 'hercules').values('name')
==>hercules
gremlin> g.V().has('name', 'hercules').out('father').values('name')
==>jupiter
gremlin> g.V().has('name', 'hercules').out('father').out('father').values('name')
==>saturn
----

Note the related traversal that shows the entire father family tree branch of Hercules. This more complicated traversal is provided in order to demonstrate the flexibility and expressivity of the language. A competent grasp of Gremlin provides the JanusGraph user the ability to fluently navigate the underlying graph structure.

[source, gremlin]
----
gremlin> g.V().has('name', 'hercules').repeat(out('father')).emit().values('name')
==>jupiter
==>saturn
----

Some more traversal examples are provided below.

[source, gremlin]
----
gremlin> hercules = g.V().has('name', 'hercules').next()
==>v[1536]
gremlin> g.V(hercules).out('father', 'mother').label()
==>god
==>human
gremlin> g.V(hercules).out('battled').label()
==>monster
==>monster
==>monster
gremlin> g.V(hercules).out('battled').valueMap()
==>{name=nemean}
==>{name=hydra}
==>{name=cerberus}
----

Each step (denoted by a separating .) is a function that operates on the objects emitted from the previous step. There are numerous steps in the Gremlin language (see https://tinkerpop.apache.org/docs/{tinkerpop_version}/reference#graph-traversal-steps[Gremlin Steps]). By simply changing a step or order of the steps, different traversal semantics are enacted. The example below returns the name of all the people that have battled the same monsters as Hercules who themselves are not Hercules (i.e. “co-battlers” or perhaps, “allies”).

Given that The Graph of the Gods only has one battler (Hercules), another battler (for the sake of example) is added to the graph with Gremlin showcasing how vertices and edges are added to the graph.

[source, gremlin]
----
gremlin> theseus = graph.addVertex('human')
==>v[3328]
gremlin> theseus.property('name', 'theseus')
==>null
gremlin> cerberus = g.V().has('name', 'cerberus').next()
==>v[2816]
gremlin> battle = theseus.addEdge('battled', cerberus, 'time', 22)
==>e[7eo-2kg-iz9-268][3328-battled->2816]
gremlin> battle.values('time')
==>22
----

When adding a vertex, an optional vertex label can be provided. An edge label must be specified when adding edges. Properties as key-value pairs can be set on both vertices and edges. When a property key is defined with SET or LIST cardinality, addProperty must be used when adding a respective property to a vertex.

[source, gremlin]
----
gremlin> g.V(hercules).as('h').out('battled').in('battled').where(neq('h')).values('name')
==>theseus
----

The example above has 4 chained functions: out, in, where, and values (i.e. name is shorthand for values('name')). The function signatures of each are itemized below, where V is vertex and U is any object, with V a subset of U.

. out: V -> V
. in: V -> V
. where: U -> U
. values: V -> U

When chaining together functions, the incoming type must match the outgoing type, where U matches anything. Thus, the “co-battled/ally” traversal above is correct.

[NOTE] The Gremlin overview presented in this section focused on the Gremlin-Groovy language implementation used in the Gremlin Console. Refer to <> for information about connecting to JanusGraph with other languages than Groovy and independent of the Gremlin Console.

Iterating the Traversal

One convenient feature of the Gremlin Console is that it automatically iterates all results from a query executed from the gremlin> prompt. This works well within the https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop[REPL] environment as it shows you the results as a String. As you transition towards writing a Gremlin application, it is important to understand how to iterate a traversal explicitly because your application’s traversals will not iterate automatically. These are some of the common ways to iterate the https://tinkerpop.apache.org/javadocs/{tinkerpop_version}/full/org/apache/tinkerpop/gremlin/process/traversal/Traversal.html[`Traversal`]:

  • iterate() - Zero results are expected or can be ignored.
  • next() - Get one result. Make sure to check hasNext() first.
  • next(int n) - Get the next n results. Make sure to check hasNext() first.
  • toList() - Get all results as a list. If there are no results, an empty list is returned.

A Java code example is shown below to demonstrate these concepts:

[source, java]
----
Traversal t = g.V().has("name", "pluto"); // Define a traversal
// Note the traversal is not executed/iterated yet
Vertex pluto = null;
if (t.hasNext()) { // Check if results are available
    pluto = g.V().has("name", "pluto").next(); // Get one result
    g.V(pluto).drop().iterate(); // Execute a traversal to drop pluto from graph
}
// Note the traversal can be cloned for reuse
Traversal tt = t.asAdmin().clone();
if (tt.hasNext()) {
    System.err.println("pluto was not dropped!");
}
List gods = g.V().hasLabel("god").toList(); // Find all the gods
----

server

== JanusGraph Server

JanusGraph uses the https://tinkerpop.apache.org/docs/{tinkerpop_version}/reference#gremlin-server[Gremlin Server] engine as the server component to process and answer client queries. When packaged in JanusGraph, Gremlin Server is called JanusGraph Server.

JanusGraph Server must be started manually in order to use it. JanusGraph Server provides a way to remotely execute Gremlin traversals against one or more JanusGraph instances hosted within it. This section will describe how to use the WebSocket configuration, as well as describe how to configure JanusGraph Server to handle HTTP endpoint interactions. For information about how to connect to a JanusGraph Server from different languages refer to <>.

server-getting-started

Getting Started

Using the Pre-Packaged Distribution

The JanusGraph https://github.com/JanusGraph/janusgraph/releases[release] comes pre-configured to run JanusGraph Server out of the box leveraging a sample Cassandra and Elasticsearch configuration to allow users to get started quickly with JanusGraph Server. This configuration defaults to client applications that can connect to JanusGraph Server via WebSocket with a custom subprotocol. There are a number of clients developed in different languages to help support the subprotocol. The most familiar client to use the WebSocket interface is the Gremlin Console. The quick-start bundle is not intended to be representative of a production installation, but does provide a way to perform development with JanusGraph Server, run tests and see how the components are wired together. To use this default configuration:

  • Download a copy of the current janusgraph-$VERSION.zip file from the https://github.com/JanusGraph/janusgraph/releases[Releases page]
  • Unzip it and enter the janusgraph-$VERSION directory
  • Run bin/janusgraph.sh start. This step will start Gremlin Server with Cassandra/ES forked into a separate process. Note for security reasons Elasticsearch and therefore janusgraph.sh must be run under a non-root account.

[source, bourne]
----
$ bin/janusgraph.sh start
Forking Cassandra...
Running `nodetool statusthrift`.. OK (returned exit status 0 and printed string "running").
Forking Elasticsearch...
Connecting to Elasticsearch (127.0.0.1:9300)... OK (connected to 127.0.0.1:9300).
Forking Gremlin-Server...
Connecting to Gremlin-Server (127.0.0.1:8182)... OK (connected to 127.0.0.1:8182).
Run gremlin.sh to connect.
----

first-example-connecting-gremlin-server

= Connecting to Gremlin Server

After running janusgraph.sh, Gremlin Server will be ready to listen for WebSocket connections. The easiest way to test the connection is with Gremlin Console.

Start https://tinkerpop.apache.org/docs/{tinkerpop_version}/reference#gremlin-console[Gremlin Console] with bin/gremlin.sh and use the :remote and :> commands to issue Gremlin to Gremlin Server:

[source, text]
----
$ bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.utilities
plugin activated: janusgraph.imports
plugin activated: tinkerpop.tinkergraph
gremlin> :remote connect tinkerpop.server conf/remote.yaml
==>Connected - localhost/127.0.0.1:8182
gremlin> :> graph.addVertex("name", "stephen")
==>v[256]
gremlin> :> g.V().values('name')
==>stephen
----

The :remote command tells the console to configure a remote connection to Gremlin Server using the conf/remote.yaml file to connect. That file points to a Gremlin Server instance running on localhost. The :> is the "submit" command which sends the Gremlin on that line to the currently active remote. By default remote connections are sessionless, meaning that each line sent in the console is interpreted as a single request. Multiple statements can be sent on a single line using a semicolon as the delimiter. Alternately, you can establish a console with a session by specifying https://tinkerpop.apache.org/docs/current/reference/#sessions[session] when creating the connection. A https://tinkerpop.apache.org/docs/current/reference/#console-sessions[console session] allows you to reuse variables across several lines of input.

[source, text]
----
gremlin> :remote connect tinkerpop.server conf/remote.yaml
==>Configured localhost/127.0.0.1:8182
gremlin> graph
==>standardjanusgraph[cql:[127.0.0.1]]
gremlin> g
==>graphtraversalsource[standardjanusgraph[cql:[127.0.0.1]], standard]
gremlin> g.V()
gremlin> user = "Chris"
==>Chris
gremlin> graph.addVertex("name", user)
No such property: user for class: Script21
Type ':help' or ':h' for help.
Display stack trace? [yN]
gremlin> :remote connect tinkerpop.server conf/remote.yaml session
==>Configured localhost/127.0.0.1:8182-[9acf239e-a3ed-4301-b33f-55c911e04052]
gremlin> g.V()
gremlin> user = "Chris"
==>Chris
gremlin> user
==>Chris
gremlin> graph.addVertex("name", user)
==>v[4344]
gremlin> g.V().values('name')
==>Chris
----

Cleaning up after the Pre-Packaged Distribution

If you want to start fresh and remove the database and logs you can use the clean command with janusgraph.sh. The server should be stopped before running the clean operation.

[source, text]
----
$ cd /Path/to/janusgraph/janusgraph-0.2.0-hadoop2/
$ ./bin/janusgraph.sh stop
Killing Gremlin-Server (pid 91505)...
Killing Elasticsearch (pid 91402)...
Killing Cassandra (pid 91219)...
$ ./bin/janusgraph.sh clean
Are you sure you want to delete all stored data and logs? [y/N] y
Deleted data in /Path/to/janusgraph/janusgraph-0.2.0-hadoop2/db
Deleted logs in /Path/to/janusgraph/janusgraph-0.2.0-hadoop2/log
----

JanusGraph Server as a WebSocket Endpoint

The default configuration described in <> is already a WebSocket configuration. If you want to alter the default configuration to work with your own Cassandra or HBase environment rather than use the quick start environment, follow these steps:

.To Configure JanusGraph Server For WebSocket
. Test a local connection to a JanusGraph database first. This step applies whether using the Gremlin Console to test the connection, or whether connecting from a program. Make appropriate changes in a properties file in the ./conf directory for your environment. For example, edit ./conf/janusgraph-hbase.properties and make sure the storage.backend, storage.hostname and storage.hbase.table parameters are specified correctly. For more information on configuring JanusGraph for various storage backends, see <>. Make sure the properties file contains the following line:
+

[source, properties]
----
gremlin.graph=org.janusgraph.core.JanusGraphFactory
----

. Once a local configuration is tested and you have a working properties file, copy the properties file from the ./conf directory to the ./conf/gremlin-server directory. +

[source, bourne]
----
cp conf/janusgraph-hbase.properties conf/gremlin-server/socket-janusgraph-hbase-server.properties
----

. Copy ./conf/gremlin-server/gremlin-server.yaml to a new file called socket-gremlin-server.yaml. Do this in case you need to refer to the original version of the file.
+

[source, bourne]
----
cp conf/gremlin-server/gremlin-server.yaml conf/gremlin-server/socket-gremlin-server.yaml
----

. Edit the socket-gremlin-server.yaml file and make the following updates:
.. If you are planning to connect to JanusGraph Server from something other than localhost, update the IP address for host:
+

[source, yaml]
----
host: 10.10.10.100
----

.. Update the graphs section to point to your new properties file so the JanusGraph Server can find and connect to your JanusGraph instance:
+

[source, yaml]
----
graphs: {
  graph: conf/gremlin-server/socket-janusgraph-hbase-server.properties}
----

. Start the JanusGraph Server, specifying the yaml file you just configured:
+

[source, bourne]
----
bin/gremlin-server.sh ./conf/gremlin-server/socket-gremlin-server.yaml
----

+
IMPORTANT: Do not use bin/janusgraph.sh. That starts the default configuration, which starts a separate Cassandra/Elasticsearch environment.

. The JanusGraph Server should now be running in WebSocket mode and can be tested by following the instructions in <>.

JanusGraph Server as an HTTP Endpoint

The default configuration described in <> is a WebSocket configuration. If you want to alter the default configuration in order to use JanusGraph Server as an HTTP endpoint for your JanusGraph database, follow these steps:

.To Configure JanusGraph Server for HTTP
. Test a local connection to a JanusGraph database first. This step applies whether using the Gremlin Console to test the connection, or whether connecting from a program. Make appropriate changes in a properties file in the ./conf directory for your environment. For example, edit ./conf/janusgraph-hbase.properties and make sure the storage.backend, storage.hostname and storage.hbase.table parameters are specified correctly. For more information on configuring JanusGraph for various storage backends, see <>. Make sure the properties file contains the following line:
+

[source, properties]
----
gremlin.graph=org.janusgraph.core.JanusGraphFactory
----

. Once a local configuration is tested and you have a working properties file, copy the properties file from the ./conf directory to the ./conf/gremlin-server directory. +

[source, bourne]
----
cp conf/janusgraph-hbase.properties conf/gremlin-server/http-janusgraph-hbase-server.properties
----

. Copy ./conf/gremlin-server/gremlin-server.yaml to a new file called http-gremlin-server.yaml. Do this in case you need to refer to the original version of the file.
+

[source, bourne]
----
cp conf/gremlin-server/gremlin-server.yaml conf/gremlin-server/http-gremlin-server.yaml
----

. Edit the http-gremlin-server.yaml file and make the following updates:
.. If you are planning to connect to JanusGraph Server from something other than localhost, update the IP address for host:
+

[source, yaml]
----
host: 10.10.10.100
----

.. Update the channelizer setting to specify the HttpChannelizer:
+

[source, yaml]
----
channelizer: org.apache.tinkerpop.gremlin.server.channel.HttpChannelizer
----

.. Update the graphs section to point to your new properties file so the JanusGraph Server can find and connect to your JanusGraph instance:
+

[source, yaml]
----
graphs: {
  graph: conf/gremlin-server/http-janusgraph-hbase-server.properties}
----

. Start the JanusGraph Server, specifying the yaml file you just configured:
+

[source, bourne]
----
bin/gremlin-server.sh ./conf/gremlin-server/http-gremlin-server.yaml
----

. The JanusGraph Server should now be running in HTTP mode and available for testing. curl can be used to verify the server is working:
+

[source, bourne]
----
curl -XPOST -Hcontent-type:application/json -d '{"gremlin":"g.V().count()"}' http://[IP for JanusGraph server host]:8182
----

JanusGraph Server as Both a WebSocket and HTTP Endpoint

As of JanusGraph 0.2.0, you can configure your gremlin-server.yaml to accept both WebSocket and HTTP connections over the same port. This can be achieved by changing the channelizer in any of the previous examples as follows.

[source, yaml]
----
channelizer: org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer
----

Advanced JanusGraph Server Configurations

Authentication over HTTP

IMPORTANT: In the following example, credentialsDb should be different from the graph(s) you are using. It should be configured with the correct backend and a different keyspace, table, or storage directory as appropriate for the configured backend. This graph will be used for storing usernames and passwords.

= HTTP Basic authentication

To enable Basic authentication in JanusGraph Server include the following configuration in your gremlin-server.yaml.

[source, yaml]
----
authentication: {
  authenticator: org.janusgraph.graphdb.tinkerpop.gremlin.server.auth.JanusGraphSimpleAuthenticator,
  authenticationHandler: org.apache.tinkerpop.gremlin.server.handler.HttpBasicAuthenticationHandler,
  config: {
    defaultUsername: user,
    defaultPassword: password,
    credentialsDb: conf/janusgraph-credentials-server.properties
  }
}
----

Verify that basic authentication is configured correctly. For example,

[source,bourne]
----
curl -v -XPOST http://localhost:8182 -d '{"gremlin": "g.V().count()"}'
----

should return a 401 if the authentication is configured correctly and

[source,bourne]
----
curl -v -XPOST http://localhost:8182 -d '{"gremlin": "g.V().count()"}' -u user:password
----

should return a 200 and the result of 4 if authentication is configured correctly.
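The -u flag simply makes curl send a standard HTTP Basic Authorization header, which is the Base64 encoding of user:password as defined by RFC 7617. A minimal sketch of what the client transmits (the class name is illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeader {
    // Builds the value of the "Authorization" header that curl's -u flag produces.
    static String basicAuth(String username, String password) {
        String credentials = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // For the user/password pair from the yaml example above:
        System.out.println(basicAuth("user", "password"));
        // prints: Basic dXNlcjpwYXNzd29yZA==
    }
}
```

Note that Base64 is an encoding, not encryption, which is why Basic auth should only be used over TLS-protected connections.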

Authentication over WebSocket

Authentication over WebSocket occurs through a Simple Authentication and Security Layer (https://en.wikipedia.org/wiki/Simple_Authentication_and_Security_Layer[SASL]) mechanism.

To enable SASL authentication include the following configuration in the gremlin-server.yaml

[source,yaml]
----
authentication: {
  authenticator: org.janusgraph.graphdb.tinkerpop.gremlin.server.auth.JanusGraphSimpleAuthenticator,
  authenticationHandler: org.apache.tinkerpop.gremlin.server.handler.SaslAuthenticationHandler,
  config: {
    defaultUsername: user,
    defaultPassword: password,
    credentialsDb: conf/janusgraph-credentials-server.properties
  }
}
----

IMPORTANT: In the preceding example, credentialsDb should be different from the graph(s) you are using. It should be configured with the correct backend and a different keyspace, table, or storage directory as appropriate for the configured backend. This graph will be used for storing usernames and passwords.

If you are connecting through the Gremlin Console, your remote yaml file should set the username and password properties to the appropriate values.

[source,yaml]
----
username: user
password: password
----

Authentication over HTTP and WebSocket

If you are using the combined channelizer for both HTTP and WebSocket, you can use the SaslAndHMACAuthenticator to authenticate over WebSocket through SASL, over HTTP through basic auth, and over HTTP through hash-based message authentication code (https://en.wikipedia.org/wiki/Hash-based_message_authentication_code[HMAC]) auth. HMAC is a token-based authentication scheme designed to be used over HTTP. You first acquire a token via the /session endpoint and then use that token to authenticate. It is used to amortize the time spent encrypting the password with basic auth.

The gremlin-server.yaml should include the following configurations

[source,yaml]
----
authentication: {
  authenticator: org.janusgraph.graphdb.tinkerpop.gremlin.server.auth.SaslAndHMACAuthenticator,
  authenticationHandler: org.janusgraph.graphdb.tinkerpop.gremlin.server.handler.SaslAndHMACAuthenticationHandler,
  config: {
    defaultUsername: user,
    defaultPassword: password,
    hmacSecret: secret,
    credentialsDb: conf/janusgraph-credentials-server.properties
  }
}
----

IMPORTANT: In the preceding example, credentialsDb should be different from the graph(s) you are using. It should be configured with the correct backend and a different keyspace, table, or storage directory as appropriate for the configured backend. This graph will be used for storing usernames and passwords.

IMPORTANT: Note the hmacSecret here. This should be the same across all running JanusGraph servers if you want to be able to use the same HMAC token on each server.

For HMAC authentication over HTTP, this creates a /session endpoint that provides a token that expires after an hour by default. The timeout for the token can be configured through the tokenTimeout option in the authentication.config map. This value is a Long, expressed in milliseconds.
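For example, a twelve-hour token lifetime could be configured as follows. This is a sketch: the placement of tokenTimeout inside the config map follows the option description above, and the value shown is illustrative.

```yaml
authentication: {
  authenticator: org.janusgraph.graphdb.tinkerpop.gremlin.server.auth.SaslAndHMACAuthenticator,
  authenticationHandler: org.janusgraph.graphdb.tinkerpop.gremlin.server.handler.SaslAndHMACAuthenticationHandler,
  config: {
    defaultUsername: user,
    defaultPassword: password,
    hmacSecret: secret,
    tokenTimeout: 43200000,
    credentialsDb: conf/janusgraph-credentials-server.properties
  }
}
```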

You can obtain the token using curl by issuing a GET request to the /session endpoint. For example,

[source,bourne]
----
curl http://localhost:8182/session -XGET -u user:password
{"token": "dXNlcjoxNTA5NTQ2NjI0NDUzOkhrclhYaGhRVG9KTnVSRXJ5U2VpdndhalJRcVBtWEpSMzh5WldqRTM4MW89"}
----

You can then use that token for authentication by using the "Authorization: Token" header. For example,

[source,bourne]
----
curl -v http://localhost:8182/session -XPOST -d '{"gremlin": "g.V().count()"}' -H "Authorization: Token dXNlcjoxNTA5NTQ2NjI0NDUzOkhrclhYaGhRVG9KTnVSRXJ5U2VpdndhalJRcVBtWEpSMzh5WldqRTM4MW89"
----
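The token itself is a plain Base64 string. Decoding the sample token above reveals a username, a millisecond timestamp, and a signature separated by colons; note that this "username:timestamp:signature" layout is inferred from the sample token, not a documented contract, so treat the sketch below as illustrative only:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DecodeHmacToken {
    // Decodes an HMAC session token into its colon-separated parts.
    // NOTE: the "username:timestamp:signature" layout is inferred from the
    // sample token in the text, not a documented server contract.
    static String[] decode(String token) {
        String raw = new String(Base64.getDecoder().decode(token), StandardCharsets.UTF_8);
        return raw.split(":", 3); // limit 3: the signature may contain further characters
    }

    public static void main(String[] args) {
        String token = "dXNlcjoxNTA5NTQ2NjI0NDUzOkhrclhYaGhRVG9KTnVSRXJ5U2VpdndhalJRcVBtWEpSMzh5WldqRTM4MW89";
        String[] parts = decode(token);
        System.out.println(parts[0]); // user
        System.out.println(parts[1]); // 1509546624453 (epoch milliseconds)
        System.out.println(parts[2]); // the HMAC signature
    }
}
```

Because the server validates the signature with hmacSecret, the token cannot be forged by a client, but anyone holding it can authenticate until it expires.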

[[gremlin-server-with-janusgraph]]

Using TinkerPop Gremlin Server with JanusGraph

Since JanusGraph Server is a TinkerPop Gremlin Server packaged with configuration files for JanusGraph, a version-compatible TinkerPop Gremlin Server can be downloaded separately and used with JanusGraph. Get started by link:https://tinkerpop.apache.org/downloads.html[downloading] the appropriate version of Gremlin Server, which needs to be the version (<>) supported by the JanusGraph version in use ({tinkerpop_version}).

IMPORTANT: Any references to file paths in this section refer to paths under a TinkerPop distribution for Gremlin Server and not a JanusGraph distribution with the JanusGraph Server, unless specifically noted.

Configuring a standalone Gremlin Server to work with JanusGraph is similar to configuring the packaged JanusGraph Server. You should be familiar with link:https://tinkerpop.apache.org/docs/{tinkerpop_version}/reference#_configuring_2[graph configuration]. Basically, the Gremlin Server yaml file points to graph-specific configuration files that are used to instantiate JanusGraph instances that it will then host. In order to instantiate these Graph instances, Gremlin Server requires that the appropriate libraries and dependencies for JanusGraph be available on its classpath.

For purposes of demonstration, these instructions will outline how to configure the BerkeleyDB backend for JanusGraph in Gremlin Server. As stated earlier, Gremlin Server needs JanusGraph dependencies on its classpath. Invoke the following command replacing $VERSION with the version of JanusGraph to use:

[source,bourne]
----
bin/gremlin-server.sh install org.janusgraph janusgraph-all $VERSION
----

When this process completes, Gremlin Server should now have all the JanusGraph dependencies available to it and will thus be able to instantiate JanusGraph objects.

IMPORTANT: The above command uses Groovy Grape and if it is not configured properly download errors may ensue. Please refer to link:https://tinkerpop.apache.org/docs/{tinkerpop_version}/reference#gremlin-applications[this section] of the TinkerPop documentation for more information around setting up ~/.groovy/grapeConfig.xml.

Create a file called GREMLIN_SERVER_HOME/conf/janusgraph.properties with the following contents:

[source,text]
----
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=berkeleyje
storage.directory=db/berkeley
----

Configuration of other backends is similar. See <>. If using Cassandra, then use Cassandra configuration options in the janusgraph.properties file. The only important piece to leave unchanged is the gremlin.graph setting which should always use JanusGraphFactory. This setting tells Gremlin Server how to instantiate a JanusGraph instance.
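For instance, if using Cassandra instead of BerkeleyDB, conf/janusgraph.properties might look like the following sketch (the hostname is a placeholder; only the storage options change, while the gremlin.graph line stays the same):

```properties
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cql
storage.hostname=127.0.0.1
```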

Next create a file called GREMLIN_SERVER_HOME/conf/gremlin-server-janusgraph.yaml that has the following contents:

[source,yaml]
----
host: localhost
port: 8182
graphs: {
  graph: conf/janusgraph.properties}
scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/janusgraph.groovy]}}}}
serializers:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
metrics: {
  slf4jReporter: {enabled: true, interval: 180000}}
----

There are several important parts to this configuration file as they relate to JanusGraph:

. In the graphs map, there is a key called graph and its value is conf/janusgraph.properties. This tells Gremlin Server to instantiate a Graph instance called "graph" and use the conf/janusgraph.properties file to configure it. The "graph" key becomes the unique name for the Graph instance in Gremlin Server and it can be referenced as such in scripts submitted to it.
. In the plugins list, there is a reference to JanusGraphGremlinPlugin, which tells Gremlin Server to initialize the "JanusGraph Plugin". The "JanusGraph Plugin" will auto-import JanusGraph specific classes for usage in scripts.
. Note the scripts key and the reference to scripts/janusgraph.groovy. This Groovy file is an initialization script for Gremlin Server and that particular ScriptEngine. Create scripts/janusgraph.groovy with the following contents:

[source,groovy]

----
def globals = [:]
globals << [g : graph.traversal()]
----

The above script creates a Map called globals and assigns to it a key/value pair. The key is g and its value is a TraversalSource generated from graph, which was configured for Gremlin Server in its configuration file. At this point, there are now two global variables available to scripts provided to Gremlin Server - graph and g.

At this point, Gremlin Server is configured and can be used to connect to a new or existing JanusGraph database. To start the server:

[source,bourne]
----
$ bin/gremlin-server.sh conf/gremlin-server-janusgraph.yaml
[INFO] GremlinServer -
         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----

[INFO] GremlinServer - Configuring Gremlin Server from conf/gremlin-server-janusgraph.yaml
[INFO] MetricManager - Configured Metrics Slf4jReporter configured with interval=180000ms and loggerName=org.apache.tinkerpop.gremlin.server.Settings$Slf4jReporterMetrics
[INFO] GraphDatabaseConfiguration - Set default timestamp provider MICRO
[INFO] GraphDatabaseConfiguration - Generated unique-instance-id=7f0000016240-ubuntu1
[INFO] Backend - Initiated backend operations thread pool of size 8
[INFO] KCVSLog$MessagePuller - Loaded unidentified ReadMarker start time 2015-10-02T12:28:24.411Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@35399441
[INFO] GraphManager - Graph [graph] was successfully configured via [conf/janusgraph.properties].
[INFO] ServerGremlinExecutor - Initialized Gremlin thread pool. Threads in pool named with pattern gremlin-*
[INFO] ScriptEngines - Loaded gremlin-groovy ScriptEngine
[INFO] GremlinExecutor - Initialized gremlin-groovy ScriptEngine with scripts/janusgraph.groovy
[INFO] ServerGremlinExecutor - Initialized GremlinExecutor and configured ScriptEngines.
[INFO] ServerGremlinExecutor - A GraphTraversalSource is now bound to [g] with graphtraversalsource[standardjanusgraph[berkeleyje:db/berkeley], standard]
[INFO] AbstractChannelizer - Configured application/vnd.gremlin-v3.0+gryo with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0
[INFO] AbstractChannelizer - Configured application/vnd.gremlin-v3.0+gryo-stringd with org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0
[INFO] GremlinServer$1 - Gremlin Server configured with worker thread pool of 1, gremlin pool of 8 and boss thread pool of 1.
[INFO] GremlinServer$1 - Channel started at port 8182.
----

The following section explains how to connect to the running server.

= Connecting to JanusGraph via Gremlin Server

Gremlin Server will be ready to listen for WebSocket connections when it is started. The easiest way to test the connection is with Gremlin Console.

Follow the instructions here <> to verify the Gremlin Server is working.

IMPORTANT: A difference you should understand is that when working with JanusGraph Server, the Gremlin Console is started from underneath the JanusGraph distribution and when following the test instructions here for a standalone Gremlin Server, the Gremlin Console is started from under the TinkerPop distribution.

[source,java]
----
GryoMapper mapper = GryoMapper.build().addRegistry(JanusGraphIoRegistry.INSTANCE).create();
Cluster cluster = Cluster.build().serializer(new GryoMessageSerializerV3d0(mapper)).create();
Client client = cluster.connect();
client.submit("g.V()").all().get();
----

By adding the JanusGraphIoRegistry to the org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, the driver will know how to properly deserialize custom data types returned by JanusGraph.

Extending JanusGraph Server

It is possible to extend Gremlin Server with other means of communication by implementing the interfaces that it provides and leverage this with JanusGraph. See more details in the appropriate TinkerPop documentation.

include::deploymentscenarios.adoc[]

include::configuredgraphfactory.adoc[]

include::multinodejanusgraphcluster.adoc[]

[[indexes]]

== Indexing for Better Performance

JanusGraph supports two different kinds of indexing to speed up query processing: graph indexes and vertex-centric indexes. Most graph queries start the traversal from a list of vertices or edges that are identified by their properties. Graph indexes make these global retrieval operations efficient on large graphs. Vertex-centric indexes speed up the actual traversal through the graph, in particular when traversing through vertices with many incident edges.

[[graph-indexes]]

Graph Index

Graph indexes are global index structures over the entire graph which allow efficient retrieval of vertices or edges by their properties for sufficiently selective conditions. For instance, consider the following queries

[source, gremlin]
----
g.V().has('name', 'hercules')
g.E().has('reason', textContains('loves'))
----

The first query asks for all vertices with the name hercules. The second asks for all edges where the property reason contains the word loves. Without a graph index, answering those queries would require a full scan over all vertices or edges in the graph to find those that match the given condition, which is very inefficient and infeasible for huge graphs.

JanusGraph distinguishes between two types of graph indexes: composite and mixed indexes. Composite indexes are very fast and efficient but limited to equality lookups for a particular, previously-defined combination of property keys. Mixed indexes can be used for lookups on any combination of indexed keys and support multiple condition predicates in addition to equality depending on the backing index store.

Both types of indexes are created through the JanusGraph management system and the index builder returned by JanusGraphManagement.buildIndex(String, Class) where the first argument defines the name of the index and the second argument specifies the type of element to be indexed (e.g. Vertex.class). The name of a graph index must be unique. Graph indexes built against newly defined property keys, i.e. property keys that are defined in the same management transaction as the index, are immediately available. The same applies to graph indexes that are constrained to a label that is created in the same management transaction as the index. Graph indexes built against property keys that are already in use without being constrained to a newly created label require the execution of a <> to ensure that the index contains all previously added elements. Until the reindex procedure has completed, the index will not be available. It is encouraged to define graph indexes in the same transaction as the initial schema.

[NOTE] In the absence of an index, JanusGraph will default to a full graph scan in order to retrieve the desired list of vertices. While this produces the correct result set, the graph scan can be very inefficient and lead to poor overall system performance in a production environment. Enable the force-index configuration option in production deployments of JanusGraph to prohibit graph scans.

Composite Index

Composite indexes retrieve vertices or edges by one or a (fixed) composition of multiple keys. Consider the following composite index definitions.

[source, gremlin]
----
graph.tx().rollback()  //Never create new indexes while a transaction is active
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
age = mgmt.getPropertyKey('age')
mgmt.buildIndex('byNameComposite', Vertex.class).addKey(name).buildCompositeIndex()
mgmt.buildIndex('byNameAndAgeComposite', Vertex.class).addKey(name).addKey(age).buildCompositeIndex()
mgmt.commit()
//Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameComposite').call()
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameAndAgeComposite').call()
//Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byNameComposite"), SchemaAction.REINDEX).get()
mgmt.updateIndex(mgmt.getGraphIndex("byNameAndAgeComposite"), SchemaAction.REINDEX).get()
mgmt.commit()
----

First, the two already defined property keys name and age are retrieved. Next, a simple composite index on just the name property key is built. JanusGraph will use this index to answer the following query.

[source, gremlin]
----
g.V().has('name', 'hercules')
----

The second composite graph index includes both keys. JanusGraph will use this index to answer the following query.

[source, gremlin]
----
g.V().has('age', 30).has('name', 'hercules')
----

Note that all keys of a composite graph index must be found in the query's equality conditions for the index to be used. For example, the following query cannot be answered with either of the indexes because it only contains a constraint on age but not on name.

[source, gremlin]
----
g.V().has('age', 30)
----

Also note that composite graph indexes can only be used for equality constraints like those in the queries above. The following query would be answered with just the simple composite index defined on the name key because the age constraint is not an equality constraint.

[source, gremlin]
----
g.V().has('name', 'hercules').has('age', inside(20, 50))
----

Composite indexes do not require configuration of an external indexing backend and are supported through the primary storage backend. Hence, composite index modifications are persisted through the same transaction as graph modifications which means that those changes are atomic and/or consistent if the underlying storage backend supports atomicity and/or consistency.

[NOTE] A composite index may comprise just one or multiple keys. A composite index with just one key is sometimes referred to as a key-index.
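Conceptually, a composite index behaves like a hash map keyed by the exact tuple of indexed property values, which is why it serves equality lookups only. The following is a minimal sketch of that idea; the class and its layout are illustrative, not JanusGraph internals:

```java
import java.util.*;

public class CompositeIndexSketch {
    // Maps an exact tuple of property values to the ids of matching vertices.
    private final Map<List<Object>, Set<Long>> index = new HashMap<>();

    void add(long vertexId, Object... keyValues) {
        index.computeIfAbsent(Arrays.asList(keyValues), k -> new HashSet<>()).add(vertexId);
    }

    // Equality lookup: constant time, but only for the full, exact tuple.
    Set<Long> lookup(Object... keyValues) {
        return index.getOrDefault(Arrays.asList(keyValues), Collections.emptySet());
    }

    public static void main(String[] args) {
        CompositeIndexSketch byNameAndAge = new CompositeIndexSketch();
        byNameAndAge.add(1L, "hercules", 30);
        byNameAndAge.add(2L, "jupiter", 5000);

        // Exact match on all keys: served directly by the index.
        System.out.println(byNameAndAge.lookup("hercules", 30)); // [1]

        // A partial tuple (or a range like inside(20, 50)) cannot be hashed,
        // so such queries must fall back to another index or a graph scan.
        System.out.println(byNameAndAge.lookup("hercules")); // []
    }
}
```

This also makes clear why every key of a composite index must appear as an equality constraint in the query: anything less than the full tuple has no entry in the map.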

[[index-unique]]

= Index Uniqueness

Composite indexes can also be used to enforce property uniqueness in the graph. If a composite graph index is defined as unique() there can be at most one vertex or edge for any given concatenation of property values associated with the keys of that index. For instance, to enforce that names are unique across the entire graph the following composite graph index would be defined.

[source, gremlin]
----
graph.tx().rollback()  //Never create new indexes while a transaction is active
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
mgmt.buildIndex('byNameUnique', Vertex.class).addKey(name).unique().buildCompositeIndex()
mgmt.commit()
//Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameUnique').call()
//Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byNameUnique"), SchemaAction.REINDEX).get()
mgmt.commit()
----

[NOTE] To enforce uniqueness against an eventually consistent storage backend, the <> of the index must be explicitly set to enable locking.

[[index-mixed]]

Mixed Index

Mixed indexes retrieve vertices or edges by any combination of previously added property keys. Mixed indexes provide more flexibility than composite indexes and support additional condition predicates beyond equality. On the other hand, mixed indexes are slower for most equality queries than composite indexes.

Unlike composite indexes, mixed indexes require the configuration of an <> and use that indexing backend to execute lookup operations. JanusGraph can support multiple indexing backends in a single installation. Each indexing backend must be uniquely identified by name in the JanusGraph configuration which is called the indexing backend name.

[source, gremlin]
----
graph.tx().rollback()  //Never create new indexes while a transaction is active
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
age = mgmt.getPropertyKey('age')
mgmt.buildIndex('nameAndAge', Vertex.class).addKey(name).addKey(age).buildMixedIndex("search")
mgmt.commit()
//Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph, 'nameAndAge').call()
//Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("nameAndAge"), SchemaAction.REINDEX).get()
mgmt.commit()
----

The example above defines a mixed index containing the property keys name and age. The definition refers to the indexing backend name search so that JanusGraph knows which configured indexing backend it should use for this particular index. The search parameter specified in the buildMixedIndex call must match the second clause in the JanusGraph configuration definition, like this: index.search.backend. If the index were named 'solrsearch', then the configuration definition would appear like this: index.solrsearch.backend.
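For example, the corresponding entries in the JanusGraph properties file might look like the following sketch (the elasticsearch value and hostname are illustrative; any supported indexing backend can be named this way):

```properties
# The backend name "search" referenced by buildMixedIndex("search")
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1

# Alternatively, an index built with buildMixedIndex("solrsearch")
# would be configured under the matching prefix:
# index.solrsearch.backend=solr
```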

The mgmt.buildIndex example specified above uses text search as its default behavior. An index statement that explicitly defines the index as a text index can be written as follows:

[source, gremlin]
----
mgmt.buildIndex('nameAndAge', Vertex.class).addKey(name, Mapping.TEXT.getParameter()).addKey(age, Mapping.TEXT.getParameter()).buildMixedIndex("search")
----

See <> for more information on text and string search options, and see the documentation section specific to the indexing backend in use for more details on how each backend handles text versus string searches.

While the index definition example looks similar to the composite index above, it provides greater query support and can answer any of the following queries.

[source, gremlin]
----
g.V().has('name', textContains('hercules')).has('age', inside(20, 50))
g.V().has('name', textContains('hercules'))
g.V().has('age', lt(50))
g.V().has('age', outside(20, 50))
g.V().has('age', lt(50).or(gte(60)))
g.V().or(__.has('name', textContains('hercules')), __.has('age', inside(20, 50)))
----

Mixed indexes support full-text search, range search, geo search and others. Refer to <> for a list of predicates supported by a particular indexing backend.

[NOTE] Unlike composite indexes, mixed indexes do not support uniqueness.

= Adding Property Keys

Property keys can be added to an existing mixed index which allows subsequent queries to include this key in the query condition.

[source, gremlin]
----
graph.tx().rollback()  //Never create new indexes while a transaction is active
mgmt = graph.openManagement()
location = mgmt.makePropertyKey('location').dataType(Geoshape.class).make()
nameAndAge = mgmt.getGraphIndex('nameAndAge')
mgmt.addIndexKey(nameAndAge, location)
mgmt.commit()
//Previously created property keys already have the status ENABLED, but
//our newly created property key "location" needs to REGISTER so we wait for both statuses
ManagementSystem.awaitGraphIndexStatus(graph, 'nameAndAge').status(SchemaStatus.REGISTERED, SchemaStatus.ENABLED).call()
//Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("nameAndAge"), SchemaAction.REINDEX).get()
mgmt.commit()
----

To add a newly defined key, we first retrieve the existing index from the management transaction by its name and then invoke the addIndexKey method to add the key to this index.

If the added key is defined in the same management transaction, it will be immediately available for querying. If the property key has already been in use, adding the key requires the execution of a <> to ensure that the index contains all previously added elements. Until the reindex procedure has completed, the key will not be available in the mixed index.

= Mapping Parameters

When adding a property key to a mixed index - either through the index builder or the addIndexKey method - a list of parameters can be optionally specified to adjust how the property value is mapped into the indexing backend. Refer to the <> for a complete list of parameter types supported by each indexing backend.

Ordering

The order in which the results of a graph query are returned can be defined using the order().by() directive. The order().by() method expects two parameters:

  • The name of the property key by which to order the results. The results will be ordered by the value of the vertices or edges for this property key.
  • The sort order: either ascending asc or descending desc

For example, the query g.V().has('name', textContains('hercules')).order().by('age', desc).limit(10) retrieves the ten oldest individuals with 'hercules' in their name.

When using order().by() it is important to note that:

  • Composite graph indexes do not natively support ordering search results. All results will be retrieved and then sorted in-memory. For large result sets, this can be very expensive.
  • Mixed indexes support ordering natively and efficiently. However, the property key used in the order().by() method must have been previously added to the mixed index for native result ordering support. This is important in cases where the order().by() key is different from the query keys. If the property key is not part of the index, then sorting requires loading all results into memory.

Label Constraint

In many cases it is desirable to only index vertices or edges with a particular label. For instance, one may want to index only gods by their name and not every single vertex that has a name property. When defining an index it is possible to restrict the index to a particular vertex or edge label using the indexOnly method of the index builder. The following creates a composite index for the property key name that indexes only vertices labeled god.

[source, gremlin]
----
graph.tx().rollback()  //Never create new indexes while a transaction is active
mgmt = graph.openManagement()
name = mgmt.getPropertyKey('name')
god = mgmt.getVertexLabel('god')
mgmt.buildIndex('byNameAndLabel', Vertex.class).addKey(name).indexOnly(god).buildCompositeIndex()
mgmt.commit()
//Wait for the index to become available
ManagementSystem.awaitGraphIndexStatus(graph, 'byNameAndLabel').call()
//Reindex the existing data
mgmt = graph.openManagement()
mgmt.updateIndex(mgmt.getGraphIndex("byNameAndLabel"), SchemaAction.REINDEX).get()
mgmt.commit()
----

Label restrictions similarly apply to mixed indexes. When a composite index with label restriction is defined as unique, the uniqueness constraint only applies to properties on vertices or edges for the specified label.

Composite versus Mixed Indexes

. Use a composite index for exact match index retrievals. Composite indexes do not require configuring or operating an external index system and are often significantly faster than mixed indexes.
.. As an exception, use a mixed index for exact matches when the number of distinct values for the query constraint is relatively small or if one value is expected to be associated with many elements in the graph (i.e. in case of low selectivity).
. Use a mixed index for numeric range, full-text or geo-spatial indexing. Also, using a mixed index can speed up order().by() queries.

[[vertex-indexes]]

Vertex-centric Indexes

Vertex-centric indexes are local index structures built individually per vertex. In large graphs vertices can have thousands of incident edges. Traversing through those vertices can be very slow because a large subset of the incident edges has to be retrieved and then filtered in memory to match the conditions of the traversal. Vertex-centric indexes can speed up such traversals by using localized index structures to retrieve only those edges that need to be traversed.

Suppose that Hercules battled hundreds of monsters in addition to the three captured in the introductory <>. Without a vertex-centric index, a query asking for those monsters battled between time point 10 and 20 would require retrieving all battled edges even though there are only a handful of matching edges.

[source, gremlin]
----
h = g.V().has('name', 'hercules').next()
g.V(h).outE('battled').has('time', inside(10, 20)).inV()
----

Building a vertex-centric index by time speeds up such traversal queries. Note, this initial index example already exists in the Graph of the Gods as an index named edges. As a result, running the steps below will result in a uniqueness constraint error.

[source, gremlin]
----
graph.tx().rollback()  //Never create new indexes while a transaction is active
mgmt = graph.openManagement()
time = mgmt.getPropertyKey('time')
battled = mgmt.getEdgeLabel('battled')
mgmt.buildEdgeIndex(battled, 'battlesByTime', Direction.BOTH, Order.desc, time)
mgmt.commit()
//Wait for the index to become available
ManagementSystem.awaitRelationIndexStatus(graph, 'battlesByTime', 'battled').call()
//Reindex the existing data
mgmt = graph.openManagement()
battled = mgmt.getEdgeLabel('battled')
mgmt.updateIndex(mgmt.getRelationIndex(battled, 'battlesByTime'), SchemaAction.REINDEX).get()
mgmt.commit()
----

This example builds a vertex-centric index which indexes battled edges in both directions by time in descending order. A vertex-centric index is built against a particular edge label which is the first argument to the index construction method JanusGraphManagement.buildEdgeIndex(). The index only applies to edges of this label - battled in the example above. The second argument is a unique name for the index. The third argument is the edge direction in which the index is built. The index will only apply to traversals along edges in this direction. In this example, the vertex-centric index is built in both directions which means that time restricted traversals along battled edges can be served by this index in both the IN and OUT direction. JanusGraph will maintain a vertex-centric index on both the in- and out-vertex of battled edges. Alternatively, one could define the index to apply to the OUT direction only which would speed up traversals from Hercules to the monsters but not in the reverse direction. This would only require maintaining one index and hence half the index maintenance and storage cost. The last two arguments are the sort order of the index and a list of property keys to index by. The sort order is optional and defaults to ascending order (i.e. Order.asc). The list of property keys must be non-empty and defines the keys by which to index the edges of the given label. A vertex-centric index can be defined with multiple keys.

[source, gremlin]
----
graph.tx().rollback()  //Never create new indexes while a transaction is active
mgmt = graph.openManagement()
time = mgmt.getPropertyKey('time')
rating = mgmt.makePropertyKey('rating').dataType(Double.class).make()
battled = mgmt.getEdgeLabel('battled')
mgmt.buildEdgeIndex(battled, 'battlesByRatingAndTime', Direction.OUT, Order.desc, rating, time)
mgmt.commit()
//Wait for the index to become available
ManagementSystem.awaitRelationIndexStatus(graph, 'battlesByRatingAndTime', 'battled').call()
//Reindex the existing data
mgmt = graph.openManagement()
battled = mgmt.getEdgeLabel('battled')
mgmt.updateIndex(mgmt.getRelationIndex(battled, 'battlesByRatingAndTime'), SchemaAction.REINDEX).get()
mgmt.commit()
----

This example extends the schema with a rating property on battled edges and builds a vertex-centric index which indexes battled edges in the out-going direction by rating and time in descending order. Note that the order in which the property keys are specified is important because vertex-centric indexes are prefix indexes. This means that battled edges are indexed by rating first and time second.

[source, gremlin]
----
//Add some rating data
h = g.V().has('name', 'hercules').next()
g.V(h).outE('battled').property('rating', 5.0) //Add some rating properties
g.V(h).outE('battled').has('rating', gt(3.0)).inV()
g.V(h).outE('battled').has('rating', 5.0).has('time', inside(10, 50)).inV()
g.V(h).outE('battled').has('time', inside(10, 50)).inV()
----

Hence, the battlesByRatingAndTime index can speed up the first two but not the third query.

Multiple vertex-centric indexes can be built for the same edge label in order to support different constraint traversals. JanusGraph’s query optimizer attempts to pick the most efficient index for any given traversal. Vertex-centric indexes only support equality and range/interval constraints.

[NOTE] The property keys used in a vertex-centric index must have an explicitly defined data type (i.e. not Object.class) which supports a native sort order. This means not only that they must implement Comparable but that their serializer must implement OrderPreservingSerializer. The types that are currently supported are Boolean, UUID, Byte, Float, Long, String, Integer, Date, Double, Character, and Short.

If the vertex-centric index is built against an edge label that is defined in the same management transaction, the index will be immediately available for querying. If the edge label has already been in use, building a vertex-centric index against it requires the execution of a <> to ensure that the index contains all previously added edges. Until the reindex procedure has completed, the index will not be available.

[NOTE] JanusGraph automatically builds vertex-centric indexes per edge label and property key. That means, even with thousands of incident battled edges, queries like g.V(h).out('mother') or g.V(h).values('age') are efficiently answered by the local index.

Vertex-centric indexes cannot speed up unconstrained traversals which require traversing through all incident edges of a particular label. Those traversals will become slower as the number of incident edges increases. Often, such traversals can be rewritten as constrained traversals that can utilize a vertex-centric index to ensure acceptable performance at scale.
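For instance, an unconstrained traversal over all battled edges can often be rewritten with a property constraint so that a vertex-centric index (such as battlesByTime mentioned below) can serve it. In this sketch, h is assumed to reference the Hercules vertex as in the surrounding examples:

[source, gremlin]
----
// Unconstrained: must touch every incident battled edge
g.V(h).outE('battled').inV().values('name')
// Constrained by time: can be served by a vertex-centric index on time
g.V(h).outE('battled').has('time', inside(10, 50)).inV().values('name')
----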

=== Ordered Traversals

The following queries specify an order in which the incident edges are to be traversed. Use the local step together with order and limit to retrieve a subset of the edges (in a given order) for EACH vertex that is traversed.

[source, gremlin]
----
h = g.V().has('name', 'hercules').next()
g.V(h).local(outE('battled').order().by('time', desc).limit(10)).inV().values('name')
g.V(h).local(outE('battled').has('rating', 5.0).order().by('time', desc).limit(10)).values('place')
----

The first query asks for the names of the 10 most recently battled monsters by Hercules. The second query asks for the places of the 10 most recent battles of Hercules that are rated 5 stars. In both cases, the query is constrained by an order on a property key with a limit on the number of elements to be returned.

Such queries can also be efficiently answered by vertex-centric indexes if the order key matches the key of the index and the requested order (i.e. ascending or descending) is the same as the one defined for the index. The battlesByTime index would be used to answer the first query and battlesByRatingAndTime applies to the second. Note, that the battlesByRatingAndTime index cannot be used to answer the first query because an equality constraint on rating must be present for the second key in the index to be effective.

[NOTE] Ordered vertex queries are a JanusGraph extension to Gremlin which causes the verbose syntax and requires the _() step to convert the JanusGraph result back into a Gremlin pipeline.

[[tx]]

== Transactions

Almost all interaction with JanusGraph is associated with a transaction. JanusGraph transactions are safe for concurrent use by multiple threads. Methods on a JanusGraph instance like graph.V(...) and graph.tx().commit() perform a ThreadLocal lookup to retrieve or create a transaction associated with the calling thread. Callers can alternatively forego ThreadLocal transaction management in favor of calling graph.tx().createThreadedTx(), which returns a reference to a transaction object with methods to read/write graph data and commit or rollback.

JanusGraph transactions are not necessarily ACID. They can be so configured on BerkeleyDB, but they are not generally so on Cassandra or HBase, where the underlying storage system does not provide serializable isolation or multi-row atomic writes and the cost of simulating those properties would be substantial.

This section describes JanusGraph’s transactional semantics and API.

=== Transaction Handling

Every graph operation in JanusGraph occurs within the context of a transaction. According to TinkerPop's transaction semantics, each thread opens its own transaction against the graph database with the first operation (i.e. retrieval or mutation) on the graph:

[source, gremlin]
----
graph = JanusGraphFactory.open("berkeleyje:/tmp/janusgraph")
juno = graph.addVertex() //Automatically opens a new transaction
juno.property("name", "juno")
graph.tx().commit() //Commits transaction
----

In this example, a local JanusGraph graph database is opened. Adding the vertex "juno" is the first operation (in this thread) which automatically opens a new transaction. All subsequent operations occur in the context of that same transaction until the transaction is explicitly stopped or the graph database is closed. If transactions are still open when close() is called, then the behavior of the outstanding transactions is technically undefined. In practice, any non-thread-bound transactions will usually be effectively rolled back, but the thread-bound transaction belonging to the thread that invoked close() will first be committed. Note, that both read and write operations occur within the context of a transaction.

=== Transactional Scope

All graph elements (vertices, edges, and types) are associated with the transactional scope in which they were retrieved or created. Under TinkerPop’s default transactional semantics, transactions are automatically created with the first operation on the graph and closed explicitly using commit() or rollback(). Once the transaction is closed, all graph elements associated with that transaction become stale and unavailable. However, JanusGraph will automatically transition vertices and types into the new transactional scope as shown in this example:

[source, gremlin]
----
graph = JanusGraphFactory.open("berkeleyje:/tmp/janusgraph")
juno = graph.addVertex() //Automatically opens a new transaction
graph.tx().commit() //Ends transaction
juno.property("name", "juno") //Vertex is automatically transitioned
----

Edges, on the other hand, are not automatically transitioned and cannot be accessed outside their original transaction. They must be explicitly transitioned:

[source, gremlin]
----
e = juno.addEdge("knows", graph.addVertex())
graph.tx().commit() //Ends transaction
e = g.E(e).next() //Need to refresh edge
e.property("time", 99)
----

=== Transaction Failures

When committing a transaction, JanusGraph will attempt to persist all changes to the storage backend. This might not always be successful due to IO exceptions, network errors, machine crashes or resource unavailability. Hence, transactions can fail. In fact, transactions will eventually fail in sufficiently large systems. Therefore, we highly recommend that your code expects and accommodates such failures:

[source, gremlin]
----
try {
    if (g.V().has("name", name).iterator().hasNext())
        throw new IllegalArgumentException("Username already taken: " + name)
    user = graph.addVertex()
    user.property("name", name)
    graph.tx().commit()
} catch (Exception e) {
    //Recover, retry, or return error message
    println(e.getMessage())
}
----

The example above demonstrates a simplified user signup implementation where name is the name of the user who wishes to register. First, it is checked whether a user with that name already exists. If not, a new user vertex is created and the name assigned. Finally, the transaction is committed.

If the transaction fails, a JanusGraphException is thrown. There are a variety of reasons why a transaction may fail. JanusGraph differentiates between potentially temporary and permanent failures.

Potentially temporary failures are those related to resource unavailability and IO hiccups (e.g. network timeouts). JanusGraph automatically tries to recover from temporary failures by retrying to persist the transactional state after some delay. The number of retry attempts and the retry delay are configurable (see <>).

Permanent failures can be caused by complete connection loss, hardware failure or lock contention. To understand the cause of lock contention, consider the signup example above and suppose a user tries to signup with username “juno”. That username may still be available at the beginning of the transaction but by the time the transaction is committed, another user might have concurrently registered with “juno” as well and that transaction holds the lock on the username therefore causing the other transaction to fail. Depending on the transaction semantics one can recover from a lock contention failure by re-running the entire transaction.
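Since such contention is transient from the application's perspective, a common recovery pattern is to wrap the whole transaction in a retry loop. The following is a hedged sketch only: the retry count and backoff are illustrative, and name is assumed to be defined as in the signup example above.

[source, gremlin]
----
maxRetries = 3
for (i in 1..maxRetries) {
    try {
        if (g.V().has("name", name).iterator().hasNext())
            throw new IllegalArgumentException("Username already taken: " + name)
        user = graph.addVertex()
        user.property("name", name)
        graph.tx().commit()
        break //Success
    } catch (Exception e) {
        graph.tx().rollback() //Discard the failed transaction before retrying
        if (i == maxRetries) throw e
        Thread.sleep(100 * i) //Simple linear backoff (illustrative)
    }
}
----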

Permanent exceptions that can fail a transaction include:

  • PermanentLockingException(Local lock contention): Another local thread has already been granted a conflicting lock.
  • PermanentLockingException(Expected value mismatch for X: expected=Y vs actual=Z): The transaction could not verify, after acquiring the lock, that the value it read is still the value in the datastore. In other words, another transaction modified the value after it had been read.

[[multi-thread-tx]]

=== Multi-Threaded Transactions

JanusGraph supports multi-threaded transactions through TinkerPop’s https://tinkerpop.apache.org/docs/{tinkerpop_version}/reference#_threaded_transactions[threaded transactions]. Hence, to speed up transaction processing and utilize multi-core architectures multiple threads can run concurrently in a single transaction.

With TinkerPop’s default transaction handling, each thread automatically opens its own transaction against the graph database. To open a thread-independent transaction, use the createThreadedTx() method.

[source, gremlin]
----
threadedGraph = graph.tx().createThreadedTx();
threads = new Thread[10];
for (int i=0; i<threads.length; i++) {
    threads[i] = new Thread({
        println("Do something with 'threadedGraph'");
    });
    threads[i].start();
}
for (int i=0; i<threads.length; i++)
    threads[i].join();
threadedGraph.tx().commit();
----

The createThreadedTx() method returns a new Graph object that represents the newly opened transaction. This graph object (threadedGraph in the example above) supports all of the methods of the original graph, but does so without opening new transactions for each thread. This allows multiple threads to work concurrently in the same transaction, one of which finally commits the transaction when all threads have completed their work.

JanusGraph relies on optimized concurrent data structures to support hundreds of concurrent threads running efficiently in a single transaction.

==== Concurrent Algorithms

Thread-independent transactions started through createThreadedTx() are particularly useful when implementing concurrent graph algorithms. Most traversal and message-passing (ego-centric) graph algorithms are https://en.wikipedia.org/wiki/Embarrassingly_parallel[embarrassingly parallel], which means they can be parallelized and executed through multiple threads with little effort. Each of these threads can operate on a single Graph object returned by createThreadedTx() without blocking the others.

==== Nested Transactions

Another use case for thread independent transactions is nested transactions that ought to be independent from the surrounding transaction.

For instance, assume a long running transactional job that has to create a new vertex with a unique name. Since enforcing unique names requires the acquisition of a lock (see <> for more detail) and since the transaction is running for a long time, lock contention and expensive transactional failures are likely.

[source, gremlin]
----
v1 = graph.addVertex()
//Do many other things
v2 = graph.addVertex()
v2.property("uniqueName", "foo")
v1.addEdge("related", v2)
//Do many other things
graph.tx().commit() // This long-running tx might fail due to contention on its uniqueName lock
----

One way around this is to create the vertex in a short, nested thread-independent transaction, as demonstrated by the following pseudocode:

[source, gremlin]
----
v1 = graph.addVertex()
//Do many other things
tx = graph.tx().createThreadedTx()
v2 = tx.addVertex()
v2.property("uniqueName", "foo")
tx.commit() // Any lock contention will be detected here
v1.addEdge("related", g.V(v2).next()) // Need to load v2 into outer transaction
//Do many other things
graph.tx().commit() // Can't fail due to uniqueName write lock contention involving v2
----

=== Common Transaction Handling Problems

Transactions are started automatically with the first operation executed against the graph. One does NOT have to start a transaction manually. The method newTransaction is used to start <> only.

Transactions are automatically started under the TinkerPop semantics but not automatically terminated. Transactions must be terminated manually with commit() or rollback(). If a commit() fails, the transaction should be terminated manually with rollback() after catching the failure. Manual termination of transactions is necessary because only the user knows the transactional boundary.
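A minimal sketch of this pattern, terminating a failed transaction with rollback() after a failed commit():

[source, gremlin]
----
try {
    // ... read and write graph data ...
    graph.tx().commit()
} catch (Exception e) {
    graph.tx().rollback() //Manually terminate the failed transaction
    // recover, retry, or report the error
}
----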

A transaction will attempt to maintain its state from the beginning of the transaction. This might lead to unexpected behavior in multi-threaded applications, as illustrated in the following artificial example:

[source, gremlin]
----
v = g.V(4).next() // Retrieve vertex, first action automatically starts transaction
g.V(v).bothE()
// returns nothing, v has no edges
//thread is idle for a few seconds, another thread adds edges to v
g.V(v).bothE()
// still returns nothing because the transactional state from the beginning is maintained
----

Such unexpected behavior is likely to occur in client-server applications where the server maintains multiple threads to answer client requests. It is therefore important to terminate the transaction after a unit of work (e.g. code snippet, query, etc). So, the example above should be:

[source, gremlin]
----
v = g.V(4).next() // Retrieve vertex, first action automatically starts transaction
g.V(v).bothE()
graph.tx().commit()
//thread is idle for a few seconds, another thread adds edges to v
g.V(v).bothE()
// returns the newly added edge
graph.tx().commit()
----

When using multi-threaded transactions via newTransaction all vertices and edges retrieved or created in the scope of that transaction are not available outside the scope of that transaction. Accessing such elements after the transaction has been closed will result in an exception. As demonstrated in the example above, such elements have to be explicitly refreshed in the new transaction using g.V(existingVertex) or g.E(existingEdge).

[[tx-config]]

=== Transaction Configuration

JanusGraph’s JanusGraph.buildTransaction() method gives the user the ability to configure and start a new <> against a JanusGraph. Hence, it is identical to JanusGraph.newTransaction() with additional configuration options.

buildTransaction() returns a TransactionBuilder which allows the following aspects of a transaction to be configured:

  • readOnly() - makes the transaction read-only and any attempt to modify the graph will result in an exception.
  • enableBatchLoading() - enables batch-loading for an individual transaction. This setting results in similar efficiencies as the graph-wide setting storage.batch-loading due to the disabling of consistency checks and other optimizations. Unlike storage.batch-loading this option will not change the behavior of the storage backend.
  • setTimestamp(long) - Sets the timestamp for this transaction as communicated to the storage backend for persistence. Depending on the storage backend, this setting may be ignored. For eventually consistent backends, this is the timestamp used to resolve write conflicts. If this setting is not explicitly specified, JanusGraph uses the current time.
  • setVertexCacheSize(long size) - The number of vertices this transaction caches in memory. The larger this number, the more memory a transaction can potentially consume. If this number is too small, a transaction might have to re-fetch data which causes delays in particular for long running transactions.
  • checkExternalVertexExistence(boolean) - Whether this transaction should verify the existence of vertices for user provided vertex ids. Such checks require access to the database, which takes time. The existence check should only be disabled if the user is absolutely sure that the vertices exist - otherwise data corruption can ensue.
  • checkInternalVertexExistence(boolean) - Whether this transaction should double-check the existence of vertices during query execution. This can be useful to avoid phantom vertices on eventually consistent storage backends. Disabled by default. Enabling this setting can slow down query processing.
  • consistencyChecks(boolean) - Whether JanusGraph should enforce schema level consistency constraints (e.g. multiplicity constraints). Disabling consistency checks leads to better performance but requires that the user ensures consistency at the application level to avoid inconsistencies. USE WITH GREAT CARE!

Once the desired configuration options have been specified, the new transaction is started via start(), which returns a JanusGraphTransaction.
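For example, a read-only transaction with an enlarged vertex cache might be configured as follows (the cache size is illustrative):

[source, gremlin]
----
tx = graph.buildTransaction().
    readOnly().
    setVertexCacheSize(100000).
    start()
count = tx.traversal().V().has("name", "hercules").outE("battled").count().next()
tx.rollback() //A read-only transaction can simply be rolled back
----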

[[caching]]

== JanusGraph Cache

JanusGraph employs multiple layers of data caching to facilitate fast graph traversals. The caching layers are listed here in the order they are accessed from within a JanusGraph transaction. The closer the cache is to the transaction, the faster the cache access and the higher the memory footprint and maintenance overhead.

[[tx-cache]]

=== Transaction-Level Caching

Within an open transaction, JanusGraph maintains two caches:

  • Vertex Cache: Caches accessed vertices and their adjacency list (or subsets thereof) so that subsequent access is significantly faster within the same transaction. Hence, this cache speeds up iterative traversals.
  • Index Cache: Caches the results for index queries so that subsequent index calls can be served from memory instead of calling the index backend and (usually) waiting for one or more network round trips.

The size of both caches is determined by the transaction cache size, which can be configured via cache.tx-cache-size or on a per-transaction basis by opening a transaction via the transaction builder graph.buildTransaction() and using its setVertexCacheSize(int) method.

==== Vertex Cache

The vertex cache contains vertices and the subset of their adjacency list that has been retrieved in a particular transaction. The maximum number of vertices maintained in this cache is equal to the transaction cache size. If the transaction workload is an iterative traversal, the vertex cache will significantly speed it up. If the same vertex is not accessed again in the transaction, the transaction level cache will make no difference.

Note, that the size of the vertex cache on heap is not only determined by the number of vertices it may hold but also by the size of their adjacency list. In other words, vertices with large adjacency lists (i.e. many incident edges) will consume more space in this cache than those with smaller lists.

Furthermore note, that modified vertices are pinned in the cache, which means they cannot be evicted since that would entail losing their changes. Therefore, transactions which contain a lot of modifications may end up with a larger than configured vertex cache.

==== Index Cache

The index cache contains the results of index queries executed in the context of this transaction. Subsequent identical index calls will be served from this cache and are therefore significantly cheaper. If the same index call never occurs twice in the same transaction, the index cache makes no difference.

Each entry in the index cache is given a weight equal to 2 + result set size and the total weight of the cache will not exceed half of the transaction cache size.

[[db-cache]]

=== Database Level Caching

The database level cache retains adjacency lists (or subsets thereof) across multiple transactions and beyond the duration of a single transaction. The database level cache is shared by all transactions across a database. It is more space efficient than the transaction level caches but also slightly slower to access. In contrast to the transaction level caches, the database level caches do not expire immediately after closing a transaction. Hence, the database level cache significantly speeds up graph traversals for read heavy workloads across transactions.

<> lists all of the configuration options that pertain to JanusGraph’s database level cache. This page attempts to explain their usage.

Most importantly, the database level cache is disabled by default in the current release version of JanusGraph. To enable it, set cache.db-cache=true.

==== Cache Expiration Time

The most important setting for performance and query behavior is the cache expiration time which is configured via cache.db-cache-time. The cache will hold graph elements for at most that many milliseconds. If an element expires, the data will be re-read from the storage backend on the next access.

If there is only one JanusGraph instance accessing the storage backend or if this instance is the only one modifying the graph, the cache expiration can be set to 0 which disables cache expiration. This allows the cache to hold elements indefinitely (unless they are evicted due to space constraints or on update) which provides the best cache performance. Since no other JanusGraph instance is modifying the graph, there is no danger of holding on to stale data.

If there are multiple JanusGraph instances accessing the storage backend, the time should be set to the maximum time that can be allowed between another JanusGraph instance modifying the graph and this JanusGraph instance seeing the data. If any change should be immediately visible to all JanusGraph instances, the database level cache should be disabled in a distributed setup. However, for most applications it is acceptable that a particular JanusGraph instance sees remote modifications with some delay. The larger the maximally allowed delay, the better the cache performance. Note, that a given JanusGraph instance will always immediately see its own modifications to the graph irrespective of the configured cache expiration time.

==== Cache Size

The configuration option cache.db-cache-size controls how much heap space JanusGraph’s database level cache is allowed to consume. The larger the cache, the more effective it will be. However, large cache sizes can lead to excessive GC and poor performance.

The cache size can be configured as a percentage (expressed as a decimal between 0 and 1) of the total heap space available to the JVM running JanusGraph or as an absolute number of bytes.

Note, that the cache size refers to the amount of heap space that is exclusively occupied by the cache. JanusGraph’s other data structures and each open transaction will occupy additional heap space. If additional software layers are running in the same JVM, those may occupy a significant amount of heap space as well (e.g. Gremlin Server, embedded Cassandra, etc). Be conservative in your heap memory estimation. Configuring a cache that is too large can lead to out-of-memory exceptions and excessive GC.

==== Clean Up Wait Time

When a vertex is locally modified (e.g. an edge is added) all of the vertex’s related database level cache entries are marked as expired and eventually evicted. This will cause JanusGraph to refresh the vertex’s data from the storage backend on the next access and re-populate the cache.

However, when the storage backend is eventually consistent, the modifications that triggered the eviction may not yet be visible. By configuring cache.db-cache-clean-wait, the cache will wait for at least this many milliseconds before repopulating the cache with the entry retrieved from the storage backend.

If JanusGraph runs locally or against a storage backend that guarantees immediate visibility of modifications, this value can be set to 0.
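Putting these options together, a database level cache configuration for a multi-instance deployment against an eventually consistent backend might look like the following in the graph's properties file. The values are illustrative and should be tuned to the deployment:

[source, properties]
----
cache.db-cache = true
cache.db-cache-time = 180000
cache.db-cache-size = 0.25
cache.db-cache-clean-wait = 50
----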

=== Storage Backend Caching

Each storage backend maintains its own data caching layer. These caches benefit from compression, data compactness, coordinated expiration and are often maintained off heap which means that large caches can be used without running into garbage collection issues. While these caches can be significantly larger than the database level cache, they are also slower to access.

The exact type of caching and its properties depends on the particular <>. Please refer to the respective documentation for more information about the caching infrastructure and how to optimize it.

[[log]]

== Transaction Log

JanusGraph can automatically log transactional changes for additional processing or as a record of change. To enable logging for a particular transaction, specify the name of the target log during the start of the transaction.

[source, gremlin]
----
tx = graph.buildTransaction().logIdentifier('addedPerson').start()
u = tx.addVertex(label, 'human')
u.property('name', 'proteros')
u.property('age', 36)
tx.commit()
----

Upon commit, any changes made during the transaction are logged to the user logging system into a log named addedPerson. The user logging system is a configurable logging backend with a JanusGraph compatible log interface. By default, the log is written to a separate store in the primary storage backend which can be configured as described below. The log identifier specified during the start of the transaction identifies the log in which the changes are recorded thereby allowing different types of changes to be recorded in separate logs for individual processing.

[source, gremlin]
----
tx = graph.buildTransaction().logIdentifier('battle').start()
h = tx.traversal().V().has('name', 'hercules').next()
m = tx.addVertex(label, 'monster')
m.property('name', 'phylatax')
h.addEdge('battled', m, 'time', 22)
tx.commit()
----

JanusGraph provides a user transaction log processor framework to process the recorded transactional changes. The transaction log processor is opened via JanusGraphFactory.openTransactionLog(JanusGraph) against a previously opened JanusGraph graph instance. One can then add processors for a particular log which holds transactional changes.

[source, gremlin]
----
import java.util.concurrent.atomic.*;
import org.janusgraph.core.log.*;
import java.util.concurrent.*;

logProcessor = JanusGraphFactory.openTransactionLog(g);
totalHumansAdded = new AtomicInteger(0);
totalGodsAdded = new AtomicInteger(0);
logProcessor.addLogProcessor("addedPerson").
    setProcessorIdentifier("addedPersonCounter").
    setStartTimeNow().
    addProcessor(new ChangeProcessor() {
        @Override
        public void process(JanusGraphTransaction tx, TransactionId txId, ChangeState changeState) {
            for (v in changeState.getVertices(Change.ADDED)) {
                if (v.label().equals("human")) totalHumansAdded.incrementAndGet();
            }
        }
    }).
    addProcessor(new ChangeProcessor() {
        @Override
        public void process(JanusGraphTransaction tx, TransactionId txId, ChangeState changeState) {
            for (v in changeState.getVertices(Change.ADDED)) {
                if (v.label().equals("god")) totalGodsAdded.incrementAndGet();
            }
        }
    }).
    build();
----

In this example, a log processor is built for the user transaction log named addedPerson to process the changes made in transactions which used the addedPerson log identifier. Two change processors are added to this log processor. The first processor counts the number of humans added and the second counts the number of gods added to the graph.

When a log processor is built against a particular log, such as the addedPerson log in the example above, it will start reading transactional change records from the log immediately upon successful construction and initialization up to the head of the log. The start time specified in the builder marks the time point in the log where the log processor will start reading records. Optionally, one can specify an identifier for the log processor in the builder. The log processor will use the identifier to regularly persist its state of processing, i.e. it will maintain a marker on the last read log record. If the log processor is later restarted with the same identifier, it will continue reading from the last read record. This is particularly useful when the log processor is supposed to run for long periods of time and is therefore likely to fail. In such failure situations, the log processor can simply be restarted with the same identifier. It must be ensured that log processor identifiers are unique in a JanusGraph cluster in order to avoid conflicts on the persisted read markers.

A change processor must implement the ChangeProcessor interface. Its process() method is invoked for each change record read from the log with a JanusGraphTransaction handle, the id of the transaction that caused the change, and a ChangeState container which holds the transactional changes. The change state container can be queried to retrieve individual elements that were part of the change state. In the example, all added vertices are retrieved. Refer to the API documentation for a description of all the query methods on ChangeState. The provided transaction id can be used to investigate the origin of the transaction which is uniquely identified by the combination of the id of the JanusGraph instance that executed the transaction (txId.getInstanceId()) and the instance specific transaction id (txId.getTransactionId()). In addition, the time of the transaction is available through txId.getTransactionTime().

Change processors are executed individually and in multiple threads. If a change processor accesses global state it must be ensured that such state allows concurrent access. While the log processor reads log records sequentially, the changes are processed in multiple threads so it cannot be guaranteed that the log order is preserved in the change processors.

Note, that log processors run each registered change processor at least once for each record in the log, which means that a single transactional change record may be processed multiple times under certain failure conditions. One cannot add or remove change processors from a running log processor. In other words, a log processor is immutable after it is built. To change log processing, start a new log processor and shut down an existing one.

[source, gremlin]
----
logProcessor.addLogProcessor("battle").
    setProcessorIdentifier("battleTimer").
    setStartTimeNow().
    addProcessor(new ChangeProcessor() {
        @Override
        public void process(JanusGraphTransaction tx, TransactionId txId, ChangeState changeState) {
            h = tx.V().has("name", "hercules").toList().iterator().next();
            for (edge in changeState.getEdges(h, Change.ADDED, Direction.OUT, "battled")) {
                if (edge.value("time") > 1000) h.property("oldFighter", true);
            }
        }
    }).
    build();
----

The log processor above processes transactions for the battle log identifier with a single change processor which evaluates battled edges that were added to Hercules. This example demonstrates that the transaction handle passed into the change processor is a normal JanusGraphTransaction which can be used to query the JanusGraph graph and make changes to it.

=== Transaction Log Use Cases

==== Record of Change

The user transaction log can be used to keep a record of all changes made against the graph. By using separate log identifiers, changes can be recorded in different logs to distinguish separate transaction types.

At any time, a log processor can be built which processes all recorded changes starting from the desired start time. This can be used for forensic analysis, to replay changes against a different graph, or to compute an aggregate.

Downstream Updates

It is often the case that a JanusGraph graph cluster is part of a larger architecture. The user transaction log and the log processor framework provide the tools needed to broadcast changes to other components of the overall system without slowing down the original transactions causing the change. This is particularly useful when transaction latencies need to be low and/or there are a number of other systems that need to be alerted to a change in the graph.

Triggers

The user transaction log provides the basic infrastructure to implement triggers that can scale to a large number of concurrent transactions and very large graphs. A trigger is registered with a particular change of data and either triggers an event in an external system or additional changes to the graph. At scale, it is not advisable to implement triggers in the original transaction but rather process triggers with a slight delay through the log processor framework. The second example shows how changes to the graph can be evaluated and trigger additional modifications.

Log Configuration

There are a number of configuration options to fine tune how the log processor reads from the log. Refer to the complete list of configuration options <> for the options under the log namespace. To configure the user transaction log, use the log.user namespace. The options listed there allow the configuration of the number of threads to be used, the number of log records read in each batch, the read interval, and whether the transaction change records should automatically expire and be removed from the log after a configurable amount of time (TTL).
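For example, a deployment might tune the user transaction log along these lines. This is a sketch only: the option names and values below are illustrative assumptions and should be verified against the configuration reference before use.

[source, properties]
----
# Number of threads reading from the user transaction log (assumed option name)
log.user.read-threads = 2
# Number of log records read in each batch (assumed option name)
log.user.read-batch-size = 1024
# How frequently the log is polled for new records (assumed option name)
log.user.read-interval = 5000
# Expire and remove change records after this time-to-live (assumed option name)
log.user.ttl = 604800000
----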

include::configref.adoc[]

[[common-questions]]

== Common Questions

Accidental type creation

By default, JanusGraph will automatically create property keys and edge labels when a new type is encountered. It is strongly encouraged that users explicitly define their schema as documented in <> before loading any data and disable automatic type creation by setting the option schema.default = none.

Automatic type creation can cause problems in multi-threaded or highly concurrent environments. Since JanusGraph needs to ensure that types are unique, multiple attempts at creating the same type will lead to locking or other exceptions. It is generally recommended to create all needed types up front or in one batch when new property keys and edge labels are needed.
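For instance, a schema can be created explicitly through the management API before any data is loaded (a sketch; the type names below are illustrative):

[source, gremlin]
----
mgmt = graph.openManagement()
// Define property keys with explicit data types and cardinality
mgmt.makePropertyKey("name").dataType(String.class).cardinality(Cardinality.SINGLE).make()
mgmt.makePropertyKey("time").dataType(Integer.class).make()
// Define edge and vertex labels up front, in one batch
mgmt.makeEdgeLabel("battled").multiplicity(Multiplicity.MULTI).make()
mgmt.makeVertexLabel("demigod").make()
mgmt.commit()
----

With schema.default = none set in the configuration, any access to an undefined type then fails fast instead of silently creating it.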

Custom Class Datatype

JanusGraph supports arbitrary objects as attribute values on properties. To use a custom class as data type in JanusGraph, either register a custom serializer or ensure that the class has a no-argument constructor and implements the equals method because JanusGraph will verify that it can successfully de-/serialize objects of that class. Please see <> for more information.
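As an illustration, a hypothetical attribute class that satisfies both requirements might look like the following. The class name and fields are invented for this example; only the two requirements (no-argument constructor, equals) come from the text above.

```java
// Hypothetical attribute class for use as a JanusGraph property value.
// JanusGraph requires a no-argument constructor and a correct equals()
// so it can verify a de-/serialization round trip for the class.
public class Coordinate {
    private double latitude;
    private double longitude;

    // No-argument constructor required for the serialization check
    public Coordinate() {}

    public Coordinate(double latitude, double longitude) {
        this.latitude = latitude;
        this.longitude = longitude;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Coordinate)) return false;
        Coordinate c = (Coordinate) other;
        return latitude == c.latitude && longitude == c.longitude;
    }

    @Override
    public int hashCode() {
        return java.util.Objects.hash(latitude, longitude);
    }
}
```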

Transactional Scope for Edges

Edges should not be accessed outside the scope in which they were originally created or retrieved.

Locking Exceptions

When defining unique types with <> (i.e. requesting that JanusGraph ensure uniqueness), one is likely to encounter locking exceptions of the type PermanentLockingException under concurrent modifications to the graph.

Such exceptions are to be expected, since JanusGraph cannot know how to recover from a transactional state where an earlier read value has been modified by another transaction since this may invalidate the state of the transaction. In most cases it is sufficient to simply re-run the transaction. If locking exceptions are very frequent, try to analyze and remove the source of congestion.
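A common pattern is to wrap the transaction in a small retry loop, as sketched below. The retry count, the traversal, and the property update are illustrative; adapt them to the actual workload.

[source, gremlin]
----
// Re-run the transaction a few times if a locking exception surfaces
int attempts = 3
while (attempts-- > 0) {
    tx = graph.newTransaction()
    try {
        v = tx.traversal().V().has("name", "hercules").next()
        v.property("age", 31)
        tx.commit()
        break
    } catch (Exception e) {
        tx.rollback()
        if (attempts == 0) throw e
    }
}
----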

Ghost Vertices

When the same vertex is concurrently removed in one transaction and modified in another, both transactions will successfully commit on eventually consistent storage backends and the vertex will still exist with only the modified properties or edges. This is referred to as a ghost vertex. It is possible to guard against ghost vertices on eventually consistent backends using key <> but this is prohibitively expensive in most cases. A more scalable approach is to allow ghost vertices temporarily and to clear them out at regular intervals.

Another option is to detect them at read-time using the option checkInternalVertexExistence() documented in <>.

Debug-level Logging Slows Execution

When the log level is set to DEBUG JanusGraph produces a lot of logging output which is useful to understand how particular queries get compiled, optimized, and executed. However, the output is so large that it will impact the query performance noticeably. Hence, use INFO severity or higher for production systems or benchmarking.

JanusGraph OutOfMemoryException or excessive Garbage Collection

If you experience memory issues or excessive garbage collection while running JanusGraph it is likely that the caches are configured incorrectly. If the caches are too large, the heap may fill up with cache entries. Try reducing the size of the transaction level cache before tuning the database level cache, in particular if you have many concurrent transactions. See <> for more information.
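For example, the transaction-level and database-level cache sizes can be reduced in the configuration. The values below are illustrative starting points only; see the cache options in <> for the exact semantics and defaults.

[source, properties]
----
# Maximum number of vertices kept in each transaction-level cache
cache.tx-cache-size = 10000
# Database-level cache: enabled, sized as a fraction of the available heap
cache.db-cache = true
cache.db-cache-size = 0.25
----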

JAMM Warning Messages

When launching JanusGraph with embedded Cassandra, the following warnings may be displayed:

958 [MutationStage:25] WARN org.apache.cassandra.db.Memtable - MemoryMeter uninitialized (jamm not specified as java agent); assuming liveRatio of 10.0. Usually this means cassandra-env.sh disabled jamm because you are using a buggy JRE; upgrade to the Sun JRE instead

Cassandra uses a Java agent called MemoryMeter which allows it to measure the actual memory use of an object, including JVM overhead. To use https://github.com/jbellis/jamm[JAMM] (Java Agent for Memory Measurements), the path to the JAMM jar must be specified in the Java javaagent parameter when launching the JVM (e.g. -javaagent:path/to/jamm.jar) through either janusgraph.sh, gremlin.sh, or Gremlin Server:

[source, bash]
----
export JANUSGRAPH_JAVA_OPTS=-javaagent:$JANUSGRAPH_HOME/lib/jamm-$MAVEN{jamm.version}.jar
----

Cassandra Connection Problem

By default, JanusGraph uses the Astyanax library to connect to Cassandra clusters. On EC2 and Rackspace, it has been reported that Astyanax was unable to establish a connection to the cluster. In those cases, changing the backend to storage.backend=cassandrathrift solved the problem.

Elasticsearch OutOfMemoryException

When numerous clients are connecting to Elasticsearch, it is likely that an OutOfMemoryException occurs. This is not due to a memory issue, but to the OS not allowing more threads to be spawned by the user running Elasticsearch. To circumvent this issue, increase the number of processes allowed for that user. For example, increase ulimit -u from the default 1024 to 10024.

Dropping a Database

To drop a database using the Gremlin Console you can call JanusGraphFactory.drop(graph). The graph you want to drop needs to be defined prior to running the drop method.

With ConfiguredGraphFactory

  1. graph = ConfiguredGraphFactory.open('example')
  2. ConfiguredGraphFactory.drop('example');

With JanusGraphFactory

  1. graph = JanusGraphFactory.open('path/to/configuration.properties')
  2. JanusGraphFactory.drop(graph);

Note that on JanusGraph versions prior to 0.3.0, if multiple Gremlin Server instances are connecting to the graph that has been dropped, it is recommended to close the graph on all active nodes by running either JanusGraphFactory.close(graph) or ConfiguredGraphFactory.close("example") depending on which graph manager is in use. Closing and reopening the graph on all active nodes will prevent cached (stale) references to the graph that has been dropped. ConfiguredGraphFactory graphs that are dropped may need to have their configurations recreated using the <> or <>.

[[limitations]]

== Technical Limitations

There are various limitations and “gotchas” that one should be aware of when using JanusGraph. Some of these limitations are necessary design choices and others are issues that will be rectified as JanusGraph development continues. Finally, the last section provides solutions to common issues.

Design Limitations

These limitations reflect long-term design tradeoffs which are either difficult or impractical to change. These limitations are unlikely to be removed in the near future.

Size Limitation

JanusGraph can store up to a quintillion edges (2^60) and half as many vertices. That limitation is imposed by JanusGraph’s id scheme.

DataType Definitions

When declaring the data type of a property key using dataType(Class), JanusGraph will enforce that all properties for that key have the declared type, unless that type is Object.class. This is an equality type check, meaning that sub-classes will not be allowed. For instance, one cannot declare the data type to be Number.class and use Integer or Long. For efficiency reasons, the type needs to match exactly. Hence, use Object.class as the data type for type flexibility. In all other cases, declare the actual data type to benefit from increased performance and type safety.
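The distinction between an equality check and a subtype check can be seen in plain Java. This standalone illustration is not JanusGraph API; it only shows why a key declared as Number.class would not accept Integer values.

```java
public class TypeCheckDemo {
    public static void main(String[] args) {
        // An equality check rejects subclasses: Integer is a subclass of
        // Number, but the two Class objects are not equal.
        System.out.println(Number.class.equals(Integer.class));           // false
        // A subtype check would have accepted it:
        System.out.println(Number.class.isAssignableFrom(Integer.class)); // true
    }
}
```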

Edge Retrievals are O(log(k))

Retrieving an edge by id, e.g. tx.getEdge(edge.getId()), is not a constant-time operation because it requires an index call on one of its adjacent vertices. Hence, the cost of retrieving an individual edge by its id is O(log(k)) where k is the number of incident edges on the adjacent vertex. JanusGraph will attempt to pick the adjacent vertex with the smaller degree.

This also applies to index retrievals for edges via a standard or external index.

Type Definitions cannot be changed

The definition of an edge label, property key, or vertex label cannot be changed once it has been committed to the graph. However, a type can be renamed and new types can be created at runtime to accommodate an evolving schema.
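For example, an existing label can be renamed through the management API (a sketch; the label name "battled" is illustrative):

[source, gremlin]
----
mgmt = graph.openManagement()
// The definition of "battled" cannot be altered once committed,
// but the label can be renamed
battled = mgmt.getEdgeLabel("battled")
mgmt.changeName(battled, "fought")
mgmt.commit()
----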

Reserved Keywords

JanusGraph reserves certain keywords for internal use; they cannot be used as names for user-defined types, i.e. vertex labels, edge labels, or property keys. The following keywords cannot be used:

  • vertex
  • element
  • edge
  • property
  • label
  • key

For example, if you attempt to create a vertex with the label of property, you will receive an exception regarding protected system types.

Temporary Limitations

These are limitations in JanusGraph’s current implementation. These limitations could reasonably be removed in upcoming versions of JanusGraph.

Limited Mixed Index Support

Mixed indexes only support a subset of the data types that JanusGraph supports. See <> for a current listing. Also, mixed indexes do not currently support property keys with SET or LIST cardinality.

Batch Loading Speed

JanusGraph provides a batch loading mode that can be enabled through the <>. However, this batch mode only facilitates faster loading into the storage backend; it does not use storage-backend-specific batch loading techniques that prepare the data in memory for disk storage. As such, batch loading in JanusGraph is currently slower than batch loading modes provided by single machine databases. <> contains information on speeding up batch loading in JanusGraph.

Another limitation related to batch loading is the failure to load millions of edges into a single vertex at once or in a short period of time. Such supernode loading can fail for some storage backends. This limitation also applies to dense index entries.