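Dgraph's GraphQL+- is based on Facebook's GraphQL. GraphQL was not developed for graph databases, but its graph-like query syntax, schema validation and subgraph-shaped responses make it a good language choice. We have modified the language to better support graph operations, adding and removing features to best fit graph databases. We call this simplified, feature-rich language "GraphQL+-".

GraphQL+- is a work in progress. We are adding more features and we might further simplify existing ones.

Take the Tour - https://tour.dgraph.io

This document is the Dgraph query reference material. It is not a tutorial. It is designed as a reference for users who already know how to write queries in GraphQL+- but need to check syntax, indexing, functions and so on.

Note: If you are new to Dgraph and want to learn how to use Dgraph and GraphQL+-, take the tour - https://tour.dgraph.io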

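Running examples

The examples in this reference use a database of 21 million triples about movies and actors. The example queries run and return results. The queries are executed by an instance of Dgraph running at https://play.dgraph.io/. To run the queries locally or experiment further, see the Getting Started guide, which also shows how to load the datasets used in the examples here.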

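GraphQL+- Fundamentals

A GraphQL+- query finds nodes based on search criteria, matches patterns in the graph and returns a graph as the result.

A query is composed of nested blocks, starting with a query root. The root finds the initial set of nodes against which the following graph matching and filtering is applied.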

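Returning Values

Each query has a name, specified at the query root, and the same name identifies the results.

If an edge is of a value type, the value can be returned by giving the edge name.

Query Example: In the example dataset, edges link movies to directors and actors, and movies have a name, a release date and identifiers for a number of well-known movie databases. This query, named bladerunner and with a root matching a movie name, returns those values for the early 80's sci-fi classic "Blade Runner".

    {
      bladerunner(func: eq(name@en, "Blade Runner")) {
        uid
        name@en
        initial_release_date
        netflix_id
      }
    }

The query first searches the graph, using an index to make the search efficient, for all nodes with a name edge equal to "Blade Runner". For each node found, the query then returns the listed outgoing edges.

Every node has a unique 64-bit identifier. The uid edge in the query above returns that identifier. If the required node is already known, the function uid finds the node.

Query Example: "Blade Runner" movie data found by UID.

    {
      bladerunner(func: uid(0x107b2c)) {
        uid
        name@en
        initial_release_date
        netflix_id
      }
    }

A query can match many nodes and return the values for each.

Query Example: All nodes that have either "Blade" or "Runner" in the name.

    {
      bladerunner(func: anyofterms(name@en, "Blade Runner")) {
        uid
        name@en
        initial_release_date
        netflix_id
      }
    }

Multiple IDs can be specified in a list to the uid function.

Query Example:

    {
      movies(func: uid(0x107b2c, 0x85f961)) {
        uid
        name@en
        initial_release_date
        netflix_id
        ~director.film {
          uid
          name@en
        }
      }
    }

Note: If your predicate has special characters, wrap it in angle brackets when asking for it in a query, for example <first:name>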

Expanding Graph Edges

A query expands edges from node to node by nesting query blocks with { }.

Query Example: The actors and characters played in "Blade Runner". The query first finds the node with the name "Blade Runner", then follows the outgoing starring edges to nodes representing an actor's performance as a character. From there the performance.actor and performance.character edges are expanded to find the actor name and role for every actor in the movie.

    {
      brCharacters(func: eq(name@en, "Blade Runner")) {
        name@en
        initial_release_date
        starring {
          performance.actor {
            name@en  # actor name
          }
          performance.character {
            name@en  # character name
          }
        }
      }
    }

Comments

Anything on a line following a # is a comment.

Applying Filters

The query root finds an initial set of nodes and the query proceeds by returning values and following edges to further nodes; any node reached in the query is found by traversal after the search at the root. The nodes found can be filtered by applying @filter, either after the root or at any edge.

Query Example: Movies by "Blade Runner" director Ridley Scott that were released before the year 2000.

    {
      scott(func: eq(name@en, "Ridley Scott")) {
        name@en
        initial_release_date
        director.film @filter(le(initial_release_date, "2000")) {
          name@en
          initial_release_date
        }
      }
    }

Query Example: Movies with either "Blade" or "Runner" in the title and released before the year 2000.

    {
      bladerunner(func: anyofterms(name@en, "Blade Runner")) @filter(le(initial_release_date, "2000")) {
        uid
        name@en
        initial_release_date
        netflix_id
      }
    }

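Language Support

Note: A @lang directive must be specified in the schema to query or mutate predicates with language tags.

Dgraph supports UTF-8 strings.

In a query, for a string valued edge, the syntax

    edge@lang1:...:langN

specifies the preference order for returned languages, with the following rules:

  • At most one result is returned.
  • The preference list is considered left to right: if no value is found in one language, the next language in the list is tried.
  • If no value exists in any of the listed languages, nothing is returned.
  • A final . means that a value with no language tag is returned or, if no such value exists, a value in some other language is returned.

For example:

  • name => Look for an untagged string; return nothing if no untagged value exists.
  • name@. => Look for an untagged string, then any language.
  • name@en => Look for the en tagged string; return nothing if no en tagged string exists.
  • name@en:. => Look for en, then untagged, then any language.
  • name@en:pl => Look for en, then pl, otherwise nothing.
  • name@en:pl:. => Look for en, then pl, then untagged, then any language.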
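
The lookup rules above can be modelled in a few lines of Python. This is only a sketch of the rules as stated here, not Dgraph's implementation: the function name resolve, the dictionary layout (language tag to string, with the empty tag "" for the untagged value) and the choice of which value counts as "any language" are all assumptions made for illustration.

```python
def resolve(values, spec):
    """Return the string selected by a spec such as 'name@en:pl:.', or None.

    values maps a language tag to a string; the key "" holds the untagged value.
    """
    _, _, langs = spec.partition("@")
    if not langs:
        # Bare predicate name: only the untagged value qualifies.
        return values.get("")
    for lang in langs.split(":"):
        if lang == ".":
            # '.' falls back to the untagged value, then to any language.
            if "" in values:
                return values[""]
            # Assumption: "any language" is modelled as the first tag in sorted order.
            return values[min(values)] if values else None
        if lang in values:
            return values[lang]
    return None


titles = {"hi": "hindi title", "ru": "russian title"}  # hypothetical data
print(resolve(titles, "name@en"))      # None: no en value and no fallback
print(resolve(titles, "name@en:hi"))   # hindi title
print(resolve(titles, "name@en:."))    # hindi title (falls through to "any language")
```

Because the preference list is scanned left to right and returns on the first hit, at most one value is ever produced, which matches the first rule in the list.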

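Note: Language lists are not supported in functions. A function takes a single language, the . notation, or a predicate name with no language tag.

Note: In full-text search functions (alloftext, anyoftext), when no language is specified, the default English tokenizer is used.

Query Example: Films directed by Farhan Akhtar, with names in Russian, Hindi and English. name@ru:hi:en returns the first of those three languages that has a value; names in other languages are not returned.

    {
      q(func: allofterms(name@en, "Farhan Akhtar")) {
        name@hi
        name@en
        director.film {
          name@ru:hi:en
          name@en
          name@hi
          name@ru
        }
      }
    }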

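Functions

Note: Functions can only be applied to indexed predicates.

Functions allow filtering based on properties of nodes or variables. Functions can be applied at the query root or in filters.

For functions on string valued predicates, if no language preference is given, the function is applied to all languages and to strings without a language tag; if a language preference is given, the function is applied only to strings of the given language.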

Term matching

allofterms

Syntax Example: allofterms(predicate, "space-separated term list")

Schema Types: string

Index Required: term

Matches strings that have all specified terms in any order; case insensitive.

Usage at root

Query Example: All nodes whose name contains the terms indiana and jones, returning the English name and the genre in English.

    {
      me(func: allofterms(name@en, "jones indiana")) {
        name@en
        genre {
          name@en
        }
      }
    }

Usage as filter

Query Example: All Steven Spielberg films that contain the words indiana and jones. The filter @filter(has(director.film)) removes nodes named Steven Spielberg that are not the director; the data also contains a character in a film called Steven Spielberg.

    {
      me(func: eq(name@en, "Steven Spielberg")) @filter(has(director.film)) {
        name@en
        director.film @filter(allofterms(name@en, "jones indiana")) {
          name@en
        }
      }
    }

anyofterms

Syntax Example: anyofterms(predicate, "space-separated term list")

Schema Types: string

Index Required: term

Matches strings that have any of the specified terms in any order; case insensitive.

Usage at root

Query Example: All nodes whose name contains either poison or peacock. Many of the returned nodes are movies, but people like Joan Peacock also meet the search terms because, without a cascade directive, the query does not require a genre.

    {
      me(func: anyofterms(name@en, "poison peacock")) {
        name@en
        genre {
          name@en
        }
      }
    }

Usage as filter

Query Example: All Steven Spielberg movies that contain war or spies. The filter @filter(has(director.film)) removes nodes named Steven Spielberg that are not the director; the data also contains a character in a film called Steven Spielberg.

    {
      me(func: eq(name@en, "Steven Spielberg")) @filter(has(director.film)) {
        name@en
        director.film @filter(anyofterms(name@en, "war spies")) {
          name@en
        }
      }
    }

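Regular Expressions

Syntax Examples: regexp(predicate, /regular-expression/) or case insensitive regexp(predicate, /regular-expression/i)

Schema Types: string

Index Required: trigram

Matches strings by regular expression. The regular expression language is that of Go regular expressions.

Query Example: At the root, match nodes whose name starts with Steven Sp, followed by any characters. For each matched uid, match the films whose name contains ryan. Note the difference from allofterms, which would match only the term ryan, whereas the regular expression also matches inside terms, such as bryan.

    {
      directors(func: regexp(name@en, /^Steven Sp.*$/)) {
        name@en
        director.film @filter(regexp(name@en, /ryan/i)) {
          name@en
        }
      }
    }

Technical details

A trigram is a substring of three consecutive runes. For example, Dgraph has the trigrams Dgr, gra, rap, aph.

To keep regular expression matching efficient, Dgraph uses trigram indexing. That is, Dgraph converts the regular expression to a trigram query, uses the trigram index and the trigram query to find possible matches, and applies the full regular expression search only to those possible matches.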
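
The two-phase idea can be sketched in Python as follows. This is a toy model, not Dgraph's code: it uses Python's re module rather than Go's regexp package, and instead of deriving a trigram query from the regular expression it assumes the caller passes a literal (required_literal, a hypothetical parameter) that every match must contain.

```python
import re


def trigrams(text):
    """Return every substring of three consecutive characters."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def regexp_search(strings, pattern, required_literal):
    """Toy two-phase search: trigram prefilter, then the full regular expression."""
    needed = set(trigrams(required_literal))
    # Phase 1: keep only strings whose trigrams cover those of the literal.
    candidates = [s for s in strings if needed <= set(trigrams(s))]
    # Phase 2: run the full regular expression on the candidates only.
    return [s for s in candidates if re.search(pattern, s)]


print(trigrams("Dgraph"))                                    # ['Dgr', 'gra', 'rap', 'aph']
print(regexp_search(["bryan", "Ryan", "ray"], r"ryan", "ryan"))  # ['bryan']
```

A real index would look the trigrams up in a posting list instead of recomputing them per string; the sketch only shows why a pattern needs at least one trigram to be usable.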

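Writing Efficient Regular Expressions and Limitations

Keep the following in mind when designing regular expression queries.

  • At least one trigram must be matched by the regular expression (patterns shorter than three characters are not supported). That is, Dgraph requires regular expressions that can be converted to a trigram query.
  • The number of alternative trigrams matched by the regular expression should be as small as possible ([a-zA-Z][a-zA-Z][0-9] is not a good idea). Many possible matches mean the full regular expression is checked against many strings; if the expression forces more trigrams to match, Dgraph can make better use of the index and check the full regular expression against a smaller set of possible matches.
  • Thus, the regular expression should be as precise as possible. Matching longer strings means more required trigrams, which helps to use the index effectively.
  • If repeat specifications (*, +, ?, {n,m}) are used, the entire regular expression must not match the empty string or any string: for example, * may be used like [Aa]bcd* but not like (abcd)* or (abcd)|((defg)*)
  • Repeat specifications after bracket expressions (e.g. [fgh]{7}, [0-9]+ or [a-z]{3,5}) are often treated as matching any string because they match too many trigrams.
  • If the partial result (for a subset of trigrams) exceeds 1000000 uids during the index scan, the query is stopped to prevent expensive queries.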

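Full Text Search

Syntax Examples: alloftext(predicate, "space-separated text") and anyoftext(predicate, "space-separated text")

Schema Types: string

Index Required: fulltext

Applies full text search with stemming and stop words to find strings matching all or any of the given text.

The following steps are applied during index generation and to process full text search arguments:

  1. Tokenization (according to Unicode word boundaries).
  2. Conversion to lowercase.
  3. Unicode normalization (to Normalization Form KC).
  4. Stemming using a language-specific stemmer (if supported for the language).
  5. Stop word removal (if supported for the language).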
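
The five steps above can be sketched as a small Python pipeline. Everything here is an assumption made for illustration: the regular expression only approximates Unicode word boundaries, toy_stem is a hypothetical stand-in for a real language-specific stemmer, and STOP_WORDS is a tiny made-up list rather than the list Dgraph actually uses.

```python
import re
import unicodedata

STOP_WORDS = {"the", "maybe"}  # hypothetical, tiny stop word list


def toy_stem(token):
    """Hypothetical stand-in for a real stemmer: strips a few English suffixes."""
    for suffix in ("ning", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def fulltext_tokens(text):
    tokens = re.findall(r"\w+", text)                            # 1. tokenize (approximation)
    tokens = [t.lower() for t in tokens]                         # 2. lowercase
    tokens = [unicodedata.normalize("NFKC", t) for t in tokens]  # 3. normalize to NFKC
    tokens = [toy_stem(t) for t in tokens]                       # 4. stem
    return [t for t in tokens if t not in STOP_WORDS]            # 5. remove stop words


def alloftext(value, query):
    """True if every processed query token occurs among the processed value tokens."""
    return set(fulltext_tokens(query)) <= set(fulltext_tokens(value))


print(fulltext_tokens("the man maybe runs"))                 # ['man', 'run']
print(alloftext("The Running Man", "the man maybe runs"))    # True
```

The same pipeline is applied to both the stored string and the search argument, which is why "runs" in the query can match "Running" in the stored value.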

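Dgraph uses bleve for its full text search indexing. See also the bleve language-specific stop word lists.

The following table lists all supported languages, the corresponding country codes, and stemming and stop word filtering support.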

Language Country Code Stemming Stop words
Arabic ar
Armenian hy
Basque eu
Bulgarian bg
Catalan ca
Chinese zh
Czech cs
Danish da
Dutch nl
English en
Finnish fi
French fr
Gaelic ga
Galician gl
German de
Greek el
Hindi hi
Hungarian hu
Indonesian id
Italian it
Japanese ja
Korean ko
Norwegian no
Persian fa
Portuguese pt
Romanian ro
Russian ru
Spanish es
Swedish sv
Turkish tr

Query Example: All names that contain a form of run (run, running, etc.) and man. The stop words "the" and "maybe" are removed.

    {
      movie(func: alloftext(name@en, "the man maybe runs")) {
        name@en
      }
    }

Inequality

equal to

Syntax Examples:

  • eq(predicate, value)
  • eq(val(varName), value)
  • eq(predicate, val(varName))
  • eq(count(predicate), value)
  • eq(predicate, [val1, val2, ..., valN])

Schema Types: int, float, bool, string, dateTime

Index Required: An index is required for the eq(predicate, ...) forms (see the table below). For count(predicate) at the query root, the @count index is required. For variables, the values have been calculated as part of the query, so no index is required.

Type        Index Options
int         int
float       float
bool        bool
string      exact, hash
dateTime    dateTime

Tests whether a predicate or variable equals a value or is found in a list of values.

The boolean constants are true and false, so with eq this becomes, for example, eq(boolPred, true).

Query Example: Movies with exactly thirteen genres.

    {
      me(func: eq(count(genre), 13)) {
        name@en
        genre {
          name@en
        }
      }
    }

Query Example: Directors called Steven who have directed 1, 2 or 3 movies.

    {
      steve as var(func: allofterms(name@en, "Steven")) {
        films as count(director.film)
      }
      stevens(func: uid(steve)) @filter(eq(val(films), [1,2,3])) {
        name@en
        numFilms : val(films)
      }
    }

less than, less than or equal to, greater than and greater than or equal to

Syntax Examples: for inequality IE

  • IE(predicate, value)
  • IE(val(varName), value)
  • IE(predicate, val(varName))
  • IE(count(predicate), value)

With IE replaced by

  • le less than or equal to
  • lt less than
  • ge greater than or equal to
  • gt greater than

Schema Types: int, float, string, dateTime

Index Required: An index is required for the IE(predicate, ...) forms (see the table below). For count(predicate) at the query root, the @count index is required. For variables, the values have been calculated as part of the query, so no index is required.

Type        Index Options
int         int
float       float
string      exact
dateTime    dateTime

Query Example: Ridley Scott movies released before 1980.

    {
      me(func: eq(name@en, "Ridley Scott")) {
        name@en
        director.film @filter(lt(initial_release_date, "1980-01-01")) {
          initial_release_date
          name@en
        }
      }
    }

Query Example: Directors with Steven in their name who have directed more than 100 actors.

    {
      ID as var(func: allofterms(name@en, "Steven")) {
        director.film {
          num_actors as count(starring)
        }
        total as sum(val(num_actors))
      }
      dirs(func: uid(ID)) @filter(gt(val(total), 100)) {
        name@en
        total_actors : val(total)
      }
    }

Query Example: One movie from each genre that has over 30000 movies. Because no order is specified on genres, the order is by UID. The count index records the number of edges out of nodes and makes such queries more efficient.

    {
      genre(func: gt(count(~genre), 30000)) {
        name@en
        ~genre (first:1) {
          name@en
        }
      }
    }

Query Example: Movies directed by Steven Spielberg whose initial_release_date is the same as or later than that of the movie Minority Report.

    {
      var(func: eq(name@en, "Minority Report")) {
        d as initial_release_date
      }
      me(func: eq(name@en, "Steven Spielberg")) {
        name@en
        director.film @filter(ge(initial_release_date, val(d))) {
          initial_release_date
          name@en
        }
      }
    }

uid

Syntax Examples:

  • q(func: uid(<uid>))
  • predicate @filter(uid(<uid1>, ..., <uidn>))
  • predicate @filter(uid(a)) for variable a
  • q(func: uid(a,b)) for variables a and b

Filters nodes at the current query level to only the nodes in the given set of UIDs.

For a query variable a, uid(a) represents the set of UIDs stored in a. For a value variable b, uid(b) represents the UIDs from the UID to value map. With two or more variables, uid(a,b,...) represents the union of all the variables.

uid(<uid>), like an identity function, returns the requested UID even if the node does not have any edges.

Query Example: If the UID of a node is known, values for the node can be read directly. Here, the films of Priyanka Chopra are read by her known UID, 0x878110.

    {
      films(func: uid(0x878110)) {
        name@hi
        actor.film {
          performance.film {
            name@hi
          }
        }
      }
    }

Query Example: The films of Taraji Henson by genre.

    {
      var(func: allofterms(name@en, "Taraji Henson")) {
        actor.film {
          F as performance.film {
            G as genre
          }
        }
      }
      Taraji_films_by_genre(func: uid(G)) {
        genre_name : name@en
        films : ~genre @filter(uid(F)) {
          film_name : name@en
        }
      }
    }

Query Example: Taraji Henson films ordered by number of genres, with the genres of each film listed in order of how many films Taraji has made in each genre.

    {
      var(func: allofterms(name@en, "Taraji Henson")) {
        actor.film {
          F as performance.film {
            G as count(genre)
            genre {
              C as count(~genre @filter(uid(F)))
            }
          }
        }
      }
      Taraji_films_by_genre_count(func: uid(G), orderdesc: val(G)) {
        film_name : name@en
        genres : genre (orderdesc: val(C)) {
          genre_name : name@en
        }
      }
    }

uid_in

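Syntax Examples:

  • q(func: ...) @filter(uid_in(predicate, <uid>))
  • predicate1 @filter(uid_in(predicate2, <uid>))

Schema Types: UID

Index Required: none

While the uid function filters nodes at the current level based on UID, the function uid_in looks ahead along an edge to check that it leads to a particular UID. This can often save an extra query block and avoids returning the edge.

uid_in cannot be used at the root. It accepts one UID constant as its argument (not a variable).

Query Example: The collaborations of Marc Caro and Jean-Pierre Jeunet (UID 0x6777ba). If the UID of Jean-Pierre Jeunet is known, querying this way removes the need for a block extracting his UID into a variable and for the extra edge traversal and filter on ~director.film.

    {
      caro(func: eq(name@en, "Marc Caro")) {
        name@en
        director.film @filter(uid_in(~director.film, 0x6777ba)) {
          name@en
        }
      }
    }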

has

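Syntax Example: has(predicate)

Schema Types: all

Determines whether a node has a particular predicate.

Query Example: The first five directors and all their movies that have a release date recorded. Every director returned has directed at least one film, which is equivalent to gt(count(director.film), 0).

    {
      me(func: has(director.film), first: 5) {
        name@en
        director.film @filter(has(initial_release_date)) {
          initial_release_date
          name@en
        }
      }
    }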

Geolocation

Note: As of now, only indexing of Point, Polygon and MultiPolygon geometry types is supported.

Note that for geo queries, any polygon with holes is replaced with its outer loop, ignoring the holes. Also, as of version 0.7.7, polygon containment checks are approximate.

Mutations

To make use of the geo functions, an index is needed on the predicate.

    loc: geo @index(geo) .

Here is how to add a Point.

    {
      set {
        <_:0xeb1dde9c> <loc> "{'type':'Point','coordinates':[-122.4220186,37.772318]}"^^<geo:geojson> .
        <_:0xeb1dde9c> <name> "Hamon Tower" .
      }
    }

Here is how to associate a Polygon with a node. Adding a MultiPolygon is similar.

    {
      set {
        <_:0xf76c276b> <loc> "{'type':'Polygon','coordinates':[[[-122.409869,37.7785442],[-122.4097444,37.7786443],[-122.4097544,37.7786521],[-122.4096334,37.7787494],[-122.4096233,37.7787416],[-122.4094004,37.7789207],[-122.4095818,37.7790617],[-122.4097883,37.7792189],[-122.4102599,37.7788413],[-122.409869,37.7785442]],[[-122.4097357,37.7787848],[-122.4098499,37.778693],[-122.4099025,37.7787339],[-122.4097882,37.7788257],[-122.4097357,37.7787848]]]}"^^<geo:geojson> .
        <_:0xf76c276b> <name> "Best Western Americana Hotel" .
      }
    }

The above examples have been picked from our SF Tourism dataset.

Query

near

Syntax Example: near(predicate, [long, lat], distance)

Schema Types: geo

Index Required: geo

Matches all entities where the location given by predicate is within distance meters of the geojson coordinate [long, lat].

Query Example: Tourist destinations within 1 kilometer of a point in Golden Gate Park, San Francisco.

    {
      tourist(func: near(loc, [-122.469829, 37.771935], 1000) ) {
        name
      }
    }

within

Syntax Example: within(predicate, [[[long1, lat1], ..., [longN, latN]]])

Schema Types: geo

Index Required: geo

Matches all entities where the location given by predicate lies within the polygon specified by the geojson coordinate array.

Query Example: Tourist destinations within the specified area of Golden Gate Park, San Francisco.

    {
      tourist(func: within(loc, [[[-122.47266769409178, 37.769018558337926 ], [ -122.47266769409178, 37.773699921075135 ], [ -122.4651575088501, 37.773699921075135 ], [ -122.4651575088501, 37.769018558337926 ], [ -122.47266769409178, 37.769018558337926]]] )) {
        name
      }
    }

contains

Syntax Examples: contains(predicate, [long, lat]) or contains(predicate, [[long1, lat1], ..., [longN, latN]])

Schema Types: geo

Index Required: geo

Matches all entities where the polygon describing the location given by predicate contains the geojson coordinate [long, lat] or the given geojson polygon.

Query Example: All entities that contain a point in the flamingo enclosure of San Francisco Zoo.

    {
      tourist(func: contains(loc, [ -122.50326097011566, 37.73353615592843 ] )) {
        name
      }
    }

intersects

Syntax Example: intersects(predicate, [[[long1, lat1], ..., [longN, latN]]])

Schema Types: geo

Index Required: geo

Matches all entities where the polygons describing the location given by predicate intersect the given geojson polygon.

    {
      tourist(func: intersects(loc, [[[-122.503325343132, 37.73345766902749 ], [ -122.503325343132, 37.733903134117966 ], [ -122.50271648168564, 37.733903134117966 ], [ -122.50271648168564, 37.73345766902749 ], [ -122.503325343132, 37.73345766902749]]] )) {
        name
      }
    }

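Connecting Filters

Within @filter, multiple functions can be used with boolean connectives.

AND, OR and NOT

The connectives AND, OR and NOT join filters and can be built into arbitrarily complex filters, such as (NOT A OR B) AND (C AND NOT (D OR E)). Note that NOT binds more tightly than AND, which binds more tightly than OR.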
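
The binding order can be checked with a short truth table. Python's not, and and or happen to have the same relative precedence as the rule stated above, so the sketch below uses them as a stand-in for the filter connectives; it says nothing about how Dgraph parses filters, and the function names are made up for illustration.

```python
from itertools import product


def grouped(a, b, c):
    """NOT a AND b OR c with the grouping spelled out: ((NOT a) AND b) OR c."""
    return ((not a) and b) or c


def ungrouped(a, b, c):
    """The same filter written without parentheses."""
    return not a and b or c


def precedence_agrees():
    """True if both forms agree on all eight truth assignments."""
    return all(grouped(*v) == ungrouped(*v) for v in product([False, True], repeat=3))


print(precedence_agrees())  # True
```

When a filter mixes all three connectives, writing the parentheses explicitly, as in the example (NOT A OR B) AND (C AND NOT (D OR E)), avoids relying on the reader remembering the binding order.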

Query Example: All Steven Spielberg movies that contain either both "indiana" and "jones" or both "jurassic" and "park".

    {
      me(func: eq(name@en, "Steven Spielberg")) @filter(has(director.film)) {
        name@en
        director.film @filter(allofterms(name@en, "jones indiana") OR allofterms(name@en, "jurassic park")) {
          uid
          name@en
        }
      }
    }

Alias

Syntax Examples:

  • aliasName : predicate
  • aliasName : predicate { ... }
  • aliasName : varName as ...
  • aliasName : count(predicate)
  • aliasName : max(val(varName))

An alias provides an alternative name in the results. Predicates, variables and aggregates can be aliased by prefixing them with the alias name and a colon (:). An alias does not have to differ from the original predicate name, but, within a block, an alias must be distinct from the predicate names and other aliases returned in the same block. Aliases can be used to return the same predicate multiple times within a block.

Query Example: Directors whose name matches the term Steven, with their UID, English name, average number of actors per movie, total number of films, and the name of each film in English and French.

    {
      ID as var(func: allofterms(name@en, "Steven")) @filter(has(director.film)) {
        director.film {
          num_actors as count(starring)
        }
        average as avg(val(num_actors))
      }
      films(func: uid(ID)) {
        director_id : uid
        english_name : name@en
        average_actors : val(average)
        num_films : count(director.film)
        films : director.film {
          name : name@en
          english_name : name@en
          french_name : name@fr
        }
      }
    }

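Pagination

Pagination allows returning only a portion of the result set rather than the whole. This is useful for top-k style queries, for reducing the size of the result set for client side processing, and for paged access to results.

Pagination is often used with sorting.

Note: Without a sort order specified, the results are sorted by uid, which is assigned randomly. So the ordering, while deterministic, might not be what you expected.

First

Syntax Examples:

  • q(func: ..., first: N)
  • predicate (first: N) { ... }
  • predicate @filter(...) (first: N) { ... }

For positive N, first: N retrieves the first N results, by sorted or UID order.

For negative N, first: N retrieves the last N results, by sorted or UID order. Currently, a negative N is only supported when no sort order is applied. To achieve the effect of a negative N with a sort, reverse the order of the sort and use a positive N.

Query Example: The last two films, by UID order, directed by Steven Spielberg and the first three genres of those movies, sorted alphabetically by English name.

    {
      me(func: allofterms(name@en, "Steven Spielberg")) {
        director.film (first: -2) {
          name@en
          initial_release_date
          genre (orderasc: name@en) (first: 3) {
            name@en
          }
        }
      }
    }

Query Example: Among all directors with Steven in their name, the three who have directed the most actors.

    {
      ID as var(func: allofterms(name@en, "Steven")) @filter(has(director.film)) {
        director.film {
          stars as count(starring)
        }
        totalActors as sum(val(stars))
      }
      mostStars(func: uid(ID), orderdesc: val(totalActors), first: 3) {
        name@en
        stars : val(totalActors)
        director.film {
          name@en
        }
      }
    }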

Offset

Syntax Examples:

  • q(func: ..., offset: N)
  • predicate (offset: N) { ... }
  • predicate (first: M, offset: N) { ... }
  • predicate @filter(...) (offset: N) { ... }

With offset: N, the first N results are not returned. Used in combination with first, first: M, offset: N skips over the first N results and returns the following M.
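
The effect of first and offset on an already ordered result list can be modelled with list slicing. This is a sketch of the behaviour described in the text, not of Dgraph internals; the function name paginate is made up, and two cases the text does not cover (first: 0, and a negative first combined with offset) are handled here by assumption.

```python
def paginate(results, first=None, offset=0):
    """Model of first/offset over an already ordered result list."""
    rest = results[offset:]      # offset: N skips the first N results
    if first is None:
        return rest
    if first >= 0:
        # first: N keeps the first N (first == 0 is not covered by the text;
        # this sketch returns an empty list for it).
        return rest[:first]
    # Negative first keeps the last |N| results; combining it with offset
    # is an assumption of this sketch.
    return rest[first:]


films = list("abcdefghijkl")                 # twelve hypothetical results, in order
print(paginate(films, first=6, offset=4))    # ['e', 'f', 'g', 'h', 'i', 'j']
print(paginate(films, first=-2))             # ['k', 'l']
```

Because offset counts positions rather than identities, a page can shift if results are added or removed between requests.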

Query Example: Order Hark Tsui's films by English title, skip over the first 4 and return the following 6.

    {
      me(func: allofterms(name@en, "Hark Tsui")) {
        name@zh
        name@en
        director.film (orderasc: name@en) (first:6, offset:4) {
          genre {
            name@en
          }
          name@zh
          name@en
          initial_release_date
        }
      }
    }

After

Syntax Examples:

  • q(func: ..., after: UID)
  • predicate (first: N, after: UID) { ... }
  • predicate @filter(...) (first: N, after: UID) { ... }

Another way to get results after skipping over some results is to use the default UID ordering and skip directly past a node specified by UID. For example, a first query could be of the form predicate (after: 0x0, first: N), or just predicate (first: N), with subsequent queries of the form predicate(after: <uid of last entity in last result>, first: N).
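
The paging loop this describes can be sketched in Python. The functions page_after and all_pages are hypothetical helpers that model UID-ordered results as a sorted list of integers; they are not part of any Dgraph client.

```python
def page_after(uids, first, after=0x0):
    """Return up to `first` UIDs greater than `after`, in ascending UID order."""
    return [u for u in sorted(uids) if u > after][:first]


def all_pages(uids, first):
    """Fetch every page by passing the last UID of each page as the next `after`."""
    pages, last = [], 0x0
    while True:
        page = page_after(uids, first, last)
        if not page:
            return pages
        pages.append(page)
        last = page[-1]


uids = [0x30, 0x10, 0x50, 0x20, 0x40]        # hypothetical UIDs
print(all_pages(uids, first=2))              # [[16, 32], [48, 64], [80]]
```

Unlike offset, the cursor is a UID rather than a position, so a page does not shift when earlier results are removed between requests.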

Query Example: The first five of Baz Luhrmann's films, sorted by UID order.

    {
      me(func: allofterms(name@en, "Baz Luhrmann")) {
        name@en
        director.film (first:5) {
          uid
          name@en
        }
      }
    }

The fifth movie is the Australian movie classic Strictly Ballroom. It has UID 0x8116e4. The results after Strictly Ballroom can now be obtained with after.

    {
      me(func: allofterms(name@en, "Baz Luhrmann")) {
        name@en
        director.film (first:5, after: 0x264ce8) {
          uid
          name@en
        }
      }
    }

Count

Syntax Examples:

  • count(predicate)
  • count(uid)

The form count(predicate) counts how many predicate edges lead out of a node.

The form count(uid) counts the number of UIDs matched in the enclosing block.

Query Example: The number of films acted in by each actor with Orlando in their name.

    {
      me(func: allofterms(name@en, "Orlando")) @filter(has(actor.film)) {
        name@en
        count(actor.film)
      }
    }

Count can be used at the root and aliased.

Query Example: The number of directors who have directed more than five films. When used at the query root, count requires the count index.

    {
      directors(func: gt(count(director.film), 5)) {
        totalDirectors : count(uid)
      }
    }

Count can be assigned to a value variable.

Query Example: The actors of Ang Lee's "Eat Drink Man Woman" ordered by the number of movies they have acted in.

    {
      var(func: allofterms(name@en, "eat drink man woman")) {
        starring {
          actors as performance.actor {
            totalRoles as count(actor.film)
          }
        }
      }
      edmw(func: uid(actors), orderdesc: val(totalRoles)) {
        name@en
        name@zh
        totalRoles : val(totalRoles)
      }
    }

Sorting

Syntax Examples:

  • q(func: ..., orderasc: predicate)
  • q(func: ..., orderdesc: val(varName))
  • predicate (orderdesc: predicate) { ... }
  • predicate @filter(...) (orderasc: N) { ... }
  • q(func: ..., orderasc: predicate1, orderdesc: predicate2)

Sortable Types: int, float, String, dateTime, id, default

Results can be sorted in ascending (orderasc) or descending (orderdesc) order by a predicate or a variable.

For sorting on predicates with sortable indices, Dgraph sorts on the values and with the index in parallel and returns whichever result is computed first.

Sorted queries retrieve up to 1000 results by default. This can be changed with first.

Query Example: French director Jean-Pierre Jeunet's movies sorted by release date.

    {
      me(func: allofterms(name@en, "Jean-Pierre Jeunet")) {
        name@fr
        director.film(orderasc: initial_release_date) {
          name@fr
          name@en
          initial_release_date
        }
      }
    }

Sorting can be performed at the root and on value variables.

Query Example: All genres sorted alphabetically and, for each genre, the five movies that belong to the most genres.

    {
      genres as var(func: has(~genre)) {
        ~genre {
          numGenres as count(genre)
        }
      }
      genres(func: uid(genres), orderasc: name@en) {
        name@en
        ~genre (orderdesc: val(numGenres), first: 5) {
          name@en
          genres : val(numGenres)
        }
      }
    }

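Sorting can also be performed by multiple predicates, as shown below. If the values are equal for the first predicate, the results are sorted by the second predicate, and so on.

Query Example: Find all nodes of type Person, sort them by first_name and, among those with the same first_name, sort them by last_name in descending order.

    {
      me(func: eq(type, "Person"), orderasc: first_name, orderdesc: last_name) {
        first_name
        last_name
      }
    }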
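
The ordering that a two-key sort produces can be reproduced in Python on a few made-up records. This is only a model of the described result order (ascending first_name, then descending last_name among ties); the data and the function name sort_people are hypothetical.

```python
def sort_people(nodes):
    """Order by first_name ascending, then by last_name descending among ties."""
    # Python's sort is stable, so sort by the secondary key first,
    # then by the primary key.
    by_last_desc = sorted(nodes, key=lambda n: n["last_name"], reverse=True)
    return sorted(by_last_desc, key=lambda n: n["first_name"])


people = [
    {"first_name": "Ann", "last_name": "Lee"},
    {"first_name": "Bob", "last_name": "Ray"},
    {"first_name": "Ann", "last_name": "Zed"},
    {"first_name": "Bob", "last_name": "Kay"},
]
for person in sort_people(people):
    print(person["first_name"], person["last_name"])
# Ann Zed
# Ann Lee
# Bob Ray
# Bob Kay
```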

Multiple Query Blocks

Multiple query blocks are allowed inside a single query. The result is all blocks, each under its corresponding block name.

Multiple query blocks are executed in parallel.

The blocks need not be related in any way.

Query Example: All of Angelina Jolie's films, with their genres, and Peter Jackson's films since 2008.

    {
      AngelinaInfo(func: allofterms(name@en, "angelina jolie")) {
        name@en
        actor.film {
          performance.film {
            genre {
              name@en
            }
          }
        }
      }
      DirectorInfo(func: eq(name@en, "Peter Jackson")) {
        name@en
        director.film @filter(ge(initial_release_date, "2008")) {
          Release_date: initial_release_date
          Name: name@en
        }
      }
    }

If the queries overlap in their answers, the result sets are still independent.

Query Example: The movies Mackenzie Crook has acted in and the movies Jack Davenport has acted in. The result sets overlap because both have acted in the Pirates of the Caribbean movies, but the results are independent and both contain the full answer sets.

    {
      Mackenzie(func: allofterms(name@en, "Mackenzie Crook")) {
        name@en
        actor.film {
          performance.film {
            uid
            name@en
          }
          performance.character {
            name@en
          }
        }
      }
      Jack(func: allofterms(name@en, "Jack Davenport")) {
        name@en
        actor.film {
          performance.film {
            uid
            name@en
          }
          performance.character {
            name@en
          }
        }
      }
    }

Var Blocks

Var blocks start with the keyword var and are not returned in the query results.

Query Example: Angelina Jolie's movies ordered by genre.

    {
      var(func: allofterms(name@en, "angelina jolie")) {
        name@en
        actor.film {
          A AS performance.film {
            B AS genre
          }
        }
      }
      films(func: uid(B), orderasc: name@en) {
        name@en
        ~genre @filter(uid(A)) {
          name@en
        }
      }
    }

Query Variables

Syntax Examples:

  • varName as q(func: ...) { ... }
  • varName as var(func: ...) { ... }
  • varName as predicate { ... }
  • varName as predicate @filter(...) { ... }

Types : uid

Nodes (UIDs) matched at one place in a query can be stored in a variable and used elsewhere. Query variables can be used in other query blocks or in a child node of the defining block.

Query variables do not affect the semantics of the query at the point of definition. Query variables are evaluated to all nodes matched by the defining block.

In general, query blocks are executed in parallel, but variables impose an evaluation order on some blocks. Cycles induced by variable dependence are not permitted.

If a variable is defined, it must be used elsewhere in the query.

A query variable is used by extracting the UIDs in it with uid(var-name).

The syntax func: uid(A,B) or @filter(uid(A,B)) means the union of the UIDs for variables A and B.

Query Example: The movies of Angelina Jolie and Brad Pitt where both have acted in movies of the same genre. Note that B and D match all genres for all movies, not genres per movie.

    {
      var(func: allofterms(name@en, "angelina jolie")) {
        actor.film {
          A AS performance.film {  # All films acted in by Angelina Jolie
            B As genre  # Genres of all the films acted in by Angelina Jolie
          }
        }
      }
      var(func: allofterms(name@en, "brad pitt")) {
        actor.film {
          C AS performance.film {  # All films acted in by Brad Pitt
            D as genre  # Genres of all the films acted in by Brad Pitt
          }
        }
      }
      films(func: uid(D)) @filter(uid(B)) {  # Genres from both Angelina and Brad
        name@en
        ~genre @filter(uid(A, C)) {  # Movies in either A or C.
          name@en
        }
      }
    }

Value Variables

Syntax Examples:

  • varName as scalarPredicate
  • varName as count(predicate)
  • varName as avg(...)
  • varName as math(...)

Types : int, float, String, dateTime, id, default, geo, bool

Value variables store scalar values. Value variables are a map from the UIDs of the enclosing block to the corresponding values.

It therefore only makes sense to use the values from a value variable in a context that matches the same UIDs - if used in a block matching different UIDs the value variable is undefined.

It is an error to define a value variable but not use it elsewhere in the query.

Value variables are used by extracting the values with val(var-name), or by extracting the UIDs with uid(var-name).

Facet values can be stored in value variables.

Query Example: The number of movie roles played by the actors of the 80's classic "The Princess Bride". Query variable pbActors matches the UIDs of all actors from the movie. Value variable roles is thus a map from actor UID to number of roles. Value variable roles can be used in the totalRoles query block because that query block also matches the pbActors UIDs, so the actor to number of roles map is available.

    {
      var(func: allofterms(name@en, "The Princess Bride")) {
        starring {
          pbActors as performance.actor {
            roles as count(actor.film)
          }
        }
      }
      totalRoles(func: uid(pbActors), orderasc: val(roles)) {
        name@en
        numRoles : val(roles)
      }
    }

Value variables can be used in place of UID variables by extracting the UID list from the map.

Query Example: The same query as the previous example, but using value variable roles for matching UIDs in the totalRoles query block.

    {
      var(func: allofterms(name@en, "The Princess Bride")) {
        starring {
          performance.actor {
            roles as count(actor.film)
          }
        }
      }
      totalRoles(func: uid(roles), orderasc: val(roles)) {
        name@en
        numRoles : val(roles)
      }
    }

Variable Propagation

Like query variables, value variables can be used in other query blocks and in blocks nested within the defining block. When used in a block nested within the block that defines the variable, the value is computed as a sum of the variable for parent nodes along all paths to the point of use. This is called variable propagation.

For example:

    {
      q(func: uid(0x01)) {
        myscore as math(1)  # A
        friends {  # B
          friends {  # C
            ...myscore...
          }
        }
      }
    }

At line A, a value variable myscore is defined as mapping node with UID 0x01 to value 1. At B, the value for each friend is still 1: there is only one path to each friend. Traversing the friend edge twice reaches the friends of friends. The variable myscore gets propagated such that each friend of friend will receive the sum of its parents values: if a friend of a friend is reachable from only one friend, the value is still 1, if they are reachable from two friends, the value is two and so on. That is, the value of myscore for each friend of friends inside the block marked C will be the number of paths to them.

The value that a node receives for a propagated variable is the sum of the values of all its parent nodes.
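
The summing rule can be traced with a small Python model. The adjacency map and the function name propagate are made up for illustration; the sketch only mirrors the rule stated above (a node receives the sum of its parents' values) and is not Dgraph's implementation.

```python
def propagate(start_values, edges, hops):
    """Propagate a value variable along `edges` for `hops` levels.

    start_values: {uid: value} at the defining block.
    edges: {uid: [child uid, ...]} for the traversed predicate.
    """
    current = dict(start_values)
    for _ in range(hops):
        nxt = {}
        for parent, value in current.items():
            for child in edges.get(parent, []):
                # A node receives the sum of the values of all its parents.
                nxt[child] = nxt.get(child, 0) + value
        current = nxt
    return current


# Hypothetical friend graph: 1 -> 2, 3; 2 -> 4; 3 -> 4, 5
friends = {1: [2, 3], 2: [4], 3: [4, 5]}
print(propagate({1: 1}, friends, hops=1))   # {2: 1, 3: 1}
print(propagate({1: 1}, friends, hops=2))   # {4: 2, 5: 1}
```

Node 4 is reachable through two friends, so it ends with the value 2, the number of paths to it; node 5 is reachable through one, so it keeps the value 1.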

This propagation is useful, for example, in normalizing a sum across users, finding the number of paths between nodes and accumulating a sum through a graph.

Query Example: For each Harry Potter movie, the number of roles played by actor Warwick Davis.

    {
      num_roles(func: eq(name@en, "Warwick Davis")) @cascade @normalize {
        paths as math(1)  # records number of paths to each character
        actor : name@en
        actor.film {
          performance.film @filter(allofterms(name@en, "Harry Potter")) {
            film_name : name@en
            characters : math(paths)  # how many paths (i.e. characters) reach this film
          }
        }
      }
    }

Query Example: Each actor who has been in a Peter Jackson movie and the fraction of Peter Jackson movies they have appeared in.

    {
      movie_fraction(func: eq(name@en, "Peter Jackson")) @normalize {
        paths as math(1)
        total_films : num_films as count(director.film)
        director : name@en
        director.film {
          starring {
            performance.actor {
              fraction : math(paths / (num_films/paths))
              actor : name@en
            }
          }
        }
      }
    }

More examples can be found in two Dgraph blog posts about using variable propagation for recommendation engines (post 1, post 2).

Aggregation

Syntax Example: AG(val(varName))

For AG replaced with

  • min : select the minimum value in the value variable varName
  • max : select the maximum value
  • sum : sum all values in value variable varName
  • avg : calculate the average of values in varName

Schema Types:

Aggregation    Schema Types
min / max      int, float, string, dateTime, default
sum / avg      int, float

Aggregation can only be applied to value variables. An index is not required (the values have already been found and stored in the value variable mapping).

An aggregation is applied at the query block enclosing the variable definition. As opposed to query variables and value variables, which are global, aggregation is computed locally. For example:

    A as predicateA {
      ...
      B as predicateB {
        x as ...some value...
      }
      min(val(x))
    }

Here, A and B are the lists of all UIDs that match these blocks. Value variable x is a mapping from UIDs in B to values. The aggregation min(val(x)), however, is computed for each UID in A. That is, it has a semantics of: for each UID in A, take the slice of x that corresponds to A's outgoing predicateB edges and compute the aggregation for those values.
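
The "slice of x per UID in A" semantics can be modelled in Python. The function name aggregate_min and the sample data are hypothetical, and leaving out A nodes that have no values to aggregate is an assumption of this sketch rather than documented behaviour.

```python
def aggregate_min(a_uids, predicate_b, x):
    """For each UID in A, take min over x restricted to its predicateB children.

    a_uids: UIDs matched by block A.
    predicate_b: {a uid: [b uid, ...]} outgoing predicateB edges.
    x: {b uid: value}, the value variable defined inside block B.
    """
    result = {}
    for a in a_uids:
        values = [x[b] for b in predicate_b.get(a, []) if b in x]
        if values:  # assumption: nodes with nothing to aggregate are left out
            result[a] = min(values)
    return result


predicate_b = {1: [10, 11], 2: [12], 3: []}
x = {10: 5, 11: 3, 12: 7}
print(aggregate_min([1, 2, 3], predicate_b, x))   # {1: 3, 2: 7}
```

The result is itself a UID to value map keyed by A, which is why an aggregation can be assigned to a value variable and aggregated again one level up.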

Aggregations can themselves be assigned to value variables, making a UID to aggregation map.

Min

Usage at Root

Query Example: Get the min initial release date for any Harry Potter movie.

The release date is assigned to a variable, then it is aggregated and fetched in an empty block.

    {
      var(func: allofterms(name@en, "Harry Potter")) {
        d as initial_release_date
      }
      me() {
        min(val(d))
      }
    }

Usage at other levels.

Query Example: Directors called Steven and the date of release of their first movie, in ascending order of first movie.

    {
      stevens as var(func: allofterms(name@en, "steven")) {
        director.film {
          ird as initial_release_date
          # ird is a value variable mapping a film UID to its release date
        }
        minIRD as min(val(ird))
        # minIRD is a value variable mapping a director UID to their first release date
      }
      byIRD(func: uid(stevens), orderasc: val(minIRD)) {
        name@en
        firstRelease: val(minIRD)
      }
    }

Max

Usage at Root

Query Example: Get the max initial release date for any Harry Potter movie.

The release date is assigned to a variable, then it is aggregated and fetched in an empty block.

    {
      var(func: allofterms(name@en, "Harry Potter")) {
        d as initial_release_date
      }
      me() {
        max(val(d))
      }
    }

Usage at other levels.

Query Example: Quentin Tarantino’s movies and date of release of the most recent movie.

    {
      director(func: allofterms(name@en, "Quentin Tarantino")) {
        director.film {
          name@en
          x as initial_release_date
        }
        max(val(x))
      }
    }

Sum and Avg

Usage at Root

Query Example: Get the sum and average of the number of movies directed by people who have Steven or Tom in their name.

    {
      var(func: anyofterms(name@en, "Steven Tom")) {
        a as count(director.film)
      }
      me() {
        avg(val(a))
        sum(val(a))
      }
    }

Usage at other levels.

Query Example: Steven Spielberg’s movies, with the number of recorded genres per movie, and the total number of genres and average genres per movie.

    {
      director(func: eq(name@en, "Steven Spielberg")) {
        name@en
        director.film {
          name@en
          numGenres : g as count(genre)
        }
        totalGenres : sum(val(g))
        genresPerMovie : avg(val(g))
      }
    }

Aggregating Aggregates

Aggregations can be assigned to value variables, and so these variables can in turn be aggregated.

Query Example: For each actor in a Peter Jackson film, find the number of roles played in any movie. Sum these to find the total number of roles ever played by all actors in the movie. Then sum the lot to find the total number of roles ever played by actors who have appeared in Peter Jackson movies. Note that this demonstrates how to aggregate aggregates; the answer in this case isn’t quite precise though, because actors that have appeared in multiple Peter Jackson movies are counted more than once.

    {
      PJ as var(func:allofterms(name@en, "Peter Jackson")) {
        director.film {
          starring { # starring an actor
            performance.actor {
              movies as count(actor.film)
              # number of roles for this actor
            }
            perf_total as sum(val(movies))
          }
          movie_total as sum(val(perf_total))
          # total roles for all actors in this movie
        }
        gt as sum(val(movie_total))
      }
      PJmovies(func: uid(PJ)) {
        name@en
        director.film (orderdesc: val(movie_total), first: 5) {
          name@en
          totalRoles : val(movie_total)
        }
        grandTotal : val(gt)
      }
    }
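The double counting mentioned in the example can be seen outside Dgraph. The following Python sketch mirrors the nested sums of the query on a tiny made-up dataset; the film names, actor names and role counts are hypothetical and do not come from the movie dataset.

```python
# Hypothetical data: number of roles each actor has ever played.
roles_per_actor = {"actor_a": 10, "actor_b": 4, "actor_c": 7}

# Hypothetical casts of two films by one director; actor_a appears in both.
casts = {
    "film_1": ["actor_a", "actor_b"],
    "film_2": ["actor_a", "actor_c"],
}

# Like movie_total: per film, the sum of the role counts of its actors.
movie_total = {
    film: sum(roles_per_actor[actor] for actor in actors)
    for film, actors in casts.items()
}

# Like gt: the sum of the per-film totals, so actor_a contributes twice.
grand_total = sum(movie_total.values())

# Counting every distinct actor once gives a smaller number.
distinct_actors = {actor for actors in casts.values() for actor in actors}
distinct_total = sum(roles_per_actor[actor] for actor in distinct_actors)

print(movie_total)     # {'film_1': 14, 'film_2': 17}
print(grand_total)     # 31
print(distinct_total)  # 21
```

The gap between 31 and 21 is exactly the role count of the actor who appears in both films.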

Math on value variables

Value variables can be combined using mathematical functions. For example, this could be used to compute a score which is then used to order results or perform other operations, such as might be needed when building newsfeeds, simple recommendation systems and the like.

Math statements must be enclosed within math( <exp> ) and must be stored to a value variable.

The supported operators are as follows:

| Operators | Types accepted | What it does |
|---|---|---|
| + - * / % | int, float | performs the corresponding operation |
| min max | All types except geo, bool (binary functions) | selects the min/max value among the two |
| < > <= >= == != | All types except geo, bool | Returns true or false based on the values |
| floor ceil ln exp sqrt | int, float (unary function) | performs the corresponding operation |
| since | dateTime | Returns the number of seconds in float from the time specified |
| pow(a, b) | int, float | Returns a to the power b |
| logbase(a,b) | int, float | Returns log(a) to the base b |
| cond(a, b, c) | first operand must be a boolean | selects b if a is true else c |

Query Example: Form a score for each of Steven Spielberg’s movies as the sum of number of actors, number of genres and number of countries. List the top five such movies in order of decreasing score.

    {
      var(func:allofterms(name@en, "steven spielberg")) {
        films as director.film {
          p as count(starring)
          q as count(genre)
          r as count(country)
          score as math(p + q + r)
        }
      }
      TopMovies(func: uid(films), orderdesc: val(score), first: 5){
        name@en
        val(score)
      }
    }

Value variables and aggregations of them can be used in filters.

Query Example: Calculate a score for each Steven Spielberg movie with a condition on release date to penalize movies that are more than 10 years old, filtering on the resulting score.

    {
      var(func:allofterms(name@en, "steven spielberg")) {
        films as director.film {
          p as count(starring)
          q as count(genre)
          date as initial_release_date
          years as math(since(date)/(365*24*60*60))
          score as math(cond(years > 10, 0, ln(p)+q-ln(years)))
        }
      }
      TopMovies(func: uid(films), orderdesc: val(score)) @filter(gt(val(score), 2)){
        name@en
        val(score)
        val(date)
      }
    }
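To make the arithmetic of the score expression concrete, here is a minimal Python sketch of the same formula. The function name and the sample counts are hypothetical, and the sketch assumes at least one actor and a positive age so that both logarithms are defined.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def score(num_actors, num_genres, seconds_since_release):
    """Mirror math(cond(years > 10, 0, ln(p) + q - ln(years)))."""
    years = seconds_since_release / SECONDS_PER_YEAR
    if years > 10:
        # Movies older than 10 years are penalized with a score of 0.
        return 0.0
    return math.log(num_actors) + num_genres - math.log(years)

# Hypothetical movies: 20 actors and 3 genres, released 2 and 25 years ago.
recent = score(20, 3, 2 * SECONDS_PER_YEAR)
old = score(20, 3, 25 * SECONDS_PER_YEAR)

# Like @filter(gt(val(score), 2)): only scores above 2 are kept.
passes_filter = [s for s in (recent, old) if s > 2]

print(round(recent, 3), old, len(passes_filter))
```

With these sample values the recent movie scores ln(20) + 3 - ln(2), which is ln(10) + 3, and the older movie is zeroed out and dropped by the filter.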

Values calculated with math operations are stored to value variables and so can be aggregated.

Query Example: Compute a score for each Steven Spielberg movie and then aggregate the score.

    {
      steven as var(func:eq(name@en, "Steven Spielberg")) @filter(has(director.film)) {
        director.film {
          p as count(starring)
          q as count(genre)
          r as count(country)
          score as math(p + q + r)
        }
        directorScore as sum(val(score))
      }
      score(func: uid(steven)){
        name@en
        val(directorScore)
      }
    }

GroupBy

Syntax Examples:

  • q(func: ...) @groupby(predicate) { min(...) }
  • predicate @groupby(pred) { count(uid) }

A groupby query aggregates query results given a set of properties on which to group elements. For example, a query containing the block friend @groupby(age) { count(uid) }, finds all nodes reachable along the friend edge, partitions these into groups based on age, then counts how many nodes are in each group. The returned result is the grouped edges and the aggregations.

Inside a groupby block, only aggregations are allowed and count may only be applied to uid.

If the groupby is applied to a uid predicate, the resulting aggregations can be saved in a variable (mapping the grouped UIDs to aggregate values) and used elsewhere in the query to extract information other than the grouped or aggregated edges.

Query Example: For Steven Spielberg movies, count the number of movies in each genre and for each of those genres return the genre name and the count. The name can’t be extracted in the groupby because it is not an aggregate, but uid(a) can be used to extract the UIDs from the UID to value map and thus organize the byGenre query by genre UID.

    {
      var(func:allofterms(name@en, "steven spielberg")) {
        director.film @groupby(genre) {
          a as count(uid)
          # a is a genre UID to count value variable
        }
      }
      byGenre(func: uid(a), orderdesc: val(a)) {
        name@en
        total_movies : val(a)
      }
    }
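The two-step pattern of this query (group and count first, then look up names by UID) can be sketched in Python as follows. The UIDs and genre names below are hypothetical placeholders, and the dictionary stands in for the UID-to-value map held in the variable a.

```python
from collections import Counter

# Hypothetical films, each listing the UIDs of its genres.
films = {
    "0xf1": ["0xa1", "0xa2"],
    "0xf2": ["0xa1"],
    "0xf3": ["0xa2", "0xa3"],
    "0xf4": ["0xa1"],
}
genre_names = {"0xa1": "Drama", "0xa2": "Adventure", "0xa3": "War"}

# Like @groupby(genre) { a as count(uid) }: genre UID -> number of films.
a = Counter(genre for genres in films.values() for genre in genres)

# Like byGenre(func: uid(a), orderdesc: val(a)): fetch names by UID,
# ordered by descending count.
by_genre = [
    {"name": genre_names[uid], "total_movies": count}
    for uid, count in sorted(a.items(), key=lambda item: item[1], reverse=True)
]

for row in by_genre:
    print(row)
```

The grouping step only knows UIDs and counts; the name appears only in the second step, which is why the query needs a separate byGenre block.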

Query Example: Actors from Tim Burton movies and how many roles they have played in Tim Burton movies.

    {
      var(func:allofterms(name@en, "Tim Burton")) {
        director.film {
          starring @groupby(performance.actor) {
            a as count(uid)
            # a is an actor UID to count value variable
          }
        }
      }
      byActor(func: uid(a), orderdesc: val(a)) {
        name@en
        val(a)
      }
    }

Expand Predicates

The keyword _predicate_ retrieves all predicates of the nodes at the level where it is used.

Query Example: All predicates from actor Geoffrey Rush.

    {
      director(func: eq(name@en, "Geoffrey Rush")) {
        _predicate_
      }
    }

The number of predicates from a node can be counted and aliased.

Query Example: All predicates from actor Geoffrey Rush and the count of such predicates.

    {
      director(func: eq(name@en, "Geoffrey Rush")) {
        num_predicates: count(_predicate_)
        my_predicates: _predicate_
      }
    }

Predicates can be stored in a variable and passed to expand() to expand all the predicates in the variable.

If _all_ is passed as an argument to expand(), all the predicates at that level are retrieved. More levels can be specified in a nested fashion under expand(). If _forward_ is passed as an argument to expand(), all predicates at that level (minus any reverse predicates) are retrieved. If _reverse_ is passed as an argument to expand(), only the reverse predicates are retrieved.

Query Example: Predicates saved to a variable and queried with expand().

    {
      var(func: eq(name@en, "Lost in Translation")) {
        pred as _predicate_
        # expand(_all_) { expand(_all_)}
      }
      director(func: eq(name@en, "Lost in Translation")) {
        name@.
        expand(val(pred)) {
          expand(_all_)
        }
      }
    }

_predicate_ returns string valued predicates as a name without language tag. If the predicate has no string without a language tag, expand() won’t expand it (see language preference). For example, above name generally doesn’t have strings without tags in the dataset, so name@. is required.

Cascade Directive

With the @cascade directive, nodes that don’t have all predicates specified in the query are removed. This can be useful in cases where some filter was applied or if nodes might not have all listed predicates.

Query Example: Harry Potter movies, with each actor and characters played. With @cascade, any character not played by an actor called Warwick is removed, as is any Harry Potter movie without any actors called Warwick. Without @cascade, every character is returned, but only those played by actors called Warwick also have the actor name.

    {
      HP(func: allofterms(name@en, "Harry Potter")) @cascade {
        name@en
        starring{
          performance.character {
            name@en
          }
          performance.actor @filter(allofterms(name@en, "Warwick")){
            name@en
          }
        }
      }
    }
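The pruning behaviour described for @cascade can be illustrated with a small Python sketch. This is only a model of the rule "remove nodes that lack any requested predicate", applied to already-filtered results; the function, the data and the plain name key (standing in for name@en) are hypothetical.

```python
def cascade(node, shape):
    """Return the node restricted to shape, or None if a predicate is missing."""
    kept = {}
    for pred, sub_shape in shape.items():
        if pred not in node:
            return None
        if sub_shape is None:  # scalar predicate
            kept[pred] = node[pred]
            continue
        children = [cascade(child, sub_shape) for child in node[pred]]
        children = [child for child in children if child is not None]
        if not children:  # every child was removed, so the edge is missing too
            return None
        kept[pred] = children
    return kept

# Query shape: None marks a scalar predicate, a dict marks a nested block.
shape = {
    "name": None,
    "starring": {
        "performance.character": {"name": None},
        "performance.actor": {"name": None},
    },
}

# Hypothetical results after the actor filter has already been applied.
movies = [
    {"name": "Movie A", "starring": [
        {"performance.character": [{"name": "Character 1"}],
         "performance.actor": [{"name": "Warwick Example"}]},
        {"performance.character": [{"name": "Character 2"}]},  # actor filtered out
    ]},
    {"name": "Movie B", "starring": [
        {"performance.character": [{"name": "Character 3"}]},  # actor filtered out
    ]},
]

result = [m for m in (cascade(movie, shape) for movie in movies) if m is not None]
print(result)
```

Movie B disappears entirely because none of its performances keeps an actor, while Movie A keeps only the performance whose actor survived the filter.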

Normalize directive

With the @normalize directive, only aliased predicates are returned and the result is flattened to remove nesting.

Query Example: Film name, country and first two actors (by UID order) of every Steven Spielberg movie. The result is flattened by @normalize, and initial_release_date is omitted because it has no alias.

    {
      director(func:allofterms(name@en, "steven spielberg")) @normalize {
        director: name@en
        director.film {
          film: name@en
          initial_release_date
          starring(first: 2) {
            performance.actor {
              actor: name@en
            }
            performance.character {
              character: name@en
            }
          }
          country {
            country: name@en
          }
        }
      }
    }

Ignorereflex directive

The @ignorereflex directive forces the removal of child nodes that are reachable from themselves as a parent, through any path in the query result.

Query Example: All the coactors of Rutger Hauer. Without @ignorereflex, the result would also include Rutger Hauer for every movie.

    {
      coactors(func: eq(name@en, "Rutger Hauer")) @ignorereflex {
        actor.film {
          performance.film {
            starring {
              performance.actor {
                name@en
              }
            }
          }
        }
      }
    }

Debug

For the purposes of debugging, you can attach a query parameter debug=true to a query. Attaching this parameter lets you retrieve the uid attribute for all the entities along with the server_latency information.

Query with debug as a query parameter

    curl "http://localhost:8080/query?debug=true" -XPOST -d $'{
      tbl(func: allofterms(name@en, "The Big Lebowski")) {
        name@en
      }
    }' | python -m json.tool | less

Returns uid and server_latency

    {
      "data": {
        "tbl": [
          {
            "uid": "0x41434",
            "name@en": "The Big Lebowski"
          },
          {
            "uid": "0x145834",
            "name@en": "The Big Lebowski 2"
          },
          {
            "uid": "0x2c8a40",
            "name@en": "Jeffrey \"The Big\" Lebowski"
          },
          {
            "uid": "0x3454c4",
            "name@en": "The Big Lebowski"
          }
        ],
        "server_latency": {
          "parsing": "101µs",
          "processing": "802ms",
          "json": "115µs",
          "total": "802ms"
        }
      }
    }

Schema

For each predicate, the schema specifies the target’s type. If a predicate p has type T, then for all subject-predicate-object triples s p o the object o is of schema type T.

  • On mutations, scalar types are checked and an error thrown if the value cannot be converted to the schema type.

  • On query, value results are returned according to the schema type of the predicate.

If a schema type isn’t specified before a mutation adds triples for a predicate, then the type is inferred from the first mutation. This type is either:

  • type uid, if the first mutation for the predicate has nodes for the subject and object, or

  • derived from the rdf type, if the object is a literal and an rdf type is present in the first mutation, or

  • default type, otherwise.

Schema Types

Dgraph supports scalar types and the UID type.

Scalar Types

For all triples with a predicate of scalar types the object is a literal.

| Dgraph Type | Go type |
|---|---|
| default | string |
| int | int64 |
| float | float |
| string | string |
| bool | bool |
| dateTime | time.Time (RFC3339 format [Optional timezone] eg: 2006-01-02T15:04:05.999999999+10:00 or 2006-01-02T15:04:05.999999999) |
| geo | go-geom |
| password | string (encrypted) |

Note: Dgraph supports date and time formats for the dateTime scalar type only if they are RFC 3339 compatible, which is different from ISO 8601 (as defined in the RDF spec). You should convert your values to RFC 3339 format before sending them to Dgraph.
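As an illustration of such a conversion, the following Python sketch formats datetime values as RFC 3339 strings using only the standard library. The helper name to_rfc3339 is hypothetical, and treating naive values as UTC is an assumption of this sketch, not something Dgraph prescribes.

```python
from datetime import datetime, timedelta, timezone

def to_rfc3339(dt):
    """Format a datetime as an RFC 3339 string; naive values are assumed to be UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.isoformat()

# A naive timestamp, treated as UTC by this sketch.
print(to_rfc3339(datetime(2006, 1, 2, 15, 4, 5)))
# 2006-01-02T15:04:05+00:00

# A timestamp with an explicit +10:00 offset.
plus_ten = timezone(timedelta(hours=10))
print(to_rfc3339(datetime(2006, 1, 2, 15, 4, 5, tzinfo=plus_ten)))
# 2006-01-02T15:04:05+10:00

# An ISO 8601 "basic format" value, which is not RFC 3339, parsed and re-emitted.
basic = datetime.strptime("20060102T150405Z", "%Y%m%dT%H%M%SZ")
print(to_rfc3339(basic))
# 2006-01-02T15:04:05+00:00
```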

UID Type

The uid type denotes a node-node edge; internally each node is represented as a uint64 id.

| Dgraph Type | Go type |
|---|---|
| uid | uint64 |

Adding or Modifying Schema

Schema mutations add or modify schema.

Multiple scalar values can also be stored for a subject-predicate pair (S P) by specifying the schema to be of list type. In the example below, occupations can store a list of strings for each S P.

An index is specified with @index, with arguments to specify the tokenizer. When specifying an index for a predicate it is mandatory to specify the type of the index. For example:

    name: string @index(exact, fulltext) @count .
    multiname: string @lang .
    age: int @index(int) .
    friend: uid @count .
    dob: dateTime .
    location: geo @index(geo) .
    occupations: [string] @index(term) .

If no data has been stored for the predicates, a schema mutation sets up an empty schema ready to receive triples.

If data is already stored before the mutation, existing values are not checked to conform to the new schema. On query, Dgraph tries to convert existing values to the new schema types, ignoring any that fail conversion.

If data exists and new indices are specified in a schema mutation, any index not in the updated list is dropped and a new index is created for every new tokenizer specified.

Reverse edges are also computed if specified by a schema mutation.

Note: If your predicate is a URI or has special characters, then you should wrap it with angle brackets while doing the schema mutation. E.g. <first:name>

Upsert directive

A predicate can specify the @upsert directive if you want to perform upsert operations against it. If the @upsert directive is specified, the index key for the predicate is checked for conflicts when a transaction commits, which allows upserts.

This is how you specify the upsert directive for a predicate. This replaces the IgnoreIndexConflict field which was part of the mutation object in previous releases.

    email: string @index(exact) @upsert .

RDF Types

Dgraph supports a number of RDF types in mutations.

As well as implying a schema type for a first mutation, an RDF type can override a schema type for storage.

If a predicate has a schema type and a mutation has an RDF type with a different underlying Dgraph type, the convertibility to schema type is checked, and an error is thrown if they are incompatible, but the value is stored in the RDF type’s corresponding Dgraph type. Query results are always returned in schema type.

For example, suppose no schema is set for the age predicate. Given the mutation

    {
      set {
        _:a <age> "15"^^<xs:int> .
        _:b <age> "13" .
        _:c <age> "14"^^<xs:string> .
        _:d <age> "14.5"^^<xs:string> .
        _:e <age> "14.5" .
      }
    }

Dgraph:

  • sets the schema type to int, as implied by the first triple,
  • converts "13" to int on storage,
  • checks "14" can be converted to int, but stores as string,
  • throws an error for the remaining two triples, because "14.5" can’t be converted to int.
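The four outcomes listed above can be modelled with a short Python sketch. It is only an approximation: it covers just the int and string types, the names store and CONVERTERS are hypothetical, and Python's int() stands in for Dgraph's own conversion, which may differ in edge cases.

```python
CONVERTERS = {"int": int, "string": str}

def store(value, rdf_type, schema_type):
    """Return (stored type, stored value); raise ValueError if not convertible."""
    # The value must always be convertible to the schema type.
    converted = CONVERTERS[schema_type](value)
    if rdf_type is None or rdf_type == schema_type:
        return schema_type, converted
    # An explicit RDF type with a different Dgraph type is kept for storage.
    return rdf_type, CONVERTERS[rdf_type](value)

# (literal, RDF type) pairs from the mutation above; None means no RDF type.
triples = [("15", "int"), ("13", None), ("14", "string"),
           ("14.5", "string"), ("14.5", None)]

# The schema type is implied by the first triple.
schema_type = triples[0][1]

for value, rdf_type in triples:
    try:
        print(value, rdf_type, "->", store(value, rdf_type, schema_type))
    except ValueError:
        print(value, rdf_type, "-> error")
```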

Extended Types

The following types are also accepted.

Password type

A password for an entity is set by setting the schema for the attribute to be of type password. Passwords cannot be queried directly, only checked for a match using the checkpwd function.

For example: to set a password, first set schema, then the password:

    pass: password .

    {
      set {
        <0x123> <name> "Password Example" .
        <0x123> <pass> "ThePassword" .
      }
    }

To check a password:

    {
      check(func: uid(0x123)) {
        name
        checkpwd(pass, "ThePassword")
      }
    }

Output:

    {
      "check": [
        {
          "name": "Password Example",
          "pass": [
            {
              "checkpwd": true
            }
          ]
        }
      ]
    }

Indexing

Note: Filtering on a predicate by applying a function requires an index.

When filtering by applying a function, Dgraph uses the index to make the search through a potentially large dataset efficient.

All scalar types can be indexed.

Types int, float, bool and geo have only a default index each: with tokenizers named int, float, bool and geo.

Types string and dateTime have a number of indices.

String Indices

The indices available for strings are as follows.

| Dgraph function | Required index / tokenizer | Notes |
|---|---|---|
| eq | hash, exact, term, or fulltext | The most performant index for eq is hash. Only use term or fulltext if you also require term or full text search. If you’re already using term, there is no need to use hash or exact as well. |
| le, ge, lt, gt | exact | Allows faster sorting. |
| allofterms, anyofterms | term | Allows searching by a term in a sentence. |
| alloftext, anyoftext | fulltext | Matching with language specific stemming and stopwords. |
| regexp | trigram | Regular expression matching. Can also be used for equality checking. |

Warning: Incorrect index choice can impose performance penalties and an increased transaction conflict rate. Use only the minimum number of indexes, and the simplest ones, that your application needs.

DateTime Indices

The indices available for dateTime are as follows.

| Index name / Tokenizer | Part of date indexed |
|---|---|
| year | index on year (default) |
| month | index on year and month |
| day | index on year, month and day |
| hour | index on year, month, day and hour |

The choices of dateTime index allow selecting the precision of the index. Applications, such as the movies examples in these docs, that require searching over dates but have relatively few nodes per year may prefer the year tokenizer; applications that are dependent on fine grained date searches, such as real-time sensor readings, may prefer the hour index.

All the dateTime indices are sortable.
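To illustrate what each precision keeps, the sketch below truncates one timestamp to the part of the date that each tokenizer indexes. It is only an illustration of the table above: the strftime patterns and the index_key helper are assumptions of this sketch and do not reflect how Dgraph actually encodes index tokens.

```python
from datetime import datetime

# strftime patterns standing in for the part of the date each tokenizer indexes.
PRECISION = {
    "year": "%Y",
    "month": "%Y-%m",
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%dT%H",
}

def index_key(dt, tokenizer):
    """Truncate dt to the precision of the named tokenizer (illustrative only)."""
    return dt.strftime(PRECISION[tokenizer])

reading = datetime(2006, 1, 2, 15, 4, 5)
for name in PRECISION:
    print(name, index_key(reading, name))
# year 2006
# month 2006-01
# day 2006-01-02
# hour 2006-01-02T15
```

A coarser key groups more values together, which is why a dataset with few nodes per year can get by with the year tokenizer.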

Sortable Indices

Not all the indices establish a total order among the values that they index. Sortable indices allow inequality functions and sorting.

  • Indexes int and float are sortable.
  • string index exact is sortable.
  • All dateTime indices are sortable.

For example, given an edge name of string type, to sort by name or perform inequality filtering on names, the exact index must have been specified. In which case a schema query would return at least the following tokenizers.

    {
      "predicate": "name",
      "type": "string",
      "index": true,
      "tokenizer": [
        "exact"
      ]
    }

Count index

For predicates with the @count directive, Dgraph indexes the number of edges out of each node. This enables fast queries of the form:

    {
      q(func: gt(count(pred), threshold)) {
        ...
      }
    }

List Type

Predicates with scalar types can also store a list of values if specified in the schema. The scalar type needs to be enclosed within [] to indicate that it’s a list type. These lists are like an unordered set.

    occupations: [string] .
    score: [int] .

  • A set operation adds to the list of values. The order of the stored values is non-deterministic.
  • A delete operation deletes the value from the list.
  • Querying for these predicates would return the list in an array.
  • Indexes can be applied on predicates which have a list type and you can use Functions on them.
  • Sorting is not allowed using these predicates.

Reverse Edges

A graph edge is unidirectional. For node-node edges, sometimes modeling requires reverse edges. If only some subject-predicate-object triples have a reverse, these must be manually added. But if a predicate always has a reverse, Dgraph computes the reverse edges if @reverse is specified in the schema.

The reverse edge of anEdge is ~anEdge.

For existing data, Dgraph computes all reverse edges. For data added after the schema mutation, Dgraph computes and stores the reverse edge for each added triple.
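Conceptually, computing reverse edges means recording each node-node triple a second time with subject and object swapped. The following Python sketch shows the idea on a few hypothetical friend triples; it models the concept only and says nothing about how Dgraph stores the edges.

```python
from collections import defaultdict

# Hypothetical node-node triples for a predicate declared with @reverse.
triples = [
    ("alice", "friend", "bob"),
    ("alice", "friend", "charlie"),
    ("bob", "friend", "charlie"),
]

forward = defaultdict(list)
reverse = defaultdict(list)
for subject, predicate, obj in triples:
    forward[(subject, predicate)].append(obj)
    # The reverse edge of anEdge is ~anEdge, pointing from object to subject.
    reverse[(obj, "~" + predicate)].append(subject)

print(forward[("alice", "friend")])     # ['bob', 'charlie']
print(reverse[("charlie", "~friend")])  # ['alice', 'bob']
```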

Querying Schema

A schema query queries for the whole schema:

    schema {}

Note: Unlike regular queries, the schema query is not surrounded by curly braces.

You can query for particular schema fields in the query body.

    schema {
      type
      index
      reverse
      tokenizer
      list
      count
      upsert
      lang
    }

You can also query for particular predicates:

    schema(pred: [name, friend]) {
      type
      index
      reverse
      tokenizer
      list
      count
      upsert
      lang
    }

Facets : Edge attributes

Dgraph supports facets (key-value pairs on edges) as an extension to RDF triples. That is, facets add properties to edges, rather than to nodes. For example, a friend edge between two nodes may have a boolean property of close friendship. Facets can also be used as weights for edges.

Though you may find yourself leaning towards facets many times, they should not be misused. It wouldn’t be correct modeling to give the friend edge a facet date_of_birth. That should be an edge for the friend. However, a facet like start_of_friendship might be appropriate. Facets are not, however, first-class citizens in Dgraph the way predicates are.

Facet keys are strings and values can be string, bool, int, float and dateTime. For int and float, only 32-bit signed decimal integers and 64-bit float values are accepted respectively.

The following mutation is used throughout this section on facets. The mutation adds data for some people and, for example, records a since facet on mobile and car to record when Alice bought the car and started using the mobile number.

First we add some schema.

    curl localhost:8080/alter -XPOST -d $'
      name: string @index(exact, term) .
      rated: uid @reverse @count .
    ' | python -m json.tool | less

Then we add the data.

    curl localhost:8080/mutate -H "X-Dgraph-CommitNow: true" -XPOST -d $'
    {
      set {
        # -- Facets on scalar predicates
        _:alice <name> "Alice" .
        _:alice <mobile> "040123456" (since=2006-01-02T15:04:05) .
        _:alice <car> "MA0123" (since=2006-02-02T13:01:09, first=true) .
        _:bob <name> "Bob" .
        _:bob <car> "MA0134" (since=2006-02-02T13:01:09) .
        _:charlie <name> "Charlie" .
        _:dave <name> "Dave" .
        # -- Facets on UID predicates
        _:alice <friend> _:bob (close=true, relative=false) .
        _:alice <friend> _:charlie (close=false, relative=true) .
        _:alice <friend> _:dave (close=true, relative=true) .
        # -- Facets for variable propagation
        _:movie1 <name> "Movie 1" .
        _:movie2 <name> "Movie 2" .
        _:movie3 <name> "Movie 3" .
        _:alice <rated> _:movie1 (rating=3) .
        _:alice <rated> _:movie2 (rating=2) .
        _:alice <rated> _:movie3 (rating=5) .
        _:bob <rated> _:movie1 (rating=5) .
        _:bob <rated> _:movie2 (rating=5) .
        _:bob <rated> _:movie3 (rating=5) .
        _:charlie <rated> _:movie1 (rating=2) .
        _:charlie <rated> _:movie2 (rating=5) .
        _:charlie <rated> _:movie3 (rating=1) .
      }
    }' | python -m json.tool | less

Facets on scalar predicates

Querying name, mobile and car of Alice gives the same result as without facets.

    {
      data(func: eq(name, "Alice")) {
        name
        mobile
        car
      }
    }

The syntax @facets(facet-name) is used to query facet data. For Alice the since facet for mobile and car are queried as follows.

    {
      data(func: eq(name, "Alice")) {
        name
        mobile @facets(since)
        car @facets(since)
      }
    }

Facets are returned at the same level as the corresponding edge and have keys like edge|facet.

All facets on an edge are queried with @facets.

    {
      data(func: eq(name, "Alice")) {
        name
        mobile @facets
        car @facets
      }
    }

Alias with facets

An alias can be specified when requesting specific facets. The syntax is similar to requesting an alias for a predicate. orderasc and orderdesc are not allowed as aliases because they have special meaning; anything else can be used as an alias.

Here we set the aliases car_since and close_friend for the since and close facets respectively.

    {
      data(func: eq(name, "Alice")) {
        name
        mobile
        car @facets(car_since: since)
        friend @facets(close_friend: close) {
          name
        }
      }
    }

Facets on UID predicates

Facets on UID edges work similarly to facets on value edges.

For example, friend is an edge with facet close. It was set to true for friendship between Alice and Bob and false for friendship between Alice and Charlie.

A query for friends of Alice.

    {
      data(func: eq(name, "Alice")) {
        name
        friend {
          name
        }
      }
    }

A query for friends and the facet close with @facets(close).

    {
      data(func: eq(name, "Alice")) {
        name
        friend @facets(close) {
          name
        }
      }
    }

For uid edges like friend, facets go to the corresponding child under the key edge|facet. In the above example you can see that the close facet on the edge between Alice and Bob appears with the key friend|close along with Bob’s results.

    {
      data(func: eq(name, "Alice")) {
        name
        friend @facets {
          name
          car @facets
        }
      }
    }

Bob has a car and it has a facet since, which, in the results, is part of the same object as Bob under the key car|since. Also, the close relationship between Bob and Alice is part of Bob’s output object. Charlie does not have a car edge and thus has only the facets of the UID edge.

Filtering on facets

Dgraph supports filtering edges based on facets. Filtering works similarly to how it works on edges without facets and has the same available functions.

Find Alice’s close friends

    {
      data(func: eq(name, "Alice")) {
        friend @facets(eq(close, true)) {
          name
        }
      }
    }

To return facets as well as filter, add another @facets(<facetname>) to the query.

    {
      data(func: eq(name, "Alice")) {
        friend @facets(eq(close, true)) @facets(relative) { # filter close friends and give relative status
          name
        }
      }
    }

Facet queries can be composed with AND, OR and NOT.

    {
      data(func: eq(name, "Alice")) {
        friend @facets(eq(close, true) AND eq(relative, true)) @facets(relative) { # filter close friends in my relation
          name
        }
      }
    }

Sorting using facets

Sorting is possible for a facet on a uid edge. Here we sort the movies rated by Alice, Bob and Charlie by their rating which is a facet.

    {
      me(func: anyofterms(name, "Alice Bob Charlie")) {
        name
        rated @facets(orderdesc: rating) {
          name
        }
      }
    }

Assigning Facet values to a variable

Facets on UID edges can be stored in value variables. The variable is a map from the edge target to the facet value.

Alice’s friends reported by variables for close and relative.

    {
      var(func: eq(name, "Alice")) {
        friend @facets(a as close, b as relative)
      }
      friend(func: uid(a)) {
        name
        val(a)
      }
      relative(func: uid(b)) {
        name
        val(b)
      }
    }

Facets and Variable Propagation

Facet values of int and float can be assigned to variables and thus the values propagate.

Alice, Bob and Charlie each rated every movie. A value variable on facet rating maps movies to ratings. A query that reaches a movie through multiple paths sums the ratings on each path. The following sums Alice, Bob and Charlie’s ratings for the three movies.

    {
      var(func: anyofterms(name, "Alice Bob Charlie")) {
        num_raters as math(1)
        rated @facets(r as rating) {
          total_rating as math(r) # sum of the 3 ratings
          average_rating as math(total_rating / num_raters)
        }
      }
      data(func: uid(total_rating)) {
        name
        val(total_rating)
        val(average_rating)
      }
    }
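The propagation can be traced by hand with the ratings from the mutation at the start of this section. The Python sketch below sums the values that arrive at each movie along the three paths; the variable names follow the query, while the dictionaries are only a model of the value variables.

```python
# rating facets from the mutation above: user -> movie -> rating
ratings = {
    "Alice": {"Movie 1": 3, "Movie 2": 2, "Movie 3": 5},
    "Bob": {"Movie 1": 5, "Movie 2": 5, "Movie 3": 5},
    "Charlie": {"Movie 1": 2, "Movie 2": 5, "Movie 3": 1},
}

total_rating = {}
num_raters = {}
for user, rated in ratings.items():
    for movie, r in rated.items():
        # Values reaching the same movie along different paths are summed.
        total_rating[movie] = total_rating.get(movie, 0) + r
        # math(1) on each user propagates as a count of raters.
        num_raters[movie] = num_raters.get(movie, 0) + 1

average_rating = {
    movie: total_rating[movie] / num_raters[movie] for movie in total_rating
}

print(total_rating)  # {'Movie 1': 10, 'Movie 2': 12, 'Movie 3': 11}
print(num_raters)    # {'Movie 1': 3, 'Movie 2': 3, 'Movie 3': 3}
```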

Facets and Aggregation

Facet values assigned to value variables can be aggregated.

    {
      data(func: eq(name, "Alice")) {
        name
        rated @facets(r as rating) {
          name
        }
        avg(val(r))
      }
    }

Note though that r is a map from movies to the sum of ratings on edges in the query reaching the movie. Hence, the following does not correctly calculate the average ratings for Alice and Bob individually; it calculates 2 times the average of both Alice and Bob’s ratings.

    {
      data(func: anyofterms(name, "Alice Bob")) {
        name
        rated @facets(r as rating) {
          name
        }
        avg(val(r))
      }
    }
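The factor of 2 can be checked with the ratings from the mutation at the start of this section. This Python sketch models r as a per-movie sum, as described above, and compares the average over r with the average over all six individual ratings; it is a model of the arithmetic only.

```python
# rating facets from the mutation above
alice = {"Movie 1": 3, "Movie 2": 2, "Movie 3": 5}
bob = {"Movie 1": 5, "Movie 2": 5, "Movie 3": 5}

# r maps each movie to the sum of the ratings on all edges that reach it.
r = {movie: alice[movie] + bob[movie] for movie in alice}

# What avg(val(r)) averages: three per-movie sums.
avg_of_r = sum(r.values()) / len(r)

# The average over all six individual ratings of Alice and Bob.
all_ratings = list(alice.values()) + list(bob.values())
avg_of_both = sum(all_ratings) / len(all_ratings)

print(r)                      # {'Movie 1': 8, 'Movie 2': 7, 'Movie 3': 10}
print(round(avg_of_r, 2))     # 8.33
print(round(avg_of_both, 2))  # 4.17
```

Neither number is Alice's own average (10/3) or Bob's own average (5), which is why a per-user variable is needed.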

Calculating the average ratings of users requires a variable that maps users to the sum of their ratings.

    {
      var(func: has(rated)) {
        num_rated as math(1)
        rated @facets(r as rating) {
          avg_rating as math(r / num_rated)
        }
      }
      data(func: uid(avg_rating)) {
        name
        val(avg_rating)
      }
    }

K-Shortest Path Queries

The shortest path between a source (from) node and a destination (to) node can be found using the keyword shortest for the query block name. It requires the source node UID, the destination node UID and the predicates (at least one) to be considered for traversal. A shortest query block does not return any results itself; the path must be stored in a variable that is used in other query blocks.

By default the shortest path is returned. With numpaths: k, the k-shortest paths are returned. With depth: n, the shortest paths up to n hops away are returned.

Note:

  • If no predicates are specified in the shortest block, no path can be fetched as no edge is traversed.
  • If you’re seeing queries take a long time, you can set a gRPC deadline to stop the query after a certain amount of time.

For example:

    curl localhost:8080/alter -XPOST -d $'
      name: string @index(exact) .
    ' | python -m json.tool | less

    curl localhost:8080/mutate -H "X-Dgraph-CommitNow: true" -XPOST -d $'
    {
      set {
        _:a <friend> _:b (weight=0.1) .
        _:b <friend> _:c (weight=0.2) .
        _:c <friend> _:d (weight=0.3) .
        _:a <friend> _:d (weight=1) .
        _:a <name> "Alice" .
        _:b <name> "Bob" .
        _:c <name> "Tom" .
        _:d <name> "Mallory" .
      }
    }' | python -m json.tool | less

The shortest path between Alice and Mallory (assuming UIDs 0x2 and 0x5 respectively) can be found with query:

    curl localhost:8080/query -XPOST -d $'{
      path as shortest(from: 0x2, to: 0x5) {
        friend
      }
      path(func: uid(path)) {
        name
      }
    }' | python -m json.tool | less

Which returns the following results. (Note that without the weight facet, each edge’s weight is considered to be 1.)

    {
      "data": {
        "path": [
          {
            "name": "Alice"
          },
          {
            "name": "Mallory"
          }
        ],
        "_path_": [
          {
            "uid": "0x2",
            "friend": [
              {
                "uid": "0x5"
              }
            ]
          }
        ]
      }
    }

The shortest two paths are returned with:

    curl localhost:8080/query -XPOST -d $'{
      path as shortest(from: 0x2, to: 0x5, numpaths: 2) {
        friend
      }
      path(func: uid(path)) {
        name
      }
    }' | python -m json.tool | less

Edge weights are included by using facets on the edges as follows.

Note: One facet per predicate in the shortest query block is allowed.

    curl localhost:8080/query -XPOST -d $'{
      path as shortest(from: 0x2, to: 0x5) {
        friend @facets(weight)
      }
      path(func: uid(path)) {
        name
      }
    }' | python -m json.tool | less

    {
      "data": {
        "path": [
          {
            "name": "Alice"
          },
          {
            "name": "Bob"
          },
          {
            "name": "Tom"
          },
          {
            "name": "Mallory"
          }
        ],
        "_path_": [
          {
            "uid": "0x2",
            "friend": [
              {
                "uid": "0x3",
                "friend|weight": 0.1,
                "friend": [
                  {
                    "uid": "0x4",
                    "friend|weight": 0.2,
                    "friend": [
                      {
                        "uid": "0x5",
                        "friend|weight": 0.3
                      }
                    ]
                  }
                ]
              }
            ]
          }
        ]
      }
    }
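The difference between the two results can be reproduced on the example graph with a textbook shortest-path search. The sketch below uses Dijkstra's algorithm as a stand-in; this document does not state which algorithm Dgraph uses internally, and the shortest_path function is a hypothetical helper.

```python
import heapq

# friend edges and weight facets from the mutation above
edges = {
    "Alice": {"Bob": 0.1, "Mallory": 1.0},
    "Bob": {"Tom": 0.2},
    "Tom": {"Mallory": 0.3},
    "Mallory": {},
}

def shortest_path(source, target, use_weights):
    """Dijkstra's algorithm; every edge costs 1 when use_weights is False."""
    queue = [(0.0, [source])]
    seen = set()
    while queue:
        cost, path = heapq.heappop(queue)
        node = path[-1]
        if node == target:
            return path
        if node in seen:
            continue
        seen.add(node)
        for neighbour, weight in edges[node].items():
            step = weight if use_weights else 1.0
            heapq.heappush(queue, (cost + step, path + [neighbour]))
    return None  # no path exists

print(shortest_path("Alice", "Mallory", use_weights=False))
# ['Alice', 'Mallory']
print(shortest_path("Alice", "Mallory", use_weights=True))
# ['Alice', 'Bob', 'Tom', 'Mallory']
```

With unit weights the direct edge costs 1 and wins; with the weight facet the three-hop route costs 0.1 + 0.2 + 0.3 = 0.6, which is less than the direct edge's weight of 1.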

Constraints can be applied to the intermediate nodes as follows.

    curl localhost:8080/query -XPOST -d $'{
      path as shortest(from: 0x2, to: 0x5) {
        friend @filter(not eq(name, "Bob")) @facets(weight)
        relative @facets(liking)
      }
      relationship(func: uid(path)) {
        name
      }
    }' | python -m json.tool | less

Recurse Query

Recurse queries let you traverse a set of predicates (with filter, facets, etc.) until we reach all leaf nodes or we reach the maximum depth which is specified by the depth parameter.

To get 10 movies from a genre that has more than 30000 films and then get two actors for those movies we’d do something as follows:

    {
      me(func: gt(count(~genre), 30000), first: 1) @recurse(depth: 5, loop: true) {
        name@en
        ~genre (first:10) @filter(gt(count(starring), 2))
        starring (first: 2)
        performance.actor
      }
    }

Some points to keep in mind while using recurse queries are:

  • You can specify only one level of predicates after the root. These are traversed recursively. Scalar and entity nodes are treated alike.
  • Only one recurse block is advised per query.
  • Be careful: the result size can explode quickly, and an error is returned if the result set gets too large. In such cases use more filters, limit results using pagination, or provide a depth parameter at the root as shown in the example above.
  • The loop parameter can be set to false, in which case paths that lead to a loop are ignored while traversing.
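
The interaction between the depth and loop parameters can be sketched in Go as a level-by-level traversal. This is a simplified illustration, not Dgraph's implementation: the recurse function and its graph representation are hypothetical, the sketch counts the root as the first level, and it tracks visited nodes, which is an assumption about how loops are detected.

```go
package main

import "fmt"

// recurse expands edges level by level starting from root. It stops when no
// new nodes are found or when maxDepth levels (root included) were produced.
// When loop is false, nodes that were already visited are not expanded again.
func recurse(edges map[string][]string, root string, maxDepth int, loop bool) [][]string {
	levels := [][]string{{root}}
	visited := map[string]bool{root: true}
	frontier := []string{root}
	for depth := 1; depth < maxDepth && len(frontier) > 0; depth++ {
		var next []string
		for _, n := range frontier {
			for _, m := range edges[n] {
				if !loop && visited[m] {
					continue // this path leads back into a loop
				}
				visited[m] = true
				next = append(next, m)
			}
		}
		if len(next) > 0 {
			levels = append(levels, next)
		}
		frontier = next
	}
	return levels
}

func main() {
	// Two nodes that point at each other form the smallest possible loop.
	edges := map[string][]string{"a": {"b"}, "b": {"a"}}
	fmt.Println(recurse(edges, "a", 5, false))     // prints [[a] [b]]
	fmt.Println(len(recurse(edges, "a", 5, true))) // prints 5
}
```

With loop set to true the traversal keeps bouncing between the two nodes until the depth limit stops it, which is why a depth parameter matters on cyclic data.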

Fragments

The fragment keyword lets you define new fragments that can be referenced in a query, as per the GraphQL specification. If multiple parts of a query request the same set of fields, you can define a fragment once and refer to it multiple times. Fragments can be nested inside fragments, but cycles are not allowed. Here is a contrived example.

    curl localhost:8080/query -XPOST -d $'
    query {
      debug(func: uid(1)) {
        name@en
        ...TestFrag
      }
    }
    fragment TestFrag {
      initial_release_date
      ...TestFragB
    }
    fragment TestFragB {
      country
    }' | python -m json.tool | less
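
Nested fragment expansion and the no-cycles rule can be sketched as follows. This is a minimal Go illustration under stated assumptions, not Dgraph's parser: fields are modelled as plain strings, a spread is any field that starts with "...", and the expand function is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// expand replaces every "...Name" spread in fields with the fields of the
// named fragment, recursively. It returns an error if a fragment is unknown
// or if fragments refer to each other in a cycle. active holds the fragments
// that are currently being expanded.
func expand(fields []string, frags map[string][]string, active map[string]bool) ([]string, error) {
	var out []string
	for _, f := range fields {
		if !strings.HasPrefix(f, "...") {
			out = append(out, f)
			continue
		}
		name := strings.TrimPrefix(f, "...")
		body, ok := frags[name]
		if !ok {
			return nil, fmt.Errorf("undefined fragment %q", name)
		}
		if active[name] {
			return nil, fmt.Errorf("fragment cycle through %q", name)
		}
		active[name] = true
		sub, err := expand(body, frags, active)
		if err != nil {
			return nil, err
		}
		delete(active, name)
		out = append(out, sub...)
	}
	return out, nil
}

func main() {
	// The fragments from the query above.
	frags := map[string][]string{
		"TestFrag":  {"initial_release_date", "...TestFragB"},
		"TestFragB": {"country"},
	}
	fields, err := expand([]string{"name@en", "...TestFrag"}, frags, map[string]bool{})
	fmt.Println(fields, err) // prints [name@en initial_release_date country] <nil>
}
```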

GraphQL Variables

Variables can be defined and used in queries. Passing values in a separate variable map makes queries reusable and avoids costly string building in clients at runtime. A variable name starts with a $ symbol.

    query test($a: int, $b: int, $name: string) {
      me(func: allofterms(name@en, $name)) {
        name@en
        director.film (first: $a, offset: $b) {
          name@en
          genre(first: $a) {
            name@en
          }
        }
      }
    }

  • Variables can have default values. In the example below, $a has a default value of 2. Since the value for $a isn’t provided in the variable map, $a takes on the default value.
  • Variables whose type is suffixed with a ! can’t have a default value and must be given a value in the variable map.
  • The value of a variable must be parsable to the given type; if it is not, an error is thrown.
  • The variable types currently supported are: int, float, bool and string.
  • Any variable that is used must be declared in the named query clause at the beginning.

    query test($a: int = 2, $b: int!, $name: string) {
      me(func: allofterms(name@en, $name)) {
        director.film (first: $a, offset: $b) {
          genre(first: $a) {
            name@en
          }
        }
      }
    }
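
The rules for defaults, required variables and type checking can be illustrated with a small Go sketch. It only models the rules listed above and is not Dgraph's implementation: the varDecl type, the resolve and checkType functions, and the sample values are assumptions made for this example. An optional variable that has neither a value nor a default is simply left unset here, which is also an assumption.

```go
package main

import (
	"fmt"
	"strconv"
)

// varDecl describes one declared variable, e.g. "$a: int = 2" or "$b: int!".
type varDecl struct {
	Name     string
	Type     string // "int", "float", "bool" or "string"
	Required bool   // true when the type is suffixed with "!"
	Default  string // empty when no default was declared
}

// checkType reports an error if val cannot be parsed as typ.
func checkType(typ, val string) error {
	var err error
	switch typ {
	case "int":
		_, err = strconv.ParseInt(val, 10, 64)
	case "float":
		_, err = strconv.ParseFloat(val, 64)
	case "bool":
		_, err = strconv.ParseBool(val)
	case "string":
	default:
		err = fmt.Errorf("unsupported type %q", typ)
	}
	return err
}

// resolve combines the declarations with the variable map.
func resolve(decls []varDecl, vars map[string]string) (map[string]string, error) {
	out := map[string]string{}
	for _, d := range decls {
		val, ok := vars[d.Name]
		switch {
		case ok:
			// A value was supplied in the variable map.
		case d.Required:
			return nil, fmt.Errorf("no value for required variable %s", d.Name)
		case d.Default != "":
			val = d.Default
		default:
			continue // optional, no default, not supplied
		}
		if err := checkType(d.Type, val); err != nil {
			return nil, fmt.Errorf("variable %s: %v", d.Name, err)
		}
		out[d.Name] = val
	}
	return out, nil
}

func main() {
	// Declarations matching: query test($a: int = 2, $b: int!, $name: string)
	decls := []varDecl{
		{Name: "$a", Type: "int", Default: "2"},
		{Name: "$b", Type: "int", Required: true},
		{Name: "$name", Type: "string"},
	}
	vals, err := resolve(decls, map[string]string{"$b": "10", "$name": "Steven Spielberg"})
	fmt.Println(vals, err) // prints map[$a:2 $b:10 $name:Steven Spielberg] <nil>
}
```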

Note: To pass a list of uids as a GraphQL variable value, declare the variable with type string and surround the value with square brackets, like ["13", "14"].

Indexing with Custom Tokenizers

Dgraph comes with a large toolkit of built-in indexes, but for niche use cases they are not always enough.

Dgraph allows you to implement custom tokenizers via a plugin system in order to fill the gaps.

Caveats

The plugin system uses Go’s pkg/plugin. This brings some restrictions to how plugins can be used.

  • Plugins must be written in Go.

  • As of Go 1.9, pkg/plugin only works on Linux. Therefore, plugins will only work on dgraph instances deployed in a Linux environment.

  • The version of Go used to compile the plugin should be the same as the version of Go used to compile Dgraph itself. Dgraph always uses the latest version of Go (and so should you!).

Implementing a plugin

Note: You should consider Go’s plugin documentation to be supplementary to the documentation provided here.

Plugins are implemented as their own main package. They must export a particular symbol that allows Dgraph to hook into the custom logic the plugin provides.

The plugin must export a symbol named Tokenizer. The type of the symbol must be func() interface{}. When the function is called the result returned should be a value that implements the following interface:

    type PluginTokenizer interface {
        // Name is the name of the tokenizer. It should be unique among all
        // builtin tokenizers and other custom tokenizers. It identifies the
        // tokenizer when an index is set in the schema and when search/filter
        // is used in queries.
        Name() string

        // Identifier is a byte that uniquely identifies the tokenizer.
        // Bytes in the range 0x80 to 0xff (inclusive) are reserved for
        // custom tokenizers.
        Identifier() byte

        // Type is a string representing the type of data that is to be
        // tokenized. This must match the schema type of the predicate
        // being indexed. Allowable values are shown in the table below.
        Type() string

        // Tokens should implement the tokenization logic. The input is
        // the value to be tokenized, and will always have a concrete type
        // corresponding to Type(). The return value should be a list of
        // the tokens generated.
        Tokens(interface{}) ([]string, error)
    }

The return value of Type() corresponds to the concrete input type of Tokens(interface{}) in the following way:

| Type() return value | Tokens(interface{}) input type |
|---------------------|--------------------------------|
| "int"               | int64                          |
| "float"             | float64                        |
| "string"            | string                         |
| "bool"              | bool                           |
| "datetime"          | time.Time                      |
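
As an illustration of this mapping for a non-string type, here is a minimal sketch of a tokenizer whose Type() is "datetime", so that Tokens receives a time.Time. The DecadeTokenizer, its name "decade", its identifier 0xfb and the demo helper are all hypothetical and chosen only for this example. The sketch also uses a checked type assertion, which is a defensive choice rather than a requirement.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// DecadeTokenizer is a hypothetical tokenizer that groups datetime values
// by decade, e.g. every date in 1980-1989 produces the token "1980".
type DecadeTokenizer struct{}

func (DecadeTokenizer) Name() string     { return "decade" }
func (DecadeTokenizer) Type() string     { return "datetime" }
func (DecadeTokenizer) Identifier() byte { return 0xfb }

// Tokens receives a time.Time because Type() returns "datetime".
func (DecadeTokenizer) Tokens(value interface{}) ([]string, error) {
	t, ok := value.(time.Time)
	if !ok {
		return nil, fmt.Errorf("expected time.Time, got %T", value)
	}
	return []string{strconv.Itoa(t.Year() / 10 * 10)}, nil
}

// Tokenizer is the symbol that a plugin must export.
func Tokenizer() interface{} { return DecadeTokenizer{} }

// demo tokenizes a sample date; it exists only to exercise the sketch.
func demo() ([]string, error) {
	return DecadeTokenizer{}.Tokens(time.Date(1982, time.June, 25, 0, 0, 0, 0, time.UTC))
}

func main() {
	fmt.Println(demo()) // prints [1980] <nil>
}
```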

Building the plugin

The plugin has to be built using the plugin build mode so that an .so file is produced instead of a regular executable. For example:

    go build -buildmode=plugin -o myplugin.so ~/go/src/myplugin/main.go

Running Dgraph with plugins

When starting Dgraph, use the --custom_tokenizers flag to tell Dgraph which tokenizers to load. It accepts a comma-separated list of plugins. For example:

    dgraph ...other-args... --custom_tokenizers=plugin1.so,plugin2.so

Note: Plugin validation is performed on startup. If a problem is detected, Dgraph will refuse to initialise.
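
The exact checks performed at startup are not listed in this document. The Go sketch below shows the kind of validation that follows from the contract described above: the symbol must be a func() interface{}, the returned value must implement the interface, the identifier must lie in 0x80 to 0xff, the type must be one of the allowed values, and the name must be unique. The validate function and the stub type are hypothetical. In a real host the symbol would come from Go's plugin package (plugin.Open followed by Lookup), which is left out here so that the sketch runs anywhere.

```go
package main

import "fmt"

// PluginTokenizer mirrors the interface shown earlier in this section.
type PluginTokenizer interface {
	Name() string
	Identifier() byte
	Type() string
	Tokens(interface{}) ([]string, error)
}

// stub is a configurable tokenizer used only to exercise validate.
type stub struct {
	name, typ string
	id        byte
}

func (s stub) Name() string                         { return s.name }
func (s stub) Identifier() byte                     { return s.id }
func (s stub) Type() string                         { return s.typ }
func (s stub) Tokens(interface{}) ([]string, error) { return nil, nil }

// validate checks a looked-up Tokenizer symbol against the documented
// contract. taken holds the names of tokenizers that are already registered.
func validate(sym interface{}, taken map[string]bool) (PluginTokenizer, error) {
	factory, ok := sym.(func() interface{})
	if !ok {
		return nil, fmt.Errorf("symbol has type %T, want func() interface{}", sym)
	}
	tok, ok := factory().(PluginTokenizer)
	if !ok {
		return nil, fmt.Errorf("returned value does not implement PluginTokenizer")
	}
	if tok.Identifier() < 0x80 {
		return nil, fmt.Errorf("identifier %#x is outside 0x80-0xff", tok.Identifier())
	}
	switch tok.Type() {
	case "int", "float", "string", "bool", "datetime":
	default:
		return nil, fmt.Errorf("unsupported type %q", tok.Type())
	}
	if tok.Name() == "" || taken[tok.Name()] {
		return nil, fmt.Errorf("name %q is empty or already in use", tok.Name())
	}
	return tok, nil
}

func main() {
	good := func() interface{} { return stub{"rune", "string", 0xfd} }
	bad := func() interface{} { return stub{"low", "string", 0x10} }
	_, err := validate(good, map[string]bool{"term": true})
	fmt.Println(err) // prints <nil>
	_, err = validate(bad, map[string]bool{})
	fmt.Println(err) // prints identifier 0x10 is outside 0x80-0xff
}
```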

Adding the index to the schema

To use a tokenization plugin, an index has to be created in the schema.

The syntax is the same as adding any built-in index. To add a custom index using a tokenizer plugin named foo to a string predicate named my_predicate, use the following in the schema:

    my_predicate: string @index(foo) .

Using the index in queries

There are two functions that can use custom indexes:

| Function | Behaviour                                                |
|----------|----------------------------------------------------------|
| anyof    | Returns nodes that match on any of the tokens generated  |
| allof    | Returns nodes that match on all of the tokens generated  |

The functions can be used either at the query root or in filters.

Their behaviour is analogous to anyofterms/allofterms and anyoftext/alloftext.

Examples

The following examples should make the process of writing a tokenization plugin more concrete.

Unicode Characters

This example shows a kind of tokenization similar to the term tokenization used in full text search. Instead of being broken down into terms or stem words, the text is broken down into its constituent Unicode codepoints (in Go terminology these are called runes).

Note: This tokenizer would create a very large index that would be expensive to manage and store. That’s one of the reasons that text indexing usually occurs at a higher level: stem words for full text search or terms for term search.

The implementation of the plugin looks like this:

    package main

    import "encoding/binary"

    // Tokenizer is the symbol that the plugin exports.
    func Tokenizer() interface{} { return RuneTokenizer{} }

    type RuneTokenizer struct{}

    func (RuneTokenizer) Name() string     { return "rune" }
    func (RuneTokenizer) Type() string     { return "string" }
    func (RuneTokenizer) Identifier() byte { return 0xfd }

    // Tokens emits one varint-encoded token per rune in the value.
    func (t RuneTokenizer) Tokens(value interface{}) ([]string, error) {
        var toks []string
        for _, r := range value.(string) {
            var buf [binary.MaxVarintLen32]byte
            n := binary.PutVarint(buf[:], int64(r))
            tok := string(buf[:n])
            toks = append(toks, tok)
        }
        return toks, nil
    }

Hints and tips:

  • Inside Tokens, you can assume that value has the concrete type that corresponds to the return value of Type(). It’s safe to do a type assertion.

  • Even though the return value is []string, you can store non-Unicode data inside the strings. See this blogpost for some interesting background on how strings are implemented in Go and why they can be used to store non-textual data. By storing arbitrary data in the strings, you can make the index more compact. In this case, varints are stored in the return values.

Setting up the indexing and adding data:

    name: string @index(rune) .

    {
      set{
        _:ad <name> "Adam" .
        _:aa <name> "Aaron" .
        _:am <name> "Amy" .
        _:ro <name> "Ronald" .
      }
    }

Now queries can be performed.

The only person that has all of the runes A and n in their name is Aaron:

    {
      q(func: allof(name, rune, "An")) {
        name
      }
    }
    =>
    {
      "data": {
        "q": [
          { "name": "Aaron" }
        ]
      }
    }

But there are multiple people who have both of the runes A and m:

    {
      q(func: allof(name, rune, "Am")) {
        name
      }
    }
    =>
    {
      "data": {
        "q": [
          { "name": "Amy" },
          { "name": "Adam" }
        ]
      }
    }

Case is taken into account, so if you search for all names containing "ron", you would find "Aaron", but not "Ronald". But if you were to search for "no", you would match both "Aaron" and "Ronald". The order of the runes in the strings doesn’t matter.

It’s possible to search for people that have any of the supplied runes in their names (rather than all of the supplied runes). To do this, use anyof instead of allof:

    {
      q(func: anyof(name, rune, "mr")) {
        name
      }
    }
    =>
    {
      "data": {
        "q": [
          { "name": "Adam" },
          { "name": "Aaron" },
          { "name": "Amy" }
        ]
      }
    }

"Ronald" contains neither a lowercase m nor a lowercase r (its R is uppercase), so it isn’t found by the search.

Note: Understanding what’s going on under the hood can help you intuitively understand how the Tokens method should be implemented.

When Dgraph sees new edges that are to be indexed by your tokenizer, it tokenizes the value. The resulting tokens are used as keys for posting lists. The edge subject is then added to the posting list for each token.

When a query root search occurs, the search value is tokenized. The result of the search is all of the nodes in the union or intersection of the corresponding posting lists (depending on whether anyof or allof was used).
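
The posting-list idea can be sketched in a few lines of Go. This is a simplified in-memory illustration, not Dgraph's storage format: the index type, the runeTokens stand-in for a Tokens method, and the use of the name itself as the node identifier are assumptions made for this example. The data is the set of names used in the queries above.

```go
package main

import (
	"fmt"
	"sort"
)

// index maps each token to its posting list: the set of nodes whose value
// produced that token.
type index map[string]map[string]bool

// runeTokens is a stand-in for a Tokens method: one token per rune.
func runeTokens(value string) []string {
	var toks []string
	for _, r := range value {
		toks = append(toks, string(r))
	}
	return toks
}

// add tokenizes value and adds node to the posting list of every token.
func (ix index) add(node, value string) {
	for _, t := range runeTokens(value) {
		if ix[t] == nil {
			ix[t] = map[string]bool{}
		}
		ix[t][node] = true
	}
}

// search tokenizes the search value and returns the intersection of the
// posting lists when all is true (allof) or their union otherwise (anyof).
func (ix index) search(value string, all bool) []string {
	counts := map[string]int{} // node -> number of distinct tokens matched
	seen := map[string]bool{}
	distinct := 0
	for _, t := range runeTokens(value) {
		if seen[t] {
			continue
		}
		seen[t] = true
		distinct++
		for node := range ix[t] {
			counts[node]++
		}
	}
	var out []string
	for node, c := range counts {
		if !all || c == distinct {
			out = append(out, node)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	ix := index{}
	for _, name := range []string{"Adam", "Aaron", "Amy", "Ronald"} {
		ix.add(name, name)
	}
	fmt.Println(ix.search("An", true))  // prints [Aaron]
	fmt.Println(ix.search("Am", true))  // prints [Adam Amy]
	fmt.Println(ix.search("mr", false)) // prints [Aaron Adam Amy]
}
```

The sketch sorts its output for readability, so the order differs from the query results shown earlier.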

CIDR Range

Tokenizers don’t always have to be about splitting text up into its constituent parts. This example indexes IP addresses into their CIDR ranges. This allows you to search for all IP addresses that fall into a particular CIDR range.

The plugin code is more complicated than the rune example. The input is an IP address stored as a string, e.g. "100.55.22.11/32". The output is the set of CIDR ranges that the IP address could possibly fall into. There can be up to 32 different outputs ("100.55.22.11/32" does indeed have 32 possible ranges, one for each mask size).

    package main

    import "net"

    // Tokenizer is the symbol that the plugin exports.
    func Tokenizer() interface{} { return CIDRTokenizer{} }

    type CIDRTokenizer struct{}

    func (CIDRTokenizer) Name() string     { return "cidr" }
    func (CIDRTokenizer) Type() string     { return "string" }
    func (CIDRTokenizer) Identifier() byte { return 0xff }

    // Tokens emits one CIDR range per mask size, from the mask size of the
    // input down to a 1 bit mask.
    func (t CIDRTokenizer) Tokens(value interface{}) ([]string, error) {
        _, ipnet, err := net.ParseCIDR(value.(string))
        if err != nil {
            return nil, err
        }
        ones, bits := ipnet.Mask.Size()
        var toks []string
        for i := ones; i >= 1; i-- {
            m := net.CIDRMask(i, bits)
            tok := net.IPNet{
                IP:   ipnet.IP.Mask(m),
                Mask: m,
            }
            toks = append(toks, tok.String())
        }
        return toks, nil
    }

An example of using the tokenizer:

Setting up the indexing and adding data:

    ip: string @index(cidr) .

    {
      set{
        _:a <ip> "100.55.22.11/32" .
        _:b <ip> "100.33.81.19/32" .
        _:c <ip> "100.49.21.25/32" .
        _:d <ip> "101.0.0.5/32" .
        _:e <ip> "100.176.2.1/32" .
      }
    }

    {
      q(func: allof(ip, cidr, "100.48.0.0/12")) {
        ip
      }
    }
    =>
    {
      "data": {
        "q": [
          { "ip": "100.55.22.11/32" },
          { "ip": "100.49.21.25/32" }
        ]
      }
    }

The 12 bit CIDR ranges of 100.55.22.11/32 and 100.49.21.25/32 are both 100.48.0.0/12. The other IP addresses in the database aren’t included in the search result, since they have different CIDR ranges for 12 bit masks (100.32.0.0/12, 101.0.0.0/12, and 100.176.0.0/12 for 100.33.81.19/32, 101.0.0.5/32, and 100.176.2.1/32 respectively).

Note that we’re using allof instead of anyof. Only allof will work correctly with this index. Remember that the tokenizer generates all possible CIDR ranges for an IP address. If we were to use anyof then the search result would include all IP addresses under the 1 bit mask (in this case, 0.0.0.0/1, which would match all IPs in this dataset).
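
The difference between allof and anyof for this index can be checked with a small Go program. The cidrTokens function repeats the tokenization logic of the plugin above as a plain function. The allof and anyof helpers are hypothetical stand-ins that compare token lists directly instead of going through posting lists.

```go
package main

import (
	"fmt"
	"net"
)

// cidrTokens repeats the logic of CIDRTokenizer.Tokens as a plain function.
func cidrTokens(value string) ([]string, error) {
	_, ipnet, err := net.ParseCIDR(value)
	if err != nil {
		return nil, err
	}
	ones, bits := ipnet.Mask.Size()
	var toks []string
	for i := ones; i >= 1; i-- {
		m := net.CIDRMask(i, bits)
		tok := net.IPNet{IP: ipnet.IP.Mask(m), Mask: m}
		toks = append(toks, tok.String())
	}
	return toks, nil
}

// has reports whether toks contains want.
func has(toks []string, want string) bool {
	for _, t := range toks {
		if t == want {
			return true
		}
	}
	return false
}

// allof reports whether every query token is among the indexed tokens.
func allof(indexed, query []string) bool {
	for _, q := range query {
		if !has(indexed, q) {
			return false
		}
	}
	return true
}

// anyof reports whether at least one query token is among the indexed tokens.
func anyof(indexed, query []string) bool {
	for _, q := range query {
		if has(indexed, q) {
			return true
		}
	}
	return false
}

func main() {
	query, _ := cidrTokens("100.48.0.0/12")
	inside, _ := cidrTokens("100.55.22.11/32")
	outside, _ := cidrTokens("100.176.2.1/32")

	fmt.Println(len(query), query[len(query)-1]) // prints 12 0.0.0.0/1
	// allof separates the two addresses; anyof matches even the outside one.
	fmt.Println(allof(inside, query), allof(outside, query), anyof(outside, query)) // prints true false true
}
```

The search value itself expands into 12 tokens, the last of which is 0.0.0.0/1. That shared token is why anyof matches every address in this dataset.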

Anagram

Tokenizers don’t always have to return multiple tokens. If you just want to index data into groups, have the tokenizer just return an identifying member of that group.

In this example, we want to find groups of words that are anagrams of each other.

A token to correspond to a group of anagrams could just be the letters in the anagram in sorted order, as implemented below:

    package main

    import "sort"

    // Tokenizer is the symbol that the plugin exports.
    func Tokenizer() interface{} { return AnagramTokenizer{} }

    type AnagramTokenizer struct{}

    func (AnagramTokenizer) Name() string     { return "anagram" }
    func (AnagramTokenizer) Type() string     { return "string" }
    func (AnagramTokenizer) Identifier() byte { return 0xfc }

    // Tokens emits a single token: the bytes of the value in sorted order.
    func (t AnagramTokenizer) Tokens(value interface{}) ([]string, error) {
        b := []byte(value.(string))
        sort.Slice(b, func(i, j int) bool { return b[i] < b[j] })
        return []string{string(b)}, nil
    }

In action:

Setting up the indexing and adding data:

    word: string @index(anagram) .

    {
      set{
        _:1 <word> "airmen" .
        _:2 <word> "marine" .
        _:3 <word> "beat" .
        _:4 <word> "beta" .
        _:5 <word> "race" .
        _:6 <word> "care" .
      }
    }

    {
      q(func: allof(word, anagram, "remain")) {
        word
      }
    }
    =>
    {
      "data": {
        "q": [
          { "word": "airmen" },
          { "word": "marine" }
        ]
      }
    }

Since only a single token is ever generated, it doesn’t matter whether anyof or allof is used. The result is always the same.

Integer prime factors

All of the custom tokenizers shown previously have worked with strings. However, other data types can be used as well. This example is contrived, but nonetheless shows some advanced usages of custom tokenizers.

The tokenizer creates a token for each prime factor in the input.

    package main

    import (
        "encoding/binary"
        "fmt"
    )

    // Tokenizer is the symbol that the plugin exports.
    func Tokenizer() interface{} { return FactorTokenizer{} }

    type FactorTokenizer struct{}

    func (FactorTokenizer) Name() string     { return "factor" }
    func (FactorTokenizer) Type() string     { return "int" }
    func (FactorTokenizer) Identifier() byte { return 0xfe }

    // Tokens emits one token per distinct prime factor of the value.
    func (FactorTokenizer) Tokens(value interface{}) ([]string, error) {
        x := value.(int64)
        if x <= 1 {
            return nil, fmt.Errorf("cannot factor int <= 1: %d", x)
        }
        var toks []string
        for p := int64(2); x > 1; p++ {
            if x%p == 0 {
                toks = append(toks, encodeInt(p))
                // Divide out p completely so that it is emitted only once.
                for x%p == 0 {
                    x /= p
                }
            }
        }
        return toks, nil
    }

    // encodeInt returns the varint encoding of x as a string.
    func encodeInt(x int64) string {
        var buf [binary.MaxVarintLen64]byte
        n := binary.PutVarint(buf[:], x)
        return string(buf[:n])
    }

Note: The return value of Type() is "int", corresponding to the concrete type of the input to Tokens (which is int64).

This allows you to do things like search for all numbers that share prime factors with a particular number.

In particular, we search for numbers that contain any of the prime factors of 15, i.e. any numbers that are divisible by either 3 or 5.

Setting up the indexing and adding data:

    num: int @index(factor) .

    {
      set{
        _:2 <num> "2"^^<xs:int> .
        _:3 <num> "3"^^<xs:int> .
        _:4 <num> "4"^^<xs:int> .
        _:5 <num> "5"^^<xs:int> .
        _:6 <num> "6"^^<xs:int> .
        _:7 <num> "7"^^<xs:int> .
        _:8 <num> "8"^^<xs:int> .
        _:9 <num> "9"^^<xs:int> .
        _:10 <num> "10"^^<xs:int> .
        _:11 <num> "11"^^<xs:int> .
        _:12 <num> "12"^^<xs:int> .
        _:13 <num> "13"^^<xs:int> .
        _:14 <num> "14"^^<xs:int> .
        _:15 <num> "15"^^<xs:int> .
        _:16 <num> "16"^^<xs:int> .
        _:17 <num> "17"^^<xs:int> .
        _:18 <num> "18"^^<xs:int> .
        _:19 <num> "19"^^<xs:int> .
        _:20 <num> "20"^^<xs:int> .
        _:21 <num> "21"^^<xs:int> .
        _:22 <num> "22"^^<xs:int> .
        _:23 <num> "23"^^<xs:int> .
        _:24 <num> "24"^^<xs:int> .
        _:25 <num> "25"^^<xs:int> .
        _:26 <num> "26"^^<xs:int> .
        _:27 <num> "27"^^<xs:int> .
        _:28 <num> "28"^^<xs:int> .
        _:29 <num> "29"^^<xs:int> .
        _:30 <num> "30"^^<xs:int> .
      }
    }
    {
      q(func: anyof(num, factor, 15)) {
        num
      }
    }
    =>
    {
      "data": {
        "q": [
          { "num": 3 },
          { "num": 5 },
          { "num": 6 },
          { "num": 9 },
          { "num": 10 },
          { "num": 12 },
          { "num": 15 },
          { "num": 18 },
          { "num": 20 },
          { "num": 21 },
          { "num": 25 },
          { "num": 24 },
          { "num": 27 },
          { "num": 30 }
        ]
      }
    }
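
The set of matching numbers can be reproduced without a running Dgraph instance. The Go sketch below reuses the factoring loop of the tokenizer, but returns the factors as integers instead of encoded tokens. The primeFactors, sharesFactor and matches functions are hypothetical helpers written for this check.

```go
package main

import "fmt"

// primeFactors returns the distinct prime factors of x. It expects x > 1.
func primeFactors(x int64) []int64 {
	var fs []int64
	for p := int64(2); x > 1; p++ {
		if x%p == 0 {
			fs = append(fs, p)
			for x%p == 0 {
				x /= p
			}
		}
	}
	return fs
}

// sharesFactor reports whether a and b have a prime factor in common,
// which is the condition that anyof tests through the index.
func sharesFactor(a, b int64) bool {
	for _, p := range primeFactors(a) {
		for _, q := range primeFactors(b) {
			if p == q {
				return true
			}
		}
	}
	return false
}

// matches returns every n in [lo, hi] that shares a prime factor with query.
func matches(query, lo, hi int64) []int64 {
	var out []int64
	for n := lo; n <= hi; n++ {
		if sharesFactor(query, n) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	fmt.Println(primeFactors(15)) // prints [3 5]
	fmt.Println(matches(15, 2, 30))
	// prints [3 5 6 9 10 12 15 18 20 21 24 25 27 30]
}
```

The sketch lists the 14 matches in ascending order, so the order differs slightly from the query result above.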