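Relationships
The following four approaches are commonly used to manage relational data in Elasticsearch:
- With the join type, parent and child documents can be updated independently, which suits cases where child documents change frequently;
- The join type has to maintain the join relationship, and has_child and has_parent queries perform poorly and add significantly to query time;
- nested
join (parent-child documents)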
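Notes
The only case where the join type makes sense is data with a one-to-many relationship in which one entity significantly outnumbers the other; the relationship can span multiple levels.
Child documents are stored in the same index as their parents, so an ordinary query returns all documents, parents and children alike.
- Parent and child documents must be indexed on the same shard, which is done with the routing parameter;
- Sorting child documents by a parent field is difficult: it is only supported when the parent field is numeric;
- Use parent-child relationships sparingly, and only when child documents far outnumber parent documents;
- Avoid multiple parent-child join clauses in a single query;
- In has_child queries, use filter context or set score_mode to none so that document scores are not computed;
- Keep parent document IDs as short as possible so that they compress better in doc values and use less memory when loaded transiently.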
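As a minimal sketch of the has_child advice above, the following builds a request body that places has_child in the filter clause of a bool query and also sets score_mode to none. It uses only Python's standard library and never contacts a cluster. The helper name has_child_filter is hypothetical, and the version type and comments field are borrowed from the example index used in this article.

```python
import json


def has_child_filter(child_type, child_query):
    """Wrap a has_child clause in bool/filter so no scores are computed.

    Hypothetical helper for illustration; it only builds a dict.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {
                        "has_child": {
                            "type": child_type,
                            "score_mode": "none",  # do not aggregate child scores
                            "query": child_query,
                        }
                    }
                ]
            }
        }
    }


body = has_child_filter("version", {"term": {"comments": 200}})
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of GET my_test/_search.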
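Using multiple generations of documents
- The more joins, the worse the performance;
- Each generation of parent documents must keep its string _id field in memory, which consumes a lot of memory.
Example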
Field mapping
The connection field is of type join; its relations entry declares article as the parent and version as the child.

    PUT my_test
    {
      "mappings": {
        "_doc": {
          "dynamic": "strict",
          "properties": {
            "id": {"type": "keyword"},
            "ver": {"type": "keyword"},
            "title": {"type": "keyword"},
            "text": {"type": "keyword"},
            "reads": {"type": "integer"},
            "comments": {"type": "integer"},
            "article_created_at": {"type": "date", "format": "epoch_second"},
            "connection": {
              "type": "join",
              "relations": {
                "article": "version"
              }
            }
          }
        }
      }
    }

Test data
Parent documents

    PUT my_test/_doc/1?routing=1&refresh
    {
      "text": "article a1",
      "connection": {"name": "article"},
      "id": 1,
      "article_created_at": 1583731094,
      "reads": 132
    }

    PUT my_test/_doc/2?routing=1&refresh
    {
      "text": "article a2",
      "connection": {"name": "article"},
      "id": 2,
      "article_created_at": 1583731095,
      "reads": 89
    }

    PUT my_test/_doc/3?routing=1&refresh
    {
      "text": "article a3",
      "connection": {"name": "article"},
      "id": 3,
      "article_created_at": 1583731096,
      "reads": 32
    }

Child documents
Add a version to the article whose parent document ID is 1:

    PUT my_test/_doc/11?routing=1&refresh
    {
      "text": "ver v1 belong to a1",
      "connection": {"name": "version", "parent": "1"},
      "ver": 111,
      "comments": 100
    }

Add another version to the article whose parent document ID is 1:

    PUT my_test/_doc/12?routing=1&refresh
    {
      "text": "ver v2 belong to a1",
      "connection": {"name": "version", "parent": "1"},
      "ver": 222,
      "comments": 200
    }

Add a version to the article whose parent document ID is 2:

    PUT my_test/_doc/13?routing=1&refresh
    {
      "text": "ver vv1 belong to a2",
      "connection": {"name": "version", "parent": "2"},
      "ver": 1111,
      "comments": 200
    }

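Queries
Query the child documents of a given parent

    GET my_test/_search
    {
      "query": {
        "parent_id": {
          "type": "version",
          "id": "1"
        }
      }
    }

This returns the matching child documents only; the parent document is not included.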
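Sort by the parent document's read count via scoring (returns child documents)
Setting score to true enables scoring on the parent document.

    GET my_test/_search
    {
      "query": {
        "has_parent": {
          "parent_type": "article",
          "score": true,
          "query": {
            "function_score": {
              "script_score": {
                "script": "_score * doc['reads'].value"
              }
            }
          }
        }
      }
    }
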
Query for child documents whose comments equals 200 (returns parent documents)

    GET my_test/_search
    {
      "query": {
        "has_child": {
          "type": "version",
          "query": {
            "term": {
              "comments": 200
            }
          }
        }
      }
    }

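To make the expected results of the parent_id and has_child queries concrete, here is a small in-memory sketch in Python. It does not talk to Elasticsearch; it only mimics, under simplifying assumptions, which documents those two queries select from the test data in this article. The DOCS table and both function names are hypothetical.

```python
# Test documents from this article, reduced to the fields the queries touch.
# Key: document _id; "parent" is None for article (parent) documents.
DOCS = {
    "1": {"name": "article", "parent": None, "reads": 132},
    "2": {"name": "article", "parent": None, "reads": 89},
    "3": {"name": "article", "parent": None, "reads": 32},
    "11": {"name": "version", "parent": "1", "comments": 100},
    "12": {"name": "version", "parent": "1", "comments": 200},
    "13": {"name": "version", "parent": "2", "comments": 200},
}


def parent_id(child_type, pid):
    """Mimic parent_id: children of the given type whose parent is pid."""
    return sorted(
        doc_id for doc_id, d in DOCS.items()
        if d["name"] == child_type and d["parent"] == pid
    )


def has_child(child_type, predicate):
    """Mimic has_child: parents with at least one matching child."""
    return sorted({
        d["parent"] for d in DOCS.values()
        if d["name"] == child_type and predicate(d)
    })


print(parent_id("version", "1"))                             # ['11', '12']
print(has_child("version", lambda d: d["comments"] == 200))  # ['1', '2']
```

Article 3 has no versions, so it is never returned by the has_child sketch.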
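Using the nested type
Elasticsearch flattens the objects in an array when it stores and indexes them, so without the nested type the match logic gets mixed up across objects. Example:

    {
      "questions": [
        {"title": "aaa", "comments": 10},
        {"title": "bbb", "comments": 13}
      ]
    }

Without the nested type, a query for title = 'aaa' and comments = 13 matches this document.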
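The mismatch can be reproduced without a cluster. The sketch below, in plain Python, imitates the flattening described above: each key of the array's objects becomes one multi-valued field, so the link between a title and its comments count is lost. The flatten, match_flattened and match_nested helpers are hypothetical stand-ins for illustration, not Elasticsearch APIs.

```python
doc = {
    "questions": [
        {"title": "aaa", "comments": 10},
        {"title": "bbb", "comments": 13},
    ]
}


def flatten(document, path):
    """Collapse an array of objects into one multi-valued field per key."""
    flat = {}
    for obj in document[path]:
        for key, value in obj.items():
            flat.setdefault(f"{path}.{key}", []).append(value)
    return flat


def match_flattened(document, path, conditions):
    """Object-array semantics: each condition may hit a different object."""
    flat = flatten(document, path)
    return all(v in flat.get(f"{path}.{k}", []) for k, v in conditions.items())


def match_nested(document, path, conditions):
    """Nested semantics: one object must satisfy every condition."""
    return any(
        all(obj.get(k) == v for k, v in conditions.items())
        for obj in document[path]
    )


cond = {"title": "aaa", "comments": 13}
print(flatten(doc, "questions"))
# {'questions.title': ['aaa', 'bbb'], 'questions.comments': [10, 13]}
print(match_flattened(doc, "questions", cond))  # True  (false positive)
print(match_nested(doc, "questions", cond))     # False
```

A condition pair that really belongs to one object, such as title 'bbb' with comments 13, matches under both semantics.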
Notes
- Child documents are stored in the same document as their parent;
Example
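A nested query only works if questions is mapped as nested before any document is indexed. The article does not show that mapping, so the one below is an assumption. This Python sketch just builds and prints a request body for PUT my_test2 in the 6.x style with a _doc type, leaving the inner fields to dynamic mapping.

```python
import json

# Assumed mapping (not given in the article): mark "questions" as nested so
# each array element is indexed as its own hidden document.
nested_mapping = {
    "mappings": {
        "_doc": {
            "properties": {
                "questions": {"type": "nested"}
            }
        }
    }
}

print("PUT my_test2")
print(json.dumps(nested_mapping, indent=2))
```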

    PUT my_test2/_doc/1
    {
      "title": "what kind of animals",
      "text": "animals",
      "questions": [
        {"content": "dog", "created_at": 1583742663},
        {"content": "cat", "created_at": 1583742664},
        {"content": "cock", "created_at": 1583742665},
        {"content": "bird", "created_at": 1583742666}
      ]
    }

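- Query for articles that have a child document whose content is XX and whose creation time is AA (a single child document must satisfy all the conditions at once)

    GET my_test2/_search
    {
      "query": {
        "nested": {
          "path": "questions",
          "query": {
            "bool": {
              "must": [
                {"term": {"questions.content": {"value": "dog"}}},
                {"term": {"questions.created_at": {"value": "1583742663"}}}
              ]
            }
          }
        }
      }
    }

Reference: Elasticsearch 6.3 official documentation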
