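In a relational database such as MySQL, tables are related to one another.

Elasticsearch provides something similar to a relational Join through the join data type. It models a Parent / Child relationship and keeps the two objects separate.
The parent document and the child document are two independent documents.
Updating a parent document does not require reindexing its child documents. Adding, changing, or deleting a child document does not affect the parent or the other children.
Key point: the join field stores metadata that maps each child to its parent, which keeps queries fast, but it comes with one restriction: a parent and its children must live on the same shard.
Because the related documents and the metadata linking them sit on one shard, a parent/child search never has to cross shards; each shard resolves the join locally.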

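Parent/Child Relationships

Defining a parent/child relationship takes a few steps:

  • Set the index mapping
  • Index the parent documents
  • Index the child documents
  • Query the documents as needed

Setting the Mapping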

    # "blog" is the parent relation name and "comment" is its child
    PUT my_blogs
    {
      "mappings": {
        "properties": {
          "blog_comments_relation": {
            "type": "join",
            "relations": {
              "blog": "comment"
            }
          },
          "content": {
            "type": "text"
          },
          "title": {
            "type": "keyword"
          }
        }
      }
    }

Inserting Data

Index the parent documents

    # The "name" value must be a relation name defined in the mapping (here, blog)
    PUT my_blogs/_doc/blog1
    {
      "title": "Learning Elasticsearch",
      "content": "learning ELK is happy",
      "blog_comments_relation": {
        "name": "blog"
      }
    }

    PUT my_blogs/_doc/blog2
    {
      "title": "Learning Hadoop",
      "content": "learning Hadoop",
      "blog_comments_relation": {
        "name": "blog"
      }
    }

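Index the child documents

  • A parent document and its child documents must be on the same shard
  • This keeps join queries fast
  • When indexing a child document, you must specify its parent document ID
  • Use the routing parameter to make sure the child is assigned to the same shard as its parent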

    # routing is the parent document ID, so each comment lands on its blog's shard
    PUT my_blogs/_doc/comment1?routing=blog1
    {
      "comment": "I am learning ELK",
      "username": "Jack",
      "blog_comments_relation": {
        "name": "comment",
        "parent": "blog1"
      }
    }

    PUT my_blogs/_doc/comment2?routing=blog2
    {
      "comment": "I like Hadoop!!!!!",
      "username": "Jack",
      "blog_comments_relation": {
        "name": "comment",
        "parent": "blog2"
      }
    }

    PUT my_blogs/_doc/comment3?routing=blog2
    {
      "comment": "Hello Hadoop",
      "username": "Bob",
      "blog_comments_relation": {
        "name": "comment",
        "parent": "blog2"
      }
    }
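
The two rules for child documents (name the parent in the join field, and route by the parent ID) can be sketched as a small Python helper that builds the request shown above. This is only an illustration: child_request, RELATIONS, and JOIN_FIELD are hypothetical names, and the helper builds the request without sending anything to a cluster.

```python
import json

RELATIONS = {"blog": "comment"}  # parent name -> child name, copied from the mapping
JOIN_FIELD = "blog_comments_relation"


def child_request(index, doc_id, parent_id, source, child_type="comment"):
    """Build the method, path, and body for indexing one child document."""
    if child_type not in RELATIONS.values():
        raise ValueError(f"unknown child relation: {child_type}")
    body = dict(source)
    # The join field names the relation and points at the parent document.
    body[JOIN_FIELD] = {"name": child_type, "parent": parent_id}
    # Routing by the parent ID sends the child to the parent's shard.
    path = f"{index}/_doc/{doc_id}?routing={parent_id}"
    return "PUT", path, body


method, path, body = child_request(
    "my_blogs", "comment1", "blog1",
    {"comment": "I am learning ELK", "username": "Jack"},
)
print(method, path)
print(json.dumps(body, indent=2))
```

Deriving both the parent value and the routing value from the same parent_id argument makes it impossible for the two to drift apart.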

Querying

Queries supported by Parent / Child

  • Query all documents
  • Parent Id query
  • Has Child query
  • Has Parent query

Query all documents

    POST my_blogs/_search
    {}

Result

    {
      "took" : 611,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 5,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "my_blogs",
            "_type" : "_doc",
            "_id" : "blog1",
            "_score" : 1.0,
            "_source" : {
              "title" : "Learning Elasticsearch",
              "content" : "learning ELK is happy",
              "blog_comments_relation" : {
                "name" : "blog"
              }
            }
          },
          {
            "_index" : "my_blogs",
            "_type" : "_doc",
            "_id" : "blog2",
            "_score" : 1.0,
            "_source" : {
              "title" : "Learning Hadoop",
              "content" : "learning Hadoop",
              "blog_comments_relation" : {
                "name" : "blog"
              }
            }
          },
          {
            "_index" : "my_blogs",
            "_type" : "_doc",
            "_id" : "comment1",
            "_score" : 1.0,
            "_routing" : "blog1",
            "_source" : {
              "comment" : "I am learning ELK",
              "username" : "Jack",
              "blog_comments_relation" : {
                "name" : "comment",
                "parent" : "blog1"
              }
            }
          },
          {
            "_index" : "my_blogs",
            "_type" : "_doc",
            "_id" : "comment2",
            "_score" : 1.0,
            "_routing" : "blog2",
            "_source" : {
              "comment" : "I like Hadoop!!!!!",
              "username" : "Jack",
              "blog_comments_relation" : {
                "name" : "comment",
                "parent" : "blog2"
              }
            }
          },
          {
            "_index" : "my_blogs",
            "_type" : "_doc",
            "_id" : "comment3",
            "_score" : 1.0,
            "_routing" : "blog2",
            "_source" : {
              "comment" : "Hello Hadoop",
              "username" : "Bob",
              "blog_comments_relation" : {
                "name" : "comment",
                "parent" : "blog2"
              }
            }
          }
        ]
      }
    }

Get a parent document by its ID

    GET my_blogs/_doc/blog2

Result

    {
      "_index" : "my_blogs",
      "_type" : "_doc",
      "_id" : "blog2",
      "_version" : 1,
      "_seq_no" : 1,
      "_primary_term" : 1,
      "found" : true,
      "_source" : {
        "title" : "Learning Hadoop",
        "content" : "learning Hadoop",
        "blog_comments_relation" : {
          "name" : "blog"
        }
      }
    }

Query comments by Parent Id

    # "type" is the child relation name; "id" is the parent document ID
    POST my_blogs/_search
    {
      "query": {
        "parent_id": {
          "type": "comment",
          "id": "blog2"
        }
      }
    }

Result (both hits have blog2 as their parent)

    {
      "took" : 4,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 2,
          "relation" : "eq"
        },
        "max_score" : 0.53899646,
        "hits" : [
          {
            "_index" : "my_blogs",
            "_type" : "_doc",
            "_id" : "comment2",
            "_score" : 0.53899646,
            "_routing" : "blog2",
            "_source" : {
              "comment" : "I like Hadoop!!!!!",
              "username" : "Jack",
              "blog_comments_relation" : {
                "name" : "comment",
                "parent" : "blog2"
              }
            }
          },
          {
            "_index" : "my_blogs",
            "_type" : "_doc",
            "_id" : "comment3",
            "_score" : 0.53899646,
            "_routing" : "blog2",
            "_source" : {
              "comment" : "Hello Hadoop",
              "username" : "Bob",
              "blog_comments_relation" : {
                "name" : "comment",
                "parent" : "blog2"
              }
            }
          }
        ]
      }
    }
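
What the parent_id query selects can be modeled in a few lines of Python over the five sample documents. This is an in-memory sketch of the matching rule only, not of how Elasticsearch executes or scores the query; DOCS and parent_id_query are hypothetical names.

```python
# (document ID, relation name, parent ID); parent documents have no parent ID
DOCS = [
    ("blog1", "blog", None),
    ("blog2", "blog", None),
    ("comment1", "comment", "blog1"),
    ("comment2", "comment", "blog2"),
    ("comment3", "comment", "blog2"),
]


def parent_id_query(docs, child_type, parent_id):
    """Return IDs of child_type documents whose parent is parent_id."""
    return [
        doc_id
        for doc_id, name, parent in docs
        if name == child_type and parent == parent_id
    ]


print(parent_id_query(DOCS, "comment", "blog2"))  # ['comment2', 'comment3']
```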

Has Child query: returns parent documents

Find the blogs that have a comment whose username matches Jack

    POST my_blogs/_search
    {
      "query": {
        "has_child": {
          "type": "comment",
          "query": {
            "match": {
              "username": "Jack"
            }
          }
        }
      }
    }

Result

    {
      "took" : 19,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 2,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "my_blogs",
            "_type" : "_doc",
            "_id" : "blog1",
            "_score" : 1.0,
            "_source" : {
              "title" : "Learning Elasticsearch",
              "content" : "learning ELK is happy",
              "blog_comments_relation" : {
                "name" : "blog"
              }
            }
          },
          {
            "_index" : "my_blogs",
            "_type" : "_doc",
            "_id" : "blog2",
            "_score" : 1.0,
            "_source" : {
              "title" : "Learning Hadoop",
              "content" : "learning Hadoop",
              "blog_comments_relation" : {
                "name" : "blog"
              }
            }
          }
        ]
      }
    }
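
The has_child result above can be reproduced with a small in-memory Python model: filter the children first, then return each matching parent once. It only mirrors the selection logic under the sample data; BLOGS, COMMENTS, and has_child are hypothetical names, and the case-insensitive comparison is a simplification of how a match query on an analyzed text field behaves.

```python
# Titles of the parent documents, keyed by blog ID
BLOGS = {"blog1": "Learning Elasticsearch", "blog2": "Learning Hadoop"}
# (comment ID, parent blog ID, username)
COMMENTS = [
    ("comment1", "blog1", "Jack"),
    ("comment2", "blog2", "Jack"),
    ("comment3", "blog2", "Bob"),
]


def has_child(blogs, comments, child_matches):
    """Return IDs of blogs with at least one comment accepted by child_matches."""
    parents = {parent for _, parent, username in comments if child_matches(username)}
    return [blog_id for blog_id in blogs if blog_id in parents]


jack_blogs = has_child(BLOGS, COMMENTS, lambda name: name.lower() == "jack")
bob_blogs = has_child(BLOGS, COMMENTS, lambda name: name.lower() == "bob")
print(jack_blogs)  # ['blog1', 'blog2']
print(bob_blogs)   # ['blog2']
```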

Has Parent query: returns the related child documents

Search by blog title; the parent type is blog.

The parent documents are blogs, and the condition is that the title is "Learning Hadoop". Because title is mapped as keyword, the match is exact.

    POST my_blogs/_search
    {
      "query": {
        "has_parent": {
          "parent_type": "blog",
          "query": {
            "match": {
              "title": "Learning Hadoop"
            }
          }
        }
      }
    }

Result

    {
      "took" : 17,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 2,
          "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
          {
            "_index" : "my_blogs",
            "_type" : "_doc",
            "_id" : "comment2",
            "_score" : 1.0,
            "_routing" : "blog2",
            "_source" : {
              "comment" : "I like Hadoop!!!!!",
              "username" : "Jack",
              "blog_comments_relation" : {
                "name" : "comment",
                "parent" : "blog2"
              }
            }
          },
          {
            "_index" : "my_blogs",
            "_type" : "_doc",
            "_id" : "comment3",
            "_score" : 1.0,
            "_routing" : "blog2",
            "_source" : {
              "comment" : "Hello Hadoop",
              "username" : "Bob",
              "blog_comments_relation" : {
                "name" : "comment",
                "parent" : "blog2"
              }
            }
          }
        ]
      }
    }
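
The has_parent query runs in the opposite direction: filter the parents first, then return their children. The Python sketch below models that selection over the sample data, assuming an exact title comparison to mirror the keyword mapping; BLOGS, COMMENTS, and has_parent are hypothetical names, and nothing here reflects Elasticsearch internals.

```python
# Titles of the parent documents, keyed by blog ID
BLOGS = {"blog1": "Learning Elasticsearch", "blog2": "Learning Hadoop"}
# (comment ID, parent blog ID, username)
COMMENTS = [
    ("comment1", "blog1", "Jack"),
    ("comment2", "blog2", "Jack"),
    ("comment3", "blog2", "Bob"),
]


def has_parent(blogs, comments, parent_matches):
    """Return IDs of comments whose parent blog is accepted by parent_matches."""
    matching = {blog_id for blog_id, title in blogs.items() if parent_matches(title)}
    return [comment_id for comment_id, parent, _ in comments if parent in matching]


hadoop_comments = has_parent(BLOGS, COMMENTS, lambda title: title == "Learning Hadoop")
print(hadoop_comments)  # ['comment2', 'comment3']
```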

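Using the has_child query

  • Returns parent documents
  • Queries the child documents and returns the parents of the children that match
  • Parent and child documents are on the same shard, so the join is efficient

Using the has_parent query

  • Returns child documents
  • Queries the parent documents and returns the children of the parents that match

Using the parent_id query

  • Returns child documents
  • Queries by parent document ID and returns all children of that parent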

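Accessing a child document

  • The parent document's ID must be passed as the routing parameter

    # Get a child document by ID alone; the lookup is routed by the child's own ID, so it can miss the document
    GET my_blogs/_doc/comment2
    # Get a child document by ID and routing
    GET my_blogs/_doc/comment3?routing=blog2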
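
Why a lookup without routing can miss a child document is easier to see in a toy model. The Python sketch below assumes a get only consults the shard derived from the routing value (or from the document ID when no routing is given), with zlib.crc32 standing in for the real hash; NUM_SHARDS, shards, index_doc, and get_doc are hypothetical names.

```python
import zlib

NUM_SHARDS = 3  # hypothetical number of primary shards
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(routing):
    """Map a routing value to a shard (crc32 is only a stand-in hash)."""
    return zlib.crc32(routing.encode("utf-8")) % NUM_SHARDS


def index_doc(doc_id, source, routing=None):
    """Store the document on the shard chosen by routing, else by its own ID."""
    shards[shard_for(routing or doc_id)][doc_id] = source


def get_doc(doc_id, routing=None):
    """Look only at the one shard the routing value (or the ID) points to."""
    return shards[shard_for(routing or doc_id)].get(doc_id)


index_doc("blog2", {"title": "Learning Hadoop"})
index_doc("comment3", {"comment": "Hello Hadoop"}, routing="blog2")

with_routing = get_doc("comment3", routing="blog2")  # always found
without_routing = get_doc("comment3")  # found only if both IDs hash to the same shard
print(with_routing, without_routing)
```

With a single shard, as in the sample results in this article, both lookups succeed, which is why the problem only shows up on multi-shard indices.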

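Updating a child document

  • Updating a child document does not affect the parent document
  • PUT replaces the whole child document, so a field left out of the body (username here) is dropped

    PUT my_blogs/_doc/comment3?routing=blog2
    {
      "comment": "Hello Hadoop??",
      "blog_comments_relation": {
        "name": "comment",
        "parent": "blog2"
      }
    }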

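Nested Object vs. Parent / Child

Nested Object

  • Pros: documents are stored together, so reads are fast
  • Cons: updating a nested child means updating the whole document
  • Use when: child documents are updated only occasionally and the workload is mostly queries

Parent / Child

  • Pros: parent and child documents can be updated independently
  • Cons: extra memory is needed to maintain the relationship, and reads are comparatively slow
  • Use when: child documents are updated frequently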