比如说MySQL表与表之间会有关联关系,

ES 提供了类似关系型数据库中 Join 的实现。使用 Join 数据类型实现，可以通过 Parent / Child 的关系，从而分离两个对象
父文档和子文档是两个独立的文档
更新父文档无需重新索引整个子文档。子文档被新增，更改和删除也不会影响到父文档和其他子文档。
要点：父子关系元数据映射，用于确保查询时候的高性能，但是有一个限制，就是父子数据必须存在于一个shard中
父子关系数据存在一个shard中，而且还有映射其关联关系的元数据，那么搜索父子关系数据的时候，不用跨分片，一个分片本地自己就搞定了，性能当然高

父子关系

定义父子关系的几个步骤
设置索引的 Mapping
索引父文档
索引子文档
按需查询文档

设置 Mapping

PUT my_blogs
{
  "mappings": {
    "properties": {
      "blog_comments_relation": {
        "type": "join",
        "relations": {
          "blog": "comment"
        }
      },
      "content": {
        "type": "text"
      },
      "title": {
        "type": "keyword"
      }
    }
  }
}

插入数据

往父索引里面插入数据

PUT my_blogs/_doc/blog1
{
  "title": "Learning Elasticsearch",
  "content": "learning ELK is happy",
  "blog_comments_relation": {
    "name": "blog"  // 这个名字不能随便写
  }
}
PUT my_blogs/_doc/blog2
{
  "title": "Learning Hadoop",
  "content": "learning Hadoop",
  "blog_comments_relation": {
    "name": "blog"
  }
}

往子索引库添加信息

父文档和子文档必须存在相同的分片上
确保查询 join 的性能
当指定文档时候，必须指定它的父文档 ID
使用 route 参数来保证，分配到相同的分片

PUT my_blogs/_doc/comment1?routing=blog1
{
  "comment": "I am learning ELK",
  "username": "Jack",
  "blog_comments_relation": {
    "name": "comment",
    "parent": "blog1"
  }
}
PUT my_blogs/_doc/comment2?routing=blog2
{
  "comment": "I like Hadoop!!!!!",
  "username": "Jack",
  "blog_comments_relation": {
    "name": "comment",
    "parent": "blog2"
  }
}
PUT my_blogs/_doc/comment3?routing=blog2
{
  "comment": "Hello Hadoop",
  "username": "Bob",
  "blog_comments_relation": {
    "name": "comment",
    "parent": "blog2"
  }
}

查询

Parent / Child 所支持的查询

查询所有文档
Parent Id 查询
Has Child 查询
Has Parent 查询

查询所有文档

POST my_blogs/_search
{}

结果

{
  "took" : 611,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_blogs",
        "_type" : "_doc",
        "_id" : "blog1",
        "_score" : 1.0,
        "_source" : {
          "title" : "Learning Elasticsearch",
          "content" : "learning ELK is happy",
          "blog_comments_relation" : {
            "name" : "blog"
          }
        }
      },
      {
        "_index" : "my_blogs",
        "_type" : "_doc",
        "_id" : "blog2",
        "_score" : 1.0,
        "_source" : {
          "title" : "Learning Hadoop",
          "content" : "learning Hadoop",
          "blog_comments_relation" : {
            "name" : "blog"
          }
        }
      },
      {
        "_index" : "my_blogs",
        "_type" : "_doc",
        "_id" : "comment1",
        "_score" : 1.0,
        "_routing" : "blog1",
        "_source" : {
          "comment" : "I am learning ELK",
          "username" : "Jack",
          "blog_comments_relation" : {
            "name" : "comment",
            "parent" : "blog1"
          }
        }
      },
      {
        "_index" : "my_blogs",
        "_type" : "_doc",
        "_id" : "comment2",
        "_score" : 1.0,
        "_routing" : "blog2",
        "_source" : {
          "comment" : "I like Hadoop!!!!!",
          "username" : "Jack",
          "blog_comments_relation" : {
            "name" : "comment",
            "parent" : "blog2"
          }
        }
      },
      {
        "_index" : "my_blogs",
        "_type" : "_doc",
        "_id" : "comment3",
        "_score" : 1.0,
        "_routing" : "blog2",
        "_source" : {
          "comment" : "Hello Hadoop",
          "username" : "Bob",
          "blog_comments_relation" : {
            "name" : "comment",
            "parent" : "blog2"
          }
        }
      }
    ]
  }
}

根据父文档ID查看

GET my_blogs/_doc/blog2

结果

{
  "_index" : "my_blogs",
  "_type" : "_doc",
  "_id" : "blog2",
  "_version" : 1,
  "_seq_no" : 1,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "title" : "Learning Hadoop",
    "content" : "learning Hadoop",
    "blog_comments_relation" : {
      "name" : "blog"
    }
  }
}

根据Parent Id 查询评论信息

POST my_blogs/_search
{
  "query": {
    "parent_id": {
      "type": "comment",
      "id": "blog2"
    }
  }
}

结果

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.53899646,
    "hits" : [
      {
        "_index" : "my_blogs",
        "_type" : "_doc",
        "_id" : "comment2",
        "_score" : 0.53899646,
        "_routing" : "blog2",
        "_source" : {
          "comment" : "I like Hadoop!!!!!",
          "username" : "Jack",
          "blog_comments_relation" : {
            "name" : "comment",
            "parent" : "blog2" //父id
          }
        }
      },
      {
        "_index" : "my_blogs",
        "_type" : "_doc",
        "_id" : "comment3",
        "_score" : 0.53899646,
        "_routing" : "blog2",
        "_source" : {
          "comment" : "Hello Hadoop",
          "username" : "Bob",
          "blog_comments_relation" : {
            "name" : "comment",
            "parent" : "blog2"  //父id
          } 
        }
      }
    ]
  }
}

Has Child 查询,返回父文档

查找 username 包含 jack 的

POST my_blogs/_search
{
  "query": {
    "has_child": {
      "type": "comment",
      "query": {
        "match": {
          "username": "Jack"
        }
      }
    }
  }
}

结果

{
  "took" : 19,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_blogs",
        "_type" : "_doc",
        "_id" : "blog1",
        "_score" : 1.0,
        "_source" : {
          "title" : "Learning Elasticsearch",
          "content" : "learning ELK is happy",
          "blog_comments_relation" : {
            "name" : "blog"
          }
        }
      },
      {
        "_index" : "my_blogs",
        "_type" : "_doc",
        "_id" : "blog2",
        "_score" : 1.0,
        "_source" : {
          "title" : "Learning Hadoop",
          "content" : "learning Hadoop",
          "blog_comments_relation" : {
            "name" : "blog"
          }
        }
      }
    ]
  }
}

Has Parent 查询，返回相关的子文档

根据博客的标题来找, 父类型是 blog

父文档是博客,我的查询条件是标题是 “Learning Hadoop”

POST my_blogs/_search
{
  "query": {
    "has_parent": {
      "parent_type": "blog",
      "query": {
        "match": {
          "title": "Learning Hadoop"
        }
      }
    }
  }
}

结果

{
  "took" : 17,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_blogs",
        "_type" : "_doc",
        "_id" : "comment2",
        "_score" : 1.0,
        "_routing" : "blog2",
        "_source" : {
          "comment" : "I like Hadoop!!!!!",
          "username" : "Jack",
          "blog_comments_relation" : {
            "name" : "comment",
            "parent" : "blog2"
          }
        }
      },
      {
        "_index" : "my_blogs",
        "_type" : "_doc",
        "_id" : "comment3",
        "_score" : 1.0,
        "_routing" : "blog2",
        "_source" : {
          "comment" : "Hello Hadoop",
          "username" : "Bob",
          "blog_comments_relation" : {
            "name" : "comment",
            "parent" : "blog2"
          }
        }
      }
    ]
  }
}

使用 has_child 查询

返回父文档
通过对子文档进行查询

返回具体相关子文档的父文档
父子文档在相同的分片上，因此 Join 效率高

使用 has_parent 查询

返回相关性的子文档
通过对父文档进行查询
- 返回相关的子文档

使用 parent_id 查询

返回所有相关子文档
通过对付文档 Id 进行查询
- 返回所有相关的子文档

访问子文档

需指定父文档 routing 参数

#通过ID ，访问子文档
GET my_blogs/_doc/comment2
#通过ID和routing ，访问子文档
GET my_blogs/_doc/comment3?routing=blog2

更新子文档

更新子文档不会影响到父文档

PUT my_blogs/_doc/comment3?routing=blog2
{
"comment": "Hello Hadoop??",
"blog_comments_relation": {
"name": "comment",
"parent": "blog2"
}
}

嵌套对象 v.s 父子文档

Nested Object Parent / Child
优点：文档存储在一起，读取性能高、父子文档可以独立更新
缺点：更新嵌套的子文档时，需要更新整个文档、需要额外的内存去维护关系。读取性能相对差
适用场景子文档偶尔更新，以查询为主、子文档更新频繁

搜索引擎

父子关系数据建模