I. Field meanings

Running the command GET /_search returns a response that includes the following information:

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 6,
    "successful": 6,
    "failed": 0
  },
  "hits": {
    "total": 10,
    "max_score": 1,
    "hits": [
      {
        "_index": ".kibana",
        "_type": "config",
        "_id": "5.2.0",
        "_score": 1,
        "_source": {
          "buildNum": 14695
        }
      }
    ]
  }
}

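took: how many milliseconds the whole search request took
hits.total: the total number of documents that matched this search
hits.max_score: the highest relevance score among all results of this search. Each document gets a _score for its relevance to the search; the more relevant the document, the higher its _score and the higher it ranks
hits.hits: the full data of the matching documents, by default the first 10, sorted by _score in descending order
_shards: a shard counts as failed only when its primary and all of its replicas are down, and a failed shard does not affect the other shards. By default, a search request is sent to every primary shard of an index; since each primary shard may have one or more replica shards, the request for a given shard can also be served by one of its replica shards instead of the primary.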
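
As a rough illustration of reading these fields in client code, the sketch below parses the sample response shown above with Python's standard json module. The helper name summarize_search_response and the keys of the summary it returns are made up for this example; only the response field names come from Elasticsearch. It assumes the numeric hits.total seen in the sample response.

```python
import json

# Sample response copied from the example above.
RESPONSE_TEXT = """
{
  "took": 6,
  "timed_out": false,
  "_shards": {"total": 6, "successful": 6, "failed": 0},
  "hits": {
    "total": 10,
    "max_score": 1,
    "hits": [
      {"_index": ".kibana", "_type": "config", "_id": "5.2.0",
       "_score": 1, "_source": {"buildNum": 14695}}
    ]
  }
}
"""


def summarize_search_response(response):
    """Pull out the fields described above (hypothetical helper)."""
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "failed_shards": response["_shards"]["failed"],
        # Assumes hits.total is a plain number, as in the sample response.
        "total_matches": hits["total"],
        "max_score": hits["max_score"],
        "returned_ids": [hit["_id"] for hit in hits["hits"]],
    }


summary = summarize_search_response(json.loads(RESPONSE_TEXT))
print(summary)
```

Note that total_matches (10) is larger than the number of entries in returned_ids (1 in this trimmed sample): hits.total counts every match, while hits.hits only carries the page that was returned.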

II. timeout

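timeout: there is no timeout by default. When a timeout is specified, each shard searches only within that time limit and returns whatever partial results (possibly all of them) it has found by then directly to the client, instead of waiting until everything has been searched. The aim is for a request to come back within the specified time; when the limit is hit, timed_out in the response is true.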
GET /_search?timeout=10m

III. multi-index and multi-type

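/_search: search all data in all indices and all types
/index1/_search: search one index, across all of its types
/index1,index2/_search: search two indices at the same time
/*1,*2/_search: match multiple indices with wildcards
/index1/type1/_search: search one type within one index
/index1/type1,type2/_search: search several types within one index
/index1,index2/type1,type2/_search: search several types across several indices
/_all/type1,type2/_search: _all stands for all indices, so this searches the given types across every index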
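
To make the pattern behind these paths concrete, here is a small sketch that assembles them from lists of index and type names. The function name search_path is invented for this example, and the type segment only makes sense for Elasticsearch versions that still support types, such as the one these notes were written against.

```python
def search_path(indices=None, types=None):
    """Build a _search path from index and type names (hypothetical helper)."""
    if types:
        # A type segment needs an index segment in front of it; _all means every index.
        index_part = ",".join(indices) if indices else "_all"
        return f"/{index_part}/{','.join(types)}/_search"
    if indices:
        return f"/{','.join(indices)}/_search"
    return "/_search"


print(search_path())                                            # /_search
print(search_path(["index1", "index2"]))                        # /index1,index2/_search
print(search_path(["index1", "index2"], ["type1", "type2"]))    # /index1,index2/type1,type2/_search
print(search_path(types=["type1", "type2"]))                    # /_all/type1,type2/_search
```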

IV. Paginated search

size and from

GET /_search?size=10
GET /_search?size=10&from=0
GET /_search?size=10&from=20

A hands-on pagination experiment
GET /test_index/test_type/_search
"hits": {
"total": 9,
"max_score": 1,
Suppose we split these 9 documents into 3 pages of 3 documents each and try out paginated search:
GET /test_index/test_type/_search?from=0&size=3
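
The relationship between a page number and from/size can be sketched as follows. The helper names page_params and page_url are made up for this example, and pages are assumed to be numbered from 1.

```python
def page_params(page, size):
    """Translate a 1-based page number into from/size values."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    return {"from": (page - 1) * size, "size": size}


def page_url(path, page, size):
    """Append from/size to a search path (hypothetical helper)."""
    params = page_params(page, size)
    return f"{path}?from={params['from']}&size={params['size']}"


# The 9 documents above, split into 3 pages of 3.
urls = [page_url("/test_index/test_type/_search", page, 3) for page in (1, 2, 3)]
for url in urls:
    print("GET " + url)
```

The three requests use from=0, from=3, and from=6, each with size=3.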

deep paging

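What is the deep paging problem, and why does it occur?
Put simply, it is a search that pages very deep. For example: there are 60000 documents in total, spread over 3 shards with 20000 documents each, and each page holds 10 documents.
To fetch page 1001, what we actually need are the documents ranked 10001~10010.
The request may first land on a node that holds no shard of this index; that node (the coordinating node) then forwards the search request to the nodes that do hold the shards.
Each shard does not return just 10 documents. It returns its top 10010, so the coordinating node has to sort 30030 documents by relevance and so on, and only then take the 10 documents ranked 10001~10010.
So deep paging consumes network bandwidth, memory, CPU, and so on.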
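
The cost described above can be reproduced with a toy simulation. This is only a sketch of the scatter-gather idea under simplifying assumptions: random numbers stand in for relevance scores, three in-memory lists stand in for shards, and the names shard_top and deep_page are invented here. It is not how Elasticsearch is implemented internally.

```python
import heapq
import random


def shard_top(scores, k):
    """Each shard returns its own top-k scores, highest first."""
    return heapq.nlargest(k, scores)


def deep_page(shards, from_, size):
    """Coordinating-node merge: gather from_+size hits per shard, sort, slice."""
    k = from_ + size
    candidates = []
    for scores in shards:
        candidates.extend(shard_top(scores, k))
    candidates.sort(reverse=True)
    return candidates[from_:from_ + size], len(candidates)


random.seed(0)
# 3 shards with 20000 documents each, as in the example above.
shards = [[random.random() for _ in range(20000)] for _ in range(3)]

page, gathered = deep_page(shards, from_=10000, size=10)
print(gathered)   # number of candidates the coordinating node had to sort
print(len(page))  # number of documents actually returned to the client
```

The coordinating node gathers 30030 candidates in order to return 10, which is where the bandwidth, memory, and CPU go.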

V. query string basic syntax

GET /test_index/test_type/_search?q=test_field:test
GET /test_index/test_type/_search?q=+test_field:test
GET /test_index/test_type/_search?q=-test_field:test

+: must contain
-: must not contain
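
When these requests are sent over plain HTTP rather than typed into a console, the q value usually needs URL encoding; in particular, a literal + in a query string is commonly decoded as a space, so sending it as %2B is the safer choice. The sketch below uses Python's standard urllib.parse.quote; the helper name query_string is made up for this example.

```python
from urllib.parse import quote


def query_string(field, term, prefix=""):
    """Build an encoded q= parameter; prefix is '', '+' or '-' (hypothetical helper)."""
    if prefix not in ("", "+", "-"):
        raise ValueError("prefix must be '', '+' or '-'")
    # Keep ':' readable; '+' is percent-encoded so it is not read as a space.
    return "q=" + quote(f"{prefix}{field}:{term}", safe=":")


plain = query_string("test_field", "test")
must = query_string("test_field", "test", "+")
must_not = query_string("test_field", "test", "-")
print(plain)
print(must)
print(must_not)
```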

_all

GET /test_index/test_type/_search?q=test

This searches all fields directly: a document is found if any one of its fields contains the given keyword.

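When we search like this, is a separate search really run against every field of the document? No.
ES has the _all metadata field. When we index a document that contains several fields, ES automatically concatenates the values of all those fields, as strings, into one long string, uses it as the value of the _all field, and indexes it.
Later, when a search does not specify a field, it searches the _all field by default, which contains the values of all fields.
For example: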

{
  "name": "jack",
  "age": 26,
  "email": "jack@sina.com",
  "address": "guangzhou"
}

"jack 26 jack@sina.com guangzhou" becomes the value of this document's _all field, which is then analyzed and added to the inverted index
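
The concatenation step can be pictured with a few lines of Python. This is a simplified model of the idea described above, not Elasticsearch's actual code; the function name build_all_field is invented for this example.

```python
def build_all_field(document):
    """Join every field value, as a string, into one space-separated string."""
    return " ".join(str(value) for value in document.values())


doc = {
    "name": "jack",
    "age": 26,
    "email": "jack@sina.com",
    "address": "guangzhou",
}

all_value = build_all_field(doc)
print(all_value)
```

A search without a field name is then run against this single combined value instead of against each field separately.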

This is not used in production environments.

VI. mapping

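A mapping is the data structure and related configuration that is created, automatically or manually, for a type in an index.

Insert a few documents and let ES create the index for us automatically: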

PUT /website/article/1
{
  "post_date": "2017-01-01",
  "title": "my first article",
  "content": "this is my first article in this website",
  "author_id": 11400
}
PUT /website/article/2
{
  "post_date": "2017-01-02",
  "title": "my second article",
  "content": "this is my second article in this website",
  "author_id": 11400
}
PUT /website/article/3
{
  "post_date": "2017-01-03",
  "title": "my third article",
  "content": "this is my third article in this website",
  "author_id": 11400
}

Try various searches:
GET /website/article/_search?q=2017 (3 results)
GET /website/article/_search?q=2017-01-01 (3 results)
GET /website/article/_search?q=post_date:2017-01-01 (1 result)
GET /website/article/_search?q=post_date:2017 (1 result)
Looking at the mapping that ES created automatically leads into the question of what a mapping is.

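With automatic mapping creation, ES creates the index, the type, and the type's mapping for us. The mapping contains the data type of each field and settings such as how the field is analyzed. We can of course also create the index, the type, and the type's mapping by hand before inserting any data.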

GET /website/_mapping/article
{
  "website": {
    "mappings": {
      "article": {
        "properties": {
          "author_id": {
            "type": "long"
          },
          "content": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "post_date": {
            "type": "date"
          },
          "title": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

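Why are the search results inconsistent? Because when ES created the mapping automatically, it gave different fields different data types, and different data types behave differently for analysis and search. The _all field is full text, so 2017-01-01 is split into the terms 2017, 01, and 01 and a search for 2017 matches all three documents, whereas post_date is a date and is matched as an exact value. That is why searches against the _all field and against the post_date field behave completely differently.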

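Exact match

2017-01-01 as an exact value is found only when the search input is exactly 2017-01-01
If you enter just 01, nothing is found

Full-text search

(1) Abbreviation vs. full name: cn vs. china
(2) Word-form conversion: like, liked, likes
(3) Case: Tom vs. tom
(4) Synonyms: like vs. love

2017-01-01 is split into 2017 01 01, so searching for 2017 or for 01 finds it
china: searching for cn also finds china
likes: searching for like also finds likes
Tom: searching for tom also finds Tom
like: searching for the synonym love also finds like

In other words, full-text search does not only match a complete value. It can match after the value has been split into words (tokenized), and it can also match through abbreviations, tense, case, synonyms, and so on.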

VII. Inverted index

doc1: I really liked my small dogs, and I think my mom also liked them.
doc2: He never liked any dogs, so I hope that my mom will not expect me to liked him.
Tokenize the text and build a first version of the inverted index:

word	doc1	doc2
I	*	*
really	*
liked	*	*
my	*	*
small	*
dogs	*	*
and	*
think	*
mom	*	*
also	*
them	*
He		*
never		*
any		*
so		*
hope		*
that		*
will		*
not		*
expect		*
me		*
to		*
him		*

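That shows the simplest way an inverted index is built.
Search
mother like little dog cannot return any result, because none of its terms is in the index:
mother
like
little
dog
Is this the search result we want? Definitely not, because from our point of view:

Is there a difference between mother and mom? They are synonyms; both mean mother.
Is there a difference between like and liked? No; both mean to like, one in the present tense and one in the past tense.
Is there a difference between little and small? They are synonyms; both mean small.
Is there a difference between dog and dogs? Both mean dog, one singular and one plural.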

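normalization: when the inverted index is built, each token produced by splitting the text is additionally processed, to raise the chance that later searches find related documents
This covers conversion of tense, singular/plural, synonyms, and case
mom -> mother
liked -> like
small -> little
dogs -> dog
Rebuild the inverted index with normalization added, then search for mother like little dog again, and this time it finds results
The query mother like little dog is tokenized and normalized in the same way:
mother -> mother
like -> like
little -> little
dog -> dog
Both doc1 and doc2 are found
doc1: I really liked my small dogs, and I think my mom also liked them.
doc2: He never liked any dogs, so I hope that my mom will not expect me to liked him.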
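
The before-and-after effect of normalization can be reproduced with a toy inverted index. This is a minimal sketch, not Elasticsearch's implementation: the tokenizer is a simple letters-only regular expression, the NORMALIZE table hard-codes the four conversions listed above, and the function names are invented for this example.

```python
import re
from collections import defaultdict

DOCS = {
    "doc1": "I really liked my small dogs, and I think my mom also liked them.",
    "doc2": "He never liked any dogs, so I hope that my mom will not expect me to liked him.",
}

# The four conversions from the text above (a hand-written stand-in for real stemming and synonyms).
NORMALIZE = {"mom": "mother", "liked": "like", "small": "little", "dogs": "dog"}


def analyze(text, normalize=True):
    """Split text into words; optionally lowercase and normalize each one."""
    tokens = re.findall(r"[A-Za-z]+", text)
    if not normalize:
        return tokens
    return [NORMALIZE.get(token.lower(), token.lower()) for token in tokens]


def build_index(docs, normalize=True):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text, normalize):
            index[term].add(doc_id)
    return index


def search(index, query, normalize=True):
    """Return ids of documents that contain at least one query term."""
    hits = set()
    for term in analyze(query, normalize):
        hits |= index.get(term, set())
    return sorted(hits)


raw_index = build_index(DOCS, normalize=False)
normalized_index = build_index(DOCS, normalize=True)

print(search(raw_index, "mother like little dog", normalize=False))  # []
print(search(normalized_index, "mother like little dog"))            # ['doc1', 'doc2']
```

The important design point is that the same analyze step runs at index time and at query time, so both sides end up with the same terms.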

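Analyzers

1. What is an analyzer
An analyzer splits text into tokens and normalizes them (which improves recall)
Given a sentence, an analyzer splits it into individual words and normalizes each word (tense conversion, singular/plural conversion)
recall: increasing the number of results a search is able to find
character filter: preprocesses the text before tokenization; the most common cases are stripping HTML tags (<span>hello</span> -> hello) and replacing & with and (I&you -> I and you)
tokenizer: splits the text into tokens, hello you and me -> hello, you, and, me
token filter: lowercase, stop words, synonyms: dogs -> dog, liked -> like, Tom -> tom, a/the/an -> removed, mom -> mother, small -> little
An analyzer matters a great deal: it runs a piece of text through all of this processing, and only the final result is used to build the inverted index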
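
The three stages can be chained together in a short sketch. Everything here is a hand-written stand-in under stated assumptions: the HTML stripping is a naive regular expression, the tokenizer splits on whitespace only, and STOP_WORDS and TOKEN_MAP contain just the examples from the text above. Real analyzers are considerably more thorough.

```python
import re

STOP_WORDS = {"a", "an", "the"}
# Stand-in for stemming and synonym handling, limited to the examples above.
TOKEN_MAP = {"dogs": "dog", "liked": "like", "mom": "mother", "small": "little"}


def char_filter(text):
    """Preprocess raw text: drop HTML tags, then spell out '&' as 'and'."""
    text = re.sub(r"<[^>]+>", "", text)
    return text.replace("&", " and ")


def tokenizer(text):
    """Split the filtered text into tokens on whitespace."""
    return text.split()


def token_filter(tokens):
    """Lowercase, remove stop words, and map each token to its normalized form."""
    result = []
    for token in tokens:
        token = token.lower()
        if token in STOP_WORDS:
            continue
        result.append(TOKEN_MAP.get(token, token))
    return result


def analyze(text):
    """character filter -> tokenizer -> token filter."""
    return token_filter(tokenizer(char_filter(text)))


tokens = analyze("<span>Tom</span> liked the small dogs & cats")
print(tokens)  # ['tom', 'like', 'little', 'dog', 'and', 'cats']
```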
2. Built-in analyzers
Example text: Set the shape to semi-transparent by calling set_trans(5)
standard analyzer: set, the, shape, to, semi, transparent, by, calling, set_trans, 5 (standard is the default)
simple analyzer: set, the, shape, to, semi, transparent, by, calling, set, trans
whitespace analyzer: Set, the, shape, to, semi-transparent, by, calling, set_trans(5)
language analyzer (analyzers for specific languages, for example english, the English analyzer): set, shape, semi, transpar, call, set_tran, 5
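
Two of these outputs are easy to approximate in plain Python, which helps show where the differences come from. The functions below are rough imitations for this ASCII sample only, written for illustration; they are not the real analyzers, and the standard and english analyzers are not reproduced because their rules (Unicode word segmentation, stemming, stop words) are more involved.

```python
import re

TEXT = "Set the shape to semi-transparent by calling set_trans(5)"


def whitespace_like(text):
    """Imitates the whitespace analyzer: split on whitespace, change nothing else."""
    return text.split()


def simple_like(text):
    """Imitates the simple analyzer: split on anything that is not a letter, then lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z]+", text)]


print(whitespace_like(TEXT))
print(simple_like(TEXT))
```

The whitespace imitation keeps semi-transparent and set_trans(5) intact, while the simple imitation breaks them at the hyphen, underscore, and parentheses and drops the digit 5.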
