analyzer - Elasticsearch 5.4 Documentation (Chinese Edition)

Original: https://www.elastic.co/guide/en/elasticsearch/reference/current/analyzer.html

Translation: http://www.apache.wiki/pages/editpage.action?pageId=9405573

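The values of [analyzed](https://www.elastic.co/guide/en/elasticsearch/reference/5.3/mapping-index.html) string fields are passed through an analyzer, which converts the string into a stream of tokens or terms. For example, depending on the analyzer used, the string "The quick Brown Foxes" may be analyzed into the tokens quick, brown, fox. These are the actual terms indexed for the field, which makes it possible to search efficiently for individual words within large blocks of text.

This analysis happens not only at index time but also at query time: the query string has to be passed through the same (or a similar) analyzer so that the terms it looks for are in the same format as the terms stored in the index.

Elasticsearch ships with a number of pre-defined analyzers, which can be used without further configuration. It also ships with many character filters, tokenizers, and token filters, which can be combined to configure custom analyzers per index.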
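To make the idea of combining character filters, a tokenizer, and token filters concrete, here is a minimal Python sketch. It is only a toy model, not how Elasticsearch or Lucene implement analysis: the function names, the tiny stop word set, and the plural-stripping rule are all assumptions made for illustration, and the real english analyzer uses a proper stemmer.

```python
import re
from typing import Callable, List

# Tiny illustrative stop word subset (assumption, not the _english_ list).
STOP_WORDS = {"a", "an", "and", "the", "of", "to"}


def strip_html(text: str) -> str:
    """Character filter: replace anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", " ", text)


def word_tokenizer(text: str) -> List[str]:
    """Tokenizer: split the text into runs of letters and digits."""
    return re.findall(r"[A-Za-z0-9]+", text)


def lowercase(tokens: List[str]) -> List[str]:
    """Token filter: lowercase every token."""
    return [t.lower() for t in tokens]


def remove_stop_words(tokens: List[str]) -> List[str]:
    """Token filter: drop tokens found in STOP_WORDS."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(tokens: List[str]) -> List[str]:
    """Token filter: crude plural stripping, NOT a real stemmer."""
    stemmed = []
    for t in tokens:
        if t.endswith("es") and len(t) > 4:
            t = t[:-2]
        elif t.endswith("s") and not t.endswith("ss") and len(t) > 3:
            t = t[:-1]
        stemmed.append(t)
    return stemmed


def make_analyzer(
    char_filters: List[Callable[[str], str]],
    tokenizer: Callable[[str], List[str]],
    token_filters: List[Callable[[List[str]], List[str]]],
) -> Callable[[str], List[str]]:
    """Chain the stages in order: char filters, tokenizer, token filters."""

    def analyze(text: str) -> List[str]:
        for char_filter in char_filters:
            text = char_filter(text)
        tokens = tokenizer(text)
        for token_filter in token_filters:
            tokens = token_filter(tokens)
        return tokens

    return analyze


# Hypothetical analyzers loosely mimicking "standard" and "english".
toy_standard = make_analyzer([], word_tokenizer, [lowercase])
toy_english = make_analyzer(
    [strip_html], word_tokenizer, [lowercase, remove_stop_words, naive_stem]
)

print(toy_standard("The quick Brown Foxes"))
print(toy_english("The quick Brown Foxes"))
```

Running the same analyzer over a query string yields terms in the same format as the indexed ones, which is why index-time and query-time analysis need to agree.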

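Analyzers can be specified per query, per field, or per index. At index time, Elasticsearch looks for an analyzer in this order:

- The analyzer defined in the field mapping.
- An analyzer named default in the index settings.
- The standard analyzer.

At query time, there are a few more layers:

- The analyzer defined in a full-text query.
- The search_analyzer defined in the field mapping.
- The analyzer defined in the field mapping.
- An analyzer named default_search in the index settings.
- An analyzer named default in the index settings.
- The standard analyzer.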
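The two lookup orders above can be sketched as a pair of small Python functions. This is a simplified model of the precedence rules only, under the assumption that a field mapping and the index's configured analyzers are plain dictionaries; the function names are hypothetical and nothing here calls Elasticsearch.

```python
from typing import Optional


def resolve_index_analyzer(field_mapping: dict, index_analyzers: dict) -> str:
    """Index time: field mapping, then index-level default, then standard."""
    if "analyzer" in field_mapping:
        return field_mapping["analyzer"]
    if "default" in index_analyzers:
        return "default"
    return "standard"


def resolve_search_analyzer(
    field_mapping: dict,
    index_analyzers: dict,
    query_analyzer: Optional[str] = None,
) -> str:
    """Query time: the same chain with three extra layers in front of it."""
    # 1. An analyzer given directly in the full-text query wins.
    if query_analyzer is not None:
        return query_analyzer
    # 2. and 3. Field mapping: search_analyzer first, then analyzer.
    for key in ("search_analyzer", "analyzer"):
        if key in field_mapping:
            return field_mapping[key]
    # 4. and 5. Index settings: default_search first, then default.
    for name in ("default_search", "default"):
        if name in index_analyzers:
            return name
    # 6. Fall back to the standard analyzer.
    return "standard"


title_mapping = {"type": "text", "analyzer": "english"}
print(resolve_index_analyzer(title_mapping, {}))
print(resolve_search_analyzer({"type": "text"}, {"default": {}}))
```

The first call resolves to the field's own analyzer; the second has no field-level setting and therefore falls through to the index-level default.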

The easiest way to specify an analyzer for a particular field is to define it in the field mapping, as follows:

# Callouts 1 and 2 refer to the "text" field and its "text.english" multi-field.
curl -XPUT 'localhost:9200/my_index?pretty' -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "text",
          "fields": {
            "english": {
              "type":     "text",
              "analyzer": "english"
            }
          }
        }
      }
    }
  }
}
'
# Callout 3: analyze the sample text with the analyzer of the "text" field.
curl -XGET 'localhost:9200/my_index/_analyze?pretty' -H 'Content-Type: application/json' -d'
{
  "field": "text",
  "text": "The quick Brown Foxes."
}
'
# Callout 4: analyze the same text with the analyzer of "text.english".
curl -XGET 'localhost:9200/my_index/_analyze?pretty' -H 'Content-Type: application/json' -d'
{
  "field": "text.english",
  "text": "The quick Brown Foxes."
}
'

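| 1 | The **text** field uses the default standard analyzer. |
| 2 | The text.english multi-field uses the english analyzer, which removes stop words and applies stemming. |
| 3 | The request against the text field returns the tokens: [ the, quick, brown, foxes ]. |
| 4 | The request against the text.english field returns the tokens: [ quick, brown, fox ]. |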
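For readers who prefer to script the two _analyze requests rather than use curl, the following is a minimal Python sketch using only the standard library. The helper names, the base URL, and the hand-written sample response are assumptions for illustration; the sample only mirrors the general shape of an _analyze reply (a tokens list whose entries carry a token string) and was not captured from a real cluster.

```python
import json
import urllib.request
from typing import List


def build_analyze_request(
    base_url: str, index: str, field: str, text: str
) -> urllib.request.Request:
    """Build a POST request for <index>/_analyze using a field's analyzer."""
    body = json.dumps({"field": field, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_strings(response: dict) -> List[str]:
    """Pull just the token text out of a parsed _analyze response."""
    return [entry["token"] for entry in response.get("tokens", [])]


def analyze(base_url: str, index: str, field: str, text: str) -> List[str]:
    """Send the request and return the tokens (needs a reachable cluster)."""
    request = build_analyze_request(base_url, index, field, text)
    with urllib.request.urlopen(request) as reply:
        return token_strings(json.load(reply))


# Hand-written sample in the shape of an _analyze reply (illustrative only).
SAMPLE_RESPONSE = {
    "tokens": [
        {"token": "quick", "position": 1},
        {"token": "brown", "position": 2},
        {"token": "fox", "position": 3},
    ]
}
print(token_strings(SAMPLE_RESPONSE))
```

With a node listening on localhost:9200 and the mapping above in place, `analyze("http://localhost:9200", "my_index", "text.english", "The quick Brown Foxes.")` would be the scripted counterpart of the request marked with callout 4.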

search_quote_analyzer

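The **search_quote_analyzer** setting lets you specify an analyzer for phrases. This is particularly useful when you want to disable stop word removal for phrase queries.

Disabling stop words for phrases requires a field with three analyzer settings:

- An analyzer setting for indexing all terms, including stop words.
- A search_analyzer setting for non-phrase queries that removes stop words.
- A **search_quote_analyzer** setting for phrase queries that does not remove stop words.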

# Callouts: 1 = my_analyzer, 2 = my_stop_analyzer, 3 = "analyzer",
# 4 = "search_analyzer", 5 = "search_quote_analyzer" on the title field.
curl -XPUT 'localhost:9200/my_index?pretty' -H 'Content-Type: application/json' -d'
{
   "settings":{
      "analysis":{
         "analyzer":{
            "my_analyzer":{
               "type":"custom",
               "tokenizer":"standard",
               "filter":[
                  "lowercase"
               ]
            },
            "my_stop_analyzer":{
               "type":"custom",
               "tokenizer":"standard",
               "filter":[
                  "lowercase",
                  "english_stop"
               ]
            }
         },
         "filter":{
            "english_stop":{
               "type":"stop",
               "stopwords":"_english_"
            }
         }
      }
   },
   "mappings":{
      "my_type":{
         "properties":{
            "title": {
               "type":"text",
               "analyzer":"my_analyzer",
               "search_analyzer":"my_stop_analyzer",
               "search_quote_analyzer":"my_analyzer"
            }
         }
      }
   }
}
'

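curl -XPUT 'localhost:9200/my_index/my_type/1?pretty' -H 'Content-Type: application/json' -d'
{
   "title":"The Quick Brown Fox"
}
'

curl -XPUT 'localhost:9200/my_index/my_type/2?pretty' -H 'Content-Type: application/json' -d'
{
   "title":"A Quick Brown Fox"
}
'

# Callout 1: the query string is wrapped in double quotes.
curl -XGET 'localhost:9200/my_index/my_type/_search?pretty' -H 'Content-Type: application/json' -d'
{
   "query":{
      "query_string":{
         "query":"\"the quick brown fox\""
      }
   }
}
'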

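| 1 | The my_analyzer analyzer, which tokenizes all terms including stop words. |
| 2 | The my_stop_analyzer analyzer, which removes stop words. |
| 3 | The analyzer setting points to the my_analyzer analyzer, which is used at index time. |
| 4 | The search_analyzer setting points to my_stop_analyzer and removes stop words for non-phrase queries. |
| 5 | The search_quote_analyzer setting points to the my_analyzer analyzer and ensures that stop words are not removed from phrase queries. |

| 1 | Because the query is wrapped in quotes, it is detected as a phrase query, so the search_quote_analyzer kicks in and ensures that stop words are not removed from the query. The my_analyzer analyzer then returns the tokens [ the, quick, brown, fox ], which match only one of the documents. Meanwhile, term queries are analyzed with the my_stop_analyzer analyzer, which filters out stop words. A non-phrase search for either The quick brown fox or A quick brown fox therefore returns both documents, since both contain the tokens [ quick, brown, fox ]. Without the search_quote_analyzer, exact matching for phrase queries would not be possible: the stop words would be removed from the phrase, and both documents would match. |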
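The effect described in the last callout can be reproduced with a small in-memory model. The Python sketch below is a simplification under stated assumptions: the two toy analyzers only imitate my_analyzer and my_stop_analyzer (the stop word set is a small illustrative subset), phrase matching is reduced to a contiguous token comparison, and token positions and position gaps, which the real stop filter and phrase query take into account, are ignored.

```python
import re
from typing import Callable, Dict, List

STOP_WORDS = {"a", "an", "the"}  # illustrative subset, not the _english_ list


def my_analyzer(text: str) -> List[str]:
    """Toy stand-in: tokenize and lowercase, keeping stop words."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def my_stop_analyzer(text: str) -> List[str]:
    """Toy stand-in: same as my_analyzer, then drop stop words."""
    return [t for t in my_analyzer(text) if t not in STOP_WORDS]


# Documents are indexed with my_analyzer, so stop words are kept.
DOCS: Dict[int, str] = {1: "The Quick Brown Fox", 2: "A Quick Brown Fox"}
INDEXED: Dict[int, List[str]] = {i: my_analyzer(t) for i, t in DOCS.items()}


def contains_phrase(doc_tokens: List[str], phrase_tokens: List[str]) -> bool:
    """True if phrase_tokens occurs as a contiguous run in doc_tokens."""
    n = len(phrase_tokens)
    return n > 0 and any(
        doc_tokens[i : i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


def phrase_search(phrase: str, quote_analyzer: Callable[[str], List[str]]) -> List[int]:
    """Return ids of documents containing the analyzed phrase."""
    wanted = quote_analyzer(phrase)
    return sorted(i for i, tokens in INDEXED.items() if contains_phrase(tokens, wanted))


# Phrase analyzed while keeping stop words: only the exact title matches.
print(phrase_search("the quick brown fox", my_analyzer))
# Phrase analyzed with stop words removed: both titles match.
print(phrase_search("the quick brown fox", my_stop_analyzer))
```

In this model, analyzing the phrase with the stop-word-preserving analyzer narrows the result to document 1, while stripping stop words from the phrase lets both documents through, which is the behaviour search_quote_analyzer is meant to avoid.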