Original: https://www.elastic.co/guide/en/elasticsearch/reference/current/normalizer.html

Translation: normalizer

Contributors: 程威, ApacheCN, Apache中文网

The `normalizer` property of keyword fields is similar to `analyzer`, except that it guarantees that the analysis chain produces a single token.
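
To make the single-token guarantee concrete, here is a minimal local sketch in Python. It does not call Elasticsearch: `fold`, `normalize_keyword`, and `analyze_text` are hypothetical helpers, and the Unicode decomposition used in `fold` is only an approximation of the `asciifolding` filter, assumed here for illustration.

```python
import unicodedata


def fold(text):
    """Lowercase and strip diacritics (a rough stand-in for lowercase + asciifolding)."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_keyword(value):
    """Normalizer-style processing: the whole value stays one token."""
    return [fold(value)]


def analyze_text(value):
    """Analyzer-style processing (hypothetical whitespace tokenizer): many tokens."""
    return [fold(token) for token in value.split()]


print(normalize_keyword("Crème Brûlée"))  # ['creme brulee']
print(analyze_text("Crème Brûlée"))       # ['creme', 'brulee']
```

The normalizer-style function always returns a list of length one, whatever the input, which is the property that makes a normalized keyword usable for exact matching, sorting, and aggregations.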

The normalizer is applied before the keyword is indexed, and also at search time when the keyword field is searched through a query parser such as the [match](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html) query.

    # Create the index with a custom normalizer and apply it to the keyword field "foo"
    curl -XPUT 'localhost:9200/index?pretty' -H 'Content-Type: application/json' -d'
    {
      "settings": {
        "analysis": {
          "normalizer": {
            "my_normalizer": {
              "type": "custom",
              "char_filter": [],
              "filter": ["lowercase", "asciifolding"]
            }
          }
        }
      },
      "mappings": {
        "type": {
          "properties": {
            "foo": {
              "type": "keyword",
              "normalizer": "my_normalizer"
            }
          }
        }
      }
    }
    '
    # Index three documents
    curl -XPUT 'localhost:9200/index/type/1?pretty' -H 'Content-Type: application/json' -d'
    {
      "foo": "BÀR"
    }
    '
    curl -XPUT 'localhost:9200/index/type/2?pretty' -H 'Content-Type: application/json' -d'
    {
      "foo": "bar"
    }
    '
    curl -XPUT 'localhost:9200/index/type/3?pretty' -H 'Content-Type: application/json' -d'
    {
      "foo": "baz"
    }
    '
    # Make the documents visible to search
    curl -XPOST 'localhost:9200/index/_refresh?pretty'
    # Search the normalized keyword field with a match query
    curl -XGET 'localhost:9200/index/_search?pretty' -H 'Content-Type: application/json' -d'
    {
      "query": {
        "match": {
          "foo": "BAR"
        }
      }
    }
    '

The above query matches documents 1 and 2, because `BÀR` is converted to `bar` at index time and the query term `BAR` is converted to `bar` at query time.

    {
      "took": $body.took,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 2,
        "max_score": 0.2876821,
        "hits": [
          {
            "_index": "index",
            "_type": "type",
            "_id": "2",
            "_score": 0.2876821,
            "_source": {
              "foo": "bar"
            }
          },
          {
            "_index": "index",
            "_type": "type",
            "_id": "1",
            "_score": 0.2876821,
            "_source": {
              "foo": "BÀR"
            }
          }
        ]
      }
    }
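
The matching behaviour above can be reproduced with a small local simulation. This is a sketch under stated assumptions rather than how Elasticsearch is implemented: the `my_normalizer` function only approximates the `lowercase` and `asciifolding` filters with Python's `unicodedata`, and the dictionary-based `index` and the `match` helper are hypothetical stand-ins for the real inverted index and match query.

```python
import unicodedata
from collections import defaultdict


def my_normalizer(value):
    """Approximate the lowercase + asciifolding chain from the example settings."""
    decomposed = unicodedata.normalize("NFKD", value.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The three documents indexed in the example, keyed by document id.
docs = {"1": "BÀR", "2": "bar", "3": "baz"}

# Index time: store each document under its normalized keyword.
index = defaultdict(set)
for doc_id, value in docs.items():
    index[my_normalizer(value)].add(doc_id)


def match(query):
    """Query time: normalize the query term the same way, then look it up."""
    return sorted(index.get(my_normalizer(query), set()))


print(match("BAR"))  # ['1', '2']
```

Because the same function runs on both sides, `BÀR`, `bar`, and `BAR` all meet at the single term `bar`, while `baz` stays separate.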
    

In addition, because keyword values are converted before they are indexed, aggregations return normalized values:

    curl -XGET 'localhost:9200/index/_search?pretty' -H 'Content-Type: application/json' -d'
    {
      "size": 0,
      "aggs": {
        "foo_terms": {
          "terms": {
            "field": "foo"
          }
        }
      }
    }
    '
    

This returns:

    {
      "took": 43,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 3,
        "max_score": 0.0,
        "hits": []
      },
      "aggregations": {
        "foo_terms": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "bar",
              "doc_count": 2
            },
            {
              "key": "baz",
              "doc_count": 1
            }
          ]
        }
      }
    }
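
The bucket counts in this response follow directly from grouping on the normalized values. A minimal sketch, assuming the same approximate `my_normalizer` helper (a hypothetical Python stand-in for the `lowercase` and `asciifolding` filters, not the Elasticsearch implementation):

```python
import unicodedata
from collections import Counter


def my_normalizer(value):
    """Approximate the lowercase + asciifolding chain from the example settings."""
    decomposed = unicodedata.normalize("NFKD", value.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Values of the "foo" field in documents 1, 2 and 3.
values = ["BÀR", "bar", "baz"]

# A terms aggregation groups on the indexed (normalized) value, not the original source.
buckets = Counter(my_normalizer(v) for v in values).most_common()

print(buckets)  # [('bar', 2), ('baz', 1)]
```

Note that the original spelling `BÀR` never appears as a bucket key; it is still available in each hit's `_source`, but the aggregation only sees the normalized term.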