normalizer

Original link: https://www.elastic.co/guide/en/elasticsearch/reference/current/normalizer.html

Translation link: normalizer

Contributors: 程威, ApacheCN, Apache中文网

The `normalizer` property of `keyword` fields is similar to `analyzer`, except that it guarantees that the analysis chain produces a single token.
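
To make the single-token guarantee concrete, here is a minimal Python sketch that contrasts an analyzer-like chain with a normalizer-like chain. It does not call Elasticsearch. The function names `fold_and_lowercase`, `analyze`, and `normalize` are hypothetical, whitespace splitting stands in for a real tokenizer, and Unicode decomposition is only a rough approximation of the `asciifolding` filter.

```python
import unicodedata
from typing import List


def fold_and_lowercase(text: str) -> str:
    """Rough stand-in for the `asciifolding` and `lowercase` token filters."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.lower()


def analyze(text: str) -> List[str]:
    """Analyzer-like chain: tokenize on whitespace, then filter each token."""
    return [fold_and_lowercase(token) for token in text.split()]


def normalize(text: str) -> List[str]:
    """Normalizer-like chain: no tokenizer, so the whole value stays one token."""
    return [fold_and_lowercase(text)]


analyzed = analyze("New YÓRK")
normalized = normalize("New YÓRK")
print(analyzed)    # ['new', 'york']
print(normalized)  # ['new york']
```

Because a normalizer has no tokenizer step, a `keyword` value always yields exactly one term, which is what exact matching, sorting, and aggregations on that field rely on.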

The `normalizer` is applied before the keyword is indexed, and again at search time when the `keyword` field is searched through a query parser such as the [match](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html) query.

    # Create the index with a custom normalizer and a keyword field that uses it
    curl -XPUT 'localhost:9200/index?pretty' -H 'Content-Type: application/json' -d'
    {
      "settings": {
        "analysis": {
          "normalizer": {
            "my_normalizer": {
              "type": "custom",
              "char_filter": [],
              "filter": ["lowercase", "asciifolding"]
            }
          }
        }
      },
      "mappings": {
        "type": {
          "properties": {
            "foo": {
              "type": "keyword",
              "normalizer": "my_normalizer"
            }
          }
        }
      }
    }
    '

    # Index three documents
    curl -XPUT 'localhost:9200/index/type/1?pretty' -H 'Content-Type: application/json' -d'
    {
      "foo": "BÀR"
    }
    '
    curl -XPUT 'localhost:9200/index/type/2?pretty' -H 'Content-Type: application/json' -d'
    {
      "foo": "bar"
    }
    '
    curl -XPUT 'localhost:9200/index/type/3?pretty' -H 'Content-Type: application/json' -d'
    {
      "foo": "baz"
    }
    '

    # Make the documents searchable, then run a match query on the keyword field
    curl -XPOST 'localhost:9200/index/_refresh?pretty'
    curl -XGET 'localhost:9200/index/_search?pretty' -H 'Content-Type: application/json' -d'
    {
      "query": {
        "match": {
          "foo": "BAR"
        }
      }
    }
    '

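The above query matches documents 1 and 2, because the indexed value `BÀR` and the query term `BAR` are both converted to `bar`, at index time and at query time respectively.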

    {
      "took": $body.took,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 2,
        "max_score": 0.2876821,
        "hits": [
          {
            "_index": "index",
            "_type": "type",
            "_id": "2",
            "_score": 0.2876821,
            "_source": {
              "foo": "bar"
            }
          },
          {
            "_index": "index",
            "_type": "type",
            "_id": "1",
            "_score": 0.2876821,
            "_source": {
              "foo": "BÀR"
            }
          }
        ]
      }
    }
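
The reason documents 1 and 2 match can be reproduced locally with a small Python sketch. It only simulates the idea and does not call Elasticsearch: the Python function `my_normalizer` is a hypothetical stand-in for the normalizer defined in the index settings, and stripping combining marks after Unicode decomposition is an assumed approximation of `asciifolding`, which covers more characters in practice.

```python
import unicodedata


def my_normalizer(value: str) -> str:
    # Approximation of "asciifolding": decompose, then drop combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # "lowercase" filter.
    return folded.lower()


# Index time: each document's `foo` value is normalized before it becomes a term.
docs = {"1": "BÀR", "2": "bar", "3": "baz"}
indexed_terms = {doc_id: my_normalizer(value) for doc_id, value in docs.items()}

# Query time: the query text goes through the same normalizer.
query_term = my_normalizer("BAR")

# A keyword match is an exact comparison between normalized terms.
matching_ids = sorted(
    doc_id for doc_id, term in indexed_terms.items() if term == query_term
)
print(matching_ids)  # ['1', '2']
```

Note that `_source` still holds the original `BÀR`, as the response above shows. Only the indexed term is normalized.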

Also, because keywords are converted before indexing, aggregations return normalized values:

    curl -XGET 'localhost:9200/index/_search?pretty' -H 'Content-Type: application/json' -d'
    {
      "size": 0,
      "aggs": {
        "foo_terms": {
          "terms": {
            "field": "foo"
          }
        }
      }
    }
    '

This returns:

    {
      "took": 43,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 3,
        "max_score": 0.0,
        "hits": []
      },
      "aggregations": {
        "foo_terms": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "bar",
              "doc_count": 2
            },
            {
              "key": "baz",
              "doc_count": 1
            }
          ]
        }
      }
    }
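
The bucket counts can be sanity-checked with a short Python sketch that groups the three source values by their normalized form. This is an illustration under assumptions, not how Elasticsearch computes a terms aggregation: the Python function `my_normalizer` is hypothetical, and its accent stripping only approximates the `asciifolding` filter.

```python
import unicodedata
from collections import Counter


def my_normalizer(value: str) -> str:
    # Approximate "asciifolding" (drop combining marks), then apply "lowercase".
    decomposed = unicodedata.normalize("NFKD", value)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return folded.lower()


# The `foo` values of documents 1, 2, and 3 as they were submitted.
source_values = ["BÀR", "bar", "baz"]

# Buckets are keyed by the normalized term, not by the original text.
buckets = Counter(my_normalizer(value) for value in source_values)
for key, doc_count in buckets.most_common():
    print(key, doc_count)
```

The two spellings `BÀR` and `bar` collapse into the single key `bar`, which is why that bucket has a `doc_count` of 2.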