Synonym Token Filter

Original: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-tokenfilter.html

Translation: http://www.apache.wiki/pages/viewpage.action?pageId=10028859

Contributors: fucker, ApacheCN, Apache中文网

The synonym token filter makes it easy to handle synonyms during analysis. Synonyms are configured using a configuration file. Here is an example:

    PUT /test_index
    {
        "settings": {
            "index" : {
                "analysis" : {
                    "analyzer" : {
                        "synonym" : {
                            "tokenizer" : "whitespace",
                            "filter" : ["synonym"]
                        }
                    },
                    "filter" : {
                        "synonym" : {
                            "type" : "synonym",
                            "synonyms_path" : "analysis/synonym.txt"
                        }
                    }
                }
            }
        }
    }

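The above configures a synonym filter with a path of analysis/synonym.txt (relative to the config location). The synonym analyzer is then configured with that filter. Additional settings are ignore_case (defaults to false) and expand (defaults to true).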
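The two additional settings sit next to synonyms_path in the filter definition. As a minimal sketch, the following Python builds the same request body with both settings written out explicitly. The helper name build_synonym_settings is hypothetical, and the code only constructs the JSON; it does not send it to a cluster.

```python
import json


def build_synonym_settings(synonyms_path, ignore_case=False, expand=True):
    """Return an index-settings body for a synonym filter and analyzer.

    Hypothetical helper for illustration; the keyword defaults mirror the
    defaults stated in the text (ignore_case=false, expand=true).
    """
    return {
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        "synonym": {
                            "tokenizer": "whitespace",
                            "filter": ["synonym"],
                        }
                    },
                    "filter": {
                        "synonym": {
                            "type": "synonym",
                            "synonyms_path": synonyms_path,
                            "ignore_case": ignore_case,
                            "expand": expand,
                        }
                    },
                }
            }
        }
    }


# Same configuration as above, but matching synonyms case-insensitively.
body = build_synonym_settings("analysis/synonym.txt", ignore_case=True)
print(json.dumps(body, indent=2))
```

The printed JSON can be used as the body of the PUT /test_index request.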

The tokenizer parameter controls which tokenizer is used to tokenize the synonyms, and defaults to the whitespace tokenizer.

Two synonym formats are supported: Solr and WordNet.

Solr synonyms

The following is a sample of the file format:

    # Blank lines and lines starting with pound are comments.

    # Explicit mappings match any token sequence on the LHS of "=>"
    # and replace with all alternatives on the RHS. These types of mappings
    # ignore the expand parameter in the schema.
    # Examples:
    i-pod, i pod => ipod,
    sea biscuit, sea biscit => seabiscuit

    # Equivalent synonyms may be separated with commas and give
    # no explicit mapping. In this case the mapping behavior will
    # be taken from the expand parameter in the schema. This allows
    # the same synonym file to be used in different synonym handling strategies.
    # Examples:
    ipod, i-pod, i pod
    foozball , foosball
    universe , cosmos
    lol, laughing out loud

    # If expand==true, "ipod, i-pod, i pod" is equivalent
    # to the explicit mapping:
    ipod, i-pod, i pod => ipod, i-pod, i pod
    # If expand==false, "ipod, i-pod, i pod" is equivalent
    # to the explicit mapping:
    ipod, i-pod, i pod => ipod

    # Multiple synonym mapping entries are merged.
    foo => foo bar
    foo => baz
    # is equivalent to
    foo => foo bar, baz
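The rules described in the sample file's comments can be modeled in a few lines of Python. This is only an illustrative sketch of the documented semantics (explicit mappings, the expand parameter, and merging of repeated entries). It is not the parser Elasticsearch uses, it ignores ignore_case, and the function name parse_solr_synonyms is hypothetical.

```python
def parse_solr_synonyms(lines, expand=True):
    """Parse Solr-format synonym rules into {input phrase: [output phrases]}.

    Illustrative model of the rules in the sample file's comments only.
    """
    mapping = {}

    def split_terms(text):
        # Split on commas, trim whitespace, and drop empty entries
        # (for example the one left by a trailing comma).
        return [t.strip() for t in text.split(",") if t.strip()]

    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rules
        if "=>" in line:
            # Explicit mapping: every LHS phrase maps to all RHS phrases,
            # regardless of the expand setting.
            lhs, rhs = line.split("=>", 1)
            inputs, outputs = split_terms(lhs), split_terms(rhs)
        else:
            # Equivalent synonyms: behavior depends on expand.
            inputs = split_terms(line)
            outputs = inputs if expand else inputs[:1]
        for term in inputs:
            merged = mapping.setdefault(term, [])
            for out in outputs:
                if out not in merged:  # repeated entries are merged
                    merged.append(out)
    return mapping


rules = [
    "# a comment",
    "ipod, i-pod, i pod",
    "foo => foo bar",
    "foo => baz",
]
expanded = parse_solr_synonyms(rules, expand=True)
collapsed = parse_solr_synonyms(rules, expand=False)
print(expanded["i pod"])   # ['ipod', 'i-pod', 'i pod']
print(collapsed["i pod"])  # ['ipod']
print(expanded["foo"])     # ['foo bar', 'baz']
```

With expand set to true every phrase in an equivalence line maps to the whole group; with expand set to false every phrase maps to the first one only.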

You can also define synonyms for the filter directly in the configuration (note the use of synonyms instead of synonyms_path):

    PUT /test_index
    {
        "settings": {
            "index" : {
                "analysis" : {
                    "filter" : {
                        "synonym" : {
                            "type" : "synonym",
                            "synonyms" : [
                                "i-pod, i pod => ipod",
                                "universe, cosmos"
                            ]
                        }
                    }
                }
            }
        }
    }

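However, it is recommended to define large synonym sets in a file using synonyms_path, because specifying them inline unnecessarily increases cluster size.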

WordNet synonyms

Synonyms in the WordNet format can be declared by setting format:

    PUT /test_index
    {
        "settings": {
            "index" : {
                "analysis" : {
                    "filter" : {
                        "synonym" : {
                            "type" : "synonym",
                            "format" : "wordnet",
                            "synonyms" : [
                                "s(100000001,1,'abstain',v,1,0).",
                                "s(100000001,2,'refrain',v,1,0).",
                                "s(100000001,3,'desist',v,1,0)."
                            ]
                        }
                    }
                }
            }
        }
    }

Using synonyms_path to define WordNet synonyms in a file is also supported.
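In the WordNet entries shown above, the first field of each s(...) line is a synset ID, and entries sharing that ID are synonyms of one another. A rough Python sketch of that grouping follows. It assumes the six-field s(...) line shape used in the example, the function name group_wordnet_synsets is hypothetical, and it does not reflect how Elasticsearch parses the format internally.

```python
import re
from collections import defaultdict

# One WordNet "s" entry: s(synset_id,w_num,'word',ss_type,sense_number,tag_count).
_WN_LINE = re.compile(r"^s\((\d+),(\d+),'(.*)',([a-z]),(\d+),(\d+)\)\.$")


def group_wordnet_synsets(lines):
    """Group words by synset ID; lines that do not match are skipped."""
    groups = defaultdict(list)
    for raw in lines:
        match = _WN_LINE.match(raw.strip())
        if match is None:
            continue
        synset_id = match.group(1)
        # A doubled single quote inside the word stands for one quote.
        word = match.group(3).replace("''", "'")
        groups[synset_id].append(word)
    return dict(groups)


entries = [
    "s(100000001,1,'abstain',v,1,0).",
    "s(100000001,2,'refrain',v,1,0).",
    "s(100000001,3,'desist',v,1,0).",
]
print(group_wordnet_synsets(entries))
# {'100000001': ['abstain', 'refrain', 'desist']}
```

All three example entries share the ID 100000001, so they end up in a single synonym group.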