Synonym Token Filter(Synonym 词元过滤器)
原文链接 : https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-tokenfilter.html
译文链接 : http://www.apache.wiki/pages/viewpage.action?pageId=10028859
贡献者 : fucker,ApacheCN,Apache中文网
synonym (同义词)词元过滤器允许在分析过程中轻松处理同义词。 同义词使用配置文件配置。示例如下:
PUT /test_index{"settings": {"index" : {"analysis" : {"analyzer" : {"synonym" : {"tokenizer" : "whitespace","filter" : ["synonym"]}},"filter" : {"synonym" : {"type" : "synonym","synonyms_path" : "analysis/synonym.txt"}}}}}}
以上配置一个 synonym (同义词)过滤器,其中包含一个路径 analysis/synonym.txt(相对于 config 的位置)。 然后使用过滤器配置 synonym 同义词分析器。 其他设置有:ignore_case(默认为 false),和 expand (默认为 true)。
tokenizer 参数控制将用于标记同义词的分词器,并且默认为 whitespace 分词器。
支持两种同义词格式:Solr,WordNet。
Solr synonyms
以下是文件的示例格式:
# Blank lines and lines starting with pound are comments.# Explicit mappings match any token sequence on the LHS of "=>"# and replace with all alternatives on the RHS. These types of mappings# ignore the expand parameter in the schema.# Examples:i-pod, i pod => ipod,sea biscuit, sea biscit => seabiscuit# Equivalent synonyms may be separated with commas and give# no explicit mapping. In this case the mapping behavior will# be taken from the expand parameter in the schema. This allows# the same synonym file to be used in different synonym handling strategies.# Examples:ipod, i-pod, i podfoozball , foosballuniverse , cosmoslol, laughing out loud# If expand==true, "ipod, i-pod, i pod" is equivalent# to the explicit mapping:ipod, i-pod, i pod => ipod, i-pod, i pod# If expand==false, "ipod, i-pod, i pod" is equivalent# to the explicit mapping:ipod, i-pod, i pod => ipod# Multiple synonym mapping entries are merged.foo => foo barfoo => baz# is equivalent tofoo => foo bar, baz
您也可以在配置文件中直接给过滤器定义同义词(请注意使用 synonyms 而不是 synonyms_path ):
PUT /test_index{"settings": {"index" : {"analysis" : {"filter" : {"synonym" : {"type" : "synonym","synonyms" : ["i-pod, i pod => ipod","universe, cosmos"]}}}}}}
但是,建议使用 synonyms_path 在文件中定义大型同义词集,因为内联指定会不必要地增加群集大小。
WordNet synonyms
基于 WordNet 格式的 同义词 可以如下使用格式声明:
PUT /test_index{"settings": {"index" : {"analysis" : {"filter" : {"synonym" : {"type" : "synonym","format" : "wordnet","synonyms" : ["s(100000001,1,'abstain',v,1,0).","s(100000001,2,'refrain',v,1,0).","s(100000001,3,'desist',v,1,0)."]}}}}}}
同时支持使用 synonyms_path 在文本中定义 WordNet synonyms。
