- 语言分析器
- 配置语言分析仪
**arabic 分析器**- armenian 分析器
- basque分析器
- brazilian分析器
- bulgarian分析器
- catalan 分析器
- cjk 分析器
- czech 分析器
- danish 分析器
- dutch分析器
- english分析器
- finnish 分析器
- french 分析器
- galician 分析器
- german 分析器
- greek 分析器
- hindi 分析器
- hungarian 分析器
- indonesian分析器
- irish 分析器
- italian 分析器
- latvian 分析器
- lithuanian 分析器
- norwegian 分析器
- persian 分析器
- portuguese 分析器
- romanian 分析器
- russian 分析器
- sorani 分析器
- spanish 分析器
- swedish 分析器
- turkish 分析器
- thai 分析器
语言分析器
原文链接 : https://www.elastic.co/guide/en/elasticsearch/reference/5.3/getting-started.html(修改该链接为官网对应的链接)
译文链接 : http://www.apache.wiki/display/Elasticsearch(修改该链接为 ApacheCN 对应的译文链接)
贡献者 : ╮欠n1的太多,ApacheCN,Apache中文网
一组用于分析特定语言文本的分析器。 支持以下类型:arabic, armenian, basque, brazilian, bulgarian, catalan, cjk,czech, danish, dutch, english, finnish, french, galician, german, greek, hindi,hungarian, indonesian, irish, italian, latvian, lithuanian, norwegian, persian,portuguese, romanian, russian, sorani, spanish, swedish, turkish, thai.
配置语言分析仪
Stopwords(停止词)
所有分析仪都支持在配置内部设置自定义停用词,也可以通过设置stopwords_path来使用外部的停用词。 检查Stop Analyzer了解更多详细信息。
Excluding words from stemming(排除词干)
stem_exclusion参数允许您指定不应该被阻止的小写字母数组。 在内部,通过将关键字设置为该值的keyword_marker token filter 来实现此功能
Reimplementing language analyzers(重新实现语言分析器)
内置语言分析器可以作为custom analyzers(如下所述)重新实现,以便自定义其行为。
** 笔记:**如果您不打算排除单词被干扰(相当于上面的stem_exclusion参数),那么您应该从custom analyzer配置中删除keyword_marker token filter。
**arabic 分析器**
arabic 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"arabic_stop": {"type": "stop","stopwords": "_arabic_"},"arabic_keywords": {"type": "keyword_marker","keywords": []},"arabic_stemmer": {"type": "stemmer","language": "arabic"}},"analyzer": {"arabic": {"tokenizer": "standard","filter": ["lowercase","arabic_stop","arabic_normalization","arabic_keywords","arabic_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
armenian 分析器
分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"armenian_stop": {"type": "stop","stopwords": "_armenian_"},"armenian_keywords": {"type": "keyword_marker","keywords": []},"armenian_stemmer": {"type": "stemmer","language": "armenian"}},"analyzer": {"armenian": {"tokenizer": "standard","filter": ["lowercase","armenian_stop","armenian_keywords","armenian_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
basque分析器
basque 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"basque_stop": {"type": "stop","stopwords": "_basque_"},"basque_keywords": {"type": "keyword_marker","keywords": []},"basque_stemmer": {"type": "stemmer","language": "basque"}},"analyzer": {"basque": {"tokenizer": "standard","filter": ["lowercase","basque_stop","basque_keywords","basque_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
brazilian分析器
brazilian 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"brazilian_stop": {"type": "stop","stopwords": "_brazilian_"},"brazilian_keywords": {"type": "keyword_marker","keywords": []},"brazilian_stemmer": {"type": "stemmer","language": "brazilian"}},"analyzer": {"brazilian": {"tokenizer": "standard","filter": ["lowercase","brazilian_stop","brazilian_keywords","brazilian_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
bulgarian分析器
bulgarian 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"bulgarian_stop": {"type": "stop","stopwords": "_bulgarian_"},"bulgarian_keywords": {"type": "keyword_marker","keywords": []},"bulgarian_stemmer": {"type": "stemmer","language": "bulgarian"}},"analyzer": {"bulgarian": {"tokenizer": "standard","filter": ["lowercase","bulgarian_stop","bulgarian_keywords","bulgarian_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
catalan 分析器
catalan 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"catalan_elision": {"type": "elision","articles": [ "d", "l", "m", "n", "s", "t"]},"catalan_stop": {"type": "stop","stopwords": "_catalan_"},"catalan_keywords": {"type": "keyword_marker","keywords": []},"catalan_stemmer": {"type": "stemmer","language": "catalan"}},"analyzer": {"catalan": {"tokenizer": "standard","filter": ["catalan_elision","lowercase","catalan_stop","catalan_keywords","catalan_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
cjk 分析器
cjk 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"english_stop": {"type": "stop","stopwords": "_english_"}},"analyzer": {"cjk": {"tokenizer": "standard","filter": ["cjk_width","lowercase","cjk_bigram","english_stop"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
czech 分析器
czech 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"czech_stop": {"type": "stop","stopwords": "_czech_"},"czech_keywords": {"type": "keyword_marker","keywords": []},"czech_stemmer": {"type": "stemmer","language": "czech"}},"analyzer": {"czech": {"tokenizer": "standard","filter": ["lowercase","czech_stop","czech_keywords","czech_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
danish 分析器
danish 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"danish_stop": {"type": "stop","stopwords": "_danish_"},"danish_keywords": {"type": "keyword_marker","keywords": []},"danish_stemmer": {"type": "stemmer","language": "danish"}},"analyzer": {"danish": {"tokenizer": "standard","filter": ["lowercase","danish_stop","danish_keywords","danish_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
dutch分析器
dutch 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"dutch_stop": {"type": "stop","stopwords": "_dutch_"},"dutch_keywords": {"type": "keyword_marker","keywords": []},"dutch_stemmer": {"type": "stemmer","language": "dutch"},"dutch_override": {"type": "stemmer_override","rules": ["fiets=>fiets","bromfiets=>bromfiets","ei=>eier","kind=>kinder"]}},"analyzer": {"dutch": {"tokenizer": "standard","filter": ["lowercase","dutch_stop","dutch_keywords","dutch_override","dutch_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
english分析器
english 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"english_stop": {"type": "stop","stopwords": "_english_"},"english_keywords": {"type": "keyword_marker","keywords": []},"english_stemmer": {"type": "stemmer","language": "english"},"english_possessive_stemmer": {"type": "stemmer","language": "possessive_english"}},"analyzer": {"english": {"tokenizer": "standard","filter": ["english_possessive_stemmer","lowercase","english_stop","english_keywords","english_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
finnish 分析器
finnish 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"finnish_stop": {"type": "stop","stopwords": "_finnish_"},"finnish_keywords": {"type": "keyword_marker","keywords": []},"finnish_stemmer": {"type": "stemmer","language": "finnish"}},"analyzer": {"finnish": {"tokenizer": "standard","filter": ["lowercase","finnish_stop","finnish_keywords","finnish_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
french 分析器
french 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"french_elision": {"type": "elision","articles_case": true,"articles": ["l", "m", "t", "qu", "n", "s","j", "d", "c", "jusqu", "quoiqu","lorsqu", "puisqu"]},"french_stop": {"type": "stop","stopwords": "_french_"},"french_keywords": {"type": "keyword_marker","keywords": []},"french_stemmer": {"type": "stemmer","language": "light_french"}},"analyzer": {"french": {"tokenizer": "standard","filter": ["french_elision","lowercase","french_stop","french_keywords","french_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
galician 分析器
galician 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"galician_stop": {"type": "stop","stopwords": "_galician_"},"galician_keywords": {"type": "keyword_marker","keywords": []},"galician_stemmer": {"type": "stemmer","language": "galician"}},"analyzer": {"galician": {"tokenizer": "standard","filter": ["lowercase","galician_stop","galician_keywords","galician_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
german 分析器
german 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"german_stop": {"type": "stop","stopwords": "_german_"},"german_keywords": {"type": "keyword_marker","keywords": []},"german_stemmer": {"type": "stemmer","language": "light_german"}},"analyzer": {"german": {"tokenizer": "standard","filter": ["lowercase","german_stop","german_keywords","german_normalization","german_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
greek 分析器
greek 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"greek_stop": {"type": "stop","stopwords": "_greek_"},"greek_lowercase": {"type": "lowercase","language": "greek"},"greek_keywords": {"type": "keyword_marker","keywords": []},"greek_stemmer": {"type": "stemmer","language": "greek"}},"analyzer": {"greek": {"tokenizer": "standard","filter": ["greek_lowercase","greek_stop","greek_keywords","greek_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
hindi 分析器
hindi 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"hindi_stop": {"type": "stop","stopwords": "_hindi_"},"hindi_keywords": {"type": "keyword_marker","keywords": []},"hindi_stemmer": {"type": "stemmer","language": "hindi"}},"analyzer": {"hindi": {"tokenizer": "standard","filter": ["lowercase","indic_normalization","hindi_normalization","hindi_stop","hindi_keywords","hindi_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
hungarian 分析器
hungarian 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"hungarian_stop": {"type": "stop","stopwords": "_hungarian_"},"hungarian_keywords": {"type": "keyword_marker","keywords": []},"hungarian_stemmer": {"type": "stemmer","language": "hungarian"}},"analyzer": {"hungarian": {"tokenizer": "standard","filter": ["lowercase","hungarian_stop","hungarian_keywords","hungarian_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
indonesian分析器
indonesian 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"indonesian_stop": {"type": "stop","stopwords": "_indonesian_"},"indonesian_keywords": {"type": "keyword_marker","keywords": []},"indonesian_stemmer": {"type": "stemmer","language": "indonesian"}},"analyzer": {"indonesian": {"tokenizer": "standard","filter": ["lowercase","indonesian_stop","indonesian_keywords","indonesian_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
irish 分析器
irish 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"irish_elision": {"type": "elision","articles": [ "h", "n", "t" ]},"irish_stop": {"type": "stop","stopwords": "_irish_"},"irish_lowercase": {"type": "lowercase","language": "irish"},"irish_keywords": {"type": "keyword_marker","keywords": []},"irish_stemmer": {"type": "stemmer","language": "irish"}},"analyzer": {"irish": {"tokenizer": "standard","filter": ["irish_stop","irish_elision","irish_lowercase","irish_keywords","irish_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
italian 分析器
italian 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"italian_elision": {"type": "elision","articles": ["c", "l", "all", "dall", "dell","nell", "sull", "coll", "pell","gl", "agl", "dagl", "degl", "negl","sugl", "un", "m", "t", "s", "v", "d"]},"italian_stop": {"type": "stop","stopwords": "_italian_"},"italian_keywords": {"type": "keyword_marker","keywords": []},"italian_stemmer": {"type": "stemmer","language": "light_italian"}},"analyzer": {"italian": {"tokenizer": "standard","filter": ["italian_elision","lowercase","italian_stop","italian_keywords","italian_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
latvian 分析器
latvian 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"latvian_stop": {"type": "stop","stopwords": "_latvian_"},"latvian_keywords": {"type": "keyword_marker","keywords": []},"latvian_stemmer": {"type": "stemmer","language": "latvian"}},"analyzer": {"latvian": {"tokenizer": "standard","filter": ["lowercase","latvian_stop","latvian_keywords","latvian_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
lithuanian 分析器
lithuanian 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"lithuanian_stop": {"type": "stop","stopwords": "_lithuanian_"},"lithuanian_keywords": {"type": "keyword_marker","keywords": []},"lithuanian_stemmer": {"type": "stemmer","language": "lithuanian"}},"analyzer": {"lithuanian": {"tokenizer": "standard","filter": ["lowercase","lithuanian_stop","lithuanian_keywords","lithuanian_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
norwegian 分析器
norwegian 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"norwegian_stop": {"type": "stop","stopwords": "_norwegian_"},"norwegian_keywords": {"type": "keyword_marker","keywords": []},"norwegian_stemmer": {"type": "stemmer","language": "norwegian"}},"analyzer": {"norwegian": {"tokenizer": "standard","filter": ["lowercase","norwegian_stop","norwegian_keywords","norwegian_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
persian 分析器
persian 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"char_filter": {"zero_width_spaces": {"type": "mapping","mappings": [ "\\u200C=> "]}},"filter": {"persian_stop": {"type": "stop","stopwords": "_persian_"}},"analyzer": {"persian": {"tokenizer": "standard","char_filter": [ "zero_width_spaces" ],"filter": ["lowercase","arabic_normalization","persian_normalization","persian_stop"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
portuguese 分析器
portuguese 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"portuguese_stop": {"type": "stop","stopwords": "_portuguese_"},"portuguese_keywords": {"type": "keyword_marker","keywords": []},"portuguese_stemmer": {"type": "stemmer","language": "light_portuguese"}},"analyzer": {"portuguese": {"tokenizer": "standard","filter": ["lowercase","portuguese_stop","portuguese_keywords","portuguese_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
romanian 分析器
romanian 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"romanian_stop": {"type": "stop","stopwords": "_romanian_"},"romanian_keywords": {"type": "keyword_marker","keywords": []},"romanian_stemmer": {"type": "stemmer","language": "romanian"}},"analyzer": {"romanian": {"tokenizer": "standard","filter": ["lowercase","romanian_stop","romanian_keywords","romanian_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
russian 分析器
russian 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"russian_stop": {"type": "stop","stopwords": "_russian_"},"russian_keywords": {"type": "keyword_marker","keywords": []},"russian_stemmer": {"type": "stemmer","language": "russian"}},"analyzer": {"russian": {"tokenizer": "standard","filter": ["lowercase","russian_stop","russian_keywords","russian_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
sorani 分析器
sorani 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"sorani_stop": {"type": "stop","stopwords": "_sorani_"},"sorani_keywords": {"type": "keyword_marker","keywords": []},"sorani_stemmer": {"type": "stemmer","language": "sorani"}},"analyzer": {"sorani": {"tokenizer": "standard","filter": ["sorani_normalization","lowercase","sorani_stop","sorani_keywords","sorani_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
spanish 分析器
spanish 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"spanish_stop": {"type": "stop","stopwords": "_spanish_"},"spanish_keywords": {"type": "keyword_marker","keywords": []},"spanish_stemmer": {"type": "stemmer","language": "light_spanish"}},"analyzer": {"spanish": {"tokenizer": "standard","filter": ["lowercase","spanish_stop","spanish_keywords","spanish_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
swedish 分析器
swedish 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"swedish_stop": {"type": "stop","stopwords": "_swedish_"},"swedish_keywords": {"type": "keyword_marker","keywords": []},"swedish_stemmer": {"type": "stemmer","language": "swedish"}},"analyzer": {"swedish": {"tokenizer": "standard","filter": ["lowercase","swedish_stop","swedish_keywords","swedish_stemmer"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
turkish 分析器
turkish 分析器可以如以下定制分析仪重新实现:
{ “settings”: { “analysis”: { “filter”: { “turkishstop”: { “type”: “stop”, “stopwords”: “_turkish“ }, “turkish_lowercase”: { “type”: “lowercase”, “language”: “turkish” }, “turkish_keywords”: { “type”: “keyword_marker”, “keywords”: [] }, “turkish_stemmer”: { “type”: “stemmer”, “language”: “turkish” } }, “analyzer”: { “turkish”: { “tokenizer”: “standard”, “filter”: [ “apostrophe”, “turkish_lowercase”, “turkish_stop”, “turkish_keywords”, “turkish_stemmer” ] } } } } }
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
2>应该删除此过滤器,除非有字词应该排除在干扰之外。
thai 分析器
thai 分析器可以如以下定制分析仪重新实现:
{"settings": {"analysis": {"filter": {"thai_stop": {"type": "stop","stopwords": "_thai_"}},"analyzer": {"thai": {"tokenizer": "thai","filter": ["lowercase","thai_stop"]}}}}}
1>可以使用无效词或stopwords_path参数覆盖默认的停用词。
