Testing how a sentence is tokenized
curl -XPOST "http://localhost:9200/_analyze" -H 'Content-Type: application/json' -d'{"analyzer": "ik_max_word","text":"中华人民共和国国歌"}'
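The same request in console syntax (for example Kibana Dev Tools), once per analyzer:

    // ik_max_word: fine-grained segmentation, emits every word it can find
    POST _analyze
    {
      "analyzer": "ik_max_word",
      "text": "中华人民共和国国歌"
    }

    // ik_smart: "smart" (coarse-grained) segmentation
    POST _analyze
    {
      "analyzer": "ik_smart",
      "text": "中华人民共和国国歌"
    }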
ik_max_word
    {
      "tokens": [
        {"token": "中华人民共和国", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0},
        {"token": "中华人民", "start_offset": 0, "end_offset": 4, "type": "CN_WORD", "position": 1},
        {"token": "中华", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 2},
        {"token": "华人", "start_offset": 1, "end_offset": 3, "type": "CN_WORD", "position": 3},
        {"token": "人民共和国", "start_offset": 2, "end_offset": 7, "type": "CN_WORD", "position": 4},
        {"token": "人民", "start_offset": 2, "end_offset": 4, "type": "CN_WORD", "position": 5},
        {"token": "共和国", "start_offset": 4, "end_offset": 7, "type": "CN_WORD", "position": 6},
        {"token": "共和", "start_offset": 4, "end_offset": 6, "type": "CN_WORD", "position": 7},
        {"token": "国", "start_offset": 6, "end_offset": 7, "type": "CN_CHAR", "position": 8},
        {"token": "国歌", "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 9}
      ]
    }
ik_smart
    {
      "tokens": [
        {"token": "中华人民共和国", "start_offset": 0, "end_offset": 7, "type": "CN_WORD", "position": 0},
        {"token": "国歌", "start_offset": 7, "end_offset": 9, "type": "CN_WORD", "position": 1}
      ]
    }
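The practical difference between the two outputs is that ik_max_word emits overlapping tokens while ik_smart emits a single non-overlapping segmentation. The following is a minimal Python sketch of that comparison. The token lists are copied from the two responses above (token, start_offset, end_offset); the helper name `has_overlap` is made up for this illustration and is not part of Elasticsearch or the IK plugin.

```python
# (token, start_offset, end_offset) triples copied from the responses above.
IK_MAX_WORD = [
    ("中华人民共和国", 0, 7), ("中华人民", 0, 4), ("中华", 0, 2),
    ("华人", 1, 3), ("人民共和国", 2, 7), ("人民", 2, 4),
    ("共和国", 4, 7), ("共和", 4, 6), ("国", 6, 7), ("国歌", 7, 9),
]
IK_SMART = [("中华人民共和国", 0, 7), ("国歌", 7, 9)]


def has_overlap(tokens):
    """Return True if any two tokens share at least one character position."""
    spans = sorted((start, end) for _, start, end in tokens)
    # After sorting by start offset, any overlap shows up between neighbours.
    return any(prev_end > next_start
               for (_, prev_end), (next_start, _) in zip(spans, spans[1:]))


print(has_overlap(IK_MAX_WORD))  # True
print(has_overlap(IK_SMART))     # False
```

More index terms mean more ways for a query to hit the document, at the cost of a larger index. That trade-off is presumably why the examples that follow index with ik_max_word.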
Using the analyzer in queries
Sample data:
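    // Start from a clean index: drop any existing "blogs" index, then recreate it
    DELETE blogs
    PUT blogs

    PUT blogs/_mapping/_doc
    {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "ik_max_word"
        }
      }
    }

    PUT blogs/_doc/1
    {
      "title": "男扮女装"
    }
    // Tokenized as 「男扮女装」 and 「女装」

    PUT blogs/_doc/2
    {
      "title": "辣眼睛"
    }
    // Tokenized as 「辣」 and 「眼睛」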
Queries:
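    // A full-text match query finds the document
    GET /blogs/_search
    {
      "query": {
        "match": {
          "title": {
            "query": "男扮女装"
          }
        }
      }
    }

    // A wildcard query also works, because the term 「男扮女装」 exists in the index
    GET /blogs/_search
    {
      "query": {
        "wildcard": {
          "title": "男扮*"
        }
      }
    }

    // match_phrase does not find the document
    GET /blogs/_search
    {
      "query": {
        "match_phrase": {
          "title": "男扮"
        }
      }
    }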
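    // A wildcard query cannot match this title: the analyzer splits it into two terms,
    // and a wildcard pattern is only compared against the individual terms produced by analysis
    GET /blogs/_search
    {
      "query": {
        "wildcard": {
          "title": "辣眼睛*"
        }
      }
    }

    // A match query is unaffected
    GET /blogs/_search
    {
      "query": {
        "match": {
          "title": "辣眼睛"
        }
      }
    }

    // A match_phrase query is unaffected
    GET /blogs/_search
    {
      "query": {
        "match_phrase": {
          "title": "辣眼睛"
        }
      }
    }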
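The wildcard behaviour can be reproduced with a toy model in Python. This is only a sketch, not how Elasticsearch is implemented: it assumes the index holds exactly the terms listed in the comments of the sample data, and it uses `fnmatch.fnmatchcase` from the standard library as a stand-in for wildcard matching. `INDEXED_TERMS` and `wildcard_search` are names invented for this illustration.

```python
from fnmatch import fnmatchcase

# Terms per document id, as stated in the comments of the sample data.
INDEXED_TERMS = {
    1: ["男扮女装", "女装"],
    2: ["辣", "眼睛"],
}


def wildcard_search(pattern):
    """Return ids of documents with at least one indexed term matching pattern."""
    return sorted(
        doc_id
        for doc_id, terms in INDEXED_TERMS.items()
        # The pattern is tested against each term, never against the original title.
        if any(fnmatchcase(term, pattern) for term in terms)
    )


print(wildcard_search("男扮*"))    # [1]
print(wildcard_search("辣眼睛*"))  # []
print(wildcard_search("眼*"))      # [2]
```

The last call shows the flip side: a pattern that starts at a term boundary, such as `眼*`, matches document 2 even though it is not a prefix of the full title.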
