Custom dictionary
Suppose, for example, that we want the IK analyzer to treat 刘强东 as a single word.
Edit IKAnalyzer.cfg.xml in /usr/local/elasticsearch/plugins/ik/config:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Local extension dictionaries can be configured here -->
    <entry key="ext_dict"></entry>
    <!-- Local extension stop-word dictionaries can be configured here -->
    <entry key="ext_stopwords"></entry>
    <!-- Remote extension dictionaries can be configured here -->
    <entry key="remote_ext_dict">http://192.168.11.129/es/fenci.txt</entry>
    <!-- Remote extension stop-word dictionaries can be configured here -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
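Before restarting anything, it can help to confirm that the file is still well-formed XML and that remote_ext_dict points where you expect. The following is a minimal sketch, not part of the IK plugin: the helper name read_ik_entries and the inline SAMPLE_CFG are assumptions made for illustration, and in practice you would read the text from the IKAnalyzer.cfg.xml path above.

```python
import xml.etree.ElementTree as ET

# Sample text mirroring the config file above (comments omitted).
# Assumption: in real use, read this from IKAnalyzer.cfg.xml instead.
SAMPLE_CFG = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <entry key="ext_dict"></entry>
    <entry key="ext_stopwords"></entry>
    <entry key="remote_ext_dict">http://192.168.11.129/es/fenci.txt</entry>
</properties>
"""


def read_ik_entries(xml_text):
    """Return {key: value} for every <entry>; empty entries map to ''.

    Raises xml.etree.ElementTree.ParseError if the XML is malformed.
    """
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {e.get("key"): (e.text or "").strip() for e in root.iter("entry")}


entries = read_ik_entries(SAMPLE_CFG)
print(entries["remote_ext_dict"])
```

An empty string for ext_dict or ext_stopwords simply means no local dictionary is configured.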
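After saving the change, restart the elasticsearch container; otherwise the new configuration does not take effect:
    docker restart elasticsearch
Once the dictionary is updated, ES applies the new segmentation only to newly indexed data. Existing documents are not re-tokenized. To re-tokenize existing documents, run: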
POST my_index/_update_by_query?conflicts=proceed
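If you would rather trigger this from a script than from the Kibana console, the same request can be sketched in Python with only the standard library. The base URL http://localhost:9200, the lack of authentication, and the helper names are assumptions for illustration; adjust them to your cluster.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: unauthenticated local node


def build_update_by_query(index, es_url=ES_URL):
    """Build (but do not send) POST <index>/_update_by_query?conflicts=proceed."""
    url = f"{es_url}/{index}/_update_by_query?conflicts=proceed"
    return urllib.request.Request(url, method="POST")


def run_update_by_query(index, es_url=ES_URL):
    """Send the request and return the decoded JSON response.

    Not called here, because it needs a reachable Elasticsearch node.
    """
    with urllib.request.urlopen(build_update_by_query(index, es_url)) as resp:
        return json.load(resp)


req = build_update_by_query("my_index")
print(req.get_method(), req.full_url)
```

Keeping the request builder separate from the sender makes it easy to inspect the URL before touching a live index.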
Remote dictionary location
Install Nginx
Start a throwaway nginx instance, just to copy its configuration out:
    docker run -p 80:80 --name nginx -d nginx:1.10
Copy the configuration files from the container to /usr/local/nginx/conf/:
    mkdir -p /usr/local/nginx/html
    mkdir -p /usr/local/nginx/logs
    mkdir -p /usr/local/nginx/conf
    docker container cp nginx:/etc/nginx /usr/local/nginx/conf/
    # The copy leaves an nginx folder inside conf, so move its contents up into conf
    mv /usr/local/nginx/conf/nginx/* /usr/local/nginx/conf/
    rm -rf /usr/local/nginx/conf/nginx
Stop the original container:
    docker stop nginx
Remove the original container:
    docker rm nginx
Create the new Nginx container with the following command:
    docker run -p 80:80 --name nginx \
    -v /usr/local/nginx/html:/usr/share/nginx/html \
    -v /usr/local/nginx/logs:/var/log/nginx \
    -v /usr/local/nginx/conf/:/etc/nginx \
    -d nginx:1.10
Create the file /usr/local/nginx/html/index.html and check that it can be reached.
Visit: http://<IP of the nginx host>:80/index.html
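The reachability check can also be scripted. The sketch below sends a HEAD request and reports whether the response carries Last-Modified or ETag headers. As far as I understand the IK plugin's README, it relies on those two headers to notice that a remote dictionary has changed, but treat that as an assumption to verify against your plugin version. The URL and the helper names are likewise illustrative.

```python
import urllib.request

# Assumption: nginx runs on the host used in the IK config above.
TEST_URL = "http://192.168.11.129:80/index.html"


def change_markers(headers):
    """Return the Last-Modified / ETag headers found in a header mapping.

    Header names are matched case-insensitively.
    """
    lower = {k.lower(): v for k, v in headers.items()}
    wanted = ("Last-Modified", "ETag")
    return {name: lower[name.lower()] for name in wanted if name.lower() in lower}


def head(url=TEST_URL, timeout=5):
    """Send a HEAD request and return (status, headers).

    Not called here, because it needs the nginx container to be running.
    """
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, dict(resp.headers.items())


# Offline demonstration with a hand-written header mapping.
print(change_markers({"Content-Type": "text/html", "etag": '"abc"'}))
```

Against a running container, `status, headers = head()` followed by `change_markers(headers)` shows what nginx actually returns for the file.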
With nginx installed, use it as a static file server, much as one would use Tomcat:
    mkdir /usr/local/nginx/html/es
    cd /usr/local/nginx/html/es
    vim fenci.txt
Enter 元年云 in the file, then test that it is served:
    http://192.168.11.129/es/fenci.txt
Then create the fenci.txt file with the following content (the redirect overwrites whatever was entered above):
    echo "樱桃萨其马,带你甜蜜入夏" > /usr/local/nginx/html/es/fenci.txt
Test the result:
    GET _analyze
    {
      "analyzer": "ik_max_word",
      "text": "樱桃萨其马,带你甜蜜入夏"
    }
Output:
    {
      "tokens" : [
        {"token" : "樱桃", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 0},
        {"token" : "萨其马", "start_offset" : 2, "end_offset" : 5, "type" : "CN_WORD", "position" : 1},
        {"token" : "带你", "start_offset" : 6, "end_offset" : 8, "type" : "CN_WORD", "position" : 2},
        {"token" : "甜蜜", "start_offset" : 8, "end_offset" : 10, "type" : "CN_WORD", "position" : 3},
        {"token" : "入夏", "start_offset" : 10, "end_offset" : 12, "type" : "CN_WORD", "position" : 4}
      ]
    }
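Reading token lists by eye gets tedious once the dictionary grows. Here is a small sketch that pulls the token strings out of an _analyze response and reports which expected words did not come back as tokens. The helper names are made up for illustration, and SAMPLE_RESPONSE simply repeats the output shown above as a Python dict.

```python
# The _analyze output shown above, as a Python dict.
SAMPLE_RESPONSE = {
    "tokens": [
        {"token": "樱桃", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0},
        {"token": "萨其马", "start_offset": 2, "end_offset": 5, "type": "CN_WORD", "position": 1},
        {"token": "带你", "start_offset": 6, "end_offset": 8, "type": "CN_WORD", "position": 2},
        {"token": "甜蜜", "start_offset": 8, "end_offset": 10, "type": "CN_WORD", "position": 3},
        {"token": "入夏", "start_offset": 10, "end_offset": 12, "type": "CN_WORD", "position": 4},
    ]
}


def token_texts(analyze_response):
    """Return the token strings of an _analyze response, in order."""
    return [t["token"] for t in analyze_response["tokens"]]


def missing_words(analyze_response, expected):
    """Return the expected words that do not appear as tokens."""
    found = set(token_texts(analyze_response))
    return [w for w in expected if w not in found]


tokens = token_texts(SAMPLE_RESPONSE)
print(tokens)
print(missing_words(SAMPLE_RESPONSE, ["萨其马", "刘强东"]))
```

A word that shows up in the missing list is one the analyzer did not emit as a token for that text, which is a quick hint that a dictionary entry has not been picked up or does not occur in the input.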
