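Full-Text Search
MongoDB's built-in full-text search handles Chinese poorly, because its text index is built from words, that is, runs of characters separated by delimiters, and Chinese text has no such separators between words. Elasticsearch is therefore used instead. Here the Python module mongo-connector syncs data from MongoDB into ES, and queries are then run against ES.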
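To see why delimiter-based tokenization struggles with Chinese, consider a rough sketch that splits text on whitespace. This only illustrates the general problem and is not MongoDB's actual tokenizer; the function name `naive_tokens` is made up for this example.

```python
def naive_tokens(text):
    """Split text on whitespace, roughly how a delimiter-based tokenizer sees it."""
    return text.split()


english = "full text search engine"
chinese = "搜狗输入法"

# English yields four separate, searchable words.
print(naive_tokens(english))
# Chinese has no spaces, so the whole phrase stays one unbroken token.
print(naive_tokens(chinese))
```

A search for 输入法 would not match the single token 搜狗输入法, which is the gap a Chinese-aware analyzer in ES is meant to close.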
Installation
Installing elasticsearch
Option 1: download the official prebuilt archive
https://github.com/elastic/elasticsearch https://www.elastic.co/downloads/elasticsearch
PS: requires Java 8
wget -c https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.2-linux-x86_64.tar.gz
tar xf elasticsearch-7.6.2-linux-x86_64.tar.gz
cd elasticsearch-7.6.2
Option 2: install from the official yum repository (requires root)
https://www.elastic.co/guide/en/elasticsearch/reference/7.6/rpm.html#rpm-repo
# 1 Import the GPG key
rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
# 2 Add the yum repository
cat > /etc/yum.repos.d/elasticsearch.repo << EOF
[elasticsearch]
name=Elasticsearch repository for 7.x packages
baseurl=https://artifacts.elastic.co/packages/7.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=0
autorefresh=1
type=rpm-md
EOF
# 3 Install from that repository (it is disabled by default, so enable it explicitly)
yum install --enablerepo=elasticsearch elasticsearch
Option 3: install with rpm
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.2-x86_64.rpm
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.2-x86_64.rpm.sha512
shasum -a 512 -c elasticsearch-7.6.2-x86_64.rpm.sha512
rpm -ivh elasticsearch-7.6.2-x86_64.rpm
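The `shasum -a 512 -c` step compares the package's SHA-512 digest with the published one. A minimal Python sketch of the same check follows, assuming the `.sha512` file holds the hex digest as its first whitespace-separated field; the function names are hypothetical.

```python
import hashlib


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum_file(path, checksum_path):
    """Compare a file's digest with the first field of its checksum file.

    Assumes the checksum file looks like '<hex digest>  <file name>'.
    """
    with open(checksum_path, encoding="utf-8") as fh:
        expected = fh.read().split()[0]
    return sha512_of(path) == expected.lower()
```

For the files downloaded above, the call would be `matches_checksum_file("elasticsearch-7.6.2-x86_64.rpm", "elasticsearch-7.6.2-x86_64.rpm.sha512")`; install the package only if it returns `True`.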
Installing mongo-connector
pip install mongo-connector
Installing elastic2-doc-manager
pip install 'elastic2-doc-manager[elastic5]'
Usage
Enable a replica set in MongoDB
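mongo-connector follows MongoDB's oplog, which exists only when the server runs as a replica set member. As a rough sketch, the snippet below prints the two shell commands for a single-node replica set on port 27015. The set name `rs0`, the data path, and the helper name are assumptions for illustration, and newer MongoDB releases ship `mongosh` in place of the `mongo` shell.

```python
import shlex


def replica_set_commands(name="rs0", port=27015, dbpath="/path/to/data"):
    """Build the shell commands for a single-node replica set.

    The defaults for name and dbpath are placeholders, not values from a real setup.
    """
    # Start mongod as a member of the named replica set.
    start = ["mongod", "--replSet", name, "--port", str(port), "--dbpath", dbpath]
    # Run once, after mongod is up, to initiate the set.
    initiate = ["mongo", "--port", str(port), "--eval", "rs.initiate()"]
    return start, initiate


for argv in replica_set_commands():
    print(" ".join(shlex.quote(part) for part in argv))
```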
Start ES
/path/to/elasticsearch-7.6.2/bin/elasticsearch -d
Data synchronization
mongo-connector \
-m localhost:27015 \
-t localhost:9200 \
-d elastic2_doc_manager
Using a configuration file
https://github.com/yougov/mongo-connector/wiki/Configuration-Options
mongo-connector -c config.json
{
  "__comments": "fields starting with __ are ignored",
  "mainAddress": "localhost:27015",
  "docManagers": [
    {
      "docManager": "elastic2_doc_manager",
      "targetURL": "localhost:9200",
      "autoCommitInterval": 0,
      "bulkSize": 5000,
      "args": {
        "clientOptions": {"timeout": 100}
      }
    }
  ]
}
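If the addresses differ between environments, the configuration can be generated instead of edited by hand. The sketch below mirrors the JSON above using only the standard library; `build_config` and `write_config` are hypothetical helper names, not part of mongo-connector.

```python
import json


def build_config(main_address="localhost:27015", target_url="localhost:9200"):
    """Return a mongo-connector config dict with the same fields as the JSON above."""
    return {
        "mainAddress": main_address,
        "docManagers": [
            {
                "docManager": "elastic2_doc_manager",
                "targetURL": target_url,
                "autoCommitInterval": 0,
                "bulkSize": 5000,
                "args": {"clientOptions": {"timeout": 100}},
            }
        ],
    }


def write_config(path="config.json", **kwargs):
    """Write the config to disk and return the dict that was written."""
    config = build_config(**kwargs)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
    return config
```

Calling `write_config("config.json")` produces a file that can be passed to `mongo-connector -c config.json`.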
Queries
curl localhost:9200/_cat/indices # list the indices
curl 'localhost:9200/pubmed?pretty' # show the pubmed index's field mappings and other info
curl 'localhost:9200/pubmed/_search?pretty' # full-text search (with no query body, every document matches)
curl 'localhost:9200/pubmed/article/5eb64effc3b702070a873076' # look up a document by _index/_type/_id
curl 'localhost:9200/pubmed/article/_search?pretty'
curl 'localhost:9200/pubmed/_search?pretty' \
-d '{"query": {"match": {"pmid": 123}}}' \
-H "Content-Type: application/json"
# URI query
curl 'localhost:9200/pubmed/article/_search?q=pmid:1234&pretty'
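The same match query can be issued from Python using only the standard library. This is a minimal sketch assuming ES listens on localhost:9200 over plain HTTP; `build_match_request` and `run_search` are hypothetical helper names.

```python
import json
import urllib.request


def build_match_request(index, field, value, host="localhost:9200"):
    """Build a POST request equivalent to the curl match query above."""
    url = f"http://{host}/{index}/_search?pretty"
    body = json.dumps({"query": {"match": {field: value}}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_search(request):
    """Send the request and decode the JSON response (needs a running ES)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With ES running, `run_search(build_match_request("pubmed", "pmid", 123))` returns the parsed response, whose matching documents sit under `["hits"]["hits"]`.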
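Plugins
ik
ES's default analyzer is standard
It handles Chinese poorly (text is split into individual characters)
The ik analyzer has two modes:
- ik_smart: coarse-grained
- ik_max_word: fine-grained
Test: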
curl 'localhost:9200/_analyze?pretty' \
-H "Content-Type: application/json" \
-d '{"analyzer": "ik_smart", "text": "搜狗输入法"}'
# tokens: ['搜狗', '输入法']
curl 'localhost:9200/_analyze?pretty' \
-H "Content-Type: application/json" \
-d '{"analyzer": "ik_max_word", "text": "搜狗输入法"}'
# tokens: ['搜狗', '输入法', '输入', '法']
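To compare the two modes programmatically, the `_analyze` call can be wrapped in a small helper. This sketch assumes ES on localhost:9200 with the ik plugin installed; the helper names are hypothetical, and `sample` is a hand-written, trimmed response matching the ik_smart result shown above, not captured output.

```python
import json
import urllib.request


def build_analyze_request(analyzer, text, host="localhost:9200"):
    """Build a POST request for the _analyze endpoint."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_tokens(response):
    """Pull the token strings out of a decoded _analyze response."""
    return [entry["token"] for entry in response["tokens"]]


# Hand-written sample shaped like an _analyze response (other fields omitted).
sample = {"tokens": [{"token": "搜狗", "position": 0}, {"token": "输入法", "position": 1}]}
print(extract_tokens(sample))  # ['搜狗', '输入法']
```

Sending `build_analyze_request("ik_max_word", "搜狗输入法")` with `urllib.request.urlopen` and passing the decoded JSON to `extract_tokens` gives the finer-grained token list for comparison.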
kibana
./bin/kibana # default configuration file: config/kibana.yml