5. Script
Elasticsearch supports three scripting languages: Groovy (1.4.x – 5.0), Painless, and Expression.
Groovy: abandoned after Elasticsearch 5.0 because of security problems.
Painless: a simple, secure scripting language designed specifically for Elasticsearch, used for inline and stored scripts. It is similar to Java, with comments, keywords, types, variables, and functions, and it is the default scripting language.
Expression: very low per-document overhead; expressions execute very fast, even faster than a hand-written native script. Supports a subset of JavaScript syntax: a single expression. Drawbacks: it can only access numeric, boolean, date, and geo_point fields, and stored fields are not available.
Script template:
{METHOD} {_index}/{action}/{_id}
{
"script":{
"lang": "painless/expression",
"source": "ctx._source.{field} {option} {值} "
}
}
Use a Painless script to decrement the premium field of the doc with _index "order" and _id "1" by 1:
POST order/_update/1
{
"script":{
"lang": "painless",
"source": "ctx._source.premium -= 1"
}
}
Use a Painless script to append the string "66666" to the tenderAddress field of the doc with _index "order" and _id "1":
POST order/_update/1
{
"script":{
"lang": "painless",
"source": "ctx._source.tenderAddress +='66666' "
}
}
Use a Painless script to delete the doc with _index "order" and _id "1":
POST order/_update/1
{
"script":{
"lang": "painless",
"source": "ctx.op='delete'"
}
}
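Besides 'delete', ctx.op also accepts 'noop', which makes conditional updates possible. A minimal sketch, reusing the premium field from above, that skips the update once premium is at the floor:
POST order/_update/1
{
  "script":{
    "lang": "painless",
    "source": "if (ctx._source.premium <= 1) { ctx.op = 'noop' } else { ctx._source.premium -= 1 }"
  }
}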
upsert is short for update + insert: if the doc already exists the request performs an update, and if it does not exist it performs an insert.
In the example below, the tags field of the doc with _index "order" and _id "1" is an array.
POST order/_update/1
{
"script": {
"lang": "painless",
"source": "ctx._source.tags.add(params.zhangdeshi)",
"params": {
"zhangdeshi":"真TM帅",
"xiangeshi":"真JB差"
}
},
"upsert": {
"age":"30",
"name":"李明",
"tags":[]
}
}
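By default the script does not run when the doc is missing; the upsert block is simply indexed as-is. If the script should run in both cases, Elasticsearch also offers scripted_upsert; a sketch (the script then starts from the upsert document, and the param name here is illustrative):
POST order/_update/1
{
  "scripted_upsert": true,
  "script": {
    "lang": "painless",
    "source": "ctx._source.tags.add(params.tag)",
    "params": { "tag": "new-tag" }
  },
  "upsert": { "tags": [] }
}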
Query the premium field with an expression script:
GET {_index}/_search
{
"script_fields": {
"{给查询结果起个名字}": {
"script": {
"lang": "expression",
"source":"doc['premium']"
}
}
}
}
Use an expression to apply a 10% discount to the premium field (multiply by 0.9):
GET {_index}/_search
{
"script_fields": {
"{给查询结果起个名字}": {
"script": {
"lang": "expression",
"source":"doc['premium'].value*0.9"
}
}
}
}
Use Painless with params to return the premium field at a 10% discount and a 20% discount:
GET order/_search
{
"script_fields": {
"{给查询结果起个名字}": {
"script": {
"lang": "painless",
"source":"[doc['premium'].value*params.da9zhe,doc['premium'].value*params.da8zhe]",
"params": {
"da9zhe":0.9,
"da8zhe":0.8
}
}
}
}
}
Stored script
First, save a stored script:
POST _scripts/{script-id}
{
  "script": {
    "lang": "painless",
    "source":"[doc['premium'].value*params.da9zhe,doc['premium'].value*params.da8zhe]"
  }
}
Note that a stored script holds only the lang and source; params are supplied each time the script is invoked.
Use the stored script:
POST order/_search
{
"script_fields": {
"{给查询结果起个名字}": {
"script": {
"id":"{script-id}",
"params": {
"da9zhe":0.009,
"da8zhe":0.008
}
}
}
}
}
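A stored script can also be fetched or removed by its id:
GET _scripts/{script-id}
DELETE _scripts/{script-id}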
Custom scripts work somewhat like stored procedures: they can contain logic such as if statements, for loops, or custom calculations.
POST order/_update/2
{
"script": {
"lang": "painless",
"source": """
ctx._source.premium *= ctx._source.rate;
ctx._source.amount += params.danwei
""",
"params": {
"danwei":"万元"
}
}
}
Find documents whose premium is between 10 and 205 (inclusive), and sum the number of characters of the insuranceCompany field across the matching documents.
GET order/_search
{
"query": {
"bool": {
"filter": [
{
"range": {
"premium": {
"gte": 10,
"lte": 205
}
}
}
]
}
},
"aggs": {
"{给聚合查村结果起个名字}": {
"sum": {
"script": {
"lang": "painless",
"source": """
int total = 0;
for(int i = 0; i < doc['insuranceCompany.keyword'].value.length(); i++){
  total++;
}
return total;
"""
}
}
}
},
"size": 0
}
6. Analyzers
An analyzer has three key components:
char_filter
Pre-processes characters before tokenization, stripping useless characters, useless tags, and so on. For example: 《Elasticsearch》 => Elasticsearch.
tokenizer
Does the actual tokenization.
token_filter
Stop words, tense conversion, case conversion, synonyms, modal particles, and so on. For example:
has => have
him => he
apples => apple
the/oh/a => dropped
6.1 Setting a char_filter
1. Create an index named "my-index" and configure a custom char filter inside the analyzer:
PUT my-index
{
  "settings": {
    "analysis": {
      //define a char filter named "my-charfilter": the listed characters are rewritten to other content
      "char_filter": {
        "my-charfilter":{ //char filter name
          "type":"mapping", //filter type
          "mappings":["&=>and","&&=>and","|=>or","||=>or"] //the character mappings
        }
      },
      //define an analyzer named "my-analyzer" that uses the char filter "my-charfilter" defined above
      "analyzer": {
        "my-analyzer":{ //analyzer name
          "tokenizer":"keyword",
          "char_filter":["my-charfilter"] //char filters to apply
        }
      }
    }
  }
}
# Test the my-charfilter of my-index
GET my-index/_analyze
{
  "analyzer": "my-analyzer",
  "text": ["666&77&&8||"]
}
# Result:
{
  "tokens" : [
    {
      "token" : "666and77and8or",
      "start_offset" : 0,
      "end_offset" : 11,
      "type" : "word",
      "position" : 0
    }
  ]
}
2. Create an index named "my-index2" with an html_strip char filter and escaped tags:
# Create an index named my-index2 and configure the char filter
PUT my-index2
{
  "settings": {
    "analysis": {
      //define a char filter named "my-charfilter" of type "html_strip" with escaped tags ["a","p"]
      "char_filter": {
        "my-charfilter":{ //char filter name
          "type":"html_strip", //filter type
          "escaped_tags":["a","p"] //tags the char filter must leave in place
        }
      },
      //define an analyzer named my-analyzer using the char filter my-charfilter defined above
      "analyzer": {
        "my-analyzer":{ //analyzer name
          "tokenizer":"keyword",
          "char_filter":["my-charfilter"] //char filters to apply
        }
      }
    }
  }
}
# Test
GET my-index2/_analyze
{
"analyzer": "my-analyzer",
"text": ["<a><b>hello word</b></a>", "<p>hello word</p>"]
}
# Result: the <b> tags are stripped, while the escaped <a> and <p> tags
# are kept in the output tokens:
# "<a>hello word</a>" and "<p>hello word</p>"
3. Use a regex to match and replace characters:
PUT my-index3
{
  "settings": {
    "analysis": {
      //define the char filter
      "char_filter": {
        "my-charfilter":{ //char filter name
          "type":"pattern_replace", //filter type
          "pattern":"(\\d+)-(?=\\d)", //the regex to match
          "replacement":"$1_" //the replacement for every match
        }
      },
      //define an analyzer named my-analyzer using the char filter my-charfilter defined above
      "analyzer": {
        "my-analyzer":{ //analyzer name
          "tokenizer":"standard",
          "char_filter":["my-charfilter"] //char filters to apply
        }
      }
    }
  }
}
# Test
GET my-index3/_analyze
{
  "analyzer": "my-analyzer",
  "text": "My credit card is 123-456-789"
}
# Result: tokens [My, credit, card, is, 123_456_789]
6.2 Setting a token filter
A bare token filter test:
#Test:
GET _analyze
{
  "tokenizer" : "standard", //standard tokenizer
  "filter" : ["lowercase"], //lowercase filter
  "text" : "THE Quick FoX JUMPs" //test input
}
# 结果:
{
"tokens" : [
{
"token" : "the",
"start_offset" : 0,
"end_offset" : 3,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "quick",
"start_offset" : 4,
"end_offset" : 9,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "fox",
"start_offset" : 10,
"end_offset" : 13,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "jumps",
"start_offset" : 14,
"end_offset" : 19,
"type" : "<ALPHANUM>",
"position" : 3
}
]
}
With a condition script:
#Test
GET /_analyze
{
  "tokenizer" : "standard", //standard tokenizer
  "filter": [
    {
      "type": "condition", //conditional token filter
      "filter": [ "lowercase" ], //convert to lowercase
      "script": { //the wrapped filter is applied only to tokens that satisfy this script
        "source": "token.getTerm().length() < 5"
      }
    }
  ],
  "text": "THE QUICK BROWN FOX" //test input
}
# Result:
{
"tokens" : [
{
"token" : "the",
"start_offset" : 0,
"end_offset" : 3,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "QUICK",
"start_offset" : 4,
"end_offset" : 9,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "BROWN",
"start_offset" : 10,
"end_offset" : 15,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "fox",
"start_offset" : 16,
"end_offset" : 19,
"type" : "<ALPHANUM>",
"position" : 3
}
]
}
6.3 Setting an analyzer
Stop words:
PUT /my_index8
{
"settings": {
"analysis": {
//define the analyzer
"analyzer": {
  "my_analyzer":{ //analyzer named "my_analyzer"
    "type":"standard", //standard type
    "stopwords":"_english_" //use the built-in English stop word list
}
}
}
}
}
#Test
GET my_index8/_analyze
{
"analyzer": "my_analyzer",
"text": "Teacher Ma is in the restroom"
}
#Result:
{
"tokens" : [
{
"token" : "teacher",
"start_offset" : 0,
"end_offset" : 7,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "ma",
"start_offset" : 8,
"end_offset" : 10,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "restroom",
"start_offset" : 21,
"end_offset" : 29,
"type" : "<ALPHANUM>",
"position" : 5
}
]
}
6.4 Configuring analysis
PUT my-index9
{
"settings": {
"analysis": {
//char filters
"char_filter": {
  "test_char_filter": { //char filter named "test_char_filter"
    "type": "mapping", //type mapping
    "mappings": [ "& => and", "| => or" ]
  }
},
//token filters
"filter": {
  "test_stopwords": { //filter named test_stopwords
    "type": "stop", //type stop
    "stopwords": ["is","in","at","the","a","for"]
  }
},
//tokenizers
"tokenizer": {
  "punctuation": {
    "type": "pattern", //regex-based tokenizer
    "pattern": "[ .,!?]" //the regex
  }
},
"analyzer": {
  "my_analyzer": { //analyzer named my_analyzer
    "type": "custom", //type custom tells Elasticsearch we are defining a custom analyzer
    "char_filter": [ //two char filters: our custom test_char_filter plus the built-in html_strip
      "test_char_filter",
      "html_strip"
    ],
    "tokenizer": "standard", //use the standard tokenizer
    "filter": ["lowercase","test_stopwords"] //the built-in lowercase filter plus our custom test_stopwords
}
}
}
}
}
# Test
GET my-index9/_analyze
{
"text": "Teacher ma & zhang also thinks [mother's friends] is good | nice!!!",
"analyzer": "my_analyzer"
}
# Result
{
"tokens" : [
{
"token" : "teacher",
"start_offset" : 0,
"end_offset" : 7,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "ma",
"start_offset" : 8,
"end_offset" : 10,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "and",
"start_offset" : 11,
"end_offset" : 12,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "zhang",
"start_offset" : 13,
"end_offset" : 18,
"type" : "<ALPHANUM>",
"position" : 3
},
{
"token" : "also",
"start_offset" : 19,
"end_offset" : 23,
"type" : "<ALPHANUM>",
"position" : 4
},
{
"token" : "thinks",
"start_offset" : 24,
"end_offset" : 30,
"type" : "<ALPHANUM>",
"position" : 5
},
{
"token" : "mother's",
"start_offset" : 32,
"end_offset" : 40,
"type" : "<ALPHANUM>",
"position" : 6
},
{
"token" : "friends",
"start_offset" : 41,
"end_offset" : 48,
"type" : "<ALPHANUM>",
"position" : 7
},
{
"token" : "good",
"start_offset" : 53,
"end_offset" : 57,
"type" : "<ALPHANUM>",
"position" : 9
},
{
"token" : "or",
"start_offset" : 58,
"end_offset" : 59,
"type" : "<ALPHANUM>",
"position" : 10
},
{
"token" : "nice",
"start_offset" : 60,
"end_offset" : 64,
"type" : "<ALPHANUM>",
"position" : 11
}
]
}
6.5 Complete mapping + analysis
PUT my_index12
{
"settings": {
"analysis": {
//char filters
"char_filter": {
  "test_char_filter": { //char filter named "test_char_filter"
    "type": "mapping", //type mapping
    "mappings": [ "& => and", "| => or" ]
  }
},
//token filters
"filter": {
  "test_stopwords": { //filter named test_stopwords
    "type": "stop", //type stop
    "stopwords": ["is","in","at","the","a","for"]
  }
},
//tokenizers
"tokenizer": {
  "punctuation": {
    "type": "pattern", //regex-based tokenizer
    "pattern": "[ .,!?]" //the regex
  }
},
"analyzer": {
  "my_analyzer": { //analyzer named my_analyzer
    "type": "custom", //type custom tells Elasticsearch we are defining a custom analyzer
    "char_filter": [ //two char filters: our custom test_char_filter plus the built-in html_strip
      "test_char_filter",
      "html_strip"
    ],
    "tokenizer": "standard", //use the standard tokenizer
    "filter": ["lowercase","test_stopwords"] //the built-in lowercase filter plus our custom test_stopwords
}
}
}
},
"mappings": {
"properties": {
//field "name"
"name": {
  "type": "text", //text type
  "analyzer": "my_analyzer", //index-time analyzer
  "search_analyzer": "standard" //search-time analyzer
},
//field "said"
"said": {
  "type": "text", //text type
  "analyzer": "my_analyzer", //index-time analyzer
  "search_analyzer": "standard" //search-time analyzer
}
}
}
}
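To sanity-check the mapping, index a small sample document (hypothetical data) and search it. Because "&" is mapped to "and" at index time, a standard-analyzed search for "and" should find the doc:
POST my_index12/_doc/1
{
  "name": "teacher ma & zhang",
  "said": "it is good | nice"
}

GET my_index12/_search
{
  "query": {
    "match": {
      "name": "and"
    }
  }
}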
6.6 The ik Chinese analyzer
1. Installing and using ik
ik is a Java project built with Maven: clone it and run the Maven package command. A zip file appears under target/release/: elasticsearch-analysis-ik-7.4.0.zip.
Unzip elasticsearch-analysis-ik-7.4.0.zip and copy all of its files into the {ES_source_path}/plugins/ik folder (you need to create the ik folder under plugins first).
Then restart Elasticsearch.
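Alternatively, ik publishes prebuilt release zips that can be installed with the plugin CLI, which saves the manual copy step (URL shown for v7.4.0; use the one matching your ES version):
bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.0/elasticsearch-analysis-ik-7.4.0.zip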
Test the ik Chinese analyzer:
# Test the ik Chinese analyzer
GET _analyze
{
  "analyzer": "ik_max_word", //one of ik's two built-in analyzers, "ik_max_word"
  "text": [ "美国留给伊拉克的是个烂摊子吗" ] //test input
}
# Result
{
"tokens" : [
{
"token" : "美国",
"start_offset" : 0,
"end_offset" : 2,
"type" : "CN_WORD",
"position" : 0
},
{
"token" : "留给",
"start_offset" : 2,
"end_offset" : 4,
"type" : "CN_WORD",
"position" : 1
},
{
"token" : "伊拉克",
"start_offset" : 4,
"end_offset" : 7,
"type" : "CN_WORD",
"position" : 2
},
{
"token" : "的",
"start_offset" : 7,
"end_offset" : 8,
"type" : "CN_CHAR",
"position" : 3
},
{
"token" : "是",
"start_offset" : 8,
"end_offset" : 9,
"type" : "CN_CHAR",
"position" : 4
},
{
"token" : "个",
"start_offset" : 9,
"end_offset" : 10,
"type" : "CN_CHAR",
"position" : 5
},
{
"token" : "烂摊子",
"start_offset" : 10,
"end_offset" : 13,
"type" : "CN_WORD",
"position" : 6
},
{
"token" : "摊子",
"start_offset" : 11,
"end_offset" : 13,
"type" : "CN_WORD",
"position" : 7
},
{
"token" : "吗",
"start_offset" : 13,
"end_offset" : 14,
"type" : "CN_CHAR",
"position" : 8
}
]
}
Problems you may hit during installation:
java.lang.IllegalArgumentException: Plugin [analysis-ik] was built for Elasticsearch version 7.10.0 but version 7.10.1 is running
This simply means the ik and Elasticsearch versions do not match, so the Elasticsearch version recorded inside the plugin has to be changed.
Edit the plugin-descriptor.properties file of the plugin and set its elasticsearch.version property to the Elasticsearch version you are running.
2. ik's two analyzers
ik_max_word: fine-grained; this is the one normally used.
ik_smart: coarse-grained (see the comparison below).
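To see the difference, run the sentence from the test above through ik_smart; it should produce fewer, coarser tokens (for example, 烂摊子 stays whole instead of also emitting 摊子):
GET _analyze
{
  "analyzer": "ik_smart",
  "text": [ "美国留给伊拉克的是个烂摊子吗" ]
}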
3. Files under config
- IKAnalyzer.cfg.xml: IK configuration file
- main.dic: main dictionary
- stopword.dic: English stop words; these are not put into the inverted index
- quantifier.dic: special dictionary: units of measure, etc.
- suffix.dic: special dictionary: suffixes
- surname.dic: special dictionary: Chinese surnames
- preposition.dic: special dictionary: modal particles
4. Hot-updating the ik dictionaries
Option 1: modify the IK source so it periodically connects to a database, queries the main words and stop words, and loads them into the local IK dictionaries.
Option 2: poll an HTTP endpoint on a schedule.
The plugin currently supports hot updates of the IK dictionaries through the configuration in the IKAnalyzer.cfg.xml file shown below.
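The relevant entries look like this (the standard snippet from the ik config; remote_ext_stopwords works the same way for stop words):
<!-- IKAnalyzer.cfg.xml -->
<entry key="remote_ext_dict">words_location</entry>
<entry key="remote_ext_stopwords">words_location</entry>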
Here words_location is a URL. The HTTP endpoint only has to satisfy two requirements for hot updates to work:
- The response must carry two headers, Last-Modified and ETag, both strings. Whenever either of them changes, the plugin fetches the word list again and updates the dictionary.
- The response body must contain one word per line, with \n as the line separator.
If these two requirements are met, the dictionary is hot-updated without restarting the ES instance.
Below is an endpoint mocked up with Spring Boot:
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import javax.servlet.http.HttpServletResponse;
import java.io.OutputStream;
import java.util.StringJoiner;

@RestController
public class IkDictController {
    /**
     * Serve the custom ik dictionary
     * @param response
     */
    @RequestMapping(value="/getCustomDict")
    public void getCustomDict(HttpServletResponse response){
        try {
            //in a real setup the words would come from a database or a cache
            String[] dicWords = new String[]{"雷猴啊","割韭菜","996","实力装X","颜值担当"};
            //response body: one word per line
            StringJoiner content = new StringJoiner(System.lineSeparator());
            for (String dicWord : dicWords) {
                content.add(dicWord);
            }
            //ik only compares these header values as strings, so any value that
            //changes together with the dictionary is enough to trigger a reload
            response.setHeader("Last-Modified", String.valueOf(content.length()));
            response.setHeader("ETag", String.valueOf(content.length()));
            response.setContentType("text/plain; charset=utf-8");
            OutputStream out = response.getOutputStream();
            out.write(content.toString().getBytes("UTF-8"));
            out.flush();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
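With the endpoint deployed, remote_ext_dict is pointed at it; host and port here are whatever the Spring Boot app listens on (illustrative values):
<entry key="remote_ext_dict">http://172.27.229.9:8080/getCustomDict</entry>
ik then polls the URL periodically and reloads the dictionary whenever Last-Modified or ETag changes.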
7. Prefix, wildcard, regexp, and fuzzy queries
# Create an index with prefix indexing
PUT my_index
{
  "mappings": {
    "properties": {
      "title":{ //field name
        "type": "text", //text type
        "index_prefixes":{ //prefix-indexing settings
          "min_chars":2, //index prefixes of at least 2 characters
          "max_chars":4 //index prefixes of at most 4 characters
}
}
}
}
}
# Bulk-insert Chinese test data
POST /my_index/_bulk
{ "index": { "_id": "1"} }
{ "title": "城管打电话喊商贩去摆摊摊" }
{ "index": { "_id": "2"} }
{ "title": "笑果文化回应商贩老农去摆摊" }
{ "index": { "_id": "3"} }
{ "title": "老农耗时17年种出椅子树" }
{ "index": { "_id": "4"} }
{ "title": "夫妻结婚30多年AA制,被城管抓" }
{ "index": { "_id": "5"} }
{ "title": "黑人见义勇为阻止抢劫反被铐住" }
# Bulk-insert English test data
POST /my_index/_bulk
{ "index": { "_id": "6"} }
{ "title": "my english" }
{ "index": { "_id": "7"} }
{ "title": "my english is good" }
{ "index": { "_id": "8"} }
{ "title": "my chinese is good" }
{ "index": { "_id": "9"} }
{ "title": "my japanese is nice" }
{ "index": { "_id": "10"} }
{ "title": "my disk is full" }
# Prefix search
GET my_index/_search
{
  "query": {
    "prefix": { //prefix query
      "title": { //field to search
        "value": "城管" //the prefix value
}
}
}
}
# Test the ik analyzer
GET /_analyze
{
"text": "城管打电话喊商贩去摆摊摊",
"analyzer": "ik_smart"
}
# Wildcard
GET my_index/_search
{
"query": {
"wildcard": {
"title": {
"value": "eng?ish"
}
}
}
}
# Wildcard
GET order/_search
{
"query": {
"wildcard": {
"insuranceCompany.keyword": {
"value": "中?联合"
}
}
}
}
# Wildcard
GET product/_search
{
"query": {
"wildcard": {
"name.keyword": {
"value": "xiaomi*nfc*",
"boost": 1.0
}
}
}
}
# Regexp
GET product/_search
{
"query": {
"regexp": {
"name": {
"value": "[\\s\\S]*nfc[\\s\\S]*",
"flags": "ALL",
"max_determinized_states": 10000,
"rewrite": "constant_score"
}
}
}
}
# Fuzzy search (match query with fuzziness)
GET order/_search
{
  "query": {
    "match": {
      "tenderAddress": { //field to search
        "query": "乔呗", //the (possibly misspelled) text to search for
        "fuzziness": "AUTO", //fuzziness mode, default "AUTO"
        "operator": "or" //operator, default "or"
}
}
}
}
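The example above is a full-text match query with fuzziness added. There is also a term-level fuzzy query that applies edit-distance matching to a single, un-analyzed term; a sketch against the same field:
GET order/_search
{
  "query": {
    "fuzzy": {
      "tenderAddress": {
        "value": "乔呗",
        "fuzziness": "AUTO"
      }
    }
  }
}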
# Phrase-prefix search: match_phrase_prefix
GET /order/_search
{
  "query": {
    //phrase-prefix search
    "match_phrase_prefix": {
      "bidInvationCenter": { //field to search
        "query": "中国", //the phrase prefix
        "analyzer": "ik_smart", //analyzer applied to the query text
        "max_expansions": 1, //limits how many terms the trailing prefix may expand to
        "slop": 2, //how far apart the terms may be (how many moves are allowed) while still matching
        "boost": 1 //query weight
}
}
}
}
GET _search
{
"query": {
"match_phrase_prefix": {
"message": {
"query": "quick brown f"
}
}
}
}
8. TF-IDF, and a first look at the Java API
The official docs are always the most current and complete reference: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-overview.html
Add the Maven dependency:
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-high-level-client</artifactId>
<version>7.11.1</version>
</dependency>
<repositories>
    <repository>
        <id>es-snapshots</id>
        <name>elasticsearch snapshot repo</name>
        <url>https://snapshots.elastic.co/maven/</url>
    </repository>
    <repository>
        <id>elastic-lucene-snapshots</id>
        <name>Elastic Lucene Snapshots</name>
        <url>https://s3.amazonaws.com/download.elasticsearch.org/lucenesnapshots/83f9835</url>
        <releases><enabled>true</enabled></releases>
        <snapshots><enabled>false</enabled></snapshots>
    </repository>
</repositories>
8.1 Basic CRUD
import org.apache.http.HttpHost;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.xcontent.XContentType;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
public class TestElasticsearch {
public static void main(String[] args) throws IOException {
RestHighLevelClient client = createClient(); //create the connection and client
try {
// createIndex(client); //create the index
// addData(client); //index a document
// queryData(client); //get the document
// updateData(client); //update the document
// queryData(client); //get the document again
// deleteData(client); //delete the document
// deleteIndex(client); //delete the index
bulk(client); //bulk operations
}catch (Exception e){
e.printStackTrace();
}finally {
if (client != null){
closeClient(client); //close the client
}
}
}
/**
 * Create the connection and client
 * @return
 */
public static RestHighLevelClient createClient(){
//create the connection and client
HttpHost http9200 = new HttpHost("8.140.122.156", 9200, "http");
// HttpHost http9201 = new HttpHost("8.140.122.156", 9201, "http");
RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(http9200));
return client;
}
/**
 * Close the client
 * @param client
 * @throws IOException
 */
public static void closeClient(RestHighLevelClient client) throws IOException {
client.close();
}
/**
 * Create an index
 * PUT order1
 * {
 *   "settings": {
 *     "number_of_shards": 3,
 *     "number_of_replicas": 2
 *   },
 *   "mappings": {
 *     "properties": {
 *       "id":{ "type": "long" },
 *       "age":{ "type": "short" },
 *       "mail":{
 *         "type": "text",
 *         "fields": { "keyword" : { "type" : "keyword", "ignore_above" : 64 } }
 *       },
 *       "name":{
 *         "type": "text",
 *         "fields": { "keyword" : { "type" : "keyword", "ignore_above" : 64 } }
 *       },
 *       "createTime":{
 *         "type": "date",
 *         "format": "yyyy-MM-dd HH:mm:ss"
 *       }
 *     }
 *   }
 * }
 * @param client
 * @throws IOException
 */
public static void createIndex(RestHighLevelClient client) throws IOException {
//3 primary shards with 2 replicas each
Settings settings = Settings.builder()
.put("number_of_shards",3)
.put("number_of_replicas",2)
.build();
//create the index
CreateIndexRequest createIndexRequest = new CreateIndexRequest("order1");
createIndexRequest.mapping("{\"properties\":{\"id\":{\"type\":\"long\"},\"age\":{\"type\":\"short\"},\"mail\":{\"type\":\"text\",\"fields\":{\"keyword\":{\"type\":\"keyword\",\"ignore_above\":64}}},\"name\":{\"type\":\"text\",\"fields\":{\"keyword\":{\"type\":\"keyword\",\"ignore_above\":64}}},\"createTime\":{\"type\":\"date\",\"format\":\"yyyy-MM-dd HH:mm:ss\"}}}",XContentType.JSON);
createIndexRequest.settings(settings);
//execute
CreateIndexResponse createIndexResponse = client.indices().create(createIndexRequest, RequestOptions.DEFAULT);
System.out.println("Create index result: "+createIndexResponse.isAcknowledged());
}
/**
 * Index a document
 *
 * @param client
 * @throws IOException
 */
public static void addData(RestHighLevelClient client) throws IOException {
//the document
Map<String,Object> personMap = new HashMap<>();
personMap.put("id",1);
personMap.put("age",28);
personMap.put("name","王帆");
personMap.put("email","wangfan@yxinsur.com");
personMap.put("createTime","2020-02-02 12:00:00");
//index the document (7.x style: index name plus id, no type)
IndexRequest indexRequest = new IndexRequest("order1").id("1");
indexRequest.source(personMap);
//execute
IndexResponse indexResponse = client.index(indexRequest, RequestOptions.DEFAULT);
System.out.println("Index document result: "+indexResponse.toString());
}
/**
 * Get a document
 * GET order1/_doc/1
 * @param client
 * @throws IOException
 */
public static void queryData(RestHighLevelClient client) throws IOException {
//get by index name and id
GetRequest getRequest = new GetRequest("order1","1");
//execute
GetResponse getResponse = client.get(getRequest, RequestOptions.DEFAULT);
System.out.println("Get document result: "+getResponse.toString());
}
/**
 * Update a document
 * PUT order1/_doc/1
 * {
 *   "name": "王帆",
 *   "age": 29,
 *   "email": "wangfan@yxinsur.com"
 * }
 * @param client
 * @throws IOException
 */
public static void updateData(RestHighLevelClient client) throws IOException {
//the new field values
Map<String,Object> personMap = new HashMap<>();
personMap.put("id",1);
personMap.put("age",29);
personMap.put("name","王帆");
personMap.put("email","wangfan@yxinsur.com");
personMap.put("createTime","2020-02-02 12:00:00");
//update by index name and id
UpdateRequest updateRequest = new UpdateRequest("order1","1");
updateRequest.doc(personMap);
//or use upsert instead
//updateRequest.upsert(personMap);
//execute
UpdateResponse updateResponse = client.update(updateRequest, RequestOptions.DEFAULT);
System.out.println("Update document result: "+updateResponse.toString());
}
/**
 * Delete a document
 * DELETE order1/_doc/1
 * @param client
 * @throws IOException
 */
public static void deleteData(RestHighLevelClient client) throws IOException {
//delete by index name and id
DeleteRequest deleteRequest = new DeleteRequest("order1","1");
//execute
DeleteResponse deleteResponse = client.delete(deleteRequest, RequestOptions.DEFAULT);
System.out.println("Delete document result: "+deleteResponse.toString());
}
/**
 * Delete an index
 * DELETE order1
 * @param client
 * @throws IOException
 */
public static void deleteIndex(RestHighLevelClient client) throws IOException {
//delete the index
DeleteIndexRequest deleteIndexRequest = new DeleteIndexRequest("order1");
//execute (returns a plain AcknowledgedResponse in 7.x)
AcknowledgedResponse deleteIndexResponse = client.indices().delete(deleteIndexRequest, RequestOptions.DEFAULT);
System.out.println("Delete index result: "+deleteIndexResponse.isAcknowledged());
}
/**
 * Bulk operations
 * @param client
 */
public static void bulk(RestHighLevelClient client) throws IOException {
//prepare the data
Map<String,Object> personMap = new HashMap<>();
personMap.put("id",2);
personMap.put("age",29);
personMap.put("name","王帆");
personMap.put("email","wangfan@yxinsur.com");
personMap.put("createTime","2020-02-02 12:00:00");
//index operations
IndexRequest indexRequest2 = new IndexRequest("order1").id("2");
indexRequest2.source(personMap,XContentType.JSON);
IndexRequest indexRequest3 = new IndexRequest("order1").id("3");
indexRequest3.source(personMap,XContentType.JSON);
//update operation
personMap.put("age",30);
UpdateRequest updateRequest = new UpdateRequest("order1","2");
updateRequest.doc(personMap);
//delete operation
DeleteRequest deleteRequest = new DeleteRequest("order1","2");
//put all operations into one bulk request
BulkRequest bulkRequest = new BulkRequest();
bulkRequest.add(indexRequest2);
bulkRequest.add(indexRequest3);
bulkRequest.add(updateRequest);
bulkRequest.add(deleteRequest);
//execute
BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);
System.out.println("Bulk result: "+bulkResponse.toString());
}
}
Create index result: true
Index document result: IndexResponse[index=order1,type=_doc,id=1,version=1,result=created,seqNo=0,primaryTerm=1,shards={"total":3,"successful":1,"failed":0}]
Get document result: {"_index":"order1","_type":"_doc","_id":"1","_version":1,"found":true,"_source":{"createTime":"2020-02-02 12:00:00","name":"王帆","id":1,"age":28,"email":"wangfan@yxinsur.com"},"fields":{"_seq_no":[0],"_primary_term":[1]}}
Update document result: UpdateResponse[index=order1,type=_doc,id=1,version=2,seqNo=1,primaryTerm=1,result=updated,shards=ShardInfo{total=3, successful=1, failures=[]}]
Get document result: {"_index":"order1","_type":"_doc","_id":"1","_version":2,"found":true,"_source":{"createTime":"2020-02-02 12:00:00","name":"王帆","id":1,"age":29,"email":"wangfan@yxinsur.com"},"fields":{"_seq_no":[1],"_primary_term":[1]}}
Delete document result: DeleteResponse[index=order1,type=_doc,id=1,version=3,result=deleted,shards=ShardInfo{total=3, successful=1, failures=[]}]
Delete index result: true
Bulk result: org.elasticsearch.action.bulk.BulkResponse@793be5ca
9. ES cluster deployment
Configuration file reference:
- cluster.name: name of the entire cluster; every node uses the same value, and other nodes discover the cluster through cluster.name.
- node.name: name of this node within the cluster; other nodes discover it through node.name. Defaults to the machine hostname.
- path.data: where the data lives. In production it must not sit inside the ES installation directory, or an ES upgrade will wipe the data.
- path.logs: where the logs live. Same rule: in production it must not sit inside the ES installation directory.
- bootstrap.memory_lock: locks the process memory so it cannot be swapped out (swap uses disk as temporary memory when RAM runs short). Swapping must be prevented in production.
- network.host: the IP address this node binds to. Once set, the node is reachable only through that exact address; e.g. with 127.0.0.1 configured you must use 127.0.0.1, and even localhost will not work.
- http.port: the HTTP service port of this node.
- transport.port: the transport port of this node; cluster nodes talk to each other on this port, e.g. during master election.
- discovery.seed_hosts: must list the address:port of every master-eligible node.
- cluster.initial_master_nodes: on first cluster bootstrap, the initial master is elected from the node.name values in this list.
Cluster layout:

| Node | node.master | node.data | Role |
| --- | --- | --- | --- |
| node01 | true | false | master-eligible node |
| node02 | true | false | master-eligible node |
| node03 | true | false | master-eligible node |
| node04 | false | true | pure data node |
| node05 | false | true | pure data node |
| node06 | false | false | voting-only / coordinating node |

Machine internal IP: 172.27.229.9
Machine external IP: 8.140.122.156
node01 config:
cluster.name: my-application
node.name: node01
path.data: /root/soft/elasticsearch/data/node01
path.logs: /root/soft/elasticsearch/log/node01
bootstrap.memory_lock: false
network.host: 172.27.229.9
http.port: 9200
transport.port: 9300
discovery.seed_hosts: ["172.27.229.9:9300", "172.27.229.9:9301", "172.27.229.9:9302"]
cluster.initial_master_nodes: ["node01","node02","node03"]
http.cors.enabled: true
http.cors.allow-origin: "*"
node.master: true
node.data: false
node.max_local_storage_nodes: 6
node02 config:
cluster.name: my-application
node.name: node02
path.data: /root/soft/elasticsearch/data/node02
path.logs: /root/soft/elasticsearch/log/node02
bootstrap.memory_lock: false
network.host: 172.27.229.9
http.port: 9201
transport.port: 9301
discovery.seed_hosts: ["172.27.229.9:9300", "172.27.229.9:9301", "172.27.229.9:9302"]
cluster.initial_master_nodes: ["node01","node02","node03"]
http.cors.enabled: true
http.cors.allow-origin: "*"
node.master: true
node.data: false
node.max_local_storage_nodes: 6
node03 config:
cluster.name: my-application
node.name: node03
path.data: /root/soft/elasticsearch/data/node03
path.logs: /root/soft/elasticsearch/log/node03
bootstrap.memory_lock: false
network.host: 172.27.229.9
http.port: 9202
transport.port: 9302
discovery.seed_hosts: ["172.27.229.9:9300", "172.27.229.9:9301", "172.27.229.9:9302"]
cluster.initial_master_nodes: ["node01","node02","node03"]
http.cors.enabled: true
http.cors.allow-origin: "*"
node.master: true
node.data: false
node.max_local_storage_nodes: 6
node04 config:
cluster.name: my-application
node.name: node04
path.data: /root/soft/elasticsearch/data/node04
path.logs: /root/soft/elasticsearch/log/node04
bootstrap.memory_lock: false
network.host: 172.27.229.9
http.port: 9203
transport.port: 9303
discovery.seed_hosts: ["172.27.229.9:9300", "172.27.229.9:9301", "172.27.229.9:9302"]
cluster.initial_master_nodes: ["node01","node02","node03"]
http.cors.enabled: true
http.cors.allow-origin: "*"
node.master: false
node.data: true
node.max_local_storage_nodes: 6
node05 config:
cluster.name: my-application
node.name: node05
path.data: /root/soft/elasticsearch/data/node05
path.logs: /root/soft/elasticsearch/log/node05
bootstrap.memory_lock: false
network.host: 172.27.229.9
http.port: 9204
transport.port: 9304
discovery.seed_hosts: ["172.27.229.9:9300", "172.27.229.9:9301", "172.27.229.9:9302"]
cluster.initial_master_nodes: ["node01","node02","node03"]
http.cors.enabled: true
http.cors.allow-origin: "*"
node.master: false
node.data: true
node.max_local_storage_nodes: 6
node06 config:
cluster.name: my-application
node.name: node06
path.data: /root/soft/elasticsearch/data/node06
path.logs: /root/soft/elasticsearch/log/node06
bootstrap.memory_lock: false
network.host: 172.27.229.9
http.port: 9205
transport.port: 9305
discovery.seed_hosts: ["172.27.229.9:9300", "172.27.229.9:9301", "172.27.229.9:9302"]
cluster.initial_master_nodes: ["node01","node02","node03"]
http.cors.enabled: true
http.cors.allow-origin: "*"
node.master: false
node.data: false
node.max_local_storage_nodes: 6
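Once all six nodes are up, the cluster can be checked through any node with the _cat APIs (using the HTTP ports configured above):
GET _cat/nodes?v
GET _cat/health?v
In the _cat/nodes output, the node marked with * in the master column is the currently elected master.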