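Chapter 1. ES Overview

1. Concepts

Query: a broad term. Any operation that retrieves something from a data store is a query.

Exact query: matches a value exactly.
Fuzzy query: matches a pattern or part of a value.

Search: a specific kind of query. Searching usually means supplying a keyword and retrieving the information related to that keyword.

A relational database is a poor fit for storing search-engine data.

Reasons: ① A search supplies only a keyword and expects every record that matches it. In a database this requires a fuzzy (LIKE) query, and a pattern with a leading wildcard cannot use an ordinary index, so the query falls back to a full table scan and is slow.

select xxx from xxx where xxx like '%aaa%'  -- leading wildcard: the index exists but the query engine cannot use it
select xxx from xxx where xxx like 'aaa%'   -- prefix match: the index is used and speeds up the query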
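
The difference between the two patterns can be illustrated with a sorted in-memory set standing in for a database index. This is only a sketch: the class LikeIndexSketch, its method names and the sample values are made up for illustration and do not correspond to any database API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class LikeIndexSketch {
    // Prefix match (like 'aaa%'): a sorted index can jump to the first
    // candidate and stop as soon as the prefix no longer matches.
    static List<String> prefixMatch(TreeSet<String> index, String prefix) {
        List<String> out = new ArrayList<>();
        for (String v : index.tailSet(prefix, true)) {
            if (!v.startsWith(prefix)) {
                break;
            }
            out.add(v);
        }
        return out;
    }

    // Substring match (like '%aaa%'): the sort order does not help,
    // so every value has to be inspected.
    static List<String> containsMatch(TreeSet<String> index, String part) {
        List<String> out = new ArrayList<>();
        for (String v : index) {
            if (v.contains(part)) {
                out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> index = new TreeSet<>(Arrays.asList("aaab", "baaa", "aaac", "zzz"));
        System.out.println(prefixMatch(index, "aaa"));
        System.out.println(containsMatch(index, "aaa"));
    }
}
```

The prefix lookup touches only the matching values plus one extra, while the substring lookup visits the whole set.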

② A relational database query cannot tokenize text or suggest related terms, so the results are often not what the user expects.

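2. Related Frameworks

Solr: serves the same purpose as ES. Both are used for search.

Solr is generally used for static search over small to medium data sets (data that rarely changes).
ES can be used for dynamic search over data at PB scale (data that keeps being added and changed).

Performance:
    Solr (the veteran): better than ES for small, static data sets.
    Solr blocks on IO while building the index during writes, which slows it down.
    ES (the newcomer): better than Solr for large, changing data sets.
    ES builds the index during writes without blocking. Search is near real time rather than real time, with a delay on the order of seconds.

Dependencies:
    Solr depends on zk (ZooKeeper) for its cluster mode.
    ES does not depend on any external framework.

Data formats:
    Solr is richer: xml, json.
    ES supports only json.

Scalability: ES is easier to scale out because it is clustered by design.

Lucene: a collection of APIs commonly needed in search scenarios.

It is essentially a library that can be embedded in a project to provide common search APIs and simplify development.
It is a search toolkit.
It is widely regarded in the industry as an excellent search library.

Nutch: a ready-to-use product. It is a web search product built on Lucene, a small-scale Google.

ES: ES embeds Lucene and makes Lucene easier to use. ES is used through a RESTful API.

  CRUD operations on the data are done by sending REST requests to ES, for example directly from a browser.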

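3. Full-Text Search and the Inverted Index

Full-text search:

Original meaning: given a keyword, find the passages in a whole article that match the keyword.
Meaning in application development: given a keyword, find the records in a whole database that match the keyword.

Practical full-text search relies on an inverted index.

Index: a data structure that speeds up queries.

It works like the table of contents of an encyclopedia: the table of contents lets you jump straight to the page you are interested in.

Forward index: the indexes created in mysql and in hbase are forward indexes.

Example: 《唐诗三百首》 (Three Hundred Tang Poems) as the database
 Table of contents (forward index):  poem title ------>  page ------> poem text
 Search for 《静夜思》

Inverted index:

Example: 《唐诗三百首》 (Three Hundred Tang Poems) as the database
Table of contents (inverted index): it does not store the mapping from poem title to page.
word   ------>  the poems the word appears in, and the page of each poem
    明月-------->  《静夜思》 page 200, 《xxx》 page 300
    Search: which poems contain 明月
    Search engines use inverted indexes.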
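
The word-to-poems mapping described above can be sketched in a few lines of Java. This is a minimal illustration, not how Lucene stores its index: the class InvertedIndexSketch is hypothetical, the documents are assumed to be tokenized already, and "poem-x" is a placeholder title.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class InvertedIndexSketch {
    // term -> titles of the documents that contain the term
    private final Map<String, Set<String>> postings = new TreeMap<>();

    // Index one document whose text has already been split into terms.
    void add(String title, List<String> terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(title);
        }
    }

    // Look up a term directly instead of scanning every document.
    Set<String> search(String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add("静夜思", Arrays.asList("床前", "明月", "光"));
        index.add("poem-x", Arrays.asList("明月", "松间"));
        System.out.println(index.search("明月"));
    }
}
```

A search for 明月 is a single map lookup that returns both titles, no matter how many poems the collection holds.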

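4. ES Features

Built-in sharding: when data is written it is split into shards, and the shards are distributed across the nodes of the cluster.

Benefits: horizontal scaling, load balancing, and more parallel IO.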
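
How a document ends up on a particular shard can be sketched as a hash of its routing value (by default the _id) modulo the number of primary shards. The sketch below is an assumption-laden simplification: ShardRoutingSketch is a hypothetical class, and String.hashCode stands in for the hash function that ES actually uses.

```java
public class ShardRoutingSketch {
    // String.hashCode is only a stand-in for the real hash function;
    // floorMod keeps the result non-negative for negative hash values.
    static int shardFor(String routing, int numPrimaryShards) {
        return Math.floorMod(routing.hashCode(), numPrimaryShards);
    }

    public static void main(String[] args) {
        int shards = 5;
        for (int id = 1; id <= 6; id++) {
            System.out.println("_id " + id + " -> shard " + shardFor(String.valueOf(id), shards));
        }
    }
}
```

Because the shard number depends on the number of primary shards, that number cannot simply be changed after the index has been created without moving the data.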

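Built-in clustering: even a single ES instance forms a cluster, which makes scaling out easy. To add a node to the cluster,

install ES on the new machine and start it. The node looks for the ES cluster on the network and joins it automatically.

Built-in indexing: in mysql and other databases, indexes have to be created manually. ES indexes the data automatically when it is inserted.

Documentation:

https://www.elastic.co/guide/en/elasticsearch/reference/6.6/index.html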

5. REST

REST is an architectural style. It encourages uniform url paths that identify resources, and uses the request method to express the operation on the resource. Its practical effect is to simplify and standardize how url paths are written.

Before REST: a url sent from the browser could be written any way the developer liked.

Example: query employee 1

http://hadoop102:8088/gmall/getEmployeeById?id=1
http://hadoop102:8088/gmall/findEmployeeById?id=1
http://hadoop102:8088/gmall/retreveEmployeeById?id=1
http://hadoop102:8088/gmall/queryEmployeeById?id=1
http://hadoop102:8088/gmall/tongguoidchaxunyuangong?id=1

Convention: /resource/id

Different request methods express the intended operation on the resource.

REST : /Employee/1

GET means query
POST means create
PUT means update
DELETE means delete
HEAD checks whether the resource exists

http://hadoop102:8088/gmall/Emp/1 GET

When a framework is said to follow the RESTful approach, it means the framework supports REST-style API operations.
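
The convention above (one path per resource, the request method carries the intent) can be written down as a small lookup. RestIntentSketch and its method names are hypothetical and exist only to illustrate the mapping listed in this section.

```java
public class RestIntentSketch {
    // Builds the uniform resource path /resource/id.
    static String path(String resource, String id) {
        return "/" + resource + "/" + id;
    }

    // Maps the request method to the operation it expresses.
    static String intent(String method) {
        switch (method.toUpperCase()) {
            case "GET":
                return "query";
            case "POST":
                return "create";
            case "PUT":
                return "update";
            case "DELETE":
                return "delete";
            case "HEAD":
                return "exists";
            default:
                throw new IllegalArgumentException("unsupported method: " + method);
        }
    }

    public static void main(String[] args) {
        String p = path("Employee", "1");
        for (String m : new String[] {"GET", "POST", "PUT", "DELETE", "HEAD"}) {
            System.out.println(m + " " + p + " -> " + intent(m));
        }
    }
}
```

The path stays the same for all five operations; only the method changes.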

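6. B-tree

B-tree: a self-balancing multi-way search tree.

B+tree: an improvement on the B-tree. It is the index structure used by mysql (InnoDB).

LSM tree: the storage structure used by hbase.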

Chapter 2. ES Installation

1. Download the Installation Package

Official site: https://www.elastic.co/cn/downloads/elasticsearch

These notes are based on version 6.6.0.

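2. Upload the Package to Linux and Extract It

I. Installation

# 1. Extract elasticsearch-6.6.0.tar.gz into /opt/module
tar -zxvf elasticsearch-6.6.0.tar.gz -C /opt/module/
# 2. Create the data directory under /opt/module/elasticsearch-6.6.0
mkdir /opt/module/elasticsearch-6.6.0/data

# 3. Edit the configuration file (config/elasticsearch.yml)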
#-----------------------Cluster-----------------------
cluster.name: my-application
#-----------------------Node-----------------------
node.name: node-102
#-----------------------Paths-----------------------
path.data: /opt/module/elasticsearch-6.6.0/data
path.logs: /opt/module/elasticsearch-6.6.0/logs
#-----------------------Memory-----------------------
bootstrap.memory_lock: false
bootstrap.system_call_filter: false
#-----------------------Network-----------------------
network.host: hadoop102 
#-----------------------Discovery-----------------------
discovery.zen.ping.unicast.hosts: ["hadoop102","hadoop103","hadoop104"]

# 4. Distribute /opt/module/elasticsearch-6.6.0 to the other nodes
xsync /opt/module/elasticsearch-6.6.0
# 5. On hadoop103 and hadoop104, edit the configuration file (change node.name and network.host)

II. Configure the Linux Environment

Reference: http://blog.csdn.net/satiling/article/details/59697916

# 1. As root, edit /etc/security/limits.conf and add lines like the following (do not omit the *)
* soft nofile 65536
* hard nofile 131072
* soft nproc 2048
* hard nproc 4096
# 2. As root, edit /etc/sysctl.conf
# and add the following setting
vm.max_map_count=655360
# then run
sysctl -p
# 3. Distribute the modified files to the other nodes
xsync /etc/security/limits.conf
xsync /etc/sysctl.conf
# 4. Reboot Linux

III. Start elasticsearch

[atguigu@hadoop102 elasticsearch]$ bin/elasticsearch

Open hadoop102:9200 in a browser.

Cluster start/stop script

[atguigu@hadoop102 bin]$ vi es.sh
#!/bin/bash
es_home=/opt/module/elasticsearch-6.6.0
case $1 in
"start") {
  for i in hadoop102 hadoop103 hadoop104
  do
    echo "==============$i=============="
    ssh $i "source /etc/profile;${es_home}/bin/elasticsearch >/dev/null 2>&1 &"
    sleep 4s;
  done
};;
"stop") {
  for i in hadoop102 hadoop103 hadoop104
  do
    echo "==============$i=============="
    ssh $i "ps -ef|grep $es_home |grep -v grep|awk '{print \$2}'|xargs kill" >/dev/null 2>&1
  done
};;
esac

3. Kibana

I. Installation

# 1. Extract kibana-6.6.0-linux-x86_64.tar.gz into /opt/module
tar -zxvf kibana-6.6.0-linux-x86_64.tar.gz -C /opt/module/
mv /opt/module/kibana-6.6.0-linux-x86_64 /opt/module/kibana
# 2. Edit the configuration file
vim config/kibana.yml
server.port: 5601
server.host: "hadoop102"
elasticsearch.hosts: ["http://hadoop102:9200"]

II. Start kibana (start elasticsearch first)

[atguigu@hadoop102 kibana]$ bin/kibana

Open hadoop102:5601 in a browser.

III. Update the earlier ES start script

#!/bin/bash
es_home=/opt/module/elasticsearch-6.6.0
kibana_home=/opt/module/kibana
case $1 in
"start") {
  for i in hadoop102 hadoop103 hadoop104
  do
    echo "==============$i=============="
    ssh $i "source /etc/profile;${es_home}/bin/elasticsearch >/dev/null 2>&1 &"
    sleep 4s;
  done
  sleep 2s;
  nohup ${kibana_home}/bin/kibana > kibana.log 2>&1 &
};;
"stop") {
  ps -ef | grep ${kibana_home} | grep -v grep | awk '{print $2}'| xargs kill
  for i in hadoop102 hadoop103 hadoop104
  do
    echo "==============$i=============="
    ssh $i "ps -ef|grep $es_home |grep -v grep|awk '{print \$2}'|xargs kill" >/dev/null 2>&1
  done
};;
esac

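Chapter 3. ES Operations

1. Administrative Commands

GET /_cat
# Names that start with _ are built-in system keywords
# Show node status
GET /_cat/nodes?v
# Show cluster health
GET /_cat/health
# List all indices
GET /_cat/indices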

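2. Index Operations

# An index is comparable to a database
# Query an index
# List all indices
GET /_cat/indices
# Show summary information for one index
GET /_cat/indices/.kibana_1
# Show the metadata of one index
GET /stu1
# Show the mapping ("table structure") of one index
GET /.kibana_1/_mapping
# Create an index
# Manual creation: specify the mapping when creating the index
# In 6.x an index can contain only one type; the type name is arbitrary
PUT stu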
{
  "mappings": {
    "table1":{
      "properties":{
        "id":{
          "type":"keyword"
        },
        "name":{
          "type":"text"
        },
        "sex":{
          "type":"integer"
        },
        "birth":{
          "type":"date"
        }
      }
    }
  }
}
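# Automatic creation: insert a document into an index that does not exist yet. ES infers the mapping from the data types and creates the index and its mapping automatically
# POST  /indexname/typename/id
POST /stu1/table1/1
{
  "id":"1001",
  "name":"jack"
}
# Delete an index
DELETE /stu1
# Modify an index: this requires a migration, reading the data from the old index and writing it into a new index
# Check whether an index exists: 404 - Not Found means it does not exist, 200 means it exists
HEAD /stu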

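3. Type Operations


# In practice a type is equivalent to its index
# From 7.0 on the type concept is being removed; in 6.x an index may contain only one type, so index and type are effectively equivalent
# Querying a type is the same as querying the index
# Deleting a type means deleting the index
# Creating a type means creating the index
# A HEAD request on a type returns 405 - Method Not Allowed; check the index instead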

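4. Document Operations


# Read
# Query all documents (full "table" scan)
GET /stu/table1/_search
# Get a single document: GET /indexname/typename/id
# _id is the unique identifier (the id field in the body is ordinary data)
GET /stu/table1/1
# Create
# POST  /indexname/typename/id
POST /stu/table1/2
{
  "id":"tom",
  "name":"tom"
}
# POST with an id can also update: if the id does not exist the document is inserted, otherwise it is updated. The update replaces the whole document
POST /stu/table1/2
{
  "id":"1003"
}
# POST without an id creates the document with a generated id
POST /stu/table1/
{
  "id":"tom",
  "name":"tom"
}
# Partial update
# 400: the parameters sent by the client do not meet the requirements
# 404: the url path sent by the client does not match anything
# 405: the request method is not allowed for the url
POST /stu/table1/rx4wNHwBb4g3p3m-lruA/_update
{
  "doc": {
    "id":"1003"
  }
}
# Update with PUT
# Create: PUT requires an explicit id
PUT /stu/table1/3
{
  "id":"1003",
  "name":"marry"
}
# 405: /stu/table1/ accepts POST only, not PUT
PUT /stu/table1/
{
  "id":"1003",
  "name":"marry"
}
# If the id exists the document is updated, otherwise it is inserted; this is also a full replacement
PUT /stu/table1/3
{
  "name":"jack"
}
# PUT cannot do a partial update
PUT /stu/table1/rx4wNHwBb4g3p3m-lruA/_update
{
  "doc": {
    "id":"1004"
  }
}
# Status codes that start with 4 (4xx) are client errors
# 405: wrong request method, for example the endpoint accepts POST only and PUT was sent
# 400: malformed request parameters; the parameters do not follow the required format
# Delete
DELETE /stu/table1/rx4wNHwBb4g3p3m-lruA
# Check whether a document exists
HEAD /stu/table1/rx4wNHwBb4g3p3m-lruA
HEAD /stu/table1/1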
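
The difference between a full replacement and a partial update can be sketched with plain maps. This is an in-memory illustration only: UpdateSemanticsSketch is a hypothetical class, the store map stands in for an index, and the merge is shallow (nested objects are not handled).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UpdateSemanticsSketch {
    // Full replacement, like POST or PUT on /index/type/id:
    // the previous _source is discarded.
    static void index(Map<String, Map<String, Object>> store, String id, Map<String, Object> source) {
        store.put(id, new LinkedHashMap<>(source));
    }

    // Partial update, like POST /index/type/id/_update with {"doc": {...}}:
    // the given fields are merged into the existing _source.
    static void update(Map<String, Map<String, Object>> store, String id, Map<String, Object> doc) {
        Map<String, Object> existing = store.get(id);
        if (existing == null) {
            throw new IllegalStateException("document missing: " + id);
        }
        existing.putAll(doc);
    }

    public static void main(String[] args) {
        Map<String, Map<String, Object>> store = new LinkedHashMap<>();
        Map<String, Object> tom = new LinkedHashMap<>();
        tom.put("id", "tom");
        tom.put("name", "tom");
        index(store, "2", tom);

        Map<String, Object> patch = new LinkedHashMap<>();
        patch.put("id", "1003");

        update(store, "2", patch);
        System.out.println(store.get("2")); // name is kept, id is changed

        index(store, "2", patch);
        System.out.println(store.get("2")); // name is gone, only id remains
    }
}
```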

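5. Text Analysis (Tokenization)


# text fields are analyzed (tokenized); keyword fields are not
# The default analyzer targets English-like text: it splits on word boundaries such as spaces, lowercases the tokens and drops punctuation
GET /_analyze
{
  "text": "I am a teacher!"
}
# The keyword analyzer does not tokenize: the whole input comes back as a single token
GET /_analyze
{
  "analyzer": "keyword",
  "text": "I am a teacher!"
}
# With the default analyzer, Chinese text is split into single characters
GET /_analyze
{
  "text": "国庆节快乐"
}
# ik_smart: coarse-grained segmentation. The tokens do not overlap, so the total length of the output tokens equals the length of the segmented input
GET /_analyze
{
  "analyzer": "ik_smart", 
  "text": "国庆节快乐"
}
# ik_max_word: finest-grained segmentation. Tokens may overlap, so input length <= total output length
GET /_analyze
{
  "analyzer": "ik_max_word", 
  "text": "国庆节快乐"
}
# An analyzer only splits words. It does no NLP (natural language processing) and has no understanding of what the text means
GET /_analyze
{
  "analyzer": "ik_max_word", 
  "text": "爱好抽烟喝酒烫头洗屁股眼子"
}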
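
The behavior of the default analyzer on the two sample inputs can be approximated in plain Java. This is a rough imitation under stated assumptions, not the Lucene implementation: StandardAnalyzerSketch is a hypothetical class that lowercases letters and digits into words, treats everything else as a separator, and emits each Han character as its own token.

```java
import java.util.ArrayList;
import java.util.List;

public class StandardAnalyzerSketch {
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                flush(word, tokens);
                tokens.add(new String(Character.toChars(cp))); // one token per Han character
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(Character.toLowerCase(cp));
            } else {
                flush(word, tokens); // whitespace or punctuation ends the current word
            }
        }
        flush(word, tokens);
        return tokens;
    }

    // Emits the word collected so far, if any, and resets the buffer.
    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(analyze("I am a teacher!"));
        System.out.println(analyze("国庆节快乐"));
    }
}
```

Splitting Chinese into single characters is why a dictionary-based analyzer such as ik is needed for useful Chinese search.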

6. Sub-fields

In java:
public class Person{
    public String name;
    public Address address;
}
public class Address{
    public String provinceName;
}
provinceName is called a cascaded (nested) property of Person, or a sub-property (a property of a property).
In json:
person:
{
  "age": 20,
  "address": {
    "provinceName": "广东"
  }
}

Note:

 "name" : {
            "type" : "text",
            "fields" : {
              "aaa" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
     If a text field will need to be aggregated later, give it a sub-field of type keyword and aggregate on that sub-field.

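7. Bulk Import Syntax

# Importing data:
# _bulk means batch write
# Format:  {"action": {metadata}}\n{data}\n   (one JSON object per line; delete has no data line)
# action: index (upsert: update if the id exists, insert otherwise), create, update, delete
# metadata specifies which index, which type and which id the action writes to
# _index: the index   _type: the type   _id: the id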
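
The line-oriented layout of a bulk body can be sketched by assembling it as a string. BulkBodySketch and its helper names are hypothetical; the sketch does no JSON escaping and only shows that every action line and every data line ends with a newline.

```java
import java.util.Arrays;
import java.util.List;

public class BulkBodySketch {
    // Action line for "index": insert, or replace if the id exists.
    static String indexAction(String id) {
        return "{\"index\":{\"_id\":\"" + id + "\"}}";
    }

    // Action line for "delete": it is not followed by a data line.
    static String deleteAction(String id) {
        return "{\"delete\":{\"_id\":\"" + id + "\"}}";
    }

    // Joins the lines; each line, including the last, ends with \n.
    static String body(List<String> lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String body = body(Arrays.asList(
                indexAction("1"),
                "{\"name\":\"jack\"}",
                deleteAction("2")));
        System.out.print(body);
    }
}
```

Since the index and type are given in the url in the examples of these notes (POST /test/emps/_bulk), the action lines here carry only the _id.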

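8. Common DSL Keywords

Keyword	Meaning	SQL analogy
query	query	select
bool	combines several conditions	select xxx from xxx where age=20 and gender='male'
filter	a filter condition	where
term	exact match	=
match	full-text search; the input is analyzed
must	used inside bool; the condition must match
fuzzy	fuzzy match on similar terms	dick also finds nick, pick
from	position of the first hit to return; positions start at 0
size	number of hits to return	limit
_source	select only some fields	select <fields>
match_phrase	phrase match: the input is matched as one phrase, so its terms must appear together and in order
multi_match	match the same input against several fields at once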
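Chapter 4. Aggregations

1. Structure

aggregations|aggs
"aggregations" : 
{
    -- aggregation_name: a name chosen for this aggregation
    "<aggregation_name>" : 
    {
        -- the aggregation type, comparable to sum, avg, count (terms), min, max
        "<aggregation_type>" :
        {
            -- the aggregation body, e.g. which field to aggregate on
            <aggregation_body>
        }
        -- optional metadata attached to the aggregation; the index to aggregate over is given in the url
        [,"meta" : {  [<meta_data_body>] } ]?
        -- sub-aggregations, computed on top of the current aggregation
        [,"aggregations" : { [<sub_aggregation>]+ } ]?
    }
    -- further aggregations at the same level
    [,"<aggregation_name_2>" : { ... } ]*
}
A terms aggregation is comparable to group by + count
count of rows where gender = 'male'  ========  sum(if(gender = 'male',1,0))
select
    a,max(sum_num) -- sub-aggregation
from
    (select
        a,b,sum(num) sum_num,max(num) max_num
    from tablea
    where xxx
    group by a,b) tmp
group by a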

2. Aggregation Error


 "type": "illegal_argument_exception",
  "reason": "Fielddata is disabled on text fields by default. Set fielddata=true on [gender] in order to load fielddata in memory by uninverting the inverted index. 
  Note that this can however use significant memory. Alternatively use a keyword field instead."
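A text field is analyzed into tokens, so by default it cannot be aggregated.
 Fix: aggregate on a keyword field instead.
a_column(text)
中国人  ------> 中国, 国人, 中国人

3. Aggregation Exercises

- See the exercises in Chapter 5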

Chapter 5. Exercises

# Import the test data
# Create the index ("table")
PUT /test
{
    "mappings" : {
      "emps" : {
        "properties" : {
          "empid" : {
            "type" : "long"
          },
          "age" : {
            "type" : "long"
          },
          "balance" : {
            "type" : "double"
          },
          "name" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
           "gender" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "hobby" : {
            "type" : "text",
            "analyzer":"ik_max_word",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }
# Load the data
POST /test/emps/_bulk
{"index":{"_id":"1"}}
{"empid":1001,"age":20,"balance":2000,"name":"李三","gender":"男","hobby":"吃饭睡觉"}
{"index":{"_id":"2"}}
{"empid":1002,"age":30,"balance":2600,"name":"李小三","gender":"男","hobby":"吃粑粑睡觉"}
{"index":{"_id":"3"}}
{"empid":1003,"age":35,"balance":2900,"name":"张伟","gender":"女","hobby":"吃,睡觉"}
{"index":{"_id":"4"}}
{"empid":1004,"age":40,"balance":2600,"name":"张伟大","gender":"男","hobby":"打篮球睡觉"}
{"index":{"_id":"5"}}
{"empid":1005,"age":23,"balance":2900,"name":"大张伟","gender":"女","hobby":"打乒乓球睡觉"}
{"index":{"_id":"6"}}
{"empid":1006,"age":26,"balance":2700,"name":"张大喂","gender":"男","hobby":"打排球睡觉"}
{"index":{"_id":"7"}}
{"empid":1007,"age":29,"balance":3000,"name":"王五","gender":"女","hobby":"打牌睡觉"}
{"index":{"_id":"8"}}
{"empid":1008,"age":28,"balance":3000,"name":"王武","gender":"男","hobby":"打桥牌"}
{"index":{"_id":"9"}}
{"empid":1009,"age":32,"balance":32000,"name":"王小五","gender":"男","hobby":"喝酒,吃烧烤"}
{"index":{"_id":"10"}}
{"empid":1010,"age":37,"balance":3600,"name":"赵六","gender":"男","hobby":"吃饭喝酒"}
{"index":{"_id":"11"}}
{"empid":1011,"age":39,"balance":3500,"name":"张小燕","gender":"女","hobby":"逛街,购物,买"}
{"index":{"_id":"12"}}
{"empid":1012,"age":42,"balance":3400,"name":"李三","gender":"男","hobby":"逛酒吧,购物"}
{"index":{"_id":"13"}}
{"empid":1013,"age":42,"balance":3400,"name":"李球","gender":"男","hobby":"体育场,购物"}
{"index":{"_id":"14"}}
{"empid":1014,"age":22,"balance":3400,"name":"李健身","gender":"男","hobby":"体育场,购物"}
{"index":{"_id":"15"}}
{"empid":1015,"age":22,"balance":3400,"name":"Nick","gender":"男","hobby":"坐飞机,购物"}

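# 0. Two ways to query
# ① RESTful query: the parameters are appended to the url
# ② The DSL (domain-specific language) defined by ES: the parameters are written in the request body following the DSL syntax
# 1. Query all documents, sorted by age in descending order
# ① RESTful: you need to know what each parameter does in ES; q is the query, sort is the sort order
GET /test/emps/_search?q=*&sort=age:desc
# ② DSL: follows the DSL syntax rules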
GET /test/emps/_search
{
  "query": {
    "match_all": {
    }
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ]
}
# 2. Query all documents, sort by age descending and then by balance descending, and return only empid, age and balance of the first 5 hits
GET /test/emps/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    },
    {
      "balance": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 5,
  "_source": ["empid","age","balance"]
}
# 3. match with analysis: search for employees whose hobby matches 吃饭睡觉
GET /_analyze
{
  "analyzer": "ik_max_word",
  "text": "吃饭睡觉"
}
GET /test/emps/_search
{
  "query": {
    "match": {
      "hobby": "吃饭睡觉"
    }
  }
}
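# 4. match/term without analysis: search for employees whose balance is 2000
# Only text fields are analyzed; balance is a double and cannot be tokenized
# ES does not recommend match for fields that cannot be tokenized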
GET /test/emps/_search
{
  "query": {
    "match": {
      "balance": 2000
    }
  }
}
#  term without analysis: search for employees whose balance is 2000
GET /test/emps/_search
{
  "query": {
    "term": {
      "balance": 2000
    }
  }
}
# 
# 5. match without analysis: search for employees whose hobby is exactly 吃饭睡觉
# A keyword field is not tokenized, so querying the keyword sub-field of hobby is enough
GET /test/emps/_search
{
  "query": {
    "match": {
      "hobby.keyword": "吃饭睡觉"
    }
  }
}
# 6. Phrase match: search for employees whose hobby contains the phrase 吃饭睡觉
GET /test/emps/_search
{
  "query": {
    "match_phrase": {
      "hobby": "吃饭睡觉"
    }
  }
}
# 7. Multi-field match: search for employees whose name or hobby contains 球
GET /test/emps/_search
{
  "query": {
    "multi_match": {
      "query": "球",
      "fields": ["name","hobby"]
    }
  }
}
# 8. Multiple conditions: search for male employees who like shopping (购物)
GET /test/emps/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "hobby": "购物"
          }
        },
        {
          "term": {
            "gender": {
              "value": "男"
            }
          }
        }
      ]
    }
  }
}
# 9. Multiple conditions: search for male employees who like shopping and do not go to bars (酒吧)
GET /test/emps/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "hobby": "购物"
          }
        },
        {
          "term": {
            "gender": {
              "value": "男"
            }
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "hobby": "酒吧"
          }
        }
      ]
    }
  }
}
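# 10. Multiple conditions: search for male employees who like shopping and do not go to bars, preferably aged between 20 and 30
# should does not exclude anything; documents that match it get a higher score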
GET /test/emps/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "hobby": "购物"
          }
        },
        {
          "term": {
            "gender": {
              "value": "男"
            }
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "hobby": "酒吧"
          }
        }
      ],
      "should": [
        {
          "range": {
            "age": {
              "gt": 20,
              "lt": 30
            }
          }
        }
      ]
    }
  }
}
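# 11. Multiple conditions: search for male employees who like shopping and do not go to bars, preferably aged between 20 and 30, and exclude anyone over 40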
GET /test/emps/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "hobby": "购物"
          }
        },
        {
          "term": {
            "gender": {
              "value": "男"
            }
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "hobby": "酒吧"
          }
        },
        {
          "range": {
            "age": {
              "gt": 40
            }
          }
        }
      ],
      "should": [
        {
          "range": {
            "age": {
              "gt": 20,
              "lt": 30
            }
          }
        }
      ]
    }
  }
}
# The same query with the age limit expressed as a filter instead of must_not
GET /test/emps/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "hobby": "购物"
          }
        },
        {
          "term": {
            "gender": {
              "value": "男"
            }
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "hobby": "酒吧"
          }
        }
      ],
      "should": [
        {
          "range": {
            "age": {
              "gt": 20,
              "lt": 30
            }
          }
        }
      ],
      "filter": {
        "range": {
          "age": {
            "lte": 40
          }
        }
      }
    }
  }
}
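
How the clauses of a bool query interact can be sketched with predicates over the shopping-related rows of the test data. This is a deliberate simplification and not how ES scores documents: BoolQuerySketch is a hypothetical class, String.contains stands in for the analyzed match on hobby, and only the should clause influences the order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

public class BoolQuerySketch {
    static final class Emp {
        final String name;
        final String gender;
        final int age;
        final String hobby;

        Emp(String name, String gender, int age, String hobby) {
            this.name = name;
            this.gender = gender;
            this.age = age;
            this.hobby = hobby;
        }
    }

    // must, must_not and filter decide whether a document is a hit;
    // should only moves hits that satisfy it towards the top.
    static List<String> search(List<Emp> emps, Predicate<Emp> filter) {
        Predicate<Emp> must = e -> e.hobby.contains("购物") && e.gender.equals("男");
        Predicate<Emp> mustNot = e -> e.hobby.contains("酒吧");
        Predicate<Emp> should = e -> e.age > 20 && e.age < 30;

        List<Emp> hits = new ArrayList<>();
        for (Emp e : emps) {
            if (must.test(e) && !mustNot.test(e) && filter.test(e)) {
                hits.add(e);
            }
        }
        // Stable sort: hits matching the should clause come first.
        hits.sort(Comparator.comparingInt((Emp e) -> should.test(e) ? 0 : 1));

        List<String> names = new ArrayList<>();
        for (Emp e : hits) {
            names.add(e.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Emp> emps = new ArrayList<>();
        emps.add(new Emp("张小燕", "女", 39, "逛街,购物,买"));
        emps.add(new Emp("李三", "男", 42, "逛酒吧,购物"));
        emps.add(new Emp("李球", "男", 42, "体育场,购物"));
        emps.add(new Emp("李健身", "男", 22, "体育场,购物"));
        emps.add(new Emp("Nick", "男", 22, "坐飞机,购物"));
        System.out.println(search(emps, e -> true));        // query 10: no age limit
        System.out.println(search(emps, e -> e.age <= 40)); // variant with the age filter
    }
}
```

Without the age limit, 李球 (42) is still a hit but is listed after the two employees in their twenties; with the filter he is dropped entirely.
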
# 12. Fuzzy match on a field: find Nick by searching for the similar term Dick
GET /test/emps/_search
{
  "query": {
    "fuzzy": {
      "name": "Dick"
    }
  }
}
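
Fuzzy matching is based on edit distance: a term matches when it can be turned into the query term with only a few single-character edits. The sketch below computes the plain Levenshtein distance; EditDistanceSketch is a hypothetical class, and it assumes the indexed term for the name Nick is the lowercased nick, since name is an analyzed text field.

```java
public class EditDistanceSketch {
    // Levenshtein distance: the minimum number of single-character
    // insertions, deletions and substitutions that turn a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("Dick", "nick"));
        System.out.println(distance("Dick", "jack"));
    }
}
```

Dick and nick differ in a single character, which is why the fuzzy query above can find Nick.
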
# 13. Single aggregation: count the male and female employees
# To get all aggregation results, size must be >= the number of groups
GET /test/emps/_search
{
  "aggs": {
    "gendercount": {
      "terms": {
        "field": "gender.keyword",
        "size": 2
      }
    }
  }
}
# 14. Query first, then aggregate: count the male and female employees who like shopping
GET /test/emps/_search
{
  "query": {
    "match": {
      "hobby": "购物"
    }
  }, 
  "aggs": {
    "gendercount": {
      "terms": {
        "field": "gender.keyword",
        "size": 2
      }
    }
  }
}
# 15. Multiple aggregations: count the male and female employees who like shopping, plus the average age of all of them
GET /test/emps/_search
{
  "query": {
    "match": {
      "hobby": "购物"
    }
  }, 
  "aggs": {
    "gendercount": {
      "terms": {
        "field": "gender.keyword",
        "size": 2
      }
    },
    "avgage":{
      "avg": {
        "field": "age"
      }
    }
  }
}
# 16. Multiple and nested aggregations: count the male and female employees who like shopping, plus the average age per gender
GET /test/emps/_search
{
  "query": {
    "match": {
      "hobby": "购物"
    }
  },
  "aggs": {
    "gendercount": {
      "terms": {
        "field": "gender.keyword",
        "size": 2
      },
      "aggs": {
        "avgage": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}
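
The terms aggregation with a nested avg in exercise 16 corresponds to a group-by with a count and an average per group. The sketch below reproduces that on the five test rows whose hobby contains 购物, using Java streams. TermsAvgSketch is a hypothetical class and runs entirely in memory; it does not talk to ES.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class TermsAvgSketch {
    static final class Emp {
        final String gender;
        final int age;

        Emp(String gender, int age) {
            this.gender = gender;
            this.age = age;
        }
    }

    // Like the terms aggregation on gender.keyword: one bucket per value with its doc count.
    static Map<String, Long> countByGender(List<Emp> emps) {
        return emps.stream().collect(
                Collectors.groupingBy((Emp e) -> e.gender, TreeMap::new, Collectors.counting()));
    }

    // Like the nested avg aggregation: the average age inside each bucket.
    static Map<String, Double> avgAgeByGender(List<Emp> emps) {
        return emps.stream().collect(
                Collectors.groupingBy((Emp e) -> e.gender, TreeMap::new,
                        Collectors.averagingInt((Emp e) -> e.age)));
    }

    public static void main(String[] args) {
        // gender and age of the employees with ids 11 to 15
        List<Emp> shoppers = Arrays.asList(
                new Emp("女", 39), new Emp("男", 42), new Emp("男", 42),
                new Emp("男", 22), new Emp("男", 22));
        System.out.println(countByGender(shoppers));
        System.out.println(avgAgeByGender(shoppers));
    }
}
```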

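Chapter 6. Aliases

1. Relationship Between Aliases and Indices

Aliases and indices have an N-to-N relationship.

One alias can point to N indices.

One index can have several aliases.

Main use case for aliases:

hive has partitioned tables, commonly partitioned by the date of the data. For example, table ods_a is partitioned by dt:
    / ods_a /  dt= 2021-07-07
    / ods_a /  dt= 2021-07-08
To query one day of data, filter on the partition column:
    where dt=  2021-07-07
For a full-table query, omit the where filter.

How can the effect of a partitioned table be achieved in ES?

To get the partitioning effect:
    write the data produced each day into its own index
    2021-07-07  ----------> ods_a_2021-07-07_index
    2021-07-08  ----------> ods_a_2021-07-08_index
    To query one day of data, query only the corresponding index:
    2021-07-07 ------>  GET ods_a_2021-07-07_index/_search
    To query all data of this month:
    when this month's indices are created, give them the alias  2021-07_index
    query through the alias:  GET  2021-07_index/_search
    To query the data of all days:
    when each index is created, give it the alias  ods_a_index
    query through the alias:  GET  ods_a_index/_search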
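
The naming scheme above can be generated from a date. The sketch below only builds the index and alias names used in this section; DailyIndexNamingSketch and its method names are hypothetical, and creating the index with these aliases is left to the requests shown in the exercises.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.List;

public class DailyIndexNamingSketch {
    // One index per day, e.g. ods_a_2021-07-07_index.
    // LocalDate.toString() already uses the yyyy-MM-dd form.
    static String dailyIndex(String table, LocalDate day) {
        return table + "_" + day + "_index";
    }

    // Alias shared by all indices of the same month, e.g. 2021-07_index.
    static String monthAlias(LocalDate day) {
        return day.format(DateTimeFormatter.ofPattern("yyyy-MM")) + "_index";
    }

    // Aliases to attach when the daily index is created:
    // the month alias and the alias covering every day.
    static List<String> aliasesFor(String table, LocalDate day) {
        return Arrays.asList(monthAlias(day), table + "_index");
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2021, 7, 7);
        System.out.println(dailyIndex("ods_a", day));
        System.out.println(aliasesFor("ods_a", day));
    }
}
```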

2. Alias Exercises

# Querying aliases
# List all aliases
GET /_cat/aliases?v
# Show the aliases of one index
GET /movie_index/_alias
# Adding aliases
# Specify the aliases when the index is created
PUT movie_index
{  
  "aliases": {
    "movie1": {},
     "movie2": {}
  },
  "mappings": {
    "movie_type":{
      "properties": {
        "id":{
          "type": "long"
        },
        "name":{
          "type": "text",
          "analyzer": "ik_smart"
        }
      }
    }
  }
}
# Add an alias to an index that already exists
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "movie_index",
        "alias": "movie3"
      }
    }
  ]
}
# Use an alias with a filter to refer to a subset of an index
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "test",
        "alias": "man",
        "filter": {
          "term": {
            "gender": "男"
          }
        }
      }
    }
  ]
}
GET /man/_search
# Remove the alias movie3 from movie_index and add movie3 to test
POST _aliases
{
  "actions": [
    {
      "remove": {
        "index": "movie_index",
        "alias": "movie3"
      }
    },
    {
      "add": {
        "index": "test",
        "alias": "movie3"
      }
    }
  ]
}

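Chapter 7. Templates

1. Template Exercises

# Viewing templates
# List all defined templates
GET /_cat/templates
# Creating a template
# index_patterns: when the name of an index being created matches the template's index_patterns, the template is applied to create the index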
PUT /_template/template_movie2020
{
  "index_patterns": ["movie_test*"],
  "aliases" : { 
    "{index}-query": {},
    "movie_test-query":{}
  },
  "mappings": { 
    "_doc": {
      "properties": {
        "id": {
          "type": "keyword"
        },
        "movie_name": {
          "type": "text",
          "analyzer": "ik_smart"
        }
      }
    }
  }
}
GET /test
#Rejecting mapping update to [movie_index] as the final mapping would have more than 1 type: [movie_type, t1]
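# movie2 is an alias that points to movie_index
# so the request below is equivalent to PUT /movie_index/t1/1
# The only type of movie_index is movie_type; specifying t1 as well causes the conflict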
PUT /movie2/t1/1
{
  "name":"jack"
}
GET /_cat/aliases
GET /movie_index
PUT /hahah/t1/1
{
  "name":"jack"
}
GET /movie_test2
PUT /movie_test2/_doc/1
{
  "name":"jack"
}
HEAD /_template/template_movie2020
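
Whether a new index picks up the template depends on its name matching index_patterns. The wildcard check can be sketched as below, assuming * matches any run of characters. IndexPatternSketch is a hypothetical helper, not ES code.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class IndexPatternSketch {
    // Converts a pattern such as movie_test* into a regular expression:
    // literal parts are quoted and every * becomes ".*".
    static boolean matches(String pattern, String indexName) {
        String regex = Arrays.stream(pattern.split("\\*", -1))
                .map(Pattern::quote)
                .collect(Collectors.joining(".*"));
        return indexName.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matches("movie_test*", "movie_test2"));
        System.out.println(matches("movie_test*", "hahah"));
    }
}
```

This mirrors the exercise above: movie_test2 falls under the template and gets its mapping and aliases, while hahah does not.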

Chapter 8. Java API Operations

1. Preparation

Create a maven project and add the dependencies

    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
        <version>4.5.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpmime</artifactId>
        <version>4.3.6</version>
    </dependency>
    <dependency>
        <groupId>io.searchbox</groupId>
        <artifactId>jest</artifactId>
        <version>5.3.3</version>
    </dependency>
    <dependency>
        <groupId>net.java.dev.jna</groupId>
        <artifactId>jna</artifactId>
        <version>4.5.2</version>
    </dependency>
    <dependency>
        <groupId>org.codehaus.janino</groupId>
        <artifactId>commons-compiler</artifactId>
        <version>2.7.8</version>
    </dependency>
    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch</artifactId>
        <version>6.6.0</version>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.18.12</version>
        <scope>provided</scope>
    </dependency>

JavaBean (Emp.java)

package com.atgugu.esdemo.pojo;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
@NoArgsConstructor
@AllArgsConstructor
@Data
public class Emp {
    private String empid;
    private Integer age;
    private Double balance;
    private String name;
    private String gender;
    private String hobby;
}

2. Reading Data

package com.atgugu.esdemo;
import com.atgugu.esdemo.pojo.Emp;
import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.Search;
import io.searchbox.core.SearchResult;
import java.io.IOException;
import java.util.List;
/**
 * General steps
 *    1. Create a client
 *    2. Connect to the server
 *    3. Prepare the command
 *    4. Send the command
 *    5. For a query, receive the result returned by the server
 *    -------------------------------------
 *    The Jest client makes heavy use of two patterns
 *    Factory pattern: new XxxFactory().getXxx()
 *    Builder pattern: new XxxBuilder().build()
 *    The builder pattern relies heavily on method chaining:
 *    a.b() returns a, so calls can be chained
 *    -------------------------------------
 */
public class ReadDemo01 {
    public static void main(String[] args) throws IOException {
        // Create the factory
        JestClientFactory jestClientFactory = new JestClientFactory();
        // Set the address of the cluster to connect to
        HttpClientConfig httpClientConfig = (new HttpClientConfig.Builder("http://hadoop102:9200")).build();
        jestClientFactory.setHttpClientConfig(httpClientConfig);
        // Get a client
        JestClient jestClient = jestClientFactory.getObject();
        String queryString = "{\n" +
                "  \"query\": {\n" +
                "    \"match\": {\n" +
                "      \"hobby\": \"购物\"\n" +
                "    }\n" +
                "  },\n" +
                "  \"aggs\": {\n" +
                "    \"gendercount\": {\n" +
                "      \"terms\": {\n" +
                "        \"field\": \"gender.keyword\",\n" +
                "        \"size\": 2\n" +
                "      },\n" +
                "      \"aggs\": {\n" +
                "        \"avgage\": {\n" +
                "          \"avg\": {\n" +
                "            \"field\": \"age\"\n" +
                "          }\n" +
                "        }\n" +
                "      }\n" +
                "    }\n" +
                "  }\n" +
                "}";
        // Equivalent to GET /test/emps/_search
        Search search = new Search.Builder(queryString)
                .addIndex("test")
                .addType("emps")
                .build();
        SearchResult searchResult = jestClient.execute(search);
        // Iterate over the returned hits
        System.out.println("total:"+ searchResult.getTotal());
        System.out.println("max_score:"+ searchResult.getMaxScore());
        List<SearchResult.Hit<Emp, Void>> hits = searchResult.getHits(Emp.class);
        for (SearchResult.Hit<Emp, Void> hit : hits) {
            System.out.println("_index:"+hit.index);
            System.out.println("_type:"+hit.type);
            System.out.println("_id:"+hit.id);
            System.out.println("_source:"+hit.source);
        }
        // Close the client
        jestClient.shutdownClient();
    }
}

3. Reading Data (Object-Oriented, with Builder Objects)

package com.atgugu.esdemo;
import com.atgugu.esdemo.pojo.Emp;
import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.Search;
import io.searchbox.core.SearchResult;
import io.searchbox.core.search.aggregation.AvgAggregation;
import io.searchbox.core.search.aggregation.MetricAggregation;
import io.searchbox.core.search.aggregation.TermsAggregation;
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import java.io.IOException;
import java.util.List;
/**
 * General steps
 *    1. Create a client
 *    2. Connect to the server
 *    3. Prepare the command
 *    4. Send the command
 *    5. For a query, receive the result returned by the server
 *    -------------------------------------
 *    The Jest client makes heavy use of two patterns
 *    Factory pattern: new XxxFactory().getXxx()
 *    Builder pattern: new XxxBuilder().build()
 *    The builder pattern relies heavily on method chaining:
 *    a.b() returns a, so calls can be chained
 *    -------------------------------------
 */
public class ReadDemo02 {
    public static void main(String[] args) throws IOException {
        // Create the factory
        JestClientFactory jestClientFactory = new JestClientFactory();
        // Set the address of the cluster to connect to
        HttpClientConfig httpClientConfig = (new HttpClientConfig.Builder("http://hadoop102:9200")).build();
        jestClientFactory.setHttpClientConfig(httpClientConfig);
        // Get a client
        JestClient jestClient = jestClientFactory.getObject();
        // Wrap the query conditions in builder objects
        // Build the match query
        MatchQueryBuilder matchQueryBuilder = new MatchQueryBuilder("hobby", "购物");
        // Build the aggs
        TermsAggregationBuilder aggregationBuilder  = AggregationBuilders.terms("gendercount").field("gender.keyword").size(2)
                .subAggregation(AggregationBuilders.avg("avgage").field("age"));
        // Put the match query and the aggregation into the search source
        String querySource = new SearchSourceBuilder().query(matchQueryBuilder).aggregation(aggregationBuilder).toString();
        // Equivalent to GET /test/emps/_search
        Search search = new Search.Builder(querySource)
                .addIndex("test")
                .addType("emps")
                .build();
        SearchResult searchResult = jestClient.execute(search);
        // Iterate over the returned hits
        System.out.println("total:"+ searchResult.getTotal());
        System.out.println("max_score:"+ searchResult.getMaxScore());
        List<SearchResult.Hit<Emp, Void>> hits = searchResult.getHits(Emp.class);
        for (SearchResult.Hit<Emp, Void> hit : hits) {
            System.out.println("_index:"+hit.index);
            System.out.println("_type:"+hit.type);
            System.out.println("_id:"+hit.id);
            System.out.println("_source:"+hit.source);
        }
        MetricAggregation aggregations = searchResult.getAggregations();
        TermsAggregation genderCount = aggregations.getTermsAggregation("gendercount");
        List<TermsAggregation.Entry> buckets = genderCount.getBuckets();
        for (TermsAggregation.Entry bucket : buckets) {
            System.out.println(bucket.getKey() + ":" + bucket.getCount());
            AvgAggregation avgage = bucket.getAvgAggregation("avgage");
            System.out.println(avgage.getAvg());
        }
        // Close the client
        jestClient.shutdownClient();
    }
}

4. Writing Data (Insert)

package com.atgugu.esdemo;
import com.atgugu.esdemo.pojo.Emp;
import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.DocumentResult;
import io.searchbox.core.Index;
import java.io.IOException;
import java.util.List;
/**
 *  Insert or update: Index
 *  Delete: Delete
 *
 */
public class WriteDemo01 {
    public static void main(String[] args) throws IOException {
        // Create the factory
        JestClientFactory jestClientFactory = new JestClientFactory();
        // Set the address of the cluster to connect to
        HttpClientConfig httpClientConfig = (new HttpClientConfig.Builder("http://hadoop102:9200")).build();
        jestClientFactory.setHttpClientConfig(httpClientConfig);
        // Get a client
        JestClient jestClient = jestClientFactory.getObject();
        // Wrap the data to write in an object
        Emp emp = new Emp("1018", 30, 22.22, "jack", "男", "吃饭");
        // PUT /test/emps/18
        Index index = new Index.Builder(emp)
                .type("emps")
                .index("test")
                .id("18")
                .build();
        DocumentResult result = jestClient.execute(index);
        System.out.println(result.getResponseCode());
        // Close the client
        jestClient.shutdownClient();
    }
}

5. Writing Data (Bulk)

package com.atgugu.esdemo;
import com.atgugu.esdemo.pojo.Emp;
import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.*;
import java.io.IOException;
/**
 *  Insert or update: Index
 *  Delete: Delete
 *  Batch write: Bulk
 *
 */
public class WriteDemo02 {
    public static void main(String[] args) throws IOException {
        // Create the factory
        JestClientFactory jestClientFactory = new JestClientFactory();
        // Set the address of the cluster to connect to
        HttpClientConfig httpClientConfig = (new HttpClientConfig.Builder("http://hadoop102:9200")).build();
        jestClientFactory.setHttpClientConfig(httpClientConfig);
        // Get a client
        JestClient jestClient = jestClientFactory.getObject();
        // Wrap the data to write in an object
        Emp emp = new Emp("1018", 30, 22.22, "jack", "男", "吃饭");
        // PUT /test/emps/16
        Index index = new Index.Builder(emp)
                .type("emps")
                .index("test")
                .id("16")
                .build();
        Delete delete = new Delete.Builder("18").index("test").type("emps").build();
        // Combine several operations into one Bulk request
        Bulk bulk = new Bulk.Builder()
                .addAction(index)
                .addAction(delete).build();
        BulkResult bulkResult = jestClient.execute(bulk);
        System.out.println(bulkResult.getResponseCode());
        // Close the client
        jestClient.shutdownClient();
    }
}