ES - ES Command Guide - 《Leo 的知识沉淀》

[[TOC]]

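I. Request Format

curl is a command-line tool available on Linux.
Usage: curl http://curl.haxx.se
This is the simplest way to use it.
This command fetches the page at [http://curl.haxx.se](http://curl.haxx.se). If the URL points to a file or an image instead, curl downloads it in the same way. By default the HTTP response headers are not shown; add -i to print the headers together with the body, or -I to print only the headers. At any time, -v shows how curl is working, including everything it sends to the server. To fetch only part of a resource, for example when resuming an interrupted download, use -r to specify the byte range.
General form of a request: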

curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'

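The parts marked with < > are:

VERB            The appropriate HTTP method or verb: `GET`, `POST`, `PUT`, `HEAD`, or `DELETE`.
PROTOCOL        `http` or `https` (if you have an `https` proxy in front of Elasticsearch).
HOST            The hostname of any node in the Elasticsearch cluster, or localhost for a node on the local machine.
PORT            The port of the Elasticsearch HTTP service; the default is 9200.
PATH            The API endpoint (for example, _count returns the number of documents in the cluster). The path may contain multiple components, such as _cluster/stats or _nodes/stats/jvm.
QUERY_STRING    Any optional query-string parameters (for example, ?pretty formats the JSON response so it is easier to read).
BODY            A JSON-encoded request body, if the request needs one.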

Example: count the documents in the cluster

curl -XGET 'http://master66:9200/_count?pretty' -d '
{
   "query": {
     "match_all": {}
   }
}'

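From a browser or another HTTP client, request the URL directly. Without a body, _count counts every document, which is the same result match_all gives:

http://10.17.139.66:9200/_count?pretty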
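
The request template above can also be assembled in code. The following is a minimal Python sketch using only the standard library; the helper name `build_request` and its parameters are illustrative assumptions, not part of any Elasticsearch client. It builds the same `_count` request as the curl example and does not send anything.

```python
import json
import urllib.request


def build_request(verb, host, path, query_string="", body=None,
                  protocol="http", port=9200):
    """Assemble <VERB> <PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING> plus an optional JSON body."""
    url = f"{protocol}://{host}:{port}/{path.lstrip('/')}"
    if query_string:
        url += "?" + query_string
    # The body, when present, is sent as UTF-8 encoded JSON.
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data else {}
    return urllib.request.Request(url, data=data, headers=headers, method=verb)


# Same request as the curl example; nothing is sent until urlopen() is called.
count_request = build_request(
    "GET", "master66", "_count", "pretty",
    body={"query": {"match_all": {}}},
)
print(count_request.get_method(), count_request.full_url)
```

To actually send the request, pass `count_request` to `urllib.request.urlopen()` on a machine that can reach the cluster.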

II. Common Commands

1. Check the ES version

`curl master66:9200`

Example:

>curl 10.37.139.66:9200
{
   "name" : "master66",
   "cluster_name" : "SERVICE-ELASTICSEARCH-d6b815c84b7b4619b03dfdb47f160f0e",
   "version" : {
     "number" : "2.1.1",
     "build_hash" : "${buildNumber}",
     "build_timestamp" : "2017-01-22T07:33:54Z",
     "build_snapshot" : false,
     "lucene_version" : "5.3.1"
   },
  "tagline" : "You Know, for Search"
}

2. Cluster status: _cat
Calling _cat on its own lists the APIs available for inspecting cluster status.

curl http://master66:9200/_cat
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/indices
…….

Command reference:

•    Allocation: information about the resource allocation on each server in the cluster
•    Shards: information about the allocation of (specific) shards on each server in the cluster
•    Master: information about the master server in the cluster
•    Indices: information about (specific) indices in the cluster
•    Segments: information on how an index is segmented across several servers in the cluster
•    Count: count documents in (specific) indices
•    Recovery: information about shard recovery when a shard is moved to a different node in the cluster
•    Health: display the cluster health
•    Pending tasks: as the name indicates. What is the server doing right now?
•    Aliases: information about aliases given to specific indices
•    Thread pool: thread pool statistics per node
•    Plugins: a list of running plugins per node
•    Fielddata: information about loaded body & text fields per node

2.1 Check cluster health

http://master66:9200/_cat/health?v
curl master66:9200/_cat/health?v

2.2 List nodes

http://master66:9200/_cat/nodes?v
curl master66:9200/_cat/nodes?v

2.3 List all indices and their storage size

http://master66:9200/_cat/indices?v
curl master66:9200/_cat/indices?v
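
With ?v, the _cat endpoints return a header row followed by whitespace-separated columns, which is easy to post-process. The sketch below shows one way to turn such text into dictionaries. The function name `parse_cat_table` is an assumption, and the sample text is a made-up, abridged illustration of the `_cat/indices?v` layout rather than real cluster output. It assumes no cell is empty.

```python
def parse_cat_table(text):
    """Turn `_cat/...?v` output (header row plus whitespace-separated rows) into dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    # Pair each cell with its column name; assumes every row has one cell per column.
    return [dict(zip(header, line.split())) for line in lines[1:]]


# Hypothetical, abridged sample in the shape of `_cat/indices?v`; the values are made up.
sample = """\
health status index   pri rep docs.count store.size
green  open   ys_test   5   1          5     12.3kb
"""
rows = parse_cat_table(sample)
print(rows[0]["index"], rows[0]["docs.count"])
```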

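3. Index management
3.1 Create an index
Create an index with the name of your choice (ys_test below). By default it has 5 primary shards and 1 replica; settings can be supplied at creation time: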

curl -XPUT 'http://master66:9200/ys_test' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "ik": {
          "tokenizer": "ik_max_word"
        }
      }
    }
  },
  "mappings": {
    "person": {
      "properties": {
        "name": { "type": "string", "analyzer": "ik_max_word" },
        "age": { "type": "integer" },
        "sex": { "type": "string", "analyzer": "ik_max_word" }
      }
    }
  }
}'

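3.2 Delete an index

curl -XDELETE http://master66:9200/ys_test

Several indices can be deleted at once, either by listing them or with a wildcard:

curl -XDELETE 'http://master66:9200/index_one11,index_two22'
curl -XDELETE 'http://master66:9200/index_*'

3.3 View settings and mappings: _settings, _mapping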

curl -XGET http://master66:9200/ys_test/_settings?pretty
curl -XGET http://master66:9200/ys_test/_mapping?pretty

4. Aliases
4.1 Create an alias

1) _alias: a single operation
2) _aliases: multiple operations, applied together atomically

Given an existing index dm_v1, add the alias dm_alias to it:

1) curl -XPUT 'localhost:9200/dm_v1/_alias/dm_alias'
2) curl -XPOST 'http://localhost:9200/_aliases' -d '
{
     "actions" : [
         { "add" : { "index" : "dm_v1", "alias" : "dm_alias" } }
      ]
}'

4.2 Delete an alias:

1) curl -XDELETE 'localhost:9200/dm_v1/_alias/dm_alias'
2) curl -XPOST 'http://localhost:9200/_aliases' -d '
{
"actions" : [
    { "remove" : { "index" : "dm_v1", "alias" : "dm_alias" } }
 ]
}'

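4.3 Index migration for a live application
Remove the alias from the old index and add it to the new index in the same request. The operation is atomic, so there is never a moment when the alias points to no index: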

curl -XPOST 'http://localhost:9200/_aliases' -d '
{
    "actions" : [
         { "remove" : { "index" : "dm_v1", "alias" : "dm_alias" } },
         { "add" : { "index" : "dm_v2", "alias" : "dm_alias" } }
     ]
}'
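
The atomic swap above follows a fixed pattern: one remove action and one add action in a single `_aliases` body. The sketch below builds that body for any alias and pair of indices. The helper name `swap_alias_actions` is an assumption for illustration; the body it produces is the one shown in the curl command.

```python
import json


def swap_alias_actions(alias, old_index, new_index):
    """Build an `_aliases` body that moves `alias` from old_index to new_index in one request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Reproduces the body from the curl example; POST it to /_aliases to apply it.
body = swap_alias_actions("dm_alias", "dm_v1", "dm_v2")
print(json.dumps(body, indent=2))
```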

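Use case: when a live application's requirements change and the index has to be redesigned, an alias lets you migrate from the old index to the new one with zero downtime.

4.4 Look up the indices an alias points to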

curl -XGET 'localhost:9200/_alias/dm_alias'
curl -XGET 'localhost:9200/_alias/dm*'

List all aliases that point to an index:

curl -XGET 'localhost:9200/dm_v1/_alias/*'

4.5 Advanced alias usage

1) Grouping several indices under one alias

POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "dm_v1",
        "alias": "dm"
      }
    },
    {
      "add": {
        "index": "dm_v2",
        "alias": "dm"
      }
    }
  ]
}

Note: this gives dm_v1 and dm_v2 the shared alias dm, so read operations (and only reads) against dm apply to both dm_v1 and dm_v2.

2) Filtered aliases

POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "my_index",
        "alias": "my_index__teamA_alias",
        "filter":{
            "term":{
                "team":"teamA"
            }
         }
       }
     },
    {
       "add": {
         "index": "my_index",
         "alias": "my_index__teamB_alias",
         "filter":{
             "term":{
                 "team":"teamB"
             }
         }
       }
     },
     {
        "add": {
          "index": "my_index",
          "alias": "my_index__team_alias"
        }
      }
   ]
}

GET /my_index__teamA_alias/_search returns only teamA's data
GET /my_index__teamB_alias/_search returns only teamB's data
GET /my_index__team_alias/_search returns both teamA's and teamB's data
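
When there are many teams, the filtered-alias body is tedious to write by hand. The sketch below generates it from a list of team names, following the naming scheme used in the example. The function name `team_alias_actions` and its `field` parameter are assumptions for illustration.

```python
def team_alias_actions(index, teams, field="team"):
    """Build one filtered alias per team plus one unfiltered alias covering all teams."""
    actions = [
        {
            "add": {
                "index": index,
                "alias": f"{index}__{team}_alias",
                # Searches through this alias only see documents whose `field` equals `team`.
                "filter": {"term": {field: team}},
            }
        }
        for team in teams
    ]
    actions.append({"add": {"index": index, "alias": f"{index}__team_alias"}})
    return {"actions": actions}


alias_body = team_alias_actions("my_index", ["teamA", "teamB"])
print([action["add"]["alias"] for action in alias_body["actions"]])
```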

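III. Document Management

1. Add a document
Specify _index, _type, and _id. If the document already exists, it is overwritten.
Note: to have the request accepted only when no document with the same _index, _type, and _id exists, append /_create to the URL.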

curl -XPUT 'http://master66:9200/ys_test/person/1' -d '{
"name":"刘明","age":"40","sex":"男"
}'
curl -XPUT 'http://master66:9200/ys_test/person/2' -d '{
"name":"李明","age":"20","sex":"男"
}'
curl -XPUT 'http://master66:9200/ys_test/person/3' -d '{
"name":"李莉","age":"25","sex":"女"
}'
curl -XPUT 'http://master66:9200/ys_test/person/4' -d '{
"name":"刘明明","age":"35","sex":"男"
}'
curl -XPUT 'http://master66:9200/ys_test/person/5' -d '{
"name":"刘明宇","age":"40","sex":"男"
}'

ys_test/person/3 already exists, so the following command returns an error:

curl -XPUT 'http://master66:9200/ys_test/person/3/_create' -d '{
"name":"zhangyu2","age":"26"
}'

2. Delete a document
Identify the document to delete by _index/_type/_id:

curl -XDELETE 'http://master66:9200/ys_test/person/4'

3. Partial document update: _update
To update only part of a document, wrap the changed fields in a doc object in the JSON request:

curl -XPOST 'master66:9200/ys_test/person/2/_update' -d '{
  "doc" :{
    "age" : 23,
    "name" : "李霞"
   }
}'
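
A partial update merges the doc object into the stored document: fields present in doc are added or overwritten, nested objects are merged, and everything else is left alone. The following toy function imitates that merge locally so the effect is easy to see. It is a simplified model, not the server's implementation, and the name `apply_partial_update` is an assumption.

```python
def apply_partial_update(source, doc):
    """Return a copy of `source` with `doc` merged in, recursing into nested objects."""
    merged = dict(source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_partial_update(merged[key], value)
        else:
            # Scalars and arrays replace the old value; unknown keys are added.
            merged[key] = value
    return merged


# Document 2 as indexed earlier, then the `doc` from the _update request.
stored = {"name": "李明", "age": "20", "sex": "男"}
updated = apply_partial_update(stored, {"age": 23, "name": "李霞"})
print(updated)
```

The `sex` field survives because it is not mentioned in doc, which is the difference from re-indexing the whole document with PUT.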

IV. Query Commands: _search

1. Overview of the query DSL:

{
    size: # number of results to return (defaults to 10)
    from: # offset into the results (defaults to 0)
    fields: # return only the listed stored fields for each hit
    _source: # search as usual, but return only the listed fields of the source document
    sort:  # define sort order - see http://elasticsearch.org/guide/reference/api/search/sort.html
    query: {
         # "query" object following the Query DSL: http://elasticsearch.org/guide/reference/query-dsl/
    },
    aggs: {
         # aggregations (statistics): summary information about a particular field or fields in the data
    },
    filter: {
         # filter objects
         # a filter is a simple "filter" (query) on a specific field.
         # Simple means e.g. checking against a specific value or range of values
    },
}
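
As a concrete instance of the skeleton above, the sketch below assembles a search body with paging, source filtering, and sorting. The helper name `build_search_body` and its parameters are assumptions for illustration. It shows how from is derived from a zero-based page number and the page size.

```python
import json


def build_search_body(query, page=0, page_size=10, source_fields=None, sort=None):
    """Assemble a _search request body; `page` is zero-based."""
    body = {
        "query": query,
        "from": page * page_size,  # offset of the first hit to return
        "size": page_size,         # number of hits to return
    }
    if source_fields is not None:
        body["_source"] = list(source_fields)
    if sort is not None:
        body["sort"] = sort
    return body


# Third page of five hits, returning only name and age, oldest first.
search_body = build_search_body(
    {"match_all": {}},
    page=2,
    page_size=5,
    source_fields=["name", "age"],
    sort=[{"age": "desc"}],
)
print(json.dumps(search_body, ensure_ascii=False))
```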

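2. query
2.1 query and filter
A filter is an exact check: each document either matches or it does not. A query scores each document by its relevance.
Performance: a filter returns a plain list of matching documents, which is quick to compute and easy to cache in memory. A query must find the matching documents and also compute a relevance score for each one, which clearly costs more than a filter.
The differences between query and filter:

query computes relevance scores; filter does not.
query results are not cached; filter results can be.

Guidance for choosing:

(1) For full-text search and ranking by score, use query.
(2) For yes/no filtering and exact matching, use filter.

Notes:

1 The same search clauses can be used in both query and filter.
2 If you do not know which field to search, use _all to search all fields.
3 The index, type, and id can also be used as fields: _index, _type, and _id.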

2.2 match: match the query text

{
  "query": {
      "match": {
         "name" : {   # name is a field in ES
            "query" : "刘明"   # query is a reserved keyword of the DSL
            }
        }
  }
}

Shorthand:

{
  "query": {
    "match": {
      "name" : "刘明" 
   }
  }
}

Search across all fields:

{
  "query": {
    "match": {
      "_all" : "刘明" 
    }
  }
}

2.3 match_phrase: exact phrase match
If you do not know which field to search, use _all to search all fields.

{
  "query": {
     "match_phrase": {
        "name" : {
           "query" : "刘明明"
      }
    }
  }
}

Shorthand:

{
  "query": {
     "match_phrase": {
        "name" : "刘明明"
     }
   }
}

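An exact phrase match can be too strict. The slop parameter is an adjustable tolerance: it lets a document still match when the terms are up to the given number of positions out of place.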

{
  "query": {
     "match_phrase": {
        "name" : {
           "query" : "刘明明",
           "slop" : 1
         }
      }
  }
}

4. multi_match: match across multiple fields
To search two fields and accept a document when either field matches, use multi_match:

{
  "query": {
     "multi_match": {
        "query" : "刘明",
        "fields" : ["name", "sex"]
      }
   }
}

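To score a document by its single best-matching field, set type to "best_fields".
To score a document higher the more fields it matches, use most_fields.
If the terms of the query text are spread across different fields, use cross_fields.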

{
  "query": {
    "multi_match": {
        "query": "刘明",
        "type": "best_fields",
        "fields" : ["name", "sex"],
        "tie_breaker": 0.3
    }
  }
}
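
With best_fields, the tie_breaker decides how much the non-best fields contribute: the final score is the best field's score plus tie_breaker times the scores of the other matching fields. The sketch below works through that arithmetic. The per-field scores are made-up numbers, and the function is a simplified illustration of the formula, not Elasticsearch's scoring code.

```python
def best_fields_score(field_scores, tie_breaker=0.0):
    """Combine per-field scores: best score plus tie_breaker times the other matching scores."""
    matching = [score for score in field_scores if score > 0]
    if not matching:
        return 0.0
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)


# Hypothetical scores for the "name" and "sex" fields of one document.
print(best_fields_score([2.0, 1.0], tie_breaker=0.3))
print(best_fields_score([2.0, 1.0]))  # tie_breaker 0: only the best field counts
```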

5. term and terms: exact match, without an analyzer
term matches a single value:

{
  "query": {
     "term": {
        "name": "刘明"
    }
  }
}

terms matches any of several values, like IN in SQL:

{
  "filter": {
    "terms": {
       "sex": [
           "男",
           "女"
       ]
    }
  }
}

6. Prefix query

{
"query": {
   "prefix": {
       "name": "李"
     }
   }
}

7. Wildcards: wildcard
It uses the standard shell wildcards: ? matches any single character, and * matches zero or more characters.

{
    "query": {
       "wildcard": {
            "name": "李*" 
      }
   }
}
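
The two wildcard characters map directly onto regular-expression syntax, which makes their meaning easy to check locally. The sketch below converts a wildcard pattern to a regex and tests it against single terms. It is an illustration of the matching rules only; the function names are assumptions, and on an analyzed field Elasticsearch applies the pattern to each indexed term rather than to the whole field value.

```python
import re


def wildcard_to_regex(pattern):
    """Translate ? (any single character) and * (zero or more characters) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def wildcard_matches(pattern, term):
    """True when the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


for term in ["李明", "李莉", "刘明"]:
    print(term, wildcard_matches("李*", term))
```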

8. Regular expressions: regexp

{
   "query": {
        "regexp": {
        "name": "刘.*" 
      }
    }
}

9. ids: match multiple ids

{
  "query": {
      "ids": {
         "type": "my_type",
         "values": [
              "1",
              "4",
              "100"
          ]
      }
  }
}

10. range: range queries

gt (>, greater than), lt (<, less than), gte (>=), lte (<=)

{
  "query": {
     "range": {
        "age": {
           "gt": 30
       }
     }
  }
}

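11. exists and missing: whether a field is present
exists matches documents in which the field has a value; missing matches documents in which the field is absent or has no value.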

{
  "query": {
    "exists": {
      "field": "name"
    }
  }
}

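12. Combining filters and queries
Use the bool query:

must
Returned documents must satisfy the must clauses, which also contribute to the score.
filter
Returned documents must satisfy the filter clauses, but unlike must, these do not contribute to the score.
should
Returned documents may satisfy the should clauses. In a bool query that has no must or filter but one or more should clauses, a document is returned as long as it satisfies at least one of them.
The minimum_should_match parameter sets how many should clauses must be satisfied.
must_not
Returned documents must not satisfy the must_not clauses.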

{
  "query": {
    "bool": {
      "must": {
        "term": {
          "sex": "男"
        }
      },
      "filter": {
        "match": {
          "name": "刘"
        }
      },
      "must_not": {
        "range": {
          "age": {
            "from": 10,
            "to": 38
          }
        }
      }
    }
  }
}
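
A bool query is just a dictionary with up to four clause lists, so it can be composed programmatically. The sketch below is one possible helper; the name `bool_query` and its parameters are assumptions. It leaves out empty clauses and writes the age bounds with the gte and lte operators listed in the range section.

```python
import json


def bool_query(must=(), filter_=(), should=(), must_not=(), minimum_should_match=None):
    """Assemble a bool query body, leaving out clauses that were not supplied."""
    clauses = {
        "must": list(must),          # required, contributes to the score
        "filter": list(filter_),     # required, does not contribute to the score
        "should": list(should),      # optional
        "must_not": list(must_not),  # excluded
    }
    body = {name: items for name, items in clauses.items() if items}
    if minimum_should_match is not None:
        body["minimum_should_match"] = minimum_should_match
    return {"query": {"bool": body}}


combined = bool_query(
    must=[{"term": {"sex": "男"}}],
    filter_=[{"match": {"name": "刘"}}],
    must_not=[{"range": {"age": {"gte": 10, "lte": 38}}}],
)
print(json.dumps(combined, ensure_ascii=False, indent=2))
```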

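13. Aggregations: _search
Chapters of the official guide on aggregations:

•    High-level concepts
•    Trying out aggregations
•    Bar charts
•    Statistics over time
•    Scoping aggregations
•    Filtering and aggregations
•    Sorting multiple buckets
•    Approximate aggregations
•    Finding anomalies with aggregations
•    Doc Values and Fielddata

13.1 Metrics
Similar to the avg, max, and min functions in SQL.
13.1.1 Average, maximum, minimum, sum, unique count

Average: avg
Maximum: max
Minimum: min
Sum: sum
Unique count: cardinality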

http://10.17.139.66:9200/ys_test/_search?size=0
{
  "aggs": {
    "min_age": {
      "min": {
        "field": "age"
       }
     }
  }
}
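
To make the metric names concrete, the sketch below computes each of them locally over the five ages that were indexed in the document-management examples (before the later delete and update). This only illustrates what each metric means. Elasticsearch computes the values on the server, and its cardinality is an approximate count that is exact only for small sets like this one.

```python
# Ages of the five sample documents as originally indexed.
ages = [40, 20, 25, 35, 40]

metrics = {
    "min": min(ages),
    "max": max(ages),
    "avg": sum(ages) / len(ages),
    "sum": sum(ages),
    "cardinality": len(set(ages)),  # number of distinct values
}
print(metrics)
```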

13.2 Buckets
Buckets correspond to GROUP BY in SQL.

{
  "aggs" : {
    "sexs" : {   // name of this aggregation; any name works
      "terms" : {
        "field" : "sex"
      }
    }
  }
}
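
The GROUP BY analogy can be reproduced locally with a counter. The sketch below groups the five sample documents by sex and prints buckets in the key and doc_count shape that a terms aggregation returns. It is a local illustration under the assumption that each sex value is indexed as a single term, not a call to the cluster.

```python
from collections import Counter

# The five sample documents as originally indexed (only the fields needed here).
docs = [
    {"name": "刘明", "sex": "男"},
    {"name": "李明", "sex": "男"},
    {"name": "李莉", "sex": "女"},
    {"name": "刘明明", "sex": "男"},
    {"name": "刘明宇", "sex": "男"},
]

# One bucket per distinct value, largest bucket first.
counts = Counter(doc["sex"] for doc in docs)
buckets = [{"key": key, "doc_count": count} for key, count in counts.most_common()]
print(buckets)
```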

13.3 Count the documents of each type in an index

curl -XGET '10.17.139.66:9200/relation/_search?pretty' -H 'Content-Type: application/json' -d'
   {
       "size" : 0,
         "aggs": {
           "type_count": { 
             "terms": {
               "field": "_type",
                "size": 0
             }
          }
     }
}'

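Notes:

The first size suppresses the matching documents in the response, since a long list of hits makes the result hard to read.
The second size makes every type appear as its own bucket instead of being folded into the "sum_other_doc_count" field. The relation index has many types, and without this setting some of them were missing from the aggregation result and were counted only in sum_other_doc_count.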
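
The effect of the second size can be imitated locally. The toy function below keeps the largest buckets up to size, adds the rest to sum_other_doc_count, and treats size 0 as "return every bucket", as described in the notes above. The type names are hypothetical, and real results are assembled from per-shard counts, so this is only a simplified model.

```python
from collections import Counter


def terms_aggregation(values, size):
    """Keep the `size` largest buckets; size 0 keeps all. The rest goes to sum_other_doc_count."""
    counts = Counter(values).most_common()
    kept = counts if size == 0 else counts[:size]
    other = sum(count for _, count in counts) - sum(count for _, count in kept)
    return {
        "buckets": [{"key": key, "doc_count": count} for key, count in kept],
        "sum_other_doc_count": other,
    }


# Hypothetical _type values of six documents.
types = ["friend", "friend", "friend", "colleague", "colleague", "family"]
print(terms_aggregation(types, size=2))
print(terms_aggregation(types, size=0))
```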

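V. Summary

The ES APIs fall into the following broad categories:

•    Document APIs: create, read, update, and delete documents
•    Search APIs: query documents by field
•    Index APIs: operate on indices
•    cat APIs: return data in a more readable form, better suited to display in a console
•    Cluster APIs: inspect and operate on the cluster

Document APIs

•    Index API: create a document and index it
•    Get API: retrieve a document
•    Delete API: delete a document
•    Update API: update a document
•    Multi Get API: retrieve several documents in one request
•    Bulk API: batch operations; a batch can index, create, update, and delete documents
•    Delete By Query API: delete the documents that match a query
•    Term Vectors: term analysis for a single document
•    Multi termvectors API: term analysis for several documents

A Multi Get request is split internally into several requests that are sent to different nodes, and the results are then merged. If the request to one node fails, the Multi Get still returns data, but only from the nodes that answered successfully.
Term vectors show, for example, how a given field of a given document was tokenized for indexing.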
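Search APIs

•    URI search:           the search conditions are in the URL
•    DSL search:           the search conditions are in the request body
•    Search template API:  defines search templates; a template runs a different actual search depending on the parameters passed in
•    Search shards API:    reports which indices and shards a search would use
•    Suggest API:          given a word and a field, returns search suggestions
•    Multi search API:     runs a batch of search requests, prepared in a file, in a single call
•    Count API:            returns only the number of matching documents
•    Search exists API:    checks whether any document matches a search
•    Validate API:         checks whether a search request is valid and returns an error message if it is not
•    Explain API:          reports whether a given document matches a given query, and why
•    Percolator API:       in short, registers queries in advance and then reports which of them a given document matches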
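Index APIs

•    Create index (POST my_index)
•    Delete index (DELETE my_index)
•    Get index information (GET my_index)
•    Index exists (HEAD my_index)
•    Open/close index (my_index/_close, my_index/_open)
•    Put mapping (PUT my_index/_mapping)
•    Get mapping (GET my_index/_mapping)
•    Get field mapping (GET my_index/_mapping/field/my_field)
•    Type exists (HEAD my_index/my_type)
•    Delete mapping (DELETE my_index/_mapping/my_type)
•    Index aliases (_aliases)
•    Update index settings (PUT my_index/_settings)
•    Get index settings (GET my_index/_settings)
•    Analyze (_analyze): shows how a field is analyzed for indexing
•    Index templates (_template): defines a template that newly created indices are initialized from
•    Warmers (_warmer): registers queries to run ahead of time so that their data is already in memory, which speeds up later queries
•    Status (_status): index status
•    Stats (_stats): statistics for one or more indices
•    Segments (_segments): segment-level information about the shards
•    Recovery (_recovery): information about index recovery
•    Clear cache (_cache/clear): clears all caches
•    Flush (_flush)
•    Refresh (_refresh)
•    Optimize (_optimize): optimizes an index
•    Upgrade (_upgrade): upgrades an index to the latest Lucene format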
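cat APIs

•    Aliases (_cat/aliases): index aliases
•    Allocation (_cat/allocation): resource allocation
•    Count (_cat/count): document counts
•    Fielddata (_cat/fielddata): fielddata usage per field
•    Health (_cat/health): cluster health
•    Indices (_cat/indices): index information
•    Master (_cat/master): master node information
•    Nodes (_cat/nodes): node information
•    Pending tasks (_cat/pending_tasks): tasks waiting to run
•    Plugins (_cat/plugins): installed plugins
•    Recovery (_cat/recovery): recovery status
•    Thread pool (_cat/thread_pool): thread pool statistics
•    Shards (_cat/shards): shard information
•    Segments (_cat/segments): Lucene segment information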
Cluster APIs

•    Cluster health (_cluster/health)
•    Cluster state (_cluster/state)
•    Cluster statistics (_cluster/stats)
•    Pending cluster tasks (_cluster/pending_tasks)
•    Cluster reroute (_cluster/reroute)
•    Update cluster settings (_cluster/settings)
•    Node statistics (_nodes/stats)
•    Node information (_nodes)
•    Node hot threads (_nodes/hot_threads)
•    Shut down nodes (\nodes/_master/_shutdown)
Appendix: