Overview
URI Search
Request Body Search
Query String & Simple Query String
Mapping
- Dynamic Mapping
- Defining a Mapping Explicitly
Set index to false
Set null_value
Set copy_to
- Index Template & Dynamic Template
  - Index Template
Create a default template
View template information
- Aggregations

Overview

There are two main ways to search:
- URI Search
Example: curl -XGET "http://localhost:9200/test_index/_search?q=customer"
- Request Body Search
Example: curl -XGET "http://localhost:9200/test_index/_search" -H 'Content-Type: application/json' -d '{
    "query":{
        "match_all":{}
    }
}'

URI Search

Example
GET /movies/_search?q=2012&df=title&sort=year:desc&from=0&size=10&timeout=1s
{
    "profile":true
}
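- q: the query, written in Query String Syntax
- df: the default field; if omitted, all fields are searched. q=2012&df=title is equivalent to q=title:2012
- sort: sorting; from and size: pagination
- profile: shows how the query is executed
The individual query types are omitted here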
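
The parameters above can be assembled programmatically. The following is a minimal sketch that only builds the request path; `uri_search_path` is a hypothetical helper written for these notes, not part of any Elasticsearch client, and it uses nothing beyond the Python standard library.

```python
from urllib.parse import urlencode


def uri_search_path(index, q, df=None, sort=None, from_=0, size=10, timeout=None):
    """Build the path and query string of a URI search (illustrative helper)."""
    params = {"q": q}
    if df is not None:
        params["df"] = df          # default field for terms without a field prefix
    if sort is not None:
        params["sort"] = sort      # e.g. "year:desc"
    params["from"] = from_         # pagination offset
    params["size"] = size          # page size
    if timeout is not None:
        params["timeout"] = timeout
    # urlencode percent-encodes reserved characters such as ":".
    return f"/{index}/_search?{urlencode(params)}"


path = uri_search_path("movies", "2012", df="title", sort="year:desc", timeout="1s")
print(path)
```

Note that the colon in `year:desc` comes out percent-encoded as `%3A`; the server decodes it back, so the request means the same as the hand-written one.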

Request Body Search

Example
POST /movies/_search
{
  "script_fields":{ // script fields, computed at query time
    "new_field":{
      "script":{
        "lang":"painless",
        "source":"doc['order_date'].value+'hello'"
      }
    }
  },
  "_source":["order_date","category.keyword"], // fields to return
  "sort":[{"order_date":"desc"}],
  "from":10,
  "size":20,
  "query":{
    "match_all":{}
  }
}
- Using query expressions
POST movies/_search
{
  "query": {
    "match": {
      "title": {
        "query": "last christmas",
        "operator": "and"
      }
    }
  }
}
POST movies/_search
{
  "query": {
    "match_phrase": {
      "title":{
        "query": "one love",
        "slop": 1 
      }
    }
  }
}
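
To make the effect of `operator` in the `match` query concrete, here is a toy model in Python. It is only a sketch: the `tokenize` and `match` functions and the sample titles are invented for illustration, and real Elasticsearch analysis and scoring are far richer. It relies on the fact that `match` defaults to `or`.

```python
def tokenize(text):
    """Rough stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def match(field_text, query, operator="or"):
    """Return True if the field text matches the query terms.

    operator="or":  at least one query term must appear (the default).
    operator="and": every query term must appear.
    """
    doc_terms = set(tokenize(field_text))
    query_terms = tokenize(query)
    if operator == "and":
        return all(term in doc_terms for term in query_terms)
    return any(term in doc_terms for term in query_terms)


# Hypothetical titles, not taken from the movies data set.
titles = ["Last Christmas", "The Last Samurai", "Christmas Story"]

or_hits = [t for t in titles if match(t, "last christmas")]
and_hits = [t for t in titles if match(t, "last christmas", operator="and")]
print(or_hits)
print(and_hits)
```

With `or`, all three titles match because each contains at least one of the terms; with `and`, only "Last Christmas" remains.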

Query String & Simple Query String

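In short, Simple Query String supports only a restricted subset of the query syntax and ignores invalid syntax instead of returning an error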

Mapping

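A mapping is similar to a schema definition in a database. It is used to:

define the names of the fields in an index
define the data type of each field
configure how the inverted index is built for each field, e.g. whether the field is analyzed and which analyzer it uses

Field data types:

Simple types
- Text/Keyword
- Date
- Integer/Floating
- Boolean
- IPv4/IPv6
Complex types: object and nested object
Special types
- geo_point & geo_shape/percolator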

Dynamic Mapping

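When a document is written to an index that does not exist, the index is created automatically, and Elasticsearch infers the field types from the document's contents

Example:
# Dynamic mapping: infer the field types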
PUT mapping_test/_doc/1
{
    "uid" : "123",
    "isVip" : false,
    "isAdmin": "true",
    "age":19,
    "heigh":180
}
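# View the dynamically generated mapping
GET mapping_test/_mapping
The result below shows that uid and isAdmin are recognized only as strings: they are mapped as text, with a keyword sub-field added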
{
  "mapping_test" : {
    "mappings" : {
      "properties" : {
        "age" : {
          "type" : "long"
        },
        "heigh" : {
          "type" : "long"
        },
        "isAdmin" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "isVip" : {
          "type" : "boolean"
        },
        "uid" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}
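
The inference shown above can be imitated with a small Python function. This is a simplified sketch, not Elasticsearch's actual implementation: `infer_field_mapping` is a hypothetical name, and it deliberately ignores date detection, arrays, and nested objects.

```python
def infer_field_mapping(value):
    """Guess a field mapping from a JSON value, imitating default dynamic mapping."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        # Strings become text with a keyword sub-field, even if they look like "true" or "123".
        return {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    raise TypeError(f"unsupported value in this sketch: {value!r}")


doc = {"uid": "123", "isVip": False, "isAdmin": "true", "age": 19, "heigh": 180}
inferred = {field: infer_field_mapping(value) for field, value in doc.items()}
for field, mapping in inferred.items():
    print(field, "->", mapping["type"])
```

Feeding it the same document reproduces the types in the mapping output: the quoted values of uid and isAdmin end up as text, while the unquoted boolean and numbers are typed as boolean and long.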

Dynamic Mapping settings

# Change dynamic to false
PUT dynamic_mapping_test/_mapping
{
  "dynamic": "false"
}
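
The `dynamic` setting accepts `true`, `false`, or `strict`. The toy model below sketches the difference as I understand it: with `true` new fields are added to the mapping, with `false` they are kept in `_source` but not indexed, and with `strict` the document is rejected. The function `index_document` and its return shape are invented for illustration.

```python
def index_document(mapping, doc, dynamic="true"):
    """Toy model of how the dynamic setting treats fields missing from the mapping.

    mapping: set of field names already mapped (mutated when dynamic="true").
    Returns the stored _source and the list of fields that are searchable.
    """
    new_fields = [field for field in doc if field not in mapping]
    if new_fields and dynamic == "strict":
        raise ValueError(f"strict mapping: unknown fields {new_fields}")
    if dynamic == "true":
        mapping.update(new_fields)  # the mapping grows automatically
    # With "false", unknown fields stay in _source but are not indexed.
    searchable = sorted(field for field in doc if field in mapping)
    return {"_source": doc, "searchable": searchable}


mapping = {"uid"}
result = index_document(mapping, {"uid": "1", "nickname": "ym"}, dynamic="false")
print(result["searchable"])  # only fields that are in the mapping
print(sorted(mapping))       # mapping is unchanged
```

Calling the same function with `dynamic="true"` adds nickname to the mapping, and `dynamic="strict"` raises an error instead of indexing the document.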

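Defining a Mapping Explicitly

Example:
PUT movies
{
    "mappings":{...}
}
When writing a mapping for the first time, you can start from the dynamically generated mapping and modify it

Common parameters:

index: controls whether the field is indexed; defaults to true
index_options: controls what is recorded in the inverted index; text defaults to positions, other types default to docs
- docs: records the doc id
- freqs: records the doc id and term frequencies
- positions: records the doc id, term frequencies, and term positions
- offsets: records the doc id, term frequencies, term positions, and character offsets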
null_value: makes null values searchable; it is not supported on text fields, so use a type such as keyword

copy_to: copies the field's value into another field

```json
# Set index to false
PUT users
{
  "mappings": {
    "properties": {
      "firstName": { "type": "text" },
      "lastName": { "type": "text" },
      "mobile": { "type": "text", "index": false }
    }
  }
}

# Set null_value (delete the index first so it can be re-created)
DELETE users
PUT users
{
  "mappings": {
    "properties": {
      "firstName": { "type": "text" },
      "lastName": { "type": "text" },
      "mobile": { "type": "keyword", "null_value": "NULL" }
    }
  }
}

# Set copy_to
DELETE users
PUT users
{
  "mappings": {
    "properties": {
      "firstName": { "type": "text", "copy_to": "fullName" },
      "lastName": { "type": "text", "copy_to": "fullName" }
    }
  }
}

PUT users/_doc/1
{
  "firstName": "Ruan",
  "lastName": "Yiming"
}

GET users/_search?q=fullName:(Ruan Yiming)

POST users/_search
{
  "query": {
    "match": {
      "fullName": {
        "query": "Ruan Yiming",
        "operator": "and"
      }
    }
  }
}
```
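
As a rough mental model of `index`, `null_value`, and `copy_to`, the sketch below computes which values would end up in the index for one document while leaving the original document (the `_source`) untouched. The function `build_index_view` and the combined mapping are hypothetical; they merge the three separate examples above into a single mapping purely for illustration.

```python
def build_index_view(doc, mapping):
    """Return {field: [values]} that would be indexed for a document (toy model)."""
    indexed = {}
    for field, props in mapping.items():
        if field not in doc:
            continue
        value = doc[field]
        if value is None:
            # Only an explicit null is replaced, and only if null_value is configured.
            if "null_value" not in props:
                continue
            value = props["null_value"]
        if props.get("index", True):
            indexed.setdefault(field, []).append(value)
        target = props.get("copy_to")
        if target:
            # The value is also indexed under the target field; _source is not changed.
            indexed.setdefault(target, []).append(value)
    return indexed


# Hypothetical mapping combining the settings shown above.
mapping = {
    "firstName": {"type": "text", "copy_to": "fullName"},
    "lastName": {"type": "text", "copy_to": "fullName"},
    "mobile": {"type": "keyword", "null_value": "NULL"},
}
doc = {"firstName": "Ruan", "lastName": "Yiming", "mobile": None}

indexed = build_index_view(doc, mapping)
print(indexed["fullName"])
print(indexed["mobile"])
print("fullName" in doc)
```

In this model, fullName is searchable even though it never appears in the document itself, and a search for the keyword value NULL finds documents whose mobile field was explicitly null.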


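### Multi-fields
- In a mapping, keyword means an exact value that is not analyzed, while text is analyzed into terms.
- When a mapping is generated dynamically for a string field, the field is mapped as text and a keyword sub-field is added automatically for exact matching.
- You can also add custom sub-fields that use a custom analyzer, to support searching under different conditions.
Notes on custom analyzers
1. Character Filters: preprocess the text, e.g. add, remove, or replace characters; several can be configured
Built-in Character Filters:
- HTML strip: removes HTML tags
- Mapping: string replacement
- Pattern replace: regex-based replacement
2. Tokenizer: splits the original text into terms according to a set of rules
Built-in Tokenizers:
whitespace/standard/uax_url_email/pattern/keyword/path hierarchy
3. Token Filters: add, modify, or remove the terms output by the Tokenizer
Built-in Token Filters:
Lowercase/stop/synonym (adds synonyms)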
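
The three stages above run in a fixed order: character filters, then the tokenizer, then token filters. The following Python sketch imitates that pipeline with tiny stand-ins. All function names and the stop word subset are assumptions made for this illustration; the real built-in components behave in more nuanced ways.

```python
import re

# Small illustrative subset of English stop words (all lowercase).
STOP_WORDS = {"the", "in", "are", "this", "on", "a", "an", "is"}


def html_strip(text):
    """Character filter: drop HTML tags before tokenization."""
    return re.sub(r"<[^>]+>", "", text)


def whitespace_tokenizer(text):
    """Tokenizer: split on whitespace only."""
    return text.split()


def lowercase_filter(tokens):
    """Token filter: lowercase every token."""
    return [token.lower() for token in tokens]


def stop_filter(tokens):
    """Token filter: remove stop words (case-sensitive comparison)."""
    return [token for token in tokens if token not in STOP_WORDS]


def analyze(text, char_filters=(), tokenizer=whitespace_tokenizer, token_filters=()):
    """Run the stages in order: char filters -> tokenizer -> token filters."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


text = "<b>The girls in China are playing this game!</b>"
without_lowercase = analyze(text, [html_strip], token_filters=[stop_filter])
with_lowercase = analyze(text, [html_strip], token_filters=[lowercase_filter, stop_filter])
print(without_lowercase)
print(with_lowercase)
```

The order of token filters matters: without the lowercase filter, "The" survives because the stop list only contains "the"; once lowercase runs first, it is removed.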
```json
PUT logs/_doc/1
{"level":"DEBUG"}
GET /logs/_mapping
POST _analyze
{
  "tokenizer":"keyword",
  "char_filter":["html_strip"],
  "text": "<b>hello world</b>"
}
POST _analyze
{
  "tokenizer":"path_hierarchy",
  "text":"/user/ymruan/a/b/c/d/e"
}
# Replace characters with a char filter
POST _analyze
{
  "tokenizer": "standard",
  "char_filter": [
      {
        "type" : "mapping",
        "mappings" : [ "- => _"]
      }
    ],
  "text": "123-456, I-test! test-990 650-555-1234"
}
// Replace emoticons with a char filter
POST _analyze
{
  "tokenizer": "standard",
  "char_filter": [
      {
        "type" : "mapping",
        "mappings" : [ ":) => happy", ":( => sad"]
      }
    ],
    "text": ["I am feeling :)", "Feeling :( today"]
}
// whitespace and snowball
GET _analyze
{
  "tokenizer": "whitespace",
  "filter": ["stop","snowball"],
  "text": ["The girls in China are playing this game!"]
}
// whitespace and stop
GET _analyze
{
  "tokenizer": "whitespace",
  "filter": ["stop","snowball"],
  "text": ["The rain in Spain falls mainly on the plain."]
}
// After lowercase is added, "The" is treated as a stop word and removed
GET _analyze
{
  "tokenizer": "whitespace",
  "filter": ["lowercase","stop","snowball"],
  "text": ["The girls in China are playing this game!"]
}
// Regular expression replacement
GET _analyze
{
  "tokenizer": "standard",
  "char_filter": [
      {
        "type" : "pattern_replace",
        "pattern" : "http://(.*)",
        "replacement" : "$1"
      }
    ],
    "text" : "http://www.elastic.co"
}
```

Index Template & Dynamic Template

Index Template

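An index template captures mappings and settings in a reusable template that is applied automatically, according to matching rules, to newly created indices

A template only takes effect when an index is created; modifying a template does not affect existing indices
Multiple index templates can be defined, each with an order; their settings are merged according to that order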
Priority: settings specified by the user > Index Template with a higher order > Index Template with a lower order

```json
# Create a default template
PUT _template/template_default
{
  "index_patterns": ["*"],
  "order": 0,
  "version": 1,
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}

PUT /_template/template_test
{
  "index_patterns": ["test*"],
  "order": 1,
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 2
  },
  "mappings": {
    "date_detection": false,
    "numeric_detection": true
  }
}

# View template information
GET /_template/template_default
GET /_template/temp*

DELETE /_template/template_default
DELETE /_template/template_test
```
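
The priority rule can be pictured as a merge in which later layers overwrite earlier ones: templates in ascending order first, then whatever the user specifies when creating the index. The sketch below is a simplified model of that rule only; `resolve_settings` is a hypothetical function and it handles settings but not mappings.

```python
from fnmatch import fnmatchcase


def resolve_settings(index_name, templates, user_settings=None):
    """Merge settings for a new index: low order, then high order, then user-specified."""
    matching = [
        template
        for template in templates
        if any(fnmatchcase(index_name, pattern) for pattern in template["index_patterns"])
    ]
    merged = {}
    for template in sorted(matching, key=lambda t: t["order"]):
        merged.update(template["settings"])  # a higher order overwrites a lower one
    merged.update(user_settings or {})        # explicit settings win over every template
    return merged


# The two templates defined above, reduced to the parts this sketch uses.
templates = [
    {"index_patterns": ["*"], "order": 0,
     "settings": {"number_of_shards": 1, "number_of_replicas": 1}},
    {"index_patterns": ["test*"], "order": 1,
     "settings": {"number_of_shards": 1, "number_of_replicas": 2}},
]

print(resolve_settings("testtemplate", templates))
print(resolve_settings("logs", templates))
print(resolve_settings("testmy", templates, {"number_of_replicas": 5}))
```

An index named testtemplate matches both patterns and takes the replica count from the higher-order template, logs matches only the default template, and an explicit setting at index creation overrides both.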


### Dynamic Template
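A dynamic template applies to one specific index and lets you define custom rules for inferring field types, for example:
- map every field whose name starts with "is" to boolean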
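
A rule like the one above can be read as "the first template whose conditions all match decides the mapping". Here is a minimal Python sketch of that matching logic, assuming only the `match_mapping_type` and `match` conditions; `json_type` and `apply_dynamic_templates` are hypothetical helpers, and path-based conditions are left out.

```python
from fnmatch import fnmatchcase


def json_type(value):
    """Name the detected JSON type the way match_mapping_type refers to it."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, str):
        return "string"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "object"


def apply_dynamic_templates(field, value, templates):
    """Return the mapping of the first matching template, or None if none match."""
    for template in templates:
        rule = next(iter(template.values()))  # each template is {name: rule}
        if "match_mapping_type" in rule and rule["match_mapping_type"] != json_type(value):
            continue
        if "match" in rule and not fnmatchcase(field, rule["match"]):
            continue
        return rule["mapping"]
    return None  # fall back to the default dynamic mapping


templates = [
    {"strings_as_boolean": {"match_mapping_type": "string", "match": "is*",
                            "mapping": {"type": "boolean"}}},
    {"strings_as_keywords": {"match_mapping_type": "string",
                             "mapping": {"type": "keyword"}}},
]

print(apply_dynamic_templates("isVIP", "true", templates))
print(apply_dynamic_templates("firstName", "Ruan", templates))
print(apply_dynamic_templates("age", 30, templates))
```

Because templates are tried in order, the more specific "is*" rule has to come before the catch-all string rule; swapping them would map isVIP to keyword.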
```json
# Dynamic mapping based on type and field name
DELETE my_index
PUT my_index/_doc/1
{
  "firstName":"Ruan",
  "isVIP":"true"
}
GET my_index/_mapping
DELETE my_index
# Example: map strings whose field name starts with "is" to boolean, and all other strings to keyword
PUT my_index
{
  "mappings": {
    "dynamic_templates": [
      {
        "strings_as_boolean": {
          "match_mapping_type":   "string",
          "match":"is*",
          "mapping": {
            "type": "boolean"
          }
        }
      },
      {
        "strings_as_keywords": {
          "match_mapping_type":   "string",
          "mapping": {
            "type": "keyword"
          }
        }
      }
    ]
  }
}
DELETE my_index
# Example: match by path and combine the parts of a name
PUT my_index
{
  "mappings": {
    "dynamic_templates": [
      {
        "full_name": {
          "path_match":   "name.*",
          "path_unmatch": "*.middle",
          "mapping": {
            "type":       "text",
            "copy_to":    "full_name"
          }
        }
      }
    ]
  }
}
PUT my_index/_doc/1
{
  "name": {
    "first":  "John",
    "middle": "Winston",
    "last":   "Lennon"
  }
}
GET my_index/_search?q=full_name:John
```

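Aggregations

Aggregations in Elasticsearch provide statistical analysis over the data

Introduction to Elasticsearch aggregations
Course demo
The flight Sample Data must first be imported through Kibana. For details, see "Section 2.2: Installing Kibana and a quick tour of its interface"
# Bucket flights by destination country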
GET kibana_sample_data_flights/_search
{
    "size": 0,
    "aggs":{
        "flight_dest":{
            "terms":{
                "field":"DestCountry"
            }
        }
    }
}
# Statistics per flight destination, adding the average, highest, and lowest price
GET kibana_sample_data_flights/_search
{
    "size": 0,
    "aggs":{
        "flight_dest":{
            "terms":{
                "field":"DestCountry"
            },
            "aggs":{
                "avg_price":{
                    "avg":{
                        "field":"AvgTicketPrice"
                    }
                },
                "max_price":{
                    "max":{
                        "field":"AvgTicketPrice"
                    }
                },
                "min_price":{
                    "min":{
                        "field":"AvgTicketPrice"
                    }
                }
            }
        }
    }
}
# Price statistics plus weather information
GET kibana_sample_data_flights/_search
{
    "size": 0,
    "aggs":{
        "flight_dest":{
            "terms":{
                "field":"DestCountry"
            },
            "aggs":{
                "stats_price":{
                    "stats":{
                        "field":"AvgTicketPrice"
                    }
                },
                "weather":{
                  "terms": {
                    "field": "DestWeather",
                    "size": 5
                  }
                }
            }
        }
    }
}
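
What a terms aggregation with metric sub-aggregations computes can be reproduced on a handful of rows in plain Python. The rows below are made up for illustration and are not the Kibana sample data; `terms_with_price_metrics` is a hypothetical helper that mirrors the second request above (bucket by DestCountry, then avg, max, and min of AvgTicketPrice).

```python
from collections import defaultdict

# Hypothetical rows, not the real kibana_sample_data_flights documents.
flights = [
    {"DestCountry": "IT", "AvgTicketPrice": 100.0},
    {"DestCountry": "IT", "AvgTicketPrice": 300.0},
    {"DestCountry": "US", "AvgTicketPrice": 250.0},
]


def terms_with_price_metrics(docs, bucket_field, metric_field):
    """Group docs into buckets by bucket_field and compute metrics per bucket."""
    values_by_key = defaultdict(list)
    for doc in docs:
        values_by_key[doc[bucket_field]].append(doc[metric_field])

    buckets = []
    for key, values in values_by_key.items():
        buckets.append({
            "key": key,
            "doc_count": len(values),
            "avg_price": sum(values) / len(values),
            "max_price": max(values),
            "min_price": min(values),
        })
    # A terms aggregation orders buckets by doc_count, largest first, by default.
    buckets.sort(key=lambda bucket: bucket["doc_count"], reverse=True)
    return buckets


for bucket in terms_with_price_metrics(flights, "DestCountry", "AvgTicketPrice"):
    print(bucket)
```

Each bucket plays the role of one entry under `flight_dest.buckets` in the response, with the sub-aggregation results attached to it.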
Further reading
https://www.elastic.co/guide/en/elasticsearch/reference/7.1/search-aggregations.html
