This section is based on the search examples in the official Elasticsearch documentation.

Importing sample test data

We use a sample of fictitious JSON documents describing customers' bank accounts. Each document has the following schema:

  {
    "account_number": 1,
    "balance": 39225,
    "firstname": "Amber",
    "lastname": "Duke",
    "age": 32,
    "gender": "M",
    "address": "880 Holmes Lane",
    "employer": "Pyrami",
    "email": "amberduke@pyrami.com",
    "city": "Brogan",
    "state": "IL"
  }

https://raw.githubusercontent.com/elastic/elasticsearch/master/docs/src/test/resources/accounts.json
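If that URL cannot be reached, use a mirror of the file instead. Then import the test data by sending the contents of accounts.json as the body of this request:
  POST bank/account/_bulk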
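The _bulk body is newline-delimited JSON: each action line such as {"index":{"_id":"1"}} is followed by the document to index, and the body has to end with a newline. The following is a minimal pure-Python sketch (not an Elasticsearch client) that pairs action lines with their documents, assuming the file only contains index actions, as accounts.json appears to. The second sample record is hypothetical.

```python
import json


def parse_bulk(ndjson_text):
    """Pair each bulk action line with the source document on the next line.

    Simplification: only "index" and "create" actions are handled, since
    those are the actions that carry a document body.
    """
    lines = [line for line in ndjson_text.splitlines() if line.strip()]
    pairs = []
    for i in range(0, len(lines), 2):
        action = json.loads(lines[i])
        op, meta = next(iter(action.items()))
        if op not in ("index", "create"):
            raise ValueError(f"unsupported action in this sketch: {op}")
        pairs.append((meta.get("_id"), json.loads(lines[i + 1])))
    return pairs


# First record mirrors the schema above; the second one is hypothetical.
sample = (
    '{"index":{"_id":"1"}}\n'
    '{"account_number":1,"balance":39225,"firstname":"Amber"}\n'
    '{"index":{"_id":"2"}}\n'
    '{"account_number":2,"balance":1000,"firstname":"Example"}\n'
)

for doc_id, source in parse_bulk(sample):
    print(doc_id, source["account_number"], source["balance"])
```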

Search examples

All of the following requests are run in Kibana Dev Tools.

Request

  GET /bank/_search
  {
    "query": {
      "match_all": {}
    },
    "sort": [
      {
        "account_number": "asc"
      }
    ]
  }
  # query: the query condition
  # sort: the sort condition

Result

  {
    "took" : 7,
    "timed_out" : false,
    "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
    },
    "hits" : {
      "total" : {
        "value" : 1000,
        "relation" : "eq"
      },
      "max_score" : null,
      "hits" : [
        {
          "_index" : "bank",
          "_type" : "account",
          "_id" : "0",
          "_score" : null,
          "_source" : {
            "account_number" : 0,
            "balance" : 16623,
            "firstname" : "Bradshaw",
            "lastname" : "Mckenzie",
            "age" : 29,
            "gender" : "F",
            "address" : "244 Columbus Place",
            "employer" : "Euron",
            "email" : "bradshawmckenzie@euron.com",
            "city" : "Hobucken",
            "state" : "CO"
          },
          "sort" : [
            0
          ]
        },
        ...
      ]
    }
  }

Response fields

  • took - how long it took Elasticsearch to run the query, in milliseconds
  • timed_out - whether or not the search request timed out
  • _shards - how many shards were searched and a breakdown of how many shards succeeded, failed, or were skipped
  • max_score - the score of the most relevant document found (null here, because the hits are sorted by account_number rather than by score)
  • hits.total.value - how many matching documents were found
  • hits.sort - the document's sort position (when not sorting by relevance score)
  • hits._score - the document's relevance score (not applicable when using match_all)

About the response

By default, Elasticsearch returns only the first page of 10 matching documents rather than all of them at once.
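To make the default page size concrete, here is a small pure-Python sketch of what the first request does conceptually: match_all lets every document through, the hits are sorted by account_number, and only the first 10 are returned while hits.total still reports every match. It models only the shape of the response, not how Elasticsearch executes a search; the 1000 documents are generated placeholders.

```python
DEFAULT_SIZE = 10  # page size Elasticsearch uses when "size" is not given


def search_match_all(docs, sort_field, order="asc", size=DEFAULT_SIZE):
    """Mimic match_all + a single-field sort: all docs match, return one page."""
    ranked = sorted(docs, key=lambda d: d[sort_field], reverse=(order == "desc"))
    hits = [{"_source": d, "sort": [d[sort_field]]} for d in ranked[:size]]
    return {
        "hits": {
            "total": {"value": len(docs), "relation": "eq"},
            "hits": hits,
        }
    }


# 1000 placeholder documents, stored in descending order on purpose
docs = [{"account_number": n} for n in range(999, -1, -1)]
resp = search_match_all(docs, "account_number")
print(resp["hits"]["total"]["value"], len(resp["hits"]["hits"]))  # 1000 10
print(resp["hits"]["hits"][0]["sort"])  # [0]
```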

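Request methods

ES supports two basic ways to search:

  • sending the search parameters in the REST request URI (URI + search parameters);

  • sending them in the REST request body (URI + request body).

In other words, besides searching with a request body as in the example above, you can also pass the search as URL query parameters of a GET request: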

  GET bank/_search?q=*&sort=account_number:asc
  # q=*: match all documents
  # sort=account_number:asc: sort by account_number in ascending order
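When the URI form is sent from code rather than typed into Dev Tools, the parameters have to be URL-encoded. A sketch using only Python's standard urllib.parse.urlencode: keeping * and : unescaped reproduces the exact string above, and the fully percent-encoded form decodes to the same parameter values.

```python
from urllib.parse import urlencode

params = {"q": "*", "sort": "account_number:asc"}

# Leave "*" and ":" unescaped so the result matches the Dev Tools request.
readable = "bank/_search?" + urlencode(params, safe="*:")
# The default encoding percent-escapes them; the decoded values are the same.
escaped = "bank/_search?" + urlencode(params)

print(readable)  # bank/_search?q=*&sort=account_number:asc
print(escaped)   # bank/_search?q=%2A&sort=account_number%3Aasc
```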

Query DSL

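This subsection is based on the official documentation: Query DSL.
Elasticsearch provides a JSON-style domain-specific language for expressing queries, called the Query DSL. It is a very comprehensive query language.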

1. Basic syntax

Typical structure of a query clause:

  QUERY_NAME: {
    ARGUMENT: VALUE,
    ARGUMENT: VALUE, ...
  }

When the query targets a specific field, the structure is:

  {
    QUERY_NAME: {
      FIELD_NAME: {
        ARGUMENT: VALUE,
        ARGUMENT: VALUE, ...
      }
    }
  }
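The field-level template maps directly onto nested dictionaries. The helper below is hypothetical (field_query is not part of any Elasticsearch client); it only builds a request body following that structure. query and operator are real arguments of the match query and are used here purely as an example.

```python
import json


def field_query(query_name, field_name, **arguments):
    """Hypothetical helper: {"query": {QUERY_NAME: {FIELD_NAME: {ARGUMENT: VALUE, ...}}}}."""
    return {"query": {query_name: {field_name: dict(arguments)}}}


body = field_query("match", "address", query="mill lane", operator="and")
print(json.dumps(body, indent=2))
```

With operator set to "and", both terms would have to occur in address instead of either one.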

Example request:

  GET bank/_search
  {
    "query": {
      "match_all": {}
    },
    "from": 0,
    "size": 5,
    "sort": [
      {
        "account_number": {
          "order": "desc"
        }
      },
      {
        "balance": {
          "order": "asc"
        }
      }
    ]
  }
  # match_all: the query type (matches all documents); ES can combine many query types inside "query" to build complex queries
  # from + size: pagination; from is the offset of the first hit, size is the number of hits per page
  # sort: multi-field sort; a later field only orders documents whose earlier fields are equal
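The combined effect of a multi-field sort and from/size can be sketched in Python: sort by the last key first, then by the earlier keys with a stable sort, and slice out the page. The documents are hypothetical; in the bank index account_number appears to be unique, so balance would never actually break a tie there.

```python
def paginate(docs, sort_spec, from_=0, size=10):
    """Sort by several (field, order) pairs, then return hits [from_, from_ + size)."""
    ranked = list(docs)
    # Python's sort is stable (also with reverse=True), so sorting by the last
    # key first and the first key last produces the multi-key order.
    for field, order in reversed(sort_spec):
        ranked.sort(key=lambda d: d[field], reverse=(order == "desc"))
    return ranked[from_:from_ + size]


docs = [  # hypothetical documents
    {"account_number": 3, "balance": 500},
    {"account_number": 3, "balance": 100},
    {"account_number": 7, "balance": 900},
    {"account_number": 1, "balance": 50},
]
spec = [("account_number", "desc"), ("balance", "asc")]
print(paginate(docs, spec, from_=0, size=2))
print(paginate(docs, spec, from_=2, size=2))
```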

2. Returning only some fields

Example request:

  GET bank/_search
  {
    "query": {
      "match_all": {}
    },
    "from": 0,
    "size": 5,
    "sort": [
      {
        "account_number": {
          "order": "desc"
        }
      }
    ],
    "_source": ["balance", "firstname"]
  }
  # _source: names of the fields to include in each hit's _source

Example result:

  {
    "took" : 2,
    "timed_out" : false,
    "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
    },
    "hits" : {
      "total" : {
        "value" : 1000,
        "relation" : "eq"
      },
      "max_score" : null,
      "hits" : [
        {
          "_index" : "bank",
          "_type" : "account",
          "_id" : "999",
          "_score" : null,
          "_source" : {
            "firstname" : "Dorothy",
            "balance" : 6087
          },
          "sort" : [
            999
          ]
        },
        ...
      ]
    }
  }
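Source filtering only changes what each hit's _source contains; matching and sorting are unaffected. A Python sketch for plain top-level field names (real source filtering also accepts wildcards and includes/excludes, which this ignores), applied to part of the sample document from the schema:

```python
def filter_source(source, includes):
    """Keep only the listed top-level fields of a document's _source."""
    wanted = set(includes)
    return {field: value for field, value in source.items() if field in wanted}


doc = {
    "account_number": 1, "balance": 39225, "firstname": "Amber",
    "lastname": "Duke", "age": 32, "gender": "M",
}
print(filter_source(doc, ["balance", "firstname"]))
# {'balance': 39225, 'firstname': 'Amber'}
```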

3. match: match query

Exact query on basic (non-text) data types

  GET bank/_search
  {
    "query": {
      "match": {
        "account_number": 20
      }
    }
  }
  # find the document whose account_number is 20; for non-text fields, term is the recommended query

Full-text query on a text string

  GET bank/_search
  {
    "query": {
      "match": {
        "address": "mill lane"
      }
    }
  }
  # find documents whose address contains mill or lane (or both)

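match performs full-text search: the query text is analyzed (split into terms), the terms are looked up in the inverted index, and the hits are sorted by relevance score (_score).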
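A toy Python model of that idea: analyze text into lowercase terms, build an inverted index from each term to the ids of documents containing it, and let match OR the query terms together. The regex tokenizer only approximates the standard analyzer, the score simply counts matched query terms (Elasticsearch uses BM25), and the addresses are hypothetical.

```python
import re
from collections import defaultdict


def analyze(text):
    """Rough stand-in for the standard analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_index(docs, field):
    """Inverted index: term -> set of ids of documents containing that term."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in analyze(doc[field]):
            index[term].add(doc_id)
    return index


def match(index, query_text):
    """OR the analyzed query terms; score = number of query terms a doc contains."""
    scores = defaultdict(int)
    for term in set(analyze(query_text)):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


docs = {  # hypothetical addresses
    "1": {"address": "198 Mill Lane"},
    "2": {"address": "990 Mill Road"},
    "3": {"address": "282 Kings Lane"},
    "4": {"address": "12 Oak Street"},
}
index = build_index(docs, "address")
print(match(index, "mill lane"))  # [('1', 2), ('2', 1), ('3', 1)]
```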

Exact match on a text string

  GET bank/_search
  {
    "query": {
      "match": {
        "address.keyword": "288 Mill Street"
      }
    }
  }
  # find documents whose address is exactly 288 Mill Street.
  # address.keyword is not analyzed, so a document matches only if the whole value is identical;
  # to find a phrase inside the address instead, use match_phrase
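Why address.keyword behaves differently: a keyword field (by default, without a normalizer) indexes the entire original string as a single term, so the lookup amounts to a case-sensitive comparison against the whole value. A minimal Python sketch with hypothetical documents:

```python
def keyword_match(docs, field, value):
    """A keyword field holds the whole original value as one term: exact equality."""
    return [doc_id for doc_id, doc in docs.items() if doc[field] == value]


docs = {  # hypothetical documents
    "1": {"address": "288 Mill Street"},
    "2": {"address": "288 Mill Street Apt 4"},
}
print(keyword_match(docs, "address", "288 Mill Street"))  # ['1']
print(keyword_match(docs, "address", "288 mill street"))  # []
print(keyword_match(docs, "address", "Mill Street"))      # []
```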

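4. match_phrase: phrase match

The query text is still analyzed, but the resulting terms must appear in the field consecutively and in the same order, so the value is matched as one whole phrase.

  GET bank/_search
  {
    "query": {
      "match_phrase": {
        "address": "mill lane"
      }
    }
  }
  # find documents whose address contains the phrase mill lane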
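The difference from match can be sketched in Python: match_phrase also analyzes the query, but then requires the resulting terms to appear next to each other and in order (a slop of 0). Tokenization is approximated with a regex, and the sample addresses are hypothetical.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def match_phrase(text, phrase):
    """True if the analyzed phrase terms occur consecutively and in order in text."""
    tokens, terms = analyze(text), analyze(phrase)
    n = len(terms)
    return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))


print(match_phrase("198 Mill Lane", "mill lane"))               # True
print(match_phrase("990 Mill Road, Lane County", "mill lane"))  # False
print(match_phrase("12 Lane Mill", "mill lane"))                # False
```

A plain match for "mill lane" would accept all three addresses, since each one contains mill or lane.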

5. multi_match: multi-field match

  GET bank/_search
  {
    "query": {
      "multi_match": {
        "query": "mill",
        "fields": [
          "city",
          "address"
        ]
      }
    }
  }
  # find documents whose city or address contains mill; the query text is analyzed
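multi_match runs the same analyzed query against several fields. Its default type, best_fields, scores a document by its best-matching field. A Python sketch of that idea, with term counts standing in for real relevance scores and hypothetical documents; note that Millville does not match mill, because matching works on whole terms rather than substrings.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def multi_match(doc, query_text, fields):
    """best_fields-style: score each field separately and keep the highest score."""
    terms = set(analyze(query_text))
    return max(len(terms & set(analyze(doc.get(f, "")))) for f in fields)


docs = {  # hypothetical documents
    "1": {"city": "Millville", "address": "12 Mill Road"},
    "2": {"city": "Mill", "address": "7 Oak Street"},
    "3": {"city": "Brogan", "address": "880 Holmes Lane"},
}
hits = [i for i, d in docs.items() if multi_match(d, "mill", ["city", "address"]) > 0]
print(hits)  # ['1', '2']
```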

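6. bool: compound query

A compound query can combine any other query clauses, including other compound queries. This means compound queries can be nested inside each other to express very complex logic.

  • must: the document must satisfy every listed condition.
  • must_not: the document must not match any of the listed conditions.
  • should: the document should satisfy the listed conditions; matching ones raise its score.

  GET bank/_search
  {
    "query": {
      "bool": {
        "must": [
          {
            "match": {
              "gender": "M"
            }
          },
          {
            "match": {
              "address": "mill"
            }
          }
        ]
      }
    }
  }
  # find documents with gender M whose address contains mill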
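The matching rules of bool can be sketched in Python with each clause as a predicate. Assumptions: the score simply counts satisfied must and should clauses (Elasticsearch adds up real relevance scores), filter is left out, and, as with Elasticsearch's default, when there is no must clause at least one should clause has to match. The documents are hypothetical.

```python
def bool_query(doc, must=(), must_not=(), should=()):
    """Return (matches, score) under simplified bool semantics."""
    if not all(clause(doc) for clause in must):
        return False, 0
    if any(clause(doc) for clause in must_not):
        return False, 0
    should_hits = sum(1 for clause in should if clause(doc))
    # With no must clause, at least one should clause has to match.
    if not must and should and should_hits == 0:
        return False, 0
    return True, len(must) + should_hits


def is_male(doc):
    return doc["gender"] == "M"


def address_has_mill(doc):
    return "mill" in doc["address"].lower().split()


docs = [  # hypothetical documents
    {"gender": "M", "address": "990 Mill Road"},
    {"gender": "F", "address": "282 Mill Lane"},
    {"gender": "M", "address": "880 Holmes Lane"},
]
for doc in docs:
    print(bool_query(doc, must=[is_male, address_has_mill]))
# (True, 2), then (False, 0), then (False, 0)
```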

7. filter: result filtering

Not every query needs to produce scores, especially queries used only to filter documents. Elasticsearch automatically detects these cases and optimizes query execution so that no scores are computed.
filter narrows down the results without computing a relevance score.

  GET bank/_search
  {
    "query": {
      "bool": {
        "must": [
          {
            "match": {
              "address": "mill"
            }
          }
        ],
        "filter": {
          "range": {
            "balance": {
              "gte": 10000,
              "lte": 20000
            }
          }
        }
      }
    }
  }
  # documents must have an address containing mill,
  # and are then restricted to 10000 <= balance <= 20000; the filter does not change their scores

Each must, should, and must_not element in a Boolean query is referred to as a query clause. How well a document meets the criteria in each must or should clause contributes to the document's relevance score. The higher the score, the better the document matches your search criteria. By default, Elasticsearch returns documents ranked by these relevance scores.

The criteria in a must_not clause are treated as a filter. They affect whether or not the document is included in the results, but do not contribute to how documents are scored. You can also explicitly specify arbitrary filters to include or exclude documents based on structured data.
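The split between scoring and filtering can be checked with a small Python sketch of the filter example: the must clause decides the score, while the range filter (gte/lte, both inclusive) only removes documents, so every document that survives the filter keeps exactly the score it had without it. Term counting stands in for relevance scoring, and the documents are hypothetical.

```python
def in_range(value, gte=None, lte=None):
    """range semantics for gte/lte: both bounds inclusive, a missing bound is open."""
    return (gte is None or value >= gte) and (lte is None or value <= lte)


def search(docs, must_term, gte=None, lte=None):
    """must: address contains must_term (scored); filter: balance range (not scored)."""
    results = {}
    for doc in docs:
        score = doc["address"].lower().split().count(must_term)  # scoring part
        if score == 0:
            continue  # must clause not satisfied
        if not in_range(doc["balance"], gte, lte):
            continue  # the filter only includes or excludes; it never changes score
        results[doc["account_number"]] = score
    return results


docs = [  # hypothetical documents
    {"account_number": 1, "address": "990 Mill Road", "balance": 15000},
    {"account_number": 2, "address": "282 Mill Lane", "balance": 32000},
    {"account_number": 3, "address": "880 Holmes Lane", "balance": 12000},
]
print(search(docs, "mill"))                        # {1: 1, 2: 1}
print(search(docs, "mill", gte=10000, lte=20000))  # {1: 1}
```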

8. term: exact-value search

Avoid using the term query for [text](https://www.elastic.co/guide/en/elasticsearch/reference/7.11/text.html) fields. By default, Elasticsearch changes the values of text fields as part of analysis. This can make finding exact matches for text field values difficult. To search text field values, use the [match](https://www.elastic.co/guide/en/elasticsearch/reference/7.11/query-dsl-match-query.html) query instead.

https://www.elastic.co/guide/en/elasticsearch/reference/7.11/query-dsl-term-query.html

Section 3 (match) above showed an exact query on a non-text field. For non-text fields like that, the official Elasticsearch documentation recommends term for exact-value search.

  GET bank/_search
  {
    "query": {
      "term": {
        "age": 28
      }
    }
  }
  # find documents whose age is 28
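The warning about term on text fields can be reproduced with a small Python sketch: the text field stores analyzed, lowercased terms, term looks up its input exactly as given, and match analyzes its input first. The regex analyzer is only an approximation of the standard analyzer.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


# Terms a text field would index for the value "288 Mill Street"
indexed_terms = set(analyze("288 Mill Street"))


def term_query(value):
    """term does not analyze its input: look it up exactly as given."""
    return value in indexed_terms


def match_query(text):
    """match analyzes its input the same way the field was analyzed."""
    return any(token in indexed_terms for token in analyze(text))


print(term_query("Mill"))             # False: the index only has "mill"
print(term_query("mill"))             # True
print(term_query("288 Mill Street"))  # False: no single term holds the whole value
print(match_query("Mill"))            # True
```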

9. Aggregation: running aggregations

https://www.elastic.co/guide/en/elasticsearch/reference/7.11/search-aggregations.html

Aggregation syntax

  GET /my-index-000001/_search
  {
    "aggs": {
      "aggs_name": {      # name of this aggregation, used as its key in the result
        "AGG_TYPE": {     # aggregation type, e.g. avg, terms
        }
      }
    }
  }

Example 1: the age distribution, average age, and average balance of everyone whose address contains mill

  GET bank/_search
  {
    "query": {
      "match": {
        "address": "Mill"
      }
    },
    "aggs": {
      "ageAgg": {
        "terms": {
          "field": "age",
          "size": 10
        }
      },
      "ageAvg": {
        "avg": {
          "field": "age"
        }
      },
      "balanceAvg": {
        "avg": {
          "field": "balance"
        }
      }
    },
    "size": 0
  }
  # ageAgg: aggregation named ageAgg, of type terms, on the field age; "size": 10 returns the top 10 buckets
  # ageAvg: aggregation named ageAvg, of type avg, averaging the field age
  # balanceAvg: aggregation named balanceAvg, of type avg, averaging the field balance
  # "size": 0 (top level): return no hits, only the aggregation results

Result:

  {
    "took" : 10,
    "timed_out" : false,
    "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
    },
    "hits" : {
      "total" : {
        "value" : 4,
        "relation" : "eq"
      },
      "max_score" : null,
      "hits" : [ ]
    },
    "aggregations" : {
      "ageAgg" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : 38,
            "doc_count" : 2
          },
          {
            "key" : 28,
            "doc_count" : 1
          },
          {
            "key" : 32,
            "doc_count" : 1
          }
        ]
      },
      "ageAvg" : {
        "value" : 34.0
      },
      "balanceAvg" : {
        "value" : 25208.0
      }
    }
  }
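What the terms and avg aggregations compute can be reproduced in Python from the result above: the four hits have ages 38, 38, 28 and 32. Buckets are ordered by doc_count descending, and the sketch breaks ties by ascending key, which is the order the result shows. It mirrors the arithmetic only, not how Elasticsearch computes aggregations across shards.

```python
from collections import Counter


def terms_agg(values, size=10):
    """Bucket identical values; order by doc_count desc, then key asc for ties."""
    ordered = sorted(Counter(values).items(), key=lambda kv: (-kv[1], kv[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered[:size]]


def avg_agg(values):
    """Average of the values, or None when there are no values."""
    return {"value": sum(values) / len(values) if values else None}


ages = [38, 38, 28, 32]  # ages of the 4 hits, taken from ageAgg above
print(terms_agg(ages))
# [{'key': 38, 'doc_count': 2}, {'key': 28, 'doc_count': 1}, {'key': 32, 'doc_count': 1}]
print(avg_agg(ages))  # {'value': 34.0}
```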

Example 2: aggregate by age and compute the average balance of the people in each age bucket

  GET bank/_search
  {
    "query": {
      "match_all": {}
    },
    "aggs": {
      "ageAgg": {
        "terms": {
          "field": "age",
          "size": 100
        },
        "aggs": {
          "ageAvg": {
            "avg": {
              "field": "balance"
            }
          }
        }
      }
    },
    "size": 0
  }
  # the sub-aggregation named ageAvg averages balance (not age) within each age bucket

Result:

  {
    "took" : 12,
    "timed_out" : false,
    "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
    },
    "hits" : {
      "total" : {
        "value" : 1000,
        "relation" : "eq"
      },
      "max_score" : null,
      "hits" : [ ]
    },
    "aggregations" : {
      "ageAgg" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          { "key" : 31, "doc_count" : 61, "ageAvg" : { "value" : 28312.918032786885 } },
          { "key" : 39, "doc_count" : 60, "ageAvg" : { "value" : 25269.583333333332 } },
          { "key" : 26, "doc_count" : 59, "ageAvg" : { "value" : 23194.813559322032 } },
          { "key" : 32, "doc_count" : 52, "ageAvg" : { "value" : 23951.346153846152 } },
          { "key" : 35, "doc_count" : 52, "ageAvg" : { "value" : 22136.69230769231 } },
          { "key" : 36, "doc_count" : 52, "ageAvg" : { "value" : 22174.71153846154 } },
          { "key" : 22, "doc_count" : 51, "ageAvg" : { "value" : 24731.07843137255 } },
          { "key" : 28, "doc_count" : 51, "ageAvg" : { "value" : 28273.882352941175 } },
          { "key" : 33, "doc_count" : 50, "ageAvg" : { "value" : 25093.94 } },
          { "key" : 34, "doc_count" : 49, "ageAvg" : { "value" : 26809.95918367347 } },
          { "key" : 30, "doc_count" : 47, "ageAvg" : { "value" : 22841.106382978724 } },
          { "key" : 21, "doc_count" : 46, "ageAvg" : { "value" : 26981.434782608696 } },
          { "key" : 40, "doc_count" : 45, "ageAvg" : { "value" : 27183.17777777778 } },
          { "key" : 20, "doc_count" : 44, "ageAvg" : { "value" : 27741.227272727272 } },
          { "key" : 23, "doc_count" : 42, "ageAvg" : { "value" : 27314.214285714286 } },
          { "key" : 24, "doc_count" : 42, "ageAvg" : { "value" : 28519.04761904762 } },
          { "key" : 25, "doc_count" : 42, "ageAvg" : { "value" : 27445.214285714286 } },
          { "key" : 37, "doc_count" : 42, "ageAvg" : { "value" : 27022.261904761905 } },
          { "key" : 27, "doc_count" : 39, "ageAvg" : { "value" : 21471.871794871793 } },
          { "key" : 38, "doc_count" : 39, "ageAvg" : { "value" : 26187.17948717949 } },
          { "key" : 29, "doc_count" : 35, "ageAvg" : { "value" : 29483.14285714286 } }
        ]
      }
    }
  }
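A sub-aggregation runs once per parent bucket. The Python sketch below groups documents by age and averages balance inside each group, returning buckets shaped like the result above (it keeps the sub-aggregation name ageAvg from the request). The documents are hypothetical.

```python
from collections import defaultdict


def terms_with_avg(docs, group_field, avg_field, size=10):
    """terms bucket on group_field with an avg sub-aggregation on avg_field."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[avg_field])
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [
        {"key": key, "doc_count": len(vals), "ageAvg": {"value": sum(vals) / len(vals)}}
        for key, vals in ordered[:size]
    ]


docs = [  # hypothetical documents
    {"age": 31, "balance": 1000},
    {"age": 31, "balance": 3000},
    {"age": 39, "balance": 500},
]
print(terms_with_avg(docs, "age", "balance", size=100))
# [{'key': 31, 'doc_count': 2, 'ageAvg': {'value': 2000.0}},
#  {'key': 39, 'doc_count': 1, 'ageAvg': {'value': 500.0}}]
```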

Example 3: get the full age distribution and, for each age, the average balance of M, the average balance of F, and the overall average balance of that age

  GET bank/_search
  {
    "query": {
      "match_all": {}
    },
    "aggs": {
      "ageAgg": {
        "terms": {
          "field": "age",
          "size": 100
        },
        "aggs": {
          "genderAgg": {
            "terms": {
              "field": "gender.keyword"
            },
            "aggs": {
              "balanceAvg": {
                "avg": {
                  "field": "balance"
                }
              }
            }
          },
          "ageBalanceAvg": {
            "avg": {
              "field": "balance"
            }
          }
        }
      }
    },
    "size": 0
  }
  # "field": "gender.keyword": gender is a text field and cannot be aggregated on by default, so aggregate on its exact-value .keyword sub-field instead

Result:

  {
    "took" : 17,
    "timed_out" : false,
    "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
    },
    "hits" : {
      "total" : {
        "value" : 1000,
        "relation" : "eq"
      },
      "max_score" : null,
      "hits" : [ ]
    },
    "aggregations" : {
      "ageAgg" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : 31,
            "doc_count" : 61,
            "genderAgg" : {
              "doc_count_error_upper_bound" : 0,
              "sum_other_doc_count" : 0,
              "buckets" : [
                {
                  "key" : "M",
                  "doc_count" : 35,
                  "balanceAvg" : {
                    "value" : 29565.628571428573
                  }
                },
                {
                  "key" : "F",
                  "doc_count" : 26,
                  "balanceAvg" : {
                    "value" : 26626.576923076922
                  }
                }
              ]
            },
            "ageBalanceAvg" : {
              "value" : 28312.918032786885
            }
          },
          {
            "key" : 39,
            "doc_count" : 60,
            "genderAgg" : {
              "doc_count_error_upper_bound" : 0,
              "sum_other_doc_count" : 0,
              "buckets" : [
                {
                  "key" : "F",
                  "doc_count" : 38,
                  "balanceAvg" : {
                    "value" : 26348.684210526317
                  }
                },
                {
                  "key" : "M",
                  "doc_count" : 22,
                  "balanceAvg" : {
                    "value" : 23405.68181818182
                  }
                }
              ]
            },
            "ageBalanceAvg" : {
              "value" : 25269.583333333332
            }
          },
          ...
        ]
      }
    }
  }
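Two levels of terms plus a sibling avg can be sketched in Python the same way. As a consistency check taken from the result above, the overall average of an age bucket equals the doc_count-weighted mean of its M and F averages. The sample documents are hypothetical, and the inner gender buckets are not capped at the terms default size of 10 (there are only two genders here).

```python
from collections import defaultdict


def _order(groups):
    """terms ordering: doc_count desc, then key asc."""
    return sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))


def age_gender_balance(docs, size=100):
    """ageAgg (terms on age) -> genderAgg (terms on gender) -> balanceAvg,
    plus ageBalanceAvg as a sibling of genderAgg."""
    by_age = defaultdict(list)
    for doc in docs:
        by_age[doc["age"]].append(doc)
    buckets = []
    for age, group in _order(by_age)[:size]:
        by_gender = defaultdict(list)
        for doc in group:
            by_gender[doc["gender"]].append(doc["balance"])
        balances = [doc["balance"] for doc in group]
        buckets.append({
            "key": age,
            "doc_count": len(group),
            "genderAgg": {"buckets": [
                {"key": g, "doc_count": len(b), "balanceAvg": {"value": sum(b) / len(b)}}
                for g, b in _order(by_gender)
            ]},
            "ageBalanceAvg": {"value": sum(balances) / len(balances)},
        })
    return buckets


docs = [  # hypothetical documents
    {"age": 31, "gender": "M", "balance": 100},
    {"age": 31, "gender": "M", "balance": 300},
    {"age": 31, "gender": "F", "balance": 600},
    {"age": 39, "gender": "F", "balance": 50},
]
result = age_gender_balance(docs)
print(result[0]["genderAgg"]["buckets"])
# [{'key': 'M', 'doc_count': 2, 'balanceAvg': {'value': 200.0}},
#  {'key': 'F', 'doc_count': 1, 'balanceAvg': {'value': 600.0}}]

# Check with bucket 31 from the real result: 35 M and 26 F documents
weighted = (35 * 29565.628571428573 + 26 * 26626.576923076922) / 61
print(round(weighted, 6))  # 28312.918033
```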