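I. Introduction to Aggregation Analysis

1. What is ES aggregation analysis?

Aggregation analysis is a core database feature: it computes aggregate values over the data set returned by a query, such as the maximum, minimum, sum, or average of a field (or of a computed expression). As both a search engine and a database, ES provides powerful aggregation capabilities as well.
Aggregations that compute indicators such as max, min, sum, and average over a data set are called metric aggregations in ES.
Besides aggregate functions, relational databases can also group the queried data with group by and then compute metrics per group. In ES, the equivalent of group by is called a bucket aggregation (bucketing).
ES also provides matrix aggregations and pipeline aggregations, which were still being refined at the time of writing.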

2. How to write an ES aggregation query

Define aggregations in the request body under the aggregations node, using the following syntax:

  "aggregations" : {
    "<aggregation_name>" : { <!-- name of the aggregation -->
      "<aggregation_type>" : { <!-- type of the aggregation -->
        <aggregation_body> <!-- body: which fields to aggregate and how -->
      }
      [,"meta" : { [<meta_data_body>] } ]? <!-- metadata -->
      [,"aggregations" : { [<sub_aggregation>]+ } ]? <!-- sub-aggregations nested inside this aggregation -->
    }
    [,"<aggregation_name_2>" : { ... } ]* <!-- further aggregations -->
  }

Note:
aggregations can also be abbreviated as aggs.
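
As a rough illustration of the syntax above, the same nesting can be assembled as a plain Python dictionary and serialized to JSON. The helper build_aggs_request and the aggregation names by_group and max_metric are hypothetical labels chosen for this sketch; only aggs, terms, max, field, and size come from the request syntax.

```python
import json

def build_aggs_request(group_field, metric_field):
    """Build a search body with one bucket aggregation and one nested metric.

    "by_group" and "max_metric" are arbitrary names picked for this sketch;
    the response uses the same names as keys under "aggregations".
    """
    return {
        "size": 0,  # return no hits, only aggregation results
        "aggs": {   # short form of "aggregations"
            "by_group": {
                "terms": {"field": group_field},  # aggregation type and body
                "aggs": {                         # sub-aggregation per bucket
                    "max_metric": {"max": {"field": metric_field}}
                },
            }
        },
    }

body = build_aggs_request("age", "age")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted as the body of a POST /<index>/_search request.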

3. Value sources for aggregations

An aggregation can take its values from a field or from the result of a script.

II. Metric Aggregations

1. max min sum avg

Example 1: find the maximum age across all records

  POST /book1/_search?pretty
  {
    "size": 0,
    "aggs": {
      "maxage": {
        "max": {
          "field": "age"
        }
      }
    }
  }

Result 1:

  {
    "took": 4,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 41, "max_score": 0, "hits": [] },
    "aggregations": {
      "maxage": { "value": 54 }
    }
  }

Example 2: add a query condition and find the maximum age among documents whose name contains 'test':

  POST /book1/_search?pretty
  {
    "query": {
      "term": {
        "name": "test"
      }
    },
    "size": 2,
    "sort": [
      {
        "age": {
          "order": "desc"
        }
      }
    ],
    "aggs": {
      "maxage": {
        "max": {
          "field": "age"
        }
      }
    }
  }

Result 2:

  {
    "took": 3,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": {
      "total": 5,
      "max_score": null,
      "hits": [
        {
          "_index": "book1",
          "_type": "english",
          "_id": "6IUkUmUBRzBxBrDgFok2",
          "_score": null,
          "_source": {
            "name": "test goog my money",
            "age": [ 14, 54, 45, 34 ],
            "class": "dsfdsf",
            "addr": "中国"
          },
          "sort": [ 54 ]
        },
        {
          "_index": "book1",
          "_type": "english",
          "_id": "54UiUmUBRzBxBrDgfIl9",
          "_score": null,
          "_source": {
            "name": "test goog my money",
            "age": [ 11, 13, 14 ],
            "class": "dsfdsf",
            "addr": "中国"
          },
          "sort": [ 14 ]
        }
      ]
    },
    "aggregations": {
      "maxage": { "value": 54 }
    }
  }

Example 3: take the values from a script. Compute the average age of all records, and also the average of age plus 10

  POST /book1/_search?pretty
  {
    "size": 0,
    "aggs": {
      "avg_age": {
        "avg": {
          "script": {
            "source": "doc.age.value"
          }
        }
      },
      "avg_age10": {
        "avg": {
          "script": {
            "source": "doc.age.value + 10"
          }
        }
      }
    }
  }

Result 3:

  {
    "took": 3,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 41, "max_score": 0, "hits": [] },
    "aggregations": {
      "avg_age": { "value": 7.585365853658536 },
      "avg_age10": { "value": 17.585365853658537 }
    }
  }
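
The script average of about 7.59 is lower than the field-based average of about 12.39 reported by the stats aggregation in this article. A plausible explanation, sketched below, is that doc.age.value yields only the first (smallest) value of a multi-valued field and 0 for documents without the field in the ES version used here. The per-document list is a reconstruction inferred from the results in this article (only the two multi-valued documents appear verbatim in Result 2), so treat it as an assumption rather than the author's data. Later Elasticsearch versions are stricter about calling .value on a missing field, so this behavior is version-dependent.

```python
# Reconstructed per-document age values: an assumption that is consistent
# with the results in this article, not the author's actual index contents.
docs = (
    [[12]] * 16
    + [[1]] * 11
    + [[11, 13, 14], [14, 54, 45, 34], [13], [16], [21], [33]]
    + [[]] * 8  # 8 documents without an age
)

def first_value(ages):
    """Mimic doc.age.value as assumed above: smallest value, or 0 if missing."""
    return min(ages) if ages else 0

avg_age = sum(first_value(ages) for ages in docs) / len(docs)
avg_age10 = sum(first_value(ages) + 10 for ages in docs) / len(docs)
print(avg_age, avg_age10)
```

Under these assumptions the two averages come out as 311/41 and 721/41, which agree with the values in Result 3.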

Example 4: specify a field and use _value in the script to refer to the field's value

  POST /book1/_search?pretty
  {
    "size": 0,
    "aggs": {
      "sum_age": {
        "sum": {
          "field": "age",
          "script": {
            "source": "_value * 2"
          }
        }
      }
    }
  }

Result 4:

  {
    "took": 4,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 41, "max_score": 0, "hits": [] },
    "aggregations": {
      "sum_age": { "value": 942 }
    }
  }

Example 5: supply a value for documents that lack the field. If missing is not specified, documents without a value for the field are ignored:

  POST /book1/_search?pretty
  {
    "size": 0,
    "aggs": {
      "avg_age": {
        "avg": {
          "field": "age",
          "missing": 15
        }
      }
    }
  }

Result 5:

  {
    "took": 12,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 41, "max_score": 0, "hits": [] },
    "aggregations": {
      "avg_age": { "value": 12.847826086956522 }
    }
  }
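
The results of Examples 4 and 5 can be cross-checked by hand using numbers reported elsewhere in this article: the stats aggregation on age gives a sum of 471 over 38 values, and the Missing aggregation counts 8 documents without an age. The short sketch below redoes that arithmetic; the variable names are illustrative only.

```python
age_sum = 471          # "sum" from the stats result for age
age_values = 38        # "count" from the stats result for age
docs_without_age = 8   # "doc_count" from the Missing aggregation result
missing = 15           # substitute value used in Example 5

# Example 4: every value is doubled by the script "_value * 2" before summing.
doubled_sum = age_sum * 2

# Example 5: each document without an age contributes one value of 15.
avg_with_missing = (age_sum + docs_without_age * missing) / (age_values + docs_without_age)

print(doubled_sum, avg_with_missing)
```

Both figures agree with the reported results: 942 for the doubled sum, and 591/46 for the average with missing set to 15.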

2. Document count (count)

Example 1: count the documents in index book1 (type english) whose age is 12

  POST book1/english/_count
  {
    "query": {
      "match": {
        "age": 12
      }
    }
  }

Result 1:

  {
    "count": 16,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 }
  }

3. Value count: count the values of a field (a multi-valued field contributes one count per value)

Example 1:

  POST /book1/_search?size=0
  {
    "aggs": {
      "age_count": {
        "value_count": {
          "field": "age"
        }
      }
    }
  }

Result 1:

  {
    "took": 1,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 41, "max_score": 0, "hits": [] },
    "aggregations": {
      "age_count": { "value": 38 }
    }
  }

4. cardinality: count of distinct values

Example 1:

  POST /book1/_search?size=0
  {
    "aggs": {
      "age_count": {
        "value_count": {
          "field": "age"
        }
      },
      "age_distinct": {
        "cardinality": {
          "field": "age"
        }
      }
    }
  }

Result 1:

  {
    "took": 16,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 41, "max_score": 0, "hits": [] },
    "aggregations": {
      "age_distinct": { "value": 11 },
      "age_count": { "value": 38 }
    }
  }

Note: there are 38 values in total, and 11 distinct values remain after duplicates are removed. The cardinality aggregation is approximate (it is based on HyperLogLog++), although it is exact for a small set like this one.
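
The difference between the two counts can be sketched in a few lines of Python. The list of values below is reconstructed from the terms aggregation buckets shown later in this article plus the maximum of 54; it is an assumption, not the author's raw data.

```python
# Flattened age values, reconstructed from the bucket counts in this article.
values = [12] * 16 + [1] * 11 + [13] * 2 + [14] * 2 + [11, 16, 21, 33, 34, 45, 54]

value_count = len(values)       # what value_count reports: every value counts
cardinality = len(set(values))  # what cardinality estimates: distinct values

print(value_count, cardinality)
```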

5. stats: returns the five values count, max, min, avg, and sum

Example 1:

  POST /book1/_search?size=0
  {
    "aggs": {
      "age_count": {
        "stats": {
          "field": "age"
        }
      }
    }
  }

Result 1:

  {
    "took": 12,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 41, "max_score": 0, "hits": [] },
    "aggregations": {
      "age_count": {
        "count": 38,
        "min": 1,
        "max": 54,
        "avg": 12.394736842105264,
        "sum": 471
      }
    }
  }

6. Extended stats

Extended statistics. Compared with stats, it returns four more results: the sum of squares, the variance, the standard deviation, and the interval of the mean plus/minus two standard deviations.

Example 1:

  POST /book1/_search?size=0
  {
    "aggs": {
      "age_stats": {
        "extended_stats": {
          "field": "age"
        }
      }
    }
  }

Result 1:

  {
    "took": 8,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 41, "max_score": 0, "hits": [] },
    "aggregations": {
      "age_stats": {
        "count": 38,
        "min": 1,
        "max": 54,
        "avg": 12.394736842105264,
        "sum": 471,
        "sum_of_squares": 11049,
        "variance": 137.13365650969527,
        "std_deviation": 11.710408041981085,
        "std_deviation_bounds": {
          "upper": 35.81555292606743,
          "lower": -11.026079241856905
        }
      }
    }
  }
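
The extra fields can be derived from count, sum, and sum_of_squares alone. The sketch below recomputes them from the three numbers in the result above, assuming the population variance formula (mean of squares minus squared mean), which is what the reported figures match.

```python
import math

# Inputs copied from the extended_stats result above.
count, total, sum_of_squares = 38, 471, 11049

avg = total / count
variance = sum_of_squares / count - avg ** 2  # population variance
std_deviation = math.sqrt(variance)
upper = avg + 2 * std_deviation               # mean plus two standard deviations
lower = avg - 2 * std_deviation               # mean minus two standard deviations

print(avg, variance, std_deviation, upper, lower)
```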

7. Percentiles: the values found at given percentiles

The values of the specified field (or script) are taken in ascending order while accumulating the share of documents at or below each value (as a percentage of all matching documents); the aggregation returns the value at each requested percentile. By default it returns the values at the [ 1, 5, 25, 50, 75, 95, 99 ] percentiles. The middle entry of the result below can be read as: 50% of the documents have age <= 12, or conversely, documents with age <= 12 make up 50% of all matching documents.

Example 1:

  POST /book1/_search?size=0
  {
    "aggs": {
      "age_percentiles": {
        "percentiles": {
          "field": "age"
        }
      }
    }
  }

Result 1:

  {
    "took": 16,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 41, "max_score": 0, "hits": [] },
    "aggregations": {
      "age_percentiles": {
        "values": {
          "1.0": 1,
          "5.0": 1,
          "25.0": 1,
          "50.0": 12,
          "75.0": 13,
          "95.0": 40.600000000000016,
          "99.0": 54
        }
      }
    }
  }

Example 2: specify the percentiles (what are the values at the 50th, 96th, and 99th percentiles?)

  POST /book1/_search?size=0
  {
    "aggs": {
      "age_percentiles": {
        "percentiles": {
          "field": "age",
          "percents": [50, 96, 99]
        }
      }
    }
  }

Result 2:

  {
    "took": 6,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 41, "max_score": 0, "hits": [] },
    "aggregations": {
      "age_percentiles": {
        "values": {
          "50.0": 12,
          "96.0": 44.779999999999966,
          "99.0": 54
        }
      }
    }
  }

Note: 50% of the values are <= 12, 96% of the values are <= 44.78 (approximately), and 99% of the values are <= 54.

8. Percentile ranks: the percentage of documents whose value is less than or equal to a given value

Example 1: compute the percentage of documents whose age is at most 12 and at most 35 (the inverse of item 7)

  POST /book1/_search?size=0
  {
    "aggs": {
      "aggs_perc_rank": {
        "percentile_ranks": {
          "field": "age",
          "values": [12, 35]
        }
      }
    }
  }

Result 1:

  {
    "took": 8,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 41, "max_score": 0, "hits": [] },
    "aggregations": {
      "aggs_perc_rank": {
        "values": {
          "12.0": 71.05263157894737,
          "35.0": 92.76315789473685
        }
      }
    }
  }

Result note: about 71% of the documents have an age of at most 12, and about 92.8% have an age of at most 35.
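
To make the relationship between percentiles and percentile ranks concrete, here is a minimal exact version of both in Python. It uses the nearest-rank definition and a value list reconstructed from the terms buckets in this article, both of which are assumptions. Elasticsearch computes these aggregations with an approximation algorithm (TDigest by default), so on a small sample its figures can differ: the exact ranks here come out near 73.7 and 94.7 rather than the reported 71.05 and 92.76.

```python
import math

# Flattened age values reconstructed from the terms buckets in this article
# (an assumption, not the author's raw data).
values = sorted([12] * 16 + [1] * 11 + [13] * 2 + [14] * 2 + [11, 16, 21, 33, 34, 45, 54])

def percentile_rank(sorted_values, threshold):
    """Exact percentage of values that are <= threshold."""
    at_or_below = sum(1 for v in sorted_values if v <= threshold)
    return 100.0 * at_or_below / len(sorted_values)

def percentile(sorted_values, percent):
    """Nearest-rank percentile: smallest value with percent% of values at or below it."""
    rank = max(1, math.ceil(percent / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]

print(percentile(values, 50), percentile(values, 99))
print(percentile_rank(values, 12), percentile_rank(values, 35))
```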

9. Geo Bounds aggregation: compute the bounding box of the geo points in the document set

See the official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-geobounds-aggregation.html

10. Geo Centroid aggregation: compute the centroid of the geo points

See the official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-geocentroid-aggregation.html

III. Bucket Aggregations

1. Terms Aggregation: group by the values (terms) of a field

Example 1:

  POST /book1/_search?size=0
  {
    "aggs": {
      "age_terms": {
        "terms": {
          "field": "age"
        }
      }
    }
  }

Note: this is equivalent to group by age.
Result 1:

  {
    "took": 4,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 41, "max_score": 0, "hits": [] },
    "aggregations": {
      "age_terms": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 1,
        "buckets": [
          { "key": 12, "doc_count": 16 },
          { "key": 1, "doc_count": 11 },
          { "key": 13, "doc_count": 2 },
          { "key": 14, "doc_count": 2 },
          { "key": 11, "doc_count": 1 },
          { "key": 16, "doc_count": 1 },
          { "key": 21, "doc_count": 1 },
          { "key": 33, "doc_count": 1 },
          { "key": 34, "doc_count": 1 },
          { "key": 45, "doc_count": 1 }
        ]
      }
    }
  }

Notes on the result:
"doc_count_error_upper_bound": 0 is the upper bound of the error on the document counts.
"sum_other_doc_count": 1 is the sum of the document counts of all buckets that were not returned.
By default, the top 10 buckets are returned, ordered by document count from high to low.
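
The grouping, the top-N cut, and sum_other_doc_count can be imitated locally with a Counter. This is a single-process sketch with no sharding; the value list is reconstructed from the buckets above plus the maximum of 54, and breaking ties by ascending key is an assumption that happens to match the order in the result.

```python
from collections import Counter

# Flattened age values reconstructed from the buckets in this article.
values = [12] * 16 + [1] * 11 + [13] * 2 + [14] * 2 + [11, 16, 21, 33, 34, 45, 54]

def terms_agg(values, size=10):
    """Return (top buckets, sum_other_doc_count) ordered by count descending."""
    counts = Counter(values)
    # Highest count first; ties broken by ascending key (assumed).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    buckets = ordered[:size]
    sum_other = sum(count for _, count in ordered[size:])
    return buckets, sum_other

buckets, sum_other = terms_agg(values)
print(buckets, sum_other)
```

Calling terms_agg(values, size=5) or terms_agg(values, size=3) shows how a smaller size moves more documents into sum_other_doc_count.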

Example 2: size specifies how many buckets to return

  POST /book1/_search?size=0
  {
    "aggs": {
      "age_terms": {
        "terms": {
          "field": "age",
          "size": 5
        }
      }
    }
  }

Result 2:

  {
    "took": 4,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 41, "max_score": 0, "hits": [] },
    "aggregations": {
      "age_terms": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 6,
        "buckets": [
          { "key": 12, "doc_count": 16 },
          { "key": 1, "doc_count": 11 },
          { "key": 13, "doc_count": 2 },
          { "key": 14, "doc_count": 2 },
          { "key": 11, "doc_count": 1 }
        ]
      }
    }
  }

Example 3: show the error bound on each bucket

  POST /book1/_search?size=0
  {
    "aggs": {
      "age_terms": {
        "terms": {
          "field": "age",
          "size": 5,
          "show_term_doc_count_error": true
        }
      }
    }
  }

Result 3:

  {
    "took": 5,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 41, "max_score": 0, "hits": [] },
    "aggregations": {
      "age_terms": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 6,
        "buckets": [
          { "key": 12, "doc_count": 16, "doc_count_error_upper_bound": 0 },
          { "key": 1, "doc_count": 11, "doc_count_error_upper_bound": 0 },
          { "key": 13, "doc_count": 2, "doc_count_error_upper_bound": 0 },
          { "key": 14, "doc_count": 2, "doc_count_error_upper_bound": 0 },
          { "key": 11, "doc_count": 1, "doc_count_error_upper_bound": 0 }
        ]
      }
    }
  }

Example 4: shard_size specifies how many buckets each shard returns

  POST /book1/_search?size=0
  {
    "aggs": {
      "age_terms": {
        "terms": {
          "field": "age",
          "size": 3,
          "shard_size": 20
        }
      }
    }
  }

Result 4:

  {
    "took": 3,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 41, "max_score": 0, "hits": [] },
    "aggregations": {
      "age_terms": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 9,
        "buckets": [
          { "key": 12, "doc_count": 16 },
          { "key": 1, "doc_count": 11 },
          { "key": 13, "doc_count": 2 }
        ]
      }
    }
  }
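
Why shard_size matters can be seen in a small simulation: each shard returns only its own top shard_size terms, and the coordinating node merges those partial lists. The two shards and their counts below are made up for illustration, and merged_terms is a hypothetical helper, not an Elasticsearch API.

```python
from collections import Counter

def merged_terms(shards, size, shard_size):
    """Merge per-shard top-`shard_size` term counts and return the top `size`."""
    merged = Counter()
    for shard in shards:
        # Each shard reports only its own highest-count terms.
        top = sorted(shard.items(), key=lambda kv: (-kv[1], kv[0]))[:shard_size]
        merged.update(dict(top))  # add the reported counts to the running totals
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:size]

# Hypothetical term counts on two shards; true totals are a=6, b=8, c=8.
shards = [Counter(a=5, b=4, c=3), Counter(c=5, b=4, a=1)]

too_small = merged_terms(shards, size=2, shard_size=1)
large_enough = merged_terms(shards, size=2, shard_size=3)
print(too_small, large_enough)
```

With shard_size=1 the term b never reaches the coordinating node even though it has one of the highest true totals, which is the kind of inaccuracy that doc_count_error_upper_bound is meant to bound. A larger shard_size trades more per-shard work for accuracy.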

order specifies how the buckets are sorted

Example 5: sort by the bucket key "_key"

  POST /book1/_search?size=0
  {
    "aggs": {
      "age_terms": {
        "terms": {
          "field": "age",
          "size": 3,
          "order": { "_key": "desc" }
        }
      }
    }
  }

Result 5:

  {
    "took": 6,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 41, "max_score": 0, "hits": [] },
    "aggregations": {
      "age_terms": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 35,
        "buckets": [
          { "key": 54, "doc_count": 1 },
          { "key": 45, "doc_count": 1 },
          { "key": 34, "doc_count": 1 }
        ]
      }
    }
  }

Example 6: sort by the document count "_count"

  POST /book1/_search?size=0
  {
    "aggs": {
      "age_terms": {
        "terms": {
          "field": "age",
          "size": 3,
          "order": { "_count": "desc" }
        }
      }
    }
  }

Result 6:

  {
    "took": 91,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 41, "max_score": 0, "hits": [] },
    "aggregations": {
      "age_terms": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 9,
        "buckets": [
          { "key": 12, "doc_count": 16 },
          { "key": 1, "doc_count": 11 },
          { "key": 13, "doc_count": 2 }
        ]
      }
    }
  }

Example 7: sort by a metric computed for each bucket

  POST /book1/_search?size=0
  {
    "aggs": {
      "age_terms": {
        "terms": {
          "field": "age",
          "order": { "max_age": "desc" }
        },
        "aggs": {
          "max_age": {
            "max": {
              "field": "age"
            }
          },
          "min_age": {
            "min": {
              "field": "age"
            }
          }
        }
      }
    }
  }

Note: the documents are first grouped by age, then the maximum and minimum are computed for each group, and finally the groups are sorted by the maximum in descending order.
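
The three steps in the note (group, compute per-group metrics, sort by one metric) can be sketched locally as follows. The documents, the grouping field class, and the helper name terms_ordered_by_max are hypothetical and chosen only to keep the example small; the sketch groups by a different field than the metric so that the ordering is easy to follow.

```python
from collections import defaultdict

# Hypothetical documents, not taken from the book1 index.
docs = [
    {"class": "a", "age": 12},
    {"class": "a", "age": 30},
    {"class": "b", "age": 54},
    {"class": "c", "age": 1},
]

def terms_ordered_by_max(docs, group_field, metric_field):
    """Group docs, compute max/min per group, sort groups by max descending."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[group_field]].append(doc[metric_field])
    buckets = [
        {"key": key, "doc_count": len(vals), "max": max(vals), "min": min(vals)}
        for key, vals in groups.items()
    ]
    return sorted(buckets, key=lambda bucket: bucket["max"], reverse=True)

ordered = terms_ordered_by_max(docs, "class", "age")
print([bucket["key"] for bucket in ordered])
```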

Example 8: filter the buckets by matching term values against regular expressions

  POST book1/_search?size=0
  {
    "aggs": {
      "tags": {
        "terms": {
          "field": "name",
          "include": "里*",
          "exclude": "test*"
        }
      }
    }
  }

Note: include and exclude are regular expressions here, not wildcards, so "test*" means "tes" followed by any number of "t" characters. A prefix match would be written as "test.*".
Result 8:

  {
    "took": 22,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 41, "max_score": 0, "hits": [] },
    "aggregations": {
      "tags": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          { "key": "里", "doc_count": 13 }
        ]
      }
    }
  }

Example 9: filter the buckets with an explicit list of values

  POST book1/_search?size=0
  {
    "aggs": {
      "Chinese": {
        "terms": {
          "field": "name",
          "include": ["里", "国"]
        }
      },
      "Test": {
        "terms": {
          "field": "name",
          "exclude": ["test", "the"]
        }
      }
    }
  }

Result 9:

  {
    "took": 23,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 41, "max_score": 0, "hits": [] },
    "aggregations": {
      "Test": {
        "doc_count_error_upper_bound": 6,
        "sum_other_doc_count": 559,
        "buckets": [
          { "key": "里", "doc_count": 12 },
          { "key": "否", "doc_count": 11 },
          { "key": "a", "doc_count": 7 },
          { "key": "default", "doc_count": 7 },
          { "key": "document", "doc_count": 7 },
          { "key": "for", "doc_count": 7 },
          { "key": "absolute", "doc_count": 6 },
          { "key": "account", "doc_count": 6 },
          { "key": "accurate", "doc_count": 6 },
          { "key": "documents", "doc_count": 6 }
        ]
      },
      "Chinese": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          { "key": "国", "doc_count": 4 }
        ]
      }
    }
  }

Example 10: group by a value computed by a script

  POST book1/_search?size=0
  {
    "aggs": {
      "name": {
        "terms": {
          "script": {
            "source": "doc['age'].value + doc.age.value",
            "lang": "painless"
          }
        }
      }
    }
  }

Note: a script can read the field value as doc['age'].value or as doc.age.value.
Result 10:

  {
    "took": 18,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 41, "max_score": 0, "hits": [] },
    "aggregations": {
      "name": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          { "key": "24", "doc_count": 16 },
          { "key": "2", "doc_count": 11 },
          { "key": "0", "doc_count": 8 },
          { "key": "22", "doc_count": 1 },
          { "key": "26", "doc_count": 1 },
          { "key": "28", "doc_count": 1 },
          { "key": "32", "doc_count": 1 },
          { "key": "42", "doc_count": 1 },
          { "key": "66", "doc_count": 1 }
        ]
      }
    }
  }

2. Filter Aggregation: aggregate over the documents that match a filter query

Example 1: from the documents hit by the query, select those that match the filter and aggregate over them, that is, filter first and then aggregate. (Contrast this with Example 9 above, which filters the buckets: aggregate first, then filter.)

  POST book1/_search?size=0
  {
    "aggs": {
      "age_terms": {
        "filter": {
          "match": { "name": "test" }
        },
        "aggs": {
          "avg_age": {
            "avg": { "field": "age" }
          }
        }
      }
    }
  }

Result 1:

  {
    "took": 152,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 41, "max_score": 0, "hits": [] },
    "aggregations": {
      "age_terms": {
        "doc_count": 5,
        "avg_age": { "value": 19.9 }
      }
    }
  }

3. Filters Aggregation: aggregate over several filter groups

Example 1: count the documents that contain 'test' and the documents that contain '里', separately

  POST book1/_search?size=0
  {
    "aggs": {
      "age_terms": {
        "filters": {
          "filters": {
            "test": {
              "match": { "name": "test" }
            },
            "china": {
              "match": { "name": "里" }
            }
          }
        }
      }
    }
  }

Result 1:

  {
    "took": 3,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 41, "max_score": 0, "hits": [] },
    "aggregations": {
      "age_terms": {
        "buckets": {
          "china": { "doc_count": 13 },
          "test": { "doc_count": 5 }
        }
      }
    }
  }

For instance, count the error and warning entries in a log index to drive log alerting:

  GET logs/_search
  {
    "size": 0,
    "aggs": {
      "messages": {
        "filters": {
          "filters": {
            "errors": {
              "match": {
                "body": "error"
              }
            },
            "warnings": {
              "match": {
                "body": "warning"
              }
            }
          }
        }
      }
    }
  }

Example 2: specify a key for the bucket that collects all other documents

  POST book1/_search?size=0
  {
    "aggs": {
      "age_terms": {
        "filters": {
          "other_bucket_key": "other_messages",
          "filters": {
            "test": {
              "match": { "name": "test" }
            },
            "china": {
              "match": { "name": "里" }
            }
          }
        }
      }
    }
  }

Result 2:

  {
    "took": 9,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 41, "max_score": 0, "hits": [] },
    "aggregations": {
      "age_terms": {
        "buckets": {
          "china": { "doc_count": 13 },
          "test": { "doc_count": 5 },
          "other_messages": { "doc_count": 23 }
        }
      }
    }
  }
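
The bucketing logic of the filters aggregation, including the extra bucket for unmatched documents, can be approximated in plain Python. The documents below are hypothetical, and matching on whitespace-separated tokens is a deliberate simplification of what a match query does after text analysis.

```python
def filters_agg(docs, filters, other_bucket_key=None):
    """Count docs per named filter; docs matching no filter go to the other bucket."""
    buckets = {name: 0 for name in filters}
    other = 0
    for doc in docs:
        matched = False
        for name, predicate in filters.items():
            if predicate(doc):
                buckets[name] += 1  # a doc may fall into several buckets
                matched = True
        if not matched:
            other += 1
    if other_bucket_key is not None:
        buckets[other_bucket_key] = other
    return buckets

# Hypothetical documents; token matching stands in for the match query.
docs = [
    {"name": "test goog my money"},
    {"name": "里 test"},
    {"name": "里"},
    {"name": "hello"},
]
filters = {
    "test": lambda d: "test" in d["name"].split(),
    "china": lambda d: "里" in d["name"].split(),
}

result = filters_agg(docs, filters, other_bucket_key="other_messages")
print(result)
```

Note that the second document is counted in both named buckets, so the bucket counts need not add up to the number of documents.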

4. Range Aggregation: group by value ranges

Example 1:

  POST book1/_search?size=0
  {
    "aggs": {
      "age_range": {
        "range": {
          "field": "age",
          "keyed": true,
          "ranges": [
            {
              "to": 20,
              "key": "TW"
            },
            {
              "from": 25,
              "to": 40,
              "key": "TH"
            },
            {
              "from": 60,
              "key": "SIX"
            }
          ]
        }
      }
    }
  }

Result 1:

  {
    "took": 3,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 41, "max_score": 0, "hits": [] },
    "aggregations": {
      "age_range": {
        "buckets": {
          "TW": {
            "to": 20,
            "doc_count": 31
          },
          "TH": {
            "from": 25,
            "to": 40,
            "doc_count": 2
          },
          "SIX": {
            "from": 60,
            "doc_count": 0
          }
        }
      }
    }
  }
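
In a range aggregation each range includes its from value and excludes its to value, and a document is counted once in every range that any of its values falls into. The sketch below applies those rules to a per-document age list reconstructed from the results in this article (an assumption; only the two multi-valued documents appear verbatim in the source).

```python
# Reconstructed per-document age values (assumed, see lead-in).
docs = (
    [[12]] * 16
    + [[1]] * 11
    + [[11, 13, 14], [14, 54, 45, 34], [13], [16], [21], [33]]
    + [[]] * 8  # documents without an age never match a range
)

def range_agg(docs, ranges):
    """Count docs per (key, from, to) range; from is inclusive, to is exclusive."""
    result = {}
    for key, lower, upper in ranges:
        result[key] = sum(
            1
            for ages in docs
            if any(
                (lower is None or v >= lower) and (upper is None or v < upper)
                for v in ages
            )
        )
    return result

ranges = [("TW", None, 20), ("TH", 25, 40), ("SIX", 60, None)]
counts = range_agg(docs, ranges)
print(counts)
```

Under these assumptions the counts match the result above: the flattened data has 33 values below 20, but they belong to only 31 documents.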

5. Date Range Aggregation: group by date ranges

Example 1:

  POST /bank/_search?size=0
  {
    "aggs": {
      "range": {
        "date_range": {
          "field": "date",
          "format": "MM-yyyy",
          "ranges": [
            {
              "to": "now-10M/M"
            },
            {
              "from": "now-10M/M"
            }
          ]
        }
      }
    }
  }

Result 1:

  {
    "took": 115,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 1000, "max_score": 0, "hits": [] },
    "aggregations": {
      "range": {
        "buckets": [
          {
            "key": "*-2017-08-01T00:00:00.000Z",
            "to": 1501545600000,
            "to_as_string": "2017-08-01T00:00:00.000Z",
            "doc_count": 0
          },
          {
            "key": "2017-08-01T00:00:00.000Z-*",
            "from": 1501545600000,
            "from_as_string": "2017-08-01T00:00:00.000Z",
            "doc_count": 0
          }
        ]
      }
    }
  }
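
The date math expression now-10M/M means "the current time minus 10 months, rounded down to the start of that month". The helper below, months_ago_floor, is a hypothetical re-implementation of just that one expression. The query time of June 2018 is an assumption, chosen because it is consistent with the 2017-08-01 boundary in the result above.

```python
from datetime import datetime, timezone

def months_ago_floor(now, months):
    """Evaluate `now-<months>M/M`: subtract months, then floor to the month start."""
    # Work with a zero-based month index so that year rollover is plain arithmetic.
    index = now.year * 12 + (now.month - 1) - months
    return datetime(index // 12, index % 12 + 1, 1, tzinfo=timezone.utc)

# Assumed query time: some day in June 2018 (UTC).
boundary = months_ago_floor(datetime(2018, 6, 15, tzinfo=timezone.utc), 10)
epoch_millis = int(boundary.timestamp() * 1000)
print(boundary.isoformat(), epoch_millis)
```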

6. Date Histogram Aggregation: time-based histogram (bar chart) aggregation

This aggregates by day, month, year, and so on. Buckets can use the intervals year (1y), quarter (1q), month (1M), week (1w), day (1d), hour (1h), minute (1m), second (1s), or a custom time interval. Note that Elasticsearch 7.2 and later deprecate the interval parameter in favor of calendar_interval and fixed_interval.

Example 1:

  POST /bank/_search?size=0
  {
    "aggs": {
      "sales_over_time": {
        "date_histogram": {
          "field": "date",
          "interval": "month"
        }
      }
    }
  }

Result 1:

  {
    "took": 9,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 1000, "max_score": 0, "hits": [] },
    "aggregations": {
      "sales_over_time": {
        "buckets": []
      }
    }
  }
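
Since the result above contains no buckets, a local sketch may help show what a monthly histogram produces on non-empty data. The dates and the helper monthly_histogram are hypothetical. Each date is mapped to the first day of its month, which serves as the bucket key. The sketch leaves out months with no documents, which Elasticsearch may include as empty buckets depending on min_doc_count.

```python
from collections import Counter
from datetime import date

def monthly_histogram(dates):
    """Count dates per calendar month, keyed by the first day of the month."""
    counts = Counter(d.replace(day=1) for d in dates)
    return sorted(counts.items())  # chronological order

# Hypothetical document dates.
dates = [date(2018, 1, 5), date(2018, 1, 20), date(2018, 3, 1)]

histogram = monthly_histogram(dates)
print(histogram)
```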

7. Missing Aggregation: bucket of documents with a missing value

Example 1: count the documents that have no value for the field

  POST /book1/_search?size=0
  {
    "aggs": {
      "account_without_age": {
        "missing": { "field": "age" }
      }
    }
  }

Result 1:

  {
    "took": 10,
    "timed_out": false,
    "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
    "hits": { "total": 41, "max_score": 0, "hits": [] },
    "aggregations": {
      "account_without_age": {
        "doc_count": 8
      }
    }
  }

8. Geo Distance Aggregation: group by distance from a geo point

See the official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-geodistance-aggregation.html