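🥖 ElasticSearch-10: Aggregation Queries, Metric Aggregations

The previous article covered bucket aggregations, one of the three kinds of aggregation that ElasticSearch provides. This article covers metric aggregations.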

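💛 How to understand metric aggregations

For bucket aggregations I drew a diagram to help you build a mental framework. How should metric aggregations be understood?

If you go straight to the official documentation, you will find a dozen or so of them:

[Figure 1: the list of metric aggregations in the official documentation]

I think they are best understood from two angles:

  • By category: metric aggregations fall into two groups, single-value analysis and multi-value analysis
  • By function: some analysis APIs are designed for specific use cases, such as geolocation and percentiles

Combining the two angles gives a rough mind map: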

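  • Single-value analysis: outputs exactly one result

    • Standard stat types

      • avg: average
      • max: maximum
      • min: minimum
      • sum: sum
      • value_count: number of values
    • Other types

      • cardinality: cardinality (distinct count)
      • weighted_avg: weighted average
      • median_absolute_deviation: median absolute deviation (a measure of variability, not the median itself)
  • Multi-value analysis: everything that outputs more than one value

    • stats types

      • stats: count, min, max, avg and sum in one response
      • matrix_stats: statistics over a matrix of several numeric fields
      • extended_stats: stats plus sum of squares, variance and standard deviation
      • string_stats: statistics over string values
    • Percentile types

      • percentiles: percentiles
      • percentile_ranks: percentile ranks
    • Geo types

      • geo_bounds: Geo bounds
      • geo_centroid: Geo-centroid
      • geo_line: Geo-Line
    • Top types

      • top_hits: top hits within each bucket
      • top_metrics: metrics taken from the document with the largest or smallest sort value

With the list above (I won't draw a diagram this time), the framework we build rests on category and function rather than on individual items (such as avg, percentiles…). These are different levels of understanding: individual items are fragments, while categories and functions are the framework you need to build.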

💛 Single-value analysis: standard stat types

1️⃣ avg: average

Compute the average of a numeric field (here, age).

    GET /ralph_index/_search?size=1
    {
      "aggs": {
        "avg_age": {
          "avg": {
            "field": "age"
          }
        }
      }
    }

Response

[Figure 2: screenshot of the avg response]

2️⃣ max: maximum

Compute the maximum of a numeric field (here, age).

    GET /ralph_index/_search?size=0
    {
      "aggs": {
        "max_age": {
          "max": {
            "field": "age"
          }
        }
      }
    }

Response

    {
      ...
      "aggregations" : {
        "max_age" : {
          "value" : 40.0
        }
      }
    }

3️⃣ min: minimum

Compute the minimum of a numeric field (here, age).

    GET /ralph_index/_search?size=0
    {
      "aggs": {
        "min_age": {
          "min": {
            "field": "age"
          }
        }
      }
    }

Response

    {
      ...
      "aggregations" : {
        "min_age" : {
          "value" : 20.0
        }
      }
    }

4️⃣ sum: sum

Compute the total price of all hats sold.

    POST /sales/_search?size=0
    {
      "query": {
        "constant_score": {
          "filter": {
            "match": { "type": "hat" }
          }
        }
      },
      "aggs": {
        "hat_prices": { "sum": { "field": "price" } }
      }
    }

Response

    {
      ...
      "aggregations": {
        "hat_prices": {
          "value": 450.0
        }
      }
    }

5️⃣ value_count: number of values

Count how many values the type field has across all sales documents.

    POST /sales/_search?size=0
    {
      "aggs" : {
        "types_count" : { "value_count" : { "field" : "type" } }
      }
    }

Response

    {
      ...
      "aggregations": {
        "types_count": {
          "value": 7
        }
      }
    }

💛 Single-value analysis: other types

1️⃣ weighted_avg: weighted average

    POST /exams/_search
    {
      "size": 0,
      "aggs": {
        "weighted_grade": {
          "weighted_avg": {
            "value": {
              "field": "grade"
            },
            "weight": {
              "field": "weight"
            }
          }
        }
      }
    }

Response

    {
      ...
      "aggregations": {
        "weighted_grade": {
          "value": 70.0
        }
      }
    }
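A weighted average is the sum of value × weight divided by the sum of the weights. The Python sketch below illustrates that formula. The two documents are hypothetical (the article does not show the contents of the exams index) and were chosen only so that the result equals the 70.0 in the response.

```python
# Hypothetical exam documents; the real contents of the `exams` index are not shown.
docs = [
    {"grade": 100, "weight": 1},
    {"grade": 60, "weight": 3},
]


def weighted_avg(docs, value_field, weight_field):
    """Return sum(value * weight) / sum(weight) over the documents."""
    total_weight = sum(d[weight_field] for d in docs)
    weighted_sum = sum(d[value_field] * d[weight_field] for d in docs)
    return weighted_sum / total_weight


result = weighted_avg(docs, "grade", "weight")
print(result)
```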

2️⃣ cardinality: cardinality (distinct count)

    POST /sales/_search?size=0
    {
      "aggs": {
        "type_count": {
          "cardinality": {
            "field": "type"
          }
        }
      }
    }

Response

    {
      ...
      "aggregations": {
        "type_count": {
          "value": 3
        }
      }
    }
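The difference between value_count (7 above) and cardinality (3 here) is simply "count everything" versus "count distinct values". The Python sketch below shows this on a hypothetical list of type values chosen to match those two numbers. Note that Elasticsearch computes cardinality approximately (with HyperLogLog++), while the set used here is exact.

```python
# Hypothetical values of the `type` field, chosen to match the counts in the responses.
types = ["hat", "hat", "hat", "t-shirt", "t-shirt", "t-shirt", "bag"]

value_count = len(types)       # every value counts, duplicates included
cardinality = len(set(types))  # only distinct values count

print(value_count, cardinality)
```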

3️⃣ median_absolute_deviation: median absolute deviation

This is a measure of variability, not the median itself. The example below returns it next to the average rating.

    GET reviews/_search
    {
      "size": 0,
      "aggs": {
        "review_average": {
          "avg": {
            "field": "rating"
          }
        },
        "review_variability": {
          "median_absolute_deviation": {
            "field": "rating"
          }
        }
      }
    }

Response

    {
      ...
      "aggregations": {
        "review_average": {
          "value": 3.0
        },
        "review_variability": {
          "value": 2.0
        }
      }
    }
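The median absolute deviation is the median of the absolute distances between each value and the median of all values. The Python sketch below computes it exactly. The ratings are hypothetical and were picked so that the average is 3.0 and the deviation is 2.0, as in the response. Elasticsearch approximates this value on large data sets, so real results may differ slightly.

```python
from statistics import mean, median

# Hypothetical ratings; the contents of the `reviews` index are not shown.
ratings = [1, 1, 1, 5, 5, 5]


def median_absolute_deviation(values):
    """Median of |x - median(values)| over all x."""
    center = median(values)
    return median([abs(v - center) for v in values])


review_average = mean(ratings)
review_variability = median_absolute_deviation(ratings)
print(review_average, review_variability)
```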

💛 Multi-value analysis: stats types

1️⃣ stats: count, min, max, avg and sum

    POST /exams/_search?size=0
    {
      "aggs": {
        "grades_stats": { "stats": { "field": "grade" } }
      }
    }

Response

    {
      ...
      "aggregations": {
        "grades_stats": {
          "count": 2,
          "min": 50.0,
          "max": 100.0,
          "avg": 75.0,
          "sum": 150.0
        }
      }
    }

2️⃣ matrix_stats: statistics over a matrix of fields

The following example uses matrix_stats to describe the relationship between income and poverty.

    GET /_search
    {
      "aggs": {
        "statistics": {
          "matrix_stats": {
            "fields": [ "poverty", "income" ]
          }
        }
      }
    }

Response

    {
      ...
      "aggregations": {
        "statistics": {
          "doc_count": 50,
          "fields": [ {
            "name": "income",
            "count": 50,
            "mean": 51985.1,
            "variance": 7.383377037755103E7,
            "skewness": 0.5595114003506483,
            "kurtosis": 2.5692365287787124,
            "covariance": {
              "income": 7.383377037755103E7,
              "poverty": -21093.65836734694
            },
            "correlation": {
              "income": 1.0,
              "poverty": -0.8352655256272504
            }
          }, {
            "name": "poverty",
            "count": 50,
            "mean": 12.732000000000001,
            "variance": 8.637730612244896,
            "skewness": 0.4516049811903419,
            "kurtosis": 2.8615929677997767,
            "covariance": {
              "income": -21093.65836734694,
              "poverty": 8.637730612244896
            },
            "correlation": {
              "income": -0.8352655256272504,
              "poverty": 1.0
            }
          } ]
        }
      }
    }
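The fields in this response are related to each other. As a quick check, the Python sketch below recomputes the income/poverty correlation from the variances and the covariance printed above, assuming the usual Pearson definition (covariance divided by the product of the standard deviations). The variable names are my own.

```python
import math

# Values copied from the matrix_stats response above.
var_income = 7.383377037755103e7
var_poverty = 8.637730612244896
cov_income_poverty = -21093.65836734694

# Pearson correlation: cov(x, y) / (std(x) * std(y))
correlation = cov_income_poverty / math.sqrt(var_income * var_poverty)
print(correlation)
```

The result agrees with the correlation value in the response to several decimal places, and its negative sign says that higher income goes with lower poverty in this data set.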

3️⃣ extended_stats: extended statistics

Computes statistics over the numeric values extracted from the aggregated documents.

    GET /exams/_search
    {
      "size": 0,
      "aggs": {
        "grades_stats": { "extended_stats": { "field": "grade" } }
      }
    }

The aggregation above computes grade statistics over all documents. The aggregation type is extended_stats, and the field setting names the numeric field on which the statistics are computed.

    {
      ...
      "aggregations": {
        "grades_stats": {
          "count": 2,
          "min": 50.0,
          "max": 100.0,
          "avg": 75.0,
          "sum": 150.0,
          "sum_of_squares": 12500.0,
          "variance": 625.0,
          "variance_population": 625.0,
          "variance_sampling": 1250.0,
          "std_deviation": 25.0,
          "std_deviation_population": 25.0,
          "std_deviation_sampling": 35.35533905932738,
          "std_deviation_bounds": {
            "upper": 125.0,
            "lower": 25.0,
            "upper_population": 125.0,
            "lower_population": 25.0,
            "upper_sampling": 145.71067811865476,
            "lower_sampling": 4.289321881345245
          }
        }
      }
    }
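To see where these numbers come from, the Python sketch below recomputes them for the two grades implied by the response (count 2, min 50, max 100). It is only an illustration of the formulas, not how Elasticsearch computes them internally. The bounds use the default of two standard deviations around the average.

```python
import math
from statistics import pvariance, variance

grades = [50.0, 100.0]  # the two grades implied by count, min and max above

count = len(grades)
total = sum(grades)
avg = total / count
sum_of_squares = sum(g * g for g in grades)

variance_population = pvariance(grades)  # divides by n
variance_sampling = variance(grades)     # divides by n - 1
std_deviation_population = math.sqrt(variance_population)
std_deviation_sampling = math.sqrt(variance_sampling)

# Bounds: avg plus/minus 2 standard deviations (population and sampling variants).
upper = avg + 2 * std_deviation_population
lower = avg - 2 * std_deviation_population
upper_sampling = avg + 2 * std_deviation_sampling
lower_sampling = avg - 2 * std_deviation_sampling

print(count, total, avg, sum_of_squares)
print(variance_population, variance_sampling)
print(lower, upper, lower_sampling, upper_sampling)
```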

4️⃣ string_stats: statistics over strings

Computes statistics over the string values extracted from the aggregated documents. The values can be retrieved from specific keyword fields.

    POST /my-index-000001/_search?size=0
    {
      "aggs": {
        "message_stats": { "string_stats": { "field": "message.keyword" } }
      }
    }

Response

    {
      ...
      "aggregations": {
        "message_stats": {
          "count": 5,
          "min_length": 24,
          "max_length": 30,
          "avg_length": 28.8,
          "entropy": 3.94617750050791
        }
      }
    }

💛 Multi-value analysis: percentile types

1️⃣ percentiles: percentiles

Computes one or more percentiles over the numeric values extracted from the aggregated documents.

    GET latency/_search
    {
      "size": 0,
      "aggs": {
        "load_time_outlier": {
          "percentiles": {
            "field": "load_time"
          }
        }
      }
    }

By default, the percentiles metric generates a range of percentiles: [1, 5, 25, 50, 75, 95, 99].

    {
      ...
      "aggregations": {
        "load_time_outlier": {
          "values": {
            "1.0": 5.0,
            "5.0": 25.0,
            "25.0": 165.0,
            "50.0": 445.0,
            "75.0": 725.0,
            "95.0": 945.0,
            "99.0": 985.0
          }
        }
      }
    }
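A percentile answers the question "below which value do p percent of the observations fall?". The Python sketch below computes exact percentiles by linear interpolation on a small hypothetical list of load times. Elasticsearch estimates percentiles with an approximation algorithm, so on real data its numbers will not necessarily match an exact calculation like this one.

```python
import math


def percentile(values, p):
    """Exact percentile using linear interpolation between the two closest ranks."""
    ordered = sorted(values)
    rank = (p / 100) * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Hypothetical load times in milliseconds.
load_times = [10, 20, 30, 40, 50]

for p in (1, 5, 25, 50, 75, 95, 99):
    print(p, percentile(load_times, p))
```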

2️⃣ percentile_ranks: percentile ranks

Computes one or more percentile ranks over the numeric values extracted from the aggregated documents.

    GET latency/_search
    {
      "size": 0,
      "aggs": {
        "load_time_ranks": {
          "percentile_ranks": {
            "field": "load_time",
            "values": [ 500, 600 ]
          }
        }
      }
    }

Response

    {
      ...
      "aggregations": {
        "load_time_ranks": {
          "values": {
            "500.0": 90.01,
            "600.0": 100.0
          }
        }
      }
    }

The result above means that 90.01% of page loads completed within 500 ms, and 100% completed within 600 ms.
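A percentile rank is the inverse question of a percentile: given a threshold, what percentage of the values is at or below it? A minimal Python sketch with hypothetical load times follows. The numbers are made up and do not reproduce the 90.01 above.

```python
def percentile_rank(values, threshold):
    """Percentage of values that are less than or equal to the threshold."""
    at_or_below = sum(1 for v in values if v <= threshold)
    return 100.0 * at_or_below / len(values)


# Hypothetical load times in milliseconds.
load_times = [120, 250, 310, 480, 495, 520, 540, 580, 590, 600]

for threshold in (500, 600):
    print(threshold, percentile_rank(load_times, threshold))
```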

💛 Multi-value analysis: geo types

1️⃣ geo_bounds: Geo bounds

    PUT /museums
    {
      "mappings": {
        "properties": {
          "location": {
            "type": "geo_point"
          }
        }
      }
    }

    POST /museums/_bulk?refresh
    {"index":{"_id":1}}
    {"location": "52.374081,4.912350", "name": "NEMO Science Museum"}
    {"index":{"_id":2}}
    {"location": "52.369219,4.901618", "name": "Museum Het Rembrandthuis"}
    {"index":{"_id":3}}
    {"location": "52.371667,4.914722", "name": "Nederlands Scheepvaartmuseum"}
    {"index":{"_id":4}}
    {"location": "51.222900,4.405200", "name": "Letterenhuis"}
    {"index":{"_id":5}}
    {"location": "48.861111,2.336389", "name": "Musée du Louvre"}
    {"index":{"_id":6}}
    {"location": "48.860000,2.327000", "name": "Musée d'Orsay"}

    POST /museums/_search?size=0
    {
      "query": {
        "match": { "name": "musée" }
      },
      "aggs": {
        "viewport": {
          "geo_bounds": {
            "field": "location",
            "wrap_longitude": true
          }
        }
      }
    }

The aggregation above computes the bounding box of the location field for all documents whose name matches "musée".

    {
      ...
      "aggregations": {
        "viewport": {
          "bounds": {
            "top_left": {
              "lat": 48.86111099738628,
              "lon": 2.3269999679178
            },
            "bottom_right": {
              "lat": 48.85999997612089,
              "lon": 2.3363889567553997
            }
          }
        }
      }
    }
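The bounding box is just the extreme latitudes and longitudes of the matching points. The Python sketch below rebuilds it from the two documents that match "musée". It ignores the wrap_longitude option (boxes that cross the international date line), and it returns the coordinates as typed, whereas the response above differs in the trailing digits, presumably because of how geo_point values are stored.

```python
# (lat, lon) of the two documents whose name matches "musée".
points = [
    (48.861111, 2.336389),  # Musée du Louvre
    (48.860000, 2.327000),  # Musée d'Orsay
]

lats = [lat for lat, _ in points]
lons = [lon for _, lon in points]

bounds = {
    # top-left corner: northernmost latitude, westernmost longitude
    "top_left": {"lat": max(lats), "lon": min(lons)},
    # bottom-right corner: southernmost latitude, easternmost longitude
    "bottom_right": {"lat": min(lats), "lon": max(lons)},
}
print(bounds)
```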

2️⃣ geo_centroid: Geo-centroid

    PUT /museums
    {
      "mappings": {
        "properties": {
          "location": {
            "type": "geo_point"
          }
        }
      }
    }

    POST /museums/_bulk?refresh
    {"index":{"_id":1}}
    {"location": "52.374081,4.912350", "city": "Amsterdam", "name": "NEMO Science Museum"}
    {"index":{"_id":2}}
    {"location": "52.369219,4.901618", "city": "Amsterdam", "name": "Museum Het Rembrandthuis"}
    {"index":{"_id":3}}
    {"location": "52.371667,4.914722", "city": "Amsterdam", "name": "Nederlands Scheepvaartmuseum"}
    {"index":{"_id":4}}
    {"location": "51.222900,4.405200", "city": "Antwerp", "name": "Letterenhuis"}
    {"index":{"_id":5}}
    {"location": "48.861111,2.336389", "city": "Paris", "name": "Musée du Louvre"}
    {"index":{"_id":6}}
    {"location": "48.860000,2.327000", "city": "Paris", "name": "Musée d'Orsay"}

    POST /museums/_search?size=0
    {
      "aggs": {
        "centroid": {
          "geo_centroid": {
            "field": "location"
          }
        }
      }
    }

The aggregation above computes the centroid of the location field across all documents in the museums index.

    {
      ...
      "aggregations": {
        "centroid": {
          "location": {
            "lat": 51.00982965203002,
            "lon": 3.9662131341174245
          },
          "count": 6
        }
      }
    }
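For this data set the centroid in the response coincides, up to storage precision, with the plain arithmetic mean of the six latitudes and longitudes. The Python sketch below checks that, treating the centroid as a simple coordinate average, which is an assumption that fits this example.

```python
# (lat, lon) of the six museum documents indexed above.
points = [
    (52.374081, 4.912350),
    (52.369219, 4.901618),
    (52.371667, 4.914722),
    (51.222900, 4.405200),
    (48.861111, 2.336389),
    (48.860000, 2.327000),
]

count = len(points)
centroid = {
    "lat": sum(lat for lat, _ in points) / count,
    "lon": sum(lon for _, lon in points) / count,
}
print(centroid, count)
```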

3️⃣ geo_line: Geo-Line

    PUT test
    {
      "mappings": {
        "dynamic": "strict",
        "_source": {
          "enabled": false
        },
        "properties": {
          "my_location": {
            "type": "geo_point"
          },
          "group": {
            "type": "keyword"
          },
          "@timestamp": {
            "type": "date"
          }
        }
      }
    }

    POST /test/_bulk?refresh
    {"index": {}}
    {"my_location": {"lat":37.3450570, "lon": -122.0499820}, "@timestamp": "2013-09-06T16:00:36"}
    {"index": {}}
    {"my_location": {"lat": 37.3451320, "lon": -122.0499820}, "@timestamp": "2013-09-06T16:00:37Z"}
    {"index": {}}
    {"my_location": {"lat": 37.349283, "lon": -122.0505010}, "@timestamp": "2013-09-06T16:00:37Z"}

    POST /test/_search?filter_path=aggregations
    {
      "aggs": {
        "line": {
          "geo_line": {
            "point": {"field": "my_location"},
            "sort": {"field": "@timestamp"}
          }
        }
      }
    }

geo_line aggregates all geo_point values within a bucket into a LineString ordered by the chosen sort field.

    {
      "aggregations": {
        "line": {
          "type" : "Feature",
          "geometry" : {
            "type" : "LineString",
            "coordinates" : [
              [
                -122.049982,
                37.345057
              ],
              [
                -122.050501,
                37.349283
              ],
              [
                -122.049982,
                37.345132
              ]
            ]
          },
          "properties" : {
            "complete" : true
          }
        }
      }
    }
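Conceptually, geo_line sorts the points by the sort field and emits them as GeoJSON coordinates, which are written as [lon, lat] rather than [lat, lon]. The Python sketch below does the same for the three documents above. The helper name build_line is my own. Two of the documents share the same timestamp, so their relative order is not determined by the sort, which explains why the response lists them in a different order than they were indexed.

```python
from datetime import datetime

# The three documents indexed above.
docs = [
    {"lat": 37.3450570, "lon": -122.0499820, "ts": "2013-09-06T16:00:36"},
    {"lat": 37.3451320, "lon": -122.0499820, "ts": "2013-09-06T16:00:37Z"},
    {"lat": 37.349283, "lon": -122.0505010, "ts": "2013-09-06T16:00:37Z"},
]


def build_line(docs):
    """Sort the points by timestamp and return a GeoJSON LineString geometry."""
    ordered = sorted(docs, key=lambda d: datetime.fromisoformat(d["ts"].rstrip("Z")))
    return {
        "type": "LineString",
        # GeoJSON coordinate order is [longitude, latitude]
        "coordinates": [[d["lon"], d["lat"]] for d in ordered],
    }


line = build_line(docs)
print(line)
```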

💛 Multi-value analysis: Top types

1️⃣ top_hits: top hits within each bucket

    POST /sales/_search?size=0
    {
      "aggs": {
        "top_tags": {
          "terms": {
            "field": "type",
            "size": 3
          },
          "aggs": {
            "top_sales_hits": {
              "top_hits": {
                "sort": [
                  {
                    "date": {
                      "order": "desc"
                    }
                  }
                ],
                "_source": {
                  "includes": [ "date", "price" ]
                },
                "size": 1
              }
            }
          }
        }
      }
    }

Response

    {
      ...
      "aggregations": {
        "top_tags": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "key": "hat",
              "doc_count": 3,
              "top_sales_hits": {
                "hits": {
                  "total" : {
                    "value": 3,
                    "relation": "eq"
                  },
                  "max_score": null,
                  "hits": [
                    {
                      "_index": "sales",
                      "_type": "_doc",
                      "_id": "AVnNBmauCQpcRyxw6ChK",
                      "_source": {
                        "date": "2015/03/01 00:00:00",
                        "price": 200
                      },
                      "sort": [
                        1425168000000
                      ],
                      "_score": null
                    }
                  ]
                }
              }
            },
            {
              "key": "t-shirt",
              "doc_count": 3,
              "top_sales_hits": {
                "hits": {
                  "total" : {
                    "value": 3,
                    "relation": "eq"
                  },
                  "max_score": null,
                  "hits": [
                    {
                      "_index": "sales",
                      "_type": "_doc",
                      "_id": "AVnNBmauCQpcRyxw6ChL",
                      "_source": {
                        "date": "2015/03/01 00:00:00",
                        "price": 175
                      },
                      "sort": [
                        1425168000000
                      ],
                      "_score": null
                    }
                  ]
                }
              }
            },
            {
              "key": "bag",
              "doc_count": 1,
              "top_sales_hits": {
                "hits": {
                  "total" : {
                    "value": 1,
                    "relation": "eq"
                  },
                  "max_score": null,
                  "hits": [
                    {
                      "_index": "sales",
                      "_type": "_doc",
                      "_id": "AVnNBmatCQpcRyxw6ChH",
                      "_source": {
                        "date": "2015/01/01 00:00:00",
                        "price": 150
                      },
                      "sort": [
                        1420070400000
                      ],
                      "_score": null
                    }
                  ]
                }
              }
            }
          ]
        }
      }
    }
2️⃣ top_metrics: metrics from the top-sorted document

    POST /test/_bulk?refresh
    {"index": {}}
    {"s": 1, "m": 3.1415}
    {"index": {}}
    {"s": 2, "m": 1.0}
    {"index": {}}
    {"s": 3, "m": 2.71828}

    POST /test/_search?filter_path=aggregations
    {
      "aggs": {
        "tm": {
          "top_metrics": {
            "metrics": {"field": "m"},
            "sort": {"s": "desc"}
          }
        }
      }
    }

Response

    {
      "aggregations": {
        "tm": {
          "top": [ {"sort": [3], "metrics": {"m": 2.718280076980591 } } ]
        }
      }
    }
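top_metrics picks the document that sorts first (here the largest s) and returns the requested metric field from it. The Python sketch below reproduces that selection on the three documents above. The response shows 2.718280076980591 rather than 2.71828, presumably because the m field was stored as a single-precision float; the last line simulates that rounding under this assumption.

```python
import struct

# The three documents indexed above.
docs = [
    {"s": 1, "m": 3.1415},
    {"s": 2, "m": 1.0},
    {"s": 3, "m": 2.71828},
]

# Sorting by s in descending order and taking the first hit equals taking the max.
top = max(docs, key=lambda d: d["s"])

# Assumption: the field is stored as a 32-bit float, so round-trip through float32.
m_as_float32 = struct.unpack("f", struct.pack("f", top["m"]))[0]

print(top["s"], top["m"], m_as_float32)
```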

This article is a repost.